JP2010015088A

JP2010015088A - Data generating device, data generating program, and reproduction system

Info

Publication number: JP2010015088A
Application number: JP2008176844A
Authority: JP
Inventors: Yuichi Tsukamoto; 有一塚本; Isao Shindo; 功進藤
Original assignee: Panasonic Corp
Current assignee: Panasonic Corp
Priority date: 2008-07-07
Filing date: 2008-07-07
Publication date: 2010-01-21
Anticipated expiration: 2028-07-07
Also published as: JP5184234B2

Abstract

PROBLEM TO BE SOLVED: To provide a data generating device that generates linked display data for displaying text, including character strings to which no reading is given, at a predetermined timing in accordance with sound reproduction. SOLUTION: The data generating device, which generates linked display data for displaying text at a predetermined timing in accordance with sound reproduction, generates a language string by extracting feature parameters from the spectral components of the voice band included in sound source data; divides the text into a plurality of character strings and gives a reading to each character string; generates linked display data including each character string given a reading and time stamp information indicating the timing at which the character string with the same reading in the language string is reproduced; and allots a time slot of a predetermined length to a character string not given a reading, based on the time stamp information of the immediately preceding character string, so that the character string not given a reading and the time stamp information indicating its allotted time slot are included in the linked display data. COPYRIGHT: (C)2010,JPO&INPIT

Description

The present invention relates to a data generation device and a data generation program that generate linked display data for displaying text at a predetermined timing in accordance with audio playback, and to a playback device.

Karaoke devices, which let a user sing while watching lyrics displayed on a screen in time with the music, are in widespread use. Many karaoke devices display the lyrics on the screen as the music plays and smoothly change the display color of the portion being sung at the appropriate timing, for example from gray to white. In addition to music data and lyrics data, such karaoke devices use linked display data for changing the display color of the lyrics in accordance with music playback. Patent Documents 1 and 2 describe techniques for creating linked display data.

Patent Document 1 discloses a file creation device that creates a link table for setting the synchronization timing between audio content and text data such as lyrics. The file creation device includes a link creation unit, which, as shown in FIG. 18, has a frequency discrimination unit 201, a feature extraction unit 202, a language creation unit 203, a language model database 204, a language model memory unit 205, a frame count unit 206, an elapsed time calculation unit 207, a text division unit 208, a text memory unit 209, and a table creation unit 211.

The frequency discrimination unit 201 analyzes the frequency spectrum of music data supplied as frame data and extracts the spectral components of the human voice band. The feature extraction unit 202 acoustically analyzes the extracted spectral components to extract feature parameters. The language creation unit 203 generates a language string by comparing the feature parameters extracted by the feature extraction unit 202 with the feature parameters of each reference language stored in the speech model database 204. The speech model database 204 stores, for each language model, the feature parameters of language models such as the Japanese syllabary (gojuon) and voiced sounds (dakuon). The language model memory unit 205 converts the language string generated by the language creation unit 203 into text data and stores it.

The frame count unit 206 accumulates the number of frames supplied since the start of supply and passes the cumulative frame count to the elapsed time calculation unit 207. The elapsed time calculation unit 207 calculates the time elapsed since the start of playback from the cumulative frame count. The text division unit 208 divides the text data into a plurality of blocks. The text memory unit 209 stores the text data divided into blocks.
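The conversion from a cumulative frame count to elapsed playback time can be illustrated with a minimal sketch. Patent Document 1 as summarized here does not state the frame size or sampling rate, so `samples_per_frame` and `sample_rate_hz` are assumed parameters, and the function name is hypothetical.

```python
def elapsed_seconds(cumulative_frames: int,
                    samples_per_frame: int,
                    sample_rate_hz: int) -> float:
    """Return the playback time covered by the frames supplied so far.

    Assumption: every frame holds the same number of samples and the
    sampling rate is constant. Neither value is given in the source.
    """
    if cumulative_frames < 0 or samples_per_frame <= 0 or sample_rate_hz <= 0:
        raise ValueError("frame count must be >= 0; frame size and rate must be > 0")
    return cumulative_frames * samples_per_frame / sample_rate_hz


if __name__ == "__main__":
    # 441 frames of 100 samples at 44,100 Hz cover exactly one second.
    print(elapsed_seconds(441, 100, 44_100))
```

The elapsed time obtained this way is what a link table would record as the playback timing of a text block at the moment a match is signalled.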

The matching unit 210 compares the text data of each block stored in the text memory unit 209 with the text data of the language string stored in the language model memory unit 205, and sends an identification signal to the table creation unit 211 at the timing when the two match. At the timing when the identification signal is supplied from the matching unit 210, the table creation unit 211 acquires elapsed time data from the elapsed time calculation unit 207 and sets this elapsed time data in the link table as the playback timing of the text block.

Patent Document 2 discloses, as shown in FIG. 19, a lyrics allocation device that converts lyrics information composed of character strings of mixed kanji and kana into readings using a conversion dictionary unit 16 and allocates the converted readings to score information such as notes. The conversion dictionary unit 16 stores predetermined words in association with their reading information.

Patent Document 1: JP 2003-280670 A
Patent Document 2: JP 2002-82665 A

As described above, the file creation device of Patent Document 1 acoustically analyzes the spectral components of the human voice band to extract feature parameters, and generates a language string by comparing these feature parameters with the feature parameters of each reference language stored in the speech model database 204. The language string therefore contains no silent symbols such as punctuation marks and question marks. As a result, even if the text data stored in the text memory unit 209 contains silent symbols, they do not match the language string generated by the language creation unit 203, so a suitable synchronization timing may not be set for a text block that contains them. The file creation device of Patent Document 1 does suppress or avoid such mismatches on a per-block basis by starting the comparison at the point where the language information corresponding to the first character of a text block matches the language information of the extracted language string. However, when a text block consists only of silent symbols, or contains a very large number of them, a suitable synchronization timing may still not be set.

As described above, the lyrics allocation device of Patent Document 2 converts lyrics information into readings using the conversion dictionary unit 16. However, silent symbols that have no reading, such as punctuation marks and question marks, are not registered in the conversion dictionary unit 16, so the device cannot convert them. Consequently, silent symbols cannot be allocated to the score information.

In addition, the lyrics allocation device may be unable to convert character strings that are not recorded in the conversion dictionary unit 16, such as proper nouns, into accurate readings. Allocating a reading that differs from the correct one to the score information is undesirable because the lyrics may not be displayed at the appropriate timing. The user can add entries to and delete entries from the conversion dictionary, so character strings such as these proper nouns and their readings can be newly added. However, this work must be carried out, with considerable time and effort, by a user who knows how to change the contents of the conversion dictionary.

An object of the present invention is to provide a data generation device and a data generation program that generate linked display data for displaying text, including character strings to which no reading is given or to which an accurate reading was not given, at a predetermined timing in accordance with audio playback, and to provide a playback device.

The present invention provides a data generation device that generates linked display data for displaying text at a predetermined timing in accordance with audio playback, the device comprising: a language string generation unit that extracts feature parameters from the spectral components of the voice band included in sound source data and generates a language string by comparing the extracted feature parameters with the feature parameters of a predetermined language; a text processing unit that divides the text into a plurality of character strings and gives a reading to each character string; a first data processing unit that compares each character string given a reading with the language string and generates linked display data including the target character string and time stamp information that indicates, as elapsed playback time of the sound source data, the timing at which the character string in the language string whose reading matches the target character string is reproduced; and a second data processing unit that, based on the time stamp information of the character string immediately preceding a character string to which no reading was given, assigns a time slot of a predetermined length to the character string to which no reading was given, and includes in the linked display data that character string and time stamp information indicating the time slot assigned to it.

The data generation device includes a position management unit that stores each character string to which the text processing unit gave no reading, or for which the language string contains no character string with a matching reading, together with information indicating the position of that character string in the text. After the first data processing unit has generated linked display data including all of the character strings in the text that were given readings and the time stamp information of each of them, the second data processing unit assigns a time slot of a predetermined length to each character string recorded in the position management unit.

In the data generation device, the second data processing unit calculates the length of time from the end time indicated by the time stamp information of the character string immediately preceding a target character string stored in the position management unit to the start time indicated by the time stamp information of the immediately following character string. If this length is equal to or longer than a predetermined time, the second data processing unit assigns to the target character string a time slot of a predetermined length within the period from the end time to the start time. If the length is shorter than the predetermined time, it shortens at least one of the time slot assigned to the immediately preceding character string and the time slot assigned to the immediately following character string, and assigns the time slot obtained by the shortening to the target character string.

In the data generation device, the second data processing unit shortens either the time slot assigned to the immediately preceding character string or the time slot assigned to the immediately following character string by the predetermined length.

In the data generation device, the second data processing unit assigns the period from the end time to the start time, together with the time slot obtained by the shortening, to the character string to which no reading was given.
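The allocation rule described above can be sketched as follows. This is only an illustration under stated assumptions: the names `allocate_slot`, `slot_len` and `min_gap` are hypothetical, the slot is placed at the start of the gap when the gap is long enough, and when the gap is too short the sketch shortens only the preceding character string by `slot_len` and gives the target both the gap and the time obtained by the shortening. The text equally permits shortening the following character string instead.

```python
from typing import Optional, Tuple

Slot = Tuple[float, float]  # (start, end) as elapsed playback time in seconds


def allocate_slot(prev: Slot, nxt: Slot, slot_len: float,
                  min_gap: Optional[float] = None) -> Tuple[Slot, Slot, Slot]:
    """Return (preceding slot, slot for the unread string, following slot).

    slot_len: the "predetermined length" of the slot to assign.
    min_gap:  the "predetermined time" the gap is compared against
              (assumed equal to slot_len when not given).
    """
    if min_gap is None:
        min_gap = slot_len
    if slot_len <= 0 or min_gap < slot_len:
        raise ValueError("need 0 < slot_len <= min_gap")

    prev_start, prev_end = prev
    next_start, _ = nxt
    gap = next_start - prev_end

    if gap >= min_gap:
        # Enough room between the neighbours: take a slot inside the gap.
        return prev, (prev_end, prev_end + slot_len), nxt

    # Gap too short: shorten the preceding string by slot_len (assumed choice).
    if prev_end - prev_start <= slot_len:
        raise ValueError("preceding slot is too short to be shortened")
    new_prev_end = prev_end - slot_len
    # The unread string receives the shortened time plus the original gap.
    return (prev_start, new_prev_end), (new_prev_end, next_start), nxt


if __name__ == "__main__":
    print(allocate_slot((0.0, 1.0), (2.0, 3.0), 0.5))   # gap is long enough
    print(allocate_slot((0.0, 1.0), (1.25, 2.0), 0.5))  # gap is too short
```

Keeping `min_gap` separate from `slot_len` mirrors the distinction the text draws between the "predetermined time" used for the comparison and the "predetermined length" of the assigned slot.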

In the data generation device, the information recorded in the position management unit to indicate the position of a character string in the text indicates the number of characters from the beginning of the text to that character string.

In the data generation device, the information recorded in the position management unit to indicate the position of a character string in the text indicates the row number and column number in the display area when the text is displayed in a predetermined display form.

In the data generation device, the information recorded in the position management unit to indicate the position of a character string in the text indicates the character string number counted from the first character string of the text.

In the data generation device, the information recorded in the position management unit to indicate the position of a character string in the text indicates the order relationship among the character strings of the text as divided by the text processing unit.

The present invention also provides a data generation device that generates linked display data for displaying text at a predetermined timing in accordance with audio playback, the device comprising: a language string generation unit that extracts feature parameters from the spectral components of the voice band included in sound source data and generates a language string by comparing the extracted feature parameters with the feature parameters of a predetermined language; a text processing unit that divides the text into a plurality of character strings and gives a reading to each character string; a data processing unit that compares each character string given a reading with the language string and generates linked display data including the target character string and time stamp information that indicates, as elapsed playback time of the sound source data, the timing at which the character string in the language string whose reading matches the target character string is reproduced; and a special reading assigning unit that gives a special reading, corresponding to every language string, to a character string to which no reading was given. Based on the time stamp information of the character string immediately preceding the character string given the special reading, the data processing unit compares the character string given the special reading with the language string, assigns to it the timing at which the character string in the language string with the matching reading is reproduced, and includes in the linked display data the character string given the special reading and time stamp information indicating the time slot assigned to it.

The data generation device includes a reading estimation unit that estimates a reading for any character included in a character string to which no reading was given, and the special reading assigning unit gives the special reading to a character string whose reading the reading estimation unit could not estimate.

In the data generation device, the data processing unit assigns to the character string given the special reading a time slot of a predetermined length starting from the end time indicated by the time stamp information of the immediately preceding character string.

The present invention provides a data generation program for causing a computer to function as each of the units included in the data generation device.

The present invention provides a playback device that, based on linked display data created by the data generation device or the data generation program, displays character strings in the text to which no reading can be given at a predetermined timing in accordance with audio playback.

The present invention provides a playback device that, based on linked display data created by the data generation device or the data generation program, displays character strings in the text to which no reading can be given at a predetermined timing in accordance with audio playback, and that, by using the displayed portion to obtain the location selected by the user, starts playback from a character string to which no reading can be given.

With the data generation device, the data generation program, and the playback device according to the present invention, linked display data can be generated for displaying text, including character strings to which no reading is given or to which an accurate reading was not given, at a predetermined timing in accordance with audio playback.

Embodiments of the present invention are described below with reference to the drawings. The data generation device of the embodiments described below generates, based on sound source data and text data, linked display data for displaying text at a predetermined timing in accordance with playback of the sound source data. The text is, for example, sentences or lyrics in Japanese or a foreign language, including punctuation marks and special characters. Displaying text means, for example, changing the transparency of the text display color from 50% to 0%, changing the text display color to another, more legible color, or making text appear that was not displayed at all.

When the data generation device generates linked display data, the text is divided into character strings of one or more characters. Each character string may be a morpheme (the smallest meaningful unit of a language), a single character, or a phoneme (the smallest unit of sound in a language). The generated linked display data includes the text data and time stamp information, which is unique time information assigned to each character string obtained by subdividing the text. The time stamp information includes the start time at which display of each character string begins and either the length of time until display of that character string is complete or the end time. The start time and end time are indicated by predetermined elapsed playback times of the audio.
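One way to picture linked display data is as a list of character strings, each carrying a start time and an end time. The sketch below is an assumption made for illustration only: the class names `TimeStamp` and `LinkedEntry`, the use of seconds, and the helper `active_at` are not taken from the specification, which does not prescribe a concrete data format here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TimeStamp:
    start: float  # elapsed playback time (s) at which display begins
    end: float    # elapsed playback time (s) at which display is complete

    @property
    def duration(self) -> float:
        # The text allows storing either an end time or a length; one gives the other.
        return self.end - self.start


@dataclass
class LinkedEntry:
    text: str                    # one character string of the subdivided text
    stamp: Optional[TimeStamp]   # None while no time slot has been assigned


def active_at(entries: List[LinkedEntry], t: float) -> List[str]:
    """Return the character strings whose time slot contains playback time t."""
    return [e.text for e in entries
            if e.stamp is not None and e.stamp.start <= t < e.stamp.end]


if __name__ == "__main__":
    data = [
        LinkedEntry("今日", TimeStamp(0.0, 0.75)),
        LinkedEntry("も", TimeStamp(0.75, 1.0)),
        LinkedEntry("、", None),  # silent symbol, not yet assigned a slot
    ]
    print(active_at(data, 0.8))
```

An entry with `stamp=None` corresponds to a character string such as a punctuation mark that has not yet received time stamp information.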

(First embodiment)
FIG. 1 is a block diagram showing the data generation device of the first embodiment. As shown in FIG. 1, the data generation device 100 of the first embodiment includes an input unit 101, a data generation unit 103, a position management unit 105, a data processing unit 107, and a data storage unit 109. The data generation unit 103, the position management unit 105, and the data processing unit 107 operate by executing a program.

The input unit 101 is an interface for inputting sound source data and text data to the data generation device 100. The data generation unit 103 matches the sound source data input via the input unit 101 against the text data and generates linked display data. The matching of sound source data and text data by the data generation unit 103 is described in detail later. The position management unit 105 stores character strings to which no reading was given during processing by the data generation unit 103, and character strings that do not match the text based on the sound source data, together with information indicating their positions in the text. A character string may fail to match the text based on the sound source data because an accurate reading was not given to it during processing by the data generation unit 103.

The data processing unit 107 assigns time stamp information to character strings to which no reading was given and to character strings that do not match the text based on the sound source data, and updates the linked display data generated by the data generation unit 103. The data storage unit 109 stores the linked display data generated by the data generation unit 103 and updated by the data processing unit 107.

The information indicating the position in the text that is recorded in the position management unit 105 takes one of several forms: (1) the number of characters from the beginning of the text indicated by the text data; (2) the row number and column number in the display area when the text indicated by the text data is displayed in a predetermined display form; (3) the character string number, counted from the first character string, when the text indicated by the text data is divided into a plurality of character strings by morphological analysis or similar processing; or (4) descriptive information based on the order relationship among the character strings obtained when the text indicated by the text data is divided into a plurality of character strings by morphological analysis or similar processing (preferably described in a format whose data structure is easy to understand, such as VoiceXML, a voice dialogue description language).
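Forms (1) to (3) of the position information can be related to one another with a short sketch. The function names are hypothetical, indices are assumed to be zero-based, and the display form is assumed to wrap text at a fixed number of characters per line; the specification does not fix any of these details.

```python
from typing import List, Tuple


def offsets_of_segments(segments: List[str]) -> List[int]:
    """Form (1): character offset from the start of the text for each string.

    The list index of each offset is the character string number of form (3).
    """
    offsets = []
    position = 0
    for segment in segments:
        offsets.append(position)
        position += len(segment)
    return offsets


def offset_to_row_col(offset: int, chars_per_line: int) -> Tuple[int, int]:
    """Form (2): row and column, assuming fixed-width wrapping of the text."""
    if chars_per_line <= 0:
        raise ValueError("chars_per_line must be positive")
    return divmod(offset, chars_per_line)


if __name__ == "__main__":
    segments = ["今日", "も", "、", "晴れ"]   # assumed result of morphological analysis
    offsets = offsets_of_segments(segments)
    index = 2                                 # string number of the comma "、"
    print(offsets[index], offset_to_row_col(offsets[index], 3))
```

Form (4) would instead record which character strings precede and follow the target, which the list order of `segments` already expresses implicitly.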

FIG. 2 is a flowchart showing the operation of the data generation device 100 of the first embodiment. As shown in FIG. 2, in step S101 the sound source data and text data input via the input unit 101 are sent to the data generation unit 103. In step S103, the data generation unit 103 matches the sound source data against the text data and generates linked display data. In step S105, the data processing unit 107 assigns time stamp information to the character strings to which no reading could be given and updates the linked display data generated in step S103. In step S107, the data processing unit 107 stores the linked display data updated in step S105 in the data storage unit 109.

FIGS. 3 and 4 are flowcharts showing the details of step S103 performed by the data generation device 100 of the first embodiment. As shown in FIG. 3, the data generation unit 103 analyzes the frequency spectrum of the sound source data input in step S101 of FIG. 2 and extracts the spectral components of the human voice band (step S201). Next, the data generation unit 103 acoustically analyzes the spectral components extracted in step S201 to extract feature parameters (step S203). Next, the data generation unit 103 generates a language string by comparing the feature parameters extracted in step S203 with the feature parameters of a predetermined language (step S205). Next, the data generation unit 103 converts the language string generated in step S205 into text data (step S207). When the feature parameters of the predetermined language used in step S205 are those of Japanese, the text data of the language string obtained in step S207 is in hiragana or katakana.

Next, the data generation unit 103 divides the text indicated by the text data input in step S101 of FIG. 2 into a plurality of character strings by structural analysis such as morphological analysis (step S209). When the text data input in step S101 of FIG. 2 is Japanese, the text is likely to contain kanji, punctuation marks, and the like, whereas the text data of the language string obtained in step S207 is in hiragana or katakana. For this reason, as shown in FIG. 4, the data generation unit 103 uses a reading conversion dictionary (not shown) to give a reading to each character string obtained in step S209 (step S211).

Next, the data generation unit 103 determines, for each character string, whether a reading was given in step S211 (step S213). The data generation unit 103 registers in the position management unit 105 each character string determined in step S213 not to have been given a reading, together with information indicating its position in the text (step S215). For each character string determined in step S213 to have been given a reading, the data generation unit 103 compares the character string with the text data of the language string obtained in step S207 (step S217) and determines whether the text indicated by the text data of the language string contains a character string whose reading matches that of the target character string (step S219).

If, in step S219, the target character string is found in the text of the language string, the process proceeds to step S221; if not, the process proceeds to step S215, where the target character string and information indicating its position in the text are recorded in the position management unit 105. In step S221, the data generation unit 103 writes the target character string and the time stamp information corresponding to it to the linked display data. The time stamp information of each character string is determined according to the timing at which the character string is uttered when the sound source data is played back.

After step S221, the data generation unit 103 updates, based on the time information, the position in the text data of the language string from which the comparison in step S217 continues (step S223). The data generation unit 103 then determines whether every character string divided in step S209 has either been recorded in the position management unit 105 in step S215 or been written to the linked display data in step S221 (step S225). If all character strings have been processed, this subfunction ends; otherwise, the process returns to step S211.
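
The loop over steps S213 to S225 can be sketched as follows. This is a minimal sketch, not the patented implementation: it assumes the recognizer output is a list of `(kana, start_ms, end_ms)` tuples with exactly one kana character per tuple, uses a plain substring search as a stand-in for the comparison in step S217, and uses a Python list in place of the position management unit 105. All names are hypothetical.

```python
def align(pairs, phones):
    """Match readings against the recognized kana sequence.

    pairs:  list of (token, reading or None) in text order
    phones: list of (kana_char, start_ms, end_ms) from the recognizer
    """
    recognized = "".join(kana for kana, _, _ in phones)
    linked = []   # (index, token, start_ms, end_ms): linked display data
    pending = []  # (index, token): stands in for position management unit 105
    cursor = 0    # analysis position in the language string
    for index, (token, reading) in enumerate(pairs):
        if reading is None:                      # S213 -> S215
            pending.append((index, token))
            continue
        pos = recognized.find(reading, cursor)   # S217 / S219
        if pos < 0:                              # no matching reading -> S215
            pending.append((index, token))
            continue
        start = phones[pos][1]
        end = phones[pos + len(reading) - 1][2]
        linked.append((index, token, start, end))  # S221
        cursor = pos + len(reading)                # S223
    return linked, pending


pairs = [("音声", "おんせい"), ("を", "を"), ("再生", "さいせい"),
         ("する", "する"), ("。", None)]
# Assume each recognized kana lasts 100 ms.
phones = [(k, i * 100, (i + 1) * 100)
          for i, k in enumerate("おんせいをさいせいする")]
linked, pending = align(pairs, phones)
```

Note that a substring search may skip over unmatched audio when looking ahead; a production matcher would probably bound the search window, which the text does not specify.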

FIG. 5 is a flowchart showing the details of step S105 performed by the data generation device 100 of the first embodiment. As shown in FIG. 5, the data processing unit 107 reads a character string and the information indicating its position in the text from the position management unit 105 (step S301). Next, based on the linked display data generated in step S103, the data processing unit 107 identifies the character strings before and after the character string read in step S301 and calculates the length of time from the end time of the preceding character string to the start time of the following character string (step S303).

Next, the data processing unit 107 determines whether the time length calculated in step S303 is equal to or longer than a predetermined time, for example 10 milliseconds (step S305). If it is, the process proceeds to step S307; if it is shorter than the predetermined time, the process proceeds to step S309. In step S307, the data processing unit 107 assigns to the character string read in step S301 a time slot of a predetermined length that lies between the end time of the preceding character string and the start time of the following character string, and updates the linked display data. That is, in step S307, the character string read in step S301 and time stamp information indicating the time slot assigned to it are written to the linked display data.

In step S309, on the other hand, the data processing unit 107 shortens at least one of the two time slots assigned to the character strings before and after the character string read in step S301 so that a time slot of a predetermined length can be assigned to that character string, assigns the time freed by the shortening to the character string, and updates the linked display data. When shortening the time slot of the preceding character string, the data processing unit 107 moves the end time of that time slot earlier by a predetermined length and assigns the time freed by this change to the character string. When shortening the time slot of the following character string, the data processing unit 107 delays the start time of that time slot by a predetermined length and assigns the time freed by this change to the character string.
When one of the neighboring time slots is to be shortened, which of the two, preceding or following, is shortened may be decided according to features of the character string (such as its first or last character).

The data processing unit 107 may also both move the end time of the time slot assigned to the preceding character string earlier and delay the start time of the time slot assigned to the following character string, and assign the time freed by these changes to the character string. Furthermore, the amount of time taken from the neighboring character strings may be adjusted according to the time length calculated in step S303. That is, the time slots of the neighboring character strings may be shortened so that the sum of the time length calculated in step S303 and the time obtained by the shortening equals the predetermined length.

After step S307 or step S309, the data processing unit 107 determines whether the above processing has been performed on all character strings registered in the position management unit 105 (step S311). If so, this subfunction ends; otherwise, the process returns to step S301.
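
Steps S301 to S311 can be sketched as follows. The sketch makes several assumptions that are not in the specification: a single constant `SLOT_MS` serves as both the threshold of step S305 and the slot length, positions are character string indices, the shortening in step S309 takes only the shortfall (the variant in which the gap plus the shortened time equals the predetermined length), and a string with no following neighbor simply receives a slot after the preceding one. All names are hypothetical.

```python
SLOT_MS = 10  # assumed value for both the S305 threshold and the slot length


def fill_gaps(linked, pending):
    """Assign a time slot to every string that received no time stamp.

    linked:  dict index -> [token, start_ms, end_ms] (updated in place)
    pending: list of (index, token) read from the position manager
    """
    for index, token in sorted(pending):                 # S301, loop = S311
        prev = linked.get(index - 1)
        nxt = linked.get(index + 1)
        prev_end = prev[2] if prev else 0
        # S303: gap between the neighbors; with no following neighbor,
        # treat the gap as sufficient (an assumption of this sketch).
        gap = nxt[1] - prev_end if nxt else SLOT_MS
        if gap >= SLOT_MS:                               # S305 -> S307
            start = prev_end
        else:                                            # S309
            shortfall = SLOT_MS - gap
            if prev:
                prev[2] -= shortfall   # end the preceding string earlier
                start = prev[2]
            else:
                nxt[1] += shortfall    # delay the following string
                start = prev_end
        linked[index] = [token, start, start + SLOT_MS]
    return linked


# A comma squeezed between two strings only 4 ms apart (S309 branch).
tight = {0: ["はい", 0, 300], 2: ["そう", 304, 600]}
fill_gaps(tight, [(1, "、")])

# The same comma with a 50 ms gap (S307 branch).
wide = {0: ["はい", 0, 300], 2: ["そう", 350, 600]}
fill_gaps(wide, [(1, "、")])
```

In the tight case the preceding string gives up 6 ms so that the comma occupies 294 ms to 304 ms; in the wide case the comma takes the first 10 ms of the existing gap.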

As shown in FIG. 6, a playback device 50 that plays back the sound source data and displays text based on the linked display data in response to user instructions may be connected to the data generation device 100 of this embodiment. The playback device 50 includes an operation receiving unit 51, a playback control unit 53, a playback unit 55, and a display unit 57. The operation receiving unit 51 receives the user's operations on the playback device 50, and the playback control unit 53 performs processing according to the operation. As a result of this processing, the playback unit 55 plays back the sound source data and the display unit 57 displays the text. Here, based on the linked display data stored in the data storage unit 109 of the data generation device 100, the playback control unit 53 causes text that includes character strings to which no reading, or no accurate reading, could be given to be displayed at predetermined timings in step with the playback of the sound source data. FIG. 7 shows an example of the text displayed on the display unit 57.

When a touch panel is provided on the display unit 57 as one form of the operation receiving unit 51, the user can select a desired character string in the text displayed on the display unit 57, which includes character strings to which no reading, or no accurate reading, could be given. In response to this operation, the playback control unit 53 plays back the sound source data from the playback time corresponding to the selected character string and displays the text. The unit of selection may be either an individual character string delimited as a morpheme or a displayed line on the display unit 57. If a selectable character string would consist only of character strings to which no reading is given, it may be merged into the preceding or following character string that has a reading, so that every selectable character string contains a character string with a reading.
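
The grouping of selectable strings described above can be sketched as follows. This is an illustration only: the entry layout `(token, start_ms, end_ms, has_reading)` and the function name are hypothetical, and the sketch merges a reading-less string into the preceding unit only, whereas the text also allows merging into the following one.

```python
def selectable_units(entries):
    """Group entries so that each selectable unit contains a readable string.

    entries: list of (token, start_ms, end_ms, has_reading) in text order
    returns: list of (text, start_ms, end_ms)
    """
    units = []
    for token, start, end, has_reading in entries:
        if not has_reading and units:
            # Merge a reading-less string (e.g. punctuation) into the
            # preceding unit and extend that unit's end time.
            text, unit_start, _ = units[-1]
            units[-1] = (text + token, unit_start, end)
        else:
            units.append((token, start, end))
    return units


entries = [
    ("再生", 500, 900, True),
    ("する", 900, 1100, True),
    ("。", 1100, 1110, False),
]
units = selectable_units(entries)
# Selecting the second unit would start playback at its start time.
seek_ms = units[1][1]
```

A playback controller could then seek the audio to `seek_ms` when the user taps the corresponding unit on the touch panel.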

As described above, the data generation device 100 of this embodiment assigns a time slot of a predetermined length to a character string to which no reading, or no accurate reading, was given. Therefore, when text is displayed based on the linked display data generated and updated by the data generation device 100, text containing character strings represented by silent symbols such as punctuation marks and question marks, or character strings that are unlikely to be given accurate readings such as proper nouns and abbreviations, is displayed at timings close to the actual ones in step with the playback of the sound source. Furthermore, the user can designate such a character string as the playback start point.

(Second Embodiment)
FIG. 8 is a block diagram showing a data generation device according to the second embodiment. As shown in FIG. 8, the data generation device 110 of the second embodiment includes an input unit 111, a data generation unit 113, a special reading assigning unit 115, and a data storage unit 117. The data generation unit 113 and the special reading assigning unit 115 operate by executing a program.

The input unit 111 is an interface for inputting sound source data and text data to the data generation device 110. The data generation unit 113 matches the sound source data and the text data input via the input unit 111 in the same manner as in the first embodiment and generates linked display data. The special reading assigning unit 115 gives a special reading to a character string to which no reading was given in the course of processing by the data generation unit 113. The "special reading" is, for example, "*", which matches every language string in the same way as a wildcard in an operating system such as Linux (registered trademark). The data storage unit 117 stores the linked display data generated by the data generation unit 113.

FIG. 9 is a flowchart showing the operation of the data generation device 110 of the second embodiment. As shown in FIG. 9, in step S111, the sound source data and the text data input via the input unit 111 are sent to the data generation unit 113. Next, in step S113, the data generation unit 113 matches the sound source data and the text data and generates linked display data. Next, in step S115, the data generation unit 113 stores the linked display data generated in step S113 in the data storage unit 117.

FIGS. 10 and 11 are flowcharts showing the details of step S113 performed by the data generation device 110 of the second embodiment. As shown in FIGS. 10 and 11, the data generation unit 113 performs steps S201 to S213 and steps S217 to S225, described with reference to FIGS. 3 and 4, in the same manner as in the first embodiment. In this embodiment, after step S213 shown in FIG. 11, the special reading assigning unit 115 gives a special reading to each character string to which no reading was given (step S251). Next, for the character string given the special reading, the data generation unit 113 compares the special reading with the corresponding language string located after the end time of the immediately preceding character string (step S253) and determines whether a language string with a matching reading exists (step S255). If the language string matches the special reading, the process proceeds to step S257; if not, it proceeds to step S259. In step S257, the time information of the matching language string is assigned to the character string given the special reading, and the process proceeds to step S221. In step S259, on the other hand, a time slot of a predetermined length starting at the end time of the immediately preceding character string is assigned to the character string given the special reading, and the process then proceeds to step S221. In this embodiment, the process also proceeds to step S253 when it is determined in step S219 that the target character string is not in the text of the language string, and the data generation unit 113 performs the same processing.
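
One possible reading of steps S253 to S259 is sketched below. The text does not say how far the wildcard comparison may look ahead, so the optional `limit_ms` bound (for example, the start time of the next matched string) is an assumption of this sketch, as are the `(label, start_ms, end_ms)` unit layout, the `SLOT_MS` value, and the function name.

```python
SLOT_MS = 10  # assumed "predetermined length" of the fallback slot


def time_for_special(units, prev_end_ms, limit_ms=None):
    """Pick a time span for a string carrying the wildcard reading "*".

    units:       recognized language-string units as (label, start_ms, end_ms),
                 sorted by start time
    prev_end_ms: end time of the immediately preceding character string
    limit_ms:    optional upper bound for the search (an assumption)
    """
    for _label, start, end in units:
        if start < prev_end_ms:
            continue                   # before the preceding string ends
        if limit_ms is not None and end > limit_ms:
            break                      # nothing usable inside the window
        return (start, end)            # S255 yes -> S257: reuse its timing
    # S255 no -> S259: fixed-length slot right after the preceding string
    return (prev_end_ms, prev_end_ms + SLOT_MS)


units = [("お", 0, 100), ("ん", 100, 200)]
matched = time_for_special(units, 100)
fallback = time_for_special(units, 200)
bounded = time_for_special(units, 100, limit_ms=150)
```

Because "*" matches any unit, the only decision left is whether a unit exists in the search window; when none does, the fallback slot mirrors the fixed-length assignment of step S259.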

As described above, in the data generation device 110 of this embodiment, a special reading is given to each character string to which no reading is given, so linked display data for text containing character strings represented by silent symbols such as punctuation marks and question marks can be generated without a separate update of the linked display data. The playback device 50 may also be connected to the data generation device 110 of this embodiment, as in the first embodiment.

(Third Embodiment)
FIG. 12 is a block diagram showing a data generation device according to the third embodiment. As shown in FIG. 12, the data generation device 120 of the third embodiment includes an input unit 121, a data generation unit 123, a reading estimation unit 125, a special reading assigning unit 127, and a data storage unit 129. The data generation unit 123 and the reading estimation unit 125 operate by executing a program.

The input unit 121 is an interface for inputting sound source data and text data to the data generation device 120. The data generation unit 123 matches the sound source data and the text data input via the input unit 121 in the same manner as in the first embodiment and generates linked display data. The reading estimation unit 125 estimates the reading of a character string to which no reading was given in the course of processing by the data generation unit 123. The special reading assigning unit 127 gives a special reading to a character string whose reading the reading estimation unit 125 could not estimate. As in the second embodiment, the "special reading" is, for example, "*", which matches every language string in the same way as a wildcard in an operating system such as Linux (registered trademark). The data storage unit 129 stores the linked display data generated by the data generation unit 123.

FIG. 13 is a flowchart showing the operation of the data generation device 120 of the third embodiment. As shown in FIG. 13, in step S121, the sound source data and the text data input via the input unit 121 are sent to the data generation unit 123. Next, in step S123, the data generation unit 123 matches the sound source data and the text data and generates linked display data. Next, in step S125, the data generation unit 123 stores the linked display data generated in step S123 in the data storage unit 129.

FIGS. 14 and 15 are flowcharts showing the details of step S123 performed by the data generation device 120 of the third embodiment. As shown in FIGS. 14 and 15, the data generation unit 123 performs steps S201 to S213 and steps S217 to S225, described with reference to FIGS. 3 and 4, in the same manner as in the first embodiment. In this embodiment, after step S213 shown in FIG. 15, the reading estimation unit 125 estimates the reading of each character string to which no reading was given (step S271). In step S273, it is determined whether the reading estimation unit 125 was able to estimate a reading in step S271. If a reading was estimated, the process proceeds to step S217; if not, it proceeds to step S275.

In step S275, the special reading assigning unit 127 gives a special reading to the character string whose reading the reading estimation unit 125 could not estimate. Next, the data generation unit 123 assigns to the character string given the special reading a time slot of a predetermined length starting at the end time of the immediately preceding character string (step S253), and the process then proceeds to step S221. In this embodiment, as in the second embodiment, the process also proceeds to step S253 when it is determined in step S219 that the target character string is not in the text of the language string, and the data generation unit 123 performs the same processing.

FIG. 16 is a flowchart showing the details of step S271 performed by the reading estimation unit 125 in the third embodiment. As shown in FIG. 16, the reading estimation unit 125 analyzes, for each character in a character string to which no reading was given, whether a phoneme registered as a reading is included (step S401). The reading estimation unit 125 performs this analysis using a phoneme list, an example of which is shown in FIG. 17. Next, the reading estimation unit 125 determines, for each character in the character string, whether the phoneme corresponding to the character is in the phoneme list (step S403). If it is, the process proceeds to step S405; if not, it proceeds to step S407.

In step S405, the reading estimation unit 125 sets the phoneme corresponding to the target character as its reading. In step S407, on the other hand, the reading estimation unit 125 sets no reading for the target character. After step S405 or step S407, the reading estimation unit 125 determines whether the above processing has been performed on all characters in the target character string (step S409). If so, this subfunction ends; otherwise, the process returns to step S401.
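
The per-character estimation of steps S401 to S409 can be sketched as follows. The phoneme list of FIG. 17 is not reproduced in the text, so the entries below are hypothetical, as is the choice to treat the estimation as failed only when no character at all received a reading.

```python
# Hypothetical excerpt of a phoneme list (character -> kana reading).
PHONEME_LIST = {
    "A": "エー",
    "B": "ビー",
    "C": "シー",
    "1": "イチ",
}


def estimate_reading(string):
    """Sketch of step S271: build a reading character by character."""
    parts = []
    for ch in string:                      # loop closed by S409
        phoneme = PHONEME_LIST.get(ch)     # S401 / S403
        if phoneme is not None:
            parts.append(phoneme)          # S405: use the phoneme as reading
        # S407: otherwise no reading is set for this character
    # Assumption: the estimation counts as failed (S273 "no") only when
    # no character received a reading.
    return "".join(parts) or None


abbreviation = estimate_reading("ABC")
punctuation = estimate_reading("。")
```

A string such as an abbreviation then proceeds to the comparison of step S217 with its estimated reading, while a string with no estimated reading falls through to the special reading of step S275.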

As described above, the data generation device 120 of this embodiment estimates the reading of a character string to which no reading is given and then compares it with the text data of the language string, so linked display data for text containing character strings such as new words and proper nouns can be generated more appropriately. In addition, because a special reading is given to a character string whose reading cannot be estimated, linked display data for text containing character strings represented by silent symbols such as punctuation marks and question marks can be generated, as in the second embodiment. The playback device 50 may also be connected to the data generation device 120 of this embodiment, as in the first embodiment.

The data generation device according to the present invention is useful, for example, as a device that generates linked display data for displaying text that includes a character string to which no reading, or no accurate reading, was given at predetermined timings in step with audio playback.

FIG. 1: Block diagram showing the data generation device of the first embodiment
FIG. 2: Flowchart showing the operation of the data generation device of the first embodiment
FIG. 3: Flowchart showing the details of step S103 performed by the data generation device of the first embodiment
FIG. 4: Flowchart showing the details of step S103 performed by the data generation device of the first embodiment
FIG. 5: Flowchart showing the details of step S105 performed by the data generation device of the first embodiment
FIG. 6: Block diagram showing a configuration in which a playback device is connected to the data generation device of the first embodiment
FIG. 7: Diagram showing an example of text displayed on the display unit of the playback device
FIG. 8: Block diagram showing the data generation device of the second embodiment
FIG. 9: Flowchart showing the operation of the data generation device of the second embodiment
FIG. 10: Flowchart showing the details of step S113 performed by the data generation device of the second embodiment
FIG. 11: Flowchart showing the details of step S113 performed by the data generation device of the second embodiment
FIG. 12: Block diagram showing the data generation device of the third embodiment
FIG. 13: Flowchart showing the operation of the data generation device of the third embodiment
FIG. 14: Flowchart showing the details of step S123 performed by the data generation device of the third embodiment
FIG. 15: Flowchart showing the details of step S123 performed by the data generation device of the third embodiment
FIG. 16: Flowchart showing the details of step S271 performed by the reading estimation unit in the third embodiment
FIG. 17: Diagram showing an example of a phoneme list
FIG. 18: Block diagram showing the internal configuration of the link creation unit of the file creation device disclosed in Patent Document 1
FIG. 19: Block diagram showing the internal configuration of the lyrics allocation device disclosed in Patent Document 2

Explanation of symbols

100, 110, 120 Data generation device
101, 111, 121 Input unit
103, 113, 123 Data generation unit
105 Position management unit
107 Data processing unit
109, 117, 129 Data storage unit
115, 127 Special reading assigning unit
125 Reading estimation unit
50 Playback device
51 Operation receiving unit
53 Playback control unit
55 Playback unit
57 Display unit

Claims

A data generation device that generates linked display data for displaying text at predetermined timings in step with audio playback, the data generation device comprising:
a language string generation unit that extracts feature parameters from spectral components of a voice band included in sound source data and generates a language string by comparing the extracted feature parameters with feature parameters of a predetermined language;
a text processing unit that divides the text into a plurality of character strings and gives a reading to each character string;
a first data processing unit that compares each character string to which a reading has been given with the language string and generates linked display data including the target character string and time stamp information indicating, as elapsed playback time of the sound source data, the timing at which the character string in the language string whose reading matches that of the target character string is played back; and
a second data processing unit that, based on the time stamp information of the character string immediately preceding a character string to which no reading has been given, assigns a time slot of a predetermined length to the character string to which no reading has been given and includes, in the linked display data, that character string and time stamp information indicating the time slot assigned to it.

The data generation device according to claim 1,
further comprising a position management unit that stores a character string to which no reading was given by the text processing unit, or a character string for which no matching character string exists in the language string, together with information indicating the position of that character string in the text,
wherein, after the first data processing unit has generated linked display data that includes all of the character strings in the text to which readings have been given and the time stamp information of each of them, the second data processing unit assigns a time slot of a predetermined length to each character string recorded in the position management unit.

The data generation device according to claim 2,
wherein the second data processing unit
calculates the time length from the end time indicated by the time stamp information of the character string immediately preceding a target character string stored in the position management unit to the start time indicated by the time stamp information of the character string immediately following it,
assigns to the target character string, if the time length is equal to or longer than a predetermined time, a time slot of a predetermined length between the end time and the start time, and
if the time length is shorter than the predetermined time, shortens at least one of the time slot assigned to the immediately preceding character string and the time slot assigned to the immediately following character string and assigns the time obtained by the shortening to the target character string.

The data generation device according to claim 3,
wherein the second data processing unit shortens, by the predetermined time, either the time slot assigned to the immediately preceding character string or the time slot assigned to the immediately following character string.

The data generation device according to claim 3,
wherein the second data processing unit assigns, to the character string to which no reading has been given, both the time from the end time to the start time and the time obtained by the shortening.

The data generation device according to claim 2,
The data generation apparatus characterized in that the information indicating the position of the character string in the text recorded in the position management unit indicates the number of characters from the beginning of the text of the character string.

The data generation device according to claim 2,
The data generation apparatus characterized in that the information indicating the position in text of the character string recorded in the position management unit indicates a row number and a column number in a display area when the text is displayed in a predetermined display form.

The data generation device according to claim 2,
The data generation device characterized in that the information indicating the position of the character string in the text recorded in the position management unit indicates the ordinal number of the character string counted from the first character string of the text.

The data generation device according to claim 2,
The data generation device characterized in that the information indicating the position of the character string in the text recorded in the position management unit indicates the order relationship among the character strings in the text as divided by the text processing unit.
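The position representations named in the four claims above (character count, row and column, character string number, and order) can all be derived from the divided text. The sketch below is illustrative only: the function name `positions`, the dictionary keys, and the display model (a fixed-width grid of `columns` characters per row with no explicit line breaks) are assumptions.

```python
def positions(strings, columns):
    """Return position information for each string of a divided text.

    strings: the character strings in text order.
    columns: characters per display row (assumed fixed-width grid).
    """
    out = []
    offset = 0
    for number, s in enumerate(strings):
        out.append({
            "offset": offset,                    # characters from the start of the text
            "row_col": divmod(offset, columns),  # (row, column) in the display area
            "number": number,                    # ordinal of the string; also gives the order
        })
        offset += len(s)
    return out
```

With zero-based numbering, the string number doubles as the order relationship, since comparing two numbers tells which string comes first.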

A data generation device that generates interlocking display data for displaying text at a predetermined timing in accordance with audio reproduction,
A language string generating unit that extracts a feature parameter from a spectral component of a voice band included in the sound source data and compares the extracted feature parameter with a feature parameter of a predetermined language to generate a language string;
A text processing unit that divides the text into a plurality of character strings and gives a reading to each character string;
A data processing unit that compares each character string to which a reading is given with the language string, and generates interlocking display data including the target character string and time stamp information indicating, by the elapsed reproduction time of the sound source data, the timing at which the character string in the language string whose reading matches the target character string is reproduced; and
A special reading assigning unit that gives, to a character string to which no reading is given, a special reading that corresponds to any of the language strings,
The data processing unit compares the character string to which the special reading is given with the language string, based on the time stamp information of the character string immediately before the character string to which the special reading is given, assigns the timing at which the matching character string in the language string is reproduced to the character string to which the special reading is given, and includes in the interlocking display data the character string to which the special reading is given and time stamp information indicating the time zone assigned to that character string.
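One way to picture the special reading is as a wildcard during matching. The sketch below is an assumption-laden illustration, not the claimed method: the `WILDCARD` marker, the function name `align`, and the tuple layouts are hypothetical, and the recognized language string is assumed to be a list of units that already carry start and end times.

```python
WILDCARD = "*"  # hypothetical marker standing in for the special reading


def align(tokens, recognized):
    """Attach time stamps to text strings by matching readings in order.

    tokens:     list of (surface, reading); reading is WILDCARD for
                strings that received the special reading.
    recognized: list of (reading, start, end) from the sound source.
    Returns a list of (surface, start, end). Strings with no match
    are silently dropped in this sketch.
    """
    result = []
    cursor = 0  # search resumes just after the previous match
    for surface, reading in tokens:
        for i in range(cursor, len(recognized)):
            rec_reading, start, end = recognized[i]
            # The special reading matches whatever unit comes next.
            if reading == WILDCARD or reading == rec_reading:
                result.append((surface, start, end))
                cursor = i + 1
                break
    return result
```

Since the search resumes after the previous match, a string carrying the special reading takes the first recognized unit that follows the string immediately before it.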

The data generation device according to claim 10,
Further comprising a reading estimation unit that estimates the reading of any character included in a character string to which no reading has been given,
The special reading assigning unit assigns the special reading to a character string whose reading the reading estimation unit cannot estimate.

The data generation device according to claim 10 or 11,
The data processing unit assigns, to the character string to which the special reading is given, a time zone of a predetermined length starting from the end time indicated by the time stamp information of the character string immediately before it.
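The fixed-length allotment in this claim can be illustrated with a short sketch. The function name `fill_missing`, the entry layout, and the convention that a leading unread string starts at 0.0 seconds are assumptions introduced here.

```python
def fill_missing(entries, length):
    """Give each string without a time zone a zone of fixed length.

    entries: list of (string, (start, end)) or (string, None), in text order.
    length:  the predetermined length in seconds.
    """
    filled = []
    prev_end = 0.0  # assumed start when the very first string has no zone
    for text, zone in entries:
        if zone is None:
            # Start where the immediately preceding string ended.
            zone = (prev_end, prev_end + length)
        filled.append((text, zone))
        prev_end = zone[1]
    return filled
```

Consecutive strings without time zones are chained one after another, each starting where the previous one ends.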

A data generation program for causing a computer to function as each unit of the data generation device according to any one of claims 1 to 12.

A playback device for playing back sound while displaying text,
A playback device characterized in that, based on the linked display data created by the data generation device according to claim 1 or the data generation program according to claim 13, a character string in the text that cannot be read is displayed at a predetermined timing in accordance with the reproduction of the sound.

A playback device for playing back sound while displaying text,
A playback device characterized in that, based on the linked display data created by the data generation device according to claim 1 or the data generation program according to claim 13, a character string in the text that cannot be read is displayed at a predetermined timing in accordance with the reproduction of the sound, and the user's selected location is acquired using the displayed portion so that playback is started from the character string that cannot be read.
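Starting playback from a selected string amounts to looking up the time stamp of the string that contains the selected position. The sketch below assumes the linked display data can be reduced to a list of (character offset, start time) pairs sorted by offset; the function name `start_time_for_selection` and this layout are hypothetical.

```python
import bisect


def start_time_for_selection(entries, selected_offset):
    """Return the playback start time for a selected character position.

    entries: list of (char_offset, start_time) sorted by char_offset,
             one pair per character string, including unread strings.
    """
    offsets = [offset for offset, _ in entries]
    # Find the last string that begins at or before the selection.
    i = bisect.bisect_right(offsets, selected_offset) - 1
    if i < 0:
        raise ValueError("selection lies before the first string")
    return entries[i][1]
```

Because strings without a reading also carry time stamp information in the linked display data, the lookup treats them the same as any other string.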