JP2016156989A

JP2016156989A - Voice synthesizer and program

Info

Publication number: JP2016156989A
Application number: JP2015035229A
Authority: JP
Inventors: Takeshi Narita (成田 健)
Original assignee: Brother Industries Ltd
Current assignee: Brother Industries Ltd
Priority date: 2015-02-25
Filing date: 2015-02-25
Publication date: 2016-09-01

Abstract

PROBLEM TO BE SOLVED: To provide a technique for easily synthesizing speech from utterance content in which words of a plurality of languages are mixed.
SOLUTION: A voice synthesizer (karaoke device) acquires utterance content information (S120) and collates each constituent morpheme, that is, each morpheme constituting the utterance content information, with first dictionary data to specify the phonetic symbol of that morpheme (S160). The first dictionary data associates each morpheme with its phonetic symbol for each language. On the basis of second dictionary data, each constituent morpheme is replaced with the phonetic character of a specific language that corresponds to the phonetic symbol specified in S160 (S160). The second dictionary data associates each phonetic symbol with the corresponding phonetic character of the specific language. In accordance with the replaced phonetic characters, synthesized sound that utters the utterance content information in the specific language is generated and output (S250).
SELECTED DRAWING: Figure 2

Description

The present invention relates to a speech synthesizer that generates synthesized sound, and to a program.

Conventionally, a singing synthesizer is known that uses synthesized sound to sing a piece of music in which lyrics are assigned to notes, each note being a combination of a pitch and a note value (see Patent Document 1).
To generate synthesized sound, this type of singing synthesizer requires a library that stores the speech data needed for speech synthesis and a synthesis engine that performs speech synthesis based on the speech data stored in the library.

Patent Document 1: JP 2006-258846 A

The lyrics of a piece of music sometimes mix words from a plurality of languages.
When the lyrics mix words from a plurality of languages in this way, a library and a synthesis engine must be prepared for each language.

In particular, the speech data stored in a library is preferably generated from the same person. Consequently, if words of a plurality of languages are mixed in the utterance content information to be uttered, securing the data needed to utter that information takes a great deal of labor.

In other words, the conventional technology has the problem that utterance content in which words of a plurality of languages are mixed cannot be easily synthesized into speech.
Therefore, an object of the present invention is to provide a technique for easily synthesizing speech from utterance content in which words of a plurality of languages are mixed.

One aspect of the present invention made to achieve the above object relates to a speech synthesizer including content acquisition means, language analysis means, replacement means, and synthesized sound output means.
The content acquisition means acquires utterance content information representing the wording of the content to be uttered. The language analysis means identifies the phonetic symbol of each constituent morpheme by collating the constituent morpheme with first dictionary data. The first dictionary data is data in which each morpheme is associated with its phonetic symbol for each language. A constituent morpheme is a morpheme constituting the utterance content information acquired by the content acquisition means.

Further, based on second dictionary data, the replacement means replaces each constituent morpheme with the phonetic character of a specific language that corresponds to the phonetic symbol identified by the language analysis means. The second dictionary data is data in which each phonetic symbol is associated with the corresponding phonetic character of the specific language.

The synthesized sound output means then generates and outputs, in accordance with the phonetic characters substituted by the replacement means, a synthesized sound that utters the utterance content information in the specific language.
Such a speech synthesizer can convert constituent morphemes into phonetic characters of the specific language. That is, even if words of a plurality of languages are mixed in the utterance content information, all of the words (morphemes) can be converted into phonetic characters of the specific language.

The speech synthesizer can therefore generate and output a synthesized sound of the utterance content information without preparing a library and a synthesis engine for each of the languages. In other words, even for utterance content information in which words of a plurality of languages are mixed, the speech synthesizer can easily generate a synthesized sound of that information.
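As an illustration only, the two-stage lookup described above (morpheme to phonetic symbol via the first dictionary data, then phonetic symbol to phonetic characters via the second dictionary data) might be sketched in Python as follows. The dictionary contents, the function names, and the simplification of mapping a whole phonetic-symbol string to a katakana string are assumptions made for this sketch; the specification does not give concrete dictionary entries.

```python
from typing import Dict, List, Optional

# Hypothetical first dictionary data: per language, morpheme -> phonetic symbol.
FIRST_DICTIONARY: Dict[str, Dict[str, str]] = {
    "en": {"love": "lʌv", "you": "juː"},
    "de": {"ich": "ɪç"},
}

# Hypothetical second dictionary data: phonetic symbol -> phonetic characters
# of the specific language (Japanese katakana in this sketch).
SECOND_DICTIONARY: Dict[str, str] = {
    "lʌv": "ラブ",
    "juː": "ユー",
    "ɪç": "イッヒ",
}


def identify_phonetic_symbol(morpheme: str) -> Optional[str]:
    """Language analysis: collate the morpheme with each language's entries."""
    for entries in FIRST_DICTIONARY.values():
        symbol = entries.get(morpheme.lower())
        if symbol is not None:
            return symbol
    return None


def replace_with_phonetic_characters(morphemes: List[str]) -> List[str]:
    """Replacement: swap each morpheme for specific-language phonetic characters.

    Morphemes that cannot be resolved are left unchanged; that fallback is an
    assumption of this sketch, not something the specification states.
    """
    replaced = []
    for morpheme in morphemes:
        symbol = identify_phonetic_symbol(morpheme)
        replaced.append(SECOND_DICTIONARY.get(symbol, morpheme))
    return replaced
```

With this toy data, `replace_with_phonetic_characters(["love", "you"])` yields katakana strings that a single Japanese library and synthesis engine could utter.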

The language analysis means may also determine whether a constituent morpheme is a morpheme of the specific language and, if it is not, identify the phonetic symbol of that constituent morpheme.
With such a speech synthesizer, the phonetic symbol of a constituent morpheme is identified only when the morpheme belongs to a language other than the specific language. This reduces the number of times phonetic symbols are identified for one piece of utterance content information, and therefore the amount of processing needed to identify them.

Furthermore, the language analysis means may use Japanese as the specific language when determining whether a constituent morpheme is a morpheme of the specific language.
Such a speech synthesizer can execute processing with Japanese as the specific language.

The content acquisition means may acquire, as the utterance content information, lyrics, which are the words assigned to at least some of the notes constituting the melody of a piece of music.
With such a speech synthesizer, even if words (morphemes) of a plurality of languages are mixed in the lyrics of a piece of music, the music can be sung by speech synthesis.

When the constituent morphemes of the utterance content information are converted into phonetic characters of the specific language, the number of phonetic characters may exceed the number of target notes.
Therefore, the speech synthesizer as one aspect of the present invention may include note dividing means and assigning means.

In this case, when replacing a constituent morpheme with phonetic characters of the specific language by the replacement means yields more phonetic characters than there are target notes to which that constituent morpheme is assigned, the note dividing means divides the target notes so that their number matches the number of phonetic characters. The assigning means then assigns a phonetic character to each of the notes divided by the note dividing means.

Further, the synthesized sound output means may treat the phonetic characters assigned to the notes by the assigning means as the phonetic characters substituted by the replacement means.
In such a speech synthesizer, one phonetic character can be assigned to one note. As a result, the synthesized sound of the utterance content information suits the music and sounds less unnatural.
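The note division and assignment described above might be sketched as follows. This passage does not state how a target note is divided, so the rule used here (repeatedly halving the longest note) is purely an assumption for illustration, as are the `Note` type, the tick-based timing, and the function names.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Note:
    pitch: int   # MIDI note number
    start: int   # start time in ticks
    length: int  # note length in ticks


def divide_notes(notes: Sequence[Note], char_count: int) -> List[Note]:
    """Divide target notes until there is one note per phonetic character.

    Assumed rule: split the longest note into two halves at the same pitch.
    """
    divided = list(notes)
    if not divided:
        raise ValueError("at least one target note is required")
    while len(divided) < char_count:
        index = max(range(len(divided)), key=lambda k: divided[k].length)
        note = divided[index]
        first = note.length // 2
        if first == 0:
            raise ValueError("note is too short to divide further")
        divided[index:index + 1] = [
            Note(note.pitch, note.start, first),
            Note(note.pitch, note.start + first, note.length - first),
        ]
    return divided


def assign_characters(notes: Sequence[Note],
                      characters: Sequence[str]) -> List[Tuple[str, Note]]:
    """Assign one phonetic character to each (possibly divided) note."""
    return list(zip(characters, divide_notes(notes, len(characters))))
```

For example, two target notes and the four characters of "ラブユー" produce four notes whose total length equals that of the original two notes.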

Note that the present invention may be implemented as a program.
The program in this case causes a computer to execute a content acquisition procedure for acquiring utterance content information and a language analysis procedure for identifying the phonetic symbols of the constituent morphemes by collating the constituent morphemes with the first dictionary data. The program further causes the computer to execute a replacement procedure for replacing, based on the second dictionary data, each constituent morpheme with the phonetic character of the specific language that corresponds to the phonetic symbol identified in the language analysis procedure, and a synthesized sound output procedure for generating and outputting, in accordance with the substituted phonetic characters, a synthesized sound that utters the utterance content information in the specific language.

If the present invention is implemented as a program in this way, it can be used by loading it onto a computer from a recording medium and starting it as necessary, or by having the computer acquire it through a communication line and start it as necessary. By causing the computer to execute each procedure, the computer can be made to function as the speech synthesizer.

The recording medium referred to here includes, for example, computer-readable electronic media such as a DVD-ROM, a CD-ROM, and a hard disk.

A block diagram showing the schematic configuration of the speech synthesis system.
A flowchart showing the procedure of the reproduction process.
A flowchart showing the procedure of the specific language conversion process.
An explanatory diagram outlining the specific language conversion process, in which (A) shows an example of utterance content information, (B) illustrates lyrics words converted into phonetic symbols, and (C) illustrates the phonetic symbols replaced with phonetic characters of the specific language.
A flowchart showing the procedure of the speech rate adjustment process.
An explanatory diagram outlining the speech rate adjustment process.

Embodiments of the present invention will be described below with reference to the drawings.
<Speech synthesis system>
The speech synthesis system 1 shown in FIG. 1 outputs, as a guide vocal, synthesized speech that sings the lyrics of a piece of music specified by a user (hereinafter referred to as the target music).

The speech synthesis system 1 includes an information processing server 10 and a karaoke apparatus 30.
The information processing server 10 stores at least MIDI music MD and dictionary data DD.

The karaoke apparatus 30 plays the MIDI music MD that is stored in the information processing server 10 and corresponds to the target music. Further, the karaoke apparatus 30 generates, in accordance with sound source data PD, synthesized speech that sings the music and outputs it as a guide vocal. Note that the speech synthesis system 1 includes a plurality of karaoke apparatuses 30.
<MIDI music>
The MIDI music MD is prepared in advance for each piece of music and contains music data, lyrics data, and music information.

Of these, the music data represents the score of one piece of music in accordance with the well-known MIDI (Musical Instrument Digital Interface) standard. The music data has at least a score track representing the score for each musical instrument used in the music.

For each performance sound output from a MIDI sound source, the score track defines at least a pitch (the so-called note number) and the period during which the MIDI sound source outputs the performance sound (hereinafter referred to as the note length). The note length in the score track is defined by the note-on timing and the note-off timing of the performance sound.
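The relationship between note-on timing, note-off timing, and note length can be shown with a small Python sketch. The event tuple layout `(tick, kind, note_number)` and the function name are assumptions for illustration; no particular MIDI library or file format is implied.

```python
from typing import Dict, Iterable, List, Tuple

Event = Tuple[int, str, int]            # (tick, "on" or "off", note number)
NoteSpan = Tuple[int, int, int]         # (note number, start tick, note length)


def note_lengths(events: Iterable[Event]) -> List[NoteSpan]:
    """Pair each note-on with the next note-off of the same note number."""
    sounding: Dict[int, int] = {}       # note number -> note-on tick
    spans: List[NoteSpan] = []
    # Sorting tuples puts "off" before "on" at the same tick, so a note that
    # ends exactly when the next one starts is closed first.
    for tick, kind, number in sorted(events):
        if kind == "on":
            sounding[number] = tick
        elif kind == "off" and number in sounding:
            start = sounding.pop(number)
            spans.append((number, start, tick - start))
    return spans
```

Each returned tuple carries the two quantities the score track is said to define: the note number and the note length.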

The lyrics data, on the other hand, is data relating to the lyrics of the music and includes lyrics text data and lyrics output data. The lyrics text data represents the lyrics of the music. The lyrics output data defines a timing correspondence that associates the lyrics output timing, which is the output timing of the characters constituting the lyrics (hereinafter referred to as “lyrics constituent characters”), with the performance of the music data. The timing correspondence also stipulates that lyrics constituent characters are assigned to at least some of the performance sounds (that is, notes) constituting the main melody of the music.

Furthermore, the lyrics data defines, for each lyrics constituent character, character type information indicating the type of character used. The character type referred to here is information representing the language in which the character is used and includes, for example, kanji, hiragana, katakana, Latin characters, and Arabic characters.
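In the embodiment the character type information is stored in the lyrics data. Purely as an illustration of what such information could look like, the sketch below derives a character type from Unicode character names using Python's standard `unicodedata` module. The type labels, the `LyricsCharacter` record, and the derivation itself are assumptions for this sketch. Note that a code-point check of this kind cannot tell Japanese kanji from Chinese characters, so real lyrics data would need the stored information.

```python
import unicodedata
from dataclasses import dataclass


def character_type(ch: str) -> str:
    """Classify one character by the prefix of its Unicode name."""
    name = unicodedata.name(ch, "")
    if name.startswith("HIRAGANA"):
        return "hiragana"
    if name.startswith("KATAKANA"):
        return "katakana"
    if name.startswith("CJK UNIFIED IDEOGRAPH"):
        return "kanji"
    if name.startswith(("LATIN", "FULLWIDTH LATIN")):
        return "latin"
    if name.startswith("ARABIC"):
        return "arabic"
    return "other"


@dataclass(frozen=True)
class LyricsCharacter:
    """Hypothetical record for one lyrics constituent character."""
    char: str
    output_tick: int     # lyrics output timing
    char_type: str       # character type information


def annotate(char: str, output_tick: int) -> LyricsCharacter:
    return LyricsCharacter(char, output_tick, character_type(char))
```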

The music information is information about the music and includes identification information that identifies the music (that is, a music ID). It also includes, for example, the title of the music, the artist name, and lyrics language information indicating the language of the lyrics.
<Dictionary data>
The dictionary data DD is prepared for each language and associates each morpheme with the phonetic symbol of that morpheme and with the phonetic characters of the specific language corresponding to that phonetic symbol.

In the present embodiment, a language is a set of rules for expressing and conveying information by speech or characters, for example, Japanese, English, German, French, or Chinese. The specific language is one prescribed language. In the present embodiment, Japanese is used as the specific language.
<Information processing server>
The information processing server 10 includes a communication unit 12, a storage unit 14, and a control unit 16.

Of these, the communication unit 12 allows the information processing server 10 to communicate with external devices via a communication network. That is, the information processing server 10 is connected to the karaoke apparatus 30 via the communication network. The communication network may be wired or wireless.

The storage unit 14 is a well-known storage device configured so that its stored contents can be read and written. The storage unit 14 stores a plurality of pieces of MIDI music MD. The symbol “n” shown in FIG. 1 is an identifier that identifies the MIDI music MD stored in the storage unit 14 of the information processing server 10 and is assigned to each piece of music. The symbol “n” is a natural number of 1 or more.

The storage unit 14 also stores the dictionary data DD. The symbol “m” shown in FIG. 1 is an identifier that identifies the dictionary data DD stored in the storage unit 14 of the information processing server 10 and is assigned to each language. The symbol “m” is a natural number of 2 or more.

The control unit 16 is a well-known control device built around a well-known microcomputer including a ROM 18, a RAM 20, and a CPU 22. The ROM 18 stores processing programs and data whose contents must be retained even when the power is turned off. The RAM 20 temporarily stores processing programs and data. The CPU 22 executes each process in accordance with the processing programs stored in the ROM 18 and the RAM 20.
<Karaoke apparatus>
The karaoke apparatus 30 includes a communication unit 32, an input reception unit 34, a music playback unit 36, a storage unit 38, an audio control unit 40, a video control unit 46, and a control unit 50.

The communication unit 32 allows the karaoke apparatus 30 to communicate with external devices via the communication network. The input reception unit 34 is an input device that accepts input of information and commands in accordance with external operations. The input device is, for example, a key, a switch, or a receiver for a remote controller.

The music playback unit 36 plays music based on the MIDI music MD downloaded from the information processing server 10. The music playback unit 36 is, for example, a MIDI sound source. The audio control unit 40 is a device that controls audio input and output and includes an output unit 42 and a microphone input unit 44.

A microphone 62 is connected to the microphone input unit 44, which thereby acquires the sound input via the microphone 62. A speaker 60 is connected to the output unit 42. The output unit 42 outputs to the speaker 60 the sound source signal of the music played by the music playback unit 36 and the sound source signal of the singing sound from the microphone input unit 44. The speaker 60 converts the sound source signal input from the output unit 42 into sound and outputs it.

The video control unit 46 outputs video or images based on the video data sent from the control unit 50. A display unit 64 that displays the video or images is connected to the video control unit 46.

The storage unit 38 is a well-known storage device configured so that its stored contents can be read and written. The storage unit 38 stores the sound source data PD. The sound source data PD is the data needed to generate synthesized speech (that is, for speech synthesis). Specifically, the sound source data PD may be speech segments or various parameters used for formant synthesis.

The control unit 50 is built around a well-known computer having at least a ROM 52, a RAM 54, and a CPU 56. The ROM 52 stores processing programs and data whose contents must be retained even when the power is turned off. The RAM 54 temporarily stores processing programs and data. The CPU 56 executes each process in accordance with the processing programs stored in the ROM 52 and the RAM 54.

The ROM 52 of the present embodiment stores a processing program with which the control unit 50 executes the reproduction process.
<Reproduction process>
Next, the reproduction process executed by the control unit 50 will be described.

The reproduction process is started when the target music's turn in the playback order arrives.
When the reproduction process is started, as shown in FIG. 2, the control unit 50 first acquires the identification number (music ID) of the target music (S110). The control unit 50 then acquires, from the information processing server 10, the lyrics data of the MIDI music MD that contains the music ID acquired in S110 (S120). That is, in S120 of the present embodiment, the lyrics data is acquired as the utterance content information representing the wording of the content to be uttered.

The control unit 50 then determines, based on the music information, whether the lyrics represented by the lyrics data acquired in S120 (hereinafter referred to as the “target lyrics”) are in the specific language (S130).

If the result of the determination in S130 is that the target lyrics are not in the specific language (S130: NO), the control unit 50 moves the reproduction process to S200, described in detail later. Specifically, in S130 of the present embodiment, the target lyrics are determined to be in the specific language if the lyrics language information included in the music information indicates the specific language. In the present embodiment, the state in which the lyrics language information indicates the specific language includes the case where lyrics words of the specific language and lyrics words of other languages are mixed and the proportion of lyrics words of the specific language is equal to or greater than a predetermined ratio. A lyrics word here is a morpheme constituting the target lyrics and is an example of the constituent morpheme recited in the claims.
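The ratio condition described above might be expressed as the following sketch. The embodiment reads lyrics language information from the music information rather than computing it at playback time, so this only illustrates the stated criterion. The threshold of 0.5 and the function name are assumptions; the specification says only that the ratio is predetermined.

```python
from typing import Iterable


def counts_as_specific_language(word_is_specific: Iterable[bool],
                                min_ratio: float = 0.5) -> bool:
    """Return True if specific-language lyrics words reach the given ratio.

    word_is_specific holds one flag per lyrics word: True if the word is in
    the specific language. min_ratio is a hypothetical threshold.
    """
    flags = list(word_is_specific)
    if not flags:
        return False
    return sum(flags) / len(flags) >= min_ratio
```

Under this assumed threshold, lyrics with two Japanese words and one English word count as being in the specific language, so processing would follow the S140 branch.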

If, on the other hand, the result of the determination in S130 is that the target lyrics are in the specific language (S130: YES), the control unit 50 extracts one lyrics word from all of the lyrics words constituting the target lyrics (S140). In S140, the control unit 50 need only extract lyrics words in the order in which time advances in the target music. Hereinafter, the lyrics word extracted in S140 is referred to as the target word.

Next, the control unit 50 determines whether the target word is written in characters other than those used in the specific language (S150). If the result of the determination in S150 is that it is not, that is, if the target word is written in characters used in the specific language (for example, hiragana, katakana, or kanji) (S150: NO), the control unit 50 moves the reproduction process to S160. In S150 of the present embodiment, the control unit 50 makes a negative determination if the character type information included in the lyrics data indicates characters used in the specific language.

In S160, the control unit 50 identifies, from the dictionary data DD, the phonetic characters of the specific language for the target word and replaces the target word with those phonetic characters. The control unit 50 then moves the reproduction process to S180, described in detail later.

If, on the other hand, the result of the determination in S150 is that the target word is written in characters other than those used in the specific language (for example, Latin characters, Arabic characters, or traditional Chinese characters) (S150: YES), the control unit 50 moves the reproduction process to S170.

In S170, the control unit 50 executes a specific language conversion process that converts the target word into phonetic characters of the specific language. The details of the specific language conversion process will be described later.
The control unit 50 then executes a speech rate adjustment process that assigns each of the phonetic characters constituting the target word to one note (S180). The details of the speech rate adjustment process will be described later.

Next, the control unit 50 determines whether the steps S140 to S180 have been executed for all of the lyrics words constituting the target lyrics (S190). If the result of the determination in S190 is that they have not (S190: NO), the control unit 50 returns the reproduction process to S140. In S140, the control unit 50 extracts a new lyrics word as the target word and executes the steps from S150 to S190.

If, on the other hand, the result of the determination in S190 is that the steps S140 to S180 have been executed for all of the lyrics words (S190: YES), the control unit 50 moves the reproduction process to S240, described in detail later.
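The loop over lyrics words in S140 to S190 can be summarized by the following sketch. The four callables stand in for the S150 determination, the S160 replacement, the S170 specific language conversion process, and the S180 speech rate adjustment process. Their names and signatures are assumptions chosen for illustration, and the processing inside each step is not modeled here.

```python
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def convert_lyrics_words(
    lyrics_words: Iterable[str],
    is_foreign_script: Callable[[str], bool],   # S150
    replace_native: Callable[[str], str],       # S160
    convert_foreign: Callable[[str], str],      # S170
    adjust_speech_rate: Callable[[str], T],     # S180
) -> List[T]:
    """Process every lyrics word in song order (S140 to S190)."""
    results: List[T] = []
    for target_word in lyrics_words:            # S140: next target word
        if is_foreign_script(target_word):      # S150: YES
            phonetic = convert_foreign(target_word)
        else:                                   # S150: NO
            phonetic = replace_native(target_word)
        results.append(adjust_speech_rate(phonetic))
    return results                              # S190: YES, all words done
```

A toy call could pass `str.isascii` as the S150 check and dictionary `get` methods as the two replacement steps. The branch for lyrics that are not in the specific language (S200 to S230) has the same shape without the S150 check.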

In S200, to which the process moves when the target lyrics are not in the specific language (S130: NO), the control unit 50 extracts one lyrics word from all of the lyrics words constituting the target lyrics. In S200, the control unit 50 need only extract lyrics words in the order in which time advances in the target music. Hereinafter, the lyrics word extracted in S200 is also referred to as the target word.

Next, the control unit 50 executes the specific language conversion process (S210) and then the speech rate adjustment process (S220). The control unit 50 then determines whether the steps S200 to S220 have been executed for all of the lyrics words constituting the target lyrics (S230).

If the result of the determination in S230 is that the steps S200 to S220 have not been executed for all of the lyrics words (S230: NO), the control unit 50 returns the reproduction process to S200. In S200, the control unit 50 extracts a new lyrics word as the target word and executes the steps from S210 to S230.

If, on the other hand, the result of the determination in S230 is that the steps S200 to S220 have been executed for all of the lyrics words (S230: YES), the control unit 50 moves the reproduction process to S240.

In S240, the control unit 50 acquires the MIDI music MD corresponding to the music ID acquired in S110 from the information processing server 10 and plays back the target song. As a result of S240, the performance sound of the target song is emitted from the speaker 60.

Further, in the reproduction process, the control unit 50 generates by speech synthesis, and outputs, a guide vocal that accurately sings the target lyrics of the target song in accordance with the sound source data PD stored in the storage unit 38 (S250). In S250, the control unit 50 synthesizes the target lyrics, which are expressed only in phonetic characters of the specific language, according to those phonetic characters. In the present embodiment, the speech synthesis may be realized by a well-known method such as waveform concatenation or formant synthesis. As a result of S250, the guide vocal is emitted from the speaker 60.

Thereafter, the reproduction process ends.
<Specific language conversion processing>
Next, the specific language conversion process invoked in S170 and S210 of the reproduction process will be described.

In the specific language conversion process, as shown in FIG. 3, the control unit 50 selects one piece of dictionary data DD from among the dictionary data DD stored in the storage unit 14 of the information processing server 10 and collates the target word against it (S310). Hereinafter, the dictionary data DD selected in S310 is referred to as the target dictionary data.

If the collation in S310 finds that no morpheme matching the target word exists in the target dictionary data DD (S320: NO), the control unit 50 returns the specific language conversion process to S310. In that S310, the control unit 50 selects new dictionary data DD as the target dictionary data DD and collates the target word against it.

On the other hand, if the collation in S310 finds that a morpheme matching the target word exists in the target dictionary data DD (S320: YES), the control unit 50 advances the specific language conversion process to S330. In S330, the control unit 50 converts the target word into phonetic symbols based on the target dictionary data DD. Specifically, in S330, as shown in FIGS. 4(A) and 4(B), the control unit 50 identifies the phonetic symbols corresponding to the target word (that is, the morpheme) from the target dictionary data DD and converts the target word into those phonetic symbols.

Further, in the specific language conversion process, the control unit 50 replaces the target word, which was converted into phonetic symbols in S330, with phonetic characters of the specific language (S340). Specifically, in S340, as shown in FIG. 4(C), the control unit 50 identifies the phonetic characters of the specific language for the target word from the target dictionary data DD and replaces the target word with those phonetic characters. In the present embodiment, katakana may be used as the phonetic characters of the specific language.

Thereafter, the control unit 50 ends the specific language conversion process and returns to S180 or S220 of the reproduction process.
That is, in the specific language conversion process, a lyric word expressed in a language other than the specific language is replaced with phonetic characters of the specific language whose manner of utterance is similar to that of the phonetic symbols of the lyric word.
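As a rough illustration of S310 to S340, the following Python sketch looks a target word up in one dictionary after another. The data layout (one dict per language, mapping a morpheme to a pair of phonetic symbols and katakana), the sample entries, and the name `to_specific_language` are all assumptions made for this example. Returning `None` when no dictionary matches is also an assumption; the description does not say how that case is handled.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical layout: morpheme -> (phonetic symbols, katakana rendering).
DictionaryData = Dict[str, Tuple[str, str]]

# Illustrative entries only; not taken from the publication's figures.
SAMPLE_DICTIONARIES: List[DictionaryData] = [
    {"love": ("lʌv", "ラブ")},       # assumed English dictionary
    {"amour": ("amuʁ", "アムール")},  # assumed French dictionary
]


def to_specific_language(
    word: str, dictionaries: List[DictionaryData]
) -> Optional[str]:
    """Return the katakana for `word`, or None if no dictionary contains it."""
    for target_dict in dictionaries:        # S310: select the next dictionary
        entry = target_dict.get(word)       # S310: collate the target word
        if entry is None:                   # S320: NO, try another dictionary
            continue
        phonetic_symbols, katakana = entry  # S330: identify the phonetic symbols
        return katakana                     # S340: replace with phonetic characters
    return None                             # assumed fallback, not in the source
```

With the sample data, `to_specific_language("amour", SAMPLE_DICTIONARIES)` skips the first dictionary and finds the word in the second.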
<Speech rate adjustment processing>
Next, the speech rate adjustment process invoked in S180 and S220 of the reproduction process will be described.

When the speech rate adjustment process is invoked, as shown in FIG. 5, the control unit 50 determines whether the target word, after replacement with phonetic characters of the specific language, consists of exactly one character (S410). If the determination in S410 finds that the number of characters is one (S410: YES), the control unit 50 advances the speech rate adjustment process to S420.

In S420, the control unit 50 assigns the single phonetic character representing the target word to the one note to which the target word is assigned. The control unit 50 then ends the speech rate adjustment process and returns to S190 or S230 of the reproduction process.

On the other hand, if the determination in S410 finds that the target word, after replacement with phonetic characters of the specific language, does not consist of exactly one character (S410: NO), the control unit 50 advances the speech rate adjustment process to S430. In S430, the control unit 50 acquires, from the MIDI music MD, information on the note to which the target word is assigned (hereinafter referred to as the target note). The target note information acquired in S430 is, for example, the note type, such as a quarter note or an eighth note, and the note value.

Then, in the speech rate adjustment process, the control unit 50 divides the target note so that the number of notes matches the number of phonetic characters representing the target word (S440). Further, the control unit 50 assigns the phonetic characters representing the target word, one by one, to the respective notes obtained by the division in S440 (S450).

Thereafter, the control unit 50 ends the speech rate adjustment process and returns to S190 or S230 of the reproduction process.
That is, in the speech rate adjustment process, as shown in FIG. 6, when the number of phonetic characters representing the target word is larger than the number of target notes to which the target word is assigned, the target note is divided so that the number of notes matches the number of phonetic characters. Each of the phonetic characters representing the target word is then assigned to one of the divided notes.
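The division in S410 to S450 can be sketched as follows. This is only an illustration under stated assumptions: a single target note is assumed, its note value is modeled as a `Fraction` of a whole note, and the note is split into equal parts. The description does not specify the split ratio, and the name `split_note` is hypothetical.

```python
from fractions import Fraction
from typing import List, Tuple


def split_note(kana: str, note_value: Fraction) -> List[Tuple[str, Fraction]]:
    """Pair each phonetic character with a share of the target note's value."""
    chars = list(kana)
    if len(chars) <= 1:                  # S410: YES, a single character
        return [(kana, note_value)]      # S420: assign it to the one note
    # S430 is represented by the note_value argument (taken from the MIDI data).
    share = note_value / len(chars)      # S440: equal division is an assumption
    return [(ch, share) for ch in chars]  # S450: one character per divided note
```

For example, a two-character word on a quarter note, `split_note("ラブ", Fraction(1, 4))`, yields two entries that each carry an eighth-note value. Using `Fraction` keeps the divided values exact, which avoids rounding drift when a note is split into three or more parts.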
[Effects of the embodiment]
As described above, the karaoke apparatus 30 can convert lyric words into phonetic characters of the specific language. That is, even if words of a plurality of languages are mixed in the target lyrics, the karaoke apparatus 30 can convert all the words (morphemes) into phonetic characters of the specific language.

For this reason, the karaoke apparatus 30 can generate and output a synthesized sound of the target lyrics without preparing a library and a synthesis engine for each of the plurality of languages.
In other words, the karaoke apparatus 30 can easily generate a synthesized sound of the utterance content information even when the target lyrics contain a mixture of words of a plurality of languages.

In the reproduction process, the phonetic symbols of a target word are identified only when the target word is a morpheme of a language other than the specific language.
As a result, for one set of target lyrics, the reproduction process reduces the number of times that the phonetic symbols of a target word are identified and the word is replaced with phonetic characters of the specific language, and thus reduces the amount of processing required for these steps.
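The saving described here amounts to skipping the dictionary lookup for words that are already in the specific language. The sketch below illustrates the idea for Japanese. The script-based test in `is_specific_language` (hiragana, katakana, and CJK ideograph code point ranges) is purely an assumption for illustration; this section does not state how the embodiment makes the determination, and both function names are hypothetical.

```python
from typing import Callable


def is_specific_language(word: str) -> bool:
    """Assumed check: every character is hiragana, katakana, or a CJK ideograph."""
    return all(
        "\u3040" <= ch <= "\u30ff" or "\u4e00" <= ch <= "\u9fff"
        for ch in word
    )


def convert_if_needed(word: str, convert: Callable[[str], str]) -> str:
    """Call the (possibly expensive) conversion only for foreign-language words."""
    if is_specific_language(word):
        return word           # already in the specific language: no lookup
    return convert(word)      # identify phonetic symbols and replace
```

Here `convert` stands in for the specific language conversion process; it is invoked only when the check fails, which is where the reduction in processing comes from.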

Incidentally, when a lyric word of the target lyrics is converted into phonetic characters of the specific language, the number of phonetic characters may become larger than the number of target notes.
Even in such a case, the speech rate adjustment process in the reproduction process of the present embodiment assigns one phonetic character to each note. As a result, the reproduction process can produce a synthesized sound singing the lyrics of the target song that suits the song and sounds less unnatural.
[Other Embodiments]
Although an embodiment of the present invention has been described above, the present invention is not limited to that embodiment and can be implemented in various modes without departing from the gist of the present invention.

For example, in the reproduction process of the above embodiment, lyrics data is acquired as the utterance content information, but the utterance content information is not limited to this. That is, as long as the utterance content information represents the wording of content to be uttered, it may be a literary text or any other content.

In the above embodiment, Japanese is the specific language, but the specific language in the present invention is not limited to this, and any language may be used as the specific language.
Furthermore, in the dictionary data DD of the above embodiment, each morpheme, the phonetic symbols of that morpheme, and the phonetic characters of the specific language corresponding to those phonetic symbols are associated with one another, but the structure of the dictionary data is not limited to this. For example, the dictionary data may comprise first dictionary data in which each morpheme is associated with the phonetic symbols of that morpheme, and second dictionary data in which each phonetic symbol is associated with the phonetic character of the specific language corresponding to that phonetic symbol.
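The two-dictionary variant can be sketched as a two-stage lookup. In this Python illustration, the first dictionary is keyed by language and then by morpheme, and the second dictionary maps a phonetic transcription to katakana. The granularity of the second dictionary's keys (a whole-word transcription rather than individual symbols), the sample entries, and the function names are assumptions for this example only.

```python
from typing import Dict, Optional

# First dictionary (assumed layout): language -> morpheme -> phonetic symbols.
FIRST_DICT: Dict[str, Dict[str, str]] = {
    "en": {"love": "lʌv"},
    "fr": {"amour": "amuʁ"},
}

# Second dictionary (assumed layout): phonetic symbols -> katakana.
SECOND_DICT: Dict[str, str] = {
    "lʌv": "ラブ",
    "amuʁ": "アムール",
}


def lookup_symbols(word: str, first: Dict[str, Dict[str, str]]) -> Optional[str]:
    """Language analysis step: find the phonetic symbols in any language."""
    for entries in first.values():
        if word in entries:
            return entries[word]
    return None


def replace_with_kana(
    word: str, first: Dict[str, Dict[str, str]], second: Dict[str, str]
) -> Optional[str]:
    """Replacement step: map the identified symbols to phonetic characters."""
    symbols = lookup_symbols(word, first)
    if symbols is None:
        return None              # assumed fallback for unknown words
    return second.get(symbols)
```

One consequence of separating the two tables is that the second dictionary depends only on the specific language, so in this sketch switching the specific language would mean swapping `SECOND_DICT` while leaving the per-language morpheme tables untouched.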

In the above embodiment, the sound source data PD is stored in the storage unit 38 of the karaoke apparatus 30. In the present invention, however, the sound source data PD need not be stored in the storage unit 38 of the karaoke apparatus 30 and may instead be stored in the information processing server 10.

In the above embodiment, the reproduction process is executed by the karaoke apparatus 30, but the entity that executes the reproduction process is not limited to this and may be an information processing apparatus such as a personal computer or a smartphone.

In addition to the karaoke apparatus (speech synthesizer) described above, the present invention can be realized in various forms, such as a program for causing a computer to function as the karaoke apparatus and a recording medium on which the program is recorded.

A mode in which part of the configuration of the above embodiment is omitted is also an embodiment of the present invention. A mode configured by appropriately combining the above embodiment and the modifications is also an embodiment of the present invention. Moreover, every mode conceivable without departing from the essence of the invention specified by the wording of the claims is also an embodiment of the present invention.
[Correspondence between Embodiment and Claims]
Finally, the relationship between the description of the above embodiment and the description of the claims will be explained.

The function obtained by executing S120 in the reproduction process of the above embodiment is an example of the content acquisition means recited in the claims, and the function obtained by executing S310 to S330 of the specific language conversion process is an example of the language analysis means recited in the claims. Further, the function obtained by executing S340 of the specific language conversion process is an example of the replacement means recited in the claims, and the function obtained by executing S250 of the reproduction process is an example of the synthesized sound output means recited in the claims.

Furthermore, the function obtained by executing S440 of the speech rate adjustment process is an example of the note dividing means recited in the claims, and the function obtained by executing S450 is an example of the assigning means recited in the claims.

DESCRIPTION OF SYMBOLS
1 ... speech synthesis system
10 ... information processing server
12 ... communication unit
14 ... storage unit
16 ... control unit
18, 52 ... ROM
20, 54 ... RAM
22, 56 ... CPU
30 ... karaoke apparatus
32 ... communication unit
34 ... input reception unit
36 ... music reproduction unit
38 ... storage unit
40 ... audio control unit
42 ... output unit
44 ... microphone input unit
46 ... video control unit
50 ... control unit
60 ... speaker
62 ... microphone
64 ... display unit

Claims

A speech synthesizer comprising:
content acquisition means for acquiring utterance content information representing the wording of content to be uttered;
language analysis means for identifying a phonetic symbol of each constituent morpheme, which is a morpheme constituting the utterance content information acquired by the content acquisition means, by collating the constituent morpheme with first dictionary data in which each morpheme is associated, for each language, with the phonetic symbol of that morpheme;
replacement means for replacing each constituent morpheme with the phonetic character of a specific language that corresponds to the phonetic symbol identified by the language analysis means, based on second dictionary data in which each phonetic symbol is associated with the phonetic character of the specific language corresponding to that phonetic symbol; and
synthesized sound output means for generating and outputting, in accordance with the phonetic characters substituted by the replacement means, a synthesized sound that utters the utterance content information in the specific language.

The speech synthesizer described, wherein the language analysis means
determines whether or not the constituent morpheme is a morpheme of the specific language and, if the determination finds that it is not a morpheme of the specific language, identifies the phonetic symbol of the constituent morpheme.

The speech synthesizer according to claim 1, wherein the language analysis means
uses Japanese as the specific language and determines whether or not the constituent morpheme is a morpheme of the specific language.

The speech synthesizer according to one item, wherein the content acquisition means
acquires, as the utterance content information, lyrics that are words assigned to at least some of the notes constituting the melody of a musical piece.

The speech synthesizer according to any one of claims 1 to 4, further comprising:
note dividing means for dividing, when the number of phonetic characters resulting from the replacement of a constituent morpheme with phonetic characters of the specific language by the replacement means is larger than the number of target notes to which the constituent morpheme is assigned, the target note so that the number of notes matches the number of characters; and
assigning means for assigning the phonetic characters to the respective notes divided by the note dividing means,
wherein the synthesized sound output means
uses each phonetic character assigned to each note by the assigning means as the phonetic character substituted by the replacement means.

A program for causing a computer to execute:
a content acquisition procedure for acquiring utterance content information representing the wording of content to be uttered;
a language analysis procedure for identifying a phonetic symbol of each constituent morpheme, which is a morpheme constituting the utterance content information acquired in the content acquisition procedure, by collating the constituent morpheme with first dictionary data in which each morpheme is associated, for each language, with the phonetic symbol of that morpheme;
a replacement procedure for replacing each constituent morpheme with the phonetic character of a specific language that corresponds to the phonetic symbol identified in the language analysis procedure, based on second dictionary data in which each phonetic symbol is associated with the phonetic character of the specific language corresponding to that phonetic symbol; and
a synthesized sound output procedure for generating and outputting, in accordance with the phonetic characters substituted in the replacement procedure, a synthesized sound that utters the utterance content information in the specific language.