JP4839967B2

JP4839967B2 - Instruction device and program

Info

Publication number: JP4839967B2
Application number: JP2006155296A
Authority: JP
Inventors: 伸吾神谷
Original assignee: Yamaha Corp
Current assignee: Yamaha Corp
Priority date: 2006-06-02
Filing date: 2006-06-02
Publication date: 2011-12-21
Anticipated expiration: 2026-06-02
Also published as: JP2007322933A

Description

The present invention relates to an instruction device and a program.

In the teaching of singing and musical instrument performance, instruction is commonly given by having a model singer (or player) demonstrate an example and having the learner sing (or play) in imitation of it. For example, Patent Document 1 proposes a method in which, in order to convey the difference between the user's performance and the model performance to the user in an easy-to-understand way, the user's performance and the model performance are each displayed as figures corresponding to their note numbers and velocities, at positions corresponding to their note numbers and sounding timings.
Patent Document 1: JP 2002-91290 A

Skilled singers, such as professional vocalists, rarely sing mechanically according to the score. Many of them express emotion in a song by deliberately shifting the start or end of a sung phrase, or by using singing techniques such as vibrato and kobushi. Many people who practice singing wish to imitate these intentional timing shifts and singing techniques.
However, conventional singing instruction devices could not convey such intentional singing methods to the practitioner in an easy-to-understand way. The same applies to the performance of musical instruments.
The present invention has been made against this background, and its object is to provide a technique that can convey a singer's singing method (or a player's playing method) to a practitioner in an easy-to-understand way.

An instruction device according to a preferred aspect of the present invention includes: first storage means for storing musical score sound data representing the sounding timings of a plurality of phonemes whose sounding timings are arranged in time series; second storage means for storing model sound data representing a model sound; association means for dividing the model sound data stored in the second storage means into phonemes and associating it, phoneme by phoneme, with the musical score sound data stored in the first storage means; processed musical score sound data generating means for generating processed musical score sound data by converting, based on the result of the association by the association means, the sounding timing of each phoneme represented by the musical score sound data into the sounding timing of the corresponding phoneme represented by the model sound data; and output means for outputting the musical score sound data and the processed musical score sound data generated by the processed musical score sound data generating means.
In this aspect, the output means may display, on display means, the sounding interval of each phoneme represented by the musical score sound data and the sounding interval of each phoneme represented by the processed musical score sound data against the same time axis.
In this aspect, the device may further include: sound analysis data generating means for generating, from the model sound data, sound analysis data indicating at least one of the pitch, spectrum, and power of the model sound data; and technique section specifying means for analyzing the pattern of temporal change in the sound analysis data generated by the sound analysis data generating means, determining whether the analysis result corresponds to a predetermined pattern, and, if it does, specifying the section corresponding to that pattern as a section in which a particular technique is used. The output means may then display, on the display means, the portion of the phoneme sounding intervals represented by the processed musical score sound data that corresponds to the section specified by the technique section specifying means in a display mode different from that of the other sections.
In this aspect, the sound analysis data generating means may generate sound analysis data indicating pitch from the model sound data; the technique section specifying means may analyze the pattern of temporal change in the pitch indicated by that sound analysis data and specify a section in which the pitch changes continuously from a low pitch to a high pitch; and the output means may display, on the display means, the portion of the phoneme sounding intervals represented by the processed musical score sound data that corresponds to the specified section in a display mode representing the pitch change in that section.

According to the present invention, a singer's singing method (or a player's playing method) can be conveyed to a practitioner in an easy-to-understand way.

Next, the best mode for carrying out the present invention will be described.
<A: Configuration>
FIG. 1 is a block diagram showing the hardware configuration of a karaoke apparatus 1 according to an embodiment of the present invention. The karaoke apparatus 1 has a karaoke function for reproducing karaoke accompaniment and also functions as a singing instruction device that gives singing instruction to a practitioner. In the figure, a CPU (Central Processing Unit) 11 reads a computer program stored in a ROM (Read Only Memory) 12 or a storage unit 14, loads it into a RAM (Random Access Memory) 13, and executes it, thereby controlling each part of the karaoke apparatus 1. The storage unit 14 is a large-capacity storage means such as a hard disk, and has an accompaniment data storage area 14a, a musical score sound data storage area 14b, a model sound data storage area 14c, and a model technique data storage area 14d. The display unit 15 is, for example, a liquid crystal display and, under the control of the CPU 11, displays various screens such as a menu screen for operating the karaoke apparatus 1 and a karaoke screen in which lyric captions are superimposed on a background image. The operation unit 16 includes various keys such as a numeric keypad, up/down keys, and a performance start key, and outputs an operation signal corresponding to the pressed key to the CPU 11. The microphone 17 picks up the voice uttered by the practitioner and outputs an audio signal (analog data). The audio processing unit 18 converts the audio signal (analog data) output from the microphone 17 into digital data and outputs it to the CPU 11. The speaker 19 emits sound at an intensity corresponding to the audio signal output from the audio processing unit 18.

The accompaniment data storage area 14a of the storage unit 14 stores accompaniment data, in which the performance sounds of the various instruments accompanying a song are recorded along the progression of the song, in association with the song ID assigned to that song. The accompaniment data is in a data format such as MIDI (Musical Instrument Digital Interface) and is reproduced when the practitioner sings karaoke.
The musical score sound data storage area 14b stores musical score sound data representing the sounding timings of a plurality of phonemes (the individual syllables that make up the lyrics) whose sounding timings are arranged in time series. The musical score sound data is in a data format such as MIDI.

The model sound data storage area 14c stores audio data, for example in WAVE or MP3 (MPEG-1 Audio Layer 3) format, that represents a model singing voice (hereinafter, "model voice") sung by a singer along with the accompaniment represented by the accompaniment data. This audio data is hereinafter referred to as "model sound data".

The model technique data storage area 14d stores data (hereinafter, "model technique data") indicating the type and timing of each singing technique used in the model singing voice represented by the model sound data stored in the model sound data storage area 14c.
FIG. 2 is a diagram showing an example of the contents of the model technique data. As illustrated, the model technique data associates the items "section information" and "technique type" with each other. The "section information" item stores information indicating a section of the model sound data in which a singing technique is used. The section indicated by the section information may be a section with a time width defined by start time information and end time information, or it may indicate a single point in time.
The "technique type" item stores identification information that identifies a singing technique, such as "vibrato", "shakuri", "kobushi", "falsetto", "tsukkomi", "tame", or "breath". "Vibrato" is a technique of continuously raising and lowering the pitch by a very small amount to produce a trembling tone. "Shakuri" is a technique of starting from a pitch lower than the target note and smoothly bringing the pitch up to the target note. "Kobushi" is a technique of adding a decorative, undulating melodic turn. "Falsetto" is a technique of singing in a so-called head voice. "Tsukkomi" is a technique of starting a phrase earlier than its original timing. "Tame" is a technique of starting a phrase later than its original timing. "Breath" denotes the timing at which the practitioner takes a breath.
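As an illustration of the model technique data described above, the following is a minimal sketch of one possible in-memory representation. The class name `TechniqueEntry`, its field names, the helper `techniques_at`, and the sample times are all hypothetical; the patent specifies only that section information and a technique type are stored in association with each other, and that a section may be a time span or a single point.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TechniqueEntry:
    """One row of model technique data: section information plus technique type."""
    start_sec: float            # start time of the section (or the single time point)
    end_sec: Optional[float]    # None when the entry marks a single point in time
    technique: str              # e.g. "vibrato", "shakuri", "breath"

    def covers(self, t: float) -> bool:
        """Return True if time t lies in the section (or equals the single point)."""
        if self.end_sec is None:
            return t == self.start_sec
        return self.start_sec <= t <= self.end_sec


def techniques_at(entries: list[TechniqueEntry], t: float) -> list[str]:
    """List the technique types whose section covers time t."""
    return [e.technique for e in entries if e.covers(t)]


# Hypothetical sample data, not taken from FIG. 2.
model_technique_data = [
    TechniqueEntry(1.20, 1.45, "shakuri"),
    TechniqueEntry(3.00, 4.10, "vibrato"),
    TechniqueEntry(4.50, None, "breath"),
]
```

A lookup such as `techniques_at(model_technique_data, 3.5)` would then tell a display routine which technique, if any, applies at a given time.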

<B: Operation>
<B-1: Operation example 1>
Next, the operation of the karaoke apparatus 1 will be described with reference to the flowchart shown in FIG. 3.
The practitioner operates the operation unit 16 of the karaoke apparatus 1 to select the song to be sung. The operation unit 16 outputs an operation signal corresponding to the operation to the CPU 11, and the CPU 11 selects the song according to that operation signal. The CPU 11 reads the model sound data corresponding to the selected song from the model sound data storage area 14c, detects the pitch and spectrum of the read model sound data in frames of a predetermined time length, and generates sound analysis data indicating the detected pitch and spectrum (step S1). An FFT (Fast Fourier Transform) is used to detect the spectrum. Next, based on the spectrum of the model sound data and on the musical score sound data, the CPU 11 determines the correspondence (corresponding locations) between the phonemes (syllables) contained in the model sound data and the phonemes (syllables) contained in the musical score sound data (step S2). That is, the CPU 11 divides the model sound data into phonemes and associates the model sound data with the musical score sound data phoneme by phoneme.
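The frame-wise analysis of step S1 can be sketched roughly as follows. This is only an illustration under stated assumptions: the patent says an FFT is used, whereas this sketch substitutes a naive DFT so that it needs nothing beyond the standard library, and it takes the strongest spectral bin as a crude pitch estimate, which the patent does not specify. The function names, the non-overlapping framing, and the test tone are hypothetical.

```python
import cmath
import math


def frame_spectrum(frame: list[float]) -> list[float]:
    """Magnitude spectrum of one frame via a naive DFT (bins 0..N/2)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def analyze(samples: list[float], sample_rate: int, frame_len: int):
    """Split samples into non-overlapping frames; return (pitch_hz, spectrum) per frame."""
    result = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        spec = frame_spectrum(samples[start:start + frame_len])
        # Crude pitch estimate (assumption): frequency of the strongest non-DC bin.
        peak_bin = max(range(1, len(spec)), key=lambda k: spec[k])
        result.append((peak_bin * sample_rate / frame_len, spec))
    return result


# Usage: a 500 Hz sine sampled at 8 kHz, analyzed in two 64-sample frames.
sr, n = 8000, 64
tone = [math.sin(2 * math.pi * 500 * i / sr) for i in range(2 * n)]
frames = analyze(tone, sr, n)
```

A practical implementation would more likely use an FFT library, overlapping windowed frames, and a dedicated pitch detector; the sketch only shows the shape of the sound analysis data (one pitch value and one spectrum per frame).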

The sounding timing of each phoneme in the model sound data may be shifted earlier or later in time relative to the sounding timing of the corresponding phoneme in the musical score sound data. Specifically, for example, when the model singer deliberately shifts the start or end of a sung phrase, the model voice and the score sound are offset from each other in time. So that the two can be associated even in this case, speech recognition technology is used to identify the sounding timing of the lyrics (phonemes) in the model sound data. Speech recognition is not the only option: the sounding timing of each phoneme in the model sound data may instead be segmented manually in advance, with each phoneme stored in association with its sounding timing.
In FIG. 4, waveform G1 represents the voice indicated by the model sound data, and solid lines G21 to G28 indicate the pitch and sounding timing of each phoneme represented by the musical score sound data. In the figure it can be seen, for example, that the sounding start timing and sounding end timing of the phoneme corresponding to the syllable "ta" in the lyrics are offset in time between the model voice and the score sound. Even when the sounding timing of the model voice is offset from that of the score sound in this way, performing speech recognition makes it possible to stretch or compress the time axis of one set of sound data to match the time axis of the other, and to associate phonemes that occupy the same position on the aligned time axis.
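The idea of stretching or compressing one time axis to match the other can be illustrated with dynamic time warping (DTW). This is a swapped-in technique chosen for illustration: the patent states only that speech recognition is used and does not name an alignment algorithm. The function name `dtw_path` and the one-dimensional feature sequences are hypothetical; a real system would compare spectral feature vectors per frame.

```python
def dtw_path(a: list[float], b: list[float]) -> list[tuple[int, int]]:
    """Align two feature sequences with DTW and return the warping path
    as (index_in_a, index_in_b) pairs from start to end."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated distance aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Backtrack from the end, always moving to the cheapest predecessor.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return path


# Usage: the second sequence reaches the value 2 one frame earlier than the first.
path = dtw_path([1, 1, 2, 3], [1, 2, 2, 3])
```

Each pair in the returned path says which frame of one recording corresponds to which frame of the other, so frames belonging to the same phoneme end up associated even when their absolute times differ.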

Next, based on the association result of step S2, the CPU 11 converts the sounding timing of each phoneme represented by the musical score sound data into the sounding timing of the corresponding phoneme represented by the model sound data, thereby generating processed musical score sound data (step S3). The CPU 11 then outputs the musical score sound data and the processed musical score sound data generated in step S3 to the display unit 15, so that the sounding interval of each phoneme represented by the musical score sound data and the sounding interval of each phoneme represented by the processed musical score sound data are displayed on the display unit 15 against the same time axis (step S4).
FIG. 5 is a diagram showing an example of the screen displayed on the display unit 15 in step S4. In the figure, solid lines G21 to G28 (hereinafter, "solid line G2") indicate the sounding intervals of the phonemes represented by the musical score sound data, and solid lines G31 to G38 (hereinafter, "solid line G3") indicate the sounding intervals of the phonemes represented by the processed musical score sound data. In the example of FIG. 5 the waveform G1 of the model sound data is also shown for reference, but this waveform G1 need not be displayed.
For both solid line G2 and solid line G3, the vertical axis indicates pitch and the horizontal axis indicates time. The CPU 11 displays solid line G2 and solid line G3 on the display unit 15 against the same time axis. That is, solid line G2 and solid line G3 represent the pitch and sounding timing (sounding start timing and sounding end timing) of the phonemes contained in the score sound and the processed score sound, respectively. Solid line G2 and solid line G3 are each displayed divided at the boundary of each phoneme.
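Step S3 can be sketched as follows, assuming the association of step S2 has already produced one (start, end) time pair from the model sound data for each phoneme of the score. The `Note` class, its fields, the function name `make_processed_score`, and the sample values are hypothetical; the patent specifies only that each phoneme keeps its score pitch while its sounding timing is replaced by that of the model.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Note:
    phoneme: str
    pitch: int      # e.g. a MIDI note number
    start: float    # sounding start time in seconds
    end: float      # sounding end time in seconds


def make_processed_score(score: list[Note],
                         model_timing: list[tuple[float, float]]) -> list[Note]:
    """Return a copy of the score in which each phoneme's timing is replaced
    by the timing of the corresponding phoneme in the model sound data."""
    if len(score) != len(model_timing):
        raise ValueError("score and model timing must be aligned one-to-one")
    return [replace(note, start=s, end=e) for note, (s, e) in zip(score, model_timing)]


# Usage: the model singer starts "ta" early (tsukkomi) and "ka" late (tame).
score = [Note("ta", 60, 1.0, 1.5), Note("ka", 62, 1.5, 2.0)]
timing = [(0.9, 1.4), (1.6, 2.1)]
processed = make_processed_score(score, timing)
```

Drawing `score` and `processed` on the same time axis then corresponds to solid lines G2 and G3: the pitches coincide while the horizontal extents differ wherever the model singer shifted the timing.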

When the model singer uses the technique known as "tsukkomi", in which a phrase is started earlier than its original timing, or the technique known as "tame", in which a phrase is started later than its original timing, the practitioner cannot tell from solid line G2, which represents the score sound, where the model singer has shifted the sounding timing. Likewise, even when looking at a waveform of the model voice such as waveform G1, it is often difficult for the user to determine the sounding timing from the waveform. In the present embodiment, by contrast, the phoneme sounding timing indicated by solid line G3 is the phoneme sounding timing of the model voice, so the practitioner can easily grasp the sounding timing of the model voice by looking at the screen displayed on the display unit 15. Furthermore, by superimposing the practitioner's own pitch curve, it is possible to show whether the practitioner's sounding timing lags or leads that of the model voice.

<B-2: Operation example 2>
Next, a second operation example of this embodiment will be described with reference to the flowchart shown in FIG. 6. In the flowchart of FIG. 6, the processing of steps S1 and S2 is the same as that shown in FIG. 3, and its description is omitted here.
After determining the correspondence between the model sound data and the musical score sound data (step S2), the CPU 11 specifies the sections in which the "shakuri" technique is used, based on the pitch detected from the model sound data. The CPU 11 then stores the section information of each specified section in the model technique data storage area 14d of the storage unit 14 in association with type information indicating the singing technique (step S13). More specifically, the CPU 11 analyzes the pattern of temporal change in the pitch calculated from the model sound data, detects sections in which the pitch changes continuously from a low pitch to a high pitch, and specifies each detected section as a section in which the "shakuri" singing technique is used. At this time, the CPU 11 stores pitch data indicating the pitch at the beginning and the pitch at the end of the specified section in the model technique data storage area 14d, in association with the type information indicating "shakuri" and the section information of the specified section.
A specific example will be described with reference to FIG. 7. In FIG. 7, curve G4 is a graph representing the pitch of the model voice. When the CPU 11 executes the processing of step S13, sections k1, k2, and k3 of curve G4 are specified as sections in which the "shakuri" technique is used.
This processing may also be performed based on the correspondence with the musical score sound data. That is, based on the correspondence between the model sound data and the already created G3, the CPU 11 may detect sections in which the pitch of the model sound data continuously approaches the pitch of the musical score sound data from a lower pitch.
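The detection in step S13 can be sketched as a search for runs of frames in which the pitch keeps rising. This is a minimal sketch under assumptions: the function name `find_rising_sections` and the thresholds `min_frames` and `min_rise` are hypothetical, pitch is taken as one value per frame in semitone units, and unvoiced frames are not handled. The patent states only that sections where the pitch changes continuously from low to high are detected and that the start and end pitches are stored.

```python
def find_rising_sections(pitch: list[float], min_frames: int = 3,
                         min_rise: float = 1.0) -> list[tuple[int, int, float, float]]:
    """Return (start_frame, end_frame, start_pitch, end_pitch) for every run of
    strictly increasing pitch that is long enough and rises far enough."""
    sections = []
    start = 0
    for i in range(1, len(pitch) + 1):
        # A run ends at the end of the track or when the pitch stops increasing.
        if i == len(pitch) or pitch[i] <= pitch[i - 1]:
            end = i - 1
            if end - start + 1 >= min_frames and pitch[end] - pitch[start] >= min_rise:
                sections.append((start, end, pitch[start], pitch[end]))
            start = i
    return sections


# Usage: a scoop from 57 up to 60, a held note, then a brief blip that is too short.
curve = [57.0, 58.0, 59.0, 60.0, 60.0, 60.0, 62.0, 61.0]
shakuri_sections = find_rising_sections(curve)
```

The start and end pitches returned for each section play the role of the pitch data (p1 and p2 in the later display example) stored together with the section information.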

Next, based on the association result of step S2, the CPU 11 converts the sounding timing of each phoneme represented by the musical score sound data into the sounding timing of the corresponding phoneme represented by the model sound data, thereby generating processed musical score sound data (step S14). This processing is the same as that of step S3 in FIG. 3 described above.

Next, the CPU 11 outputs the musical score sound data and the processed musical score sound data generated in step S14 to the display unit 15, so that the sounding interval of each phoneme represented by the musical score sound data and the sounding interval of each phoneme represented by the processed musical score sound data are displayed on the display unit 15 against the same time axis (step S15). At this time, the CPU 11 refers to the model technique data stored in the model technique data storage area 14d of the storage unit 14 by the processing of step S13 and, among the phoneme sounding intervals represented by the processed musical score sound data, displays the portions corresponding to sections in which the "shakuri" technique is used in a display mode that represents the pitch change in that section.
FIG. 8 is a diagram showing an example of the screen displayed on the display unit 15 in step S15. In the figure, G31 to G38 are those created in step S14, and dotted lines G31A to G38A represent the processed musical score sound data. In dotted lines G31A to G38A, the CPU 11 displays the portions corresponding to sections k1, k2, and k3, in which the "shakuri" technique is used, in a shape indicating the pitch change in that section. Specifically, in section k1, for example, the pitch change from pitch p1 at the start time to pitch p2 at the end time is shown.
By looking at the screen displayed on the display unit 15, the practitioner can easily grasp at what timing the "shakuri" technique is used.

<C: Modifications>
Although an embodiment of the present invention has been described above, the present invention is not limited to that embodiment and can be implemented in various other forms. Examples are given below.
(1) In the second operation example described above, the CPU 11 extracts from the model sound data the sections in which the "shakuri" technique is used. The technique to be extracted is not limited to "shakuri" and may be, for example, "vibrato", "kobushi", "falsetto", "breath", "staccato", or "crescendo (decrescendo)".
Specifically, the CPU 11 analyzes the pattern of temporal change in the pitch calculated from the model sound data, detects a section in which the pitch fluctuates continuously within a predetermined range above and below a center frequency, and specifies the detected section as a section in which the "vibrato" singing technique is used.
Based on the correspondence between the model sound data and the musical score sound data and on the power calculated from the model sound data, the CPU 11 also detects a section in which the musical score sound data is sounding but the power value of the model sound data is smaller than a predetermined threshold, and specifies the detected section as a "breath" section.
The CPU 11 also analyzes the pattern of temporal change in the spectrum calculated from the model sound data, detects a section in which the spectral characteristics transition abruptly to a predetermined changed state, and specifies the detected section as a section in which the "falsetto" singing technique is used. Here, the predetermined changed state is a state in which the harmonic components of the spectral characteristics become extremely small. For example, a chest voice contains many harmonic components, whereas in falsetto the magnitude of the harmonic components becomes extremely small. In this case, the CPU 11 may also refer to whether the pitch has changed substantially upward. This is because, although falsetto is sometimes used to produce the same pitch as the chest voice, it is generally a technique used to produce high notes that cannot be produced in the chest voice. Accordingly, the device may be configured to detect "falsetto" only when the pitch of the audio data is at or above a predetermined pitch. In addition, since the pitch range in which falsetto is used generally differs between male and female voices, gender detection may be performed from the range of the audio data or from the formants detected in the audio data, and the pitch range for falsetto detection may be set based on the result.
The CPU 11 also detects a section in which the manner of change of the spectral characteristics switches in varied ways within a short time, and specifies the detected portion as a portion in which the "kobushi" singing technique is used. This is because "kobushi" is a singing technique that adds a growling flavor by changing the voice color and manner of vocalization within a short section, so the spectral characteristics change in varied ways in a section where this technique is used.
The CPU 11 may also detect, as staccato, a section in which the power detected from the model sound data appears strongly only for a short fixed period. It may also detect, as crescendo (decrescendo), a section in which the power data value continuously and gradually increases (decreases).
In short, the CPU 11 analyzes the pattern of temporal change in the pitch, power, and spectrum indicated by the sound analysis data generated from the model sound data, determines whether the analysis result corresponds to a predetermined pattern, and, if it does, specifies the section corresponding to that pattern as a section in which a particular singing technique is used.
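Two of the pattern checks described above, vibrato and breath, can be sketched as follows. The function names, parameters, and thresholds (`max_dev`, `min_crossings`, `threshold`) are hypothetical and chosen only for illustration. The sketch assumes the caller supplies one pitch or power value per frame, uses the mean of the candidate window as the center pitch, and is given a per-frame flag saying whether the score is sounding.

```python
def is_vibrato(pitch: list[float], max_dev: float = 1.0, min_crossings: int = 4) -> bool:
    """True if the pitch oscillates around its mean within +/- max_dev,
    crossing the mean at least min_crossings times."""
    if not pitch:
        return False
    center = sum(pitch) / len(pitch)
    dev = [p - center for p in pitch]
    if max(abs(d) for d in dev) > max_dev:
        return False  # the swing is too wide to count as a small, continuous fluctuation
    crossings = sum(1 for a, b in zip(dev, dev[1:]) if a * b < 0)
    return crossings >= min_crossings


def find_breath_frames(score_sounding: list[bool], power: list[float],
                       threshold: float) -> list[int]:
    """Frames where the score is sounding but the model voice power is below threshold."""
    return [i for i, (on, p) in enumerate(zip(score_sounding, power))
            if on and p < threshold]


# Usage with hypothetical per-frame values.
wobble = [60.3, 60.3, 59.7, 59.7] * 6          # small, regular swing around 60
vibrato_found = is_vibrato(wobble)
breaths = find_breath_frames([True, True, True, False], [0.8, 0.05, 0.7, 0.01], 0.1)
```

The other techniques (falsetto, kobushi, staccato, crescendo) would follow the same shape: a predicate over a window of sound analysis data that, when satisfied, marks the window as a technique section.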

Even when a singing technique other than "shakuri" is detected, the CPU 11 may display, on the display unit 15, the portion of the phoneme sounding sections represented by the processed musical score sound data that corresponds to a section identified as using the technique, in a display mode that represents the pitch change in that section.
Specifically, for example, as illustrated in FIG. 9, the CPU 11 may draw the section k12, in which the "vibrato" technique is used, as a wavy line. Likewise, the CPU 11 may draw the section k11, in which the "crescendo" technique is used, so that the line representing the section gradually becomes thicker.
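One way to picture these display modes is to compute, for each sample along a section, a vertical offset and a line width that a drawing routine could then use. The sketch below is hypothetical: the function name `segment_points`, the number of waves, and the width scaling are illustrative choices and are not taken from FIG. 9.

```python
import math
from typing import List, Tuple


def segment_points(technique: str,
                   n_points: int,
                   base_width: float = 1.0) -> List[Tuple[float, float]]:
    """Return (vertical offset, line width) for n_points samples along a section."""
    points = []
    for k in range(n_points):
        # t runs from 0.0 at the start of the section to 1.0 at its end.
        t = k / (n_points - 1) if n_points > 1 else 0.0
        offset, width = 0.0, base_width
        if technique == "vibrato":
            # Wavy line: three full waves across the section (assumed count).
            offset = math.sin(2 * math.pi * 3 * t)
        elif technique == "crescendo":
            # Line grows steadily from base_width to three times base_width.
            width = base_width * (1.0 + 2.0 * t)
        points.append((offset, width))
    return points
```

Sections with no detected technique fall through to a flat line of constant width, so the same routine can draw every section of the processed score.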

(2) In the above-described embodiment, the display is divided at the boundary of each phoneme. Instead, the break positions of the lines representing the sounding sections may be used to represent breathing sections.

(3) In the above-described embodiment, the phoneme sounding sections represented by the musical score sound data and those represented by the processed musical score sound data are drawn as horizontally extending lines. The figures representing these sounding sections are not limited to lines; they may be, for example, rectangles, or figures made up of a series of circles. In short, any representation may be used as long as the phoneme sounding sections represented by the musical score sound data and those represented by the processed musical score sound data are displayed against the same time axis.

(4) In the above-described embodiment, data indicating the pitch and spectrum detected from the model sound data is used as the sound analysis data, and sections in which the "shakuri" technique is used are extracted from it. The sound analysis data is not limited to pitch; any data that allows the sections using the desired singing technique to be identified may be used. For example, to extract the "shakuri" or "vibrato" technique, data indicating the pitch of the model sound data may be used as the sound analysis data, and to extract the "crescendo" technique, data indicating the power may be used. In short, the sound analysis data may be any data indicating at least one of the pitch, spectrum, and power of the model sound data.
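For the pitch-only case, a "shakuri" section can be treated as a section in which the pitch changes continuously from low to high, as the claims put it. The sketch below illustrates that idea on a per-frame pitch sequence; the function name `detect_shakuri`, the use of cents as the pitch unit, and the thresholds `min_frames` and `min_rise` are assumptions for illustration.

```python
from typing import List, Tuple


def detect_shakuri(pitch: List[float],
                   min_frames: int = 3,
                   min_rise: float = 50.0) -> List[Tuple[int, int]]:
    """Return (start, end) frame index pairs, end exclusive, of rising-pitch runs.

    A run qualifies when the pitch rises strictly from frame to frame for at
    least min_frames frames and the total rise is at least min_rise
    (in the same unit as `pitch`, assumed here to be cents).
    """
    sections: List[Tuple[int, int]] = []
    start = 0
    for i in range(1, len(pitch) + 1):
        still_rising = i < len(pitch) and pitch[i] > pitch[i - 1]
        if not still_rising:
            long_enough = i - start >= min_frames
            if long_enough and pitch[i - 1] - pitch[start] >= min_rise:
                sections.append((start, i))
            start = i
    return sections
```

A power-based technique such as crescendo would need a different input sequence, which is why the sound analysis data only has to carry the features required by the techniques being extracted.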

(5) In the above-described embodiment, data indicating both the sounding start timing and the sounding end timing of each phoneme is used as the data indicating the sounding timing, but data indicating only the sounding start timing of each phoneme may be used instead.

(6) In the above-described embodiment, the CPU 11 extracts techniques from the model sound data to generate the model technique data. Instead, the model technique data may be stored in advance. In this case, the CPU 11 does not need to generate the model technique data from the model sound data.

(7) In the above-described embodiment, the model sound data is stored in the model sound data storage area 14c, and the CPU 11 of the karaoke apparatus 1 reads the model sound data from the storage unit 14. Instead, the model sound data may be received via a communication network, or input via an interface such as USB (Universal Serial Bus).

(8) In the above-described embodiment, the karaoke apparatus 1 implements all of the functions of the embodiment. Alternatively, two or more apparatuses connected via a communication network may share these functions, so that a system comprising those apparatuses implements the karaoke apparatus 1 of the embodiment.

(9) In the above-described embodiment, the CPU 11 outputs the processed musical score sound data to the display unit 15. Instead, the processed musical score sound data may be output by transmitting it to a predetermined server apparatus via a communication network, or output via an interface such as USB.

(10) The program executed by the CPU 11 of the karaoke apparatus 1 in the above-described embodiment can be provided stored on a recording medium such as a magnetic tape, magnetic disk, flexible disk, optical recording medium, magneto-optical recording medium, CD (Compact Disk)-ROM, DVD (Digital Versatile Disk), or RAM. It can also be downloaded to the karaoke apparatus 1 via a network such as the Internet.

(11) In the above-described embodiment, the model sound data representing a model singing voice is associated with the musical score sound data, and the musical score sound data is processed based on the result of the association. The model sound data in the present invention is not limited to sound data representing a singing voice; sound data representing the performance sound of a musical instrument may also be used. In this case as well, the CPU of the karaoke apparatus associates the sound data representing the performance sound with the musical score sound data and processes the musical score sound data based on the result. That is, the model sound data may be sound data representing a human singing voice or sound data representing the performance sound of a musical instrument.
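The processing of the musical score sound data based on the association result can be pictured with the following sketch. It assumes the association step has already split the model sound into one segment per phoneme, in the same order as the score, and it simply replaces each score timing with the timing observed in the model. The types `ScoreNote` and `ModelSegment` and the function `make_processed_score` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class ScoreNote:
    phoneme: str
    pitch: int    # note number taken from the score
    start: float  # sounding start timing in seconds
    end: float    # sounding end timing in seconds


@dataclass(frozen=True)
class ModelSegment:
    phoneme: str
    start: float  # start timing observed in the model sound
    end: float    # end timing observed in the model sound


def make_processed_score(score: List[ScoreNote],
                         model: List[ModelSegment]) -> List[ScoreNote]:
    """Copy the score, replacing each phoneme's timing with the model's timing."""
    if len(score) != len(model):
        raise ValueError("score and model must contain the same number of phonemes")
    processed = []
    for note, seg in zip(score, model):
        if note.phoneme != seg.phoneme:
            raise ValueError(f"phoneme mismatch: {note.phoneme!r} vs {seg.phoneme!r}")
        # Keep the score's phoneme and pitch; take start and end from the model.
        processed.append(replace(note, start=seg.start, end=seg.end))
    return processed
```

Because the original score list is left untouched, both the score and the processed score remain available for display against the same time axis. The same routine applies whether the model segments come from a singing voice or from an instrument performance.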

A block diagram showing an example of the hardware configuration of the karaoke apparatus.
A diagram showing an example of the contents of the model technique data.
A flowchart showing the flow of processing performed by the karaoke apparatus.
A diagram for explaining the association between the model sound data and the musical score sound data.
A diagram showing an example of a screen displayed on the display unit.
A flowchart showing the flow of processing performed by the karaoke apparatus.
A diagram for explaining the process of detecting a section in which a technique is used.
A diagram showing an example of a screen displayed on the display unit.
A diagram showing an example of a screen displayed on the display unit.

Explanation of symbols

1 ... karaoke apparatus, 11 ... CPU, 12 ... ROM, 13 ... RAM, 14 ... storage unit, 15 ... display unit, 16 ... operation unit, 17 ... microphone, 18 ... audio processing unit, 19 ... speaker.

Claims

An instruction device comprising:
first storage means for storing musical score sound data representing the sounding timings of a plurality of phonemes whose sounding timings are consecutive in time series;
second storage means for storing model sound data representing a model sound;
association means for dividing the model sound data stored in the second storage means into phonemes and associating it, phoneme by phoneme, with the musical score sound data stored in the first storage means;
processed musical score sound data generation means for generating processed musical score sound data by converting, based on the result of the association by the association means, the sounding timing of each phoneme represented by the musical score sound data into the sounding timing of that phoneme represented by the model sound data; and
output means for outputting the musical score sound data and the processed musical score sound data generated by the processed musical score sound data generation means.

An instruction device comprising:
first storage means for storing musical score sound data representing the sounding timings of a plurality of phonemes whose sounding timings are consecutive in time series;
second storage means for storing model sound data representing a model sound;
association means for dividing the model sound data stored in the second storage means into phonemes and associating it, phoneme by phoneme, with the musical score sound data stored in the first storage means;
processed musical score sound data generation means for generating processed musical score sound data by converting, based on the result of the association by the association means, the sounding timing of each phoneme represented by the musical score sound data into the sounding timing of that phoneme represented by the model sound data; and
output means for outputting the processed musical score sound data generated by the processed musical score sound data generation means.

The instruction device according to claim 1, wherein the output means displays, on display means, the phoneme sounding sections represented by the musical score sound data and the phoneme sounding sections represented by the processed musical score sound data in association with the same time axis.

The instruction device according to claim 3, further comprising:
sound analysis data generation means for generating, from the model sound data, sound analysis data indicating at least one of the pitch, spectrum, and power of the model sound data; and
technique section specifying means for analyzing the pattern of temporal change in the sound analysis data generated by the sound analysis data generation means, determining whether the analysis result corresponds to a predetermined pattern, and, if it does, specifying the section corresponding to the pattern as a section in which a specific technique is used,
wherein the output means displays, on the display means, the portion of the phoneme sounding sections represented by the processed musical score sound data that corresponds to the section specified by the technique section specifying means in a display mode different from that of the other sections.

The instruction device according to claim 4, wherein
the sound analysis data generation means generates, from the model sound data, sound analysis data indicating pitch,
the technique section specifying means analyzes the pattern of temporal change in the pitch indicated by the sound analysis data generated by the sound analysis data generation means and specifies a section in which the pitch changes continuously from a low pitch to a high pitch, and
the output means displays, on the display means, the portion of the phoneme sounding sections represented by the processed musical score sound data that corresponds to the section specified by the technique section specifying means in a display mode representing the pitch change in that section.

A program for causing a computer comprising first storage means for storing musical score sound data representing the sounding timings of a plurality of phonemes whose sounding timings are consecutive in time series, and second storage means for storing model sound data representing a model sound, to realize:
an association function of dividing the model sound data stored in the second storage means into phonemes and associating it, phoneme by phoneme, with the musical score sound data stored in the first storage means;
a processed musical score sound data generation function of generating processed musical score sound data by converting, based on the result of the association by the association function, the sounding timing of each phoneme represented by the musical score sound data into the sounding timing of that phoneme represented by the model sound data; and
an output function of outputting the musical score sound data and the processed musical score sound data generated by the processed musical score sound data generation function.

A program for causing a computer comprising first storage means for storing musical score sound data representing the sounding timings of a plurality of phonemes whose sounding timings are consecutive in time series, and second storage means for storing model sound data representing a model sound, to realize:
an association function of dividing the model sound data stored in the second storage means into phonemes and associating it, phoneme by phoneme, with the musical score sound data stored in the first storage means;
a processed musical score sound data generation function of generating processed musical score sound data by converting, based on the result of the association by the association function, the sounding timing of each phoneme represented by the musical score sound data into the sounding timing of that phoneme represented by the model sound data; and
an output function of outputting the processed musical score sound data generated by the processed musical score sound data generation function.

The program according to claim 6, wherein the output function displays, on display means, the phoneme sounding sections represented by the musical score sound data and the phoneme sounding sections represented by the processed musical score sound data in association with the same time axis.

The program according to claim 8, further causing the computer to realize:
a sound analysis data generation function of generating, from the model sound data, sound analysis data indicating at least one of the pitch, spectrum, and power of the model sound data; and
a technique section specifying function of analyzing the pattern of temporal change in the sound analysis data generated by the sound analysis data generation function, determining whether the analysis result corresponds to a predetermined pattern, and, if it does, specifying the section corresponding to the pattern as a section in which a specific technique is used,
wherein the output function displays, on the display means, the portion of the phoneme sounding sections represented by the processed musical score sound data that corresponds to the section specified by the technique section specifying function in a display mode different from that of the other sections.

The program according to claim 9, wherein
the sound analysis data generation function generates, from the model sound data, sound analysis data indicating pitch,
the technique section specifying function analyzes the pattern of temporal change in the pitch indicated by the sound analysis data generated by the sound analysis data generation function and specifies a section in which the pitch changes continuously from a low pitch to a high pitch, and
the output function displays, on the display means, the portion of the phoneme sounding sections represented by the processed musical score sound data that corresponds to the section specified by the technique section specifying function in a display mode representing the pitch change in that section.