JP2007256617A

JP2007256617A - Musical piece practice device and musical piece practice system

Info

Publication number: JP2007256617A
Application number: JP2006080810A
Authority: JP
Inventors: Juichi Sato; 寿一佐藤
Original assignee: Yamaha Corp
Current assignee: Yamaha Corp
Priority date: 2006-03-23
Filing date: 2006-03-23
Publication date: 2007-10-04

Abstract

PROBLEM TO BE SOLVED: To provide a technique for notifying a singer (or player) in advance of the parts of a song that are easily mistaken in a karaoke apparatus.
SOLUTION: When a song is selected by the singer (or player), the CPU 11 of the karaoke apparatus 2 reads the accompaniment data of the designated song and supplies it to the sound processing unit 18. The sound processing unit 18 converts the supplied accompaniment data into an analog signal and supplies the signal to the speaker 19, which emits the sound. During playback, the CPU 11 recognizes which position in the song the accompaniment sound signal generated by the sound processing unit 18 corresponds to, and compares the recognized position with the start position of the section designation information acquired from the server device 3. When the difference between the two reaches a predetermined difference, the section indicated by the section designation information is notified.
COPYRIGHT: (C)2008, JPO&INPIT

Description

The present invention relates to a music practice device and a music practice system.

Various methods have been proposed for scoring the skill of a singer's singing in a karaoke apparatus. For example, Patent Document 1 proposes a method of comparing the pitch of the sung voice with a reference pitch to determine which parts were not sung well. Patent Document 2 proposes a method of storing a singer's past scoring results and increasing the volume of the guide melody in phrases that scored poorly. A method of transmitting scoring results to a server to produce a ranking has also been proposed (see, for example, Patent Document 3).
Patent Document 1: Japanese Patent Application Laid-Open No. 2004-093601
Patent Document 2: Japanese Patent Application Laid-Open No. 2005-049410
Patent Document 3: Japanese Patent Application Laid-Open No. 2005-099288

However, with the methods described in Patent Documents 1 and 2, a singer cannot find out which parts were sung incorrectly without first singing the song once. Consequently, a singer singing a song for the first time had no way of knowing which parts of the song require particular care. The same applies to musical instrument performance.
The present invention has been made against this background, and its object is to provide a technique that allows a karaoke apparatus to notify a singer in advance of the parts that are easily mistaken.

To solve the above problem, the present invention provides a music practice device comprising: accompaniment data storage means for storing accompaniment data constituting the accompaniment sounds of a piece of music; acquisition means for acquiring section designation information indicating a specific section of the music; instruction means for instructing the start of accompaniment; accompaniment sound signal generation means for, when the start of accompaniment is instructed by the instruction means, reading accompaniment data from the accompaniment data storage means in accordance with the progress of the music and generating an accompaniment sound signal based on the read accompaniment data; accompaniment position recognition means for recognizing which position in the music the accompaniment sound signal generated by the accompaniment sound signal generation means corresponds to; and notification means for comparing the position recognized by the accompaniment position recognition means with the start position of the section designation information acquired by the acquisition means and, when the difference between the two reaches a predetermined difference, notifying the section indicated by the section designation information.
In a preferred aspect of the present invention, the section designation information includes feature data indicating a feature of the designated section, and the notification means performs, together with the notification of the section, a notification in a manner preset according to the feature data.
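The comparison performed by the notification means can be illustrated with a minimal sketch. It assumes positions are measure numbers and that the "predetermined difference" is a fixed lead of two measures; the names `SectionDesignation`, `sections_to_notify`, and `lead_measures` are hypothetical and do not appear in the specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SectionDesignation:
    start_measure: int  # start position of the designated section
    feature: str        # feature data, e.g. "pitch error" or "lyric error"


def sections_to_notify(current_measure, designations, lead_measures=2):
    """Return the designations whose start lies exactly lead_measures ahead.

    lead_measures stands in for the "predetermined difference"; the value 2
    is an assumption made only for this illustration.
    """
    return [d for d in designations
            if d.start_measure - current_measure == lead_measures]


designations = [SectionDesignation(8, "pitch error"),
                SectionDesignation(20, "lyric error")]

# Simulate the accompaniment advancing one measure at a time.
for measure in range(1, 25):
    for d in sections_to_notify(measure, designations):
        print(f"measure {measure}: {d.feature} section starts at measure {d.start_measure}")
```

Because the feature data travels with each designation, the notification can be varied by feature, for example a different on-screen message for pitch errors and for lyric errors.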

The present invention also provides a music practice device comprising: accompaniment data storage means for storing accompaniment data constituting the accompaniment sounds of a piece of music; model voice data storage means storing model voice data representing the sound of the melody included in the music; acquisition means for acquiring section designation information indicating a specific section of the music; instruction means for instructing the start of accompaniment; accompaniment sound signal generation means for, when the start of accompaniment is instructed by the instruction means, reading accompaniment data from the accompaniment data storage means in accordance with the progress of the music and generating an accompaniment sound signal based on the read accompaniment data; accompaniment position recognition means for recognizing which position in the music the accompaniment sound signal generated by the accompaniment sound signal generation means corresponds to; and voice signal generation means for, at the timing when the position recognized by the accompaniment position recognition means coincides with the start position of the section designation information acquired by the acquisition means, reading from the model voice data storage means the portion of the model voice data corresponding to the section indicated by the acquired section designation information and generating a voice signal based on the read model voice data.
The present invention also provides a music practice device comprising: accompaniment data storage means for storing accompaniment data constituting the accompaniment sounds of a piece of music; acquisition means for acquiring section designation information indicating a specific section of the music; specific section instruction means for instructing that accompaniment start from the section indicated by the section designation information acquired by the acquisition means; and accompaniment sound signal generation means for, when the start of accompaniment is instructed by the specific section instruction means, reading from the accompaniment data storage means the portion of the accompaniment data corresponding to the section indicated by the acquired section designation information and generating an accompaniment sound signal based on the read accompaniment data.

In a preferred aspect of the present invention, the accompaniment data includes position information indicating positions in the music, and the accompaniment position recognition means recognizes, from the position information included in the accompaniment data, which position in the music the accompaniment sound signal generated by the accompaniment sound signal generation means corresponds to.
In another preferred aspect of the present invention, the accompaniment position recognition means recognizes which position in the music the accompaniment sound signal generated by the accompaniment sound signal generation means corresponds to in accordance with the accompaniment data read-out processing performed by the accompaniment sound signal generation means.

In a further preferred aspect of the present invention, the device comprises: input means for accepting input of practitioner data representing the practitioner's voice; and comparison means for comparing the input practitioner data with model data stored in model data storage means for each comparison section of a predetermined time unit, and for generating and outputting, for each comparison section, difference information indicating the degree of difference between the two.
In a further preferred aspect of the present invention, the model data is data representing the pitch of the melody of the music, the music practice device comprises pitch calculation means for calculating the pitch of the voice from the practitioner data, and the comparison means compares the pitch calculated by the pitch calculation means with the pitch indicated by the model data stored in the model data storage means for each comparison section and generates, for each comparison section, difference information indicating the degree of difference between the two.
In a preferred aspect of the present invention, the model data is data representing the lyrics of the music, the music practice device comprises voice recognition means for recognizing the voice represented by the practitioner data and generating a recognized character string corresponding to the recognized voice, and the comparison means compares the recognized character string generated by the voice recognition means with the model data stored in the model data storage means for each comparison section and generates, for each comparison section, difference information indicating the degree of difference between the two.
In a preferred aspect of the present invention, the model data is technique data indicating the type and timing of the techniques used in a model singing performance.
In a preferred aspect of the present invention, the practitioner data is data representing the performance sound of an input musical instrument, and the model data is data representing the performance sound of a musical instrument used as a model.
The present invention also provides a music practice system comprising a plurality of the music practice devices described above and a server device, the server device having section designation information generation means for receiving, via a network, the difference information generated by the comparison means of each music practice device, compiling statistics of the received difference information for each comparison section, extracting the comparison sections whose statistical results satisfy a predetermined condition, and generating section designation information indicating the extracted comparison sections as specific sections of the music, wherein the acquisition means of each music practice device acquires the section designation information from the server device.

According to the present invention, a singer can be notified in advance of the parts that are easily mistaken.

<A: First Embodiment>
<A-1: Configuration>
FIG. 1 is a block diagram showing an example of the overall configuration of a music practice system 1 according to an embodiment of the present invention. The system consists of karaoke apparatuses 2a, 2b, and 2c and a server device 3 connected via a communication network 4. Although FIG. 1 shows three karaoke apparatuses, the number of karaoke apparatuses in the music practice system is not limited to three and may be larger or smaller. In the following description, the karaoke apparatuses 2a, 2b, and 2c are referred to simply as "karaoke apparatus 2" when there is no need to distinguish between them.

FIG. 2 is a block diagram illustrating the hardware configuration of the karaoke apparatus 2. A CPU (Central Processing Unit) 11 reads a computer program stored in a ROM (Read Only Memory) 12 or a storage unit 14, loads it into a RAM (Random Access Memory) 13, and executes it, thereby controlling each unit of the karaoke apparatus 2. The storage unit 14 is a large-capacity storage means such as a hard disk and has an accompaniment data storage area 14a, a lyrics data storage area 14b, a practitioner voice data storage area 14c, a scoring result data storage area 14d, and a model voice data storage area 14e. The display unit 15 is, for example, a liquid crystal display and, under the control of the CPU 11, displays various screens such as a menu screen for operating the karaoke apparatus 2 and a karaoke screen in which lyric captions are superimposed on a background image. The operation unit 16 has various keys and outputs a signal corresponding to the pressed key to the CPU 11. The microphone 17 is a sound collecting means that picks up the voice produced by the singer. The sound processing unit 18 converts the voice picked up by the microphone 17 (analog data) into digital data and supplies it to the CPU 11. The speaker 19 is connected to the sound processing unit 18 and emits sound at an intensity corresponding to the signal output from the sound processing unit 18. The communication unit 20 includes various communication devices and, under the control of the CPU 11, exchanges data with the server device 3 via the communication network 4.

The accompaniment data storage area 14a of the storage unit 14 stores accompaniment data, for example in MIDI (Musical Instrument Digital Interface: registered trademark) format, in which information indicating the pitch and strength (velocity) of the melody of each instrument accompanying a song, the application of effects, and the like is written in accordance with the progress of the song. The accompaniment data includes melody data indicating the notes of the song's melody. In this embodiment, this melody data is used as the model data. The accompaniment data also includes measure number information (position information) indicating the numbers of the measures in the song. The lyrics data storage area 14b stores lyrics data indicating the lyrics corresponding to the accompaniment data.

The practitioner voice data storage area 14c stores, in time series, the voice data that has been picked up by the microphone 17 and A/D-converted by the sound processing unit 18, for example in WAVE format or MP3 (MPEG Audio Layer-3) format. Because this voice data represents the voice of the practitioner (hereinafter, "practitioner voice"), it is referred to below as practitioner voice data.

The scoring result data storage area 14d stores difference information indicating the degree of difference between the practitioner voice data and the melody data. The CPU 11 of the karaoke apparatus 2 compares the practitioner voice data with the melody data for each predetermined section (comparison section), generates difference information indicating the degree of difference between the two for each section, and stores the generated difference information in the scoring result data storage area 14d.

FIG. 3 is a diagram showing an example of the contents of the scoring result data. As shown, the scoring result data stores the items "section number", "pitch score", and "lyrics score" in association with one another. The "section number" item stores information identifying a section of the song. In this embodiment, the section number is the measure number information assigned to each measure, with one measure as the unit. The "pitch score" item stores difference information indicating the degree of difference between the pitch of the practitioner voice data and the pitch of the melody data in the section corresponding to the section number. In the example of FIG. 3, the pitch score is stored as a value converted to a scale with a maximum of 100 points. A larger pitch score indicates that the two pitches are closer, and a smaller pitch score indicates that they differ more.
The "lyrics score" item stores difference information indicating the degree of difference between the lyrics in the practitioner voice data and the lyrics data in the section corresponding to the section number. In the example of FIG. 3, the lyrics score is likewise stored as a value converted to a scale with a maximum of 100 points.

The model voice data storage area 14e of the storage unit 14 stores voice data, for example in WAVE or MP3 format, that represents the sound of the melody included in the song and is used as a model for singing the song (hereinafter, "model voice data").

FIG. 4 is a block diagram illustrating the hardware configuration of the server device 3. The CPU 31 reads a computer program stored in the ROM 32 or the storage unit 34, loads it into the RAM 33, and executes it, thereby controlling each unit of the server device 3. The storage unit 34 is a large-capacity storage means such as a hard disk and has a scoring result database storage area 34a and a section designation information storage area 34b. The communication unit 35 includes various communication devices and, under the control of the CPU 31, exchanges data with the karaoke apparatus 2 via the communication network 4.

The scoring result database storage area 34a of the storage unit 34 stores a scoring result database, which is a collection of scoring result data. The scoring result database has a pitch scoring result table and a lyrics scoring result table for each song.
FIG. 5 is a diagram showing an example of the contents of the pitch scoring result table. As shown, the table stores the items "section number", "pitch score", "error count", and "error rate" in association with one another. The "section number" and "pitch score" items store the same kind of data as in the scoring result data described above. In this table, however, the pitch scores of multiple singing performances ("pitch score (singing A, singing B, ...)") are stored in association with each section number. The "error count" item stores the number of scoring results that indicate a large degree of difference. For example, with a threshold of 60 (%), the error count for "section 1" is the total number of pitch scores for section 1 whose value is 60 or less. In the example of FIG. 5, section 1 has one pitch score indicating a large degree of difference.
The "error rate" item stores the error count divided by the total number of singing performances for which pitch scores are recorded.
The contents of the lyrics scoring result table are the same as those of the pitch scoring result table shown in FIG. 5, so their description is omitted here.

The section designation information storage area 34b of the storage unit 34 stores section designation information indicating specific sections of a song, such as sections where the lyrics are easily mistaken or sections where the pitch is easily mistaken.
FIG. 6 is a diagram showing an example of the contents of the section designation information. As shown, the section designation information stores the items "song code", "section number", and "feature data" in association with one another. The "song code" item stores information identifying the song. The "section number" item stores measure number information. The "feature data" item stores information indicating the feature of the designated section, such as "pitch error" or "lyric error".
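As an illustration only, the three items of FIG. 6 map naturally onto a small record type. The class name, field names, and sample values below are hypothetical; the specification does not prescribe any particular storage format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SectionDesignationInfo:
    song_code: str       # identifies the song
    section_number: int  # measure number of the designated section
    feature: str         # e.g. "pitch error" or "lyric error"


# Sample rows invented for this sketch.
TABLE = [
    SectionDesignationInfo("S001", 8, "pitch error"),
    SectionDesignationInfo("S001", 20, "lyric error"),
    SectionDesignationInfo("S002", 4, "pitch error"),
]


def designations_for(song_code, table):
    """Return the rows for one song, ordered by section number."""
    rows = [row for row in table if row.song_code == song_code]
    return sorted(rows, key=lambda row: row.section_number)


print(designations_for("S001", TABLE))
```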

<A-2: Operation>
Next, the operation of the music practice system 1 will be described.
<A-2-1: Scoring result statistical operation>
First, the scoring result statistical operation of the music practice system 1 will be described with reference to the flowchart shown in FIG. 7.
The practitioner operates the operation unit 16 of the karaoke apparatus 2 to select the song to be sung. Before the accompaniment of the song is played, the CPU 11 of the karaoke apparatus 2 performs a notification operation for the easily mistaken parts of the song. This notification operation is described later, so its description is omitted here.
The practitioner operates the operation unit 16 of the karaoke apparatus 2 to instruct playback of the accompaniment data of the song to be sung. In response to this instruction, the CPU 11 starts the processing shown in FIG. 7. The CPU 11 first reads the accompaniment data of the designated song from the accompaniment data storage area 14a and supplies it to the sound processing unit 18 (step S1). The sound processing unit 18 converts the supplied accompaniment data into an analog signal and supplies it to the speaker 19, which emits the sound. At this time, the CPU 11 may control the display unit 15 to display a message prompting the practitioner to sing, such as "Please sing along with the accompaniment". The practitioner sings along with the accompaniment emitted from the speaker 19. The practitioner's voice is picked up by the microphone 17, converted into a voice signal, and supplied to the sound processing unit 18. The practitioner voice data A/D-converted by the sound processing unit 18 is stored in time series in the practitioner voice data storage area 14c of the storage unit 14 (step S2).

When playback of the accompaniment data ends, the CPU 11 reads the practitioner voice data stored in the practitioner voice data storage area 14c, performs voice analysis processing on it, and calculates the pitch at each time from the practitioner voice data (step S3).
Next, the CPU 11 compares the calculated pitch with the pitch of the melody data included in the accompaniment data stored in the accompaniment data storage area 14a of the storage unit 14 for each predetermined section (comparison section), and generates, for each section, a pitch score (difference information) indicating the degree of difference between the two (step S4).
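The specification does not state how the pitch score of step S4 is computed. One plausible sketch, under the assumption that pitches are expressed as MIDI note numbers per analysis frame and that a frame counts as correct when it lies within half a semitone of the reference, is shown below. The function name `pitch_scores`, the frame layout, and the tolerance are all assumptions.

```python
from collections import defaultdict


def pitch_scores(frames, tolerance=0.5):
    """Score each section as the percentage of frames sung on pitch.

    frames: iterable of (section_number, sung_pitch, reference_pitch),
    with pitches as MIDI note numbers (fractions allowed for sung pitch).
    tolerance: allowed deviation in semitones (assumed value).
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for section, sung, reference in frames:
        totals[section] += 1
        if abs(sung - reference) <= tolerance:
            hits[section] += 1
    # Convert to a 0-100 scale, as in the scoring result data of FIG. 3.
    return {s: round(100 * hits[s] / totals[s]) for s in sorted(totals)}


frames = [
    (1, 60.0, 60), (1, 60.2, 60), (1, 62.0, 60), (1, 59.8, 60),
    (2, 64.0, 64), (2, 67.0, 64),
]
print(pitch_scores(frames))  # {1: 75, 2: 50}
```

A frame-hit ratio is only one choice; a score based on mean absolute deviation would fit the description equally well.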

The CPU 11 also reads the practitioner voice data stored in the practitioner voice data storage area 14c, performs voice recognition processing on it to recognize the voice represented by the practitioner voice data, and generates a recognized character string corresponding to the recognized voice (step S5). The CPU 11 then compares the generated recognized character string with the lyrics data stored in the lyrics data storage area 14b for each section, and generates, for each section, a lyrics score (difference information) indicating the degree of difference between the two (step S6). The processing of steps S4 and S6 produces difference information indicating the scoring result for each section, as illustrated in FIG. 3.
The CPU 11 transmits the pitch score and the lyrics score for each section generated in steps S4 and S6 to the server device 3 via the communication network 4 as scoring result data (step S7). At this time, the CPU 11 also transmits the song code identifying the song together with the scoring result data.
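The lyrics comparison of step S6 can be sketched in a similar way. The specification does not name a string similarity measure; the example below substitutes the similarity ratio from Python's standard `difflib` module and scales it to 0-100. The function name `lyric_scores` and the sample strings are hypothetical, and the speech recognition step itself is assumed to have already produced the per-section text.

```python
from difflib import SequenceMatcher


def lyric_scores(recognized, lyrics):
    """Compare recognized text with the reference lyrics section by section.

    recognized, lyrics: dicts mapping section number -> text.
    Returns a dict mapping section number -> score on a 0-100 scale.
    A section missing from the recognized text is treated as empty.
    """
    scores = {}
    for section, reference in sorted(lyrics.items()):
        sung = recognized.get(section, "")
        ratio = SequenceMatcher(None, sung, reference).ratio()
        scores[section] = round(100 * ratio)
    return scores


lyrics = {1: "twinkle twinkle", 2: "little star"}
recognized = {1: "twinkle twinkle", 2: "little car"}
print(lyric_scores(recognized, lyrics))
```

Combined with per-section pitch scores, the result gives the three columns of FIG. 3: section number, pitch score, and lyrics score.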

FIG. 8 is a flowchart showing the flow of processing performed by the server device 3. When the CPU 31 of the server device 3 detects that scoring result data and a song code have been received via the communication network 4, it stores the received scoring result data (difference information) in the scoring result database storage area 34a of the storage unit 34 (step SB1).
The CPU 31 then compiles statistics on the pitch scores and the lyrics scores in the scoring result data for each section, and calculates a "number of errors" and an "error rate" for each section (step SB2). Specifically, as the statistical processing of the pitch scores, for example, the CPU 31 counts, for each section, the pitch scores stored in the pitch scoring result table of the scoring result database that are 60% or less for that section, and stores the count in the "number of errors" item of the pitch scoring result table. For each section, it also stores in the "error rate" item the value obtained by dividing the number of errors by the total number of singings recorded for the pitch score.
The same statistical processing as for the pitch scores is performed on the lyrics scores, and the per-section "number of errors" and "error rate" for the lyrics are stored in the "number of errors" and "error rate" items of the lyrics scoring result table, respectively.
Performing the statistical processing described above on both the pitch scores and the lyrics scores yields the per-section statistical results (number of errors, error rate) shown in FIG. 5.
Per-section statistical processing of this kind makes it possible to identify, for each aspect (pitch, lyrics, and so on), the sections in which many people make mistakes, that is, the sections that are easy to get wrong.
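The counting in step SB2 can be illustrated with a minimal sketch. The 60% threshold comes from the text; the function name, the tuple layout of the input, and the sample scores are assumptions made only for illustration, not the data format of the scoring result database.

```python
from collections import defaultdict

ERROR_THRESHOLD = 60  # a score of 60% or less counts as a mistake (value from the text)


def section_statistics(scores):
    """Compute per-section error statistics.

    scores: iterable of (section_number, score_percent), one entry per
    singing of that section (hypothetical layout).
    Returns {section_number: (error_count, error_rate)}.
    """
    totals = defaultdict(int)
    errors = defaultdict(int)
    for section, score in scores:
        totals[section] += 1
        if score <= ERROR_THRESHOLD:
            errors[section] += 1
    # error rate = number of errors / total number of singings of the section
    return {s: (errors[s], errors[s] / totals[s]) for s in totals}


# Hypothetical pitch scores: three singings each of sections 1 and 2.
pitch_scores = [(1, 90), (1, 55), (1, 60), (2, 80), (2, 75), (2, 40)]
stats = section_statistics(pitch_scores)
```

The same function could be applied unchanged to lyrics scores, which mirrors the statement that both score types receive the same statistical processing.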

Next, the CPU 31 of the server device 3 extracts the sections whose statistical results (number of errors, error rate), obtained by compiling the difference information (pitch scores, lyrics scores) for each section, satisfy a predetermined condition (step SB3). Specifically, for example, a predetermined number of sections are extracted as error-prone sections, starting from the section with the largest number of errors. Alternatively, sections whose error rate exceeds a predetermined threshold may be extracted. As another example, a fixed number of sections may be extracted from the top of the error-rate ranking. Alternatively, the average score may be calculated for each section and a predetermined number of sections extracted starting from the lowest average, or sections whose average score is below a predetermined threshold may be extracted. In short, it suffices to extract the sections whose statistical results satisfy a predetermined condition.
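Two of the extraction conditions for step SB3 can be sketched as follows. This is an illustration under assumptions: the `stats` mapping of section number to (number of errors, error rate), the function names, and the sample figures are all hypothetical.

```python
def top_n_by_error_count(stats, n):
    """Return the n section numbers with the most errors, highest first."""
    ranked = sorted(stats, key=lambda s: stats[s][0], reverse=True)
    return ranked[:n]


def above_error_rate(stats, threshold):
    """Return the section numbers whose error rate exceeds the threshold."""
    return sorted(s for s, (_, rate) in stats.items() if rate > threshold)


# Hypothetical statistics: section -> (number of errors, error rate)
stats = {1: (12, 0.30), 2: (40, 0.80), 3: (25, 0.55), 4: (3, 0.05)}

most_errors = top_n_by_error_count(stats, 2)
high_rate = above_error_rate(stats, 0.5)
```

The other conditions in the text (ranking by error rate, or by average score) would differ only in the sort key or the comparison used.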

The CPU 31 of the server device 3 stores section designation information indicating the extracted sections in the section designation information storage area 34b. At this time, the CPU 31 may also output the section designation information by transmitting it to the karaoke device 2 via the communication network 4 (step SB4).

<A-2-2: Notification operation>
Next, the operation of notifying the section designation information will be described.
The practitioner operates the operation unit 16 of the karaoke device 2 to select the song to be sung.
When the karaoke device 2 detects that a song has been selected via the operation unit 16, it transmits to the server device 3 via the communication network 4, before the practitioner starts singing, request information asking for the information that indicates the error-prone sections of that song.
When the CPU 31 of the server device 3 detects that the request information has been received via the communication network 4, it searches the section designation information storage area 34b for the song code of the song corresponding to the received request information, and transmits the section designation information stored in association with that song code to the karaoke device 2 via the communication network 4.

When the CPU 11 of the karaoke device 2 detects that the section designation information has been received via the communication network 4, it causes the display unit 15 to display a screen asking the practitioner to select a notification mode.
FIG. 9 is a diagram showing an example of the screen displayed on the display unit 15. As shown, the display unit 15 displays a screen prompting the practitioner to select one of "Notify error-prone parts during singing", "Practice error-prone parts", and "Substitute singing for error-prone parts". The practitioner checks the screen displayed on the display unit 15 and operates the operation unit 16 to select one of these.

When "Notify error-prone parts during singing" is selected on the screen shown in FIG. 9, the CPU 11 of the karaoke device 2 reads the accompaniment data of the designated song from the accompaniment data storage area 14a and supplies it to the audio processing unit 18. The audio processing unit 18 converts the supplied accompaniment data into an analog signal (accompaniment sound signal) and supplies it to the speaker 19, which emits the sound. At this time, the CPU 11 recognizes which position in the song the accompaniment sound signal generated by the audio processing unit 18 corresponds to. Specifically, for example, the CPU 11 recognizes the position from the measure number information included in the accompaniment data supplied to the audio processing unit 18.

The CPU 11 compares the recognized position (measure) with the start position (measure) of the section designation information acquired from the server device 3. When the difference between the two reaches a predetermined difference (one phrase in this embodiment), the CPU 11 notifies the practitioner of the section indicated by the section designation information, in a manner preset according to the feature data included in the section designation information. The preset manner may be, for example, raising the volume of the guide melody for a section whose feature data is "pitch error". For a section whose feature data is "lyrics error", it may be, for example, enlarging the displayed lyric characters, coloring them, or displaying emphasis dots above them. Alternatively, a message such as "Watch the lyrics in the next phrase" may be displayed on the display unit 15. In short, any notification may be used as long as its manner is preset according to the feature data. The preset manner need not be limited to a single one; for example, the practitioner may be allowed to choose between a manner in which the lyrics are bolded one by one and a manner that stops at a simple caution.

FIG. 10 is a diagram showing an example of how the section designation information is notified. This example shows a notification for the case where the section designation information indicates the section of the lyrics "mother's back" and the feature data is "lyrics error". In this case, the CPU 11 reads the accompaniment data from the accompaniment data storage area 14a as the song progresses and supplies it to the audio processing unit 18, and also reads the lyrics data corresponding to the supplied accompaniment data from the lyrics data storage area 14b and supplies it to the display unit 15. The audio processing unit 18 causes the speaker 19 to emit the accompaniment sound signal based on the supplied accompaniment data, and the display unit 15, under the control of the CPU 11, displays a lyric telop A1 corresponding to the lyrics data as shown in FIG. 10.
At this time, the CPU 11 identifies the start position of the section designation information based on the section number included in the section designation information acquired from the server device 3. The CPU 11 then compares the position of the accompaniment sound signal with the start position of the section designation information. When the difference between the two reaches the predetermined difference (one phrase in this embodiment), the CPU 11, in accordance with the feature data included in the section designation information, enlarges the lyric characters A11 corresponding to the section indicated by the section designation information and displays them on the display unit 15, and also displays on the display unit 15 a message A12 that draws the practitioner's attention, such as "Pay attention to the lyrics in the next phrase".
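The timing check described above can be sketched as follows. This is an illustration only: the text states the lead as "one phrase" and the position as a measure number, so the fixed phrase length of four measures, the class and function names, and the style strings are assumptions.

```python
from dataclasses import dataclass

LEAD_MEASURES = 4  # assumed length of "one phrase" in measures (hypothetical)


@dataclass
class SectionDesignation:
    start_measure: int
    feature: str  # e.g. "lyrics_error" or "pitch_error" (hypothetical labels)


def notifications_due(current_measure, designations, lead=LEAD_MEASURES):
    """Return the designations whose start lies exactly `lead` measures ahead."""
    return [d for d in designations if d.start_measure - current_measure == lead]


def notification_style(feature):
    """Map feature data to a preset notification manner."""
    styles = {
        "pitch_error": "raise guide melody volume",
        "lyrics_error": "enlarge lyric text and show caution message",
    }
    return styles.get(feature, "show generic caution message")


designations = [
    SectionDesignation(start_measure=9, feature="lyrics_error"),
    SectionDesignation(start_measure=17, feature="pitch_error"),
]
due_now = notifications_due(5, designations)
```

Calling `notifications_due` once per measure as the accompaniment advances would trigger each notification one phrase before its section begins.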

Because the error-prone parts of the lyrics and pitch are announced in this way, the practitioner can know in advance, even when singing the song for the first time, what kind of mistake is likely in the section about to be sung, and can sing with that in mind.

In this embodiment, moreover, the error-prone parts are identified from the statistical results of past singers. Although the places where mistakes occur vary somewhat between individuals, they are often similar, so compiling statistics on past singers allows the error-prone parts to be identified more accurately.

Next, when the "Practice error-prone parts" mode is selected on the screen shown in FIG. 9, the CPU 11 first causes the display unit 15 to display a list of the sections indicated by the section designation information acquired from the server device 3.
FIG. 11 is a diagram showing an example of a screen on which the list of sections indicated by the section designation information is displayed. As shown, the CPU 11 causes the display unit 15 to display information on the plurality of sections indicated by the section designation information (the measure number, the feature of the section such as "many lyrics errors" or "many pitch errors", the lyrics corresponding to the section, and so on). The practitioner checks the screen displayed on the display unit 15 and selects the section to practice. When the CPU 11 detects that a section has been selected, it reads the accompaniment data for the part corresponding to the selected section from the accompaniment data storage area 14a and supplies it to the audio processing unit 18. The audio processing unit 18 converts the supplied accompaniment data into an analog signal and supplies it to the speaker 19, which emits the sound.

Because the accompaniment for the part corresponding to an error-prone section is emitted in this way, the practitioner can practice the error-prone parts in advance, even when singing the song for the first time.

Next, when the "Substitute singing for error-prone parts" mode is selected on the screen shown in FIG. 9, the CPU 11 reads the accompaniment data of the designated song from the accompaniment data storage area 14a and supplies it to the audio processing unit 18. The audio processing unit 18 converts the supplied accompaniment data into an analog signal and supplies it to the speaker 19, which emits the sound. At this time, the CPU 11 recognizes, based on the measure number information included in the accompaniment data, which position in the song the accompaniment sound signal generated by the audio processing unit 18 corresponds to.
At the timing when the recognized position (measure) matches the start position (measure) of the section designation information acquired from the server device 3, the CPU 11 reads the model voice data for the part corresponding to the section indicated by that section designation information from the model voice data storage area 14e and supplies it to the audio processing unit 18. The audio processing unit 18 converts the supplied model voice data into an analog signal and supplies it to the speaker 19, which emits the sound. That is, in the sections indicated by the section designation information acquired from the server device 3, both the accompaniment sound and the model voice are emitted, whereas outside those sections only the accompaniment sound is reproduced.
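The playback rule for this mode can be sketched in a few lines. The representation of a designated section as an inclusive (start measure, end measure) pair and the track labels are assumptions made for illustration; the text itself only states that the model voice starts when the current measure matches the section's start position.

```python
def tracks_for_measure(measure, sections):
    """Decide which audio sources sound in a given measure.

    sections: list of (start_measure, end_measure) inclusive ranges taken
    from the section designation information (hypothetical layout).
    """
    in_section = any(start <= measure <= end for start, end in sections)
    if in_section:
        # Designated section: accompaniment plus model voice
        return ["accompaniment", "model_voice"]
    # Elsewhere: accompaniment only
    return ["accompaniment"]


sections = [(9, 12)]
before = tracks_for_measure(8, sections)
inside = tracks_for_measure(9, sections)
```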

Because the model voice is emitted in the error-prone sections in this way, the karaoke device 2 can emit a singing voice on the practitioner's behalf in the error-prone (difficult) parts. The practitioner sings most of the song, while the karaoke device 2 substitutes the model voice in the difficult parts, so the practitioner need not sing them. As a result, the practitioner can attempt even a song that had seemed too hard to sing because of its difficult parts.

<B: Second Embodiment>
Next, a second embodiment of the invention will be described.
This embodiment differs from the first embodiment described above in the data stored in the storage unit of the karaoke device and in the scoring process performed by the karaoke device; the rest of the configuration is the same as in the first embodiment. In the following description, therefore, components that are the same as in the first embodiment are given the same reference numerals and their description is omitted.

FIG. 12 is a diagram showing an example of the hardware configuration of the karaoke device 2A of this embodiment. In the figure, the model technique data storage area 14f of the storage unit 14 stores data indicating the types and timings of the singing techniques used in the model singing represented by the model voice data stored in the model voice data storage area 14e (hereinafter, "model technique data"). In this embodiment, the model technique data is used as the model data.
FIG. 13 is a diagram showing an example of the contents of the model technique data. As shown, the model technique data associates the items "section information" and "type information" with each other. The "section information" item stores information indicating a section of the model voice data in which a singing technique is used. The section indicated by the section information may be a section with a time width expressed by start time information and end time information, or it may indicate a single point in time.
The "type information" item stores information identifying one of a plurality of singing techniques set in advance, for example "vibrato", "shakuri", "kobushi", "falsetto", "tsukkomi", "tame", and "breath". "Vibrato" is a technique of raising and lowering the pitch very slightly and continuously to produce a trembling tone. "Shakuri" is a technique of starting on a pitch lower than the target note and smoothly bringing the pitch up to the target. "Kobushi" is a technique of adding a decorative, undulating melodic turn. "Falsetto" is the technique of singing in the so-called head voice. "Tsukkomi" is a technique of starting to sing earlier than the original timing. "Tame" is a technique of starting to sing later than the original timing. "Breath" indicates the timing at which the singer takes a breath.
The practitioner technique data storage area 14g stores data indicating the singing techniques used in the practitioner voice data (hereinafter, "practitioner technique data"). The practitioner technique data has the same structure as the model technique data described above: the items "section information" and "type information" are associated with each other.
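One way to picture the technique data described here is as a list of records pairing section information with type information. The sketch below is hypothetical: the class and field names, the use of seconds for times, and the sample entries are assumptions, not the stored format.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Technique(Enum):
    """The seven singing techniques named in the text."""
    VIBRATO = "vibrato"
    SHAKURI = "shakuri"
    KOBUSHI = "kobushi"
    FALSETTO = "falsetto"
    TSUKKOMI = "tsukkomi"
    TAME = "tame"
    BREATH = "breath"


@dataclass(frozen=True)
class TechniqueEntry:
    start_sec: float            # section information: start time
    end_sec: Optional[float]    # None when the entry marks a single point in time
    kind: Technique             # type information

    @property
    def is_point(self) -> bool:
        return self.end_sec is None


# Hypothetical model technique data for one song.
model_technique_data = [
    TechniqueEntry(12.0, 14.5, Technique.VIBRATO),
    TechniqueEntry(20.3, None, Technique.BREATH),
]
```

Practitioner technique data would use the same record type, which reflects the statement that both data sets share one structure.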

FIG. 14 is a diagram showing an example of the contents of the scoring result data.
As shown, in addition to the "pitch score" and "lyrics score" described in the first embodiment, this scoring result data stores the scoring results for singing techniques, such as a "vibrato score" and a "shakuri score", for each section (comparison section).

Next, the scoring result statistics operation of this embodiment will be described with reference to the flowchart shown in FIG. 15.
When the practitioner operates the operation unit 16 of the karaoke device 2A to select a song, a signal indicating the selection is output from the operation unit 16 to the CPU 11. When the CPU 11 detects that the signal indicating the operation has been input, it reads the model voice data corresponding to the selected song from the model voice data storage area 14e of the storage unit 14, performs voice analysis processing on the read model voice data, and calculates the pitch, power, and spectrum of the model voice data as functions of time (step SC1). The CPU 11 then analyzes, in units of predetermined frames, the melody data included in the accompaniment data stored in the accompaniment data storage area 14a and the model voice data stored in the model voice data storage area 14e, and detects the temporal correspondence between the model voice data and the melody data (step SC2).
Next, the CPU 11 analyzes the patterns of temporal change in the pitch, power, and spectrum calculated from the model voice data and determines whether the analysis result corresponds to a predetermined pattern. If it does, the CPU 11 identifies the section corresponding to that pattern as a section in which a specific singing technique is used. The CPU 11 then stores the section information of the identified section in the model technique data storage area 14f of the storage unit 14 in association with type information indicating the singing technique (step SC3).

The processing in step SC3 that identifies the sections in which each singing technique is used is described below. In this embodiment, the CPU 11 identifies (detects) the sections in which the singing techniques "vibrato", "shakuri", "kobushi", "falsetto", "tsukkomi", "tame", and "breath" are used. Of these, "vibrato" and "shakuri" are detected from the pitch calculated from the model voice data. "Kobushi" and "falsetto" are detected from the spectrum calculated from the model voice data. "Tame" and "tsukkomi" are detected from the pitch calculated from the model voice data together with the melody data. "Breath" is detected from the power calculated from the model voice data together with the melody data.

Based on the correspondence between the model voice data and the melody data and on the pitch calculated from the model voice data, the CPU 11 identifies sections in which the start time of a note in the model voice data differs from the start time of the corresponding note in the melody data. A section in which the pitch change of the model voice data appears earlier than the pitch change of the melody data, that is, a section in which the start time of a note in the model voice data is earlier than the start time of the corresponding note in the melody data, is identified as a section in which the "tsukkomi" singing technique is used. The CPU 11 stores the section information of the identified section in the model technique data storage area 14f of the storage unit 14 in association with identification information indicating "tsukkomi".
Conversely, based on the same correspondence and pitch, the CPU 11 detects sections in which the pitch change of the model voice data appears later than the pitch change of the melody data, that is, sections in which the start time of a note in the model voice data is later than the start time of the corresponding note in the melody data, and identifies each detected section as a section in which the "tame" singing technique is used.
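The early/late comparison described above can be sketched as follows. The text compares note start times directly; the tolerance band (so that tiny deviations are not labeled), the function name, and the assumption that the two onset lists are already paired note by note are additions made for illustration.

```python
def classify_onsets(voice_onsets, melody_onsets, tolerance=0.05):
    """Label each note as "tsukkomi" (early), "tame" (late), or "on_time".

    Both arguments are lists of note start times in seconds, already
    paired note by note (hypothetical input format). `tolerance` is an
    assumed dead band in seconds, not a value from the text.
    """
    labels = []
    for voice, melody in zip(voice_onsets, melody_onsets):
        diff = voice - melody
        if diff < -tolerance:
            labels.append("tsukkomi")   # voice starts earlier than the melody
        elif diff > tolerance:
            labels.append("tame")       # voice starts later than the melody
        else:
            labels.append("on_time")
    return labels


labels = classify_onsets([0.90, 2.00, 3.20], [1.0, 2.0, 3.0])
```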

The CPU 11 also analyzes the pattern of temporal change in the pitch calculated from the model voice data, detects sections in which the pitch fluctuates continuously within a predetermined range above and below a center frequency, and identifies each detected section as a section in which the "vibrato" singing technique is used.
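A rough sketch of such a vibrato test on one window of frame-wise pitch values follows. It is a simplification under assumptions: the window mean stands in for the center frequency, and the depth limit and the minimum number of crossings of the center are hypothetical parameters that the text does not specify.

```python
def is_vibrato(pitches_hz, max_depth_hz=15.0, min_crossings=4):
    """Judge whether one window of pitch values (Hz) looks like vibrato.

    The pitch must stay within max_depth_hz of the window mean and must
    cross the mean at least min_crossings times (assumed criteria).
    """
    if len(pitches_hz) < 2:
        return False
    center = sum(pitches_hz) / len(pitches_hz)
    deviations = [p - center for p in pitches_hz]
    if any(abs(d) > max_depth_hz for d in deviations):
        return False  # swings too wide to be a fluctuation around one note
    # Count sign changes between consecutive frames
    crossings = sum(1 for a, b in zip(deviations, deviations[1:]) if a * b < 0)
    return crossings >= min_crossings


wobble = [440 + 5 * s for s in (1, -1, 1, -1, 1, -1, 1, -1)]
steady = [440.0] * 8
glide = [400, 420, 440, 460, 480]
```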

The CPU 11 also analyzes the pattern of temporal change in the pitch calculated from the model voice data, detects sections in which the pitch changes continuously from a low pitch to a high pitch, and identifies each detected section as a section in which the "shakuri" singing technique is used. This processing may instead be based on the correspondence with the melody data. That is, based on the correspondence between the model voice data and the melody data, the CPU 11 may detect sections in which the pitch of the model voice data approaches the pitch of the melody data continuously from a lower pitch.
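The melody-based variant of the "shakuri" test can be sketched as below. The minimum rise, the tolerance around the target pitch, and the requirement that the rise be monotonic are assumptions chosen for illustration; the text only requires a continuous approach to the melody pitch from below.

```python
def is_shakuri(pitches_hz, target_hz, min_rise_hz=10.0, tolerance_hz=5.0):
    """Judge whether the pitch frames at a note start form a scoop up to target_hz.

    pitches_hz: frame-wise pitch values (Hz) from the start of the note.
    target_hz: pitch of the corresponding melody note.
    min_rise_hz and tolerance_hz are assumed parameters.
    """
    if len(pitches_hz) < 2:
        return False
    rising = all(b >= a for a, b in zip(pitches_hz, pitches_hz[1:]))
    starts_low = target_hz - pitches_hz[0] >= min_rise_hz
    reaches_target = abs(pitches_hz[-1] - target_hz) <= tolerance_hz
    return rising and starts_low and reaches_target


scoop = is_shakuri([410, 420, 432, 439], 440)
flat = is_shakuri([440, 440, 441], 440)
```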

Based on the correspondence between the model voice data and the melody data and on the power calculated from the model voice data, the CPU 11 also detects sections in which the melody data is voiced but the power value of the model voice data is below a predetermined threshold, and identifies each detected location as a "breath" section.
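The breath test reduces to a per-frame comparison, sketched here. The normalized power scale, the threshold value, and the parallel-list input format are assumptions for illustration.

```python
def breath_frames(power, melody_voiced, threshold=0.1):
    """Return the frame indices judged to be breaths.

    power: frame-wise power of the voice data (assumed normalized to 0..1).
    melody_voiced: frame-wise flags, True where the melody has a note.
    A frame counts as a breath when the melody is voiced but the voice
    power is below the threshold (threshold value is hypothetical).
    """
    return [
        i
        for i, (p, voiced) in enumerate(zip(power, melody_voiced))
        if voiced and p < threshold
    ]


frames = breath_frames(
    [0.8, 0.05, 0.02, 0.7, 0.01],
    [True, True, True, True, False],
)
```

The last frame is quiet but is not reported, because the melody itself is silent there.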

The CPU 11 also analyzes the pattern of temporal change in the spectrum calculated from the model voice data, detects sections in which the spectral characteristics shift abruptly to a predetermined state, and identifies each detected section as a section in which the "falsetto" singing technique is used. Here, the predetermined state is one in which the harmonic components of the spectrum become extremely small. For example, as shown in FIG. 16, a chest voice contains many harmonic components (see FIG. 16(a)), whereas in falsetto the harmonic components become extremely small (see FIG. 16(b)). In this case, the CPU 11 may also take into account whether the pitch has shifted sharply upward. Falsetto is sometimes used at the same pitch as the chest voice, but it is generally a technique for producing high notes that the chest voice cannot reach. The device may therefore be configured to detect "falsetto" only when the pitch of the voice data is at or above a predetermined pitch. In addition, since the pitch range in which falsetto is used generally differs between male and female voices, the singer's gender may be detected from the range of the voice data or from the formants detected in the voice data, and the pitch range for falsetto detection may be set accordingly.
The CPU 11 also detects sections in which the manner of change of the spectral characteristics switches variously within a short time, and identifies each detected part as a part in which the "kobushi" singing technique is used. "Kobushi" is a singing technique that adds a growl-like flavor by changing the tone color and manner of vocalization within a short section, so the spectral characteristics vary widely in a section where this technique is used.
In this way, the CPU 11 detects the sections of the model voice data in which each singing technique is used, and stores section information indicating each detected section in the model technique data storage area 14f of the storage unit 14 in association with type information indicating the singing technique.

Returning to the description of FIG. 15: when the CPU 11 of the karaoke device 2A finishes generating the model technique data (step SC3), it reads the accompaniment data stored in the accompaniment data storage area 14a and supplies it to the audio processing unit 18. The audio processing unit 18 converts the supplied accompaniment data into an analog signal and causes the speaker 19 to emit the sound represented by the accompaniment data. Along with supplying the accompaniment data to the audio processing unit 18, the CPU 11 supplies the lyrics data stored in the lyrics data storage area 14b to the display unit 15, which displays the lyrics corresponding to the accompaniment being reproduced.

The practitioner sings along with the accompaniment emitted from the speaker 19 while checking the lyrics displayed on the display unit 15. When the practitioner sings, the microphone 17 converts the practitioner's voice into an audio signal and outputs it to the audio processing unit 18. The audio processing unit 18 converts the audio signal output from the microphone 17 into digital data, producing practitioner voice data (step SC4). The practitioner voice data is output from the audio processing unit 18 and stored in the practitioner voice data storage area 14c of the storage unit 14.

When reproduction of the accompaniment ends, the CPU 11 of the karaoke apparatus 2A performs basic analysis processing on the practitioner voice data stored in the practitioner voice data storage area 14c and calculates its pitch, power, and spectrum (step SC5). The CPU 11 also analyzes, in units of a predetermined frame, the melody data included in the accompaniment data stored in the accompaniment data storage area 14a and the practitioner voice data stored in the practitioner voice data storage area 14c, and detects the temporal correspondence between the practitioner voice data and the melody data (step SC6). The CPU 11 then generates practitioner technique data from the practitioner voice data (step SC7). The processing in steps SC5 to SC7 differs from the processing in steps SC2 to SC4 described above only in the voice data being processed: steps SC2 to SC4 operate on the model voice data, whereas steps SC5 to SC7 operate on the practitioner voice data. Because the processing itself is the same, a detailed description of steps SC5 to SC7 is omitted.

Next, the CPU 11 of the karaoke apparatus 2A directly compares the waveforms of the model voice data and the practitioner voice data, associates the two in time frame by frame using, for example, DTW (Dynamic Time Warping), and detects the corresponding locations in the two (step SC8).
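The frame-by-frame association of step SC8 can be illustrated with a minimal DTW sketch. The following is not the implementation of the embodiment: it assumes that each frame has already been reduced to a single numeric feature (for example, a pitch value), whereas the embodiment compares the waveforms themselves, and the function name dtw_align is hypothetical.

```python
def dtw_align(model, practitioner):
    """Align two per-frame feature sequences with plain DTW.

    Returns (total_cost, path), where path is a list of
    (model_frame_index, practitioner_frame_index) pairs.
    Assumption: each frame is one number, e.g. a pitch value.
    """
    n, m = len(model), len(practitioner)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning the first i
    # model frames with the first j practitioner frames.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(model[i - 1] - practitioner[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # model frame repeated
                                 cost[i][j - 1],      # practitioner frame repeated
                                 cost[i - 1][j - 1])  # both advance
    # Backtrack from the end to recover the corresponding frames.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1))
    path.reverse()
    return cost[n][m], path


total, path = dtw_align([1, 2, 3], [1, 2, 2, 3])
print(total, path)
```

In this usage example the practitioner holds the second note for one extra frame, so the path maps model frame 1 to practitioner frames 1 and 2.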

Subsequently, based on the corresponding locations detected in step SC8, the CPU 11 of the karaoke apparatus 2A compares the model voice data stored in the model voice data storage area 14e with the practitioner voice data output from the sound processing unit 18, and generates, for each section, difference information indicating the degree of difference between the practitioner voice data and the model voice data (step SC9). Specifically, the CPU 11 compares the pitch of the model voice data with the pitch of the practitioner voice data for each section, generates difference information indicating the degree of difference between the two for each section, and stores the generated difference information in the scoring result data storage area 14d. The CPU 11 also reads the model technique data stored in the model technique data storage area 14f of the storage unit 14 one item at a time, searches the practitioner technique data storage area 14g for the practitioner technique data corresponding to the read model technique data, compares the model technique data with the practitioner technique data for each section, generates difference information indicating the degree of difference between the two, and stores it in the scoring result data storage area 14d. The CPU 11 then transmits the generated scoring result data to the server device 3 via the communication network 4 (step SC10).
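The per-section pitch comparison of step SC9 might be sketched as follows. This is an illustration under stated assumptions, not the method of the embodiment: the degree of difference is taken to be the mean absolute pitch gap over the aligned frames of a section, sections are assumed to be a fixed number of model frames long, and the name section_differences and its parameters are hypothetical.

```python
def section_differences(model_pitch, practitioner_pitch, path, frames_per_section):
    """Return {section_index: mean absolute pitch gap} for each section.

    path is a list of (model_frame, practitioner_frame) index pairs
    giving the temporal correspondence between the two recordings.
    Assumption: a section is frames_per_section model frames long.
    """
    sums, counts = {}, {}
    for mi, pi in path:
        section = mi // frames_per_section
        gap = abs(model_pitch[mi] - practitioner_pitch[pi])
        sums[section] = sums.get(section, 0.0) + gap
        counts[section] = counts.get(section, 0) + 1
    return {s: sums[s] / counts[s] for s in sorted(sums)}


model = [100, 100, 200, 200]
practitioner = [100, 110, 200, 180]
path = [(0, 0), (1, 1), (2, 2), (3, 3)]
print(section_differences(model, practitioner, path, 2))
```

A mean gap is only one possible measure; a count of frames whose gap exceeds a tolerance would fit the same interface.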

When the server device 3 detects that scoring result data has been received via the communication network 4, it stores the received scoring result data in the scoring result database storage area 34a of the storage unit 34, compiles statistics of the scoring result data for each section, and calculates statistical results (number of errors, error rate) for the lyrics, pitch, and singing techniques (vibrato, kobushi, etc.) of each section. Although the data to be processed (pitch, lyrics, singing technique) differs from that of the processing shown in steps SB1 to SB2 of FIG. 8, the general flow of processing is the same as described above, and its description is therefore omitted here.
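The per-section statistics compiled by the server device can be illustrated with a short sketch. It rests on assumptions not stated in the text: each received scoring result is modeled as a mapping from section number to a numeric difference value, and a section counts as an error when that value exceeds a threshold. The names error_statistics and threshold are hypothetical.

```python
from collections import defaultdict


def error_statistics(results, threshold):
    """Return {section: (error_count, error_rate)} over all results.

    results is an iterable of {section_number: difference} mappings,
    one per received scoring result. Assumption: a difference larger
    than threshold counts as one error for that section.
    """
    errors = defaultdict(int)
    totals = defaultdict(int)
    for result in results:
        for section, difference in result.items():
            totals[section] += 1
            if difference > threshold:
                errors[section] += 1
    return {s: (errors[s], errors[s] / totals[s]) for s in sorted(totals)}


received = [
    {1: 0.0, 2: 30.0},
    {1: 5.0, 2: 40.0},
    {1: 50.0, 2: 10.0},
]
stats = error_statistics(received, threshold=20.0)
print(stats)
```

Sections whose error rate satisfies a predetermined condition could then be selected from the returned mapping as the specific sections to be indicated by the section designation information.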

As described above, in this embodiment, locations where the singing technique differs are extracted in addition to differences in pitch and lyrics. The practitioner can therefore also be notified of differences in singing technique and thus receives more detailed information.

<C: Modifications>
Although embodiments of the present invention have been described above, the present invention is not limited to those embodiments and can be implemented in various other forms. Examples are given below.
(1) In the embodiments described above, a specific section (a section that is easily mistaken) is notified to the practitioner by, for example, increasing the volume of the guide melody or enlarging the lyric characters. The form of notification is not limited to these. For example, a voice message or warning sound calling for attention may be output, or a list of easily mistaken sections may be displayed before emission of the accompaniment sound starts. In short, any form of notification that conveys a message or information to the practitioner by some means may be used.

In the embodiments described above, notification is given in a mode set in advance according to the feature data included in the section designation information. However, the section designation information need not include feature data. In that case, the CPU of the karaoke apparatus may simply notify the section indicated by the section designation information (for example, by phrase number or measure number) without giving a caution display such as "Watch the lyrics".

(2) As the scoring operation performed by the CPU 11 of the karaoke apparatus 2, the first embodiment compares the practitioner voice data with the melody data, whereas the second embodiment compares the practitioner voice data with the model voice data. Either scoring method may be used alone, or both may be used together. In addition, although the embodiments described above compare the pitch, lyrics, or singing technique of the voice for each section, the comparison is not limited to these. For example, formants may be detected in both the practitioner voice data and the model voice data using an FFT (Fast Fourier Transform), and scoring may be performed by comparing voice quality for each section.

(3) In the embodiments described above, the section number included in the section designation information is a number assigned to each measure, with one measure as the unit. However, the unit section is not limited to a measure. For example, one phrase may be used as the unit, or a note may be used as the unit. In short, any predetermined unit may be used.
In the embodiments described above, the CPU 11 of the karaoke apparatus 2 compares the position (measure) recognized from the measure number information included in the accompaniment data with the start position (measure) of the section designation information acquired from the server device 3, and notifies the section indicated by the section designation information when the difference between the two becomes one phrase. The "difference between the two" is not limited to one phrase. It may be, for example, two phrases or one measure. In short, the section indicated by the section designation information may be notified when the difference between the two becomes a predetermined difference.
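The comparison between the current accompaniment position and the start position of a designated section can be sketched as below. This is only an illustration: positions are assumed to be measure numbers, the predetermined difference is expressed as a number of measures (lead_measures), and the function name notifications is hypothetical.

```python
def notifications(total_measures, section_starts, lead_measures):
    """Return (current_measure, section_start) pairs at which to notify.

    A notification is due when the designated section starts exactly
    lead_measures after the measure currently being played.
    Assumption: positions are 1-based measure numbers.
    """
    events = []
    for current in range(1, total_measures + 1):
        for start in section_starts:
            if start - current == lead_measures:
                events.append((current, start))
    return events


# Sections starting at measures 9 and 17, announced 4 measures ahead
# (taking one phrase to be 4 measures, purely as an example).
print(notifications(20, [9, 17], 4))
```

Changing lead_measures corresponds to choosing a different predetermined difference, such as two phrases or one measure.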

In the embodiments described above, the CPU 11 of the karaoke apparatus 2 recognizes which position in the music the accompaniment sound signal generated by the sound processing unit 18 corresponds to on the basis of the measure number information included in the accompaniment data. However, the method of recognizing the position of the accompaniment sound signal is not limited to this. For example, the position in the music of the accompaniment sound signal generated by the sound processing unit 18 may be recognized in accordance with the accompaniment data reading process performed by the CPU 11. Specifically, the CPU of the karaoke apparatus may recognize the position by accumulating the lengths of the notes (or rests) in the accompaniment data that it has read from the storage unit and supplied to the sound processing unit. Alternatively, the CPU 11 may count the tempo clock and recognize the position as the number of beats from the beginning of the music. Furthermore, because an accumulated value of notes or beats (or measures or phrases) can be converted into time if the tempo is known, the position of the accompaniment sound signal in the music may be recognized using time data from the beginning of the music. In these cases, the part corresponding to the "section number" (see FIG. 8) included in the difference information and the section designation information of the embodiments should be replaced, in accordance with the position recognition method used, with data indicating the cumulative length of notes (including rests) from the beginning of the music, the number of beats from the beginning of the music, or the elapsed time from the beginning of the music.
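The conversion between accumulated note lengths, beats, and elapsed time can be illustrated with a small sketch. It assumes a constant tempo and a fixed number of beats per measure, neither of which is specified in the text, and the function names elapsed_seconds and measure_of are hypothetical.

```python
def elapsed_seconds(note_lengths_in_beats, tempo_bpm):
    """Elapsed time from the beginning of the music, in seconds.

    note_lengths_in_beats lists the lengths of the notes and rests
    supplied so far, in beats. Assumption: tempo_bpm is constant.
    """
    beats = sum(note_lengths_in_beats)
    return beats * 60.0 / tempo_bpm


def measure_of(beats_from_start, beats_per_measure):
    """1-based measure number containing the given beat position."""
    return int(beats_from_start // beats_per_measure) + 1


supplied = [1, 1, 0.5, 0.5, 1]          # four beats in total
print(elapsed_seconds(supplied, 120))   # seconds at 120 beats per minute
print(measure_of(sum(supplied), 4))     # measure reached in 4/4 time
```

If the tempo changes during the music, the elapsed time would have to be accumulated piecewise for each tempo region rather than computed in one step.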

(4) In the embodiments described above, difference information indicating the degree of difference is calculated as the scoring result. Instead, the comparison data itself (for example, the amount of pitch deviation) may be used as the scoring result.

(5) The embodiments described above take evaluation of the practitioner's singing as an example, but the practitioner's musical instrument performance may be evaluated instead. In this case, the practitioner voice data is data representing the performance sound of the input musical instrument, the accompaniment data storage area 14a stores performance data of musical instruments other than the one to be practiced, and the model voice data storage area 14e stores data representing the performance sound of the musical instrument used as a model.

(6) In the embodiments described above, the voice data stored in the model voice data storage area 14e of the storage unit 14 is data in WAVE format or MP3 format. However, the data format is not limited to these, and data in any format may be used as long as it represents sound.
In the embodiments described above, the model voice data is stored in the model voice data storage area 14e, and the CPU 11 of the karaoke apparatus 2 reads the model voice data from the model voice data storage area 14e. Instead, the voice data may be received via a communication network. In short, it suffices that the model voice data is input to the CPU 11.
In the first embodiment described above, melody data representing the pitches of the melodies of the various musical instruments that accompany the music is used as the model data. The model data is not limited to this, and any data may be used as long as it represents the pitch of a melody of the music, such as the main melody, a sub-melody, or a chorus part.

(7) In the first embodiment described above, speech recognition processing is performed on the practitioner voice data to generate a recognized character string corresponding to the recognized speech, and the generated recognized character string is compared with the lyrics data for each section to detect mistakes in the lyrics. Instead, the spectra of the model voice data and the practitioner voice data may each be calculated for each section, and mistakes in the lyrics may be detected by comparing the spectra of corresponding portions.

(8) In the embodiments described above, the music practice system 1, in which the karaoke apparatus 2 and the server device 3 are connected via a communication network, realizes the functions of the embodiments. Alternatively, three or more devices connected via a communication network may share these functions, and a system including those devices may realize the system of the embodiments. A single device may also realize all of the functions.

(9) The program executed by the CPU 11 of the karaoke apparatus 2 or the CPU 31 of the server device 3 in the embodiments described above can be provided stored on a recording medium such as a magnetic tape, a magnetic disk, a flexible disk, an optical recording medium, a magneto-optical recording medium, a CD-ROM (Compact Disk Read Only Memory), a DVD (Digital Versatile Disk), or a RAM. The program can also be downloaded to the karaoke apparatus 2 or the server device 3 via a network such as the Internet.

A block diagram showing an example of the configuration of the music practice system.
A block diagram showing an example of the hardware configuration of the karaoke apparatus.
A diagram showing an example of the contents of the scoring result data.
A block diagram showing an example of the hardware configuration of the server device.
A diagram showing an example of the contents of the pitch scoring result table.
A diagram showing an example of the contents of the section designation information.
A flowchart showing the flow of processing performed by the CPU of the karaoke apparatus.
A flowchart showing the flow of the scoring result statistical processing performed by the CPU of the server device.
A diagram showing an example of a screen displayed on the display unit of the karaoke apparatus.
A diagram showing an example of the manner in which the section indicated by the section designation information is notified.
A diagram showing an example of a screen displayed on the display unit of the karaoke apparatus.
A diagram showing an example of the hardware configuration of the karaoke apparatus according to the second embodiment of the present invention.
A diagram showing an example of the contents of the model technique data.
A diagram showing an example of the contents of the scoring result data.
A flowchart showing the flow of processing performed by the CPU of the karaoke apparatus.
A diagram for explaining the falsetto detection processing.

Explanation of symbols

1 ... music practice system; 2, 2a, 2b, 2c ... karaoke apparatus; 3 ... server device; 4 ... communication network; 11 ... CPU; 12 ... ROM; 13 ... RAM; 14 ... storage unit; 15 ... display unit; 16 ... operation unit; 17 ... microphone; 18 ... sound processing unit; 19 ... speaker; 20 ... communication unit; 31 ... CPU; 32 ... ROM; 33 ... RAM; 34 ... storage unit; 35 ... communication unit.

Claims

A music practice device comprising:
accompaniment data storage means for storing accompaniment data constituting the accompaniment sound of a piece of music;
acquisition means for acquiring section designation information indicating a specific section of the music;
instruction means for instructing the start of the accompaniment;
accompaniment sound signal generating means for, when the start of the accompaniment is instructed by the instruction means, reading accompaniment data from the accompaniment data storage means in accordance with the progress of the music and generating an accompaniment sound signal based on the read accompaniment data;
accompaniment position recognition means for recognizing which position in the music the accompaniment sound signal generated by the accompaniment sound signal generating means corresponds to; and
notification means for comparing the position recognized by the accompaniment position recognition means with the start position of the section designation information acquired by the acquisition means and, when the difference between the two becomes a predetermined difference, notifying the section indicated by the section designation information.

The music practice device according to claim 1, wherein the section designation information includes feature data indicating a feature of the specified section, and the notification means, together with notifying the section, gives notification in a mode set in advance according to the feature data.

A music practice device comprising:
accompaniment data storage means for storing accompaniment data constituting the accompaniment sound of a piece of music;
model voice data storage means in which model voice data representing melodic sounds included in the music is stored;
acquisition means for acquiring section designation information indicating a specific section of the music;
instruction means for instructing the start of the accompaniment;
accompaniment sound signal generating means for, when the start of the accompaniment is instructed by the instruction means, reading accompaniment data from the accompaniment data storage means in accordance with the progress of the music and generating an accompaniment sound signal based on the read accompaniment data;
accompaniment position recognition means for recognizing which position in the music the accompaniment sound signal generated by the accompaniment sound signal generating means corresponds to; and
sound signal generating means for, at the timing when the position recognized by the accompaniment position recognition means matches the start position of the section designation information acquired by the acquisition means, reading from the model voice data storage means the model voice data corresponding to the section indicated by the section designation information and generating a sound signal based on the read model voice data.

A music practice device comprising:
accompaniment data storage means for storing accompaniment data constituting the accompaniment sound of a piece of music;
acquisition means for acquiring section designation information indicating a specific section of the music;
specific section instruction means for instructing the start of accompaniment from the section indicated by the section designation information acquired by the acquisition means; and
accompaniment sound signal generating means for, when the start of the accompaniment is instructed by the specific section instruction means, reading from the accompaniment data storage means the accompaniment data of the portion corresponding to the section indicated by the section designation information acquired by the acquisition means and generating an accompaniment sound signal based on the read accompaniment data.

The music practice device according to any one of claims 1 to 3, wherein the accompaniment data includes position information indicating a position in the music, and the accompaniment position recognition means recognizes, from the position information included in the accompaniment data, which position in the music the accompaniment sound signal generated by the accompaniment sound signal generating means corresponds to.

The music practice device according to any one of claims 1 to 3, wherein the accompaniment position recognition means recognizes which position in the music the accompaniment sound signal generated by the accompaniment sound signal generating means corresponds to in accordance with the process of reading accompaniment data performed by the accompaniment sound signal generating means.

The music practice device according to claim 1, further comprising:
input means for accepting input of practitioner data representing the practitioner's voice; and
comparison means for comparing the input practitioner data with model data stored in model data storage means for each comparison section of a predetermined time unit, and generating and outputting, for each comparison section, difference information indicating the degree of difference between the two.

The music practice device according to claim 7, wherein the model data is data representing the pitch of a melody of the music,
the device further comprises pitch calculation means for calculating the pitch of the voice from the practitioner data, and
the comparison means compares the pitch calculated by the pitch calculation means with the pitch indicated by the model data stored in the model data storage means for each comparison section, and generates, for each comparison section, difference information indicating the degree of difference between the two.

The music practice device according to claim 7, wherein the model data is data representing the lyrics of the music,
the device further comprises voice recognition means for recognizing the voice represented by the practitioner data and generating a recognized character string corresponding to the recognized voice, and
the comparison means compares the recognized character string generated by the voice recognition means with the model data stored in the model data storage means for each comparison section, and generates, for each comparison section, difference information indicating the degree of difference between the two.

The music practice device according to claim 7, wherein the model data is technique data indicating a type and timing of a technique used for a model song.

The music practice device according to claim 7, wherein the practitioner data is data representing the performance sound of an input musical instrument, and the model data is data representing the performance sound of a musical instrument used as a model.

A music practice system comprising:
a plurality of music practice devices according to any one of claims 7 to 11; and
a server device having section designation information generating means for receiving, via a network, the difference information generated by the comparison means of each music practice device, compiling statistics of the received difference information for each comparison section, extracting a comparison section whose statistical result satisfies a predetermined condition, and generating section designation information indicating the extracted comparison section as a specific section of the music,
wherein the acquisition means of each music practice device acquires the section designation information from the server device.