JP2007264569A

JP2007264569A - Retrieval device, control method, and program

Info

Publication number: JP2007264569A
Application number: JP2006093293A
Authority: JP
Inventors: Naohiro Emoto; 直博江本
Original assignee: Yamaha Corp
Current assignee: Yamaha Corp
Priority date: 2006-03-30
Filing date: 2006-03-30
Publication date: 2007-10-11

Abstract

PROBLEM TO BE SOLVED: To retrieve a model appropriate for a user practicing singing or performance.

SOLUTION: On receiving practice voice feature data and a song ID from a karaoke device 2, a server device 3 compares the received practice voice feature data with all pieces of model voice feature data stored in a model voice data storage area 32a in association with that song ID, selects the piece of model voice feature data most similar to the practice voice feature data, reads out the model voice data corresponding to the selected piece, and transmits it to the karaoke device 2. The karaoke device 2 reproduces the received model voice data as a model for the user.

COPYRIGHT: (C)2008, JPO&INPIT

Description

The present invention relates to a technique for retrieving a singing voice or performance sound that can serve as a model for a practitioner.

When a person practicing singing sings along with the accompaniment of a song played by a karaoke device, they often imitate the singing style of the singer known for that song. However, there is usually a considerable gap in voice quality and singing technique between an amateur practitioner and a skilled singer, so the practitioner generally cannot imitate the singer well. Moreover, if the practitioner's singing skill is too low, forcing an imitation does little to improve it. To improve a practitioner's singing skill efficiently, an appropriate model that matches the practitioner's voice quality and skill level is needed. The same applies not only to singing but also to playing musical instruments.

For example, in the technical field of supporting a practitioner's speech, mechanisms have been proposed in which the speech of a model speaker (teacher) is converted into the voice of the practitioner (student) and played back to the practitioner (see Patent Document 1), or in which the practitioner's own speech is reproduced after audio processing such as correction of its spectral envelope (see Patent Document 2).
JP 2002-244547 A
JP 2004-133409 A

The mechanism described in Patent Document 1 requires the practitioner to select the model speaker's voice personally, which is cumbersome. The mechanism described in Patent Document 2 applies audio processing to the practitioner's speech, which can result in unnatural pronunciation. The present invention has been made in view of this background, and its object is to provide a technique for retrieving a model appropriate for a practitioner by a mechanism different from the conventional ones.

To solve the above problem, the present invention provides a search device comprising: model sound storage means for storing model sound data representing a model singing voice or performance sound; acquisition means for acquiring practice sound data representing the singing voice or performance sound of a practitioner; selection means for comparing the features of each piece of model sound data stored in the model sound storage means with the features of the practice sound data acquired by the acquisition means, and selecting model sound data whose features are similar to those of the practice sound data; and output means for outputting the model sound data selected by the selection means.

The present invention also provides a search device comprising: model sound storage means for storing, in association with one another, song identification information assigned to a song to be sung or played, model performer identification information assigned to a model performer who gives a model singing or performance, and model sound data representing the singing voice or performance sound of each model performer; first acquisition means for acquiring practice sound data representing the singing voice or performance sound of a practitioner, together with practitioner identification information assigned to that practitioner; first selection means for comparing the features of each piece of model sound data stored in the model sound storage means with the features of the practice sound data acquired by the first acquisition means, and selecting model sound data whose features are similar to those of the practice sound data; identification information storage means for storing, in association with each other, the practitioner identification information acquired by the first acquisition means and the model performer identification information stored in the model sound storage means in association with the model sound data selected by the first selection means; second acquisition means for acquiring the practitioner identification information and the song identification information; second selection means for identifying the model performer identification information stored in the identification information storage means in association with the practitioner identification information acquired by the second acquisition means, and selecting, from among the plural pieces of model sound data stored in the model sound storage means in association with the identified model performer identification information, the model sound data associated with the song identification information acquired by the second acquisition means; and output means for outputting the model sound data selected by the first selection means or the second selection means.

The present invention also provides a control method for a search device that includes control means and model sound storage means for storing model sound data representing a model singing voice or performance sound, the method comprising: a first step in which the control means acquires practice sound data representing the singing voice or performance sound of a practitioner; a second step in which the control means compares the features of each piece of model sound data stored in the model sound storage means with the features of the practice sound data acquired in the first step, and selects model sound data whose features are similar to those of the practice sound data; and a third step in which the control means outputs the model sound data selected in the second step.

The present invention also provides a control method for a search device that includes: model sound storage means for storing, in association with one another, song identification information assigned to a song to be sung or played, model performer identification information assigned to a model performer who gives a model singing or performance, and model sound data representing the singing voice or performance sound of each model performer; identification information storage means for storing the model performer identification information in association with practitioner identification information assigned to a practitioner; and control means. The method comprises: a first step in which the control means acquires practice sound data representing the singing voice or performance sound of the practitioner, together with the practitioner identification information assigned to that practitioner; a second step in which the control means compares the features of each piece of model sound data stored in the model sound storage means with the features of the practice sound data acquired in the first step, and selects model sound data whose features are similar to those of the practice sound data; a third step in which the control means causes the identification information storage means to store, in association with each other, the practitioner identification information acquired in the first step and the model performer identification information stored in the model sound storage means in association with the model sound data selected in the second step, and also outputs the model sound data selected in the second step; a fourth step in which the control means acquires the practitioner identification information and the song identification information; a fifth step in which the control means identifies the model performer identification information stored in the identification information storage means in association with the practitioner identification information acquired in the fourth step, and selects, from among the plural pieces of model sound data stored in the model sound storage means in association with the identified model performer identification information, the model sound data associated with the song identification information acquired by the second acquisition means; and a sixth step in which the control means outputs the model sound data selected in the fifth step.
Furthermore, the present invention may also take the form of a program that causes a computer to realize these functions.

According to the present invention, it is possible to retrieve a model voice or model performance that resembles the practitioner's singing voice or performance sound, that is, a model appropriate for the individual practitioner.

Next, the best mode for carrying out the present invention will be described.
In the following description, a person who practices singing is called a "practitioner", and a person whose singing serves as a model for the practitioner (for example, a singer) is called a "model performer". A model performer rarely sings in strict accordance with the score. In most cases they express emotional build-up in the song by intentionally shifting the start or end of a phrase, varying voice quality or volume, or using singing techniques such as vibrato and kobushi. These expressions differ from singer to singer. The present embodiment therefore searches the singing voices of many model performers for one that resembles the practitioner's singing voice, and has the practitioner listen to and imitate it, with the aim of improving the practitioner's singing skill.

[1. Configuration]
FIG. 1 is a block diagram showing the overall configuration of a search system 1 according to the present embodiment. The search system 1 includes a plurality of karaoke devices 2a, 2b, and 2c, a server device 3, and a network 4 that connects them. The karaoke devices 2a, 2b, and 2c are installed in ordinary homes and in various establishments such as karaoke boxes and restaurants, and function as communication devices that communicate via the network 4. The server device 3 stores the singing voices of many model performers and functions as a search device that searches them for a singing voice likely to be appropriate for the practitioner. The network 4 is, for example, an ISDN (Integrated Services Digital Network), the Internet, or an in-store network, and includes wired or wireless sections. Although FIG. 1 shows three karaoke devices, the number of karaoke devices in the search system 1 is not limited to three and may be larger or smaller. Since the karaoke devices 2a, 2b, and 2c all have the same configuration and operation, they are referred to collectively below simply as "karaoke device 2".

FIG. 2 is a block diagram showing the configuration of the karaoke device 2.
The control unit 21 is, for example, a CPU, and controls each unit of the karaoke device 2 by reading and executing a computer program stored in the storage unit 22. The display unit 23 is, for example, a liquid crystal display, and under the control of the control unit 21 displays various screens such as a menu screen for operating the karaoke device 2 and a karaoke screen in which lyric captions are superimposed on a background image. The operation unit 24 has various keys and outputs a signal corresponding to the pressed key to the control unit 21. The microphone 25 picks up the voice produced by the singer. The audio processing unit 26 converts the voice picked up by the microphone 25 (analog data) into digital data and outputs it to the control unit 21. The speaker 27 emits the sound output from the audio processing unit 26. The communication unit 28 performs data communication with the server device 3 via the network 4 under the control of the control unit 21.

The storage unit 22 is a large-capacity storage means such as a hard disk, and has an accompaniment/lyrics data storage area 22a, a practice voice data storage area 22b, and a singing score sound data storage area 22c. The accompaniment/lyrics data storage area 22a stores, in association with each other, accompaniment data describing the performance sounds of the instruments that accompany a song as the song progresses, and lyrics data representing the lyrics of the song. The accompaniment data is in a format such as MIDI (Musical Instrument Digital Interface) and is played back when the practitioner sings karaoke. The lyrics data is displayed on the display unit 23 as lyric captions during the karaoke singing. The practice voice data storage area 22b stores, as practice voice data, the voice data that has been A/D converted from the microphone 25 via the audio processing unit 26. The practice voice data is in, for example, WAVE format or MP3 (MPEG Audio Layer-3) format. The singing score sound data storage area 22c stores score sound data (for example, in MIDI format) representing the pitch and timing of the vocal part as defined by the score of the song. The score sound data is used to extract singing techniques such as vibrato, shakuri (scooping up to a note), kobushi (ornamental turns), falsetto, tsukkomi (early entry), tame (delayed entry), and breaths from the practitioner's singing voice.

Next, FIG. 3 is a block diagram showing the configuration of the server device 3.
In FIG. 3, the control unit 31 is, for example, a CPU, and controls each unit of the server device 3 by reading and executing a computer program stored in the storage unit 32. The storage unit 32 is a large-capacity storage means such as a hard disk. The storage unit 32 has a model voice data storage area 32a and a practice voice feature data storage area 32b. The communication unit 33 performs data communication with the karaoke device 2 via the network 4 under the control of the control unit 31.

The contents stored in the storage unit 32 will now be described in detail.
FIG. 4 is a diagram showing an example of the data stored in the model voice data storage area 32a. As shown in FIG. 4, the model voice data storage area 32a stores, in association with one another, a song ID (identification information) assigned to a song, a model performer ID assigned to a model performer, model voice data representing the singing voice of the model performer singing that song, and model voice feature data representing the features of that singing voice. The song ID is identification information such as a song title or a karaoke song number. The model performer ID is, for example, the name of the model performer (the singer's name). The model voice data is recorded in advance, is transmitted from the server device 3 to the karaoke device 2, and is played back by the karaoke device 2 as a model voice for the practitioner. The model voice feature data includes the pitch, sounding timing, spectrum, and power (volume) of the model performer's singing voice, as well as the types of techniques used in the singing and the sections in which they are used. The example in FIG. 4 shows model voice data representing the voice of model performer "XXXX" singing the song with song ID "g01", stored in association with model voice feature data representing its features.
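The associations described for FIG. 4 can be pictured as records that are looked up by song ID. The following is a minimal Python sketch rather than the patent's implementation; the class names `VoiceFeatures`, `ModelEntry`, and `ModelStore`, and all field types, are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class VoiceFeatures:
    """Feature data for one recording (the field layout is an assumption)."""
    pitch: List[float] = field(default_factory=list)           # per-frame pitch
    onset_times: List[float] = field(default_factory=list)     # sounding timing, seconds
    spectrum: List[List[float]] = field(default_factory=list)  # per-frame magnitudes
    power: List[float] = field(default_factory=list)           # per-frame power
    # Each entry: (technique type, section start, section end)
    techniques: List[Tuple[str, float, float]] = field(default_factory=list)


@dataclass
class ModelEntry:
    """One row of the table: song ID, model performer ID, audio, features."""
    song_id: str
    model_id: str
    audio: bytes
    features: VoiceFeatures


class ModelStore:
    """In-memory stand-in for the model voice data storage area 32a."""

    def __init__(self) -> None:
        self._entries: List[ModelEntry] = []

    def add(self, entry: ModelEntry) -> None:
        self._entries.append(entry)

    def by_song(self, song_id: str) -> List[ModelEntry]:
        # All model recordings stored in association with the given song ID.
        return [e for e in self._entries if e.song_id == song_id]
```

Keying the lookup on the song ID mirrors the way the server later compares the practitioner only against recordings of the same song.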

The practice voice feature data storage area 32b of the storage unit 32 stores practice voice feature data, transmitted from the karaoke device 2, that represents the features of the practice voice data. Like the model voice feature data described above, the practice voice feature data includes the pitch, sounding timing, spectrum, and power of the practitioner's singing voice, as well as the types of techniques used in the singing and the sections in which they are used.

[2. Operation]
Next, the operation of the present embodiment will be described.
In the sequence diagram of FIG. 5, the practitioner operates the operation unit 24 of the karaoke device 2 to select the song ID of the song to be sung and instructs playback of the karaoke accompaniment. In response to this operation, the control unit 21 starts the karaoke accompaniment (step S1). That is, the control unit 21 reads the accompaniment data corresponding to the designated song ID from the accompaniment/lyrics data storage area 22a and supplies it to the audio processing unit 26, and the audio processing unit 26 converts the accompaniment data into an analog signal and causes the speaker 27 to emit it. At the same time, the control unit 21 causes the display unit 23 to display a message prompting the practitioner to sing, such as "Please sing along with the accompaniment", and then reads the lyrics data from the accompaniment/lyrics data storage area 22a and displays lyric captions on the display unit 23. The practitioner sings along with the accompaniment emitted from the speaker 27 while referring to the displayed lyric captions. The practitioner's voice is picked up by the microphone 25, converted into an audio signal, and output to the audio processing unit 26. The practice voice data that has been A/D converted by the audio processing unit 26 is then stored (recorded) in the practice voice data storage area 22b of the storage unit 22 together with information indicating the time elapsed since the start of the accompaniment (step S2).

When playback of the accompaniment data ends, the control unit 21 ends the process of recording the practitioner's singing voice. The control unit 21 then divides the practice voice data stored in the practice voice data storage area 22b into frames of a predetermined length and calculates the pitch, spectrum, and power for each frame (step S3). The spectrum may be calculated using, for example, an FFT (Fast Fourier Transform).
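The per-frame analysis of step S3 can be sketched as follows. This is an illustrative sketch under stated assumptions, not the device's actual processing: the function names are hypothetical, frames are taken without overlap, and a naive discrete Fourier transform stands in for the FFT mentioned in the text so that the example needs only the standard library. Pitch estimation is omitted because the text does not say how it is done.

```python
import cmath
import math
from typing import List, Sequence


def split_frames(samples: Sequence[float], frame_len: int) -> List[List[float]]:
    """Cut the signal into non-overlapping frames; a trailing partial frame is dropped."""
    count = len(samples) // frame_len
    return [list(samples[i * frame_len:(i + 1) * frame_len]) for i in range(count)]


def frame_power(frame: Sequence[float]) -> float:
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)


def magnitude_spectrum(frame: Sequence[float]) -> List[float]:
    """Magnitudes of DFT bins 0..N/2 (naive O(N^2) transform, for illustration only)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]
```

For real recordings an FFT routine would replace `magnitude_spectrum`, since the naive transform scales quadratically with the frame length.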

Next, the control unit 21 extracts techniques from the practice voice data (step S4). As described above, the techniques are vibrato, shakuri, kobushi, falsetto, tsukkomi, tame, and breaths. Of these, vibrato is a technique of raising and lowering the pitch slightly and continuously to produce a trembling tone. Shakuri is a technique of starting from a pitch lower than the target note and smoothly bringing the pitch up to the target note. Kobushi is a technique of adding an ornamental, undulating melodic turn. Falsetto is a technique of singing in the so-called head voice. Tsukkomi is a technique of starting a phrase earlier than its nominal timing. Tame is a technique of starting a phrase later than its nominal timing. A breath refers to the timing at which the practitioner takes a breath.

First, the control unit 21 identifies (detects) the sections of the practice voice data in which each of the above techniques is used. For example, vibrato and shakuri can be detected based on the pitch of the model voice data. Kobushi and falsetto can be detected based on the spectrum of the model voice data. Tame and tsukkomi can be detected based on the pitch of the model voice data and the score sound data stored in the singing score sound data storage area 22c. Breaths can be detected based on the power of the model voice data and the score sound data stored in the singing score sound data storage area 22c.

The specific detection methods are as follows.
Based on the correspondence between the practice voice data and the score sound data, and on the pitch calculated from the practice voice data, the control unit 21 identifies sections in which the start time of a note in the practice voice data differs from the start time of the corresponding note in the score sound data. A section in which the pitch change in the practice voice data occurs earlier than the pitch change in the score sound data, that is, a section in which a note in the practice voice data starts earlier than the corresponding note in the score sound data, is identified as a section in which the tsukkomi technique is used. The control unit 21 associates the section information of each section identified in this way with identification information indicating tsukkomi.

Conversely, based on the correspondence between the practice voice data and the score sound data, and on the pitch calculated from the practice voice data, the control unit 21 detects sections in which the pitch change in the practice voice data occurs later than the pitch change in the score sound data, that is, sections in which a note in the practice voice data starts later than the corresponding note in the score sound data, and identifies each detected section as a section in which the tame technique is used.
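The early and late entry checks reduce to comparing paired note start times. The sketch below assumes that each sung note has already been matched to its score note and that onsets are given in seconds; the function name `classify_onsets` and the `tolerance` parameter are assumptions, since the text does not say how large a deviation must be before it counts as a technique.

```python
from typing import List, Sequence, Tuple


def classify_onsets(
    score_onsets: Sequence[float],
    sung_onsets: Sequence[float],
    tolerance: float = 0.05,  # seconds; hypothetical dead zone around the score timing
) -> List[Tuple[int, str]]:
    """Label each note whose sung onset is early ("tsukkomi") or late ("tame")."""
    labels = []
    for i, (expected, actual) in enumerate(zip(score_onsets, sung_onsets)):
        offset = actual - expected
        if offset < -tolerance:
            labels.append((i, "tsukkomi"))  # sung note starts before the score note
        elif offset > tolerance:
            labels.append((i, "tame"))      # sung note starts after the score note
    return labels
```

For example, `classify_onsets([1.0, 2.0, 3.0], [0.8, 2.0, 3.2])` labels note 0 as tsukkomi and note 2 as tame, and leaves note 1 unlabeled.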

The control unit 21 also analyzes the pattern of temporal change of the pitch calculated from the practice voice data, detects sections in which the pitch fluctuates continuously within a predetermined range above and below a center frequency, and identifies each detected section as a section in which the vibrato technique is used.
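One way to read the vibrato criterion is as a bounded oscillation around the mean pitch of a candidate section. The following sketch makes that reading concrete; the thresholds `min_dev`, `max_dev`, and `min_crossings` are invented for illustration, and the use of the section mean as the center frequency is an assumption.

```python
from typing import Sequence


def is_vibrato(
    pitches: Sequence[float],
    min_dev: float = 2.0,    # Hz; below this the pitch is treated as steady (assumed)
    max_dev: float = 20.0,   # Hz; above this the movement is too wide for vibrato (assumed)
    min_crossings: int = 4,  # required number of swings across the center (assumed)
) -> bool:
    """Return True if the pitch track oscillates within a band around its mean."""
    if len(pitches) < 2:
        return False
    center = sum(pitches) / len(pitches)
    devs = [p - center for p in pitches]
    peak = max(abs(d) for d in devs)
    if not (min_dev <= peak <= max_dev):
        return False
    # Count sign changes of the deviation, ignoring frames exactly at the center.
    signs = [1 if d > 0 else -1 for d in devs if d != 0]
    crossings = sum(1 for a, b in zip(signs, signs[1:]) if a != b)
    return crossings >= min_crossings
```

A steady note fails the minimum deviation test, and a one-way glide fails either the range test or the crossing count, so neither is reported as vibrato.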

The control unit 21 also analyzes the pattern of temporal change of the pitch calculated from the practice voice data, detects sections in which the pitch changes continuously from a low pitch to a high pitch, and identifies each detected section as a section in which the shakuri technique is used. This processing may instead be based on the correspondence with the score sound data. That is, the control unit 21 may use the correspondence between the practice voice data and the score sound data to detect sections in which the pitch of the practice voice data continuously approaches the pitch of the score sound data from a lower pitch.
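The score-based variant of the shakuri check can be sketched as a test on the pitch track at the start of a note. The function name and both thresholds are assumptions; the text only requires that the pitch approach the score pitch continuously from below.

```python
from typing import Sequence


def is_shakuri(
    pitches: Sequence[float],
    target: float,
    min_gap: float = 10.0,   # Hz; how far below the target the note must start (assumed)
    tolerance: float = 3.0,  # Hz; how close to the target it must end (assumed)
) -> bool:
    """Return True if the pitch rises without falling back from below `target` to `target`."""
    if len(pitches) < 2:
        return False
    starts_low = target - pitches[0] >= min_gap
    rises = all(b >= a for a, b in zip(pitches, pitches[1:]))
    arrives = abs(pitches[-1] - target) <= tolerance
    return starts_low and rises and arrives
```

Here `target` plays the role of the score sound data pitch for the note being examined.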

Based on the correspondence between the practice voice data and the score sound data, and on the power calculated from the practice voice data, the control unit 21 also detects sections in which the score sound data is voiced but the power of the practice voice data is below a predetermined threshold, and identifies each detected section as a breath section.
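The breath criterion combines two per-frame conditions: the score expects sound, and the measured power is below a threshold. A minimal sketch, assuming both inputs are already aligned frame by frame (the function name and the half-open `(start, end)` frame ranges are illustrative choices):

```python
from typing import List, Sequence, Tuple


def find_breaths(
    score_voiced: Sequence[bool],
    power: Sequence[float],
    threshold: float,
) -> List[Tuple[int, int]]:
    """Return (start, end) frame ranges, end exclusive, where the score is voiced but power is low."""
    sections = []
    start = None
    for i, (voiced, p) in enumerate(zip(score_voiced, power)):
        quiet = voiced and p < threshold
        if quiet and start is None:
            start = i                      # a quiet run begins
        elif not quiet and start is not None:
            sections.append((start, i))    # the quiet run ended at the previous frame
            start = None
    if start is not None:
        sections.append((start, min(len(score_voiced), len(power))))
    return sections
```

Frames where the score itself is silent are ignored, so rests written into the song are not reported as breaths.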

The control unit 21 also analyzes the pattern of temporal change of the spectrum calculated from the practice voice data, detects sections in which the spectral characteristics shift abruptly to a predetermined change state, and identifies each detected section as a section in which the falsetto technique is used. Here, the predetermined change state is a state in which the harmonic components of the spectral characteristics become extremely small. For example, a chest voice contains many harmonic components, whereas in falsetto the magnitude of the harmonic components becomes extremely small. In this case, the control unit 21 may also take into account whether the pitch has shifted markedly upward. Falsetto is sometimes used at pitches that could also be produced in chest voice, but it is generally a technique for producing high notes that cannot be produced in chest voice. The device may therefore be configured to detect falsetto only when the pitch of the practice voice data is at or above a predetermined pitch. In addition, since the pitch range in which falsetto is used generally differs between male and female voices, gender may be detected from the range of the practice voice data or from the formants detected in it, and the pitch range for falsetto detection may be set accordingly.
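The falsetto criterion, a sudden loss of harmonic content, can be illustrated by tracking the share of spectral energy that lies in the overtones. This is a sketch under simplifying assumptions: the fundamental is assumed to fall exactly on a known spectrum bin, and the names `harmonic_ratio`, `falsetto_onsets`, and `drop_factor` are hypothetical.

```python
from typing import List, Sequence


def harmonic_ratio(spectrum: Sequence[float], f0_bin: int) -> float:
    """Share of energy in the overtones (2*f0, 3*f0, ...) relative to fundamental plus overtones."""
    if f0_bin < 1:
        raise ValueError("f0_bin must be at least 1")
    fundamental = spectrum[f0_bin] ** 2
    overtones = sum(spectrum[k] ** 2 for k in range(2 * f0_bin, len(spectrum), f0_bin))
    total = fundamental + overtones
    return overtones / total if total > 0 else 0.0


def falsetto_onsets(ratios: Sequence[float], drop_factor: float = 0.3) -> List[int]:
    """Frames where the overtone share falls below `drop_factor` times the previous frame's share."""
    return [i for i in range(1, len(ratios)) if ratios[i] < drop_factor * ratios[i - 1]]
```

The optional pitch condition described above could be added by discarding onsets whose frame pitch is below a chosen limit.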

The control unit 21 also detects sections in which the manner of change of the spectral characteristics switches in varied ways within a short time, and identifies each detected section as one in which the "kobushi" singing technique is used. Kobushi is a technique that adds a growl-like flavor by changing the tone color and vocalization within a short section, so the spectral characteristics vary widely in sections where it is used.

In this way, the control unit 21 detects, from the practice voice data, the sections in which each singing technique is used, and associates section information indicating each detected section with type information indicating the singing technique. The control unit 21 then generates practice voice feature data containing the pitch, spectrum, and power calculated in step S3 and the section information and type information generated in step S4 (step S5). At this time, the control unit 21 also calculates the sounding timing from the pitch and includes it in the practice voice feature data. The control unit 21 then transmits the generated practice voice feature data, together with the music ID, from the communication unit 28 to the server device 3 (step S6).

Upon receiving the practice voice feature data and the music ID, the control unit 31 of the server device 3 compares the received practice voice feature data with all the model voice feature data stored in the model voice data storage area 32a in association with that music ID, and selects from them the model voice feature data with the highest similarity to the practice voice feature data (step S7). More specifically, the control unit 31 integrates the difference between the pitch represented by the practice voice feature data and the pitch represented by each piece of model voice feature data over the whole song, from the start to the end of the singing. Likewise, it integrates the difference between the power represented by the practice voice feature data and the power represented by each piece of model voice feature data over the same range. The same applies to spectrum and timing. For techniques, the control unit 31 similarly integrates the difference between the sections indicated by the section information of each technique in the practice voice feature data and the sections indicated by the section information of each technique in each piece of model voice feature data. The control unit 31 then accumulates the integrated values obtained in this way for each piece of model voice feature data, and selects the model voice feature data with the smallest accumulated value as the one most similar to the practice voice data.
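The selection in step S7 can be illustrated with the following minimal sketch. It makes several assumptions that the embodiment does not fix: features are sampled per frame, the "difference" is taken as an absolute difference, the difference between technique sections is counted as the number of frames covered by only one of the two, and the terms are summed without weighting. Spectrum and timing are omitted for brevity. All names are hypothetical.

```python
def integrated_difference(a, b):
    """Discrete stand-in for integrating the difference over the whole song."""
    return sum(abs(x - y) for x, y in zip(a, b))


def section_difference(sections_a, sections_b):
    """Number of frames covered by exactly one of the two section lists."""
    frames_a = {f for start, end in sections_a for f in range(start, end)}
    frames_b = {f for start, end in sections_b for f in range(start, end)}
    return len(frames_a ^ frames_b)


def dissimilarity(practice, model, curve_keys=("pitch", "power")):
    """Accumulate the integrated differences for one model (smaller = more similar)."""
    total = sum(integrated_difference(practice[k], model[k]) for k in curve_keys)
    techniques = set(practice["techniques"]) | set(model["techniques"])
    for t in techniques:
        total += section_difference(practice["techniques"].get(t, []),
                                    model["techniques"].get(t, []))
    return total


def select_most_similar(practice, models):
    """Return the ID of the model whose accumulated value is smallest."""
    return min(models, key=lambda model_id: dissimilarity(practice, models[model_id]))


practice = {"pitch": [440, 440, 494], "power": [0.5, 0.6, 0.5],
            "techniques": {"falsetto": [(2, 3)]}}
models = {
    "m01": {"pitch": [440, 442, 494], "power": [0.5, 0.5, 0.5],
            "techniques": {"falsetto": [(2, 3)]}},
    "m02": {"pitch": [392, 392, 440], "power": [0.9, 0.9, 0.9],
            "techniques": {}},
}
print(select_most_similar(practice, models))
```

Because pitch, power, and section differences have different units, a practical implementation would probably normalize or weight each term before accumulating them; the embodiment leaves this open.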

Next, the control unit 31 reads the model voice data corresponding to the selected model voice feature data from the model voice data storage area 32a (step S8) and transmits (outputs) it to the karaoke device 2 in a data format that the karaoke device 2 can play back (step S9). The control unit 21 of the karaoke device 2 plays back the received model voice data (step S10). That is, the control unit 21 supplies the model voice data to the voice processing unit 26, which converts it into an analog signal and emits it from the speaker 27. At this time, the control unit 21 causes the display unit 23 to show the message "The singing teacher who suits you is ○○○○. Listen carefully and try to imitate them." The practitioner can thus hear the singing voice of a model person suited to their own singing and can improve by using it as a model.

According to the embodiment described above, a model person's singing voice that resembles the practitioner's singing voice can be retrieved and provided to the practitioner as a model.

[3. Modifications]
The embodiment described above may be modified as follows.
[3-1] The embodiment above was described using the example of searching for a model person's singing voice (model voice) that matches the practitioner's singing voice (practice voice). The invention is not limited to this, however; it may instead search for a model person's performance sound (model performance sound) that matches the sound of the practitioner playing an instrument (practice performance sound). In that case, model performance sound data representing the model person's performance sound is used in place of the model person's singing voice, and practice performance sound data representing the practitioner's performance sound is used in place of the practice voice data. The accompaniment/lyrics data storage area 22a stores performance data for instruments (for example, bass and drums) other than the instrument to be practiced (for example, guitar), and the singing score sound data storage area 22c stores score sound data that specifies the performance sounds written in the score. Based on these data, the control unit 31 of the server device 3 searches for model performance sound data similar to the practice performance sound data through the same processing as described above.
In short, the present invention stores model sound data representing model singing voices or performance sounds and, on acquiring practice sound data representing a practitioner's singing voice or performance sound, compares the features of each piece of stored model sound data with the features of the acquired practice sound data, then selects and outputs model sound data whose features are similar to those of the practice sound data.

[3-2] In the embodiment above, a model person's singing voice similar to the practitioner's is searched for only after the practitioner has sung the desired song. The practitioner therefore cannot obtain a model singing voice for a song without singing it at least once. If the practitioner finds this procedure cumbersome, the embodiment may be modified as follows. This modification relies on the observation that when a practitioner's singing voice and a model person's singing voice are similar, the similarity between them remains high even for a different song.
As shown in FIG. 6, the storage unit 32 of the server device 3 has a correspondence ID storage area 32c in addition to the model voice data storage area 32a and the practice voice feature data storage area 32b described earlier. As shown in FIG. 7, the correspondence ID storage area 32c stores practitioner IDs assigned to practitioners in association with model person IDs assigned to model persons. When a practitioner sings a song (for example, music ID "g01") in the search system 1 and searches for model voice data to use as a model, the model person ID of the model voice data found is stored in the correspondence ID storage area 32c in association with that practitioner's practitioner ID. Thereafter, when the practitioner searches for model voice data for a different song (for example, music ID "g02"), the model person ID stored in the correspondence ID storage area 32c in association with the practitioner ID is identified, and, from among the pieces of model voice data stored in the model voice data storage area 32a in association with that model person ID, the one associated with music ID "g02" is retrieved.

FIG. 8 shows a specific example of operation. In FIG. 8, operations identical to those in FIG. 5 are given the same reference symbols.
Before step S1 in FIG. 8, the practitioner enters their own practitioner ID into the karaoke device 2 in addition to the music ID. In step S6', the control unit 21 of the karaoke device 2 transmits the practitioner ID, together with the practice voice feature data and the music ID, from the communication unit 28 to the server device 3. On receiving the practice voice feature data, music ID, and practitioner ID, the control unit 31 of the server device 3 selects, in step S7, the model voice feature data with the highest similarity to the practice voice feature data and then, as step S11, stores the practitioner ID received from the karaoke device 2 in the correspondence ID storage area 32c in association with the model person ID associated with the selected model voice feature data. The control unit 31 then reads the model voice data corresponding to the selected model voice feature data from the model voice data storage area 32a (step S8) and transmits (outputs) it to the karaoke device 2 (step S9). The control unit 21 of the karaoke device 2 plays back the received model voice data (step S10).

After this, when the practitioner enters their practitioner ID into the karaoke device 2 together with a different music ID, the karaoke device 2 accepts the input (step S12). The control unit 21 then transmits the entered music ID and practitioner ID from the communication unit 28 to the server device 3. On receiving the music ID and practitioner ID, the control unit 31 of the server device 3 identifies the model person ID associated with that practitioner ID in the correspondence ID storage area 32c (step S13). The control unit 31 then reads the model voice data corresponding to that music ID and model person ID from the model voice data storage area 32a (step S14) and transmits it to the karaoke device 2. The control unit 21 of the karaoke device 2 plays back the received model voice data (step S15). In this way, the practitioner can retrieve a model singing voice for a song they want to practice without having to sing it first.
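The server-side bookkeeping of steps S11, S13, and S14 amounts to two table lookups, as the following sketch suggests. The class name `ModelLookup`, the in-memory dictionaries, and the sample IDs other than "g01" and "g02" are assumptions for illustration. The embodiment only states that the IDs are stored in association with each other.

```python
class ModelLookup:
    """Hypothetical stand-in for storage areas 32a and 32c of the server device."""

    def __init__(self, model_voice_data):
        # Stand-in for storage area 32a: {(model_person_id, music_id): voice data}
        self.model_voice_data = model_voice_data
        # Stand-in for storage area 32c: {practitioner_id: model_person_id}
        self.practitioner_to_model = {}

    def remember(self, practitioner_id, model_person_id):
        """Step S11: store the practitioner ID with the selected model person ID."""
        self.practitioner_to_model[practitioner_id] = model_person_id

    def find_for_new_song(self, practitioner_id, music_id):
        """Steps S13 and S14: identify the model person, then read their data."""
        model_person_id = self.practitioner_to_model.get(practitioner_id)
        if model_person_id is None:
            return None  # no earlier search; the practitioner must sing once first
        return self.model_voice_data.get((model_person_id, music_id))


lookup = ModelLookup({("m01", "g01"): b"voice-g01", ("m01", "g02"): b"voice-g02"})
lookup.remember("p01", "m01")                   # after the first search on "g01"
print(lookup.find_for_new_song("p01", "g02"))   # no singing needed for "g02"
```

The sketch returns `None` when the model person has no recording of the requested song; how the system should respond in that case is not addressed in the description.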

For the search system 1 of the embodiment above to work effectively, a sufficient number of pieces of model voice data must be prepared for every song, because with too little model voice data it is impossible to obtain model voice data that matches the practitioner's singing voice. Therefore, while the search system 1 is in operation, the practice voice data stored by the karaoke device 2 may itself be stored in the server device 3 as model voice data representing a model person's singing voice.

[3-3] In the embodiment above, the single piece of model voice feature data most similar to the practice voice feature data is selected. The number selected is not limited to one, however: plural pieces of model voice feature data may be selected in descending order of similarity, and the model voice data corresponding to each of them may be output. For example, the control unit 31 transmits (outputs) to the karaoke device 2 the model person IDs (singer names) assigned to the selected pieces of model voice data, and the karaoke device 2 displays these model person IDs as a list. When the practitioner designates a desired model person ID (singer name) from the list, the control unit 21 of the karaoke device 2 transmits that model person ID to the server device 3. On receiving the model person ID from the karaoke device 2, the control unit 31 transmits (outputs) to the karaoke device 2 the model voice data of the model person to whom that ID is assigned.
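Selecting several candidates in descending order of similarity can be sketched as below, assuming that an accumulated difference value has already been computed for each model person ID (smaller meaning more similar). The function name and the sample values are hypothetical.

```python
import heapq


def select_top_n(accumulated, n):
    """Return the n model person IDs with the smallest accumulated difference,
    most similar first.

    accumulated: {model_person_id: accumulated difference value}
    """
    return heapq.nsmallest(n, accumulated, key=accumulated.get)


accumulated = {"m01": 2.1, "m02": 152.1, "m03": 10.0}
print(select_top_n(accumulated, 2))
```

The returned list corresponds to the list of model person IDs (singer names) that the karaoke device would display for the practitioner to choose from.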

[3-4] In the embodiment, similarity is determined from the integrated difference between the practice voice feature data and each piece of model voice feature data. Alternatively, for example, the Euclidean distance between the coordinates of the practice voice feature data and the coordinates of each piece of model voice feature data in a multidimensional space may be calculated, and the model voice feature data with the smallest Euclidean distance may be selected as the most similar.
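The Euclidean-distance variant can be illustrated as follows. How the feature data is mapped to coordinates is not specified in this modification, so the fixed-length feature vectors used here are an assumption, as is the function name.

```python
import math


def nearest_model(practice_vec, model_vecs):
    """Return the ID of the model whose feature vector has the smallest
    Euclidean distance to the practice feature vector.

    practice_vec: sequence of numbers (hypothetical feature coordinates)
    model_vecs:   {model_person_id: sequence of numbers of the same length}
    """
    return min(model_vecs,
               key=lambda model_id: math.dist(practice_vec, model_vecs[model_id]))


practice_vec = (0.0, 0.0, 0.0)
model_vecs = {"m01": (1.0, 1.0, 1.0), "m02": (0.1, 0.0, 0.2)}
print(nearest_model(practice_vec, model_vecs))
```

`math.dist` requires Python 3.8 or later. In practice each coordinate would likely need to be scaled so that no single feature dominates the distance.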

[3-5] In the embodiment above, the pitch, timing, power, spectrum, and techniques of the voice are all used as the model voice feature data and practice voice feature data. It is also possible to use only some of these, or to use additional feature elements. The practitioner may be allowed to select which feature elements to use by means of the operation unit 24. Similarly, the practitioner may be allowed to select any of the various techniques.

[3-6] In the embodiment above, the practice voice feature data is generated by the control unit 21 of the karaoke device 2. Instead, it may be generated by the control unit 31 of the server device 3. Alternatively, the control unit 21 of the karaoke device 2 may prompt for input of practice voice feature data, and the practitioner may input practice voice feature data prepared in advance. In this case, for example, the control unit 21 causes the display unit 23 to show a screen prompting for the practice voice feature data, and the practitioner inputs the data into the karaoke device 2 through an interface such as USB (Universal Serial Bus). The practice voice feature data is generated beforehand on a device such as a personal computer. As in the embodiment above, the personal computer picks up the practitioner's voice with a microphone and analyzes it to generate the practice voice feature data. The karaoke device 2 may also be provided with an RFID reader that reads an RFID tag on which the practice voice feature data has been written.

[3-7] The server device 3 need not output the model voice data by transmitting it to the karaoke device 2; it may instead attach the model voice data to an e-mail addressed to the practitioner's mail terminal. The model voice data may also be output to and stored on a storage medium, in which case the practitioner can listen to it by having a computer read it from the storage medium and play it back. When the karaoke device 2 plays back the model voice data, it need not play the song from beginning to end and may play only part of it. For example, if only the sung portions where the feature similarity was low are played back, the practitioner can tell which portions to concentrate on when practicing.

[3-8] In the embodiment, the server device 3 extracts the model voice feature data from the model voice data and stores it in advance. Instead, the server device 3 may store only the model voice data and extract the model voice feature data from it each time a search is required. Although the model voice data and practice voice data are described as WAVE or MP3 data, the data format is not limited to these; any format may be used as long as the data represents audio.

[3-9] In the embodiment above, the search system 1, in which the karaoke device 2 and the server device 3 are connected by a communication network, implements all the functions of the embodiment. Alternatively, three or more devices connected by a communication network may share these functions, so that a system comprising those devices implements the system of the embodiment. A single device may also implement all of the functions.

[3-10] The program executed by the control unit 21 of the karaoke device 2 or the control unit 31 of the server device 3 in the embodiment above may be provided stored on a recording medium such as a magnetic tape, magnetic disk, flexible disk, optical recording medium, magneto-optical recording medium, CD (Compact Disk)-ROM, DVD (Digital Versatile Disk), or RAM. It may also be downloaded to the karaoke device 2 or the server device 3 over a network such as the Internet.

FIG. 1 is a block diagram showing the overall configuration of a search system according to an embodiment of the present invention.
FIG. 2 is a block diagram showing the configuration of the karaoke device.
FIG. 3 is a block diagram showing the configuration of the server device.
FIG. 4 is a diagram showing an example of data stored by the server device.
FIG. 5 is a sequence diagram showing the operation of the embodiment.
FIG. 6 is a block diagram showing the configuration of the server device according to a modification.
FIG. 7 is a diagram showing an example of data stored by that server device.
FIG. 8 is a sequence diagram showing the operation of that modification.

Explanation of symbols

1: search system; 2a, 2b, 2c: karaoke device; 3: server device; 4: network; 21: control unit; 22: storage unit; 23: display unit; 24: operation unit; 25: microphone; 26: voice processing unit; 27: speaker; 28: communication unit; 31: control unit; 32: storage unit; 33: communication unit.

Claims

A search device comprising:
model sound storage means for storing model sound data representing a model singing voice or performance sound;
acquisition means for acquiring practice sound data representing a practitioner's singing voice or performance sound;
selection means for comparing the features of each piece of model sound data stored in the model sound storage means with the features of the practice sound data acquired by the acquisition means, and selecting model sound data having features similar to those of the practice sound data; and
output means for outputting the model sound data selected by the selection means.

A search device comprising:
model sound storage means for storing, in association with one another, music identification information assigned to a song to be sung or played, model person identification information assigned to a model person who sings or plays as a model, and model sound data representing the singing voice or performance sound of each model person;
first acquisition means for acquiring practice sound data representing a practitioner's singing voice or performance sound and practitioner identification information assigned to the practitioner;
first selection means for comparing the features of each piece of model sound data stored in the model sound storage means with the features of the practice sound data acquired by the first acquisition means, and selecting model sound data having features similar to those of the practice sound data;
identification information storage means for storing, in association with each other, the practitioner identification information acquired by the first acquisition means and the model person identification information stored in the model sound storage means in association with the model sound data selected by the first selection means;
second acquisition means for acquiring the practitioner identification information and the music identification information;
second selection means for identifying the model person identification information stored in the identification information storage means in association with the practitioner identification information acquired by the second acquisition means, and selecting, from among the plural pieces of model sound data stored in the model sound storage means in association with the identified model person identification information, the model sound data associated with the music identification information acquired by the second acquisition means; and
output means for outputting the model sound data selected by the first selection means or the second selection means.

The search device according to claim 1, further comprising communication means for communicating over a network with a communication device that includes sound collection means and reproduction means, wherein
the acquisition means receives, via the communication means, the practice sound data collected by the sound collection means of the communication device, and
the output means transmits the model sound data from the communication means to the communication device in a data format reproducible by the reproduction means of the communication device.

The search device according to claim 1, wherein
the selection means selects plural pieces of model sound data from among the model sound data stored in the model sound storage means, in descending order of similarity to the features of the practice sound data acquired by the acquisition means, and
the output means first outputs the model person identification information assigned to the plural pieces of model sound data selected by the selection means, and then outputs the model sound data to which the model person identification information designated from among that information is assigned.

The search device according to claim 2, wherein
the first selection means selects plural pieces of model sound data from among the model sound data stored in the model sound storage means, in descending order of similarity to the features of the practice sound data acquired by the first acquisition means,
the output means first outputs the model person identification information assigned to the plural pieces of model sound data selected by the first selection means, and then outputs the model sound data to which the model person identification information designated from among that information is assigned, and
the identification information storage means stores the practitioner identification information acquired by the first acquisition means in association with the model person identification information designated from among the plural pieces of model person identification information output by the output means.

The search device according to claim 1 or 2, wherein the features of the singing voice or performance sound are at least one of pitch, timing, spectrum, power, and technique data indicating the type and section of a technique used in the singing or performance.

A control method for a search device comprising control means and model sound storage means for storing model sound data representing a model singing voice or performance sound, the method comprising:
a first step in which the control means acquires practice sound data representing a practitioner's singing voice or performance sound;
a second step in which the control means compares the features of each piece of model sound data stored in the model sound storage means with the features of the practice sound data acquired in the first step, and selects model sound data having features similar to those of the practice sound data; and
a third step in which the control means outputs the model sound data selected in the second step.

A control method for a search device comprising: model sound storage means for storing, in association with one another, music identification information assigned to a song to be sung or played, model person identification information assigned to a model person who sings or plays as a model, and model sound data representing the singing voice or performance sound of each model person; identification information storage means for storing the model person identification information in association with practitioner identification information assigned to a practitioner; and control means, the method comprising:
a first step in which the control means acquires practice sound data representing the practitioner's singing voice or performance sound and the practitioner identification information assigned to the practitioner;
a second step in which the control means compares the features of each piece of model sound data stored in the model sound storage means with the features of the practice sound data acquired in the first step, and selects model sound data having features similar to those of the practice sound data;
a third step in which the control means stores, in the identification information storage means and in association with each other, the practitioner identification information acquired in the first step and the model person identification information stored in the model sound storage means in association with the model sound data selected in the second step, and outputs the model sound data selected in the second step;
a fourth step in which the control means acquires the practitioner identification information and the music identification information;
a fifth step in which the control means identifies the model person identification information stored in the identification information storage means in association with the practitioner identification information acquired in the fourth step, and selects, from among the plural pieces of model sound data stored in the model sound storage means in association with the identified model person identification information, the model sound data associated with the music identification information acquired by the second acquisition means; and
a sixth step in which the control means outputs the model sound data selected in the fifth step.

In a computer equipped with model sound storage means for storing model sound data representing an exemplary singing voice or performance sound,
An acquisition function for acquiring practice sound data representing a practitioner's singing voice or performance sound;
A model sound having characteristics similar to the characteristics of the practice sound data by comparing the characteristics of each model sound data stored by the model sound storage means with the characteristics of the practice sound data acquired by the acquisition function A selection function to select data,
A program for realizing an output function for outputting the model sound data selected by the selection function.
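
The selection function in this claim can be illustrated with a minimal Python sketch. It is not the implementation described in the specification: the function name `select_model_sound`, the representation of features as numeric tuples, the example feature dimensions, and the use of Euclidean distance as the similarity measure are all assumptions made for illustration.

```python
import math
from typing import Dict, Sequence


def select_model_sound(practice_features: Sequence[float],
                       model_features: Dict[str, Sequence[float]]) -> str:
    """Return the key of the stored model whose features are closest to the practice features.

    Euclidean distance is an assumed stand-in for the feature comparison;
    a smaller distance is treated as a higher similarity.
    """
    if not model_features:
        raise ValueError("model sound storage is empty")
    return min(model_features,
               key=lambda key: math.dist(practice_features, model_features[key]))


# Hypothetical stored features: (mean pitch in Hz, vibrato depth in cents).
stored = {
    "model_a": (220.0, 40.0),
    "model_b": (330.0, 15.0),
}

# Practice features as they might be produced by the acquisition function.
selected = select_model_sound((310.0, 20.0), stored)
print(selected)
```

The key returned here would then be handed to the output function, which reads out and returns the corresponding model sound data.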

In a computer comprising: model sound storage means for storing, in association with one another, music identification information assigned to a song to be sung or played, model identification information assigned to a model person who performs singing or playing as a model, and model sound data representing the singing voice or performance sound of each model person; and identification information storage means for storing the model identification information and practitioner identification information assigned to a practitioner in association with each other,
A first acquisition function for acquiring practice sound data representing the singing voice or performance sound of the practitioner and practitioner identification information assigned to the practitioner;
A first selection function for comparing the features of each piece of model sound data stored in the model sound storage means with the features of the practice sound data acquired by the first acquisition function, and selecting model sound data having features similar to those of the practice sound data;
A writing function for storing, in the identification information storage means, the practitioner identification information acquired by the first acquisition function in association with the model identification information that is stored in the model sound storage means in association with the model sound data selected by the first selection function;
A second acquisition function for acquiring the practitioner identification information and the music identification information;
A second selection function for specifying the model identification information stored in the identification information storage means in association with the practitioner identification information acquired by the second acquisition function, and selecting, from among the plural pieces of model sound data stored in the model sound storage means in association with the specified model identification information, the model sound data associated with the music identification information acquired by the second acquisition function;
A program for realizing an output function for outputting model sound data selected by the first selection function or the second selection function.
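
The two-phase flow recited in this claim (select a model by feature similarity once, record which model person it belongs to, then reuse that model person for later songs) can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the claimed implementation: the class names `ModelSound` and `SearchDevice`, the method names `first_search` and `later_search`, the in-memory list and dictionary used as storage means, and the Euclidean distance used for comparison are all hypothetical.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Sequence


@dataclass
class ModelSound:
    music_id: str              # music identification information
    model_id: str              # model identification information
    features: Sequence[float]  # assumed feature vector for the recording
    audio: bytes               # model sound data


@dataclass
class SearchDevice:
    # Assumed in-memory stand-in for the model sound storage means.
    model_sounds: List[ModelSound]
    # Assumed stand-in for the identification information storage means:
    # practitioner identification information -> model identification information.
    assignments: Dict[str, str] = field(default_factory=dict)

    def first_search(self, practitioner_id: str,
                     practice_features: Sequence[float]) -> bytes:
        """First selection and writing functions: pick the closest recording
        and record which model person it belongs to."""
        best = min(self.model_sounds,
                   key=lambda m: math.dist(m.features, practice_features))
        self.assignments[practitioner_id] = best.model_id
        return best.audio

    def later_search(self, practitioner_id: str, music_id: str) -> Optional[bytes]:
        """Second selection function: reuse the recorded model person and
        return that person's recording of the requested song, if any."""
        model_id = self.assignments.get(practitioner_id)
        for m in self.model_sounds:
            if m.model_id == model_id and m.music_id == music_id:
                return m.audio
        return None  # unknown practitioner, or no recording of this song


device = SearchDevice([
    ModelSound("song1", "A", (1.0, 0.0), b"A1"),
    ModelSound("song1", "B", (0.0, 1.0), b"B1"),
    ModelSound("song2", "B", (0.2, 0.9), b"B2"),
])
first = device.first_search("user1", (0.1, 0.8))   # closest recording is B's "song2"
later = device.later_search("user1", "song1")      # B's recording of "song1"
```

Keeping the practitioner-to-model association separate from the recordings means the feature comparison only has to run once per practitioner; later requests reduce to a lookup by model person and song.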