JP2009276776A

JP2009276776A - Music piece identification device and its method, music piece identification and distribution device and its method

Info

Publication number: JP2009276776A
Application number: JP2009188394A
Authority: JP
Inventors: Mototsugu Abe; 素嗣安部; Masayuki Nishiguchi; 正之西口
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 2009-08-17
Filing date: 2009-08-17
Publication date: 2009-11-26
Anticipated expiration: 2020-10-05
Also published as: JP4788810B2

Abstract

PROBLEM TO BE SOLVED: To make it easy to identify the name of a music piece from a single segment of the piece, and to distribute the piece.

SOLUTION: Music segment data 41, covering a few seconds of a music piece, is sent to a power spectrum analysis part 43 and converted into a time-frequency distribution. A feature matrix generation part 44 resamples the time-frequency distribution data from the power spectrum analysis part 43 at a prescribed frequency and time interval and generates a music segment feature matrix (feature vector) S_fu. The music segment feature matrix S_fu is sent to a matching part 45 and compared with a plurality of music feature matrices A_ft obtained from a database 35 to calculate a degree of similarity, thereby identifying candidate music piece names from among the music pieces already registered.

COPYRIGHT: (C)2010,JPO&INPIT

Description

The present invention relates to a music identification apparatus and method and to a music identification and distribution apparatus and method, for example for electronic music distribution, and in particular to such apparatuses and methods for distributing music using mobile phones.

In recent years, an enormous number of new songs have been released, and it has become practically impossible to know all of them, let alone the many songs created in the past. These songs are often played during television and radio programs, in commercial messages, and on the street, so people have many opportunities to hear a wide variety of songs whether they intend to or not.

When a person wants to obtain a particular song they like from among the songs that surround them, the purchaser generally follows a procedure such as steps (1) to (4) below, in that order.

(1) Hear the song.

(2) Find out the song title.

(3) Go to a music store.

(4) Specify the song title and purchase the song.

Patent Documents 1 to 6 are known as prior art.

JP H07-121556 A

JP H02-033200 A

JP H09-293083 A

JP 2000-172693 A

JP 2000-187671 A

JP H06-069999 A

In many cases, however, finding out the song title in step (2) above takes a great deal of effort. Although the title is sometimes learned at the same time as the song itself, the title often cannot be identified, and the song is never obtained, when for example only part of the song was heard, or the song was heard in a broadcast program or commercial message, or while out. Even when the title is found, actually obtaining the song requires either visiting a store or using a service such as electronic music distribution. Electronic music distribution removes the need to visit a store, but it still requires multiple steps: starting a personal computer, connecting to the Internet, registering a payment method with the distributor, and downloading the song.

The present invention has been made in view of these circumstances. Its object is to provide a music identification method and apparatus and a music identification and distribution apparatus and method that make it easy to identify and obtain (for example, purchase) a song that is being performed, played back, or broadcast, even from only a fragment of it, and that also enable payment for the song, playback of the song, and so on.

To solve the above problems, a music identification apparatus according to the present invention includes: music fragment feature extraction means for extracting, from music fragment data of music data, a music fragment feature quantity for identifying candidate song titles; and identification means for identifying candidate song titles from among already registered songs on the basis of the music fragment feature quantity extracted by the music fragment feature extraction means.

A music identification method according to the present invention includes: a music fragment feature extraction step of extracting, from music fragment data of music data, a music fragment feature quantity for identifying candidate song titles; and an identification step of identifying candidate song titles from among already registered songs on the basis of the music fragment feature quantity extracted in the music fragment feature extraction step.

A music identification and distribution apparatus according to the present invention includes: recording means for recording music data; playback means for playing back music from the music data; music fragment feature extraction means for extracting, via a mobile phone operation system, from music fragment data of the music data, a music fragment feature quantity for identifying candidate song titles; identification means for identifying candidate song titles from among already registered songs on the basis of the music fragment feature quantity extracted by the music fragment feature extraction means; and transmission means for transmitting the identified song on the basis of the identification means.

A music identification and distribution method according to the present invention includes: a recording step of recording music data; a playback step of playing back music from the music data; a music fragment feature extraction step of extracting, via a mobile phone operation system, from music fragment data of the music data, a music fragment feature quantity for identifying candidate song titles; an identification step of identifying candidate song titles from among already registered songs on the basis of the music fragment feature quantity extracted in the music fragment feature extraction step; and a transmission step of transmitting the song identified in the identification step.

According to the present invention, candidate song titles are identified from among already registered songs on the basis of the music fragment feature quantity, so a song can be easily identified even from only a fragment of a song that is being performed, played back, or broadcast. The identified song can also be distributed via the mobile phone operation system.

Block diagram showing the schematic configuration of the transmission unit of a mobile phone terminal according to an embodiment of the present invention.

Block diagram showing the schematic configuration of the reception unit of a mobile phone terminal according to an embodiment of the present invention.

Block diagram showing the schematic configuration of a system for registering songs in a database.

Diagram used to explain the feature matrices and mask patterns.

Block diagram showing the schematic configuration of a music identification system for identifying a song from a music fragment.

Flowchart showing the flow of the matching method.

System configuration diagram showing the overall configuration of the electronic music identification and distribution system.

Flowchart showing the flow of the song registration method.

Flowchart showing the flow of the song identification method.

Flowchart showing the flow of the song trial-listening method.

Flowchart showing the flow of the song purchase method.

Preferred embodiments of the present invention are described below with reference to the drawings.

First, a portable terminal is described as a first invention; next, a music identification method is described as a second invention; and finally, an electronic music identification and distribution service system is described as a third invention.

FIG. 1 shows an outline of the transmission unit, and FIG. 2 an outline of the reception unit, of a mobile phone terminal as one embodiment to which the portable terminal of the first invention is applied.

The transmission unit of the mobile phone terminal of this embodiment has substantially the same configuration as that of an ordinary mobile phone. As shown in FIG. 1, it includes: a microphone 1 serving as acousto-electric conversion means; a key input unit 2 having various keys, such as the ten numeric keys "0" to "9", a call start key, a call end key, and a power on/off key, together with selection means (including, for example, a jog dial with a push-switch function) for selecting menus, icons, applications, and the like; an A/D converter 4 that converts the analog acoustic signal produced by the microphone 1 into a digital acoustic signal; a voice encoder 6 that encodes the digital acoustic signal (for example, the voice signal during a call); a multiplexer 8 that multiplexes signals from the key input unit 2, the voice encoder 6, and so on; and a modulator 9 that modulates the result onto a high-frequency signal for transmission. These components are the same as those in a conventional mobile phone, so a detailed description of their operation is omitted here.

The parts of the transmission unit of this embodiment that differ from an ordinary mobile phone are a recording button 3, a switch 5, and a music data processing unit 7. Their operation is described in detail below.

The recording button 3 is, for example, an on/off button. Its on/off signal serves both as a directivity switching control signal for the microphone 1 and as a switching control signal for the switch 5.

When the recording button 3 is off, the directivity of the microphone 1 is set to short range for telephone use, and the switch 5 is set to the voice encoder 6 side. In this state, the acoustic signal input through the microphone 1 (here, the voice signal of a call) is digitized by the A/D converter 4 and encoded by the voice encoder 6 into voice data (digital bitstream data) for call transmission, then modulated onto a high-frequency signal by the modulator 9 and sent to the operation system of the mobile phone company. If, during the call, the operator uses the key input unit 2 to enter a number, an operation command, or the like, the key input signal is combined with the voice data by the multiplexer 8, modulated onto a high-frequency signal by the modulator 9, and sent to the operation system of the mobile phone company. This is exactly the same as the basic operation of a conventional mobile phone.

When the recording button 3 is on, by contrast, the directivity of the microphone 1 is set to long range for recording music, and the switch 5 is set to the music data processing unit 7 side. The acoustic signal input through the microphone 1 (here, a music signal) is digitized by the A/D converter 4, compression-encoded or subjected to feature extraction by the music data processing unit 7 as described later, converted into a bitstream, and sent to a temporary storage unit 10, where at least a predetermined duration (for example, several seconds) is accumulated. The bitstream for that duration of music data is read from the temporary storage unit 10, either automatically or in response to a read instruction signal issued from the key input unit 2 by the operator, and sent to the multiplexer 8. The multiplexer 8 also receives signals, such as the request signals described later, output from the key input unit 2 in response to operator input. It multiplexes the bitstream with the signals from the key input unit 2 and sends the result to the modulator 9. As during a call, the modulator 9 modulates the data onto a high-frequency signal, which is transmitted to the mobile phone operation system.
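The switching behaviour of the recording button can be summarised in a short sketch. The Python below is illustrative only: the names `SignalPath` and `select_path` and the string labels are hypothetical and chosen here for readability. The description itself specifies only that the recording button sets the microphone directivity and the position of switch 5.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SignalPath:
    directivity: str   # "near" for calls, "far" for music capture (labels are assumptions)
    destination: str   # the block that switch 5 feeds


def select_path(record_button_on: bool) -> SignalPath:
    """Return the microphone directivity and switch 5 target for a button state."""
    if record_button_on:
        # Button ON: long-range pickup, signal routed to music data processing unit 7.
        return SignalPath(directivity="far",
                          destination="music_data_processing_unit_7")
    # Button OFF: short-range pickup, signal routed to voice encoder 6.
    return SignalPath(directivity="near", destination="voice_encoder_6")
```

A single control signal drives both settings, so the two can never disagree, for example long-range pickup feeding the voice encoder.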

The directivity of the microphone 1 may be adjusted either by changing the directivity of a single microphone or by switching among a plurality of microphones with different directivities. The temporary storage unit 10 may be a new large-capacity memory provided separately from the memory of an ordinary mobile phone, or, if the memory of the ordinary mobile phone is already large, that memory may be used as is. The temporary storage unit 10 can also be shared with the memory 18 of the reception unit described later, and may be removable, like a semiconductor memory card. With a removable temporary storage unit 10, the song title identification and song purchase described later can be carried out not only for songs captured by the microphone 1 but also for songs acquired on other devices, such as another mobile phone terminal or a portable information processing device.

The music data processing unit 7 generates, from the short piece of music data of the predetermined duration stored (hereinafter "recorded" where appropriate) in the temporary storage unit 10 (hereinafter called a music fragment or music clip), the encoded data needed to identify the song title, or extracts the feature quantities, described later, needed to identify the song title.

That is, in this embodiment the music data processing unit 7 can take either of the following first and second configurations.

In the first configuration, the music data processing unit 7 is simply an audio compression encoder, such as an MPEG audio encoder. The music fragment recorded in the temporary storage unit 10 is compression-encoded as is by the audio compression encoder and transmitted to the music identification system described later, and the conversion into the feature quantities needed for title identification is performed on the music identification system side. Because the only component that has to be added to the mobile phone terminal is a general-purpose encoder, the cost of the terminal can be kept down. In addition, because feature extraction for title identification is performed on the music identification system side, this configuration has further advantages: a highly accurate identification method that requires a large amount of computation for feature extraction can be adopted; songs that cannot be identified automatically can be listened to and checked by a person on the music identification system side; and, when a new identification algorithm is developed, the old algorithm can easily be replaced with it.

In the second configuration, the music data processing unit 7 performs feature extraction using the feature extraction method described later and encodes the resulting feature information (for example, a feature vector). The feature information obtained by the music data processing unit 7 is far smaller than the compression-encoded music data of the first configuration. This saves storage capacity in the temporary storage unit 10 as well as transmission capacity and transmission time when the data is sent.

The reception unit of the mobile phone terminal of this embodiment, shown in FIG. 2, likewise has substantially the same configuration as that of a conventional mobile phone. It includes: a demodulator 11 that demodulates the high-frequency signal received via the antenna; a demultiplexer 12 that separates the multiplexed data; a voice decoder 15 that decodes the call voice data separated by the demultiplexer 12; a D/A converter 16 that converts the decoded digital call voice data into an analog voice signal; a speaker 17 that emits the analog call voice signal from the D/A converter 16; a display processing unit 13 that generates a display signal from data such as numbers and characters contained in the control data separated by the demultiplexer 12; and a display 14 comprising a liquid crystal display device or the like. These components are the same as those in a conventional mobile phone, so a detailed description of their operation is omitted here.

The parts of the reception unit of this embodiment that differ from an ordinary mobile phone are a memory 18, an audio decoder 19, a D/A converter 20, and a speaker 21, which together form the music signal processing path. Their operation is described in detail below.

In the reception unit shown in FIG. 2, when the received data demodulated by the demodulator 11 contains music data according to this embodiment, as distinct from call voice data, the demultiplexer 12 separates the music data from the received data and sends it to the memory 18, which stores it temporarily. The memory 18 may be built into the mobile phone terminal or may be removable, like a semiconductor memory card. A removable memory 18 has the advantage that the acquired music data can also be played back on other devices.

Once all of the song, or a sufficient amount of its data, has been received, the music data is read from the memory 18, sent to the audio decoder 19 and decoded, and then output as sound through the D/A converter 20 and the speaker 21, headphones, or the like. To reduce the component count, the D/A converter 20 and the speaker 21 may be shared with the D/A converter 16 and the speaker 17 of the voice signal path. However, call voice signals and music signals generally require different sound quality (music requires higher quality, for example), so it is preferable to provide them separately according to the quality each requires.

Next, the method of identifying a song title, as the second invention, is described.

The music identification method according to the present invention broadly consists of two flows, described below: registration of songs in a database, which is a preliminary step to identification, and the actual identification processing using the registered song database.

FIG. 3 shows the system configuration for registering songs in the database in the music identification method according to the present invention.

In FIG. 3, each of a plurality of pieces of music data 31, distributed for example on compact discs (CDs) or over a network, is sent to a power spectrum analysis unit 33. The power spectrum analysis unit 33 converts each input piece of music data 31 into a time-frequency distribution and sends the resulting time-frequency distribution data to a feature matrix generation unit 34.

The feature matrix generation unit 34 resamples the time-frequency distribution data for each song at a predetermined frequency and time interval, generating a music feature matrix (feature vector) A_ft for each song. As shown in FIG. 4(a), the index f of A_ft is the frequency index (row) of the matrix and t is the time index (column). The music feature matrix A_ft of each song is registered in a database 35 in association with attribute data 32 for that song, such as the song title, performer name, and song ID number, distributed together with the music data 31.
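As a rough illustration of how a time-frequency distribution might be resampled into a feature matrix, the following Python sketch computes a power spectrogram with a naive DFT and then averages it over coarse frequency bands and time blocks. The function names, the frame length, the hop size, and the use of block averaging as the resampling step are all assumptions made for this example; the description does not specify them.

```python
import cmath
import math


def power_spectrogram(samples, frame_len, hop):
    """Time-frequency distribution: power of a naive DFT on each frame.

    Returns a list of frames, each holding frame_len // 2 + 1 power values.
    """
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        spectrum = []
        for k in range(frame_len // 2 + 1):
            acc = sum(x * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n, x in enumerate(frame))
            spectrum.append(abs(acc) ** 2)
        frames.append(spectrum)
    return frames


def feature_matrix(spectrogram, n_bands, frames_per_column):
    """Resample to a coarse grid; result[f][t] has frequency row f, time column t.

    Assumes n_bands does not exceed the number of frequency bins.
    """
    n_bins = len(spectrogram[0])
    n_cols = len(spectrogram) // frames_per_column
    matrix = [[0.0] * n_cols for _ in range(n_bands)]
    for f in range(n_bands):
        lo = f * n_bins // n_bands
        hi = (f + 1) * n_bins // n_bands
        for t in range(n_cols):
            block = spectrogram[t * frames_per_column:(t + 1) * frames_per_column]
            values = [frame[b] for frame in block for b in range(lo, hi)]
            matrix[f][t] = sum(values) / len(values)
    return matrix
```

The same two functions could serve both the registration side and the identification side, since the description states that both use the same kind of power spectrum analysis and feature matrix generation.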

FIG. 5 shows the system configuration for identifying songs using the registered song database.

In FIG. 5, data 41 for a portion of a song lasting, for example, a few seconds (the music fragment described above), obtained with the mobile phone terminal of the first invention or by any other method, is sent to a power spectrum analysis unit 43 similar to that of FIG. 3 and converted into a time-frequency distribution. A feature matrix generation unit 44, also similar to that of FIG. 3, resamples the time-frequency distribution data from the power spectrum analysis unit 43 at a predetermined frequency and time interval and generates a music fragment feature matrix (feature vector) S_fu. As shown in FIG. 4(b), the index f of S_fu is the frequency index (row), as above, and u is the time index (column). The music fragment feature matrix S_fu is sent to a matching unit 45. When feature extraction is performed in the mobile phone terminal, a configuration equivalent to the power spectrum analysis unit 43 and the feature matrix generation unit 44 of FIG. 5 is provided in the music data processing unit 7 of the terminal. When the terminal only compression-encodes the music data and does not extract features, the music fragment data 41 supplied to the power spectrum analysis unit 43 of FIG. 5 is obtained by decompressing and decoding the compression-encoded data transmitted from the terminal.

The matching unit 45 also receives data 42 consisting of the music feature matrices A_ft of the plurality of songs registered in the database 35 and the corresponding attribute data 32 of each song, such as song title, performer name, and song ID number. Using the music feature matrices A_ft (music feature vectors) obtained from the database 35 and the music fragment feature matrix S_fu (music fragment feature vector) supplied from the feature matrix generation unit 44, the matching unit 45 performs matching (vector matching) by the method described below and calculates the similarity between the vectors. When the similarity between the music fragment feature vector and a music feature vector exceeds a predetermined threshold, it outputs, for the corresponding song, data 46 consisting of the similarity Q, the time T at which the similarity is greatest (the maximum-similarity time), and the attribute data 32.
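The thresholding step of the matching unit can be pictured with a few lines of Python. This is a minimal sketch under assumptions: the function name `select_candidates`, the tuple layout `(Q, T, attributes)`, and the best-first ordering of the output are choices made here for illustration and are not stated in the description.

```python
def select_candidates(results, threshold):
    """Keep songs whose similarity Q exceeds the threshold.

    results: list of (Q, T, attributes) tuples, one per registered song,
    where Q is the similarity, T the time of maximum similarity, and
    attributes a dict such as {"title": ..., "artist": ..., "id": ...}.
    Sorting best-first is an assumption of this sketch.
    """
    hits = [r for r in results if r[0] > threshold]
    return sorted(hits, key=lambda r: r[0], reverse=True)
```

Because the comparison is a strict "exceeds", a song whose similarity equals the threshold is not reported.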

The matching in the matching unit 45 is performed using cross-correlation, as follows.

As shown in FIG. 4, the music feature matrix A_ft is a feature matrix covering the full length of each of the plurality of songs, whereas the music fragment feature matrix S_fu is a (degraded) feature matrix of part of one song. In principle, the similarity can be taken as the correlation value at the time where the cross-correlation of these matrices is greatest. In practice, however, noise is usually present in S_fu, so the following operations are applied to improve noise robustness.
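The in-principle matching, before any noise handling, can be sketched as a sliding cross-correlation along the time axis. In the Python below, the function names are hypothetical, and the normalization of each correlation value by the energies of the two windows is an assumption added so that scores are comparable across songs; the description says only that the maximum of the cross-correlation is used as the similarity.

```python
import math


def similarity_at(A, S, offset):
    """Normalized correlation between S and the window of A starting at column offset."""
    num = a_energy = s_energy = 0.0
    for f in range(len(S)):
        for u in range(len(S[0])):
            a = A[f][offset + u]
            s = S[f][u]
            num += a * s
            a_energy += a * a
            s_energy += s * s
    denom = math.sqrt(a_energy * s_energy)
    return num / denom if denom > 0 else 0.0


def best_match(A, S):
    """Return (Q, T): the maximum similarity and the column offset where it occurs."""
    n_offsets = len(A[0]) - len(S[0]) + 1
    scores = [similarity_at(A, S, t) for t in range(n_offsets)]
    T = max(range(n_offsets), key=scores.__getitem__)
    return scores[T], T
```

Here `A` and `S` are lists of rows with the same number of frequency rows, and `S` must not be longer in time than `A`.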

先ず、音楽断片特徴行列Ｓ_ｆｕを、次式（１）により変換する。 First, the music fragment feature matrix S _fu is converted by the following equation (1).

S'_fu = M_fu (S_fu - B_fu)    (1)

Here, B_fu is a constant matrix for subtracting stationary noise components; it is built, for example, from the minimum value of each frequency component. M_fu is a matrix that masks part of the music fragment feature matrix S_fu (the region marked m in each figure), as shown in FIG. 4(c) to FIG. 4(e). The mask matrix M_fu of FIG. 4(c) masks low-frequency components and is effective for removing low-frequency noise from S_fu when such noise is strong, for example traffic noise. The mask matrix M_fu of FIG. 4(d) is a time mask; by extracting from S_fu only the times at which the music component is strong, for example, it enables stable matching. The mask matrix M_fu of FIG. 4(e) masks speech components; removing from S_fu the frequency range that contains most of the speech energy (for example 100 Hz to 1 kHz) suppresses speech that has been mixed into the recording. Many other mask patterns are conceivable. Switching among these masks and selecting the one that yields the highest similarity gives stable, noise-robust matching.
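As an illustration of equation (1), the sketch below builds B_fu from the per-frequency minimum and applies a low-frequency mask. It assumes that the product with M_fu is element-wise and that the mask holds 0 for masked cells and 1 elsewhere; the function names, the `cutoff` parameter, and the sample values are hypothetical.

```python
from typing import List

Matrix = List[List[float]]  # rows = frequency bins f, columns = time frames u


def noise_floor(S: Matrix) -> Matrix:
    """B_fu: the minimum of each frequency row, repeated across time."""
    return [[min(row)] * len(row) for row in S]


def low_freq_mask(n_freq: int, n_time: int, cutoff: int) -> Matrix:
    """M_fu in the style of FIG. 4(c): 0 for the lowest `cutoff` bins, 1 elsewhere."""
    return [[0.0 if f < cutoff else 1.0] * n_time for f in range(n_freq)]


def transform(S: Matrix, M: Matrix, B: Matrix) -> Matrix:
    """Equation (1), read element-wise: S'_fu = M_fu * (S_fu - B_fu)."""
    return [
        [m * (s - b) for s, m, b in zip(row_s, row_m, row_b)]
        for row_s, row_m, row_b in zip(S, M, B)
    ]


S = [
    [5.0, 6.0, 5.5],  # lowest bin, e.g. dominated by traffic noise
    [1.0, 3.0, 2.0],
    [0.5, 0.5, 2.5],
]
S_prime = transform(S, low_freq_mask(3, 3, cutoff=1), noise_floor(S))
```

A time mask or a speech-band mask would be built the same way, by zeroing columns or a band of rows instead of the lowest rows.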

Next, using the transformation of equation (1), the similarity R(t) at each time t is computed by equation (2).

Then, from the largest of the per-time similarities and the time at which it occurs, the similarity Q between the song and the music fragment and the maximum-similarity time T are obtained by equations (3) and (4).

Q = max_t R(t)    (3)
T = argmax_t R(t)    (4)
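Equations (3) and (4) can be sketched as a sliding comparison of the fragment against one song. Equation (2) is not reproduced in this text, so the per-time similarity below is an assumed stand-in (a normalized correlation over the overlapping window), not the patent's formula; all names are hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]  # rows = frequency bins, columns = time frames


def similarity_at(A: Matrix, S: Matrix, t: int) -> float:
    """Assumed R(t): normalized correlation of S with the window of A starting at frame t."""
    width = len(S[0])
    dot = norm_a = norm_s = 0.0
    for row_a, row_s in zip(A, S):
        for u in range(width):
            a, s = row_a[t + u], row_s[u]
            dot += a * s
            norm_a += a * a
            norm_s += s * s
    denom = math.sqrt(norm_a * norm_s)
    return dot / denom if denom > 0 else 0.0


def best_match(A: Matrix, S: Matrix) -> Tuple[float, int]:
    """Return (Q, T) as in equations (3) and (4)."""
    n_shifts = len(A[0]) - len(S[0]) + 1
    if n_shifts < 1:
        raise ValueError("fragment is longer than the song")
    R = [similarity_at(A, S, t) for t in range(n_shifts)]
    T = max(range(n_shifts), key=R.__getitem__)  # argmax_t R(t)
    return R[T], T                               # Q = max_t R(t)


A = [[0.0, 1.0, 3.0, 2.0, 0.0],
     [1.0, 0.0, 2.0, 4.0, 1.0]]
S = [[3.0, 2.0],
     [2.0, 4.0]]  # identical to frames 2..3 of A
Q, T = best_match(A, S)
```

In this toy case the fragment is an exact copy of two frames of the song, so the peak falls on the frame where that copy starts.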
The matching unit 45 implements the matching method described above with the flow shown in FIG. 6.

In FIG. 6, the matching unit 45 is provided in advance with the mask patterns expected to be needed, such as the low-frequency mask, time mask, and speech mask described above. Before matching, one of these mask patterns is selected in step S51.

Next, in step S52, the matching unit 45 determines whether all mask patterns have been processed. If not, processing proceeds to step S53; if all prepared mask patterns have been processed, processing proceeds to step S55.

When step S52 determines that mask patterns remain and processing proceeds to step S53, the matching unit 45 computes the correlation function R(t) using the mask pattern selected in step S51. In the following step S54 it computes the similarity Q and the maximum-similarity time T and stores the results. Processing then returns to step S51 and the next mask pattern is selected.

When step S52 determines that all prepared mask patterns have been processed and processing proceeds to step S55, the matching unit 45 takes the largest of the stored similarities Q as the similarity between that song and the input music fragment.
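The loop of FIG. 6 can be sketched as follows. The `score` callable stands in for steps S53 and S54 (computing R(t), then Q and T, for one mask pattern) and is a hypothetical placeholder, as are the mask names and the sample values.

```python
from typing import Callable, List, Sequence, Tuple


def best_over_masks(
    mask_names: Sequence[str],
    score: Callable[[str], Tuple[float, int]],
) -> Tuple[str, float, int]:
    """Try every prepared mask pattern and keep the result with the largest Q."""
    stored: List[Tuple[str, float, int]] = []
    for name in mask_names:          # S51: select a pattern; S52: stop when none remain
        Q, T = score(name)           # S53: compute R(t); S54: derive Q and T
        stored.append((name, Q, T))  # S54: store the result
    # S55: the largest stored Q is the similarity for this song.
    # Raises ValueError if no mask pattern was supplied.
    return max(stored, key=lambda r: r[1])


# Hypothetical per-mask results for one song.
fake_results = {"low_freq": (0.62, 10), "time": (0.88, 12), "voice": (0.71, 11)}
best = best_over_masks(list(fake_results), lambda name: fake_results[name])
```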

Next, the third invention is described: an electronic music identification and distribution service method and system that uses the mobile phone terminal of the first invention and the music identification system of the second invention. The advantage of this system is that looking up, purchasing, playing back, and paying for a song can all be done from a single portable terminal (the mobile phone terminal of this embodiment).

FIG. 7 shows the system configuration of an embodiment to which the electronic music identification and distribution service of the third invention is applied; the operation of the system is described with reference to FIGS. 8 to 11. The reference codes S71, S72, S75, S76, S78, S79, S82 to S85, and S90 to S99 in FIG. 7 indicate where the steps with the corresponding codes in FIGS. 8 to 11 are carried out. FIG. 8 shows the processing flow of the song registration stage, which is performed in advance; FIG. 9 shows the song identification stage; FIG. 10 shows the song preview stage; and FIG. 11 shows the song purchase stage.

In the song registration stage of FIG. 8, first, in step S71, the music distribution system 64 on the music seller side sends the data needed for identification (the music data of the songs and their attribute data) to the music identification system 63 on the music identifier side. The music identification system 63 computes a feature matrix from that music data as described above and registers it in the database in association with the attribute data. In step S72, the music distribution system 64 sends the data needed for previews (preview music data for the songs, the ID of each preview, and so on) to the music identification system 63, which registers the preview data in the database. Steps S71 and S72 may be performed simultaneously, or step S72 may be performed after step S71.

Next, in the song identification stage of FIG. 9, first, in step S73, when the operator of the mobile phone terminal 61 on the music purchaser side turns on the recording button 3, the terminal records a part (a music fragment) of the song that is playing at that moment, for example in a broadcast program or commercial message, or on the street. In step S74, the music data processing unit 7 of the mobile phone terminal 61 applies compression encoding or feature extraction to the music fragment data as described above.

Next, in step S75, the mobile phone terminal 61 sends the feature-extracted data to the mobile phone operation system 62 on the mobile phone operator side, together with a music identification request generated in response to key input from the key input unit 2. In step S76, the mobile phone operation system 62 forwards the music identification request and the feature-extracted data unchanged to the music identification system 63 on the music identifier side.

On receiving the music identification request and the compression-encoded or feature-extracted data, the music identification system 63 performs, in step S77, music identification using feature vectors (a search for candidate songs) as described for the second invention. If the music data processing unit 7 of the mobile phone terminal 61 carries out everything from recording the fragment through feature extraction, the music identification system 63 identifies the song by matching with the feature vector sent from the terminal. If the music data processing unit 7 carries out only recording and compression encoding, the music identification system 63 decodes the compression-encoded music data sent from the terminal, performs the feature extraction described above, and then identifies the song by matching. When identification is complete, the music identification system 63 sends, in step S78, the data on the candidate songs obtained (song name, performer name, song ID number, and so on) to the mobile phone operation system 62, and in step S79 the mobile phone operation system 62 forwards that data unchanged to the mobile phone terminal 61.
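The two cases handled in step S77 can be sketched as a small dispatch on the kind of payload received. The payload tags `"features"` and `"compressed"` and the `decode` and `extract` callables are hypothetical placeholders for the audio decoder and for the power spectrum analysis and feature matrix generation described earlier.

```python
from typing import Callable, List, Sequence, Union

FeatureMatrix = List[List[float]]


def features_for_matching(
    kind: str,
    payload: Union[bytes, FeatureMatrix],
    decode: Callable[[bytes], Sequence[float]],
    extract: Callable[[Sequence[float]], FeatureMatrix],
) -> FeatureMatrix:
    """Return the fragment feature matrix to pass to matching."""
    if kind == "features":
        # The terminal already performed feature extraction.
        return payload
    if kind == "compressed":
        # The terminal only recorded and compressed: decode, then extract.
        return extract(decode(payload))
    raise ValueError(f"unknown payload kind: {kind!r}")


# Stub decoder and extractor, for illustration only.
stub_decode = lambda data: [b / 255 for b in data]
stub_extract = lambda pcm: [list(pcm)]
matrix = features_for_matching("compressed", bytes([0, 255]), stub_decode, stub_extract)
```

Extracting features on the terminal reduces the amount of data sent; sending compressed audio keeps the terminal simpler. The source allows either.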

On receiving the candidate song data, the mobile phone terminal 61 determines, in step S80, whether the operator (the music purchaser) has performed the predetermined input operation on the key input unit 2 indicating whether to preview a candidate song. If the operator asks to preview (Y), processing proceeds to the song preview stage of FIG. 10; if the operator declines (N), processing ends.

In the song preview stage of FIG. 10, first, in step S81, when the terminal operator (the music purchaser) performs the predetermined input operation on the key input unit 2 to choose a song to preview, the mobile phone terminal 61 selects the preview song accordingly.

Next, in step S82, the mobile phone terminal 61 sends to the mobile phone operation system 62 a preview request generated in response to key input from the key input unit 2, and in step S83 the mobile phone operation system 62 forwards the request unchanged to the music identification system 63.

On receiving the preview request, the music identification system 63 sends, in step S84, the corresponding preview data to the mobile phone operation system 62, and in step S85 the mobile phone operation system 62 forwards it unchanged to the mobile phone terminal 61. The mobile phone terminal 61 stores the preview data in the memory 18.

Next, in step S86, when the operator (the music purchaser) performs the predetermined input operation on the key input unit 2 to play the preview, the mobile phone terminal 61 sends the preview music data stored in the memory 18 to the audio decoder 19, decodes it, and passes it through the D/A converter 20 to the speaker 21. The operator can thus listen to the preview.

Then, in step S87, the mobile phone terminal 61 determines whether the operator has performed the predetermined input operation on the key input unit 2 indicating whether to preview another song. If the operator asks to preview another song (Y), processing returns to the preview selection of step S81; if not (N), processing proceeds to step S88.

In step S88, the mobile phone terminal 61 determines whether the operator has performed the predetermined input operation on the key input unit 2 indicating whether to purchase the previewed song. If the operator chooses to purchase (Y), processing proceeds to the song purchase stage of FIG. 11; if not (N), processing ends.

In the song purchase stage of FIG. 11, first, in step S89, when the terminal operator (the music purchaser) performs the predetermined input operation on the key input unit 2 to choose the song to purchase, the mobile phone terminal 61 selects the song accordingly.

Next, in step S90, the mobile phone terminal 61 sends a purchase request indicating the intent to buy the selected song to the mobile phone operation system 62, and in step S91 the mobile phone operation system 62 forwards the request unchanged to the music distribution system 64 on the music seller side.

On receiving the purchase request, the music distribution system 64 sends, in step S92, the requested music data to the mobile phone operation system 62, and in step S93 the mobile phone operation system 62 forwards it unchanged to the mobile phone terminal 61. The terminal stores the music data in the memory 18. When the terminal operator (the music purchaser) later performs the predetermined input operation on the key input unit 2 to play the song, the terminal sends the stored music data to the audio decoder 19, decodes it, and passes it through the D/A converter 20 to the speaker 21. The operator can thus listen to the purchased song.

When songs are identified and distributed as described above, the music purchaser could pay the identification and purchase charges to the music identifier and the music distributor directly, by bank transfer, by credit card settlement over the Internet, or the like. In the electronic music identification and distribution system of this embodiment, however, payment is made through the mobile phone operator so that the charges can be consolidated.

First, in step S94, the music identification system 63 on the music identifier side sends billing information for the identification to the mobile phone operation system 62 on the phone operator side. In step S95, the phone operator pays that charge to the music identifier on behalf of the music purchaser. One possible payment method is electronic settlement in which the amount is transferred from the phone operator's bank account to the music identifier's bank account, with information confirming the settlement sent from the mobile phone operation system 62 or the bank to the music identification system 63.

Similarly, in step S96, the music distribution system 64 on the music seller side sends billing information for the distributed song to the mobile phone operation system 62 on the phone operator side. In step S97, the phone operator pays that charge to the music distributor on behalf of the music purchaser. Here too, one possible payment method is electronic settlement in which the amount is transferred from the phone operator's bank account to the music distributor's bank account, with information confirming the settlement sent from the mobile phone operation system 62 or the bank to the music distribution system 64.

Next, in step S98, the mobile phone operation system 62 adds the amounts paid on the purchaser's behalf to the music identifier and the music distributor to the call charges and bills the user of the mobile phone terminal 61 (the music purchaser). In step S99, the user pays the phone operator the call charges with the identification and distribution charges added, in a lump sum or in installments, directly, by bank transfer, by credit card settlement over the Internet, or the like. This payment can likewise be made by electronic settlement in which the amount is transferred from the user's bank account to the phone operator's bank account, with information confirming the settlement sent from the mobile phone terminal 61 or the bank to the mobile phone operation system 62.
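The consolidation in step S98 amounts to simple addition, sketched below. The function name, the dictionary keys, and the amounts are hypothetical; the source does not state any fee levels.

```python
from typing import Dict, Sequence


def bill_purchaser(
    call_charges: int,
    identification_fees: Sequence[int],
    distribution_fees: Sequence[int],
) -> Dict[str, int]:
    """Add the amounts the operator paid on the purchaser's behalf to the call charges."""
    advanced = sum(identification_fees) + sum(distribution_fees)  # paid in S95 and S97
    return {"advanced": advanced, "total": call_charges + advanced}  # billed in S98


# One identification and two purchased songs, in arbitrary currency units.
statement = bill_purchaser(3000, identification_fees=[50], distribution_fees=[300, 300])
```

If identification is offered as a free service, as the text later allows, `identification_fees` would simply be empty.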

This completes the payment settlement process.

In this embodiment the music identifier and the music seller are described as separate parties for convenience, but the same result is of course obtained if the music seller itself provides the identification service. Likewise, the preview music data is described as being provided by the identifier, but clearly no essential difference arises if the seller provides it.

Furthermore, in this embodiment the identification charge and the purchase charge are paid after the music data has been delivered. If only identification is performed and no song is purchased, only the identification charge may be billed, or identification may be offered as a free service.

The embodiment uses a mobile phone terminal as an example, but the invention is also applicable to portable information processing terminals such as so-called palmtop computers. However, by implementing recording, identification, distribution, purchase, and playback all in a single mobile phone terminal as described above, the invention makes a separate portable information processing terminal unnecessary, and also makes a so-called portable audio recording and playback device unnecessary.

As described above, according to this embodiment, candidate songs can be searched for from a part of the music heard on television or radio broadcasts, street broadcasts, and the like, and previewing the candidates, identifying the song name, purchasing the song, and playing it back can all be done with a single mobile phone terminal. That is, music heard on such broadcasts can be looked up and obtained easily, so purchasers less often give up a purchase because they cannot identify the song, and the time and effort of purchasing are greatly reduced.

As is clear from the above description, according to the present invention, the mobile phone terminal records music data captured through acousto-electric conversion means and either compression-encodes the recording or extracts from it feature quantities for identifying the song name, and sends the result to the identification and distribution side. The identification and distribution side searches for candidate songs from the compression-encoded data or the extracted feature quantities and sends preview data for the candidates to the mobile phone terminal. The song chosen for purchase after previewing on the terminal is sent from the identification and distribution side to the terminal, and the mobile phone operation side charges according to the songs previewed and purchased. As a result, a song can easily be identified and obtained (for example, purchased) even from a partial fragment of a song being performed, played back, or broadcast, and payment for the song and playback of the song are possible at the same time.

DESCRIPTION OF SYMBOLS: 1 microphone, 2 key input unit, 3 recording button, 4 A/D converter, 5 switch, 6 speech encoder, 7 music data processing unit, 8 multiplexer, 9 modulator, 10 temporary storage unit, 11 demodulator, 12 demultiplexer, 13 display processing unit, 14 display, 15 speech decoder, 16, 20 D/A converters, 17, 21 speakers, 18 memory, 19 audio decoder, 33, 43 power spectrum analysis units, 34, 44 feature matrix generation units, 35 database, 45 matching unit, 61 mobile phone terminal, 62 mobile phone operation system, 63 music identification system, 64 music distribution system

Claims

Music fragment feature quantity extracting means for extracting music piece feature quantities for identifying candidate song names from music fragment data of music data;
A music identification apparatus comprising: identification means for identifying a candidate music name from among already registered music based on the music fragment feature value extracted by the music fragment feature value extraction means.

2. The music identification apparatus according to claim 1, wherein the music fragment feature amount extraction unit converts the music fragment data into a time-frequency distribution and extracts a music fragment feature amount.

It has a music database that stores and manages the already registered music,
The music identification device according to claim 2, wherein the music database stores music feature quantities extracted from a time frequency distribution of a plurality of music.

The music fragment feature amount extraction means extracts the music fragment feature matrix corresponding to the music fragment data as the music fragment feature amount by resampling the data of the time frequency distribution at a predetermined frequency and time interval,
4. The music identification apparatus according to claim 3, wherein the music database stores, as the music feature amount, a music feature matrix obtained by resampling the time-frequency distribution data of a plurality of songs at a predetermined frequency and time interval.

The identification means includes feature matrix conversion means for converting the music fragment feature matrix into an identification feature matrix,
5. The music identification apparatus according to claim 4, wherein the feature matrix conversion means performs at least one of a process of subtracting a constant matrix for reducing a stationary noise component and a process of masking the music fragment feature matrix with a mask matrix, and the mask matrix is used by switching among a plurality of masks including at least one of a matrix for masking low-frequency components, a matrix for performing time masking, and a matrix for masking speech components.

The identification means calculates the similarity with the feature matrix for identification obtained from the feature matrix conversion means while shifting the time with respect to the music feature matrix of each music registered in the music database, The music identification device according to claim 5, wherein the similarity between the music and the music fragment and the maximum time of similarity are obtained from the maximum similarity at each time and the time.

The music identification apparatus according to claim 4, wherein the music feature matrix data is registered in the music database in association with the music name, performer name, and music ID number of the music from which the music feature matrix is extracted.

A music fragment feature extraction step for extracting a music fragment feature for identifying a candidate song name from the music fragment data of the music data;
A music identification method comprising: an identification step of identifying a candidate song name from already registered songs based on the music fragment feature value extracted by the music fragment feature value extraction step.

Recording means for recording music data;
Playback means for playing music from the music data;
Music fragment feature amount extracting means for extracting music fragment feature amounts for identifying candidate song names from the music fragment data of the music data via the mobile phone operation system;
Based on the music fragment feature quantity extracted by the music fragment feature quantity extraction means, an identification means for identifying a candidate song name from among already registered songs;
A music identification distribution device comprising: transmission means for transmitting the identified music based on the identification means.

A recording process for recording music data;
A playback process for playing music from the music data;
A music fragment feature extraction step for extracting music fragment feature values for identifying candidate song names from the music fragment data of the music data via the mobile phone operation system;
Based on the music fragment feature value extracted by the music fragment feature value extraction step, an identification step for identifying a candidate song name from among already registered songs;
A music identification distribution method comprising: a transmission step of transmitting the music identified by the identification step.