JP4607659B2

JP4607659B2 - Music search apparatus and music search method

Info

Publication number: JP4607659B2
Application number: JP2005144355A
Authority: JP
Inventors: 真内部
Original assignee: Sharp Corp
Current assignee: Sharp Corp
Priority date: 2005-05-17
Filing date: 2005-05-17
Publication date: 2011-01-05
Anticipated expiration: 2025-05-17
Also published as: JP2006323007A

Description

The present invention relates to a music search apparatus and a music search method for searching music data stored in a music database, and in particular to a music search apparatus and a music search method that search music data stored in a music database using feature data extracted from the music data.

In recent years, large-capacity storage means such as HDDs have been developed, making it possible to store large amounts of music data. Such large collections are generally searched using bibliographic data such as artist names, song titles, and other keywords. A bibliographic search, however, cannot take into account the emotional character of a song, so songs with a different impression from the one intended may be retrieved.

To make it possible to search for the music a user wants based on subjective impressions of the music, an apparatus has been proposed that accepts the user's subjective requirements for the desired music and quantifies and outputs them, calculates from that output a predicted impression value that quantifies the impression of the music to be retrieved, and, using the calculated predicted impression value as a key, searches a music database storing the acoustic signals of a plurality of songs together with impression values quantifying the impressions of those songs. The desired music is thereby retrieved based on the user's subjective image of the music (see, for example, Patent Document 1).

However, although a song often contains phrases with different impressions, the conventional technique condenses the impression of a song into a single impression value. The impression value therefore ends up either averaging phrases with different impressions or reflecting only a single phrase, so music with the impression desired by the user cannot always be retrieved.
Patent Document 1: JP 2002-278547 A

The present invention has been made in view of this problem. Its object is to provide a music search apparatus and a music search method that can search while taking phrases with different impressions into account, and can accurately retrieve music having the impression desired by the user.

In order to solve the above problems, the present invention has the following configuration.
The music search apparatus of the present invention is a music search apparatus for searching music data stored in a music database, comprising: operation means for accepting, as a search range, a range of values for each of a plurality of items; feature data extraction means for extracting feature data from each of different locations on the time axis of the music data; range data determination means for determining, based on the plurality of feature data extracted by the feature data extraction means from the different locations on the time axis, range data in which the value of each of the plurality of items is given a width; and music search means for identifying the music data whose range data overlaps the search range accepted by the operation means, and for ranking the identified music data by calculating, as a degree of overlap, the proportion of the range data of the identified music data that is occupied by the region overlapping the search range.

Furthermore, the music search apparatus of the present invention comprises impression degree data conversion means for converting the feature data extracted by the feature data extraction means into impression degree data consisting of the values of the plurality of items. The impression degree data conversion means converts each of the plurality of feature data extracted by the feature data extraction means into impression degree data, thereby converting the plurality of feature data into a plurality of impression degree data, and the range data determination means determines the range data based on the plurality of impression degree data produced by the impression degree data conversion means.

Furthermore, in the music search apparatus of the present invention, the range data determination means determines, as the range data, the interval between the minimum value and the maximum value of each item of the impression degree data.

In order to solve the above problems, the present invention has the following configuration.
The music search method of the present invention is a music search method for searching music data stored in a music database, comprising: accepting input of a search range consisting of a range of values for each of a plurality of items; extracting feature data from each of different locations on the time axis of the music data; determining, based on the plurality of feature data extracted from the different locations on the time axis, range data in which the value of each of the plurality of items is given a width; identifying the music data whose range data overlaps the search range; and ranking the identified music data by calculating, as a degree of overlap, the proportion of the range data of the identified music data that is occupied by the region overlapping the search range.

Furthermore, the music search method of the present invention converts each of the plurality of extracted feature data into impression degree data consisting of the values of the plurality of items, thereby converting the plurality of feature data into a plurality of impression degree data, and determines the range data based on the plurality of converted impression degree data.

Furthermore, the music search method of the present invention determines, as the range data, the interval between the minimum value and the maximum value of each item of the impression degree data.

The music search apparatus and music search method of the present invention extract a plurality of sets of feature data from music data; determine, based on the extracted sets of feature data, range data that expresses the impression of the music data with a width; identify the music data whose range data overlaps the accepted search range; and rank the identified music data by calculating, as a degree of overlap, the proportion of the range data of the identified music data that is occupied by the region overlapping the search range. With this configuration, music data is retrieved whenever any of its differently impressed phrases carries the impression desired by the user. The search can therefore take phrases with different impressions into account, and music with the impression desired by the user can be retrieved accurately.

Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.

FIG. 1 is a block diagram showing the configuration of an embodiment of the music search apparatus according to the present invention, and FIG. 2 is a block diagram showing the configuration of a neural network learning device that trains in advance the neural network used in the music search apparatus shown in FIG. 1.

Referring to FIG. 1, the music search apparatus 10 of the present embodiment is an information processing apparatus, such as a personal computer, that operates under program control. It comprises a music data input unit 11, a compression processing unit 12, a feature data extraction unit 13, an impression degree data conversion unit 14, a range data determination unit 15, a music database 16, a music search unit 17, a PC operation unit 18, a PC display unit 19, and an audio output unit 20.

The music data input unit 11 has a function of reading storage media such as CDs and DVDs on which music data is stored. It inputs music data from such a storage medium and outputs it to the compression processing unit 12 and the feature data extraction unit 13. It may also be configured to input music data (distribution data) received over a network such as the Internet, in addition to storage media such as CDs and DVDs. When compressed music data is input, the compressed music data is decompressed and output to the feature data extraction unit 13.

At the time of music search, the compression processing unit 12 compresses the music data input from the music data input unit 11 in a compression format such as MP3 or ATRAC (Adaptive Transform Acoustic Coding), and stores the compressed music data in the music database 16 together with bibliographic data such as the artist name and song title.

The feature data extraction unit 13 extracts feature data from the music data input from the music data input unit 11 and outputs the extracted feature data to the impression degree data conversion unit 14.

The impression degree data conversion unit 14 uses a hierarchical neural network trained in advance by the neural network learning device 40 shown in FIG. 2 to convert the feature data input from the feature data extraction unit 13 into impression degree data, that is, data judged by human sensibility, and outputs the converted impression degree data to the range data determination unit 15.

The range data determination unit 15 determines range data based on the impression degree data input from the impression degree data conversion unit 14, and registers the determined range data in the music database 16 in association with the music data.

The music database 16 is a large-capacity storage means such as an HDD. It stores the music data compressed by the compression processing unit 12 and the bibliographic data in association with the range data determined by the range data determination unit 15.

The music search unit 17 accepts from the PC operation unit 18, as a search range, a range of values for each item of the impression degree data, and searches the music data stored in the music database 16 based on the accepted search range. It identifies the music data whose range data overlaps the accepted search range, and ranks the identified music data by calculating, as a degree of overlap, the proportion of the range data of the identified music data that is occupied by the region overlapping the search range.
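The ranking performed by the music search unit 17 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the representation of a range as a list of (min, max) pairs, the treatment of a zero-width range as a non-match, and the sample song names are all assumptions made for the example.

```python
def overlap_degree(range_data, search_range):
    """Fraction of a song's range data covered by the search range.

    Both arguments are lists of (min, max) pairs, one pair per impression
    item. For axis-aligned ranges, the ratio of overlapping area to total
    area equals the product of the per-item overlap ratios.
    """
    ratio = 1.0
    for (lo, hi), (q_lo, q_hi) in zip(range_data, search_range):
        width = hi - lo
        overlap = min(hi, q_hi) - max(lo, q_lo)
        # Assumption: no overlap on any item, or a zero-width range,
        # means the song is not a match.
        if width <= 0 or overlap <= 0:
            return 0.0
        ratio *= overlap / width
    return ratio


def rank_songs(songs, search_range):
    """Return (name, degree) pairs for overlapping songs, best match first."""
    scored = [(name, overlap_degree(r, search_range)) for name, r in songs.items()]
    hits = [(name, degree) for name, degree in scored if degree > 0.0]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Hypothetical database: item order is ("bright, dark", "intense, calm").
songs = {
    "song_a": [(2.3, 3.2), (3.5, 4.2)],
    "song_b": [(5.0, 6.0), (1.0, 2.0)],
    "song_c": [(2.0, 4.0), (3.0, 5.0)],
}
query = [(2.0, 3.0), (3.5, 5.0)]
ranking = rank_songs(songs, query)
```

With these made-up values, song_b does not overlap the query and is dropped, while song_a ranks above song_c because a larger share of its range data lies inside the search range.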

The PC operation unit 18 is an input means such as a keyboard or mouse, and is used to enter the search range for searching the music data stored in the music database 16.

The PC display unit 19 is a display means such as a liquid crystal display. It displays the search range used to search the music data stored in the music database 16, the retrieved music data (search results), and so on.

The audio output unit 20 is an audio player that decompresses and plays back the music data stored in the music database 16, and outputs the decompressed music data as sound from the connected speaker 21.

The neural network learning device 40 is a device that trains the hierarchical neural network used in the impression degree data conversion unit 14. Referring to FIG. 2, it comprises a music data input unit 41, an audio output unit 42, a feature data extraction unit 43, an impression degree data input unit 44, a connection weight value learning unit 45, and a connection weight value output unit 46.

The music data input unit 41 has a function of reading storage media such as CDs and DVDs on which music data is stored. It inputs music data from such a storage medium and outputs it to the audio output unit 42 and the feature data extraction unit 43. It may also be configured to input music data (distribution data) received over a network such as the Internet, in addition to storage media such as CDs and DVDs. When compressed music data is input, the compressed music data is decompressed and output to the audio output unit 42 and the feature data extraction unit 43.

The audio output unit 42 is an audio player that decompresses and plays back the music data input from the music data input unit 41, and outputs the decompressed music data as sound from the connected speaker 21.

The feature data extraction unit 43 extracts feature data from the music data input from the music data input unit 41 and outputs the extracted feature data to the connection weight value learning unit 45. The feature data extracted from the music data by the feature data extraction unit 43 is the same as the feature data extracted from the music data by the feature data extraction unit 13 of the music search apparatus 10.

The impression degree data input unit 44 accepts impression degree data entered by an evaluator based on the audio output from the audio output unit 42, and outputs the accepted impression degree data to the connection weight value learning unit 45 as the teacher signal used to train the hierarchical neural network.

The connection weight value learning unit 45 trains the hierarchical neural network based on the feature data input from the feature data extraction unit 43 and the impression degree data input from the impression degree data input unit 44, updates the connection weight value of each neuron, and outputs the updated connection weight values via the connection weight value output unit 46. The trained hierarchical neural network (the updated connection weight values) is transplanted into the impression degree data conversion unit 14 of the music search apparatus 10.

First, the music registration operation in the music search apparatus 10 will be described in detail with reference to FIGS. 3 to 8.
FIG. 3 is a flowchart for explaining the music registration operation in the music search apparatus shown in FIG. 1. FIG. 4 is a flowchart for explaining the feature data extraction operation in the feature data extraction unit shown in FIG. 1. FIG. 5 is a diagram showing an example of the feature data output from the feature data extraction unit shown in FIG. 1. FIG. 6 is an explanatory diagram showing an example of the hierarchical neural network used in the impression degree data conversion unit shown in FIG. 1. FIG. 7 is a diagram showing an example of the impression degree data output from the impression degree data conversion unit shown in FIG. 1. FIG. 8 is an explanatory diagram for explaining the range data determination operation in the range data determination unit shown in FIG. 1.

A storage medium such as a CD or DVD on which music data is stored is set in the music data input unit 11, and music data is input from the music data input unit 11 (step A1).

The compression processing unit 12 compresses the music data input from the music data input unit 11 (step A2), and stores the compressed music data in the music database 16 together with bibliographic data such as the artist name and song title (step A3).

The feature data extraction unit 13 extracts feature data from the music data input from the music data input unit 11 (step A4). Various kinds of data, such as tempo, beat, beat intensity, average number of notes, and amount of spectral change, are conceivable as the feature data extracted from the music data, and any of them may be used. In the present embodiment, the feature data extraction unit 13 is configured to extract six items of fluctuation information as the feature data.

The feature data extraction unit 13 is also configured to extract feature data from each of different locations on the time axis of the music data. That is, the feature data extraction unit 13 extracts feature data from a portion of the music data, and does so at a plurality of time-shifted locations, for example 30 s, 60 s, and 120 s after the start.

Referring to FIG. 4, the feature data extraction operation in the feature data extraction unit 13 proceeds as follows. When music data is input to the music data input unit 11 from a music playback device such as a CD player or from a network such as the Internet (step B1), the music data input unit 11, for the purpose of speeding up processing, downsamples the input music data from 44.1 kHz to 22.05 kHz and outputs the downsampled music data to the feature data extraction unit 13.
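The halving of the sampling rate in step B1 can be illustrated with a deliberately crude sketch. The source does not say how the downsampling is performed; averaging each pair of adjacent samples, as done here, is only an assumed stand-in (a two-tap low-pass filter followed by decimation by two), and the function name is hypothetical. A production system would normally use a better anti-aliasing filter.

```python
def downsample_by_two(samples):
    """Halve the sampling rate (e.g. 44.1 kHz to 22.05 kHz).

    Assumed method: average each pair of adjacent samples. A trailing
    unpaired sample is dropped.
    """
    return [
        (samples[i] + samples[i + 1]) / 2.0
        for i in range(0, len(samples) - 1, 2)
    ]


halved = downsample_by_two([1, 3, 5, 7])
```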

Next, the feature data extraction unit 13 sets the variable m to 1 (step B2), performs FFT processing on a fixed frame length starting from the n-th data analysis start point, and calculates the power spectrum (step B3). It is assumed that N data analysis start points, each indicating a different location on the time axis of the music data (for example, 30 s, 60 s, 120 s, and so on from the beginning of the song), are set in advance in the feature data extraction unit 13. In the present embodiment, the tempo period is extracted as the tempo of the song, which is one of the feature data. Assuming that the tempo period lies in the range of 0.3 to 1 s, the apparatus is configured to perform 1024-point FFT processing on music data sampled at 22.05 kHz. That is, the frame length for FFT processing is 1024 / 22.05 kHz ≈ 46 ms, which is shorter than the minimum assumed tempo period.

Next, the feature data extraction unit 13, with the frequency bands Low (0 to 200 Hz), Middle (200 to 600 Hz), and High (600 to 11050 Hz) set in advance, integrates the power spectrum over each of the three bands and calculates the average power (step B4). It then determines whether the number of frames processed in steps B3 to B4 has reached a predetermined set value (2048) (step B5). If the set value has not been reached, it repeats steps B3 to B4 while shifting the data analysis start point (step B6). Steps B3 to B4 are thus performed for the set number of frames, yielding time-series data of the average power in each of the Low, Middle, and High bands. In the present embodiment, the analysis time length is 60 s, and FFT processing is performed while shifting the data analysis start point by 60 s × 22.05 kHz / 2048 ≈ 646 points at a time, producing 2048 points of average-power time-series data covering 60 s.
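Steps B3 to B6 can be sketched in pure Python as below. The frame length, hop, frame count, and band edges come from the text. Everything else is an assumption made for illustration: the function names, the textbook recursive radix-2 FFT, the absence of a window function, the half-open band boundaries, averaging the power over the FFT bins in each band, and stopping early when the signal runs out.

```python
import cmath

SAMPLE_RATE = 22050            # Hz, after downsampling (from the text)
FRAME_LEN = 1024               # FFT points per frame (from the text)
N_FRAMES = 2048                # frames per analysis (from the text)
HOP = round(60 * SAMPLE_RATE / N_FRAMES)   # shift per frame, about 646 points
BANDS = {"low": (0, 200), "middle": (200, 600), "high": (600, 11050)}  # Hz


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def band_powers(frame):
    """Average power of one frame in each band (steps B3 and B4)."""
    half = len(frame) // 2
    power = [abs(c) ** 2 for c in fft(frame)[: half + 1]]
    hz_per_bin = SAMPLE_RATE / len(frame)
    result = {}
    for name, (f_lo, f_hi) in BANDS.items():
        # Assumption: a bin belongs to a band if f_lo <= frequency < f_hi.
        bins = [p for k, p in enumerate(power) if f_lo <= k * hz_per_bin < f_hi]
        result[name] = sum(bins) / len(bins)
    return result


def band_power_series(samples, start, n_frames=N_FRAMES, hop=HOP):
    """Average-power time series per band from one start point (B3 to B6)."""
    series = {name: [] for name in BANDS}
    for i in range(n_frames):
        begin = start + i * hop
        frame = samples[begin : begin + FRAME_LEN]
        if len(frame) < FRAME_LEN:   # assumption: stop if the signal runs out
            break
        for name, p in band_powers(frame).items():
            series[name].append(p)
    return series
```

Calling `band_power_series(samples, start)` with `start` set to a data analysis start point expressed in samples (for example `30 * SAMPLE_RATE`) gives the three time series that the next step transforms again. The pure-Python FFT is slow for all 2048 frames; it is used here only to keep the sketch free of dependencies.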

Next, the feature data extraction unit 13 performs an FFT on each of the Low, Middle, and High average-power time series calculated in steps B3 to B5, and calculates the fluctuation information (step B7). In the present embodiment, 2048-point FFT processing is performed on the average-power time-series data.

Next, from the FFT analysis results for Low, Middle, and High, the feature data extraction unit 13 calculates, by the least squares method or the like, an approximate straight line on a graph whose horizontal axis is logarithmic frequency and whose vertical axis is the logarithmic power spectrum (step B8). It obtains the slope and the Y-intercept of the approximate line (step B9), and extracts the slope and Y-intercept of the approximate line for each of Low, Middle, and High as the feature data.
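The line fit of steps B8 and B9 can be sketched with ordinary least squares on log-log axes. The function name and the choice of base-10 logarithms are assumptions (the source does not state the base), and the caller is assumed to exclude the zero-frequency bin and any zero-power bins, whose logarithms are undefined.

```python
import math


def loglog_line_fit(freqs, powers):
    """Least-squares line through (log10 f, log10 P); returns (slope, intercept).

    freqs and powers must be positive and of equal length, with at least
    two distinct frequencies.
    """
    xs = [math.log10(f) for f in freqs]
    ys = [math.log10(p) for p in powers]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Synthetic check: P = 100 / f^2 is a straight line with slope -2 and
# intercept log10(100) = 2 on log-log axes.
freqs = [1.0, 2.0, 4.0, 8.0, 16.0]
powers = [100.0 / f ** 2 for f in freqs]
slope, intercept = loglog_line_fit(freqs, powers)
```

Applying the fit to each of the three bands gives three (slope, intercept) pairs, which together form the six feature items per analysis start point.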

Next, the feature data extraction unit 13 determines whether the variable m has reached the predetermined set number M (step B10). If the variable m has not reached the set number M, it increments the variable m (step B11) and returns to step B3. If the variable m has reached the set number M, the feature data extraction operation ends. As a result, the feature data extraction unit 13 obtains M sets of feature data, each consisting of six items, namely the slope and Y-intercept of the approximate line for each of Low, Middle, and High, as shown in FIG. 5, and outputs the M sets of feature data to the impression degree data conversion unit 14.

Next, the impression degree data conversion unit 14 uses a hierarchical neural network consisting of an input layer (first layer), intermediate layers (n-th layer), and an output layer (N-th layer), as shown in FIG. 6. Each of the M sets of feature data extracted by the feature data extraction unit 13 is input to the input layer (first layer), and M sets of impression degree data are output from the output layer (N-th layer). That is, each of the M sets of feature data is converted into impression degree data (step A5). The number of items N of the impression degree data equals the number of neurons L_N in the output layer (N-th layer) (N = 2 in the present embodiment). The impression degree data conversion unit 14 thus obtains M sets of impression degree data with N items each, as shown in FIG. 7, and outputs the M sets of impression degree data to the range data determination unit 15. The connection weight value w of each neuron in the intermediate layers (n-th layer) has been trained in advance by an evaluator.

In the present embodiment, the feature data input to the input layer (first layer), that is, the feature data items extracted by the feature data extraction unit 13, are the six items described above, namely the slope and Y-intercept of the approximate line for each of Low, Middle, and High, so the number of neurons L_1 in the input layer (first layer) is six. The number of impression degree data items is arbitrary. In the present embodiment, two items judged by human sensibility, "bright, dark" and "intense, calm", are set as the impression degree data items, and each item is expressed on a seven-point scale. Accordingly, the number of neurons L_N in the output layer (N-th layer) is two. The number of neurons L_n in each intermediate layer (n-th layer: n = 2, ..., N−1) may be set as appropriate.
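A forward pass through such a network can be sketched as follows. Only the six inputs and two outputs come from the text. The single intermediate layer of four neurons, the sigmoid activation, the linear output layer, the bias terms, the random placeholder weights, and all names are assumptions for illustration; in the apparatus the weights would be the trained connection weight values.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(features, layers):
    """Propagate one feature set through the network.

    layers is a list of (weights, biases); weights[j][i] connects input i
    to neuron j. Assumption: sigmoid on intermediate layers, linear output.
    """
    activations = list(features)
    for idx, (weights, biases) in enumerate(layers):
        sums = [
            sum(w * a for w, a in zip(row, activations)) + b
            for row, b in zip(weights, biases)
        ]
        is_output = idx == len(layers) - 1
        activations = sums if is_output else [sigmoid(s) for s in sums]
    return activations


def random_layer(n_in, n_out, rng):
    """Placeholder weights standing in for trained connection weight values."""
    weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [rng.uniform(-1, 1) for _ in range(n_out)]
    return weights, biases


rng = random.Random(0)
layers = [random_layer(6, 4, rng), random_layer(4, 2, rng)]   # 6 -> 4 -> 2
# Made-up feature set: (slope, Y-intercept) for Low, Middle, High.
features = [-1.2, 3.4, -0.9, 2.8, -0.7, 2.1]
impression = forward(features, layers)   # two values: one per impression item
```

Running `forward` once per feature set turns the M sets of feature data into M sets of impression degree data.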

Because seven-point impression degree data is used as the teacher signal for training the hierarchical neural network, as described later, the value of each item of the impression degree data output from the output layer (N-th layer) is a real number in the range of approximately 1 to 7. Values of 1 or less are converted to 1, and values of 7 or more are converted to 7. In the impression degree data, the closer the item "bright, dark" is to the minimum value (1), the "brighter" the impression given by the music data, and the closer it is to the maximum value (7), the "darker" the impression. Likewise, the closer the item "intense, calm" is to the minimum value (1), the more "intense" the impression given by the music data, and the closer it is to the maximum value (7), the "calmer" the impression.
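The conversion of out-of-range outputs to the ends of the seven-point scale amounts to clamping each value. A minimal sketch, with hypothetical names:

```python
IMPRESSION_MIN = 1.0   # lower end of the seven-point scale
IMPRESSION_MAX = 7.0   # upper end of the seven-point scale


def clip_impression(values):
    """Clamp each impression value into the range 1 to 7."""
    return [min(IMPRESSION_MAX, max(IMPRESSION_MIN, v)) for v in values]


clipped = clip_impression([0.4, 7.9])
```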

The range data determination unit 15 determines range data based on the M sets of impression degree data input from the impression degree data conversion unit 14 (step A6), and stores the determined range data in the music database 16 in association with the music data (step A7). Range data is data that expresses the impression of the music data with a width. In the present embodiment, the range data is the interval between the minimum value and the maximum value of each item across the M sets of impression degree data. For example, as shown in FIG. 8, when the maximum value of the item "bright, dark" in the M sets of impression degree data is 3.2 and the minimum value is 2.3, and the maximum value of the item "intense, calm" is 4.2 and the minimum value is 3.5, the range data determination unit 15 determines the value range 2.3 to 3.2 for the item "bright, dark" and the value range 3.5 to 4.2 for the item "intense, calm" as the range data.
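Taking the per-item minimum and maximum over the M sets can be sketched in a few lines. The function name and the three sample impression sets are made up for illustration, chosen so that the result matches the ranges quoted for FIG. 8.

```python
def determine_range_data(impression_sets):
    """Return one (min, max) pair per impression item across all sets."""
    return [(min(values), max(values)) for values in zip(*impression_sets)]


# Hypothetical M = 3 sets, ordered as ("bright, dark", "intense, calm").
impression_sets = [(2.3, 3.5), (3.2, 4.2), (2.8, 3.9)]
range_data = determine_range_data(impression_sets)
```

Here `range_data` is `[(2.3, 3.2), (3.5, 4.2)]`, which is the form the ranking step compares against the search range.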

Next, the learning operation of the hierarchical neural network used for the conversion operation (step A5) in the impression degree data conversion unit 14 will be described in detail with reference to FIG. 9.
FIG. 9 is a flowchart for explaining the learning operation of the hierarchical neural network in the neural network learning device shown in FIG. 2.

Training of the hierarchical neural network (connection weight values w) by an evaluator is performed using, for example, the neural network learning device 40 shown in FIG. 2. First, pre-learning data (feature data of music data plus impression degree data) for pre-training the hierarchical neural network (connection weight values w) is input.

A storage medium storing music data, such as a CD or DVD, is set in the music data input unit 41, and the music data is input from the music data input unit 41 (step C1). The feature data extraction unit 43 extracts feature data from the music data input from the music data input unit 41 (step C2). The feature data extracted by the feature data extraction unit 43 is the same as the feature data extracted by the feature data extraction unit 13 of the music search device 10.

The audio output unit 42 outputs the music data input from the music data input unit 41 as audio (step C3). The evaluator listens to the audio output from the audio output unit 42, evaluates the impression of the music according to his or her own sensibility, and inputs the evaluation result as impression degree data through the impression degree data input unit 44 (step C4). The connection weight value learning unit 45 accepts the impression degree data input from the impression degree data input unit 44 as the teacher signal. In this embodiment, two items judged by human sensibility, "bright, dark" and "intense, gentle", are set as the items of the impression degree data, and the impression degree data input unit 44 accepts a seven-level evaluation of each item as the impression degree data.

Next, it is determined whether the learning data, consisting of the feature data and the input impression degree data, has reached a predetermined number of samples T_1 (step C5), and steps C1 to C4 are repeated until the learning data reaches T_1 samples.

The training of the hierarchical neural network in the connection weight value learning unit 45, that is, the updating of the connection weight value w of each neuron, is performed using the error back-propagation learning method.
First, as initial values, the connection weight values w of all neurons in the intermediate layers (nth layer) are set by random numbers to small values in the range of about -0.1 to 0.1. The connection weight value learning unit 45 then inputs the feature data extracted by the feature data extraction unit 43 into the input layer (first layer) as input signals x_j (j = 1, 2, ..., 8), and calculates the output of each neuron from the input layer (first layer) toward the output layer (Nth layer).

Next, the connection weight value learning unit 45 takes the impression degree data input from the impression degree data input unit 44 as the teacher signal y_j (j = 1, 2, ..., 8) and, from the error between the output out_j^N of the output layer (Nth layer) and the teacher signal y_j, calculates the learning rule δ_j^N by the following equation.

Next, the connection weight value learning unit 45 uses the learning rule δ_j^N to calculate the error signal δ_j^n of the intermediate layer (nth layer) by the following equation.

In Equation 2, w represents the connection weight value between the jth neuron of the nth layer and the kth neuron of the (n-1)th layer.

Next, the connection weight value learning unit 45 uses the error signal δ_j^n of the intermediate layer (nth layer) to calculate the change Δw in the connection weight value w of each neuron by the following equation, and updates the connection weight value w of each neuron (step C6). In the following equation, η represents the learning rate, which is set to η_1 (0 < η_1 ≤ 1) for learning by the evaluator.

In step C6, learning is performed for each of the T_1 samples of pre-learning data. Next, it is determined whether the squared error E given by the following equation is smaller than a predetermined reference value E_1 for pre-learning (step C7), and step C6 is repeated until the squared error E becomes smaller than the reference value E_1. Alternatively, a number of learning iterations S at which the squared error E is expected to fall below the reference value E_1 may be set in advance, and step C6 may be repeated S times.
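The equations referred to in the preceding paragraphs are not reproduced in this text. As a hedged reference only, the textbook form of error back-propagation that matches the symbols used above is sketched below. The activation function f, the weighted input net_j^n = Σ_k w_jk^n out_k^(n-1) with out_j^n = f(net_j^n), and the factor 1/2 in E are assumptions introduced here; the patent's own equations may differ in detail.

```latex
\begin{align}
\delta_j^{N} &= \left(y_j - \mathrm{out}_j^{N}\right) f'\!\left(\mathrm{net}_j^{N}\right)
  && \text{(output layer)} \\
\delta_j^{n} &= f'\!\left(\mathrm{net}_j^{n}\right) \sum_{k} \delta_k^{n+1}\, w_{kj}^{n+1}
  && \text{(intermediate layer)} \\
\Delta w_{jk}^{n} &= \eta\, \delta_j^{n}\, \mathrm{out}_k^{n-1}
  && \text{(weight change)} \\
E &= \frac{1}{2} \sum_{j} \left(y_j - \mathrm{out}_j^{N}\right)^{2}
  && \text{(squared error)}
\end{align}
```

Here w_jk^n follows the convention stated for Equation 2: it links the jth neuron of the nth layer with the kth neuron of the (n-1)th layer, so w_kj^(n+1) links the kth neuron of the (n+1)th layer with the jth neuron of the nth layer.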

When it is determined in step C7 that the squared error E is smaller than the reference value E_1, the connection weight value learning unit 45 outputs the pre-trained connection weight value w of each neuron through the connection weight value output unit 46 (step C8), and the connection weight values w output from the connection weight value output unit 46 are stored in the impression degree data conversion unit 14.
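The learning loop of steps C6 to C8 can be illustrated with the following minimal sketch. It is not the patented implementation: it assumes a single intermediate layer with sigmoid neurons, a linear output layer, no bias terms, and per-sample weight updates. All function names and the parameter defaults (`n_hidden`, `eta`, `e_ref`, `max_iters`, `seed`) are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def init_weights(rows, cols, rng):
    # Small random initial weights in about [-0.1, 0.1], as in the text.
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def forward(x, w_hidden, w_out):
    # Sigmoid intermediate layer, linear output layer (both assumptions).
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    out = [sum(w * h for w, h in zip(row, hidden)) for row in w_out]
    return hidden, out


def squared_error(samples, w_hidden, w_out):
    total = 0.0
    for x, y in samples:
        _, out = forward(x, w_hidden, w_out)
        total += 0.5 * sum((yj - oj) ** 2 for yj, oj in zip(y, out))
    return total


def train(samples, n_hidden=4, eta=0.05, e_ref=0.01, max_iters=2000, seed=0):
    """Repeat the update step until E < e_ref or max_iters passes are done."""
    rng = random.Random(seed)
    n_in, n_out = len(samples[0][0]), len(samples[0][1])
    w_hidden = init_weights(n_hidden, n_in, rng)
    w_out = init_weights(n_out, n_hidden, rng)
    for _ in range(max_iters):
        for x, y in samples:
            hidden, out = forward(x, w_hidden, w_out)
            # Output-layer error signal (derivative of a linear output is 1).
            delta_out = [yj - oj for yj, oj in zip(y, out)]
            # Intermediate-layer error signal, computed before any update.
            delta_hidden = [
                h * (1.0 - h) * sum(delta_out[j] * w_out[j][k] for j in range(n_out))
                for k, h in enumerate(hidden)
            ]
            for j in range(n_out):
                for k in range(n_hidden):
                    w_out[j][k] += eta * delta_out[j] * hidden[k]
            for k in range(n_hidden):
                for i in range(n_in):
                    w_hidden[k][i] += eta * delta_hidden[k] * x[i]
        if squared_error(samples, w_hidden, w_out) < e_ref:
            break
    return w_hidden, w_out


# Two made-up samples: 8 feature values each, 2 impression items each.
samples = [([0.1] * 8, [2.0, 5.0]), ([0.9] * 8, [6.0, 3.0])]
w_hidden, w_out = train(samples)
```

The `max_iters` cap plays the role of the preset iteration count S, and `e_ref` the role of the reference value E_1. The returned weights correspond to what step C8 would hand to the impression degree data conversion unit.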

Next, the music search operation in the music search unit 17 will be described in detail with reference to FIGS. 10 and 11.
FIG. 10 is a flowchart for explaining the music search operation in the music search unit shown in FIG. 1, and FIG. 11 is an explanatory diagram for explaining the music search operation in the music search unit shown in FIG. 1.

The music search unit 17 accepts from the PC operation unit 18 a range of values for each item of the impression degree data as the search range (step D1), searches the music data stored in the music database 16 based on the accepted search range, and specifies the music data whose range data overlaps the accepted search range (step D2).

For example, when songs A to G are stored in the music database 16 together with range data indicating the ranges of the two impression degree data items "bright, dark" and "intense, gentle", seven sets of range data exist, as shown in FIG. 11, and the music search unit 17 specifies the music data whose range data includes the values of each item of the impression degree data accepted as the search condition from the PC operation unit 18. As shown in FIG. 11, when the value range (2 to 3) for the item "bright, dark" and the value range (3 to 4) for the item "intense, gentle" are accepted as the search range, song A, song E, and song G are specified.

Next, for the range data of each specified piece of music data, the music search unit 17 calculates the proportion occupied by the area overlapping the search range (hereinafter referred to as the overlap degree P) (step D3), ranks (sorts) the music data specified in step D2 in descending order of overlap degree P (step D4), and specifies a predetermined number of pieces of music data from the top as the search result (step D5).

The overlap degree P is a value indicating how much (what proportion) of the range data overlaps the search range; the larger the overlap degree P, the better the music data is judged to match the search range.

Letting X_1, X_2, ..., X_N be the widths of the items of the range data of the music data (number of items N), and Y_1, Y_2, ..., Y_N be the widths that overlap the search range in the respective items, the overlap degree P is expressed by
P = (Y_1 * Y_2 * ... * Y_N) * Z / (X_1 * X_2 * ... * X_N)
where Z is the score assigned to each set of range data. When all music data is to be searched under the same conditions, the score Z assigned to each set of range data is preferably the same. However, it is preferable that, when there is music data to be retrieved preferentially, the score Z assigned to the range data of that music data can be set to a larger value than for other range data, and that, when there is music data that should rarely be retrieved, the score Z assigned to the range data of that music data can be set to a smaller value than for other range data.
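Steps D2 to D5 can be sketched as follows, computing P as the fraction of each range data that the search range covers, scaled by the score Z. This is an illustration under stated assumptions rather than the patented implementation: the names `overlap_degree` and `search`, the dictionary layout of the database, the treatment of zero-width range data, and the rule that ranges touching only at an endpoint do not count as overlapping are all choices made here.

```python
def overlap_degree(range_data, search_range, score=1.0):
    """Fraction of the range data covered by the search range, times Z.

    range_data and search_range are lists of (low, high) pairs, one per item.
    """
    p = score
    for (r_lo, r_hi), (s_lo, s_hi) in zip(range_data, search_range):
        width = r_hi - r_lo                              # X_i
        overlap = min(r_hi, s_hi) - max(r_lo, s_lo)      # Y_i
        if width == 0:
            # Assumption: a zero-width item counts fully if it lies inside.
            if not (s_lo <= r_lo <= s_hi):
                return 0.0
            continue
        if overlap <= 0:
            return 0.0   # no overlap in this item, so the song is excluded
        p *= overlap / width
    return p


def search(database, search_range, top_k=10, scores=None):
    """Return up to top_k (name, P) pairs in descending order of P."""
    scores = scores or {}
    hits = []
    for name, range_data in database.items():
        p = overlap_degree(range_data, search_range, scores.get(name, 1.0))
        if p > 0:
            hits.append((name, p))
    hits.sort(key=lambda item: item[1], reverse=True)
    return hits[:top_k]


# Made-up database with two items per song: "bright, dark", "intense, gentle".
database = {
    "A": [(2.3, 3.2), (3.5, 4.2)],
    "B": [(5.0, 6.0), (1.0, 2.0)],
    "C": [(2.0, 3.0), (3.0, 4.0)],
}
result = search(database, [(2.0, 3.0), (3.0, 4.0)], top_k=2)
```

In this made-up data, song C lies entirely inside the search range, song A overlaps it partially (P is about 0.56 with Z = 1), and song B does not overlap at all, so the result lists C first and A second. Raising the entry for a song in `scores` moves it up the ranking, which corresponds to the preferential score Z described above.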

Next, the music search unit 17 notifies the user of the search result by displaying the bibliographic data of the specified music data on the PC display unit 19. When a playback instruction is input from the PC operation unit 18, the music search unit 17 reads the specified music data from the music database 16 sequentially or at random, and outputs the read music data to the audio output unit 20 so that it is played through the speaker 21.

As described above, according to this embodiment, the feature data extraction unit 13 extracts a plurality of sets of feature data from the music data, and the range data determination unit 15 determines, based on the extracted sets of feature data, range data that expresses the impression of the music data with a width. The music search unit 17 specifies the music data whose range data overlaps the accepted search range, and ranks the specified music data by calculating, as the overlap degree, the proportion of the range data of the specified music data that overlaps the search range. With this configuration, when the impression desired by the user is included in any of the phrases having different impressions, the corresponding music data is retrieved. The search can therefore take phrases with different impressions into account, and music with the impression desired by the user can be retrieved accurately.

In this embodiment, feature data is extracted from each of different locations on the time axis of the music data, the plurality of sets of feature data are each converted into impression degree data to obtain a plurality of sets of impression degree data, and the range data is determined based on the obtained sets of impression degree data. Alternatively, feature data having a width may be extracted from the same location on the time axis of the music data, and the range data may be determined based on impression degree data obtained by converting the feature data having a width.

FIG. 12 is an explanatory diagram for explaining an example of extracting feature data having a width in the feature data extraction unit shown in FIG. 1.
In step B8 shown in FIG. 4, instead of calculating an approximate straight line, an upper-limit line and a lower-limit line are obtained as shown in FIG. 12, and in step B9 the slope and the Y-intercept of each of the upper-limit line and the lower-limit line are obtained. Note that FIG. 12 shows the power spectrum for Low; an upper-limit line and a lower-limit line are obtained in the same way for Middle and High. When obtaining the upper-limit line and the lower-limit line, a masked range may be set on the logarithmic frequency axis (for example, masking frequencies below 0.1 Hz and obtaining the upper-limit line and the lower-limit line based on the data from 0.1 Hz).

As a result, two sets of feature data (feature data based on the upper-limit line and feature data based on the lower-limit line) are obtained. The two sets of feature data are each converted into impression degree data, and the range data can be determined based on the two converted sets of impression degree data.

Furthermore, in this embodiment, feature data is extracted from each of different locations on the time axis of the music data, the plurality of sets of feature data are each converted into impression degree data to obtain a plurality of sets of impression degree data, and the range data is determined based on the obtained sets of impression degree data. Alternatively, the range data may be determined directly from the plurality of sets of feature data, without converting the feature data into impression degree data.

Furthermore, this embodiment has been described on the assumption that feature data can be extracted from all music data. When feature data cannot be extracted from music data for some reason, the range data is preferably set to the entire range.

The present invention is not limited to the embodiments described above, and it is clear that each embodiment can be modified as appropriate within the scope of the technical idea of the present invention. The number, positions, shapes, and the like of the constituent members are not limited to those of the above embodiments, and may be set to any number, positions, shapes, and the like suitable for practicing the present invention. In the drawings, the same components are denoted by the same reference numerals.

FIG. 1 is a block diagram showing the configuration of an embodiment of the music search device according to the present invention.
FIG. 2 is a block diagram showing the configuration of a neural network learning device that pre-trains the neural network used in the music search device shown in FIG. 1.
FIG. 3 is a flowchart for explaining the music registration operation in the music search device shown in FIG. 1.
FIG. 4 is a flowchart for explaining the feature data extraction operation in the feature data extraction unit shown in FIG. 1.
FIG. 5 is a diagram showing an example of the feature data output from the feature data extraction unit shown in FIG. 1.
FIG. 6 is an explanatory diagram showing an example of the hierarchical neural network used in the impression degree data conversion unit shown in FIG. 1.
FIG. 7 is a diagram showing an example of the impression degree data output from the impression degree data conversion unit shown in FIG. 1.
FIG. 8 is an explanatory diagram for explaining the range data determination operation in the range data determination unit shown in FIG. 1.
FIG. 9 is a flowchart for explaining the learning operation of the hierarchical neural network in the neural network learning device shown in FIG. 2.
FIG. 10 is a flowchart for explaining the music search operation in the music search unit shown in FIG. 1.
FIG. 11 is an explanatory diagram for explaining the music search operation in the music search unit shown in FIG. 1.
FIG. 12 is an explanatory diagram for explaining an example of extracting feature data having a width in the feature data extraction unit shown in FIG. 1.

Explanation of symbols

10 Music search device
11 Music data input unit
12 Compression processing unit
13 Feature data extraction unit
14 Impression degree data conversion unit
15 Range data determination unit
16 Music database
17 Music search unit
18 PC operation unit
19 PC display unit
20 Audio output unit
21 Speaker
40 Neural network learning device
41 Music data input unit
42 Audio output unit
43 Feature data extraction unit
44 Impression degree data input unit
45 Connection weight value learning unit
46 Connection weight value output unit

Claims

1. A music search device for searching music data stored in a music database, comprising:
operation means for accepting, as a search range, a range of values for each of a plurality of items;
feature data extraction means for extracting feature data from each of different locations on the time axis of the music data;
range data determination means for determining, based on the plurality of feature data respectively extracted from the different locations on the time axis by the feature data extraction means, range data that gives a width to the value of each of the plurality of items; and
music search means for specifying the music data whose range data overlaps the search range accepted by the operation means, and for ranking the specified music data by calculating, as an overlap degree, the proportion of the range data of the specified music data that is occupied by the area overlapping the search range.

2. The music search device according to claim 1, further comprising impression degree data conversion means for converting the feature data extracted by the feature data extraction means into impression degree data consisting of values of the plurality of items,
wherein the impression degree data conversion means converts each of the plurality of feature data extracted by the feature data extraction means into the impression degree data, thereby converting the plurality of feature data into a plurality of impression degree data, and
the range data determination means determines the range data based on the plurality of impression degree data converted by the impression degree data conversion means.

3. The music search device according to claim 2, wherein the range data determination means determines, as the range data, the range between the maximum value and the minimum value of each item of the impression degree data.

4. A music search method for searching music data stored in a music database, comprising:
accepting, as a search range, input of a range of values for each of a plurality of items;
extracting feature data from each of different locations on the time axis of the music data;
determining, based on the plurality of feature data respectively extracted from the different locations on the time axis, range data that gives a width to the value of each of the plurality of items;
specifying the music data whose range data overlaps the search range; and
ranking the specified music data by calculating, as an overlap degree, the proportion of the range data of the specified music data that is occupied by the area overlapping the search range.

5. The music search method according to claim 4, wherein each of the extracted plurality of feature data is converted into impression degree data consisting of values of the plurality of items, thereby converting the plurality of feature data into a plurality of impression degree data, and
the range data is determined based on the plurality of converted impression degree data.

6. The music search method according to claim 5, wherein a range between a maximum value and a minimum value of each item of the impression degree data is determined as the range data.