JP5561041B2

JP5561041B2 - Relevant information retrieval device for acoustic data

Info

Publication number: JP5561041B2
Application number: JP2010199102A
Authority: JP
Inventors: 敏雄茂出木
Original assignee: Dai Nippon Printing Co Ltd
Current assignee: Dai Nippon Printing Co Ltd
Priority date: 2010-09-06
Filing date: 2010-09-06
Publication date: 2014-07-30
Anticipated expiration: 2030-09-06
Also published as: JP2012058328A

Description

The present invention relates to a device that searches an acoustic database for related information about acoustic data in which music, such as a song, is recorded.

Services that provide music attribute information, such as the title of a piece of music being played, have recently been put into practical use. Examples include querying a broadcast station with the date, time, and region of a broadcast, and recording a fragment of the music with a mobile phone and collating it against melodies registered in a database (see, for example, Patent Documents 1 and 2). The applicant has also proposed a technique that collates a melody obtained by recording against melodies registered in a database at high speed (see Patent Document 3). In that technique, the features of a predetermined section of an acoustic signal are converted into a search feature word of a predetermined number of bytes, a plurality of search feature words are grouped into one set, and the search feature word group in each set is collated in turn against the registered feature word groups registered in the database to extract records with a high degree of match. Here, an acoustic signal includes an analog acoustic signal recorded with a microphone; a digitized signal that can be processed by a computer or the like is hereinafter called “acoustic data”.

Patent Document 1: JP-T-2004-536348
Patent Document 2: JP-T-2004-537760
Patent Document 3: JP 2007-226071 A

However, regardless of what the given acoustic data (search acoustic data) is, the conventional method described above collates every search acoustic word group in the search acoustic data against every registered acoustic data group, so the processing load is high.

An object of the present invention is therefore to provide a related information search device for acoustic data that, when acoustic data such as a song is used to search for related information about acoustic data registered in an acoustic database, can reduce the processing load by limiting the range to be collated based on the state of the given acoustic data.

To solve the above problem, a first aspect of the present invention provides a related information search device for acoustic data that performs a search by collating the features of search acoustic data, which is the given acoustic data, with the features of original acoustic data registered in an acoustic database, and identifying original acoustic data whose features are close to those of the search acoustic data. The device comprises: an acoustic database in which, for each piece of original acoustic data, a registered feature word group (a set of registered feature words expressing its feature pattern) and related information about that original acoustic data are registered; search feature word generation means that generates, for each predetermined section of the search acoustic data, a search feature word expressing the feature pattern of the search acoustic data in that section, and obtains a search feature word group that is the set of those search feature words; collation range determination means that takes the same number of feature words from a part of the search feature word group and from a part of the registered feature word group as collation targets and, based on the relationship between the registered feature word group and the search feature word group, determines as the collation range the range over which the collation-target registered feature words move within the registered feature word group; and feature word collation means that collates the collation target in the search feature word group against the collation target in the registered feature word group registered in the acoustic database while moving the latter within the collation range and, when the resulting dissimilarity (the degree of difference) is smaller than a predetermined value (determination threshold M2), identifies the original acoustic data corresponding to that registered feature word group as a selection target. The collation range determination means excludes a registered feature word group from the search when it contains fewer registered feature words than the search feature word group contains search feature words. When the search acoustic data is set as starting from the beginning of the acoustic material, the collation range determination means takes a predetermined number of search feature words from the beginning of the search feature word group as the collation target, and determines only the same predetermined number of registered feature words from the beginning of the registered feature word group as both the collation target and the collation range.

According to the first aspect of the present invention, a registered feature word group, which is a set of registered feature words expressing the feature pattern of each piece of original acoustic data, is registered in the acoustic database in advance. For the search acoustic data, a search feature word expressing its feature pattern is generated for each predetermined section to obtain a search feature word group. Based on the relationship between the registered feature word group and the search feature word group, the range over which the collation-target registered feature words move is determined as the collation range. Part of the search feature word group is taken as the collation target and collated against the registered feature words within the determined collation range, with the collation target in the registered feature word group being moved within that range. When the resulting dissimilarity is smaller than the predetermined value (determination threshold M2), the original acoustic data corresponding to the registered feature word group is identified as a selection target. Consequently, when searching for related information about acoustic data registered in the acoustic database, the processing load can be reduced by limiting the collation range based on the state of the given acoustic data.

Also according to the first aspect, when a registered feature word group contains fewer feature words than the search feature word group contains search feature words, the number of registered feature words in its collation range is set to 0. Thus, when the playing time of the search acoustic data is longer than that of a piece of original acoustic data, that original acoustic data can be excluded from the search, which reduces the overall processing load.

Further according to the first aspect, when the search acoustic data is set as starting from the beginning of the acoustic material, a predetermined number of search feature words from the beginning of the search feature word group are taken as the collation target, and only the same number of registered feature words from the beginning of the registered feature word group are determined as the collation target and the collation range. In other words, the collation target is not moved within the collation range of the registered feature words. As a result, for search acoustic data that is certain to start from the beginning of the acoustic material, collation can be performed with the registered feature word group limited to an appropriate range, which reduces the overall processing load.
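
The two range-limiting rules of the first aspect can be illustrated with a minimal Python sketch. The function name `head_start_collation_range` and its parameters are hypothetical and do not appear in the specification; offsets are 0-based positions of the collation target within the registered feature word group.

```python
from typing import Optional


def head_start_collation_range(n_registered: int, n_search: int, w: int) -> Optional[range]:
    """Return the offsets at which the W-word collation target may be placed
    in a registered feature word group, or None if the record is excluded.

    Assumes the search acoustic data is set as starting from the beginning
    of the acoustic material (the case covered by the first aspect).
    """
    if w <= 0 or w > n_search:
        raise ValueError("w must be between 1 and n_search")
    # Rule 1: a registered group with fewer words than the search group
    # is excluded from the search (its collation range is empty).
    if n_registered < n_search:
        return None
    # Rule 2: only the first W registered words are collated, so the
    # collation target is never moved (a single offset, 0).
    return range(0, 1)
```

For example, a record with 100 registered feature words is skipped for a 120-word query, while a 500-word record is collated at offset 0 only.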

In a second aspect of the present invention, in the related information search device for acoustic data according to the first aspect, when the search acoustic data is not set as starting from the beginning of the acoustic material, the collation range determination means takes a predetermined number of search feature words from the beginning of the search feature word group as the collation target and excludes the rest from collation, and determines as the collation range the registered feature words that remain after excluding, from the beginning of the registered feature word group, the same number of registered feature words as there are search feature words excluded from collation.

According to the second aspect, when the search acoustic data is not set as starting from the beginning of the acoustic material, a predetermined number of search feature words from the beginning of the search feature word group are taken as the collation target and the rest are excluded from collation, and the registered feature words that remain after excluding, from the beginning of the registered feature word group, the same number of registered feature words as there are excluded search feature words are determined as the collation range. Thus, for search acoustic data for which it is unknown whether it starts from the beginning of the acoustic material, collation can be performed with the registered feature word group limited to an appropriate range, which reduces the overall processing load.

In a third aspect of the present invention, in the related information search device for acoustic data according to the first aspect, when the search acoustic data is not set as starting from the beginning of the acoustic material, the collation range determination means takes a predetermined number of search feature words from the end of the search feature word group as the collation target and excludes the rest from collation, and determines as the collation range the registered feature words that remain after excluding, from the end of the registered feature word group, the same number of registered feature words as there are search feature words excluded from collation.

According to the third aspect, when the search acoustic data is not set as starting from the beginning of the acoustic material, a predetermined number of search feature words from the end of the search feature word group are taken as the collation target and the rest are excluded from collation, and the registered feature words that remain after excluding, from the end of the registered feature word group, the same number of registered feature words as there are excluded search feature words are determined as the collation range. Thus, for search acoustic data for which it is unknown whether it starts from the beginning of the acoustic material, collation can be performed with the registered feature word group limited to an appropriate range, which reduces the overall processing load.

In a fourth aspect of the present invention, in the related information search device for acoustic data according to the first aspect, when the search acoustic data is not set as starting from the beginning of the acoustic material, the collation range determination means takes a predetermined number of search feature words at the center of the search feature word group as the collation target and excludes the rest from collation, and determines as the collation range the central registered feature words that remain after excluding, from the end of the registered feature word group, the same number of registered feature words as there are search feature words excluded from collation.

According to the fourth aspect, when the search acoustic data is not set as starting from the beginning of the acoustic material, a predetermined number of search feature words at the center of the search feature word group are taken as the collation target and the rest are excluded from collation, and the central registered feature words that remain after excluding, from the end of the registered feature word group, the same number of registered feature words as there are excluded search feature words are determined as the collation range. Thus, for search acoustic data for which it is unknown whether it starts from the beginning of the acoustic material, collation can be performed with the registered feature word group limited to an appropriate range, which reduces the overall processing load.

In a fifth aspect of the present invention, in the related information search device for acoustic data according to any one of the first to fourth aspects, the search feature word generation means generates the search feature words in H variants while changing a phase h, thereby obtaining H search feature word groups, each a set of those search feature words. The feature word collation means takes W search feature words in the search feature word group and W registered feature words in the registered feature word group as the collation targets, and collates the W search feature words against the W registered feature words while changing the phase h. When the word-unit dissimilarity between individual feature words, D(r, y+w, h, x+w), is smaller than a predetermined determination threshold (Mw2) in all W collations, the summed dissimilarity S(r, y, h, x), which is the total of the W word-unit dissimilarities, is used as the dissimilarity and compared with the predetermined value (determination threshold M2).

According to the fifth aspect, W feature words are collated against each other while the phase h is changed, and the summed dissimilarity S(r, y, h, x), the total of the W word-unit dissimilarities, is compared with the predetermined value (determination threshold M2) only when the word-unit dissimilarity D(r, y+w, h, x+w) between individual feature words is smaller than the predetermined determination threshold (Mw2) in all W collations. When an individual feature word differs greatly, collation of the feature word group as a whole is therefore skipped, which reduces the overall processing load.
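
The two-stage threshold test of the fifth aspect can be sketched in Python as follows. This is an illustration under stated assumptions, not the implementation in the specification: feature words are modeled as plain integers, the word-unit dissimilarity is assumed to be the number of mismatching bits (the figure descriptions refer to a total mismatch bit count), and the names `search_start` and `reg_start` are hypothetical and are not mapped onto the specification's symbols x and y.

```python
from typing import Optional, Sequence


def word_dissimilarity(a: int, b: int) -> int:
    """Number of mismatching bits between two feature words (Hamming distance)."""
    return bin(a ^ b).count("1")


def summed_dissimilarity(search: Sequence[int], registered: Sequence[int],
                         search_start: int, reg_start: int,
                         w_count: int, mw2: int) -> Optional[int]:
    """Sum the word-unit dissimilarities of W aligned word pairs.

    Returns None as soon as one pair fails the per-word threshold Mw2,
    so the remaining pairs are never compared.
    """
    total = 0
    for w in range(w_count):
        d = word_dissimilarity(search[search_start + w], registered[reg_start + w])
        if d >= mw2:  # one word differs too much: abandon this alignment
            return None
        total += d
    return total


def is_match(search: Sequence[int], registered: Sequence[int],
             search_start: int, reg_start: int,
             w_count: int, mw2: int, m2: int) -> bool:
    """True when every word passes Mw2 and the summed dissimilarity is below M2."""
    s = summed_dissimilarity(search, registered, search_start, reg_start, w_count, mw2)
    return s is not None and s < m2
```

The early return is what saves work: an alignment is rejected at the first badly mismatching word instead of after all W comparisons.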

According to the present invention, when acoustic data such as a song is used to search for related information about acoustic data registered in an acoustic database, the processing load can be reduced by limiting the collation range based on the state of the given acoustic data.

FIG. 1 is a hardware configuration diagram of the related information registration device.
FIG. 2 is a functional block diagram of the related information registration device.
FIG. 3 is a flowchart showing the registered feature word generation process.
FIG. 4 is a conceptual diagram of the feature word generation process.
FIG. 5 is a flowchart showing the process of calculating representative registered feature data from registered feature words.
FIG. 6 is a conceptual diagram of the calculation of the mean and standard deviation of the registered feature data array in S19 and S20.
FIG. 7 is a hardware configuration diagram of the related information search device for acoustic data according to the present invention.
FIG. 8 is a functional block diagram of the related information search device for acoustic data according to the present invention.
FIG. 9 is a diagram outlining the processing of the search feature word generation means 50, the representative search feature data generation means 60, the representative feature data collation means 80, and the feature word collation means 90.
FIG. 10 is a flowchart showing the search feature word generation process.
FIG. 11 is a flowchart showing the process of calculating representative search feature data from search feature words.
FIG. 12 is a conceptual diagram of the calculation of the mean and standard deviation of the representative search feature data in S29 and S30.
FIG. 13 is a diagram showing the concept of collation range determination.
FIG. 14 is a diagram showing the concept of representative feature data.
FIG. 15 is a flowchart of a search using a search feature word group by the feature word collation means 90.
FIG. 16 is a flowchart showing details of the collation process for record r in S220 of FIG. 15.
FIG. 17 is a flowchart showing details of the calculation of the total mismatch bit count S(r, y, ho, x) in S222 of FIG. 16.
FIG. 18 is a flowchart showing details of the in-record collation process in S224 of FIG. 16.
FIG. 19 is a flowchart showing details of the calculation of the total mismatch bit count S(r, y, h, x) in S322 of FIG. 18.

<1. Related information registration device for acoustic data>
Embodiments of the present invention are described in detail below with reference to the drawings. First, registration of related information for acoustic data is described. This registration is performed by a registration device for related information of acoustic data (hereinafter “related information registration device”). The related information registration device creates registered feature words and representative registered feature data from original acoustic data, associates them, together with the related information about that acoustic data (generally called metadata), with information that identifies the acoustic data (for example, an acoustic data ID), and registers them in the acoustic database. The acoustic data ID, registered feature words, representative registered feature data, and related information for one piece of original acoustic data are registered in the acoustic database as one record. The acoustic database is used to search for related information about original acoustic data, and the original acoustic data itself is usually not registered. This is for copyright reasons; functionally, a configuration that also registers the original acoustic data is of course possible.
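
As a rough illustration of the one-record-per-piece layout described above, the following Python sketch models a record and an in-memory database. All class, field, and function names are hypothetical, and the representation of the representative registered feature data as a tuple of numbers is an assumption; the specification does not define a storage format here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class AcousticRecord:
    """One database record for one piece of original acoustic data."""
    acoustic_data_id: str                      # identifies the acoustic data
    registered_feature_words: List[int]        # the registered feature word group
    representative_feature_data: Tuple[float, ...]  # assumed numeric summary
    related_info: Dict[str, str] = field(default_factory=dict)  # metadata
    # Note: no field holds the original audio samples, matching the usual
    # configuration in which the original acoustic data is not registered.


def register(database: Dict[str, AcousticRecord], record: AcousticRecord) -> None:
    """Store the record under its acoustic data ID."""
    database[record.acoustic_data_id] = record
```

A lookup by ID then returns the related information directly, for example `database["ID0001"].related_info["title"]`.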

Acoustic data is music, speech, or the like recorded in digital form, obtained by sampling an analog acoustic signal using a technique such as PCM. Original acoustic data is the acoustic data of acoustic material, such as a song, that is to be searched for. As a copyright protection measure, PCM acoustic data of CD-master quality is generally not licensed for distribution, so the data supplied for registration in the acoustic database is typically an acoustic data file that has already undergone lossy compression such as MP3 (MPEG-1/Layer3). Even when the available data is in MP3 format, however, it must be decompressed to generate acoustic data as a sample sequence before feature words can be created.

FIG. 1 is a hardware configuration diagram of the related information registration device. The device can be implemented on a general-purpose computer. As shown in FIG. 1, it comprises a CPU 2a (Central Processing Unit), a RAM 2b (Random Access Memory) serving as the computer's main memory, a large-capacity data storage device 2c (for example, a hard disk) for storing data, a program storage device 2d (for example, a hard disk) for storing programs executed by the CPU, a key input I/F 2e for a keyboard, mouse, or the like, a data input/output interface 2f for data communication with external devices (data storage media), and a display output interface 2g for sending information to a display device (display), all connected to one another via a bus.

The program storage device 2d of the related information registration device holds a dedicated program that operates the CPU 2a and causes the computer to function as the related information registration device. The data storage device 2c stores the registered feature words, representative registered feature data, and other processing results in association with the related information, thereby functioning as the acoustic database, and also stores various data needed for processing.

FIG. 2 is a functional block diagram of the related information registration device. In FIG. 2, 10 denotes registered feature word generation means, 20 denotes representative registered feature data generation means, 30 denotes registration means, and 40 denotes an acoustic database. As described above, each means is implemented by the CPU 2a executing the dedicated program read from the program storage device 2d.

The registered feature word generation means 10 sequentially reads a predetermined number of samples from the acoustic data as an acoustic frame, performs frequency analysis on each frame read, and generates feature words expressing the features of the acoustic data. A feature word expresses the features of acoustic data in a small amount of data and consists of a feature pattern representing spectral features and volume data. (Copyright law requires that the original acoustic data cannot be reproduced, that is, copied, from the generated feature words; feature words satisfy this condition, so their registration in the acoustic database is permitted.) A feature word registered in the acoustic database 40 is specifically called a “registered feature word”, and the acoustic data from which it is created is specifically called “original acoustic data”. The “original acoustic data” is usually not the “original” data held by the copyright holder itself but a version of that “original” modified for copyright protection. The “original” data itself can of course also be used as the “original acoustic data”.

As described later, other kinds of feature word, such as partial feature words and search feature words, also appear in the present invention. All share the same basic structure, but unlike registered feature words, partial feature words and search feature words are each held as a set of multiple (five in this embodiment) phase-shifted feature word groups, so as to support phase-shifted collation in the search process described later. In this specification, the minimum 40-bit unit is called a “feature word”, and a set of feature words used for collation is called a “feature word group”.

The representative registered feature data generation means 20 uses the registered feature word group generated by the registered feature word generation means 10 to generate one piece of representative registered feature data per piece of acoustic data. Whereas a registered feature word expresses a partial feature of the original acoustic data, the representative registered feature data expresses the overall features of one piece of original acoustic data.

The registration means 30 registers the registered feature word group generated by the registered feature word generation means 10 and the representative registered feature data generated by the representative registered feature data generation means 20 in the acoustic database 40, in association with related information (generally called metadata) about the production and copyright of the original acoustic data, and with an ID individually defined by the business operator that manages the copyright information and the like of the original acoustic data in order to identify it. Here, related information means text information that identifies the song, such as the song title and genre, and text information about the names of the copyright holders and neighboring rights holders involved in producing the original acoustic data, such as the lyricist, composer, arranger, artist, and producer. The original acoustic data itself, however, is normally not registered in the acoustic database 40 because of copyright law restrictions. Likewise, the series of binary material data used in producing and mastering the original acoustic data (individual recordings before mixdown, MIDI sequence data, and the like) is normally not registered because of copyright law restrictions.

<1.2. Processing operation of the related information registration device>
Next, the processing operation of the related information registration device shown in FIG. 2 will be described. First, the registered feature word generation means 10 generates registered feature words from the designated original acoustic data. FIG. 3 is a flowchart showing the registered feature word generation process. The registered feature word generation means 10 reads the designated original acoustic data a predetermined number of samples at a time, each such block forming one acoustic frame. The number of samples per acoustic frame can be set as appropriate, but at a sampling frequency of 44.1 kHz it is desirably about 4096 samples, which corresponds to about 0.093 seconds. Because a Hanning window function is used in the frequency conversion described later, acoustic frames are read with a predetermined number of samples overlapping, so as to preserve continuity between adjacent windows. In this embodiment, 2048 samples, exactly half the frame length, overlap. The first acoustic frame therefore consists of samples 1 to 4096, the second of samples 2049 to 6144, the third of samples 4097 to 8192, and so on.
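The overlapped reading described above can be sketched as follows. This is a minimal illustration rather than the device's actual implementation: the names `iter_frames` and `frame_bounds` are hypothetical, and dropping a trailing partial frame is an assumption, since the text does not say how the end of the data is handled.

```python
from typing import Iterator, List, Sequence, Tuple

FRAME_LEN = 4096  # samples per acoustic frame (value given in the text)
HOP = 2048        # frame advance: half the frame length, i.e. 50% overlap


def iter_frames(samples: Sequence[float], frame_len: int = FRAME_LEN,
                hop: int = HOP) -> Iterator[Sequence[float]]:
    """Yield successive overlapping frames.

    Assumption: a trailing partial frame is simply dropped.
    """
    for start in range(0, len(samples) - frame_len + 1, hop):
        yield samples[start:start + frame_len]


def frame_bounds(num_samples: int, frame_len: int = FRAME_LEN,
                 hop: int = HOP) -> List[Tuple[int, int]]:
    """Return the 1-based inclusive (first, last) sample numbers of each frame."""
    return [(start + 1, start + frame_len)
            for start in range(0, num_samples - frame_len + 1, hop)]
```

For 8192 samples, `frame_bounds(8192)` gives the three frames listed in the text: samples 1 to 4096, 2049 to 6144, and 4097 to 8192.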

Subsequently, the registered feature word generation means 10 performs frequency conversion on each acoustic frame it has read, obtaining a frame spectrum, that is, the spectrum of that acoustic frame (S11). Specifically, the frequency conversion is applied to the acoustic frame using a window function. The Fourier transform, the wavelet transform, and various other known methods can be used for the frequency conversion. This embodiment is described taking the Fourier transform as an example.

Here, the window function used for the Fourier transform in this embodiment is explained. In general, to apply a Fourier transform to a signal, the signal must be divided into segments of a predetermined length. If the Fourier transform (more precisely, the short-time Fourier transform) is applied to such a segment as it is, spurious components arise in the high-frequency region. For this reason, a window function called a Hanning window is generally used: the signal values are first weighted so that they taper off along a cosine curve toward the window boundaries, and the Fourier transform is then applied to the weighted values.

When the Fourier transform is performed in S11, specifically, the value X(i) at sample i (i = 0, …, N−1) is processed according to the first and second expressions of [Formula 1] below to obtain the real part A(j) and the imaginary part B(j) at each frequency. The processing uses the Hanning window function W(i) = 0.5 − 0.5 cos(2πi/N), which takes real values from 0 to 1 and is defined over the N-sample interval.

Subsequently, the spectral components are calculated (S12). Specifically, processing according to the third expression of [Formula 1] below is performed to obtain the intensity value E(j) at each frequency.

[Formula 1]
$$A(j) = \sum_{i=0}^{N-1} W(i)\, X(i) \cos(2\pi i j / N)$$
$$B(j) = \sum_{i=0}^{N-1} W(i)\, X(i) \sin(2\pi i j / N)$$
$$E(j) = A(j)^2 + B(j)^2$$

In [Formula 1], i is a serial number assigned to the N samples in each acoustic frame and takes the integer values i = 0, 1, 2, …, N−1. j is a serial number assigned to the frequency values in ascending order and takes the integer values j = 0, 1, 2, …, N/2−1. When the sampling frequency is 44.1 kHz and N = 4096, a difference of one in j corresponds to a frequency difference of about 10.8 Hz (44100/4096).
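As a concrete reading of [Formula 1], the following sketch evaluates the windowed sums directly. It is a literal, unoptimized transcription intended for small N; a practical implementation would presumably use an FFT, which the text does not discuss. The function names `hanning` and `frame_spectrum` are hypothetical.

```python
import math
from typing import List, Sequence


def hanning(n_samples: int) -> List[float]:
    """W(i) = 0.5 - 0.5*cos(2*pi*i/N) for i = 0..N-1, as defined in the text."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n_samples)
            for i in range(n_samples)]


def frame_spectrum(x: Sequence[float]) -> List[float]:
    """Return E(j) = A(j)^2 + B(j)^2 for j = 0..N/2-1 of one frame x.

    Direct O(N^2) evaluation of [Formula 1]; fine for illustration,
    slow for N = 4096.
    """
    n = len(x)
    w = hanning(n)
    energies = []
    for j in range(n // 2):
        a = sum(w[i] * x[i] * math.cos(2.0 * math.pi * i * j / n)
                for i in range(n))
        b = sum(w[i] * x[i] * math.sin(2.0 * math.pi * i * j / n)
                for i in range(n))
        energies.append(a * a + b * b)
    return energies
```

Feeding in a 64-sample cosine with exactly 8 cycles per frame puts the largest E(j) at j = 8, with smaller values at j = 7 and j = 9 caused by the window.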

Subsequently, the spectral components are integrated (S13). The above frequency conversion yields spectral components up to about 22 kHz, but this embodiment generates feature words from the components in the range from about 340 Hz up to about 4 kHz. This is to accommodate audio compression formats such as the 3GPP standards used for audio playback on mobile phones (although, since digital acoustic data is always supplied in this embodiment, matching against audio recorded by a mobile phone need not be supported). To match the mobile phone audio playback range exactly, the upper limit for feature word generation may instead be set to around 3.4 kHz. In this embodiment, of the frequency components j = 0 to 2047, the components j = 385 to 2047, which lie above about 4 kHz, are not used. Nor are the low-frequency components j = 0 to 32, which lie at or below about 340 Hz. That is, this embodiment uses the frequency components j = 33 to 384. Specifically, processing according to [Formula 2] below is executed, integrating them into values Pn, each covering 11 frequency components.

[Formula 2]
$$P_0 = \left( E(33) + E(34) + \cdots + E(43) \right)^{1/4}$$
$$P_1 = \left( E(44) + E(45) + \cdots + E(54) \right)^{1/4}$$
$$\vdots$$
$$P_{31} = \left( E(374) + E(375) + \cdots + E(384) \right)^{1/4}$$

By the above [Formula 2], the 352 frequency components j = 33 to 384 are integrated into 32 frequency components n = 0 to 31. This processing is performed for each acoustic frame, so 32 frequency components are obtained per acoustic frame.
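The integration of [Formula 2] amounts to summing 11 consecutive intensity values and taking the fourth root. A minimal sketch, with the hypothetical name `integrate_bands` and the constants taken from the text:

```python
from typing import List, Sequence

FIRST_BIN = 33      # lowest frequency index j that is used
BINS_PER_BAND = 11  # frequency components merged into each band
NUM_BANDS = 32      # bands n = 0..31


def integrate_bands(e: Sequence[float]) -> List[float]:
    """Return [P0, ..., P31] from the intensity values E(j) of one frame."""
    needed = FIRST_BIN + BINS_PER_BAND * NUM_BANDS  # 385: bins 0..384
    if len(e) < needed:
        raise ValueError(f"need at least {needed} intensity values")
    bands = []
    for n in range(NUM_BANDS):
        lo = FIRST_BIN + n * BINS_PER_BAND
        # Fourth root of the summed intensity, per [Formula 2].
        bands.append(sum(e[lo:lo + BINS_PER_BAND]) ** 0.25)
    return bands
```

Since E(j) is a squared magnitude, the fourth root yields a value that grows with the square root of the band's amplitude, which compresses the dynamic range between loud and quiet bands.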

Next, for each acoustic frame, the difference from the spectral components of the immediately preceding acoustic frame is calculated (S14). The processing of S11 to S13 is performed on the acoustic frames in sequence, and the inter-frame difference calculation in S14 uses the values P0 to P31 obtained by carrying out the processing up to S13 for each acoustic frame. Specifically, processing according to [Formula 3] below is performed to obtain the inter-frame differences Dn(t).

[Formula 3]
$$D_n(t) = \left| P_n(t) - P_n(t-1) \right|, \quad n = 0, \ldots, 31$$

In [Formula 3], Pn(t) is the integrated frequency component in the t-th acoustic frame. The difference between adjacent acoustic frames is calculated in this way in order to emphasize changes in amplitude level, even where the amplitude level of the acoustic data changes only slightly, and thereby to generate feature words that reflect the characteristics of the acoustic data.

When the inter-frame difference calculation is finished, it is determined whether a predetermined number of frames has been processed (S15), specifically whether t ≥ T. If t < T, t is incremented and the process returns to S11. If t ≥ T, the sum of the T differences Dn(t) obtained is calculated (S16). That is, the processing of S11 to S14 is performed on the acoustic frames in sequence, and once T inter-frame differences Dn(t) have been obtained (T = 11 in this embodiment), they are summed. Specifically, processing according to [Formula 4] below is performed to obtain the sum Sn of the inter-frame differences.

[Formula 4]
$$S_n = \sum_{t=0}^{T-1} D_n(t)$$
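[Formula 3] and [Formula 4] can be combined into a single pass over T frames of band values. The sketch below is illustrative only: the name `summed_differences` is hypothetical, and passing the band values of the frame preceding the group as `p_prev` (to supply Pn(t−1) at t = 0) is an assumption, since the text does not state how the first difference of a group is formed.

```python
from typing import List, Sequence


def summed_differences(p_frames: Sequence[Sequence[float]],
                       p_prev: Sequence[float]) -> List[float]:
    """Return [S_0, ..., S_{B-1}] where S_n = sum_t |P_n(t) - P_n(t-1)|.

    p_frames holds the band values P_n(t) for t = 0..T-1.
    p_prev holds P_n(-1), the band values of the frame before the group
    (an assumption made for this sketch).
    """
    sums = [0.0] * len(p_prev)
    previous = p_prev
    for current in p_frames:
        for n, (now, before) in enumerate(zip(current, previous)):
            sums[n] += abs(now - before)  # D_n(t), accumulated into S_n
        previous = current
    return sums
```

With two bands and three frames, `summed_differences([[1, 5], [3, 5], [2, 9]], [0, 0])` accumulates 1 + 2 + 1 for the first band and 5 + 0 + 4 for the second.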

In [Formula 4], the summation over t = 0, …, T−1 denotes the total obtained as t is increased by 1 from 0 to T−1. Subsequently, the values Sn obtained by [Formula 4] are binarized (S17). Specifically, the array of 32 values Sn is first split into a lower band (n ≤ 13) and an upper band (n ≥ 14). Of the 14 values with n ≤ 13, the 7 largest are assigned 1 and the 7 smallest are assigned 0; of the 18 values with n ≥ 14, the 9 largest are assigned 1 and the 9 smallest are assigned 0. The 32 bands are divided into a group of 18 higher-frequency bands and a group of 14 lower-frequency bands, with 1s and 0s assigned evenly within each group, rather than simply assigning 1 to the 16 largest of all 32 values Sn and 0 to the 16 smallest. This is done to compensate for the effect that various data compression processes have on the frequency characteristics. The boundary between the 18-band and 14-band groups was chosen because experiments showed that this split gave the highest search accuracy when the search acoustic data used for searching had undergone data compression such as MP3. Through the processing in S17, the sum Sn of inter-frame differences for each frequency band n can be expressed in 1 bit. A 32-bit feature pattern Fd(y) is then obtained with n = 0 as the LSB and n = 31 as the MSB. Here, y (= 0, …, Y−1) is a variable indicating the position in sequence among the Y feature words generated from one piece of original acoustic data. y is therefore proportional to the time elapsed since the start of the performance.
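The group-wise binarization of S17 can be sketched as follows. The function name `binarize` is hypothetical, and the handling of ties (equal values keep their original band order, so the lower-numbered band is ranked first) is an assumption, as the text does not address ties.

```python
from typing import Sequence


def binarize(s: Sequence[float], split: int = 14) -> int:
    """Pack the 32 sums S_n into a feature pattern, n = 0 as the LSB.

    Bands 0..split-1 and split..len(s)-1 are ranked separately; the larger
    half of each group gets bit 1, the smaller half bit 0.
    """
    pattern = 0
    for lo, hi in ((0, split), (split, len(s))):
        # Band indices of this group, largest value first (stable for ties).
        ranked = sorted(range(lo, hi), key=lambda n: s[n], reverse=True)
        for n in ranked[:(hi - lo) // 2]:
            pattern |= 1 << n
    return pattern
```

Whatever the input, the result always has 7 + 9 = 16 bits set, so a frequency-dependent attenuation that lowers every value in one group cannot drain that group of all its 1 bits.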

Next, the volume data is calculated (S18). Specifically, the total volume Vol is first calculated using [Formula 5] below.

[Formula 5]
$$\mathrm{Vol} = \sum_{t=0}^{T-1} \left\{ \sum_{n=0}^{31} P_n(t) \right\}$$

In [Formula 5], the summation over t = 0, …, T−1 denotes the total obtained as t is increased by 1 from 0 to T−1. As [Formula 5] shows, the values of all components Pn(t) obtained by the integration process are added up over T acoustic frames. This gives the total volume Vol, the sum of the volume over the T acoustic frames. The total volume Vol is multiplied by a suitably chosen fixed scaling value and normalized to fall within the range 0 to 255, giving the volume data Vd(y). Through this normalization the volume data Vd(y) is expressed in 8 bits. Because the volume data Vd(y) is based on the total volume Vol over T acoustic frames, as [Formula 5] shows, it expresses not the volume of an individual frame but the total volume of T acoustic frames.

The processing of S17 and S18 may also be performed in the reverse order. As a result of S17 and S18, a 40-bit feature word composed of a 32-bit feature pattern and 8-bit volume data is obtained.
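A sketch of S18 and of assembling the 40-bit feature word might look like this. Several details are assumptions made for illustration and are not specified in the text: the scaling value is left to the caller, the scaled volume is truncated to an integer and clamped to 0 to 255, and the 5-byte layout (little-endian pattern followed by the volume byte) is arbitrary. The names `volume_byte` and `pack_feature_word` are hypothetical.

```python
import struct
from typing import Sequence


def volume_byte(p_frames: Sequence[Sequence[float]], scale: float) -> int:
    """Vd = Vol * scale, limited to 0..255, with Vol as in [Formula 5]."""
    vol = sum(sum(bands) for bands in p_frames)  # sum over t and n of P_n(t)
    # Truncation and clamping are assumptions of this sketch.
    return max(0, min(255, int(vol * scale)))


def pack_feature_word(pattern: int, volume: int) -> bytes:
    """Pack a 32-bit pattern and an 8-bit volume into 5 bytes (40 bits).

    Assumed layout: pattern as little-endian uint32, then the volume byte.
    """
    return struct.pack("<IB", pattern, volume)
```

For example, `pack_feature_word(0x01020304, 0xFF)` yields the five bytes `04 03 02 01 FF` under the layout assumed here.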

By executing the above processing on the acoustic frames, many feature words are generated for the acoustic data. For example, with a sampling frequency of 44.1 kHz, 4096 samples per acoustic frame, and acoustic frames overlapping by 2048 samples as in the example above, one feature word covers approximately 0.51 seconds (11 frames advancing 2048 samples each), and about 600 feature words are generated from 5 minutes of acoustic data.
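The figures above can be checked with a few lines of arithmetic. The sketch assumes that consecutive feature words are built from disjoint groups of T frames, so that each word advances the read position by T × 2048 samples; the function name `feature_word_stats` is hypothetical.

```python
from typing import Tuple


def feature_word_stats(sample_rate: int, hop: int, frames_per_word: int,
                       duration_s: float) -> Tuple[float, int]:
    """Return (seconds advanced per feature word, whole words in duration_s)."""
    word_seconds = frames_per_word * hop / sample_rate
    return word_seconds, int(duration_s / word_seconds)


seconds_per_word, word_count = feature_word_stats(44100, 2048, 11, 5 * 60)
print(f"{seconds_per_word:.3f} s per word, {word_count} words in 5 minutes")
```

Under this assumption each word advances by 22528 samples, roughly 0.51 seconds, and a 5-minute recording yields a little under 600 words, in line with the approximate figures quoted in the text.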

Here, the feature word generation process is explained using the conceptual diagram of FIG. 4. FIG. 4(a) shows the waveform of the acoustic data for which feature words are to be generated. The related information registration device reads the acoustic data in units of acoustic frames, with overlapping reading ranges as shown in FIG. 4(b). For each acoustic frame, the frequency components in a predetermined frequency range are extracted and integrated into 32 bands, as shown in FIG. 4(c). This corresponds to S11 to S13. Next, as shown in FIG. 4(d), the differences between adjacent acoustic frames are taken for each band of the integrated components, and the 32-band integrated components are summed. The difference processing corresponds to S14, and the summation of the integrated components corresponds to the summation over the frequency components (n = 0 to 31) in S18. Next, as shown in FIG. 4(e), the 32-band difference components are summed and the volume is summed. The former corresponds to S16, and the latter corresponds to the summation over the acoustic frames (t = 0 to T−1) in S18. Next, as shown in FIG. 4(f), the 32-band summed components are binarized and the volume is compressed (the value calculated from [Formula 5] is multiplied by a predetermined scaling value to compress the volume level to 256 steps). The binarization corresponds to S17, and the volume compression corresponds to S18. As FIG. 4 shows, acoustic frames are read from the acoustic data in sequence, and one feature word is generated for every T acoustic frames.

Once the registered feature word group (the set of registered feature words) has been obtained for the whole of one designated piece of original acoustic data, the representative registered feature data generation means 20 generates representative registered feature data from the registered feature word group. The representative registered feature data consists of a pair: the mean and the standard deviation, in the time direction, of feature components derived from the registered feature words. FIG. 5 is a flowchart showing the calculation of this mean and standard deviation. First, the representative registered feature data generation means 20 generates a registered feature data array by converting the registered feature words to multi-valued form (S19). Specifically, processing according to [Formula 6] below is executed to generate the registered feature data array Z(n, y) (n = 0, …, 31; y = 0, …, Y−1), which holds the feature components derived from the registered feature words.

[Formula 6]
$$Z(n, y) \leftarrow \begin{cases} \mathit{Vd}(y) & \text{if bit } n \text{ of } \mathit{Fd}(y) \text{ is } 1 \\ -\mathit{Vd}(y) & \text{if bit } n \text{ of } \mathit{Fd}(y) \text{ is } 0 \end{cases}$$
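[Formula 6] turns each feature word into 32 signed values. A minimal sketch, in which the function name `multivalue` is hypothetical:

```python
from typing import List


def multivalue(pattern: int, volume: int, bands: int = 32) -> List[int]:
    """Return [Z(0, y), ..., Z(bands-1, y)] for one feature word.

    Bit n of the pattern selects the sign: +volume if the bit is 1,
    -volume if it is 0.
    """
    return [volume if (pattern >> n) & 1 else -volume for n in range(bands)]
```

For instance, `multivalue(0b101, 7, bands=3)` returns `[7, -7, 7]`: bits 0 and 2 are set, bit 1 is clear.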

Next, the representative registered feature data generation means 20 calculates the mean value array Cd(n, r) and the standard deviation array Ld(n, r) of the registered feature data array Z(n, y) in the time direction y (S20). Specifically, processing according to [Formula 7] below is executed to calculate the mean value array Cd(n, r) and the standard deviation array Ld(n, r).

[Formula 7]
$$\mathit{Cd}(n, r) = \frac{1}{Y} \sum_{y=0}^{Y-1} Z(n, y)$$
$$\mathit{Ld}(n, r) = \left[ \frac{1}{Y} \sum_{y=0}^{Y-1} \left( Z(n, y) - \mathit{Cd}(n, r) \right)^2 \right]^{1/2}$$
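[Formula 7] is the per-band mean and population standard deviation (divisor Y, not Y−1) over time. The following sketch computes both for one piece of acoustic data, so the index r is implicit; the function name `representative_data` is hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def representative_data(
        z_rows: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Return (Cd, Ld): per-band mean and standard deviation over time.

    z_rows[y][n] is Z(n, y); there must be at least one row.
    """
    y_count = len(z_rows)
    bands = len(z_rows[0])
    mean = [sum(row[n] for row in z_rows) / y_count for n in range(bands)]
    std = [math.sqrt(sum((row[n] - mean[n]) ** 2 for row in z_rows) / y_count)
           for n in range(bands)]
    return mean, std
```

A band whose bit is set about as often as it is clear ends up with a mean near 0 and a large deviation, while a band whose bit rarely changes ends up with a mean far from 0.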

FIG. 6 is a conceptual diagram of the calculation of the mean and standard deviation of the registered feature data array in S19 and S20. As shown in FIG. 6, the registered feature word has, for each time y (y = 0, …, Y−1), a 32-bit feature pattern F(y) and 8-bit volume data V(y). In FIG. 6, the 32 bits of the feature pattern are denoted Bit0 to Bit31.

Then, as shown in [Formula 6], the value Z(n, y) corresponding to bit n (Bit0 to Bit31) is set according to whether that bit is 1 or 0: either to the volume data value as it is (bit 1) or to the volume data value multiplied by −1 (bit 0). Because the 8-bit volume data may thus become negative, Z(n, y) is expressed in 16 bits. What was expressed in 1 bit per band in the 32-bit feature pattern is now expressed in 16 bits (2 bytes), so the registered feature data at each time y occupies 64 bytes.

The mean value array Cd(n, r) and the standard deviation array Ld(n, r) generated as representative registered feature data by the representative registered feature data generation means 20 are registered in the acoustic database 40 in association with information identifying the acoustic data, such as an acoustic data ID (in one-to-one correspondence with r).

For each piece of original acoustic data, the related information, the acoustic data ID, the registered feature word group, and the representative registered feature data for that original acoustic data are registered in the acoustic database 40 in association with one another. The related information may be any information related to the original acoustic data: for example, the song title and performer name if the original acoustic data is a piece of music, or the sponsor company's name, URL, and the like if it is commercial audio. However, the series of binary-format material data used in producing and mastering the original acoustic data (individual recordings before mixdown, sequenced MIDI data) and the like are normally excluded because of restrictions under copyright law.

<2. Related information search device>
Next, the related information search device for acoustic data according to the present invention (hereinafter, "related information search device") will be described. FIG. 7 is a hardware configuration diagram of the related information search device. Like the related information registration device, the related information search device can be realized by a general-purpose computer. As shown in FIG. 7, it comprises a CPU 3a (Central Processing Unit), a RAM 3b (Random Access Memory) serving as the computer's main memory, a large-capacity data storage device 3c (for example, a hard disk) for storing data, a program storage device 3d (for example, a hard disk) for storing the programs executed by the CPU, a key input I/F 3e for a keyboard, mouse, and the like, a data input/output interface 3f for data communication with external devices (data storage media), and a display output interface 3g for sending information to a display device (display), all connected to one another via a bus.

The program storage device 3d of the related information search device holds a dedicated program that operates the CPU 3a and causes the computer to function as the related information search device. The data storage device 3c stores registered feature words, representative registered feature data, and the like in association with related information, thereby functioning as the acoustic database, and also stores various data needed for processing. Although FIG. 7 shows an example realized on a single computer, the contents of each means may instead be executed, according to the dedicated program, by a personal computer with high-performance arithmetic processing capability that is connected over a network to a server computer on which the acoustic database runs.

FIG. 8 is a functional block diagram of the related information search device according to the present invention. In FIG. 8, 40 is the acoustic database, 45 is mode setting means, 50 is search feature word generation means, 60 is representative search feature data generation means, 70 is collation range determination means, 80 is representative feature data collation means, 90 is feature word collation means, and 100 is information output means. The related information search device uses search acoustic data held by the user to retrieve related information on original acoustic data registered in the acoustic database as related information on the search acoustic data. Search acoustic data is the acoustic data used for the search. In a search, search feature words, that is, feature words generated from the search acoustic data, must be collated with the registered feature words registered in advance in the acoustic database 40. The search feature words and the registered feature words must therefore have basically the same structure (although for the search feature words, a set of multiple (H) feature word groups with shifted phases is generated). The search acoustic data and the original acoustic data from which the search feature words and registered feature words are derived are generally compressed in various encoding formats that differ according to how the data was obtained, so they must be converted to the same encoding format. In this embodiment, both the search acoustic data and the original acoustic data are converted to PCM format with the same specifications (PCM parameters such as sampling frequency: 44.1 kHz, quantization bit depth: 16 bits, number of channels: 1, monaural).

The mode setting means 45 sets which of the modes provided by the related information search device is used for processing, and is realized by input devices such as a keyboard and mouse together with the key input I/F 3e. Two kinds of mode can be set: the search acoustic data mode and the volume determination mode. The search acoustic data mode indicates the state of the search acoustic data, and offers two choices, intro search and full-length search. Intro search applies when the search acoustic data includes the beginning of the acoustic material (for music, the original song). Full-length search applies when the search acoustic data has not been cut out of the acoustic material at all but consists of the entire acoustic material, that is, when it has the same duration as the acoustic material. The volume determination mode sets how the volume component is taken into account in the word-unit dissimilarity D(r, y+w, h, x+w) described later, and offers four choices: Off, Weight, Match, and Both.

Like the registered feature word generation means 10 shown in FIG. 2, the search feature word generation means 50 has a function of performing frequency analysis on the acoustic frames it has read and generating feature words that express the features of the search acoustic data. It differs, however, in that it generates a set of multiple (H) feature word groups with shifted phases. Like the representative registered feature data generation means 20 shown in FIG. 2, the representative search feature data generation means 60 has a function of generating one piece of representative search feature data per piece of search acoustic data, using the search feature word group generated by the search feature word generation means 50. Whereas a search feature word expresses a partial feature of the search acoustic data, the representative search feature data expresses the overall feature of one piece of search acoustic data.

The representative feature data collation means 80 has a function of collating the generated representative search feature data with the representative registered feature data registered in the acoustic database 40. The feature word collation means 90 has a function of collating the generated search feature words with the registered feature words registered in the acoustic database 40. The information output means 100 has a function of extracting from the acoustic database 40, and outputting, the related information on the original acoustic data found by the feature word collation means 90 to be similar in its features to the search acoustic data.

Here, the relationship among the search feature word generation means 50, the representative search feature data generation means 60, the representative feature data collation means 80, and the feature word collation means 90 is outlined using FIG. 9. As shown in FIG. 9, feature word generation processing is executed on each piece of original acoustic data, and the resulting feature words are registered in the acoustic database. Representative feature data generation processing is then executed on those feature words, and one piece of representative feature data is registered per piece of original acoustic data. At search time, the search feature word generation means 50 executes feature word generation processing on the search acoustic data, after which the representative search feature data generation means 60 executes representative feature data generation processing on the feature words to generate one piece of representative feature data. The representative feature data collation means 80 then collates the representative feature data obtained from the search acoustic data with each piece of representative feature data in the acoustic database. Only for the original acoustic data that satisfies the conditions in this collation does the feature word collation means 90 collate the feature words themselves.

In this way, the representative feature data matching unit 80 first narrows down the candidates using the representative feature data, of which only one item is generated per item of acoustic data and which expresses the features of the acoustic data as a whole, thereby excluding acoustic data that differs greatly. The feature word matching unit 90 then matches the remaining, comparatively similar original acoustic data using the feature words, which express partial features, so that an accurate search can be performed.

<2.2. Processing operation of the related information search device>
Next, the processing operation of the device shown in FIG. 8 is described. When a search operator wishes to run a search on search acoustic data in the operator's possession, the operator instructs the related information search device to start and, once it has started, designates the search acoustic data to be searched. This is done by operating buttons on a predetermined computer screen via the key input I/F 3e and designating search acoustic data stored in the data storage device 3c of the related information search device. In practice, search acoustic data is often in a compressed format such as MP3, so it is converted to PCM format before processing. The search operator also sets the search mode and the volume determination mode through the mode setting unit 45. If no setting is made, a normal search, that is, one other than "intro search" or "full-length search", is executed as the search mode, and "Off" is set as the volume determination mode. The CPU 3a writes the settings to a predetermined area of the RAM 3b, where each unit can refer to them. Selecting "intro search" or "full-length search" as the search mode specifies that the search acoustic data starts from the beginning of the acoustic material.

When the instruction is input, the search feature word generation unit 50 reads the designated search acoustic data in units of acoustic frames, each consisting of a predetermined number of samples. This is done in the same way as in the related information registration device: at a sampling frequency of 44.1 kHz, one acoustic frame is 4096 samples, and successive acoustic frames are read so that they overlap by 2048 samples.
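The frame reading described above can be sketched as follows. This is a minimal illustration, not the device's implementation: the function name `read_frames` is hypothetical, the input is assumed to be a plain list of PCM sample values, and dropping an incomplete trailing frame is an assumption that the text does not state.

```python
def read_frames(samples, frame_len=4096, overlap=2048):
    """Split PCM samples into acoustic frames of frame_len samples,
    with consecutive frames overlapping by `overlap` samples."""
    hop = frame_len - overlap  # 2048 new samples per frame
    frames = []
    start = 0
    # Assumption: a trailing partial frame is discarded.
    while start + frame_len <= len(samples):
        frames.append(samples[start:start + frame_len])
        start += hop
    return frames
```

With these defaults, each new frame starts 2048 samples after the previous one, so every sample away from the edges belongs to two frames.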

The processing from here up to the generation of the search feature words follows the flowchart of FIG. 10, which is almost identical to the flowchart of FIG. 3 for generating registered feature words. As in S11, the search feature word generation unit 50 applies a frequency transform to each acoustic frame it has read to obtain the frame spectrum, that is, the spectrum of that acoustic frame (S21). As in the related information registration device, the Fourier transform, the wavelet transform, or various other known methods can be used as the frequency transform, but because the processing must match that of the related information registration device, this embodiment uses the Fourier transform.

Next, the spectral components are calculated (S22). Specifically, as in S12, processing according to the third expression of [Formula 1] above is performed to obtain the intensity value E(j) at each frequency.

Next, the spectral components are thinned out (S23). Specifically, as in S13, processing according to [Formula 2] above is executed to thin the components to Pn, each covering 11 frequency components.

By [Formula 2], the 352 frequency components j = 33 to 384 are thinned to the 32 frequency components n = 0 to 31. This is done for every acoustic frame, so 32 frequency components are obtained per acoustic frame.

Next, for each acoustic frame, the difference from the spectral components of the immediately preceding acoustic frame is calculated (S24). The processing of S21 to S23 is performed on each acoustic frame in turn, and the inter-frame difference calculation in S24 uses P0 to P31 obtained by performing the processing up to S23 on each acoustic frame. Specifically, as in S14, processing according to [Formula 3] above is performed to obtain the inter-frame difference Dn(t).

The processing of S21 to S24 is performed on each acoustic frame in turn, and once T inter-frame differences Dn(t) have been obtained (T = 11 in this embodiment), their sum over the T frames is computed (S26). That is, processing according to [Formula 8] below yields the sum of inter-frame differences Sn(h). In [Formula 8], hs is the number of acoustic frames in the minimum unit by which the phase is shifted, and is defined as T/H. Because hs is meaningful only as an integer, this embodiment discards the fractional part of T/H = 11/5 = 2.2 and sets hs = 2.

[Formula 8]
$$S_n(h) = \sum_{t=0}^{T-1} D_n(t + h \cdot h_s)$$

In [Formula 8] above, h is a phase number identifying the phase, an integer taking the H values 0 ≤ h ≤ H−1. Next, Sn(h) obtained by [Formula 8] is binarized (S27). Specifically, the same processing as in S17 is applied to the Sn(h) array. The array is split into a lower band (n ≤ 13) and an upper band (n ≥ 14). Of the 14 values with n ≤ 13, the 7 largest are assigned 1 and the 7 smallest are assigned 0; of the 18 values with n ≥ 14, the 9 largest are assigned 1 and the 9 smallest are assigned 0. After S27, Sn(h) for each n is expressed in 1 bit. With n = 0 as the LSB and n = 31 as the MSB, a 32-bit feature pattern F(h, x) is obtained. Here x (= 0, ..., X−1) is a variable indicating the position in the sequence of the X feature words generated from the search acoustic data, so x is proportional to the time elapsed since the start of the performance.
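The binarization of S27 can be sketched as below. The function name `binarize` is hypothetical. How ties between equal values are broken is not specified in the text; this sketch assumes the lower band index wins, which is simply the behavior of Python's stable sort.

```python
def binarize(s):
    """Turn the 32 sums S_n(h) into a 32-bit feature pattern.

    Lower band n = 0..13: the 7 largest values get bit 1.
    Upper band n = 14..31: the 9 largest values get bit 1.
    Bit n of the result corresponds to band n (n = 0 is the LSB).
    """
    if len(s) != 32:
        raise ValueError("expected 32 band values")
    pattern = 0
    for lo, hi in ((0, 14), (14, 32)):
        # Rank the band indices by value, largest first (ties: assumption).
        ranked = sorted(range(lo, hi), key=lambda n: s[n], reverse=True)
        for n in ranked[:(hi - lo) // 2]:
            pattern |= 1 << n
    return pattern
```

Because exactly half of each band is set, every feature pattern produced this way contains 16 one-bits, regardless of the overall level of the input.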

Next, the volume data is calculated (S28). Specifically, the total volume Vol(h) is first calculated for each phase number h using [Formula 9] below.

[Formula 9]
$$\mathrm{Vol}(h) = \sum_{t=0}^{T-1} \left\{ \sum_{n=0}^{31} P_n(t + h \cdot h_s) \right\}$$

As shown in [Formula 9] above, the values of all thinned components Pn(t + h·hs) are added over T acoustic frames. This gives, for each phase number h, the total volume Vol(h), the sum of the volume over the T acoustic frames. The total volume Vol is multiplied by a suitably chosen fixed scaling value and normalized to fall within the range 0 to 255, yielding the volume data V(h, x). Through this normalization, the volume data V(h, x) is expressed in 8 bits. Because it is based on the sum of the volume over T acoustic frames, as [Formula 9] shows, V(h, x) represents the total volume of T acoustic frames rather than the volume of any single frame.

The processing of S27 and S28 can also be performed in the reverse order. The result of S27 and S28 is a 40-bit feature word consisting of a 32-bit feature pattern and 8-bit volume data.
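The volume calculation of [Formula 9] and the assembly of a 40-bit feature word can be sketched as follows. Several points are assumptions made only for illustration: the function names, the `scale` parameter standing in for the fixed scaling value (whose actual value is not given), the clamping used to keep the result in 0 to 255, and the bit layout of the 40-bit word (feature pattern in the upper 32 bits, volume in the lower 8), which the text does not specify.

```python
def volume_data(p_frames, scale):
    """p_frames: T lists, each holding the 32 thinned components P_n
    of one acoustic frame. Returns the 8-bit volume data (0..255)."""
    vol = sum(sum(frame) for frame in p_frames)  # Formula 9
    # Assumption: scale, truncate, then clamp into the 8-bit range.
    return max(0, min(255, int(vol * scale)))


def pack_feature_word(pattern, volume):
    """Combine a 32-bit feature pattern and 8-bit volume into 40 bits.
    Assumed layout: pattern in bits 39..8, volume in bits 7..0."""
    return ((pattern & 0xFFFFFFFF) << 8) | (volume & 0xFF)
```

For example, 11 frames whose 32 components are all 1 give a total volume of 352, which a scaling value of 0.5 maps to 176.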

By executing the above processing on each acoustic frame, a large number (X) of feature words are generated for the acoustic data. For example, with a sampling frequency of 44.1 kHz, 4096 samples per acoustic frame, and 2048 samples of overlap between acoustic frames as above, one feature word corresponds to about 0.506 seconds, so about 60 (= X) feature words are generated from 30 seconds of acoustic data.

As described above, the related information search device generates H times as many feature words per unit time of acoustic data as the related information registration device (H = 5 in this embodiment). (Overall, however, far fewer feature words are generated by the related information search device than by the related information registration device.) Compared with the data handled by the related information registration device, the search acoustic data to be searched is often an excerpt cut from a piece of music, because it may have been obtained by a user recording part of music being played. Its timing therefore does not necessarily coincide with that of the original acoustic data registered in the database, and a positional offset can occur. The method used in the related information registration device is itself fairly robust to positional offset, because it generates each feature word by averaging 11 acoustic frames. However, for search acoustic data with sharp rhythmic changes, an offset of about 5 acoustic frames, roughly half of the 11 acoustic frames that form the unit of feature word generation, produces markedly different feature words, and wrong information is retrieved. For this reason, a plurality of search feature words are generated by delaying the start (shifting the phase) in steps of hs (= 2 acoustic frames) within the range of the 11 acoustic frames that form one analysis unit, and these are matched against the registered feature words in the acoustic database 40.

Specifically, the related information registration device generates one feature word from frames 1 to 11, whereas the related information search device generates a feature word from frames 1 to 11 and also from groups of acoustic frames shifted (phase-changed) by 2, 4, 6, and 8 acoustic frames, that is, from frames 3 to 13, 5 to 15, 7 to 17, and 9 to 19. Executing the processing of the flowchart in FIG. 10 on each acoustic frame thus yields [F(h, x), V(h, x)] (h = 0, ..., H−1; x = 0, ..., X−1) as the array making up the X search feature words.
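The phase-shifted summation of [Formula 8] can be illustrated with the sketch below. It covers only the first feature word (x = 0); the function name `phase_sums` and the list-of-lists input layout are assumptions. Frame numbers in the text are 1-based, whereas the list indices here are 0-based, so "frames 3 to 13" corresponds to indices 2 to 12.

```python
def phase_sums(d, T=11, H=5):
    """d[t][n] is the inter-frame difference D_n(t).
    Returns sums[h][n] = sum over t = 0..T-1 of D_n(t + h * hs)."""
    hs = T // H  # 11 // 5 = 2 frames per phase step
    n_bands = len(d[0])
    sums = []
    for h in range(H):
        window = d[h * hs:h * hs + T]
        if len(window) < T:
            raise ValueError("need at least T + (H - 1) * hs frames")
        sums.append([sum(frame[n] for frame in window)
                     for n in range(n_bands)])
    return sums
```

With T = 11 and H = 5, the five windows start at indices 0, 2, 4, 6, and 8, so 19 frames are needed to produce all five phases of one feature word.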

Once the H groups of search feature words have been generated to account for phase offset, the representative search feature data generation unit 60 uses them to generate the representative search feature data. The representative search feature data consists of a pair, the mean and the standard deviation along the time direction, of feature components based on the search feature words. FIG. 11 is a flowchart of the processing that calculates this mean and standard deviation. First, the representative search feature data generation unit 60 converts the search feature words to multi-level values to generate a search feature data array (S29). Specifically, processing according to [Formula 10] below generates the search feature data array Z(n, h, x) (n = 0, ..., 31; h = 0, ..., H−1; x = 0, ..., X−1), the feature components based on the search feature words.

[Formula 10]
$$Z(n,h,x) \leftarrow \begin{cases} V(h,x) & \text{if bit } n \text{ of } F(h,x) \text{ is } 1 \\ -V(h,x) & \text{if bit } n \text{ of } F(h,x) \text{ is } 0 \end{cases}$$

Next, the representative search feature data generation unit 60 uses the search feature data array Z(n, h, x) to calculate the mean array C(n) and the standard deviation array L(n) (S30), by processing according to [Formula 11] below.

[Formula 11]
$$C(n) = \frac{1}{5X} \sum_{x=0}^{X-1} \sum_{h=0}^{H-1} Z(n,h,x)$$
$$L(n) = \left[ \frac{1}{5X} \sum_{x=0}^{X-1} \sum_{h=0}^{H-1} \bigl( Z(n,h,x) - C(n) \bigr)^2 \right]^{1/2}$$

FIG. 12 is a conceptual diagram of the calculation, in S29 and S30, of the mean and standard deviation of the representative search feature data, shown for one particular phase h. As shown in FIG. 12, the search feature word for each time x (x = 0, ..., X−1) has a 32-bit feature pattern and 8-bit volume data. In FIG. 12 the 32 bits of the feature pattern are labeled Bit0 to Bit31.

As shown in [Formula 10] above, whether bit n (Bit0 to Bit31) is 0 or 1 determines whether Z(n, h, x) takes the volume data value as is or that value multiplied by −1, and this fixes the value of Z(n, h, x) for bit n. Because the 8-bit volume data may thereby become negative, Z(n, h, x) is expressed in 16 bits. What the 32-bit feature pattern expressed in 1 bit per band is now expressed in 16 bits (2 bytes) per band, so the search feature data at each time x occupies 64 bytes.
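[Formula 10] and [Formula 11] can be sketched together as follows. The function name `representative_feature` and the nested-list input `words[h][x] = (F, V)` are assumptions made for illustration. The formulas divide by 5X, which equals H·X for the H = 5 of this embodiment; the sketch divides by the actual number of words supplied so that it also runs with other values of H.

```python
import math


def representative_feature(words):
    """words[h][x] is a pair (F, V): 32-bit feature pattern, 8-bit volume.
    Returns (C, L): per-band mean and standard deviation of Z(n, h, x)."""
    flat = [fv for phase in words for fv in phase]
    count = len(flat)  # H * X words in total
    C, L = [], []
    for n in range(32):
        # Formula 10: +V if bit n of F is 1, otherwise -V.
        z = [v if (f >> n) & 1 else -v for f, v in flat]
        mean = sum(z) / count
        variance = sum((zi - mean) ** 2 for zi in z) / count
        C.append(mean)
        L.append(math.sqrt(variance))
    return C, L
```

A band whose bit is set in every word ends up with a positive mean and a standard deviation that reflects only the variation in volume, whereas a band whose bit flips often has a mean near zero and a large standard deviation.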

Once the search feature word groups and the representative search feature data have been obtained for the search acoustic data, matching is actually performed with reference to the acoustic database 40. First, the matching range determination unit 70 determines the matching range of the registered feature words. Because search acoustic data is acoustic data that a user supplies for a search, it does not necessarily contain an entire piece of music; it may have been obtained by recording only part of a piece. The matching range determination unit 70 therefore determines a matching range for efficient matching on the premise that it is unknown which part of the original piece the search acoustic data was taken from. FIG. 13 is a conceptual diagram of matching range determination. The width of each rectangle in FIG. 13 represents the number of feature words in the feature word group.

Specifically, the matching range determination unit 70 first compares the number of search feature words X in the search feature word group with the number of registered feature words Y(r) in the registered feature word group of a given record r. This comparison treats the number of search feature words as proportional to playback length, so X here is the count without regard to phase.

If this comparison shows that X is greater than Y(r), as in FIG. 13(a), the matching range determination unit 70 excludes the record from the search and moves to the next record. X greater than Y(r) means that the search acoustic data, which may itself be only an excerpt, is longer than the original acoustic data. Since the original acoustic data records the acoustic data in its entirety, the two are evidently based on different original acoustic data, and the record is therefore excluded from the search.

If the comparison shows that X is smaller than Y(r), the matching range determination unit 70 determines whether the search feature word group contains feature words created from the beginning of the original acoustic data. Specifically, it refers to the predetermined area of the RAM 3b and checks whether intro search or full-length search has been set through the mode setting unit 45. When one of these search modes is set, the search feature word group evidently contains feature words created from the beginning of the original acoustic data, so if the search acoustic data and the original acoustic data are the same, the first search feature word and the first registered feature word should match. In that case, the matching range determination unit 70 sets the first search feature word and the first registered feature word as the matching targets. However, comparing only a single leading feature word does not give a reliable result, because other pieces of music may have a similar feature word. The matching range determination unit 70 therefore takes as matching targets α feature words corresponding to a predetermined length of time. This embodiment uses a feature word group corresponding to about 12 seconds. Since one feature word is about 0.506 seconds in this embodiment, as noted above, 12 seconds corresponds to 24 feature words. In this case, the first 24 feature words of the search feature word group and the first 24 feature words of the registered feature word group are matched. FIG. 13(b) shows this state.

When the search feature word group does not contain feature words created from the beginning of the original acoustic data, and α (< X) feature words from the head of the search feature word group are the matching targets, the matching range determination unit 70 sets the range in the registered feature word group against which the first search feature word (x = 0) is matched to run from the head (y = 0) to the position X feature words (the number in the search feature word group) before the last registered feature word (y = Y(r)−1). That is, as shown in FIG. 13(c), the range runs from the head (y = 0) to the (Y(r)−X−1)-th registered feature word.

In this embodiment the standard setting matches α feature words from the head of the search feature word group, but it is also possible to set the α feature words at the tail or at the center of the search feature word group as the matching targets. When the α feature words at the tail are set, the matching range in the registered feature word group for the first search feature word (x = 0) runs, as shown in FIG. 13(d), from the (X−α)-th to the (Y(r)−α)-th registered feature word.

When the α feature words at the center of the search feature word group are set as the matching targets, the matching range in the registered feature word group for the first search feature word (x = 0) runs, as shown in FIG. 13(e), from the {(X−α)/2}-th to the {Y(r)−α−(X−α)/2}-th registered feature word. The matching range determination unit 70 passes the matching range of the registered feature words determined in this way to the feature word matching unit 90 for each record.
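The matching ranges for the head, tail, and center settings described above can be collected in one sketch. The function name `matching_range`, the `part` parameter, and the use of integer division for (X−α)/2 are assumptions; the intro and full-length search modes, which match the leading α words directly, are not covered.

```python
def matching_range(X, Y, alpha, part="head"):
    """Return (first, last) indices of the registered feature words to
    scan for one record, or None if the record is excluded.

    X: number of search feature words (phase not counted)
    Y: number of registered feature words Y(r) of the record
    alpha: number of search feature words actually matched
    """
    if X > Y:
        return None  # search data longer than the original: exclude
    if part == "head":
        return (0, Y - X - 1)
    if part == "tail":
        return (X - alpha, Y - alpha)
    if part == "center":
        offset = (X - alpha) // 2  # assumption: fraction discarded
        return (offset, Y - alpha - offset)
    raise ValueError("part must be 'head', 'tail' or 'center'")
```

For a 30-second query (X = 60), a record with Y(r) = 400, and α = 24, the head setting scans indices 0 to 339, the tail setting 36 to 376, and the center setting 18 to 358.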

Once the matching range of the registered feature words has been determined, the representative feature data matching unit 80 narrows down the records to be searched using the representative search feature data. Specifically, it compares the representative search feature data with the representative registered feature data registered in the acoustic database 40 and extracts only the records that satisfy a predetermined condition as targets for the feature word matching unit 90. The feature word matching unit 90 then uses the search feature word groups to search for the related information of the original acoustic data registered in the acoustic database 40. Specifically, it compares the search feature word groups with the registered feature word groups registered in the acoustic database 40 and extracts the records that satisfy a predetermined condition. Because comparison using the search feature word groups is computationally expensive, the processing by the representative feature data matching unit 80 is performed first to narrow down the records to be searched.

Here, the concept of the representative feature data handled by the representative feature data matching unit 80 is explained. FIG. 14 illustrates this concept. As shown in FIGS. 6 and 12, the representative feature data has a value for each of the 32 integrated frequency bands, and therefore has a 32-dimensional structure in which each integrated frequency band is one dimension. In narrowing down with the representative feature data, only the original acoustic data whose representative registered feature data intersects the representative search feature data of the search acoustic data, when plotted in the 32-dimensional space, is extracted as the narrowed-down result.

For conceptual illustration, FIG. 14 plots two of the 32 dimensions of the representative feature data. In FIG. 14, each black dot is the mean for one item of acoustic data, and the circle centered on the dot has a radius equal to the standard deviation for that item. Original acoustic data whose circle does not intersect the circle of the search acoustic data is excluded from the search, and only original acoustic data whose circle does intersect it is extracted as the narrowed-down result. Whether to extract data that intersects in all 32 dimensions or in at least a predetermined number of dimensions can be set as appropriate. In this embodiment, as shown in [Formula 7] and [Formula 11], each of the 32 frequency bands is made two-dimensional by its mean and standard deviation.
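One way to read the intersection test of FIG. 14 is as a per-band overlap check, sketched below. This is an interpretation, not the device's actual criterion: the representative feature data distance U(r) that the device uses is defined by a formula outside this passage. Here two "circles" are taken to intersect in band n when the distance between the means does not exceed the sum of the standard deviations; the function name and the `min_dims` parameter are hypothetical.

```python
def passes_narrowing(C, L, Cd, Ld, min_dims=32):
    """C, L: mean and standard deviation arrays of the search data.
    Cd, Ld: the same for one registered record.
    Returns True if the record overlaps in at least min_dims bands."""
    hits = sum(
        1
        for c, l, cd, ld in zip(C, L, Cd, Ld)
        if abs(c - cd) <= l + ld  # intervals [mean - sd, mean + sd] overlap
    )
    return hits >= min_dims
```

Setting `min_dims` to 32 corresponds to requiring an intersection in all 32 dimensions, while a smaller value corresponds to requiring it in only a predetermined number of dimensions.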

Next, the search processing by the representative feature data matching unit 80 and the feature word matching unit 90 is described with reference to the flowcharts of FIGS. 15 to 19. In FIGS. 15 to 19, the variables are defined as follows.

[Registered feature words]
R: number of records (the number of items of original acoustic data managed by the acoustic database 40)
Y(r): number of registered feature words of record r (r = 0, ..., R−1)
Fd(r, y): feature pattern array of record r (r = 0, ..., R−1; y = 0, ..., Y−1), 32 bits
Vd(r, y): volume data array of record r (r = 0, ..., R−1; y = 0, ..., Y−1), 8 bits
Cd(n, r): mean of the registered feature data array of record r (r = 0, ..., R−1)
Ld(n, r): standard deviation of the registered feature data array of record r (r = 0, ..., R−1)

[Search feature words]
X(h): number of search feature words for phase number h (h = 0, ..., H−1)
F(h, x): feature pattern array (x = 0, ..., X(h)−1), 32 bits
V(h, x): volume data array (x = 0, ..., X(h)−1), 8 bits
C(n): mean of the search feature data array
L(n): standard deviation of the search feature data array

[Matching variables]
W: number of matching words (w = 0, ..., W−1), that is, the number of registered feature words and search feature words matched (e.g. W = 6)
U(r): representative feature data distance
D(r, y+w, h, x+w): per-word number of mismatched bits of the feature pattern (0 to 32 inclusive)
S(r, y, h, x): summed number of mismatched bits, the sum of the per-word mismatched bit counts D(r, y+w, h, x+w) over the W matching words ($\sum_{w=0}^{W-1} D(r, y+w, h, x+w)$)
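The mismatch counts D and S defined above can be sketched as follows. The device's own calculation method is described elsewhere in the specification; this sketch assumes that the per-word mismatch is the Hamming distance between the two 32-bit feature patterns, computed by XOR and a bit count. The function names and the list arguments `Fd_r` (feature patterns of one record) and `F_h` (search feature patterns of one phase) are hypothetical.

```python
def word_mismatch(f_registered, f_search):
    """D: number of differing bits between two 32-bit feature patterns."""
    return bin((f_registered ^ f_search) & 0xFFFFFFFF).count("1")


def summed_mismatch(Fd_r, F_h, y, x, W=6):
    """S(r, y, h, x): sum of D(r, y+w, h, x+w) for w = 0..W-1."""
    return sum(word_mismatch(Fd_r[y + w], F_h[x + w]) for w in range(W))
```

With W = 6 the summed count ranges from 0 (six identical patterns) to 192 (all 32 bits differing in every word).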

[Decision thresholds (set in advance)]
Mu: decision threshold for the representative feature data distance U(r)
M1: decision threshold for the summed number of mismatched bits S(r, y, ho, x)
M2: decision threshold for the summed number of mismatched bits S(r, y, h, x)
Mw1: decision threshold for the per-word number of mismatched bits D(r, y, ho, x)
Mw2: decision threshold for the per-word number of mismatched bits D(r, y, h, x)

The determination thresholds Mu, M1, M2, Mw1, and Mw2 can be obtained for each piece of original acoustic data as follows. For Mu, when registered feature words and representative registered feature data are generated for the original acoustic data to be registered and stored in record r, several pieces of partial acoustic data are cut from the original acoustic data at random positions. Search feature words and representative search feature data are created for each piece of partial acoustic data, and the representative feature data distance U(r) is calculated for each according to [Formula 12], described later. This is done for every record r, and the maximum of the resulting distances U(r), plus β, is given as Mu. Here β places the threshold slightly above the maximum (for example, β = 1). Because U(r) for the matching record r then never exceeds Mu, a record whose U(r) exceeds Mu can be rejected immediately as non-matching.

For M1 and Mw1, the registered feature words are compared with each of the several sets of search feature words with the phase fixed at h = ho. The word-unit mismatch bit count D(r, y, ho, x) is calculated for a predetermined number of consecutive words by the method described later, and the total mismatch bit count S(r, y, ho, x) is calculated as their sum. The minimum total mismatch bit count Smin(r, y, ho, x) is found, together with the maximum word-unit mismatch bit count Dmax(r, y, ho, x) among the elements D(r, y, ho, x) of that minimum sum. The maximum of Smin(r, y, ho, x) over the sets of search feature words, plus β, is taken as M1, and the maximum of Dmax(r, y, ho, x), plus β, is taken as Mw1.

For M2 and Mw2, the registered feature words are compared with each of the several sets of search feature words while h is varied over the range 0 to H−1. The word-unit mismatch bit count D(r, y, h, x) is calculated for a predetermined number of consecutive words by the method described later, and the total mismatch bit count S(r, y, h, x) is calculated as their sum. The minimum total mismatch bit count Smin(r, y, h, x) is found, together with the maximum word-unit mismatch bit count Dmax(r, y, h, x) among the elements D(r, y, h, x) of that minimum sum. The maximum of Smin(r, y, h, x) over the sets of search feature words, plus β, is taken as M2, and the maximum of Dmax(r, y, h, x), plus β, is taken as Mw2.

The processing above is carried out by having a computer execute a dedicated program, which yields determination thresholds Mu, M1, M2, Mw1, and Mw2 that differ from record to record. The maximum of each threshold over all records is then set as the determination threshold common to all records. In this embodiment the values are Mu = 350, M1 = 70, M2 = 54, Mw1 = 20, and Mw2 = 13. For example, if record 1 has Mu = 350, M1 = 60, M2 = 54, Mw1 = 18, Mw2 = 13 and record 2 has Mu = 300, M1 = 70, M2 = 44, Mw1 = 20, Mw2 = 12, the thresholds common to both records are Mu = 350, M1 = 70, M2 = 54, Mw1 = 20, and Mw2 = 13.
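The step of merging per-record thresholds into common ones can be sketched as follows. This is a minimal illustration, not the patent's program: the dictionary layout and the function name `common_thresholds` are assumptions, and the numbers are the two example records quoted above.

```python
# Per-record thresholds from the worked example in the text.
# The dict-based layout is an assumption made for illustration.
per_record = [
    {"Mu": 350, "M1": 60, "M2": 54, "Mw1": 18, "Mw2": 13},  # record 1
    {"Mu": 300, "M1": 70, "M2": 44, "Mw1": 20, "Mw2": 12},  # record 2
]


def common_thresholds(records):
    """For each threshold name, take the maximum over all records."""
    names = records[0].keys()
    return {name: max(rec[name] for rec in records) for name in names}


common = common_thresholds(per_record)
print(common)
```

Taking the maximum keeps every record's own threshold satisfied, at the cost of a looser filter for records whose individual thresholds were smaller.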

FIG. 15 is a flowchart of the acoustic data search that the feature word collation unit 90 performs using the search feature word group. First, initialization is performed (S210). Specifically, the matching table Smin(c) is set to the initial value Big2, the matching table Rmin(c) to −1, the matching count c to 0, and the record number r to 0. The initial value Big2 need only be sufficiently larger than any value the minimum mismatch bit count can take, and is set in advance.

Next, the feature word collation unit 90 calculates the representative feature data distance U(r) (S211). Specifically, it executes processing according to [Formula 12] below to calculate, for record r, the representative feature data distance U(r), which is the distance between the representative registered feature data and the representative search feature data.

[Formula 12]
$$U(r) = 256 \times \frac{\left[\sum_{n=0}^{31} \left\{\dfrac{Cd(n,r) - C(n)}{Ld(n,r) + L(n)}\right\}^{2}\right]^{1/2}}{\left[\sum_{n=0}^{31} \left\{\dfrac{Cd(n,r)}{Ld(n,r) + L(n)}\right\}^{2}\right]^{1/4} \left[\sum_{n=0}^{31} \left\{\dfrac{C(n)}{Ld(n,r) + L(n)}\right\}^{2}\right]^{1/4}}$$

[Formula 12] contains three bracketed terms. The square root of the first bracketed term is divided by the fourth root of the second and by the fourth root of the third, and the result is multiplied by the normalization coefficient 256. The normalization coefficient can be changed as appropriate to suit the normalization range. In [Formula 12], the summation over n = 0, ..., 31 denotes the sum of the 32 values obtained as n is increased by 1 from 0 to 31. The power root applied to each bracketed term can also be changed as appropriate; in this embodiment they are the square root, the fourth root, and the fourth root, as shown in [Formula 12].
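The calculation of [Formula 12] can be sketched in a few lines. This is an illustrative reading of the formula, not the patent's implementation: the function name `representative_distance` and its argument layout are assumptions, and the sketch assumes that every Ld(n, r) + L(n) is positive and that neither denominator sum is zero.

```python
import math


def representative_distance(cd, ld, c, l, scale=256):
    """Evaluate [Formula 12] for one record.

    cd, ld: means and standard deviations of the registered feature data
    c, l:   means and standard deviations of the search feature data
    All four are sequences of equal length (32 in the text).
    """
    # First bracket: squared normalized differences.
    num = sum(((cd_n - c_n) / (ld_n + l_n)) ** 2
              for cd_n, ld_n, c_n, l_n in zip(cd, ld, c, l))
    # Second bracket: squared normalized registered means.
    den_reg = sum((cd_n / (ld_n + l_n)) ** 2
                  for cd_n, ld_n, l_n in zip(cd, ld, l))
    # Third bracket: squared normalized search means.
    den_search = sum((c_n / (ld_n + l_n)) ** 2
                     for c_n, ld_n, l_n in zip(c, ld, l))
    return scale * math.sqrt(num) / den_reg ** 0.25 / den_search ** 0.25


# Identical registered and search data give a distance of zero.
same = representative_distance([2.0] * 32, [0.5] * 32, [2.0] * 32, [0.5] * 32)
# A constant offset between the two means gives a positive distance.
offset = representative_distance([2.0] * 32, [0.5] * 32, [1.0] * 32, [0.5] * 32)
print(same, offset)
```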

[Formula 12] expresses the distance between the representative search feature data and the representative registered feature data, and is defined on the basis of their normalized correlation coefficient. Specifically, each element n of the representative search feature data is defined as the mean of the search feature data array divided, for each of the 32 elements, by the sum of the standard deviation of the search feature data array and the standard deviation of the registered feature data array, that is, C(n) / (Ld(n, r) + L(n)). Likewise, each element of the representative registered feature data is defined as the mean of the registered feature data array divided, for each of the 32 elements, by the same sum of standard deviations, that is, Cd(n, r) / (Ld(n, r) + L(n)). The normalized correlation coefficient between these 32 elements is given by

$$\frac{\sum_{n=0}^{31} \left\{\dfrac{Cd(n,r) - C(n)}{Ld(n,r) + L(n)}\right\}^{2}}{\left[\sum_{n=0}^{31} \left\{\dfrac{Cd(n,r)}{Ld(n,r) + L(n)}\right\}^{2}\right]^{1/2} \left[\sum_{n=0}^{31} \left\{\dfrac{C(n)}{Ld(n,r) + L(n)}\right\}^{2}\right]^{1/2}}$$

and takes a real value from −1 to +1. In [Formula 12], this normalized correlation coefficient is multiplied by the predetermined integer 256 to obtain an integer representation, and its square root is taken to widen the range of the integer values and so sharpen the contrast between matching and non-matching records. Whether to use the normalized correlation coefficient as is, its square root, or its fourth root is an operational design choice, and the most suitable option may be selected.

The representative feature data distance U(r) corresponds to the distance between the circle of the search acoustic data and the circle of the original acoustic data in FIG. 14. In practice, the criterion for what is extracted as a narrowing result can be set as appropriate, for example the case where the two circles in FIG. 14 intersect, or the case where they lie within a predetermined distance of each other. In this embodiment, the narrowing targets are determined by comparing the representative feature data distance U(r) calculated by [Formula 12] with a determination threshold.

Next, the feature word collation unit 90 compares the calculated representative feature data distance U(r) with the preset determination threshold Mu (S212). If U(r) is greater than or equal to Mu, the process proceeds to S250, increments r, and handles the next record r+1. In that case no detailed collation calculation is performed for record r. In other words, this embodiment narrows down the search targets by the value of the representative feature data distance U(r).

If the comparison in S212 finds that U(r) is smaller than the determination threshold Mu, the registered feature word group registered in association with record r is collated with the search feature word group (S220). For a record that satisfies the predetermined conditions in the collation of S220, its record number r is assigned to Rmin(c) and output. The processing of S220 is described later.

Once Rmin(c) is obtained, it is determined whether Rmin(c) is 0 or more (S230). If Rmin(c) is 0 or more, the record is judged to have matched, and 1 is added to the matching count c (S240). If Rmin(c) is less than 0, the record is judged not to have matched, and the matching count c is not incremented. As described in detail later, in S220 a value less than 0 is set in Rmin(c) as the initial value, and when the record is judged to match, the record number r, which is 0 or more, is assigned to Rmin(c). This is why the match can be decided in S230 simply by testing whether Rmin(c) is 0 or more.

Next, the variable r that identifies the record is incremented, that is, increased by 1 (S250). It is then determined whether r has reached the total number of records R in the acoustic database 40 (S260). If it has not, the process returns to S220 and performs the collation for the next record r. Once every record has been processed and r has reached R, that is, once all R records have been handled, the matching table Rmin(c) is sorted in ascending order of the values in the matching table Smin(c) and output as a list together with the matching count c (S270).
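The overall control flow of FIG. 15 can be sketched as a loop with an early-rejection filter. This is a hedged outline, not the patent's code: `distance` and `collate` are hypothetical callbacks standing in for S211 and S220, the tables Rmin(c) and Smin(c) are modelled as a list of pairs, and the sketch assumes the U(r) check is repeated for every record.

```python
def search_records(num_records, distance, collate, mu):
    """Outline of S210 to S270.

    distance(r) -> U(r) for record r                  (hypothetical callback)
    collate(r)  -> (matched, s_min) for record r      (hypothetical callback)
    """
    matches = []                        # (Smin, Rmin) pairs of matching records
    for r in range(num_records):        # S250/S260: advance until r reaches R
        if distance(r) >= mu:           # S212: reject without detailed collation
            continue
        matched, s_min = collate(r)     # S220: detailed collation
        if matched:                     # S230/S240: count the match
            matches.append((s_min, r))
    matches.sort()                      # S270: ascending order of Smin
    return [r for _, r in matches], len(matches)


# Toy data: record 1 is rejected by the distance filter and never collated.
u_values = [10, 400, 20, 5]
collation = {0: (True, 30), 2: (True, 12), 3: (False, 0)}
ranked, count = search_records(4, u_values.__getitem__, collation.__getitem__, 350)
print(ranked, count)
```

The filter matters because U(r) needs only 32 terms per record, whereas the detailed collation scans every registered feature word.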

Next, the collation of record r in S220 of FIG. 15 is described in detail with reference to the flowchart of FIG. 16. First, initialization is performed (S221). Specifically, the matching table Smin(c) is set to the initial value Big2, the matching table Rmin(c) to −1, the variable x identifying the search feature word to 0, the variable h identifying the phase to ho, and the variable y identifying the registered feature word to 0. When H = 5, ho can be set to any of 0, 1, 2, 3, and 4, but it is normally set to ho = 0, which requires the least computation. The total mismatch bit count S(r, y, ho, x) is then calculated (S222). Once it is obtained, it is determined whether S(r, y, ho, x) is less than or equal to the determination threshold M1 (S223). The value of M1 can be set as appropriate and is set in advance in this embodiment. If S(r, y, ho, x) is greater than M1, the process proceeds to S225, increments x, and handles the next search feature word x+1. If S(r, y, ho, x) is less than or equal to M1, the registered feature word y and the search feature word x at the identified (x, y) are collated (S224).

Next, the variable x is incremented (S225), and x is compared with X(h) (S226). If x has not reached X(h), the process returns to S222 and handles the next search feature word x+1. If x has reached X(h), x is reset to 0, h is incremented, and S222 to S227 are repeated; once h reaches H, x is reset to 0 and y is incremented (S227). Then y is compared with Y(r) (S228). If y has not reached Y(r), the process returns to S222 and handles the next registered feature word y+1.

If y has reached Y(r), all Y(r) registered feature words have been processed, so the matching tables Rmin(c) and Smin(c) as they stand at that point are output. These are the matching tables Rmin(c) and Smin(c) obtained in S220 of FIG. 15.

Next, the calculation of the total mismatch bit count S(r, y, ho, x) in S222 of FIG. 16 is described in detail with reference to the flowchart of FIG. 17. First, initialization is performed (S281). Specifically, S(r, y, ho, x) is set to 0 and the variable w, which counts the feature words collated, is set to 0. After initialization, it is determined whether the volume data Vd(r, y+w) of the registered feature word and the volume data V(h, x+w) of the search feature word are both greater than 0 (S282). Vd(r, y+w) and V(h, x+w) are obtained by applying [Formula 5] and [Formula 9] to the original acoustic data and the search acoustic data, respectively, and normalizing the resulting "Vol" and "Vol(h)". Only if both Vd(r, y+w) and V(h, x+w) are found to be greater than 0 in S282 is the word-unit mismatch bit count D(r, y+w, ho, x+w) calculated (S283); this is the number of mismatched bits when one registered feature word is compared with one search feature word.

Next, it is determined whether the word-unit mismatch bit count D(r, y+w, ho, x+w) is less than or equal to the determination threshold Mw1 (S284). The value of Mw1 can be set as appropriate and is set in advance in this embodiment. Only if D(r, y+w, ho, x+w) is found to be less than or equal to Mw1 in S284 is it added to the total mismatch bit count S(r, y, ho, x) (S285).

S285 also increments the variable w that counts the feature words collated. It is then determined whether w has reached the predetermined number W (S286); if not, the process returns to S282 and handles the next feature word. Once each feature word has been processed and w has reached W, the total mismatch bit count S(r, y, ho, x) at that point is output as the total mismatch bit count for the given x, h (= 0), and y. This is the S(r, y, ho, x) obtained in S222 of FIG. 16. If the condition in S282 or S284 is found not to be satisfied, a calculation error for the total mismatch bit count is output.
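The loop of S281 to S286 can be sketched as below. This is an illustrative reading only: the function name `total_mismatch`, the use of `None` to represent the calculation error, and the bounds check are assumptions, and the word-unit count uses the plain bit comparison of volume determination mode "Off".

```python
def total_mismatch(fd, vd, f, v, y, x, w_count, mw):
    """Return the total mismatch bit count S over W consecutive words,
    or None when the calculation ends in an error.

    fd, vd: feature patterns and volume data of one record
    f, v:   feature patterns and volume data of one phase of the query
    mw:     word-unit threshold (Mw1 here)
    """
    total = 0                                        # S281
    for w in range(w_count):                         # S286: until w reaches W
        if y + w >= len(fd) or x + w >= len(f):
            return None   # assumption: running past either array is an error
        if vd[y + w] <= 0 or v[x + w] <= 0:          # S282: both volumes > 0
            return None
        d = bin(fd[y + w] ^ f[x + w]).count("1")     # S283: differing bits
        if d > mw:                                   # S284: word-unit threshold
            return None
        total += d                                   # S285
    return total


fd, vd = [0b1111, 0b0000], [5, 5]
f, v = [0b1011, 0b0001], [3, 3]
ok = total_mismatch(fd, vd, f, v, 0, 0, 2, 13)       # one differing bit per word
too_strict = total_mismatch(fd, vd, f, v, 0, 0, 2, 0)
silent = total_mismatch(fd, vd, f, [3, 0], 0, 0, 2, 13)
print(ok, too_strict, silent)
```

Passing the threshold as a parameter lets the same routine serve the FIG. 19 variant, which differs only in using Mw2 and a variable phase h.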

Next, the intra-record collation in S224 of FIG. 16 is described in detail with reference to the flowchart of FIG. 18. First, initialization is performed (S321): the variable h identifying the phase is set to 0. The total mismatch bit count S(r, y, h, x) is then calculated (S322). Once it is obtained, it is determined whether S(r, y, h, x) is less than or equal to the determination threshold M2 (S323). The value of M2 can be set as appropriate and is set in advance in this embodiment. If S(r, y, h, x) is less than or equal to M2, it is determined whether S(r, y, h, x) is smaller than the minimum total mismatch bit count Smin(c) (S324).

Only if S(r, y, h, x) is smaller than the minimum total mismatch bit count Smin(c) is the value of r set in the matching table Rmin(c) and the value of S(r, y, h, x) set in the matching table Smin(c) (S325). Next, the variable h is incremented (S326), and h is compared with H (S327). If h has not reached H, the process returns to S322 and handles the next phase number h+1.

If h has reached H, all H phases have been processed, so the matching tables Rmin(c) and Smin(c) as they stand at that point are output. These are the matching tables Rmin(c) and Smin(c) obtained in S224 of FIG. 16.
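The phase loop of FIG. 18 amounts to keeping a running minimum under a threshold. The sketch below is an assumption-laden outline: `total_for_phase` is a hypothetical callback that returns S(r, y, h, x) for the current (y, x), or `None` on a calculation error, and the two table entries are passed in and returned as plain values.

```python
def collate_phases(r, total_for_phase, num_phases, m2, s_min, r_min):
    """Outline of S321 to S327 for one (y, x) position of record r."""
    for h in range(num_phases):                        # S321, S326, S327
        s = total_for_phase(h)                         # S322
        if s is not None and s <= m2 and s < s_min:    # S323 and S324
            s_min, r_min = s, r                        # S325: update the tables
    return s_min, r_min


BIG2 = 10**9                     # stand-in for the initial value Big2
totals = [60, None, 40, 50]      # toy S values for phases h = 0..3
best = collate_phases(7, totals.__getitem__, 4, 54, BIG2, -1)
none_found = collate_phases(7, [60, 70].__getitem__, 2, 54, BIG2, -1)
print(best, none_found)
```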

Next, the calculation of the total mismatch bit count S(r, y, h, x) in S322 of FIG. 18 is described in detail with reference to the flowchart of FIG. 19. First, initialization is performed (S381). Specifically, S(r, y, h, x) is set to 0 and the variable w, which counts the feature words collated, is set to 0. After initialization, it is determined whether the volume data Vd(r, y+w) of the registered feature word and the volume data V(h, x+w) of the search feature word are both greater than 0 (S382). Only if both are found to be greater than 0 in S382 is the word-unit mismatch bit count D(r, y+w, h, x+w) calculated (S383); this is the number of mismatched bits when one registered feature word is compared with one search feature word.

Next, it is determined whether the word-unit mismatch bit count D(r, y+w, h, x+w) is less than or equal to the determination threshold Mw2 (S384). The value of Mw2 can be set as appropriate and is set in advance in this embodiment. Only if D(r, y+w, h, x+w) is found to be less than or equal to Mw2 in S384 is it added to the total mismatch bit count S(r, y, h, x) (S385). S385 also increments the variable w that counts the feature words collated. It is then determined whether w has reached the predetermined number W (S386); if not, the process returns to S382 and handles the next feature word. Once each feature word has been processed and w has reached W, the total mismatch bit count S(r, y, h, x) at that point is output as the total mismatch bit count for the given x, h, and y. This is the S(r, y, h, x) obtained in S322 of FIG. 18. If the condition in S382 or S384 is found not to be satisfied, a calculation error for the total mismatch bit count is output.

<2.3. Calculation of the word-unit mismatch bit count>
This section describes the calculation of the word-unit mismatch bit count D(r, y+w, h, x+w) in S283 of FIG. 17 and S383 of FIG. 19. The specific processing depends on the volume determination mode set by the user. There are four volume determination modes: Off, Weight, Match, and Both.

<2.3.1. Volume determination mode "Off">
Volume determination mode "Off" applies no weighting. When it is set, the word-unit mismatch bit count D(r, y+w, h, x+w) is used directly as the word-unit dissimilarity D(r, y+w, h, x+w), which indicates the degree of difference between words. In this mode, D(r, y+w, h, x+w) is initialized to 0, the 32 bits of Fd(r, y+w) and F(h, x+w) are compared bit by bit in corresponding positions, and 1 is added to D(r, y+w, h, x+w) each time the bits differ. In both the registered feature word group and the search feature word group, the feature pattern of a feature word is created by the same rule and is a 32-bit value with the low-frequency component at the LSB and the high-frequency component at the MSB, so collation can be performed by checking whether each of these bit values matches.
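The bit-by-bit comparison of mode "Off" is equivalent to counting the set bits of the XOR of the two patterns. The helper below is a minimal sketch under that observation; the name `mismatch_bits` is an assumption, and the 32-bit mask is added only to guard against wider Python integers.

```python
def mismatch_bits(fd_word, f_word):
    """Count the bit positions in which two 32-bit feature patterns differ."""
    return bin((fd_word ^ f_word) & 0xFFFFFFFF).count("1")


print(mismatch_bits(0b1010, 0b0110))
```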

<2.3.2. Volume determination mode "Weight">
Volume determination mode "Weight" applies weighting. When it is set, D(r, y+w, h, x+w) is initialized to 0, and the 32 bits of Fd(r, y+w) and F(h, x+w) are compared bit by bit in corresponding positions. Based on the result of each comparison, processing according to [Formula 13] below is executed to determine the value of D(r, y+w, h, x+w). As a result, D(r, y+w, h, x+w) is no longer a word-unit mismatch bit count but a word-unit dissimilarity that indicates the degree of difference between words with the volume data taken into account. This mode gives appropriate collation results when the search acoustic data underlying the search feature words has undergone signal processing such as analog conversion, so that both the relative change in volume and the absolute volume differ from those of the original acoustic data.

[Formula 13]
When the bit on the Fd(r, y+w) side is 1 and the bit on the F(h, x+w) side is 0:
D(r, y+w, h, x+w) ← D(r, y+w, h, x+w) + Vd(r, y+w) · 2 / {Vd(r, y+w) + V(h, x+w)}
When the bit on the Fd(r, y+w) side is 0 and the bit on the F(h, x+w) side is 1:
D(r, y+w, h, x+w) ← D(r, y+w, h, x+w) + V(h, x+w) · 2 / {Vd(r, y+w) + V(h, x+w)}
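The weighted accumulation of [Formula 13] can be sketched as follows. This is an illustrative reading: the function name `weighted_mismatch` is an assumption, and the sketch assumes both volume values are positive, as the preceding volume check requires.

```python
def weighted_mismatch(fd_word, f_word, vd, v):
    """Word-unit dissimilarity per [Formula 13].

    fd_word, f_word: 32-bit feature patterns (registered, search)
    vd, v:           their volume data, assumed positive
    """
    d = 0.0
    for bit in range(32):
        b_reg = (fd_word >> bit) & 1
        b_search = (f_word >> bit) & 1
        if b_reg == 1 and b_search == 0:
            d += vd * 2 / (vd + v)     # weight by the registered volume
        elif b_reg == 0 and b_search == 1:
            d += v * 2 / (vd + v)      # weight by the search volume
    return d


print(weighted_mismatch(0b1, 0b0, 6, 2), weighted_mismatch(0b0, 0b1, 6, 2))
```

When the two volumes are equal, each differing bit contributes exactly 1, so the result coincides with the plain mismatch bit count of mode "Off".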

<2.3.3. Volume determination mode "Match">
When volume determination mode "Match" is set, the processing of volume determination mode "Off" is first performed to obtain D(r, y+w, h, x+w). Processing according to [Formula 14] below is then executed to multiply this value by a weight and so determine D(r, y+w, h, x+w). As a result, D(r, y+w, h, x+w) is no longer a word-unit mismatch bit count but a word-unit dissimilarity that indicates the degree of difference between words with the difference in the volume variation pattern taken into account. This mode gives appropriate collation results when the search acoustic data underlying the search feature words has undergone signal processing such as data compression, so that the relative change in volume differs little from that of the original acoustic data but the absolute volume differs. This is the most recommended mode in this embodiment.

[Formula 14]
When Vd(r, y+w) · Vd(r, y+w−1) > V(h, x+w) · V(h, x+w−1):
D(r, y+w, h, x+w) ← D(r, y+w, h, x+w) · Vd(r, y+w) · Vd(r, y+w−1) / {V(h, x+w) · V(h, x+w−1)}
When Vd(r, y+w) · Vd(r, y+w−1) < V(h, x+w) · V(h, x+w−1):
D(r, y+w, h, x+w) ← D(r, y+w, h, x+w) · V(h, x+w) · V(h, x+w−1) / {Vd(r, y+w) · Vd(r, y+w−1)}

When w = 0, Vd(r, y+w−1) = Vd(r, y+w) and V(h, x+w−1) = V(h, x+w) are used in [Formula 14] above.
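
As a non-limiting illustration, the weighting of [Formula 14], including the w = 0 convention, might be sketched in Python as follows. It is assumed here that the “Off” mode result is the plain count of mismatching bits between the two 32-bit words, and that the volume values are held in lists indexed so that position w corresponds to Vd(r, y+w) and V(h, x+w). The function names and the guard for non-positive volume products are likewise assumptions for illustration only.

```python
from typing import Sequence


def mismatch_bits(fd: int, f: int, bits: int = 32) -> int:
    """Assumed "Off" mode result: number of differing bits between two words."""
    mask = (1 << bits) - 1
    return bin((fd ^ f) & mask).count("1")


def match_mode_dissimilarity(d_off: float, vd: Sequence[float], v: Sequence[float], w: int) -> float:
    """Apply the [Formula 14] weight to an "Off" mode value d_off for word w."""
    vd_cur, v_cur = vd[w], v[w]
    # For w = 0 the previous volume is taken to be the current one.
    vd_prev = vd[w - 1] if w > 0 else vd_cur
    v_prev = v[w - 1] if w > 0 else v_cur

    p_vd = vd_cur * vd_prev  # product on the Vd side
    p_v = v_cur * v_prev     # product on the V side
    if p_vd <= 0 or p_v <= 0:
        # Assumption (not stated in the text): leave D unweighted when a
        # product is not positive, to avoid division by zero.
        return float(d_off)
    if p_vd > p_v:
        return d_off * p_vd / p_v
    if p_vd < p_v:
        return d_off * p_v / p_vd
    return float(d_off)  # equal products: [Formula 14] leaves D unchanged
```

In this sketch the weight is always the larger product divided by the smaller one, so it is never less than 1 and D can only grow or stay the same.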

<2.3.4. Volume judgment mode “Both”>
When the sound volume determination mode “Both” is set, the processing for the sound volume determination mode “Weight” is performed first to obtain D(r, y+w, h, x+w). Processing according to [Formula 15] below is then executed, which multiplies D(r, y+w, h, x+w) by a weight to determine its value. As a result, D(r, y+w, h, x+w) is no longer a word-unit mismatch bit number but a word-unit dissimilarity, which indicates the degree of difference between words with the volume data taken into account. This mode gives appropriate collation results when, in collating the registered feature word with the search feature word, the search acoustic data underlying the search feature word has undergone signal processing such as high-ratio data compression with waveform distortion or analog conversion, so that both its relative changes in volume and its absolute volume values differ markedly from those of the original acoustic data.

[Formula 15]
When Vd(r, y+w) · V(h, x+w−1) > V(h, x+w) · Vd(r, y+w−1):
D(r, y+w, h, x+w) ← D(r, y+w, h, x+w) · Vd(r, y+w) · V(h, x+w−1) / {V(h, x+w) · Vd(r, y+w−1)}
When Vd(r, y+w) · V(h, x+w−1) < V(h, x+w) · Vd(r, y+w−1):
D(r, y+w, h, x+w) ← D(r, y+w, h, x+w) · V(h, x+w) · Vd(r, y+w−1) / {Vd(r, y+w) · V(h, x+w−1)}

When w = 0, Vd(r, y+w−1) = Vd(r, y+w) and V(h, x+w−1) = V(h, x+w) are used in [Formula 15] above.
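
As a non-limiting illustration, the weight of [Formula 15] might be sketched in Python as follows. The function names, the argument names, and the guard for non-positive products are assumptions introduced here for illustration. The caller is assumed to pass the previous volume equal to the current one when w = 0, and to supply a D value already obtained by the “Weight” mode processing.

```python
def both_mode_weight(vd_cur: float, vd_prev: float, v_cur: float, v_prev: float) -> float:
    """Weight of [Formula 15] built from the two cross products of volumes."""
    a = vd_cur * v_prev   # Vd(r, y+w) * V(h, x+w-1)
    b = v_cur * vd_prev   # V(h, x+w) * Vd(r, y+w-1)
    if a <= 0 or b <= 0:
        # Assumption (not stated in the text): use a neutral weight when a
        # product is not positive, to avoid division by zero.
        return 1.0
    # Larger product over smaller product; equal products give exactly 1.0.
    return a / b if a > b else b / a


def both_mode_dissimilarity(d_weight: float, vd_cur: float, vd_prev: float,
                            v_cur: float, v_prev: float) -> float:
    """Multiply a "Weight" mode value d_weight by the [Formula 15] weight."""
    return d_weight * both_mode_weight(vd_cur, vd_prev, v_cur, v_prev)
```

One arithmetic property of this weight is that it depends only on the ratio Vd(r, y+w)/Vd(r, y+w−1) relative to V(h, x+w)/V(h, x+w−1): multiplying both V values by the same constant leaves it unchanged, for example `both_mode_weight(4, 2, 3, 3)` and `both_mode_weight(4, 2, 30, 30)` give the same value.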

In modes other than the sound volume determination mode “Off”, D(r, y+w, h, x+w) is not a word-unit mismatch bit number but a word-unit dissimilarity, which indicates the degree of difference between words with the volume data taken into account. Accordingly, S(r, y, h, x) in FIGS. 16 and 17 represents a summed dissimilarity rather than a summed mismatch bit number, and Smin in FIGS. 15 and 16 represents a minimum dissimilarity rather than a minimum mismatch bit number.

In the end, at S270 in FIG. 15, the match table Rmin(c) is sorted in ascending order of the values in the match table Smin(c) and output as a list together with the number of matches c; this list is the result of the search performed with the search acoustic data. When one entry is selected from the output list, the information output means 100 extracts the corresponding record from the acoustic database 40. If acquisition of the original acoustic data is instructed at that time, the information output means 100 uses the file name recorded in the record to output the related information on that original acoustic data stored in the storage device and an ID that identifies that original acoustic data.
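
As a non-limiting illustration, the ascending sort at S270 might be sketched in Python as follows. It is assumed here that Rmin(c) holds a record number and Smin(c) the corresponding minimum dissimilarity for each match c; the function name and the list representation are hypothetical.

```python
from typing import List, Sequence, Tuple


def sort_match_tables(rmin: Sequence[int], smin: Sequence[float]) -> Tuple[List[Tuple[int, float]], int]:
    """Return (record, dissimilarity) pairs in ascending order of Smin, plus the match count."""
    if len(rmin) != len(smin):
        raise ValueError("Rmin and Smin must have the same number of entries")
    # Sort the match indices c by Smin(c); the most similar record comes first.
    order = sorted(range(len(smin)), key=lambda c: smin[c])
    ranked = [(rmin[c], smin[c]) for c in order]
    return ranked, len(ranked)


ranked, count = sort_match_tables([7, 2, 9], [12.5, 3.0, 8.0])
```

With these example tables, `ranked` lists record 2 first, then record 9, then record 7, and `count` is 3.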

Preferred embodiments of the present invention have been described above, but the present invention is not limited to these embodiments and can be modified in various ways. In the above embodiment, various processes are combined so that an accurate search can be performed while the overall processing load is kept low; however, one or more of the combined processes may be omitted. In that case the processing load increases slightly or the search accuracy decreases slightly, but the effect of the present invention is still sufficiently obtained. For example, in the above embodiment the collation range determination means 70 determines the collation range for all of the cases shown in FIGS. 13(a) to 13(e), but it may instead execute processing corresponding to any one or more of FIGS. 13(a) to 13(e). If possible, it is desirable to include the two processes of FIGS. 13(a) and 13(b).

In the above embodiment, as shown in FIGS. 16 and 18, the feature word collation means 90 performs a first collation with the phase h fixed and, when the dissimilarity obtained by the first collation is smaller than the predetermined determination threshold M1, executes a second collation performed while changing the phase h. Instead of this two-stage collation, only the second collation, in which the phase is changed, may be performed. Performing only the second collation increases the processing load but does not change the collation accuracy, so the effect of the present invention can still be obtained through the limitation of the collation range by the collation range determination means 70.

In the above embodiment, as shown in FIGS. 16 and 17, the feature word collation means 90 determines whether the summed mismatch bit number S(r, y, ho, x) is equal to or less than the determination threshold M1 when all W word-unit mismatch bit numbers D(r, y+w, ho, x+w) are smaller than the determination threshold Mw1. Likewise, as shown in FIGS. 18 and 19, it determines whether the summed mismatch bit number S(r, y, h, x) is equal to or less than the determination threshold M2 when all W word-unit mismatch bit numbers D(r, y+w, h, x+w) are smaller than the determination threshold Mw2. In either case, the determination of whether the word-unit mismatch bit number D is equal to or less than the determination thresholds Mw1 and Mw2 may be omitted. When the determination on D is omitted, the collation targets may include individual feature words that differ greatly, but as long as the overall dissimilarity is equal to or less than M1 or M2, the two can still be regarded as similar within an allowable range.
Alternatively, the determination of whether all W word-unit mismatch bit numbers D(r, y+w, ho, x+w) are smaller than the determination threshold Mw1 and the determination of whether the summed mismatch bit number S(r, y, ho, x) is equal to or less than the determination threshold M1 may both be omitted, so that the only determination made is whether all W word-unit mismatch bit numbers D(r, y+w, h, x+w) are smaller than the determination threshold Mw2 and the summed mismatch bit number S(r, y, h, x) is equal to or less than the determination threshold M2.
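
As a non-limiting illustration, the per-word and summed threshold determinations, together with the variation that omits the per-word determination, might be sketched in Python as follows. The function name and the `check_words` flag are hypothetical; `mw` stands for Mw1 or Mw2 and `m` for M1 or M2, depending on the collation stage.

```python
from typing import Sequence


def window_is_similar(d_values: Sequence[float], mw: float, m: float,
                      check_words: bool = True) -> bool:
    """Decide whether W word-unit values D pass the thresholds.

    d_values:    the W word-unit mismatch bit numbers (or dissimilarities) D.
    mw:          per-word determination threshold (Mw1 or Mw2).
    m:           threshold on the sum S (M1 or M2).
    check_words: set to False to omit the per-word determination.
    """
    # Per-word determination: every D must be smaller than mw.
    if check_words and any(d >= mw for d in d_values):
        return False
    # Summed determination: S must be equal to or less than m.
    return sum(d_values) <= m
```

For example, with D values 3, 4 and 20, a per-word threshold of 10 and a sum threshold of 30, the window is rejected when the per-word determination is applied but accepted when it is omitted, since the sum 27 does not exceed 30.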

In the above embodiment, representative registered feature data is registered in the acoustic database by the related information registration device, the representative search feature data generation means 60 generates representative search feature data, and the representative feature data collation means 80 collates the representative registered feature data with the representative search feature data to narrow down the original acoustic data to be searched. When there is no particular need to reduce the processing load, this narrowing-down need not be performed.

In the above embodiment, the number of frequency bands after integration is 32, namely n = 0 to 31, but this number may be increased or decreased as appropriate.

In the above embodiment, four modes are selectable as the sound volume determination mode; however, only two or three of the four modes may be made selectable, or a single mode may be fixedly set.

In the above embodiment, the case where the original acoustic data is not registered in the acoustic database because of copyright concerns has been described. However, the original acoustic data itself may be registered in the acoustic database as part of the related information, so that the original acoustic data corresponding to the record extracted by the search can be acquired.

The present invention is applicable to industries such as the protection of music copyright (monitoring of illegal copying) and the provision of music attribute information (song title search services), both in the field of packaged music for listening in consumer and professional use on CDs, DVDs and the like, and in the field of broadcast and network music distribution delivered for commercial purposes by broadcasters and the like.

DESCRIPTION OF SYMBOLS
2a, 3a ... CPU
2b, 3b ... RAM
2c, 3c ... Data storage device
2d, 3d ... Program storage device
2e, 3e ... Key input I/F
2f, 3f ... Data input/output I/F
2g, 3g ... Display output I/F
10 ... Registered feature word generation means
20 ... Representative registered feature data generation means
30 ... Registration means
40 ... Acoustic database
45 ... Mode setting means
50 ... Search feature word generation means
60 ... Representative search feature data generation means
70 ... Collation range determination means
80 ... Representative feature data collation means
90 ... Feature word collation means
100 ... Information output means

Claims

A device that performs a search by comparing features of search acoustic data, which is given acoustic data, with features of original acoustic data registered in an acoustic database, and identifying original acoustic data having features similar to those of the search acoustic data, the device comprising:
an acoustic database in which, for each piece of original acoustic data, a registered feature word group, which is a set of registered feature words expressing its feature pattern, and related information of the original acoustic data are registered;
search feature word generation means for generating, for each section of the search acoustic data, a search feature word expressing the feature pattern of the search acoustic data in that section, and obtaining a search feature word group that is a set of the search feature words;
collation range determination means for setting, in each of the search feature word group and the registered feature word group, the same number of feature words forming a part of the group as a collation target, and determining, as a collation range, the range over which the collation-target registered feature words in the registered feature word group are moved, based on the relationship between the registered feature word group and the search feature word group; and
feature word collation means for collating the collation target in the search feature word group with the collation target in the registered feature word group registered in the acoustic database while moving the collation target in the registered feature word group within the collation range, and, when the degree of difference is smaller than a predetermined value, identifying the original acoustic data corresponding to that registered feature word group as a selection target,
wherein the collation range determination means excludes a registered feature word group from the search when the number of registered feature words constituting that registered feature word group is smaller than the number of search feature words constituting the search feature word group, and, when the search acoustic data is set as starting from the beginning of the acoustic material, sets a predetermined number of search feature words from the beginning of the search feature word group as the collation target and determines only the predetermined number of registered feature words from the beginning of the registered feature word group as both the collation target and the collation range.

The related information retrieval device for acoustic data according to claim 1,
wherein, when the search acoustic data is not set as starting from the beginning of the acoustic material, the collation range determination means sets a predetermined number of search feature words from the beginning of the search feature word group as the collation target and the remaining search feature words as non-targets, and determines as the collation range the remaining registered feature words from the beginning of the registered feature word group, excluding the same number of registered feature words as the search feature words that are not collation targets.

The related information retrieval device for acoustic data according to claim 1,
wherein, when the search acoustic data is not set as starting from the beginning of the acoustic material, the collation range determination means sets a predetermined number of search feature words from the end of the search feature word group as the collation target and the remaining search feature words as non-targets, and determines as the collation range the remaining registered feature words from the end of the registered feature word group, excluding the same number of registered feature words as the search feature words that are not collation targets.

The related information retrieval device for acoustic data according to claim 1,
wherein, when the search acoustic data is not set as starting from the beginning of the acoustic material, the collation range determination means sets a predetermined number of search feature words at the center of the search feature word group as the collation target and the remaining search feature words as non-targets, and determines as the collation range the central registered feature words that remain after excluding, from the ends of the registered feature word group, the same number of registered feature words as the search feature words that are not collation targets.

The related information retrieval device for acoustic data according to any one of claims 1 to 4,
wherein the search feature word generation means generates H search feature words while changing the phase h, and obtains H search feature word groups, each being a set of the search feature words, and
the feature word collation means sets W search feature words in the search feature word group and W registered feature words in the registered feature word group as the collation targets, collates the W search feature words with the registered feature words while changing the phase h, and, when the word-unit dissimilarity between each pair of words is smaller than a predetermined determination threshold in all W collations, compares the summed dissimilarity, which is the sum of the W word-unit dissimilarities, with the predetermined value as the degree of difference.

A program for causing a computer to function as the related information retrieval device for acoustic data according to any one of claims 1 to 5.