JP5516274B2

JP5516274B2 - Relevant information retrieval device for acoustic data

Info

Publication number: JP5516274B2
Application number: JP2010212337A
Authority: JP
Inventors: 敏雄茂出木
Original assignee: Dai Nippon Printing Co Ltd
Current assignee: Dai Nippon Printing Co Ltd
Priority date: 2010-09-22
Filing date: 2010-09-22
Publication date: 2014-06-11
Anticipated expiration: 2030-09-22
Also published as: JP2012068827A

Description

The present invention relates to a device that searches an acoustic database for related information about acoustic data in which music, such as a musical piece, is recorded.

Recently, services that provide music attribute information, letting a listener learn the title and other details of music that is playing, have been put into practical use. Such services query a broadcast station with the date, time, and region of a broadcast, or record a fragment of the music that is playing with a mobile phone and match it against melodies registered in a database (see, for example, Patent Documents 1 and 2). The applicant has also proposed a technique that converts the features of a predetermined section of acoustic data digitized by PCM or the like into a search feature word of a predetermined number of bytes, groups a plurality of these search feature words into one set, and sequentially matches the search feature word group in one set against the registered feature word groups registered in a database, extracting the records with a high degree of match. This allows a melody obtained by recording to be matched at high speed against the melodies registered in the database (see Patent Document 3).

In the technique of Patent Document 3, a search feature word group created from the acoustic data given as the search query (search acoustic data) is compared with the registered feature word groups registered in the database. The search acoustic data is sometimes obtained by recording music as it is being played. In that case the recording cannot start at the head of the piece, so its position is offset from the original acoustic data from which the registered feature word group was created, and judging the comparison between feature word groups with a uniform threshold does not give an accurate search. To solve this problem, the applicant proposed a technique that sets a determination threshold for each registered feature word group (that is, for each record) rather than uniformly for the entire database (see Patent Document 4).

Patent Document 1: JP-T-2004-536348
Patent Document 2: JP-T-2004-537760
Patent Document 3: JP 2007-226071 A
Patent Document 4: Japanese Patent Application No. 2010-009310

However, with the technique of Patent Document 4, when a plurality of records match as search results, the priority among them may be judged incorrectly. For example, suppose the determination thresholds of two records A and B satisfy A < B, and matching yields a dissimilarity to the search feature word group that is at or below each record's own threshold, so that both records match. If the device is set to give priority to the record with the smaller dissimilarity, record A, which has the smaller determination threshold, tends to be preferred, which makes a fair evaluation difficult.

Accordingly, an object of the present invention is to provide a related information retrieval device for acoustic data that, when acoustic data such as a musical piece is used to search for related information associated with acoustic data registered in an acoustic database, can accurately determine the priority order when a plurality of records match.

To solve the above problem, a first aspect of the present invention provides a related information retrieval device for acoustic data that performs a search by matching the features of search acoustic data, which is the acoustic data given as input, against the features of original acoustic data registered in an acoustic database, and identifying the original acoustic data whose features are close to those of the search acoustic data. The device has an acoustic database, search feature word generation means, and feature word matching means. In the acoustic database, the following are registered for each piece of original acoustic data: a registered feature word group, which is a set of registered feature words expressing the features of the original acoustic data; an individual determination reference value used for comparing registered feature word groups; and related information about the original acoustic data. The search feature word generation means generates, for the search acoustic data and in units of a predetermined section, a search feature word expressing the features of the search acoustic data in each section, and obtains a feature word group (the search feature word group) that is a set of those search feature words. The feature word matching means matches the search feature word group against a registered feature word group registered in the acoustic database while shifting their relative position, and calculates the minimum dissimilarity, which is the smallest degree of difference at the position where the two groups fit best. It then corrects the minimum dissimilarity based on the ratio between a preset standard minimum dissimilarity and the individual determination reference value registered for that registered feature word group to obtain a corrected minimum dissimilarity, and, when the corrected minimum dissimilarity is smaller than a predetermined determination threshold, identifies the original acoustic data corresponding to that registered feature word group as a selection candidate.

According to the first aspect, a registered feature word group, which is a set of registered feature words expressing the feature pattern of each piece of original acoustic data, is registered in the acoustic database in advance. For the search acoustic data, search feature words expressing its features are generated in units of a predetermined section to obtain a search feature word group. The search feature word group is matched against each registered feature word group, and the minimum dissimilarity at the best-fitting position is calculated. This minimum dissimilarity is corrected based on the ratio between the preset standard minimum dissimilarity and the individual determination reference value registered for the registered feature word group, giving a corrected minimum dissimilarity. When the corrected minimum dissimilarity is smaller than a predetermined determination threshold, the original acoustic data corresponding to that registered feature word group is identified as a selection candidate. As a result, when searching for related information associated with acoustic data registered in the acoustic database, the priority order can be determined accurately when a plurality of records match.
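The correction step can be illustrated with a minimal sketch. The text only states that the correction is "based on the ratio" between the standard minimum dissimilarity and the individual determination reference value; multiplying the raw value by `standard / reference_value` is an assumption made here for illustration, and the function names and all numbers are hypothetical.

```python
def corrected_min_dissimilarity(min_dissimilarity: float,
                                reference_value: float,
                                standard: float) -> float:
    """Scale a record's raw minimum dissimilarity by standard / reference_value.

    ASSUMPTION: the source only says the correction is based on this ratio;
    plain multiplication by the ratio is one possible reading.
    """
    if reference_value <= 0:
        raise ValueError("reference_value must be positive")
    return min_dissimilarity * standard / reference_value


def is_selection_candidate(corrected: float, threshold: float) -> bool:
    """A record is selected when its corrected value is below the shared threshold."""
    return corrected < threshold


# Hypothetical numbers: record A has a small reference value, record B a large one.
STANDARD = 40.0
corrected_a = corrected_min_dissimilarity(18, reference_value=20, standard=STANDARD)
corrected_b = corrected_min_dissimilarity(30, reference_value=60, standard=STANDARD)
print(corrected_a, corrected_b)  # 36.0 20.0
```

With these made-up values, record A has the smaller raw dissimilarity (18 against 30), yet record B ends up with the smaller corrected value, so comparing the two records on a common scale no longer favors the record whose own tolerance is tight.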

In a second aspect of the present invention, the related information retrieval device for acoustic data according to the first aspect further has information output means that outputs the related information about the original acoustic data identified as selection candidates, ordered according to the value of the corrected minimum dissimilarity.

According to the second aspect, the related information about the original acoustic data identified as selection candidates is output in an order based on the value of the corrected minimum dissimilarity, so that when a plurality of records match, they are output with a priority that reflects the characteristics of each piece of original acoustic data.
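A minimal sketch of the ordered output follows. It assumes that a smaller corrected minimum dissimilarity means higher priority (the text says only that the output is ordered by this value); the `Candidate` type, the threshold, and the sample records are hypothetical.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    related_info: str                   # e.g. title or other metadata of the record
    corrected_min_dissimilarity: float  # value after the per-record correction


def rank_candidates(candidates: List[Candidate], threshold: float) -> List[Candidate]:
    """Keep records below the threshold and sort them, best match first.

    ASSUMPTION: ascending order of corrected dissimilarity is the priority order.
    """
    selected = [c for c in candidates if c.corrected_min_dissimilarity < threshold]
    return sorted(selected, key=lambda c: c.corrected_min_dissimilarity)


records = [
    Candidate("Song A", 36.0),
    Candidate("Song B", 20.0),
    Candidate("Song C", 55.0),  # above the threshold, so it is not output
]
ranked = rank_candidates(records, threshold=40.0)
print([c.related_info for c in ranked])  # ['Song B', 'Song A']
```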

In a third aspect of the present invention, in the related information retrieval device for acoustic data according to the first or second aspect, each feature word has a feature pattern of a predetermined number of bits in which each component of the frequency spectrum obtained by analyzing the acoustic data is expressed as a bit value per predetermined frequency unit. The feature word matching means matches the registered feature word group against the search feature word group by comparing the feature patterns bit by bit to obtain a mismatch bit count, takes the position where the mismatch bit count is smallest as the position where the registered feature word group and the search feature word group fit best, and obtains the minimum mismatch bit count at that position as the minimum dissimilarity.

According to the third aspect, each feature word has a feature pattern of a predetermined number of bits in which each component of the frequency spectrum obtained by analyzing the acoustic data is expressed as a bit value per predetermined frequency unit. The registered feature word group and the search feature word group are matched by comparing the feature patterns bit by bit to obtain a mismatch bit count, and the minimum mismatch bit count is used as the minimum dissimilarity. Matching can therefore be performed at high speed simply by judging whether individual bits agree or disagree.
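The bitwise matching can be sketched as a sliding comparison that counts differing bits with XOR. This is an illustration only: representing each feature word as a Python integer, summing the mismatch bits over all words at a given position, and the names `mismatch_bits` and `min_mismatch_bits` are assumptions, not details taken from the text.

```python
from typing import Sequence, Tuple


def mismatch_bits(a: int, b: int) -> int:
    """Number of bit positions in which two feature patterns differ."""
    return bin(a ^ b).count("1")


def min_mismatch_bits(search_words: Sequence[int],
                      registered_words: Sequence[int]) -> Tuple[int, int]:
    """Slide the search group along the registered group.

    Returns (minimum summed mismatch bit count, offset where it occurs).
    """
    n = len(search_words)
    if n == 0 or n > len(registered_words):
        raise ValueError("search group must be non-empty and no longer than registered group")
    best_total, best_offset = -1, -1
    for offset in range(len(registered_words) - n + 1):
        window = registered_words[offset:offset + n]
        total = sum(mismatch_bits(s, r) for s, r in zip(search_words, window))
        if best_total < 0 or total < best_total:
            best_total, best_offset = total, offset
    return best_total, best_offset


registered = [0b1010, 0b1100, 0b0011, 0b1111]  # toy 4-bit feature patterns
search = [0b1100, 0b0111]
print(min_mismatch_bits(search, registered))  # (1, 1)
```

In the toy data the search group fits best at offset 1, where only one bit differs.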

In a fourth aspect of the present invention, in the related information retrieval device for acoustic data according to the first or second aspect, each feature word has volume data in addition to a feature pattern of a predetermined number of bits in which each component of the frequency spectrum obtained by analyzing the acoustic data is expressed as a bit value per predetermined frequency unit. The feature word matching means matches the registered feature word group against the search feature word group by comparing the feature patterns bit by bit to obtain a mismatch bit count, and then adds a weight based on the volume data of each feature word to the mismatch bit count, or multiplies the count by that weight, to calculate a weighted mismatch bit count. It takes the position where the weighted mismatch bit count is smallest as the position where the registered feature word group and the search feature word group fit best, and obtains the minimum weighted mismatch bit count at that position as the minimum dissimilarity.

According to the fourth aspect, each feature word has volume data in addition to a feature pattern of a predetermined number of bits in which each component of the frequency spectrum obtained by analyzing the acoustic data is expressed as a bit value per predetermined frequency unit. The registered feature word group and the search feature word group are matched by comparing the feature patterns bit by bit to obtain a mismatch bit count, a weight based on the volume data of each feature word is added to or multiplied by the mismatch bit count to calculate a weighted mismatch bit count, and the minimum weighted mismatch bit count is used as the minimum dissimilarity. This enables high-precision matching that takes fluctuations in volume into account.
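The weighted variant might look like the sketch below. The text says only that a weight based on the volume data is added to or multiplied by the mismatch bit count; using the absolute volume difference between the paired feature words as an additive term, the `volume_weight` coefficient, and the (pattern, volume) tuple layout are all assumptions chosen for illustration.

```python
from typing import Sequence, Tuple

FeatureWord = Tuple[int, int]  # (bit pattern, volume); hypothetical layout


def weighted_mismatch(search: FeatureWord, registered: FeatureWord,
                      volume_weight: float = 1.0) -> float:
    """Mismatch bit count plus an additive volume term.

    ASSUMPTION: the weight is volume_weight * |volume difference|.
    """
    bits = bin(search[0] ^ registered[0]).count("1")
    return bits + volume_weight * abs(search[1] - registered[1])


def min_weighted_mismatch(search_words: Sequence[FeatureWord],
                          registered_words: Sequence[FeatureWord],
                          volume_weight: float = 1.0) -> Tuple[float, int]:
    """Return (minimum weighted mismatch, offset) over all alignments."""
    n = len(search_words)
    if n == 0 or n > len(registered_words):
        raise ValueError("search group must be non-empty and no longer than registered group")
    results = []
    for offset in range(len(registered_words) - n + 1):
        window = registered_words[offset:offset + n]
        total = sum(weighted_mismatch(s, r, volume_weight)
                    for s, r in zip(search_words, window))
        results.append((total, offset))
    return min(results)  # smallest total; ties resolved by smaller offset


registered_fw = [(0b1010, 5), (0b1100, 9), (0b0011, 2)]
search_fw = [(0b1100, 7), (0b0111, 2)]
print(min_weighted_mismatch(search_fw, registered_fw))  # (3.0, 1)
```

Setting `volume_weight` to 0 reduces this to the plain mismatch bit count, which makes it easy to compare the two behaviours on the same data.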

In a fifth aspect of the present invention, in the related information retrieval device for acoustic data according to any one of the first to fourth aspects, the individual determination reference value is calculated in advance as follows. Partial acoustic data is obtained by cutting out part of the original acoustic data corresponding to the registered feature word group, and unit sections of a predetermined length are set on the partial acoustic data. Based on information obtained by analyzing the unit sections, a set of feature words expressing the features of the partial acoustic data is generated as a partial feature word group. The registered feature word group and the partial feature word group are then matched while shifting their relative position in time, and the individual determination reference value is calculated based on the minimum dissimilarity, which is the smallest degree of difference at the position where the registered feature word group and the partial feature word group fit best.

According to the fifth aspect, for partial acoustic data cut out from the original acoustic data corresponding to the registered feature word group, a set of feature words expressing the features of the partial acoustic data is generated as a partial feature word group, based on information obtained by analyzing unit sections of a predetermined length. The registered feature word group and the partial feature word group are matched, and the individual determination reference value is calculated based on the minimum dissimilarity at the position where the two groups fit best. An individual determination reference value suited to each piece of acoustic data can therefore be obtained.

In a sixth aspect of the present invention, in the related information retrieval device for acoustic data according to the fifth aspect, each feature word has a feature pattern of a predetermined number of bits in which each component of the frequency spectrum obtained by analyzing the acoustic data is expressed as a bit value per predetermined frequency unit, and a partial feature word group is generated for each of a plurality of pieces of partial acoustic data cut out at mutually different positions. The registered feature word group and the partial feature word group are matched by comparing the feature patterns bit by bit to obtain a mismatch bit count, and the position where the mismatch bit count is smallest is taken as the position where the registered feature word group and the search feature word group fit best. The minimum mismatch bit count at that position is obtained as the minimum dissimilarity for each piece of partial acoustic data, and the individual determination reference value is obtained based on the average of the minimum mismatch bit counts over the pieces of partial acoustic data.

According to the sixth aspect, each feature word has a feature pattern of a predetermined number of bits in which each component of the frequency spectrum obtained by analyzing the acoustic data is expressed as a bit value per predetermined frequency unit, and a partial feature word group is generated for each of a plurality of pieces of partial acoustic data cut out at mutually different positions. The registered feature word group and the partial feature word group are matched by comparing the feature patterns bit by bit to obtain a mismatch bit count, the minimum mismatch bit count is used as the minimum dissimilarity, and the individual determination reference value is obtained based on the average of these minimum mismatch bit counts. An individual determination reference value suited to high-speed matching can therefore be obtained.
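A minimal sketch of how such a reference value could be derived is shown below. It assumes the partial feature word groups have already been generated from excerpts cut at different positions (feature word generation itself is not shown), and it uses the plain average of the per-excerpt minimum mismatch bit counts as the reference value, although the text says only that the value is obtained "based on" that average. All names and data are hypothetical.

```python
from typing import Sequence


def min_mismatch_bits(words: Sequence[int], registered_words: Sequence[int]) -> int:
    """Smallest summed mismatch bit count over all alignments of the two groups."""
    n = len(words)
    if n == 0 or n > len(registered_words):
        raise ValueError("group must be non-empty and no longer than registered group")
    return min(
        sum(bin(w ^ r).count("1")
            for w, r in zip(words, registered_words[offset:offset + n]))
        for offset in range(len(registered_words) - n + 1)
    )


def individual_reference_value(registered_words: Sequence[int],
                               partial_groups: Sequence[Sequence[int]]) -> float:
    """Average of the minimum mismatch bit counts of each partial group.

    ASSUMPTION: the average is used directly, with no further scaling.
    """
    if not partial_groups:
        raise ValueError("at least one partial feature word group is required")
    minima = [min_mismatch_bits(group, registered_words) for group in partial_groups]
    return sum(minima) / len(minima)


registered = [0b1010, 0b1100, 0b0011, 0b1111]
# Feature words of two excerpts cut from the same recording at different positions;
# they differ slightly from the registered words because the framing is offset.
partials = [[0b1100, 0b0111], [0b0011, 0b1100]]
print(individual_reference_value(registered, partials))  # 1.5
```

A recording whose excerpts reproduce its own registered words poorly receives a large reference value, which is what lets the later correction treat records with different inherent tolerances on a common scale.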

In a seventh aspect of the present invention, in the related information retrieval device for acoustic data according to the fifth aspect, each feature word has volume data in addition to a feature pattern of a predetermined number of bits in which each component of the frequency spectrum obtained by analyzing the acoustic data is expressed as a bit value per predetermined frequency unit, and a partial feature word group is generated for each of a plurality of pieces of partial acoustic data cut out at mutually different positions. The registered feature word group and the partial feature word group are matched by comparing the feature patterns bit by bit to obtain a mismatch bit count, and a weight based on the volume data of each feature word is then added to or multiplied by the mismatch bit count to calculate a weighted mismatch bit count. The position where the weighted mismatch bit count is smallest is taken as the position where the registered feature word group and the partial feature word group fit best, the minimum weighted mismatch bit count at that position is obtained as the minimum dissimilarity for each piece of partial acoustic data, and the individual determination reference value is obtained based on the average of the minimum mismatch bit counts over the pieces of partial acoustic data.

According to the seventh aspect, each feature word has volume data in addition to a feature pattern of a predetermined number of bits in which each component of the frequency spectrum obtained by analyzing the acoustic data is expressed as a bit value per predetermined frequency unit, and a partial feature word group is generated for each of a plurality of pieces of partial acoustic data cut out at mutually different positions. The registered feature word group and the partial feature word group are matched by comparing the feature patterns bit by bit to obtain a mismatch bit count, a weight based on the volume data of each feature word is added to or multiplied by the mismatch bit count to calculate a weighted mismatch bit count, the minimum weighted mismatch bit count is used as the minimum dissimilarity, and the individual determination reference value is obtained based on the average of these minimum weighted mismatch bit counts. An individual determination reference value suited to high-precision matching can therefore be obtained.

In an eighth aspect of the present invention, in the related information retrieval device for acoustic data according to any one of the first to seventh aspects, the search feature word generation means generates the feature word group for the search acoustic data as H search feature word groups while changing a phase h. The feature word matching means performs, as a first matching, a matching between the search feature word group for one specified phase h out of the H phases and a registered feature word group registered in the acoustic database. When, as a result of the first matching, the corrected minimum dissimilarity is smaller than a first determination threshold (MM1), it performs a second matching in which the search feature word group is matched against the registered feature word group while changing the phase h. When, as a result of the second matching, the corrected minimum dissimilarity at the phase h where the registered feature word group and the search feature word group fit best is smaller than a second determination threshold (MM2), it identifies the original acoustic data corresponding to that registered feature word group.

According to the eighth aspect, H search feature word groups are generated while changing the phase h, and a first matching is performed between the search feature word group for one specified phase h and the registered feature word group. When the corrected minimum dissimilarity from the first matching is smaller than the first determination threshold (MM1), a second matching is performed between the search feature word group and the registered feature word group while changing the phase h. When the corrected minimum dissimilarity at the best-fitting phase h in the second matching is smaller than the second determination threshold (MM2), the original acoustic data corresponding to that registered feature word group is identified. Consequently, when the features of the search acoustic data and the original acoustic data differ greatly, the record can be excluded from the search without performing the processing-intensive handling of position offsets, which reduces the processing load of the search as a whole.
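The two-stage flow can be sketched as follows. This is an illustration under assumptions: the H search feature word groups are passed in as a list indexed by phase, the correction multiplies the raw value by `standard / reference_value`, and the thresholds and data are made up. Only the control flow (a cheap single-phase check gated by MM1, then a full check over every phase gated by MM2) follows the text.

```python
from typing import Optional, Sequence, Tuple


def min_mismatch_bits(words: Sequence[int], registered_words: Sequence[int]) -> int:
    """Smallest summed mismatch bit count over all alignments."""
    n = len(words)
    return min(
        sum(bin(w ^ r).count("1")
            for w, r in zip(words, registered_words[offset:offset + n]))
        for offset in range(len(registered_words) - n + 1)
    )


def two_stage_match(search_groups_by_phase: Sequence[Sequence[int]],
                    registered_words: Sequence[int],
                    reference_value: float, standard: float,
                    mm1: float, mm2: float,
                    first_phase: int = 0) -> Optional[Tuple[float, int]]:
    """Return (corrected minimum dissimilarity, best phase), or None if rejected."""
    scale = standard / reference_value  # ASSUMPTION: ratio applied by multiplication
    # First matching: one phase only, cheap rejection of clearly different records.
    first = min_mismatch_bits(search_groups_by_phase[first_phase], registered_words) * scale
    if not first < mm1:
        return None
    # Second matching: try the remaining phases and keep the best-fitting one.
    best_value, best_phase = first, first_phase
    for h, words in enumerate(search_groups_by_phase):
        if h == first_phase:
            continue
        value = min_mismatch_bits(words, registered_words) * scale
        if value < best_value:
            best_value, best_phase = value, h
    return (best_value, best_phase) if best_value < mm2 else None


registered = [0b1010, 0b1100, 0b0011, 0b1111]
phases = [[0b1100, 0b0110], [0b1100, 0b0011]]  # H = 2 hypothetical phases
print(two_stage_match(phases, registered, 2.0, 2.0, mm1=3, mm2=1))  # (0.0, 1)
```

A record whose single-phase result is already at or above MM1 returns `None` immediately, so the loop over all H phases never runs for it.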

In a ninth aspect of the present invention, in the related information retrieval device for acoustic data according to any one of the first to eighth aspects, the acoustic database further registers, for each piece of original acoustic data, representative registered feature data generated using the registered feature word group. The device further has representative search feature data generation means that generates representative search feature data using the generated search feature word group, and representative feature data matching means that matches the representative search feature data against the representative registered feature data registered in the acoustic database. The feature word matching means matches the search feature word group against a registered feature word group registered in the acoustic database only when, as a result of the matching by the representative feature data matching means, the value obtained by correcting the dissimilarity between the representative search feature data and the representative registered feature data using the individual determination reference value is judged to be within a predetermined range.

According to the ninth aspect, the acoustic database further registers, for each piece of original acoustic data, representative registered feature data generated using the registered feature word group. Representative search feature data is generated using the search feature word group and matched against the representative registered feature data, and the search feature word group is matched against the registered feature word group only when the value obtained by correcting the dissimilarity between the representative search feature data and the representative registered feature data using the individual determination reference value is judged to be within a predetermined range. When searching for related information associated with acoustic data registered in the acoustic database, the records to be matched are thus narrowed down using the rough features of the acoustic data, which keeps the processing load low.
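The narrowing step can be sketched as a cheap prefilter that runs before the detailed feature word matching. How the representative feature data is built and compared is not specified in this passage, so the sketch below uses a stand-in: the per-bit fraction of set bits as the representative data and the sum of absolute differences as the dissimilarity. Both choices, the multiplication by `standard / reference_value`, and the `limit` parameter are assumptions for illustration.

```python
from typing import List, Sequence


def representative(words: Sequence[int], n_bits: int) -> List[float]:
    """Hypothetical representative data: fraction of words with each bit set."""
    return [sum((w >> bit) & 1 for w in words) / len(words) for bit in range(n_bits)]


def passes_prefilter(rep_search: Sequence[float], rep_registered: Sequence[float],
                     reference_value: float, standard: float, limit: float) -> bool:
    """True when the corrected coarse dissimilarity is within the allowed range."""
    raw = sum(abs(s - r) for s, r in zip(rep_search, rep_registered))
    corrected = raw * standard / reference_value  # ASSUMPTION: same ratio correction
    return corrected <= limit


registered = [0b1010, 0b1100, 0b0011, 0b1111]
rep_registered = representative(registered, n_bits=4)
print(rep_registered)  # [0.5, 0.75, 0.5, 0.75]

rep_search = representative([0b1100, 0b0011], n_bits=4)
if passes_prefilter(rep_search, rep_registered, reference_value=2.0,
                    standard=2.0, limit=1.0):
    print("run detailed feature word matching for this record")
```

Because the representative data is a short fixed-length vector, this check costs far less than sliding the full feature word groups against each other, which is why it is worth running on every record first.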

According to the present invention, when acoustic data such as a musical piece is used to search for related information associated with acoustic data registered in an acoustic database, the priority order can be determined accurately when a plurality of records match.

FIG. 1 is a hardware configuration diagram of the related information registration device.
FIG. 2 is a functional block diagram of the related information registration device.
FIG. 3 is a flowchart showing the process of generating registered feature words.
FIG. 4 is a conceptual diagram of the process of generating feature words.
FIG. 5 is a flowchart showing the process of calculating representative registered feature data based on registered feature words.
FIG. 6 is a conceptual diagram of the calculation of the mean and standard deviation of the registered feature data array in S19 and S20.
FIG. 7 is a flowchart showing the process of calculating the individual determination reference value.
FIG. 8 is a flowchart showing details of the matching process in S130 of FIG. 7 for partial section z with the phase h fixed (specified).
FIG. 9 is a flowchart showing details of the calculation in S132 of FIG. 8 of the summed mismatch bit count S(y, z, h, x).
FIG. 10 is a flowchart showing details of the matching process in S150 of FIG. 7 for partial section z with the phase h varied.
FIG. 11 is a hardware configuration diagram of the related information retrieval device for acoustic data according to the present invention.
FIG. 12 is a functional block diagram of the related information retrieval device for acoustic data according to the present invention.
FIG. 13 is a diagram outlining the processing of the search feature word generation means 50, the representative search feature data generation means 60, the representative feature data matching means 80, and the feature word matching means 90.
FIG. 14 is a flowchart showing the process of generating search feature words.
FIG. 15 is a flowchart showing the process of calculating representative search feature data based on search feature words.
FIG. 16 is a conceptual diagram of the calculation of the mean and standard deviation of the representative search feature data in S29 and S30.
FIG. 17 is a diagram showing the concept of determining the matching range.
FIG. 18 is a diagram showing the concept of representative feature data.
FIG. 19 is a flowchart of the search using the search feature word group by the feature word matching means 90.
FIG. 20 is a flowchart showing details of the matching process for record r in S240 of FIG. 19.
FIG. 21 is a flowchart showing details of the calculation in S242 of FIG. 20 of the summed mismatch bit count S(r, y, ho, x).
FIG. 22 is a flowchart showing details of the in-record matching process in S245 of FIG. 20.
FIG. 23 is a flowchart showing details of the calculation in S322 of FIG. 22 of the summed mismatch bit count S(r, y, h, x).

＜１．音響データの関連情報登録装置＞
以下、本発明の実施形態について図面を参照して詳細に説明する。まず、音響データの関連情報の登録について説明する。音響データの関連情報の登録は、音響データの関連情報の登録装置（以下「関連情報登録装置」という。）により行う。関連情報登録装置は、原音響データから登録特徴ワード、代表登録特徴データを作成し、当該音響データに関連する関連情報（一般にメタデータと呼ばれる）とともに、作成した登録特徴ワード、代表登録特徴データ、個別判定参考値を、音響データを特定する情報（例えば、音響データＩＤ）と対応付け、音響データベースに登録する。１つの原音響データに関する音響データＩＤ、登録特徴ワード、代表登録特徴データ、個別判定参考値、関連情報は１レコードとして音響データベースに登録される。この音響データベースは、原音響データの関連情報を検索するために用いられるものであり、原音響データ自体は、登録されないのが通常である。これは、著作権上の問題であり、機能的には、原音響データを登録する構成とすることも当然可能である。 <1. Sound data related information registration device>
Embodiments of the present invention are described in detail below with reference to the drawings. First, registration of the related information of acoustic data is described. This registration is performed by a registration device for related information of acoustic data (hereinafter, the "related information registration device"). The related information registration device creates registered feature words and representative registered feature data from the original sound data. It then associates these, together with the individual determination reference values and the related information for that acoustic data (generally called metadata), with information identifying the acoustic data (for example, an acoustic data ID) and registers them in the acoustic database. The acoustic data ID, registered feature words, representative registered feature data, individual determination reference values, and related information for one piece of original sound data are registered in the acoustic database as one record. This acoustic database is used to search for the related information of original sound data, and the original sound data itself is normally not registered. This is for copyright reasons; functionally, a configuration that also registers the original sound data is of course possible.

音響データとは、音楽や音声等をデジタル形式で記録したものであり、アナログ音響信号に対して、ＰＣＭ等の手法によりサンプリングして得られたものである。そして、原音響データとは、検索対象とされる楽曲等の音響素材の音響データである。著作権保護対策から、ＣＤ原盤の品質をもつＰＣＭ形式の音響データは一般にライセンス配布されないことが多いため、音響データベースに登録するデータとしては、あらかじめＭＰ３（MPEG-1/Layer3）などの各種非可逆圧縮処理が施された音響データファイルが与えられる場合が一般的である。しかし、入手できたデータがＭＰ３形式であったとしても、特徴ワードを作成するためには、ＭＰ３形式のデータを伸張し、サンプル列の音響データを生成する必要がある。 Acoustic data is music, speech, or the like recorded in digital form, obtained by sampling an analog acoustic signal with a technique such as PCM. Original sound data is the acoustic data of the acoustic material, such as a piece of music, that is to be searched for. As a copyright protection measure, PCM-format acoustic data of CD-master quality is often not distributed under license, so the data supplied for registration in the acoustic database is generally an acoustic data file to which lossy compression such as MP3 (MPEG-1/Layer3) has already been applied. Even when the available data is in MP3 format, however, the MP3 data must be decompressed to produce acoustic data as a sample sequence before feature words can be created.

図１は、関連情報登録装置のハードウェア構成図である。関連情報登録装置は、汎用のコンピュータで実現することができ、図１に示すように、ＣＰＵ２ａ（CPU: Central Processing Unit）と、コンピュータのメインメモリであるＲＡＭ２ｂ（RAM: Random Access Memory）と、データを記憶するための大容量のデータ記憶装置２ｃ（例えば，ハードディスク）と、ＣＰＵが実行するプログラムを記憶するためのプログラム記憶装置２ｄ（例えば，ハードディスク）と、キーボード、マウス等のキー入力Ｉ／Ｆ２ｅと、外部デバイス（データ記憶媒体）とデータ通信するためのデータ入出力インターフェース２ｆと、表示デバイス（ディスプレイ）に情報を送出するための表示出力インターフェース２ｇと、を備え、互いにバスを介して接続されている。 FIG. 1 is a hardware configuration diagram of the related information registration device. The related information registration device can be realized by a general-purpose computer. As shown in FIG. 1, it comprises a CPU 2a (CPU: Central Processing Unit), a RAM 2b (RAM: Random Access Memory) serving as the computer's main memory, a large-capacity data storage device 2c (for example, a hard disk) for storing data, a program storage device 2d (for example, a hard disk) for storing programs executed by the CPU, a key input I/F 2e for a keyboard, mouse, and the like, a data input/output interface 2f for data communication with an external device (data storage medium), and a display output interface 2g for sending information to a display device (display), all connected to one another via a bus.

関連情報登録装置のプログラム記憶装置２ｄには、ＣＰＵ２ａを動作させ、コンピュータを、関連情報登録装置として機能させるための専用のプログラムが実装されている。また、データ記憶装置２ｃは、処理結果として得られる登録特徴ワード、代表登録特徴データ、個別判定参考値等を関連情報と対応付けて記憶し、音響データベースとして機能するとともに、処理に必要な様々なデータを記憶する。 A dedicated program that operates the CPU 2a and causes the computer to function as the related information registration device is installed in the program storage device 2d of the related information registration device. The data storage device 2c stores the registered feature words, representative registered feature data, individual determination reference values, and the like obtained as processing results in association with the related information, thereby functioning as the acoustic database, and also stores various data needed for processing.

図２は、関連情報登録装置の機能ブロック図である。図２において、１０は登録特徴ワード生成手段、１５は部分特徴ワード生成手段、２０は代表登録特徴データ生成手段、２５は代表部分特徴データ生成手段、３０は判定参考値算出手段、３５は登録手段、４０は音響データベースである。上述のように、各手段は、ＣＰＵ２ａがプログラム記憶装置２ｄから読み込んだ専用のプログラムを実行することにより実現される。 FIG. 2 is a functional block diagram of the related information registration apparatus. In FIG. 2, 10 is a registered feature word generating means, 15 is a partial feature word generating means, 20 is a representative registered feature data generating means, 25 is a representative partial feature data generating means, 30 is a judgment reference value calculating means, and 35 is a registering means. , 40 is an acoustic database. As described above, each unit is realized by the CPU 2a executing a dedicated program read from the program storage device 2d.

登録特徴ワード生成手段１０は、音響データから所定数のサンプルを音響フレームとして順次読み込み、読み込んだ音響フレームを利用して、周波数解析を行い、その音響データの特徴を表現した特徴ワードを生成する機能を有している。この特徴ワードは、ある音響データの特徴を少ないデータ量で表現したものであり、スペクトルの特徴を表した特徴パターンと、音量データにより構成される（著作権法上、生成された特徴ワードより原音響データを再現できない、即ち複製行為ができないことが要求され、特徴ワードはその条件を満たすので音響データベースへの登録が認められている。）。音響データベース４０に登録される特徴ワードを特に「登録特徴ワード」と呼ぶ。また、この特徴ワードを作成する基になる音響データを特に「原音響データ」と呼ぶ。この「原音響データ」としては、著作権者等が有している「原本」となるデータそのものではなく、この「原本」に著作権保護のための改変が施されたものを用いるのが普通である。もちろん「原本」となるデータそのものを「原音響データ」として用いることも可能である。後述するように、本発明においては、部分特徴ワードや検索特徴ワード等の他の特徴ワードが出現するが、これらは、いずれも特徴ワードの基本的な構造としては同一であるが、部分特徴ワードと検索特徴ワードは登録特徴ワードと異なり、後述する検索処理において位相をずらした照合に対応させるため、位相をずらした複数（本実施形態では５種）の特徴ワード群のセットをもたせているという相違がある。また、本明細書では、４０ビット構成の最小単位を「特徴ワード」、照合に用いられる特徴ワードの集合を「特徴ワード群」と呼ぶ。 The registered feature word generation means 10 sequentially reads a predetermined number of samples from the acoustic data as an acoustic frame, performs frequency analysis on the frames read, and generates feature words expressing the features of that acoustic data. A feature word expresses the features of a piece of acoustic data in a small amount of data and consists of a feature pattern representing spectral features and volume data. (Copyright law requires that the original sound data cannot be reproduced, that is, cannot be copied, from the generated feature words; feature words satisfy this condition, so their registration in the acoustic database is permitted.) A feature word registered in the acoustic database 40 is specifically called a "registered feature word", and the acoustic data from which it is created is specifically called "original sound data". The "original sound data" is usually not the "original" data held by the copyright holder or the like, but a version of that "original" modified for copyright protection. The "original" data itself can of course also be used as the "original sound data". As described later, other feature words such as partial feature words and search feature words also appear in the present invention. All of these share the same basic feature word structure. Unlike registered feature words, however, partial feature words and search feature words are provided as a set of several (five in this embodiment) phase-shifted feature word groups, to support the phase-shifted collation in the search process described later. In this specification, the 40-bit minimum unit is called a "feature word", and a collection of feature words used for collation is called a "feature word group".

部分特徴ワード生成手段１５は、原音響データの一部分である部分音響データに対して、登録特徴ワード生成手段１０と同様、所定数のサンプルを音響フレームとして順次読み込み、読み込んだ音響フレームを利用して、周波数解析を行い、その部分音響データの特徴を表現した特徴ワードを生成する機能を有している。登録特徴ワード生成手段１０、部分特徴ワード生成手段１５は、共に同一の原音響データから音響フレームを読み込み、特徴ワードを作成する点で同一であるが、登録特徴ワード生成手段１０は、原音響データ全体（すなわち先頭から最後まで）に対して処理を行うのに対して、部分特徴ワード生成手段１５は、原音響データより複数の短い部分区間（検索時に生成する検索特徴ワードと同程度の５〜１５秒程度の区間）を盲目的に切り出して抽出し、更に各部分区間ごとに位相をずらした複数（本実施形態では５種）の部分音響データを生成する点で異なる。１つの原音響データより何個の部分区間を抽出するかについては、事前に設定しておくことになるが、例えば、５個を抽出する場合、原音響データ全体が５分間の長さで、部分区間の長さが１５秒であったとすると、先頭（０分）〜１５秒、１分０秒〜１分１５秒、２分０秒〜２分１５秒、３分０秒〜３分１５秒、４分〜４分１５秒の５個の部分区間を抽出し、５個の部分区間より生成される各部分音響データに対して処理を行う。 The partial feature word generation unit 15 sequentially reads a predetermined number of samples as acoustic frames for partial acoustic data, which is a part of the original acoustic data, in the same manner as the registered feature word generation unit 10, and uses the read acoustic frames. , It has a function of performing frequency analysis and generating a feature word representing the characteristics of the partial sound data. Both the registered feature word generation means 10 and the partial feature word generation means 15 are the same in that they read an acoustic frame from the same original sound data and create a feature word. The partial feature word generation means 15 performs processing on the whole (that is, from the beginning to the end), while the partial feature word generation means 15 has a plurality of short partial sections (same as the search feature words generated at the time of search). This is different in that a plurality of (five types in the present embodiment) partial sound data are generated by blindly cutting out and extracting a section of about 15 seconds and shifting the phase for each partial section. The number of partial sections to be extracted from one original sound data will be set in advance. 
For example, when five sections are extracted from original sound data that is 5 minutes long and the partial section length is 15 seconds, the five partial sections 0:00 to 0:15, 1:00 to 1:15, 2:00 to 2:15, 3:00 to 3:15, and 4:00 to 4:15 are extracted, and each piece of partial acoustic data generated from these five sections is processed.
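
The example above can be sketched as follows. This is only an illustration: the helper name `partial_sections` is hypothetical, and the equal spacing of section start times (total length divided by the number of sections) is an assumption that happens to match the 5-minute example; the text only says the number of sections is set in advance.

```python
def partial_sections(total_sec, count, section_sec):
    """Return (start, end) times in seconds for `count` partial sections.

    Assumption: sections start at equal intervals of total_sec / count,
    as in the 5-minute, 5-section example in the text.
    """
    step = total_sec // count
    return [(k * step, k * step + section_sec) for k in range(count)]


# 5-minute source, five 15-second sections
sections = partial_sections(300, 5, 15)
```

Each returned pair would then be the source of one piece of partial acoustic data, from which the phase-shifted partial feature word groups are generated.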

音響フレームについては、登録特徴ワード生成手段１０、部分特徴ワード生成手段１５も同じサンプル数で取得し、所定の処理を行って特徴ワードを生成する。音響フレームを構成するサンプル数、特徴ワードの生成手法については、登録特徴ワード生成手段１０、部分特徴ワード生成手段１５で同一である必要があるが、部分特徴ワードについては同一の部分区間の部分音響データをもとに位相をずらした複数のセットの部分特徴ワード群が生成されるようにしている点で相違がある。 For the acoustic frame, the registered feature word generation unit 10 and the partial feature word generation unit 15 also acquire the same number of samples and perform a predetermined process to generate a feature word. The number of samples constituting the sound frame and the feature word generation method need to be the same in the registered feature word generation means 10 and the partial feature word generation means 15, but the partial feature words have partial sounds in the same partial section. There is a difference in that a plurality of sets of partial feature word groups whose phases are shifted based on data are generated.

また、上述のように、登録特徴ワード生成手段１０、部分特徴ワード生成手段１５は、同一サンプル数の音響フレームに対して、若干異なる処理を行って特徴ワードを生成するが、最小単位として生成される４０ビットの特徴ワードの形式は同一である。ただし、上述のように、原音響データ全体を対象として登録特徴ワード生成手段１０が作成する特徴ワードを「登録特徴ワード」、部分音響データを対象として部分特徴ワード生成手段１５が作成する特徴ワードを「部分特徴ワード」と呼ぶ。 As described above, the registered feature word generation means 10 and the partial feature word generation means 15 generate feature words by applying slightly different processing to acoustic frames of the same number of samples, but the format of the 40-bit feature word generated as the minimum unit is identical. As noted above, however, a feature word created by the registered feature word generation means 10 from the entire original sound data is called a "registered feature word", and a feature word created by the partial feature word generation means 15 from partial acoustic data is called a "partial feature word".

代表登録特徴データ生成手段２０は、登録特徴ワード生成手段１０により生成された登録特徴ワード群を用いて、１つの原音響データにつき、１つの代表登録特徴データを生成する機能を有する。登録特徴ワードは、原音響データの部分的な特徴を表現するのに対して、代表登録特徴データは、１つの原音響データの全体的な特徴を表現する。 The representative registered feature data generating unit 20 has a function of generating one representative registered feature data for one original sound data using the registered feature word group generated by the registered feature word generating unit 10. The registered feature word represents a partial feature of the original sound data, whereas the representative registered feature data represents the overall feature of one original sound data.

代表部分特徴データ生成手段２５は、部分特徴ワード生成手段１５により生成された部分特徴ワード群を用いて、１つの部分音響データにつき、１つの代表部分特徴データを生成する機能を有する。部分特徴ワードは、部分音響データの部分的な特徴を表現するのに対して、代表部分特徴データは、１つの部分音響データの全体的な特徴を表現する。 The representative partial feature data generation unit 25 has a function of generating one representative partial feature data for one partial acoustic data using the partial feature word group generated by the partial feature word generation unit 15. The partial feature word represents a partial feature of partial acoustic data, while the representative partial feature data represents an overall feature of one partial acoustic data.

判定参考値算出手段３０は、１つの登録特徴ワード群と複数の部分特徴ワード群との間、および１つの代表登録特徴データと複数の代表部分特徴データとの間での照合を行う。この際、登録特徴ワード群に比べて各部分特徴ワード群の特徴ワード数が顕著に少ないため、１つの登録特徴ワード群と複数の部分特徴ワード群との間については、時間的な位置をずらしながら照合を行う。更に各部分特徴ワード群は位相が異なる複数種で１セットとして構成されているため、各群とも個別に照合を行う必要がある。例えば、登録特徴ワード群が５分の長さに対応しており、５種の１５秒間の部分区間に対応して５種の部分特徴ワード群があり、更に各部分特徴ワード群が位相を変化させて５通り作成されている場合、５×５種の１５秒の部分特徴ワード群を用いて５分の長さの登録特徴ワード群と時間位置を１５／２秒ずつオーバラップさせながら、ずらして照合を行う場合、５×５×３００×２／１５＝１０００通りの組み合わせで照合を行うことになる。これらの組み合わせで照合を行った結果、５種の各部分区間ごとに最小の相違を示す値を算出し、それらの平均値に基づいて、個別判定参考値を算出する。詳細については、後述するが、１つの原音響データについて、１つの登録特徴ワード群と複数の部分特徴ワード群との間の照合により４つの個別判定参考値、１つの代表登録特徴データと複数の代表部分特徴データとの間の照合により１つの個別判定参考値を算出する。 The determination reference value calculation means 30 collates one registered feature word group against a plurality of partial feature word groups, and one piece of representative registered feature data against a plurality of pieces of representative partial feature data. Because each partial feature word group contains markedly fewer feature words than the registered feature word group, the collation between one registered feature word group and the partial feature word groups is performed while shifting the temporal position. In addition, each partial feature word group is organized as a set of several groups with different phases, so each group must be collated individually. For example, suppose the registered feature word group corresponds to a length of 5 minutes, there are five partial feature word groups corresponding to five 15-second partial sections, and each partial feature word group is created in five phase variations. If the resulting 5 × 5 fifteen-second partial feature word groups are collated against the 5-minute registered feature word group while shifting the time position in overlapping steps of 15/2 seconds, the collation is performed for 5 × 5 × 300 × 2/15 = 1000 combinations. From the results of collation over these combinations, the value indicating the smallest difference is calculated for each of the five partial sections, and the individual determination reference values are calculated from the average of those values. Details are given later, but for one piece of original sound data, four individual determination reference values are calculated from the collation between one registered feature word group and the partial feature word groups, and one individual determination reference value is calculated from the collation between one piece of representative registered feature data and the pieces of representative partial feature data.
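
The count of 1000 combinations can be checked with a few lines of arithmetic. The sketch below simply follows the counting used in the text (number of sections × number of phases × number of half-section shifts over the whole source); the function name `collation_count` is hypothetical.

```python
def collation_count(num_sections, num_phases, total_sec, section_sec):
    """Count collations as in the text: sections x phases x time positions.

    The time position advances by half a section (15/2 s in the example),
    giving total_sec * 2 / section_sec positions.
    """
    positions = (total_sec * 2) // section_sec
    return num_sections * num_phases * positions


# 5 sections, 5 phases, 5-minute source, 15-second sections
count = collation_count(5, 5, 300, 15)
```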

登録手段３５は、登録特徴ワード生成手段１０により生成された登録特徴ワード群と、代表登録特徴データ生成手段２０により生成された代表登録特徴データと、判定参考値算出手段３０により算出された５種の個別判定参考値を、元の原音響データの制作や著作権に関連する関連情報（一般にメタデータと呼ばれる）、および原音響データを特定するために原音響データの著作権情報等を管理する事業者が個別に定義付けたＩＤと対応付けて音響データベース４０に登録する機能を有している。ここで、関連情報とは、楽曲名、ジャンル名など楽曲を特定するテキスト情報、作詞・作曲・編曲者名、アーチスト名、プロデューサ名など原音響データの制作に関わる著作権者・著作隣接権者名に関するテキスト情報を示すものである。ただし、原音響データそのものは著作権法上の制約から、音響データベース４０に通常登録することはない。また、原音響データの制作・マスタリングに使用した一連のバイナリ形式の素材データ（ミックスダウンする前の個別の録音データ、ＭＩＤＩ打ち込みデータ）等についても、著作権法上の制約により通常、登録することはない。 The registration means 35 registers, in the acoustic database 40, the registered feature word group generated by the registered feature word generation means 10, the representative registered feature data generated by the representative registered feature data generation means 20, and the five individual determination reference values calculated by the determination reference value calculation means 30. These are registered in association with the related information (generally called metadata) concerning the production and copyright of the original sound data, and with an ID that the business operator managing the copyright information and the like of the original sound data has individually defined to identify it. Here, the related information is text information identifying the piece, such as the title and genre, and text information on the names of the copyright holders and neighboring-rights holders involved in producing the original sound data, such as the lyricist, composer, arranger, artist, and producer. The original sound data itself, however, is normally not registered in the acoustic database 40 because of copyright law restrictions. Likewise, the series of binary material data used in producing and mastering the original sound data (individual recordings before mixdown, MIDI sequence data) is normally not registered because of copyright law restrictions.

＜１．２．関連情報登録装置の処理動作＞
次に、図２に示した関連情報登録装置の処理動作について説明する。まず、関連情報登録装置では、登録特徴ワード生成手段１０が、指定された原音響データから登録特徴ワードを生成する。図３は、登録特徴ワードの生成処理を示すフローチャートである。まず、登録特徴ワード生成手段１０が、原音響データを読み込む。関連情報登録装置では、登録特徴ワード生成手段１０が、指定された原音響データから、所定数のサンプルを１音響フレームとして読み込む。登録特徴ワード生成手段１０が読み込む１音響フレームのサンプル数は、適宜設定することができるが、サンプリング周波数が４４．１ｋＨｚの場合、４０９６サンプル程度とすることが望ましい。これは、約０．０９２秒に相当する。ただし、後述する周波数変換におけるハニング窓関数の利用により、隣接窓間の連続性を考慮して、音響フレームは、所定数分のサンプルを重複させて読み込むことにしている。本実施形態では、音響フレームの区間長のちょうど半分となる２０４８サンプルを重複させている。したがって、先頭の音響フレームはサンプル１〜４０９６、２番目の音響フレームはサンプル２０４９〜６１４４、３番目の音響フレームはサンプル４０９７〜８１９２というように、順次読み込まれていくことになる。 <1.2. Processing operation of related information registration device>
Next, the processing operation of the related information registration apparatus shown in FIG. 2 will be described. First, in the related information registration device, the registered feature word generation means 10 generates a registered feature word from the designated original sound data. FIG. 3 is a flowchart showing a registration feature word generation process. First, the registered feature word generation means 10 reads the original sound data. In the related information registration device, the registered feature word generation means 10 reads a predetermined number of samples as one acoustic frame from the designated original acoustic data. The number of samples of one acoustic frame read by the registered feature word generation unit 10 can be set as appropriate, but is desirably about 4096 samples when the sampling frequency is 44.1 kHz. This corresponds to about 0.092 seconds. However, in consideration of the continuity between adjacent windows by using a Hanning window function in frequency conversion, which will be described later, the acoustic frame is read by overlapping a predetermined number of samples. In this embodiment, 2048 samples that are exactly half the section length of the acoustic frame are overlapped. Therefore, the first acoustic frame is sequentially read as samples 1 to 4096, the second acoustic frame is samples 2049 to 6144, and the third acoustic frame is samples 4097 to 8192.
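
The overlapped frame reading described above can be sketched as follows. The function name `frame_ranges` is hypothetical, and dropping trailing samples that do not fill a whole frame is an assumption; the text does not say how the end of the data is handled.

```python
def frame_ranges(num_samples, frame_len=4096, hop=2048):
    """Return 1-based inclusive (first, last) sample ranges of each frame.

    Consecutive frames overlap by frame_len - hop samples (half a frame
    with the default values). Assumption: a trailing partial frame is
    simply dropped.
    """
    ranges = []
    start = 0
    while start + frame_len <= num_samples:
        ranges.append((start + 1, start + frame_len))
        start += hop
    return ranges


# First three frames of the example in the text
ranges = frame_ranges(8192)
```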

続いて、登録特徴ワード生成手段１０は、読み込んだ各音響フレームに対して、周波数変換を行って、その音響フレームのスペクトルであるフレームスペクトルを得る（Ｓ１１）。具体的には、登録特徴ワード生成手段１０が読み込んだ音響フレームについて、窓関数を利用して周波数変換を行う。周波数変換としては、フーリエ変換、ウェーブレット変換その他公知の種々の手法を用いることができる。本実施形態では、フーリエ変換を用いた場合を例にとって説明する。 Subsequently, the registered feature word generation unit 10 performs frequency conversion on each read sound frame to obtain a frame spectrum that is a spectrum of the sound frame (S11). Specifically, frequency conversion is performed on the acoustic frame read by the registered feature word generation means 10 using a window function. As frequency conversion, Fourier transform, wavelet transform, and other various known methods can be used. In the present embodiment, a case where Fourier transform is used will be described as an example.

ここで、本実施形態においてフーリエ変換に利用する窓関数について説明しておく。一般に、所定の信号に対してフーリエ変換を行う場合、信号を所定の長さに区切って行う必要があるが、この場合、所定の長さの信号に対してそのままフーリエ変換（正確には短時間フーリエ変換とよばれる）を行うと、高域部に擬似成分が発生する。そこで、一般にフーリエ変換を行う場合には、ハニング窓と呼ばれる窓関数を用いて、窓の境界部のコサイン波形状で重みを落とすように信号の値を変化させた後、変化後の値に対してフーリエ変換を実行する。 The window function used for the Fourier transform in this embodiment is described here. In general, to apply a Fourier transform to a signal, the signal must be divided into segments of a predetermined length. If the Fourier transform (strictly, the short-time Fourier transform) is applied to such a segment as it is, spurious components appear in the high-frequency range. For this reason, a window function called a Hanning window is generally used: the signal values are tapered with a cosine-shaped weight toward the window boundaries, and the Fourier transform is then applied to the tapered values.

Ｓ１１においてフーリエ変換を行う場合、具体的には、サンプルｉにおける値Ｘ（ｉ）（ｉ＝０，…，Ｎ−１）に対して、０〜１の実数値をもち、Ｎサンプル区間に定義されるハニング窓関数Ｗ（ｉ）（＝０．５−０．５ｃｏｓ（２πｉ／Ｎ））を用いて、以下の〔数式１〕の第１式、第２式に従った処理を行い、各周波数における実部Ａ（ｊ）、虚部Ｂ（ｊ）を得る。 When the Fourier transform is performed in S11, specifically, the values X(i) (i = 0, ..., N−1) at samples i are processed according to the first and second equations of [Formula 1] below, using the Hanning window function W(i) (= 0.5 − 0.5 cos(2πi/N)), which takes real values from 0 to 1 and is defined over the N-sample section, to obtain the real part A(j) and the imaginary part B(j) at each frequency.

続いて、スペクトル成分の算出を行う（Ｓ１２）。具体的には、以下の〔数式１〕の第３式に従った処理を行い、各周波数における強度値Ｅ（ｊ）を得る。 Subsequently, a spectral component is calculated (S12). Specifically, the processing according to the third equation of the following [Equation 1] is performed to obtain the intensity value E (j) at each frequency.

〔数式１〕
Ａ（ｊ）＝Σ_i=0,…,N-1Ｗ（ｉ）・Ｘ（ｉ）・ｃｏｓ（２πｉｊ／Ｎ）
Ｂ（ｊ）＝Σ_i=0,…,N-1Ｗ（ｉ）・Ｘ（ｉ）・ｓｉｎ（２πｉｊ／Ｎ）
Ｅ（ｊ）＝Ａ（ｊ）²＋Ｂ（ｊ）² [Formula 1]
$A(j) = \sum_{i=0}^{N-1} W(i) \cdot X(i) \cdot \cos(2\pi i j / N)$
$B(j) = \sum_{i=0}^{N-1} W(i) \cdot X(i) \cdot \sin(2\pi i j / N)$
$E(j) = A(j)^2 + B(j)^2$

〔数式１〕において、ｉは、各音響フレーム内のＮ個のサンプルに付した通し番号であり、ｉ＝０，１，２，…，Ｎ−１の整数値をとる。また、ｊは周波数の値について、値の小さなものから順に付した通し番号であり、ｊ＝０，１，２，…，Ｎ／２−１の整数値をとる。サンプリング周波数が４４．１ｋＨｚ、Ｎ＝４０９６の場合、ｊの値が１つ異なると、周波数が１０．８Ｈｚ異なることになる。 In [Formula 1], i is a serial number assigned to N samples in each acoustic frame, and takes an integer value of i = 0, 1, 2,..., N−1. Further, j is a serial number assigned in order from the smallest value of the frequency value, and takes an integer value of j = 0, 1, 2,..., N / 2-1. When the sampling frequency is 44.1 kHz and N = 4096, if the value of j is different by one, the frequency will be different by 10.8 Hz.
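
[Formula 1] can be written out directly as a minimal sketch. The function name `frame_spectrum` is hypothetical, and the direct O(N²) summation is used only to mirror the formula term by term; a practical implementation would presumably use an FFT instead. The last line reproduces the 10.8 Hz spacing quoted in the text.

```python
import math


def frame_spectrum(x):
    """Return E(j) for j = 0..N/2-1 for one frame x of N samples ([Formula 1])."""
    n = len(x)
    # Hanning window W(i) = 0.5 - 0.5 cos(2*pi*i/N)
    w = [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]
    e = []
    for j in range(n // 2):
        a = sum(w[i] * x[i] * math.cos(2 * math.pi * i * j / n) for i in range(n))
        b = sum(w[i] * x[i] * math.sin(2 * math.pi * i * j / n) for i in range(n))
        e.append(a * a + b * b)  # intensity E(j) = A(j)^2 + B(j)^2
    return e


# Frequency spacing between adjacent j for fs = 44.1 kHz and N = 4096
bin_hz = 44100 / 4096
```

For a pure cosine whose frequency falls exactly on index j, the largest E(j) appears at that index, with smaller values at j − 1 and j + 1 caused by the window.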

続いて、スペクトル成分の統合処理を行う（Ｓ１３）。上記の周波数変換により、２２ｋＨｚ付近までのスペクトル成分が得られるが、本実施形態における特徴ワードの生成には、３４０Ｈｚ以上で４ｋＨｚ付近より低い範囲のスペクトル成分を用いる。これは、携帯電話の音声再生で使用される３ＧＰＰ規格等の音声圧縮形式に対応させるためである（ただし、本実施形態では常にデジタル音響データが与えられるため、携帯電話の音声録音信号を用いた照合には対応する必要はない。）。そのため、正確に、携帯電話の音声再生範囲に合わせる場合は、特徴ワードの生成の上限を３．４ｋＨｚ付近とするようにしても良い。本実施形態では、ｊ＝０〜２０４７の周波数成分のうち、４ｋＨｚ付近より高いｊ＝３８５〜２０４７については利用しない。また、３４０Ｈｚ以下であるｊ＝０〜３２の低周波成分についても利用しない。すなわち、本実施形態では、ｊ＝３３〜３８４の周波数成分を用いる。具体的には、以下の〔数式２〕に従った処理を実行し、１１周波数成分単位のＰｎに統合することになる。 Next, the spectral components are integrated (S13). The frequency conversion above yields spectral components up to about 22 kHz, but this embodiment generates feature words from the spectral components in the range from 340 Hz up to about 4 kHz. This is to match voice compression formats such as the 3GPP standard used for audio playback on mobile phones (although in this embodiment digital acoustic data is always supplied, so collation using audio recorded by a mobile phone does not need to be supported). To match the audio playback range of mobile phones exactly, the upper limit for feature word generation may instead be set to about 3.4 kHz. In this embodiment, of the frequency components j = 0 to 2047, the components j = 385 to 2047 above about 4 kHz are not used, nor are the low-frequency components j = 0 to 32 at 340 Hz or below. That is, this embodiment uses the frequency components j = 33 to 384. Specifically, processing according to [Formula 2] below is performed to integrate them into Pn in units of 11 frequency components.

〔数式２〕
Ｐ０＝（Ｅ₃₃＋Ｅ₃₄＋…＋Ｅ₄₃）^1/4
Ｐ１＝（Ｅ₄₄＋Ｅ₄₅＋…＋Ｅ₅₄）^1/4
：
：
Ｐ３１＝（Ｅ₃₇₄＋Ｅ₃₇₅＋…＋Ｅ₃₈₄）^1/4 [Formula 2]
$P_0 = (E_{33} + E_{34} + \dots + E_{43})^{1/4}$
$P_1 = (E_{44} + E_{45} + \dots + E_{54})^{1/4}$
$\vdots$
$P_{31} = (E_{374} + E_{375} + \dots + E_{384})^{1/4}$

上記〔数式２〕により、ｊ＝３３〜３８４の３５２個の周波数成分が、ｎ＝０〜３１の３２個の周波数成分に統合されることになる。上記処理は、各音響フレームについて行われ、各音響フレームについて、３２個の周波数成分が得られることになる。 According to the above [Equation 2], 352 frequency components of j = 33 to 384 are integrated into 32 frequency components of n = 0 to 31. The above process is performed for each acoustic frame, and 32 frequency components are obtained for each acoustic frame.
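
The band integration of [Formula 2] amounts to summing 11 consecutive intensity values and taking the fourth root. A minimal sketch follows; the function name `integrate_bands` and its default arguments are illustrative choices taken from the numbers in the text (first index 33, 11 components per band, 32 bands).

```python
def integrate_bands(e, first=33, width=11, bands=32):
    """Return P_0..P_31 from the intensity values e[j] ([Formula 2]).

    Band n covers j = first + n*width .. first + (n+1)*width - 1, so with
    the defaults the bands span j = 33..384.
    """
    return [
        sum(e[first + n * width : first + (n + 1) * width]) ** 0.25
        for n in range(bands)
    ]
```

Components outside j = 33 to 384 never enter any band, which matches the statement that the low-frequency and high-frequency components are not used.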

次に、各音響フレームについて、直前の音響フレームのスペクトル成分との差分を算出する（Ｓ１４）。上記Ｓ１１〜Ｓ１３の処理は、各音響フレームに対して順次行われる。このＳ１４におけるフレーム間差分の算出処理は、各音響フレームについてＳ１３までの処理を行った結果得られたＰ０〜Ｐ３１を利用するものである。具体的には、以下の〔数式３〕に従った処理を行い、フレーム間差分Ｄｎ（ｔ）を得る。 Next, for each acoustic frame, a difference from the spectral component of the immediately preceding acoustic frame is calculated (S14). The processes of S11 to S13 are sequentially performed on each acoustic frame. The inter-frame difference calculation processing in S14 uses P0 to P31 obtained as a result of performing the processing up to S13 for each acoustic frame. Specifically, processing according to the following [Equation 3] is performed to obtain an inter-frame difference Dn (t).

〔数式３〕
Ｄｎ（ｔ）＝｜Ｐｎ（ｔ）−Ｐｎ（ｔ−１）｜、ｎ＝０，…，３１ [Formula 3]
Dn (t) = | Pn (t) −Pn (t−1) |, n = 0,..., 31

上記〔数式３〕においてＰｎ（ｔ）は、ｔ番目の音響フレームにおける統合された周波数成分である。このように、隣接する音響フレーム間の差分を算出するのは、音響データの振幅レベルがわずかに変化するような箇所についても、振幅レベルの変化を強調させ、音響データの特徴を反映した特徴ワードを生成するためである。 In [Formula 3] above, Pn(t) is the integrated frequency component in the t-th acoustic frame. The difference between adjacent acoustic frames is calculated so that changes in amplitude level are emphasized even where the amplitude level of the acoustic data changes only slightly, yielding feature words that reflect the features of the acoustic data.
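
[Formula 3] is a per-band absolute difference between two consecutive frames. The sketch below is a direct transcription; the function name `frame_difference` is hypothetical, and how the very first frame (which has no predecessor) is treated is not covered here because the text does not state it.

```python
def frame_difference(p_curr, p_prev):
    """Return D_n(t) = |P_n(t) - P_n(t-1)| for each band n ([Formula 3])."""
    return [abs(c - p) for c, p in zip(p_curr, p_prev)]
```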

フレーム間差分の算出処理を終えたら、所定フレーム数の処理が終了したかどうかを判断する（Ｓ１５）。具体的には、ｔ≧Ｔであるかどうかを判断する。その結果、ｔ＜Ｔである場合は、ｔをインクリメントしてＳ１１に戻る。Ｓ１５における判断の結果、ｔ≧Ｔである場合は、得られたＴ個の差分Ｄｎ（ｔ）の総和を求める（Ｓ１６）。すなわち、上記Ｓ１１〜Ｓ１４の処理を各音響フレームに対して順次行い、音響フレーム間の差分Ｄｎ（ｔ）がＴ個（本実施形態では１１個）得られたら、Ｔ個の差分Ｄｎ（ｔ）の総和算出を行うことになる。具体的には、以下の〔数式４〕に従った処理を行い、フレーム間差分の総和Ｓｎを得る。 After the inter-frame differences have been calculated, it is determined whether the predetermined number of frames has been processed (S15), that is, whether t ≥ T. If t < T, t is incremented and processing returns to S11. If t ≥ T, the sum of the T differences Dn(t) obtained is calculated (S16). In other words, the processing of S11 to S14 is applied to each acoustic frame in turn, and once T inter-frame differences Dn(t) (11 in this embodiment) have been obtained, those T differences are summed. Specifically, processing according to [Formula 4] below is performed to obtain the sum Sn of the inter-frame differences.

〔数式４〕
Ｓｎ＝Σ_t=0,…,T-1Ｄｎ（ｔ） [Formula 4]
$S_n = \sum_{t=0}^{T-1} D_n(t)$

上記〔数式４〕において、“Σ_t=0,…,T-1”は、ｔ＝０からＴ−１までｔを１ずつ増加させたときの総和を意味する。続いて、上記〔数式４〕により得られたＳｎの二値化処理を行う（Ｓ１７）。具体的には、まず、３２個のＳｎ配列をｎ≧１４とｎ≦１３の上下帯域で２分割し、ｎ≦１３の１４個のうち値の大きい７個に１を与え、値の小さい７個に０を与えるとともに、ｎ≧１４の１８個のうち値の大きい９個に１を与え、値の小さい９個に０を与える。ここで、単純に全３２個のＳｎのうち値の大きい１６個と、小さい１６個に１と０を与えるのではなくて、３２バンドを周波数が高い１８バンドのグループと、周波数が低い１４バンドのグループに分けて、それぞれそのグループ内で均等に１と０を与えるようにしたのは、各種データ圧縮処理に伴う周波数特性の影響を補正するためである。上下のバンドを１８バンドと１４バンドの位置で分けたのは、実験の結果、この位置で分けたとき、検索に使用する検索音響データに対してＭＰ３などの各種データ圧縮処理を施した結果、検索精度が最も高かったためである。Ｓ１７における処理により、各周波数帯ｎについてのフレーム間差分の総和Ｓｎが１ビットで表現可能となる。そして、ｎ＝０をＬＳＢ、ｎ＝３１をＭＳＢとして３２ビットの特徴パターンＦｄ（ｙ）を得る。ここで、ｙ（＝０，…，Ｙ−１）は、１つの原音響データから生成されるＹ個の特徴ワードにおいて、その順番を示す変数である。したがって、ｙは演奏開始からの時刻に比例する変数となる。 In [Formula 4] above, the summation symbol denotes the sum as t is increased by 1 from t = 0 to T−1. Next, the Sn obtained from [Formula 4] are binarized (S17). Specifically, the array of 32 Sn values is first divided into two bands, n ≤ 13 and n ≥ 14. Of the 14 values with n ≤ 13, the 7 largest are given 1 and the 7 smallest are given 0; of the 18 values with n ≥ 14, the 9 largest are given 1 and the 9 smallest are given 0. The 32 bands are divided into a high-frequency group of 18 bands and a low-frequency group of 14 bands, with 1 and 0 assigned evenly within each group, rather than simply giving 1 to the 16 largest and 0 to the 16 smallest of all 32 Sn, in order to compensate for the effect of the frequency characteristics of various data compression processes. The bands are split at the 18-band/14-band boundary because, in experiments in which the search acoustic data was subjected to various data compression processes such as MP3, this split gave the highest search accuracy. After the processing in S17, the sum Sn of inter-frame differences for each frequency band n can be expressed in 1 bit. A 32-bit feature pattern Fd(y) is then obtained with n = 0 as the LSB and n = 31 as the MSB. Here, y (= 0, ..., Y−1) is a variable indicating the position of a feature word among the Y feature words generated from one piece of original sound data. y is therefore proportional to the time elapsed since the start of the performance.
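
The summation of [Formula 4] and the two-group binarization of S17 can be sketched as follows. The function names are hypothetical. How ties between equal Sn values are resolved is not stated in the text; in this sketch the band with the lower index wins, which is purely an assumption.

```python
def sum_differences(d_frames):
    """Return S_n, the sum over t of D_n(t) ([Formula 4]).

    d_frames is a list of T lists, each holding the 32 values D_n(t).
    """
    return [sum(col) for col in zip(*d_frames)]


def binarize(s, split=14):
    """Return the 32-bit feature pattern for the 32 sums S_n (S17).

    Bands n < split and n >= split are ranked separately. The larger half
    of each group gets 1 and the smaller half gets 0. Bit n of the result
    holds the value for band n, so n = 0 is the LSB.
    Assumption: ties are resolved in favor of the lower band index.
    """
    pattern = 0
    for lo, hi in ((0, split), (split, len(s))):
        ranked = sorted(range(lo, hi), key=lambda n: s[n], reverse=True)
        for n in ranked[: (hi - lo) // 2]:
            pattern |= 1 << n
    return pattern
```

Whatever the input, exactly 7 of the low 14 bits and 9 of the high 18 bits are set, so every feature pattern contains 16 ones.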

次に、音量データの算出を行う（Ｓ１８）。具体的には、まず、以下の〔数式５〕を用いて総和音量Volを算出する。 Next, the volume data is calculated (S18). Specifically, first, the total volume Vol is calculated using the following [Formula 5].

〔数式５〕
Vol＝Σ_t=0,…,T-1｛Σ_n=0,…,31Ｐｎ（ｔ）｝ [Formula 5]
$\mathrm{Vol} = \sum_{t=0}^{T-1} \left\{ \sum_{n=0}^{31} P_n(t) \right\}$

上記〔数式５〕において、“Σ_t=0,…,T-1”は、ｔ＝０からＴ−１までｔを１ずつ増加させたときの総和を意味する。上記〔数式５〕に示すように、統合処理により得られた全ての成分Ｐｎ（ｔ）の値をＴ個の音響フレームについて加算する。これにより、Ｔ個の音響フレームについての音量の総和である総和音量Volが得られる。この総和音量Volの値に適宜設定した固定のスケーリング値を乗算して、０〜２５５の範囲に収まるように正規化して音量データＶｄ(ｙ)を得る。正規化により音量データＶｄ(ｙ)は８ビットで表現されることとなる。音量データＶｄ(ｙ)は、上記〔数式５〕に示されるように、Ｔ個の音響フレームに渡る総和音量Volを基礎としているため、各フレーム単位の音量ではなく、Ｔ個の音響フレームの総和音量を表現していることになる。 In [Formula 5] above, the outer summation symbol denotes the sum as t is increased by 1 from t = 0 to T−1. As [Formula 5] shows, the values of all components Pn(t) obtained by the integration process are added over the T acoustic frames. This gives the total volume Vol, the sum of the volume over T acoustic frames. The total volume Vol is multiplied by a suitably chosen fixed scaling value and normalized to fall within the range 0 to 255, giving the volume data Vd(y). After normalization, the volume data Vd(y) is expressed in 8 bits. Because Vd(y) is based on the total volume Vol over T acoustic frames, as [Formula 5] shows, it represents the total volume of T acoustic frames rather than the volume of an individual frame.
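
A minimal sketch of the volume calculation follows. The function name `volume_data` is hypothetical. The text says only that a fixed scaling value is chosen so that the result fits in 0 to 255; truncating to an integer and clipping values above 255 are assumptions made here for illustration.

```python
def volume_data(p_frames, scale):
    """Return the 8-bit volume Vd from T frames of P_n(t) ([Formula 5]).

    p_frames is a list of T lists, each holding the 32 values P_n(t).
    Assumptions: the scaled value is truncated to an integer, and values
    outside 0..255 are clipped.
    """
    vol = sum(sum(p) for p in p_frames)  # total volume Vol
    return max(0, min(255, int(vol * scale)))
```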

The processes of S17 and S18 may also be performed in the reverse order. As a result of S17 and S18, a 40-bit feature word consisting of a 32-bit feature pattern and 8-bit volume data is obtained.

By executing the above processing on each acoustic frame, a large number of feature words are generated for the acoustic data. For example, with a sampling frequency of 44.1 kHz, 4096 samples per acoustic frame, and acoustic frames overlapped by 2048 samples, as in the example above, one feature word corresponds to about 0.506 seconds, and about 600 feature words are generated from 5 minutes of acoustic data.

The feature word generation process is now described with reference to the conceptual diagram of FIG. 4. FIG. 4(a) shows the waveform of the acoustic data for which feature words are to be generated. The related information registration apparatus reads the acoustic data in units of acoustic frames, with overlapping reading ranges as shown in FIG. 4(b). For each acoustic frame, frequency components in a predetermined frequency range are extracted and integrated into 32 bands, as shown in FIG. 4(c). This corresponds to S11 to S13. Next, as shown in FIG. 4(d), difference processing between adjacent acoustic frames is performed for each band of the integrated components, and the integrated components of the 32 bands are summed. The difference processing of the integrated components corresponds to S14, and the summation of the integrated components corresponds to the summation over the frequency components (n = 0 to 31) in S18. Next, as shown in FIG. 4(e), the 32-band difference components are summed and the volume is summed. The summation of the 32-band difference components corresponds to S16, and the summation of the volume corresponds to the summation over the acoustic frames (t = 0 to T-1) in S18. Next, as shown in FIG. 4(f), the 32-band sum components are binarized and the volume is compressed (the value calculated by [Formula 5] is multiplied by a predetermined scaling value to compress the volume level to 256 steps). The binarization of the 32-band sum components corresponds to S17, and the volume compression corresponds to S18. As FIG. 4 shows, acoustic frames are read sequentially from the acoustic data, and one feature word is generated for every T acoustic frames.

Once the registered feature word group (the set of registered feature words) has been obtained for the whole of the designated original acoustic data, the representative registered feature data generation means 20 generates representative registered feature data from the registered feature word group. The representative registered feature data consists of pairs of the mean and the standard deviation, taken in the time direction, of feature components derived from the registered feature words. FIG. 5 is a flowchart showing how this mean and standard deviation are calculated. First, the representative registered feature data generation means 20 generates a registered feature data array by converting the registered feature words to multi-valued form (S19). Specifically, by executing processing according to [Formula 6] below, it generates the registered feature data array Q(n, y) (n = 0, …, 31; y = 0, …, Y-1), which holds the feature components derived from the registered feature words.

[Formula 6]
$$Q(n, y) \leftarrow \begin{cases} \mathrm{Vd}(y) & \text{if bit } n \text{ of } \mathrm{Fd}(y) \text{ is } 1 \\ -\mathrm{Vd}(y) & \text{if bit } n \text{ of } \mathrm{Fd}(y) \text{ is } 0 \end{cases}$$

Next, the representative registered feature data generation means 20 calculates the mean array Cd(n, r) and the standard deviation array Ld(n, r) of the registered feature data array Q(n, y) in the time direction y (S20). Specifically, Cd(n, r) and Ld(n, r) are calculated by executing processing according to [Formula 7] below.

[Formula 7]
$$\mathrm{Cd}(n, r) = \frac{1}{Y} \sum_{y=0}^{Y-1} Q(n, y)$$
$$\mathrm{Ld}(n, r) = \left[ \frac{1}{Y} \sum_{y=0}^{Y-1} \bigl( Q(n, y) - \mathrm{Cd}(n, r) \bigr)^{2} \right]^{1/2}$$

FIG. 6 is a conceptual diagram of the calculation of the mean and standard deviation of the registered feature data array in S19 and S20. As shown in FIG. 6, each registered feature word has a 32-bit feature pattern F(y) and 8-bit volume data V(y) corresponding to each time y (y = 0, …, Y-1). In FIG. 6, the 32 bits of the feature pattern are labeled Bit0 to Bit31.

As shown in [Formula 6], depending on whether the value of bit n (Bit0 to Bit31) is 0 or 1, Q(n, y) is set either to the volume data value as it is or to the volume data value multiplied by -1, and this determines the value of Q(n, y) corresponding to bit n. Because the 8-bit volume data can thereby become negative, Q(n, y) is expressed in 16 bits. What was expressed in 1 bit per band in the 32-bit feature pattern is now expressed in 16 bits (2 bytes) per band, so the registered feature data at each time y occupies 64 bytes.
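The multi-valuing of [Formula 6] and the statistics of [Formula 7] can be sketched together as follows. The function names `multivalue` and `mean_and_std` are hypothetical, and plain Python lists stand in for the 16-bit arrays described above; the standard deviation divides by Y (population form), as [Formula 7] does.

```python
import math


def multivalue(fd, vd):
    """[Formula 6]: Q(n, y) = +Vd(y) if bit n of Fd(y) is 1, otherwise -Vd(y).

    fd: list of Y 32-bit feature patterns; vd: list of Y volume values.
    Returns q with q[n][y] for n = 0..31.
    """
    return [[v if (f >> n) & 1 else -v for f, v in zip(fd, vd)]
            for n in range(32)]


def mean_and_std(q):
    """[Formula 7]: per-band mean and standard deviation over the time index y."""
    means, stds = [], []
    for row in q:
        count = len(row)
        c = sum(row) / count
        means.append(c)
        # Divide by Y, not Y - 1, matching [Formula 7].
        stds.append(math.sqrt(sum((v - c) ** 2 for v in row) / count))
    return means, stds
```

A band whose bit alternates between 1 and 0 at constant volume has a mean near zero and a large deviation, while a band whose bit never changes has zero deviation.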

The mean array Cd(n, r) and the standard deviation array Ld(n, r), generated as representative registered feature data by the representative registered feature data generation means 20, are registered in the acoustic database 40 by the registration means 35 in association with information that identifies the acoustic data, such as an acoustic data ID (which corresponds one-to-one with r).

Once the registered feature word group and the representative registered feature data have been obtained for the whole of the designated original acoustic data, the partial feature word generation means 15 sequentially extracts sections of a predetermined length, starting from the beginning of the original acoustic data, as partial sections, and generates feature words (partial feature words) for each extracted piece of partial acoustic data. The length of a partial section can be set as appropriate. The partial sections are intended to stand in for the sections typically used at search time and are extracted blindly (at equal intervals in the actual embodiment) from the whole of the given original acoustic data. Some gaps between sections are acceptable, but the sections are set at a constant interval from the beginning so that they do not overlap, yielding Z partial sections for one piece of original acoustic data. Partial feature words are generated basically by the same procedure and concept shown in FIGS. 3 and 4 as the registered feature words generated for the whole of the original acoustic data. Unlike the registered feature words, however, a plurality of phase-shifted partial feature word groups must additionally be generated.

Because partial acoustic data is an excerpt cut from a piece of music, its timing does not necessarily coincide with that of the original acoustic data, and a positional offset may arise. Since each feature word is generated by averaging over T acoustic frames (T = 11 in this embodiment), it is relatively robust against such offsets. However, for original acoustic data with rapid rhythmic changes, an offset of about 5 acoustic frames, roughly half of the 11 acoustic frames that form the generation unit of a feature word, produces a markedly different feature word, and a marked discrepancy arises in collation with the registered feature words. Therefore, a plurality of partial feature words are generated by delaying the start (shifting the phase) by several acoustic frames at a time (2 acoustic frames in this embodiment) within the range of the 11 acoustic frames that form one generation unit, and these are collated with the registered feature words that are to be registered in the acoustic database 40.

When generating a partial feature word group, instead of calculating Sn by [Formula 4] in S16 of FIG. 3, H kinds of Sn(h), for h = 0, …, H-1, are calculated by [Formula 8] below. Here hs is a suitable positive integer; hs = 2 in this embodiment.

[Formula 8]
$$S_n(h) = \sum_{t=0}^{T-1} D_n(t + h \cdot \mathrm{hs})$$

Then, in S17, each of the H kinds of Sn(h) is binarized in the same way, giving a 32-bit feature pattern F(z, h, x) with n = 0 as the LSB and n = 31 as the MSB. Here, z (= 0, …, Z-1) is a variable indicating the position of a piece of partial acoustic data among the Z pieces obtained from one piece of original acoustic data, and x (= 0, …, X-1) is a variable indicating the position of a feature word among the X feature words generated from one piece of partial acoustic data. x is therefore proportional to the time elapsed from the start of the performance.

Further, in S18, instead of calculating Vol by [Formula 5], H kinds of Vol(h), for h = 0, …, H-1, are likewise calculated by [Formula 9] below. As before, hs is a suitable positive integer; hs = 2 in this embodiment.

[Formula 9]
$$\mathrm{Vol}(h) = \sum_{t=0}^{T-1} \left\{ \sum_{n=0}^{31} P_n(t + h \cdot \mathrm{hs}) \right\}$$

The total volume Vol(h) is multiplied by a suitably chosen fixed scaling value and normalized so that it falls within the range 0 to 255, giving the volume data V(z, h, x). As a result of this normalization, V(z, h, x) is expressed in 8 bits. Because V(z, h, x) is based on the total volume Vol(h) over T acoustic frames, as shown in [Formula 9], it represents the total volume of T acoustic frames rather than the volume of an individual frame.

Assuming that each of the Z partial sections (pieces of partial acoustic data) is 15 seconds long, the above processing generates 30 × H partial feature words (H = 5 in this embodiment) per piece of partial acoustic data.
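The phase-shifted sums of [Formula 8] and [Formula 9] can be sketched for a single feature word position as follows. The function name `phase_shifted_sums` and the convention that frame index 0 is the first frame of the word at phase 0 are assumptions of this sketch; the defaults T = 11, H = 5 and hs = 2 are the values given for the embodiment.

```python
def phase_shifted_sums(d, p, T=11, H=5, hs=2):
    """Return [(Sn(h), Vol(h)) for h = 0..H-1] for one feature word position.

    d[t][n]: inter-frame difference Dn(t); p[t][n]: integrated component Pn(t).
    Both must cover at least T + (H - 1) * hs frames.
    """
    needed = T + (H - 1) * hs
    if len(d) < needed or len(p) < needed:
        raise ValueError(f"need at least {needed} frames")
    results = []
    for h in range(H):
        start = h * hs  # phase h delays the window by h * hs frames
        sn = [sum(frame[n] for frame in d[start:start + T]) for n in range(32)]
        vol = sum(sum(frame) for frame in p[start:start + T])
        results.append((sn, vol))
    return results
```

Each Sn(h) would then be binarized and each Vol(h) scaled to 8 bits in the same way as for registered feature words, giving H feature words per position.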

Once the partial feature word group (the set of partial feature words) has been obtained for the whole of the designated original acoustic data, the representative partial feature data generation means 25 generates representative partial feature data from the partial feature word group. The representative partial feature data consists of pairs of the mean and the standard deviation, taken in the time direction, of feature components derived from the partial feature words. It is generated in the same way as the representative registered feature data shown in FIG. 5. Unlike the representative registered feature data, however, a plurality of phase-shifted sets of representative partial feature data must additionally be generated.

For the partial feature word group, the representative partial feature data generation means 25, in S19 of FIG. 5, generates a partial feature data array Q(z, n, h, x) for h = 0, …, H-1 by [Formula 10] below, instead of generating the registered feature data array Q(n, y) by [Formula 6].

[Formula 10]
$$Q(z, n, h, x) \leftarrow \begin{cases} V(z, h, x) & \text{if bit } n \text{ of } F(z, h, x) \text{ is } 1 \\ -V(z, h, x) & \text{if bit } n \text{ of } F(z, h, x) \text{ is } 0 \end{cases}$$
(z = 0, …, Z-1; n = 0, …, 31; h = 0, …, H-1; x = 0, …, X(z, h)-1)

Next, in S20 of FIG. 5, the representative partial feature data generation means 25 executes processing according to [Formula 11] below, in place of [Formula 7], to calculate the mean array C(z, n) and the standard deviation array L(z, n) of the partial feature data array Q(z, n, h, x) over the time direction x and the phase h.

[Formula 11]
$$C(z, n) = \frac{1}{H \cdot X} \sum_{h=0}^{H-1} \sum_{x=0}^{X(z,h)-1} Q(z, n, h, x)$$
$$L(z, n) = \left[ \frac{1}{H \cdot X} \sum_{h=0}^{H-1} \sum_{x=0}^{X(z,h)-1} \bigl( Q(z, n, h, x) - C(z, n) \bigr)^{2} \right]^{1/2}$$

The mean array C(z, n) and the standard deviation array L(z, n), generated as representative partial feature data by the representative partial feature data generation means 25, are used only to calculate the individual determination reference values and are not registered in the acoustic database 40.

Once the registered feature word group, the representative registered feature data, the partial feature word group of each partial section, and the representative partial feature data have been obtained for a piece of original acoustic data, the determination reference value calculation means 30 calculates the individual determination reference values for that original acoustic data, using the representative registered feature data together with the representative partial feature data of each partial section, and the registered feature word group together with the partial feature word group of each partial section. This calculation is described with reference to the flowcharts of FIGS. 7 to 10.

In FIGS. 7 to 10, the variables are defined as follows.

[Registered feature words]
Y: number of registered feature words registered for the original acoustic data
Fd(y): feature pattern array (y = 0, …, Y-1), 32 bits
Vd(y): volume data array (y = 0, …, Y-1), 8 bits
Cd(n): mean of the Y values of the registered feature data array Q(n, y)
Ld(n): standard deviation of the registered feature data array Q(n, y)

[Partial feature words]
Z: number of partial sections set in the original acoustic data (number of pieces of partial acoustic data)
X(z, h): number of partial feature words at phase h (h = 0, …, H-1) of partial section z (z = 0, …, Z-1)
F(z, h, x): feature pattern array (x = 0, …, X(z, h)-1), 32 bits
V(z, h, x): volume data array (x = 0, …, X(z, h)-1), 8 bits
C(z, n): mean of the H × X values of the partial feature data array Q(z, n, h, x)
L(z, n): standard deviation of the partial feature data array Q(z, n, h, x)

[Collation variables]
W: number of collation words (w = 0, …, W-1), that is, the number of registered feature words and partial feature words collated at a time (e.g. W = 6)
U(z): representative feature data distance
D(y+w, z, h, x+w): word-unit mismatch bit count of the feature patterns (0 to 32)
S(y, z, h, x): summed mismatch bit count, the sum of the W word-unit mismatch bit counts D(y+w, z, h, x+w), i.e. Σ_{w=0,…,W-1} D(y+w, z, h, x+w)

[Individual determination reference values]
Mu: individual determination reference value for the representative feature data distance
M1: individual determination reference value for the summed mismatch bit count with the phase fixed
M2: individual determination reference value for the summed mismatch bit count S(r, y, h, x) with the phase varied
Mw1: individual determination reference value for the word-unit mismatch bit count with the phase fixed
Mw2: individual determination reference value for the word-unit mismatch bit count with the phase varied

FIG. 7 is a flowchart showing the calculation of the individual determination reference values. First, initialization is performed (S110). Specifically, the following are set: the total Su of the representative feature data distances U(z) to 0; the total Sw1 of the word-unit mismatch bit counts with the phase fixed to 0; the total Sw2 of the word-unit mismatch bit counts with the phase varied to 0; the total S1 of the summed mismatch bit counts with the phase fixed to 0; the total S2 of the summed mismatch bit counts with the phase varied to 0; and the variable z identifying the partial section (section number) to 0.

Next, the determination reference value calculation means 30 calculates the representative feature data distance U(z) for partial section z (S120). Specifically, it executes processing according to [Formula 12] below to calculate U(z), the distance between the representative registered feature data and the representative partial feature data for partial section z.

[Formula 12]
$$U(z) = 256 \times \frac{\left[ \sum_{n=0}^{31} \left\{ \dfrac{\mathrm{Cd}(n) - C(z, n)}{\mathrm{Ld}(n) + L(z, n)} \right\}^{2} \right]^{1/2}}{\left[ \sum_{n=0}^{31} \left\{ \dfrac{\mathrm{Cd}(n)}{\mathrm{Ld}(n) + L(z, n)} \right\}^{2} \right]^{1/4} \left[ \sum_{n=0}^{31} \left\{ \dfrac{C(z, n)}{\mathrm{Ld}(n) + L(z, n)} \right\}^{2} \right]^{1/4}}$$

[Formula 12] contains three terms enclosed in square brackets. The square root of the first term is divided by the fourth root of the second term and by the fourth root of the third term, and the result is multiplied by the normalization coefficient 256. The normalization coefficient can be changed as appropriate to suit the normalization range. In [Formula 12], "Σ_{n=0,…,31}" denotes the sum of 32 terms taken as n is increased by 1 from n = 0 to 31. The root applied to each bracketed term can be changed as appropriate; in this embodiment the roots are the square root, the fourth root, and the fourth root, as shown in [Formula 12].
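A direct transcription of [Formula 12] can be sketched as below. The function name `representative_distance` and the `coeff` parameter are hypothetical. The handling of degenerate inputs is an assumption: the text does not say what happens when Ld(n) + L(z, n) is zero or when a denominator sum is zero, so this sketch simply lets Python raise `ZeroDivisionError`.

```python
import math


def representative_distance(cd, ld, c, l, coeff=256):
    """U(z) of [Formula 12] for one partial section z.

    cd, ld: mean and standard deviation arrays of the registered data (32 values).
    c, l:   mean and standard deviation arrays of partial section z (32 values).
    """
    num = den_reg = den_part = 0.0
    for n in range(32):
        s = ld[n] + l[n]  # shared normalizer for band n
        num += ((cd[n] - c[n]) / s) ** 2
        den_reg += (cd[n] / s) ** 2
        den_part += (c[n] / s) ** 2
    # Square root of the first term, fourth roots of the second and third.
    return coeff * math.sqrt(num) / den_reg ** 0.25 / den_part ** 0.25
```

Identical mean arrays give a distance of 0, and the distance grows as the normalized means of the partial section move away from those of the registered data.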

[Formula 12] expresses the distance between the representative partial feature data and the representative registered feature data, and is defined on the basis of the normalized correlation coefficient of the two. Specifically, each element n of the representative partial feature data is defined as the mean of the partial feature data array divided, for each of the 32 elements, by the sum of the standard deviation of the partial feature data array and the standard deviation of the registered feature data array, that is, C(z, n) / (Ld(n) + L(z, n)). Likewise, each element of the representative registered feature data is defined as the mean of the registered feature data array divided, for each of the 32 elements, by the same sum of standard deviations, that is, Cd(n) / (Ld(n) + L(z, n)). The normalized correlation coefficient between these two sets of 32 elements is given by
$$\frac{\sum_{n=0}^{31} \left\{ \dfrac{\mathrm{Cd}(n) - C(z, n)}{\mathrm{Ld}(n) + L(z, n)} \right\}^{2}}{\left[ \sum_{n=0}^{31} \left\{ \dfrac{\mathrm{Cd}(n)}{\mathrm{Ld}(n) + L(z, n)} \right\}^{2} \right]^{1/2} \left[ \sum_{n=0}^{31} \left\{ \dfrac{C(z, n)}{\mathrm{Ld}(n) + L(z, n)} \right\}^{2} \right]^{1/2}}$$
and is described as taking a real value from -1 to +1. In [Formula 12], this normalized correlation coefficient is multiplied by the predetermined integer 256 to obtain an integer representation, and its square root is taken in order to widen the range over which the integer values vary and so increase the separation between matching and non-matching records. Whether the normalized correlation coefficient is used as it is, or its square root or fourth root is taken, is an operational design choice, and the most suitable option may be selected as appropriate.

Next, with the phase fixed at h = ho, the partial feature word group generated from partial section z is collated with the registered feature word group (S130). When H = 5, ho may be set to any of 0, 1, 2, 3, and 4, but it is normally set to ho = 0, which requires the least computation. The collation in S130 yields the minimum summed mismatch bit count S(ymin, z, ho, xmin) for partial section z. The processing of S130 is described in detail later. Once S(ymin, z, ho, xmin) has been obtained, it is added to the total S1 of the summed mismatch bit counts (S140). The processing of S130 also yields the minimum word-unit mismatch bit count D(ymin+wmax, z, ho, xmin+wmax), and in S140 this value is likewise added to the total Sw1 of the word-unit mismatch bit counts.

Next, the partial feature word group generated from partial section z is collated with the registered feature word group while the phase h is varied (S150). The collation in S150 yields, for partial section z, the minimum summed mismatch bit count S(ymin, z, hmin, xmin) for the best-fitting phase. The processing of S150 is also described in detail later. Once S(ymin, z, hmin, xmin) has been obtained, it is added to the total S2 of the summed mismatch bit counts (S160). The processing of S150 also yields the minimum word-unit mismatch bit count D(ymin+wmax, z, hmin, xmin+wmax), and in S160 this value is likewise added to the total Sw2 of the word-unit mismatch bit counts.

Next, the variable z identifying the partial section is incremented, that is, increased by 1 (S170). It is then determined whether z has reached the number of partial sections Z (S180). If it has not, the process returns to S120 and the collation is performed for the next partial section z. The processing is executed for each partial section z, and when z reaches Z, that is, when all Z partial sections have been processed, the five individual determination reference values corresponding to the one piece of original acoustic data are determined (S170).

Of the five individual determination reference values, Mu, M1, and M2 are calculated by executing processing according to [Formula 13] to [Formula 15] below, respectively.

[Formula 13]
$$\mathrm{Mu} = \frac{1}{Z} \sum_{z=0}^{Z-1} U(z)$$

[Formula 14]
$$\mathrm{M1} = \frac{1}{Z} \sum_{z=0}^{Z-1} S(y_{\min}, z, h_o, x_{\min})$$

[Formula 15]
$$\mathrm{M2} = \frac{1}{Z} \sum_{z=0}^{Z-1} S(y_{\min}, z, h_{\min}, x_{\min})$$

The individual determination reference value Mw1 is given the maximum value D(ymin+wmax, z, ho, xmin+wmax) among the values D(y+w, z, ho, x+w) that make up S(ymin, z, ho, xmin). Likewise, the individual determination reference value Mw2 is given the maximum value D(ymin+wmax, z, hmin, xmin+wmax) among the values D(y+w, z, h, x+w) that make up S(ymin, z, hmin, xmin).

Next, the collation of partial section z with the phase h fixed, performed in S130 of FIG. 7, is described in detail using the flowchart of FIG. 8. In S130, after h is set to ho, the process moves to the processing shown in FIG. 8. First, initialization is performed (S131). Specifically, the following are set: the minimum mismatch bit count Smin1 to the initial value Big1; the variable x identifying the partial feature word to 0; the variable y identifying the registered feature word to 0; the variable xmin, which records x at the point where the mismatch bit count is smallest, to -1; and the variable ymin, which records y at that point, to -1. The initial value Big1 only needs to be sufficiently larger than any value that the summed mismatch bit count S(y, z, h, x) can take, and is set in advance.

Next, the summed mismatch bit count S(y, z, h, x) is calculated (S132). Because an error value may be output for S(y, z, h, x), as described later, it is only when S(y, z, h, x) is obtained as a normal value that it is checked against the minimum mismatch bit count Smin1 to determine whether it is smaller (S133). Only when S(y, z, h, x) is less than or equal to Smin1 is the value of S(y, z, h, x) set as Smin1 (S134). The variable x is then incremented (S135) and S132 to S134 are repeated. When the variable x identifying the partial feature word reaches the number of partial feature words X(z, h) (S136), x is reset to 0, the variable y identifying the registered feature word is incremented (S137), and S132 to S136 are repeated. In other words, the registered feature word y is first held fixed while the partial feature words are compared with it; once all partial feature words have been processed, the registered feature word is changed and the next registered feature word is processed. In this way the processing is executed for all registered feature words.

It is then determined whether y has reached the total number of registered feature words Y (S138). If not, processing returns to S132 and the minimum mismatch bit count Smin1 is updated for the next registered feature word y. When y reaches Y, that is, when all Y registered feature words have been processed, the smallest value of S(y, z, h, x) found up to that point, Smin1, is output as the minimum mismatch bit count for partial section z with the phase fixed. This Smin1 is the result obtained in S130 of FIG. 7.
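The double loop of FIG. 8 can be sketched as follows. This is a minimal illustration, not the patented implementation: the callback `summed` is a hypothetical stand-in for S132, `float("inf")` is assumed as the initial value Big1, and updating xmin and ymin together with Smin1 is an assumption based on how those variables are initialized in S131.

```python
from typing import Callable, Tuple

BIG1 = float("inf")  # assumption: stands in for the preset initial value Big1


def match_fixed_phase(
    summed: Callable[[int, int], float],
    num_registered: int,
    num_partial: int,
) -> Tuple[float, int, int]:
    """Scan every (y, x) pair and return (Smin1, xmin, ymin).

    summed(y, x) is a hypothetical callback standing in for S132; a negative
    return value is treated as the error value and skipped.
    """
    smin1, xmin, ymin = BIG1, -1, -1      # S131: initialization
    for y in range(num_registered):       # S137 / S138: next registered word
        for x in range(num_partial):      # S135 / S136: next partial word
            s = summed(y, x)              # S132: summed mismatch bit count
            if s < 0:                     # error value: skip the comparison
                continue
            if s <= smin1:                # S133 / S134: keep the smaller value
                smin1, xmin, ymin = s, x, y
    return smin1, xmin, ymin
```

If every call returns the error value, the function returns the untouched initial values, which a caller could use to detect that no valid match was found.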

Next, the calculation of the summed mismatch bit count S(y, z, h, x) in S132 of FIG. 8 is described in detail using the flowchart of FIG. 9. The apparatus matches runs of W consecutive feature words, where W is the matching word count: W consecutive registered feature words and W consecutive partial feature words are compared pairwise in order from the first. In FIG. 9, initialization is performed first (S181): S(y, z, h, x) = 0, the variable w counting the feature words matched so far = 0, Dmax = 0, and wmax = 0. It is then determined whether the volume data Vd(y+w) of the registered feature word and the volume data V(z, h, x+w) of the partial feature word are both greater than 0 (S182). Vd(y+w) and V(z, h, x+w) are the values computed as "Vol" and "Vol(h)" by applying [Formula 5] and [Formula 9] to the whole original acoustic data and to the partial acoustic data, respectively. Only when both values are greater than 0 is the word-level mismatch bit count D(y+w, z, h, x+w), the number of mismatched bits between one registered feature word and one partial feature word, calculated (S183).

Next, the word-level mismatch bit count D(y+w, z, h, x+w) is compared with the value Dmax (S184). Only when D(y+w, z, h, x+w) is greater than Dmax is Dmax set to the value of D(y+w, z, h, x+w) and wmax set to the value of w (S185). D(y+w, z, h, x+w) is then added to the summed mismatch bit count S(y, z, h, x) (S186). S186 also increments the variable w, so that the next pair of feature words is matched. S186 is executed regardless of the result of the determination in S184.

It is then determined whether the variable w has reached the predetermined number W (S187). If not, processing returns to S182 and the next feature word is processed. When w reaches W, the value of S(y, z, h, x) at that point is output as the summed mismatch bit count for the given x, h, and y. This is the value obtained in S132 of FIG. 8. If the condition in S182 is judged not to be satisfied, an error value (a negative value) is output in place of the summed mismatch bit count.
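A minimal sketch of the FIG. 9 loop is given below, using the plain bit count of the "Off" mode for D. The function name, the list-based layout of feature words and volume data, and the choice of −1 as the error value are assumptions made for illustration; the caller is also assumed to guarantee that y + W and x + W stay within the lists.

```python
from typing import Sequence, Tuple

ERROR = -1  # assumption: any negative number can serve as the error value


def summed_mismatch(
    fd: Sequence[int], vd: Sequence[float],   # registered words and volumes
    f: Sequence[int], v: Sequence[float],     # partial words and volumes
    y: int, x: int, W: int,
) -> Tuple[int, int, int]:
    """Return (S, Dmax, wmax) for W consecutive word pairs starting at y, x."""
    s, dmax, wmax = 0, 0, 0                        # S181: initialization
    for w in range(W):                             # S187: stop once w reaches W
        if not (vd[y + w] > 0 and v[x + w] > 0):   # S182: both volumes positive?
            return ERROR, dmax, wmax               # error value instead of S
        d = bin(fd[y + w] ^ f[x + w]).count("1")   # S183: differing bits
        if d > dmax:                               # S184: new largest D?
            dmax, wmax = d, w                      # S185: remember it and its w
        s += d                                     # accumulate, then next word
    return s, dmax, wmax
```

Returning Dmax and wmax alongside S is a convenience of this sketch; the passage above only states that they are tracked.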

Next, the matching of partial section z with the phase h varied, performed in S150 of FIG. 7, is described in detail using the flowchart of FIG. 10. First, initialization is performed (S151): the minimum mismatch bit count Smin2 is set to the initial value Big2, the variable h identifying the phase is set to 0, and the variable hmin, which records the h at which the mismatch bit count is smallest, is set to −1. The initial value Big2 is set in advance and need only be sufficiently larger than any value the summed mismatch bit count S(ymin, z, h, xmin) can take.

Next, for the selected phase h, the partial feature word group generated from partial section z is matched against the registered feature word group (S152). As with S130, S152 is carried out according to FIG. 8. The result of S152 is the minimum summed mismatch bit count S(ymin, z, h, xmin) for partial section z at the selected phase h.

Once the minimum summed mismatch bit count S(ymin, z, h, xmin) has been obtained, it is compared with the minimum mismatch bit count Smin2 (S153). Only when S(ymin, z, h, xmin) is less than or equal to Smin2 is its value stored in Smin2 (S154). The variable h is then incremented (S155) and S152 to S154 are repeated. In other words, S152 to S154 are repeated for each phase h to obtain S(ymin, z, h, xmin) for that phase, and the smallest of these values becomes Smin2.

It is determined whether h has reached the total number of phases H (S156). If not, processing returns to S152 and the minimum summed mismatch bit count S(ymin, z, h, xmin) is calculated for the next phase h. When h reaches H, that is, when all H phases have been processed, the smallest value of S(ymin, z, h, xmin) found up to that point, Smin2, is output as the minimum mismatch bit count for partial section z with the phase varied. This Smin2 is the result obtained in S150 of FIG. 7.
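The phase sweep of FIG. 10 reduces to a single loop around the fixed-phase matching. The sketch below assumes a hypothetical callback `match_at_phase(h)` that returns the minimum summed value for phase h (the role played by S152), and uses `float("inf")` for the initial value Big2; recording hmin in the same step as Smin2 is likewise an assumption.

```python
from typing import Callable, Tuple

BIG2 = float("inf")  # assumption: stands in for the preset initial value Big2


def match_over_phases(
    match_at_phase: Callable[[int], float],
    num_phases: int,
) -> Tuple[float, int]:
    """Return (Smin2, hmin) over phases 0 .. num_phases - 1."""
    smin2, hmin = BIG2, -1             # S151: initialization
    for h in range(num_phases):        # S155 / S156: next phase until h == H
        s = match_at_phase(h)          # S152: fixed-phase matching for this h
        if s <= smin2:                 # S153 / S154: keep the smaller value
            smin2, hmin = s, h
    return smin2, hmin
```

Because the comparison is "less than or equal", a later phase that ties with an earlier one replaces it in this sketch.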

<1.3. Calculation of the word-level mismatch bit count>
The calculation of the word-level mismatch bit count D(y+w, z, h, x+w) in S183 of FIG. 9 is described here. The specific processing depends on the volume determination mode set by the user. Four volume determination modes exist: Off, Weight, Match, and Both.

<1.3.1. Volume determination mode "Off">
Volume determination mode "Off" applies no weighting. When it is set, the word-level mismatch bit count D(y+w, z, h, x+w) is used directly as the word-level dissimilarity D(y+w, z, h, x+w), which indicates how much two words differ. In this mode, D(y+w, z, h, x+w) is initialized to 0, the 32 bits of Fd(y+w) and F(z, h, x+w) are compared bit by bit in corresponding positions, and 1 is added to D(y+w, z, h, x+w) for each pair of bits that differ. In both the registered feature word group and the partial feature word group, the feature pattern of a feature word is created by the same rule and is a 32-bit value with the low-frequency component at the LSB and the high-frequency component at the MSB, so matching can be performed by checking whether each of these bit values agrees.
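The "Off" mode count is the Hamming distance between two 32-bit patterns. A minimal sketch, assuming each feature pattern is held as a non-negative Python integer (the function name is illustrative):

```python
def mismatch_bits_off(fd_word: int, f_word: int) -> int:
    """Count the bit positions at which two 32-bit feature patterns differ."""
    diff = (fd_word ^ f_word) & 0xFFFFFFFF   # XOR marks differing bits
    return bin(diff).count("1")              # number of set bits in the XOR
```

For example, `mismatch_bits_off(0b1010, 0b0110)` yields 2, because the patterns differ in two positions.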

<1.3.2. Volume determination mode "Weight">
Volume determination mode "Weight" applies weighting. When it is set, D(y+w, z, h, x+w) is initialized to 0 and the 32 bits of Fd(y+w) and F(z, h, x+w) are compared bit by bit in corresponding positions. Based on each comparison, the processing of [Formula 16] below is applied to determine the value of D(y+w, z, h, x+w). As a result, D(y+w, z, h, x+w) is no longer a word-level mismatch bit count but a word-level dissimilarity that takes the volume data into account. In the matching of registered feature words against search feature words, this mode gives appropriate results when the search acoustic data underlying the search feature words has undergone signal processing such as analog conversion, so that both its relative volume changes and its absolute volume differ from those of the original acoustic data.

[Formula 16]
When the Fd(y+w) side is bit 1 and the F(z, h, x+w) side is bit 0:
D(y+w, z, h, x+w) ← D(y+w, z, h, x+w) + Vd(y+w)·2 / {Vd(y+w) + V(z, h, x+w)}
When the Fd(y+w) side is bit 0 and the F(z, h, x+w) side is bit 1:
D(y+w, z, h, x+w) ← D(y+w, z, h, x+w) + V(z, h, x+w)·2 / {Vd(y+w) + V(z, h, x+w)}
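The weighted accumulation of [Formula 16] can be sketched as below. The function name and argument layout are assumptions; `vd` and `v` are the volume data of the registered and partial words, both assumed positive because S182 has already rejected non-positive volumes.

```python
def dissimilarity_weight(fd_word: int, f_word: int, vd: float, v: float) -> float:
    """Accumulate the volume-weighted mismatch of two 32-bit feature patterns."""
    total = vd + v                    # common denominator Vd + V
    d = 0.0
    for bit in range(32):
        a = (fd_word >> bit) & 1      # bit on the registered side
        b = (f_word >> bit) & 1       # bit on the partial side
        if a == 1 and b == 0:
            d += vd * 2 / total       # registered side is 1: weight by Vd
        elif a == 0 and b == 1:
            d += v * 2 / total        # partial side is 1: weight by V
    return d
```

When the two volumes are equal, each mismatched bit contributes exactly 1, so the result coincides with the "Off" mode bit count.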

<1.3.3. Volume determination mode "Match">
When volume determination mode "Match" is set, the "Off" mode processing is performed first to obtain D(y+w, z, h, x+w). The processing of [Formula 17] below is then applied, multiplying D(y+w, z, h, x+w) by a weight to determine its final value. As a result, D(y+w, z, h, x+w) is no longer a word-level mismatch bit count but a word-level dissimilarity that takes into account differences in the fluctuation pattern of the volume data. This mode has no significance when matching registered feature words against partial feature words generated from the same original acoustic data. In the matching of registered feature words against search feature words described later, however, it gives appropriate results when the search acoustic data underlying the search feature words has undergone signal processing such as data compression, so that its relative volume changes differ little from those of the original acoustic data but its absolute volume differs. This mode is the most recommended in this embodiment.

[Formula 17]
When Vd(y+w)·Vd(y+w−1) > V(z, h, x+w)·V(z, h, x+w−1):
D(y+w, z, h, x+w) ← D(y+w, z, h, x+w)·Vd(y+w)·Vd(y+w−1) / {V(z, h, x+w)·V(z, h, x+w−1)}
When Vd(y+w)·Vd(y+w−1) < V(z, h, x+w)·V(z, h, x+w−1):
D(y+w, z, h, x+w) ← D(y+w, z, h, x+w)·V(z, h, x+w)·V(z, h, x+w−1) / {Vd(y+w)·Vd(y+w−1)}

When w = 0, [Formula 17] above is evaluated with Vd(y+w−1) = Vd(y+w) and V(z, h, x+w−1) = V(z, h, x+w).
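A minimal sketch of the [Formula 17] weighting follows, including the w = 0 substitution. The function name and the list-based volume arrays are assumptions; `d_off` is the bit count already obtained by the "Off" mode processing, and all volumes are assumed positive.

```python
from typing import Sequence


def match_mode_dissimilarity(
    d_off: float,
    vd: Sequence[float], v: Sequence[float],
    y: int, x: int, w: int,
) -> float:
    """Scale the "Off" mode count by the ratio of the two volume products."""
    vd_cur, v_cur = vd[y + w], v[x + w]
    # For w = 0 the previous-word volume is replaced by the current one.
    vd_prev = vd[y + w - 1] if w > 0 else vd_cur
    v_prev = v[x + w - 1] if w > 0 else v_cur
    reg = vd_cur * vd_prev            # Vd(y+w) * Vd(y+w-1)
    qry = v_cur * v_prev              # V(z,h,x+w) * V(z,h,x+w-1)
    if reg > qry:
        return d_off * reg / qry
    if reg < qry:
        return d_off * qry / reg
    return d_off                      # equal products: formula leaves D as is
```

In both branches the multiplier is the larger product divided by the smaller, so the weighted value is never smaller than the unweighted count.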

<1.3.4. Volume determination mode "Both">
When volume determination mode "Both" is set, the "Weight" mode processing is performed first to obtain D(y+w, z, h, x+w). The processing of [Formula 18] below is then applied, multiplying D(y+w, z, h, x+w) by a weight to determine its final value. As a result, D(y+w, z, h, x+w) is no longer a word-level mismatch bit count but a word-level dissimilarity that takes the volume data into account. This mode has no significance when matching registered feature words against partial feature words generated from the same original acoustic data. In the matching of registered feature words against search feature words described later, however, it gives appropriate results when the search acoustic data underlying the search feature words has undergone signal processing such as high-ratio data compression with waveform distortion or analog conversion, so that both its relative volume changes and its absolute volume differ markedly from those of the original acoustic data.

[Formula 18]
When Vd(y+w)·V(z, h, x+w−1) > V(z, h, x+w)·Vd(y+w−1):
D(y+w, z, h, x+w) ← D(y+w, z, h, x+w)·Vd(y+w)·V(z, h, x+w−1) / {V(z, h, x+w)·Vd(y+w−1)}
When Vd(y+w)·V(z, h, x+w−1) < V(z, h, x+w)·Vd(y+w−1):
D(y+w, z, h, x+w) ← D(y+w, z, h, x+w)·V(z, h, x+w)·Vd(y+w−1) / {Vd(y+w)·V(z, h, x+w−1)}

When w = 0, [Formula 18] above is evaluated with Vd(y+w−1) = Vd(y+w) and V(z, h, x+w−1) = V(z, h, x+w).
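The [Formula 18] weighting can be sketched in the same way. The function name and list-based volume arrays are assumptions; `d_weight` is the value already obtained by the "Weight" mode processing, and all volumes are assumed positive.

```python
from typing import Sequence


def both_mode_dissimilarity(
    d_weight: float,
    vd: Sequence[float], v: Sequence[float],
    y: int, x: int, w: int,
) -> float:
    """Scale the "Weight" mode value by the ratio of the cross products."""
    vd_cur, v_cur = vd[y + w], v[x + w]
    # For w = 0 the previous-word volume is replaced by the current one.
    vd_prev = vd[y + w - 1] if w > 0 else vd_cur
    v_prev = v[x + w - 1] if w > 0 else v_cur
    a = vd_cur * v_prev               # Vd(y+w) * V(z,h,x+w-1)
    b = v_cur * vd_prev               # V(z,h,x+w) * Vd(y+w-1)
    if a > b:
        return d_weight * a / b
    if a < b:
        return d_weight * b / a
    return d_weight                   # equal products: formula leaves D as is
```

Two properties follow directly from the formula: with the w = 0 substitution both cross products are equal, so the first word of a run is never reweighted, and scaling every volume on one side by a constant factor leaves the multiplier unchanged, since only the word-to-word volume ratios enter.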

In volume determination modes other than "Off", D(y+w, z, h, x+w) is a word-level dissimilarity that takes the volume data into account rather than a word-level mismatch bit count. Accordingly, S(y, z, h, x) in FIGS. 8 and 9 represents a summed dissimilarity rather than a summed mismatch bit count, and Smin1 and Smin2 in FIGS. 8 and 10 represent minimum dissimilarities rather than minimum mismatch bit counts.

For each item of original acoustic data, the registration means 35 registers in the acoustic database 40, in association with one another, the related information for that original acoustic data, the acoustic data ID, the registered feature word group, the representative registered feature data, and the individual determination reference values. Any information related to the original acoustic data may serve as related information. For example, if the original acoustic data is a piece of music, the title and performer name can be used; if it is commercial audio, the name or URL of the sponsoring company can be used. However, the series of binary-format source material used in producing and mastering the original acoustic data (individual recordings before mixdown, MIDI sequence data, and the like) is normally excluded because of copyright-law restrictions. The individual determination reference values Mu, M1, M2, Mw1, and Mw2 are registered for each record r as Mu(r), M1(r), M2(r), Mw1(r), and Mw2(r).

<2. Related information search device>
Next, the related information search device for acoustic data according to the present invention (hereinafter the "related information search device") is described. FIG. 11 is a hardware configuration diagram of the related information search device. Like the related information registration device, it can be implemented on a general-purpose computer. As shown in FIG. 11, it comprises a CPU 3a (Central Processing Unit); a RAM 3b (Random Access Memory) serving as the computer's main memory; a large-capacity data storage device 3c (for example, a hard disk) for storing data; a program storage device 3d (for example, a hard disk) for storing the programs executed by the CPU; a key input I/F 3e for a keyboard, mouse, and the like; a data input/output interface 3f for data communication with external devices (data storage media); and a display output interface 3g for sending information to a display device. These components are connected to one another via a bus.

The program storage device 3d of the related information search device holds a dedicated program that operates the CPU 3a and causes the computer to function as the related information search device. The data storage device 3c stores registered feature words, representative registered feature data, and the like in association with related information, thereby functioning as the acoustic database, and also stores the various data needed for processing. Although FIG. 11 shows an implementation on a single computer, a personal computer with high-performance processing capability, connected over a network to a server computer running the acoustic database, may instead execute each means according to the dedicated program.

FIG. 12 is a functional block diagram of the related information search device according to the present invention. In FIG. 12, 40 is the acoustic database, 45 the mode setting means, 50 the search feature word generation means, 60 the representative search feature data generation means, 70 the matching range determination means, 80 the representative feature data matching means, 90 the feature word matching means, and 100 the information output means. The related information search device uses search acoustic data held by the user to retrieve related information on original acoustic data registered in the acoustic database, as the related information for that search acoustic data. Search acoustic data is the acoustic data used for a search. A search requires matching search feature words, which are feature words generated from the search acoustic data, against registered feature words registered in advance in the acoustic database 40. The search feature words and registered feature words must therefore have essentially the same structure (although on the search side, a set of several (H) feature word groups with shifted phases is generated). The search acoustic data and original acoustic data underlying these words are generally compressed in various encoding formats that differ depending on how the data was obtained, so they must be converted to a common format. In this embodiment, both the search acoustic data and the original acoustic data are converted to PCM with the same parameters (sampling frequency: 44.1 kHz; quantization: 16 bits; channels: 1, monaural).

The mode setting means 45 selects which of the modes provided by the related information search device is used for processing, and is implemented by input devices such as a keyboard and mouse together with the key input I/F 3e. Two kinds of mode can be set: the search acoustic data mode and the volume determination mode. The search acoustic data mode indicates the state of the search acoustic data, with two options, intro search and full-length search. Intro search applies when the search acoustic data includes the beginning of the acoustic material (for music, the original track). Full-length search applies when the search acoustic data has not been excerpted from the acoustic material at all but uses the entire material, that is, when it has the same duration as the acoustic material. The volume determination mode sets how the volume component is factored into the word-level dissimilarity D(r, y+w, h, x+w) described later, with four options: Off, Weight, Match, and Both.

Like the registered feature word generation means 10 shown in FIG. 2, the search feature word generation means 50 performs frequency analysis on the acoustic frames it reads and generates feature words expressing the features of the search acoustic data. Unlike the registration side, however, it generates a set of several (H) feature word groups with shifted phases. Like the representative registered feature data generation means 20 shown in FIG. 2, the representative search feature data generation means 60 uses the search feature word group generated by the search feature word generation means 50 to generate one item of representative search feature data per item of search acoustic data. A search feature word expresses a partial feature of the search acoustic data, whereas the representative search feature data expresses the overall features of one item of search acoustic data.

The representative feature data matching means 80 matches the generated representative search feature data against the representative registered feature data registered in the acoustic database 40. The feature word matching means 90 matches the generated search feature words against the registered feature words registered in the acoustic database 40. Based on the result of the matching by the feature word matching means 90, the information output means 100 extracts from the acoustic database 40, and outputs, the related information for the original acoustic data whose features resemble those of the search acoustic data.

The relationship among the search feature word generation means 50, the representative search feature data generation means 60, the representative feature data matching means 80, and the feature word matching means 90 is outlined here with reference to FIG. 13. As shown in FIG. 13, feature word generation is performed on each item of original acoustic data, and the resulting feature words are registered in the acoustic database. Representative feature data generation is then performed on those feature words, and one item of representative feature data is registered per item of original acoustic data. At search time, the search feature word generation means 50 performs feature word generation on the search acoustic data, after which the representative search feature data generation means 60 performs representative feature data generation on the resulting feature words to produce one item of representative feature data. The representative feature data matching means 80 then matches the representative feature data obtained from the search acoustic data against each item of representative feature data in the acoustic database. Only for the original acoustic data that satisfies the condition in this matching does the feature word matching means 90 match the feature words against one another.

In this way, the representative feature data matching means 80 first narrows the candidates using the representative feature data, of which exactly one is generated per item of acoustic data and which expresses the features of that item as a whole, thereby excluding acoustic data that differs greatly. The feature word matching means 90 then matches the remaining, relatively similar original acoustic data using feature words, which express partial features, so that an accurate search can be performed.
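The coarse-then-fine structure described above can be sketched generically. Everything concrete here is an assumption made for illustration: the two distance callbacks, the threshold `coarse_limit` standing in for the unspecified narrowing condition, and the rule of returning the single best record.

```python
from typing import Callable, Iterable, Optional, TypeVar

R = TypeVar("R")


def two_stage_search(
    records: Iterable[R],
    coarse: Callable[[R], float],   # hypothetical representative-data distance
    coarse_limit: float,            # hypothetical narrowing threshold
    fine: Callable[[R], float],     # hypothetical feature-word dissimilarity
) -> Optional[R]:
    """Filter records with the cheap measure, then rank survivors by the fine one."""
    candidates = [r for r in records if coarse(r) <= coarse_limit]
    if not candidates:
        return None                 # nothing passed the coarse stage
    return min(candidates, key=fine)
```

The costly word-by-word comparison is evaluated only for records that pass the coarse stage, which is the point of the two-step design.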

<2.2. Processing operation of the related information search device>
Next, the processing operation of the device shown in FIG. 12 is described. When a search operator wishes to run a search on search acoustic data in their possession, the operator instructs the related information search device to start and, once it has started, designates the search acoustic data to be used. This is done by operating buttons on a computer screen via the key input I/F 3e to designate search acoustic data stored in the data storage device 3c of the related information search device. In practice, search acoustic data is often in a compressed format such as MP3, so it is converted to PCM before processing. The search operator also sets the search mode and the volume determination mode using the mode setting means 45. If no setting is made, a normal search, that is, neither "intro search" nor "full-length search", is performed for the search mode, and the volume determination mode is set to "Off". The CPU 3a writes the settings to a predetermined area of the RAM 3b, where each means can refer to them. Selecting "intro search" or "full-length search" as the search mode specifies that the search acoustic data begins at the start of the acoustic material.

指示が入力されると、検索特徴ワード生成手段５０が、指定された検索音響データから、それぞれ所定数のサンプルを１音響フレームとして読み込む。この処理は、関連情報登録装置が行ったのと同様に行われる。すなわち、１音響フレームのサンプル数は、サンプリング周波数が４４．１ｋＨｚの場合、４０９６サンプルとする。また、音響フレームは、２０４８サンプルを重複させて読み込むことにしている。 When an instruction is input, the search feature word generation unit 50 reads a predetermined number of samples as one sound frame from the specified search sound data. This process is performed in the same manner as that performed by the related information registration apparatus. That is, the number of samples in one acoustic frame is 4096 samples when the sampling frequency is 44.1 kHz. In addition, the sound frame is read by overlapping 2048 samples.
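
The frame reading described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `split_into_frames` is hypothetical, and dropping a trailing partial frame is an assumption, since the text does not say how the end of the data is handled.

```python
def split_into_frames(samples, frame_size=4096, overlap=2048):
    """Split PCM samples into overlapping acoustic frames.

    frame_size and overlap follow the 44.1 kHz example in the text
    (4096 samples per frame, 2048 samples shared with the next frame).
    Assumption: a trailing partial frame is simply dropped.
    """
    hop = frame_size - overlap  # each frame starts this many samples after the previous one
    frames = []
    start = 0
    while start + frame_size <= len(samples):
        frames.append(samples[start:start + frame_size])
        start += hop
    return frames
```

With these defaults each new frame begins 2048 samples after the previous one, so consecutive frames share half of their samples.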

ここから検索特徴ワードの生成までの処理は、図１４のフローチャートに従ったものとなる。図１４のフローチャートは、登録特徴ワード生成についての図３のフローチャートとほぼ同様のものとなっている。検索特徴ワード生成手段５０は、Ｓ１１と同様にして、読み込んだ各音響フレームに対して周波数変換を行って、その音響フレームのスペクトルであるフレームスペクトルを得る（Ｓ２１）。関連情報登録装置と同様、周波数変換としては、フーリエ変換、ウェーブレット変換その他公知の種々の手法を用いることができるが、関連情報登録装置の処理と合わせる必要があるため、本実施形態では、フーリエ変換を用いる。 The processing from this point up to the generation of the search feature words follows the flowchart of FIG. 14, which is almost the same as the flowchart of FIG. 3 for generating registered feature words. As in S11, the search feature word generation unit 50 applies a frequency transform to each acoustic frame it has read and obtains the frame spectrum, that is, the spectrum of that acoustic frame (S21). As with the related information registration device, the Fourier transform, the wavelet transform, or various other known methods could be used for the frequency transform, but because the processing must match that of the related information registration device, this embodiment uses the Fourier transform.

続いて、スペクトル成分の算出を行う（Ｓ２２）。具体的には、Ｓ１２と同様、上記〔数式１〕第３式に従った処理を行い、各周波数における強度値Ｅ（ｊ）を得る。 Subsequently, a spectral component is calculated (S22). Specifically, similarly to S12, the processing according to the above [Formula 1] third formula is performed to obtain the intensity value E (j) at each frequency.

続いて、スペクトル成分の間引き処理を行う（Ｓ２３）。具体的には、Ｓ１３と同様、上記〔数式２〕に従った処理を実行し、１１周波数成分単位のＰｎに間引くことになる。 Subsequently, thinning processing of spectral components is performed (S23). Specifically, as in S13, the process according to the above [Equation 2] is executed and thinned out to Pn in units of 11 frequency components.

上記〔数式２〕により、ｊ＝３３〜３８４の３５２の周波数成分が、ｎ＝０〜３１の３２の周波数成分に間引かれることになる。上記処理は、各音響フレームについて行われ、各音響フレームについて、３２個の周波数成分が得られることになる。 According to the above [Expression 2], 352 frequency components of j = 33 to 384 are thinned out to 32 frequency components of n = 0 to 31. The above process is performed for each acoustic frame, and 32 frequency components are obtained for each acoustic frame.
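
The thinning step can be illustrated with a short sketch. [Formula 2] itself is not reproduced in this section, so the sketch assumes that each Pn is the plain sum of 11 adjacent intensity values E(j); the actual formula may weight or average them differently. The function name `thin_spectrum` is hypothetical.

```python
def thin_spectrum(intensity, first_bin=33, group=11, bands=32):
    """Reduce intensity values E(j), j = 33..384, to 32 components P0..P31.

    Assumption: each Pn is the plain sum of `group` adjacent bins.
    """
    needed = first_bin + group * bands  # 385, so the last bin used is j = 384
    if len(intensity) < needed:
        raise ValueError("spectrum too short for the requested bands")
    return [
        sum(intensity[first_bin + group * n:first_bin + group * (n + 1)])
        for n in range(bands)
    ]
```

The defaults reproduce the counts in the text: 32 bands of 11 bins each cover exactly the 352 components j = 33 to 384.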

次に、各音響フレームについて、直前の音響フレームのスペクトル成分との差分を算出する（Ｓ２４）。上記Ｓ２１〜Ｓ２３の処理は、各音響フレームに対して順次行われる。このＳ２４におけるフレーム間差分の算出処理は、各音響フレームについてＳ２３までの処理を行った結果得られたＰ０〜Ｐ３１を利用するものである。具体的には、Ｓ１４と同様、上記〔数式３〕に従った処理を行い、フレーム間差分Ｄｎ（ｔ）を得る。 Next, for each acoustic frame, a difference from the spectral component of the immediately preceding acoustic frame is calculated (S24). The processes of S21 to S23 are sequentially performed on each acoustic frame. The inter-frame difference calculation processing in S24 uses P0 to P31 obtained as a result of performing the processing up to S23 for each acoustic frame. Specifically, similarly to S14, the process according to the above [Equation 3] is performed to obtain the inter-frame difference Dn (t).

上記Ｓ２１〜Ｓ２４の処理を各音響フレームに対して順次行い、音響フレーム間の差分Ｄｎ（ｔ）がＴ個（本実施形態では１１個）得られたら、そのＴ個分の総和を求める（Ｓ２６）。すなわち、以下の〔数式１９〕に従った処理を行い、フレーム間差分の総和Ｓｎ（ｈ）を得る。〔数式１９〕のｈｓは位相をずらす最小単位の音響フレーム数であり、Ｔ／Ｈで定義される。ただし、ｈｓは整数値でないと意味を持たないため、本実施形態では小数点以下を切り捨て、ｈｓ＝２としている。 The processes of S21 to S24 are performed sequentially on each acoustic frame, and once T inter-frame differences Dn(t) have been obtained (T = 11 in this embodiment), their sum over the T frames is computed (S26). That is, processing according to [Formula 19] below is performed to obtain the sum Sn(h) of the inter-frame differences. In [Formula 19], hs is the number of acoustic frames in the minimum unit by which the phase is shifted, and is defined as T/H. Because hs is meaningful only as an integer, the fractional part is discarded in this embodiment, giving hs = 2.

〔数式１９〕 [Formula 19]
$$S_n(h) = \sum_{t=0}^{T-1} D_n(t + h \cdot h_s)$$
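
As a rough illustration of [Formula 19], the sketch below sums T consecutive inter-frame differences for each phase h, starting h·hs frames later each time. The names `phase_sums` and `diffs` are hypothetical, and T = 11, H = 5 are the values of this embodiment.

```python
T = 11       # frames summed per feature word (this embodiment)
H = 5        # number of phases (this embodiment)
HS = T // H  # phase step in frames; T/H with the fraction discarded gives 2

def phase_sums(diffs, start=0):
    """Return sums[h][n] = sum over t of D_n(start + t + h*HS).

    diffs[t][n] holds the inter-frame difference D_n(t).
    """
    bands = len(diffs[0])
    sums = []
    for h in range(H):
        first = start + h * HS
        window = diffs[first:first + T]
        if len(window) < T:
            raise ValueError("not enough frames for phase %d" % h)
        sums.append([sum(frame[n] for frame in window) for n in range(bands)])
    return sums
```

With these values the last phase (h = 4) reads frames 8 to 18 counted from zero, so 19 frames are needed to produce all five phases of one feature word.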

上記〔数式１９〕において、ｈは位相を特定する位相番号であり、０≦ｈ≦Ｈ−１のＨ通りの値をとる整数である。続いて、上記〔数式１９〕により得られたＳｎ（ｈ）の二値化処理を行う（Ｓ２７）。具体的には、Ｓ１７と同様の処理をＳｎ（ｈ）配列に対して実行する。すなわち、Ｓｎ（ｈ）配列をｎ≧１４とｎ≦１３の上下帯域で２分割し、ｎ≦１３の１４個中値の大きい７個に１を与え、値の小さい７個に０を与えるとともに、ｎ≧１４の１８個中値の大きい９個に１を与え、値の小さい９個に０を与える。Ｓ２７における処理により、各ｎについてのＳｎ（ｈ）が１ビットで表現可能となる。そして、ｎ＝０をＬＳＢ、ｎ＝３１をＭＳＢとして３２ビットの特徴パターンＦ（ｈ,ｘ）を得る。ここで、ｘ（＝０，…，Ｘ−１）は、検索音響データから生成されるＸ個の特徴ワードにおいて、その順番を示す変数である。したがって、ｘは演奏開始からの時刻に比例する変数となる。 In [Formula 19] above, h is a phase number that identifies the phase and is an integer taking H values, 0 ≤ h ≤ H−1. Next, Sn(h) obtained by [Formula 19] is binarized (S27). Specifically, the same processing as in S17 is applied to the Sn(h) array: the array is split into a lower band (n ≤ 13) and an upper band (n ≥ 14); of the 14 values with n ≤ 13, the 7 largest are assigned 1 and the 7 smallest are assigned 0, and of the 18 values with n ≥ 14, the 9 largest are assigned 1 and the 9 smallest are assigned 0. Through S27, Sn(h) for each n can be expressed in 1 bit. A 32-bit feature pattern F(h, x) is then obtained with n = 0 as the LSB and n = 31 as the MSB. Here, x (= 0, ..., X−1) is a variable giving the position of a feature word among the X feature words generated from the search acoustic data, so x is proportional to the time elapsed since the start of the performance.
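
The binarization in S27 might be sketched as below. The text does not say how ties between equal values are broken, so this sketch assumes that the lower band index wins; the function name `binarize` is hypothetical.

```python
def binarize(sums):
    """Turn 32 summed differences S_0..S_31 into a 32-bit feature pattern.

    Lower band (n = 0..13): the 7 largest values get bit 1.
    Upper band (n = 14..31): the 9 largest values get bit 1.
    Bit n of the result corresponds to band n (n = 0 is the LSB).
    Assumption: ties go to the lower band index (stable sort).
    """
    if len(sums) != 32:
        raise ValueError("expected 32 band values")
    pattern = 0
    for lo, hi in ((0, 14), (14, 32)):
        ranked = sorted(range(lo, hi), key=lambda n: sums[n], reverse=True)
        for n in ranked[:(hi - lo) // 2]:
            pattern |= 1 << n
    return pattern
```

Because exactly half of each band is set, every feature pattern produced this way contains 16 one-bits.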

次に、音量データの算出を行う（Ｓ２８）。具体的には、まず、以下の〔数式２０〕を用いて各位相番号ｈについて総和音量Vol（ｈ）を算出する。 Next, the volume data is calculated (S28). Specifically, first, the total volume Vol (h) is calculated for each phase number h using the following [Equation 20].

〔数式２０〕 [Formula 20]
$$\mathrm{Vol}(h) = \sum_{t=0}^{T-1} \left\{ \sum_{n=0}^{31} P_n(t + h \cdot h_s) \right\}$$

上記〔数式２０〕に示すように、間引き処理した全ての成分Ｐｎ（ｔ＋ｈ・ｈｓ）の値をＴ個の音響フレームについて加算する。これにより、各位相番号ｈについて、Ｔ個の音響フレームについての音量の総和である総和音量Vol（ｈ）が得られる。この総和音量Volの値に適宜設定した固定のスケーリング値を乗算して、０〜２５５の範囲に収まるように正規化して音量データＶ(ｈ,ｘ)を得る。正規化により音量データＶ(ｈ,ｘ)は８ビットで表現されることとなる。音量データＶ(ｈ,ｘ)は、上記〔数式２０〕に示されるように、Ｔ個の音響フレームに渡る音量の総和を基礎としているため、各フレーム単位の音量ではなく、Ｔ個の音響フレームの総和音量を表現していることになる。 As shown in [Formula 20] above, the values of all thinned components Pn(t + h·hs) are added over T acoustic frames. This gives, for each phase number h, the total volume Vol(h), which is the sum of the volumes of the T acoustic frames. The total volume Vol is multiplied by a suitably chosen fixed scaling value and normalized to fall within the range 0 to 255, yielding the volume data V(h, x). As a result of this normalization, the volume data V(h, x) is expressed in 8 bits. Because V(h, x) is based on the sum of the volumes over T acoustic frames, as [Formula 20] shows, it represents the total volume of the T acoustic frames rather than the volume of an individual frame.
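
A minimal sketch of the volume calculation in S28 follows. The scaling value is left as a parameter because the text only says it is "set as appropriate"; truncating to an integer and clamping to 0..255 are assumptions about how the normalization keeps the result within 8 bits. The function name `volume_data` is hypothetical.

```python
def volume_data(thinned, h, scale, T=11, hs=2):
    """Compute the 8-bit volume value for phase h from thinned spectra.

    thinned[t][n] holds P_n(t). The total volume sums all 32 components
    over T frames starting at frame h*hs, as in [Formula 20].
    Assumptions: the scaled value is truncated to an integer and
    clamped to the range 0..255.
    """
    first = h * hs
    window = thinned[first:first + T]
    if len(window) < T:
        raise ValueError("not enough frames for this phase")
    total = sum(sum(frame) for frame in window)
    return max(0, min(255, int(total * scale)))
```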

上記Ｓ２７、Ｓ２８の処理は、順序を入れ替えて行うことも可能である。Ｓ２７、Ｓ２８による処理の結果、３２ビットの特徴パターンと８ビットの音量データにより構成される４０ビットの特徴ワードが得られる。 The processes of S27 and S28 can be performed by changing the order. As a result of the processing in S27 and S28, a 40-bit feature word composed of a 32-bit feature pattern and 8-bit volume data is obtained.

以上の処理を各音響フレームに対して実行することにより、その音響データについての特徴ワードが多数（Ｘ個）生成されることになる。例えば、上記の例のように、サンプリング周波数４４．１ｋＨｚ、１音響フレームが４０９６サンプル、音響フレームを２０４８サンプルずつ重複させた場合、１特徴ワードは約０．５０６秒となり、３０秒間の音響データからは、約６０個（＝Ｘ個）の特徴ワードが生成されることになる。 By executing the above processing on each acoustic frame, a large number (X) of feature words are generated for the acoustic data. For example, with a sampling frequency of 44.1 kHz, 4096 samples per acoustic frame, and acoustic frames overlapping by 2048 samples, as in the example above, one feature word corresponds to about 0.506 seconds, so about 60 (= X) feature words are generated from 30 seconds of acoustic data.

上記のようにして、関連情報検索装置では、関連情報登録装置に比べて音響データの単位時間あたりＨ倍（本実施形態では、Ｈ＝５）の特徴ワードを生成する。（ただし、全体としては関連情報検索装置で生成される特徴ワードの方が、関連情報登録装置に比べて顕著に少なくなる。）関連情報登録装置に比べて検索対象とする検索音響データは、楽曲の一部が切り取られたものであることが多い。これは、利用者が演奏されている音楽の一部を録音することにより取得されることがあるためである。そのため、必ずしも、データベースに登録された原音響データとのタイミングが一致するものではなく、位置ズレが生じることがある。関連情報登録装置において生成した手法でも、１１個の音響フレームを平均化して生成しているため、比較的位置ズレには強い。しかし、リズム変化が激しい検索音響データの場合、特徴ワードの生成単位である１１音響フレームの、ほぼ半分である５音響フレーム程度ずれると、顕著に異なる特徴ワードが生成され、誤った情報が検索されてしまう。そのため、１解析単位である１１音響フレームの範囲内でｈｓ（＝２音響フレーム）ずつ遅らせて（位相を変更して）複数の検索特徴ワードを生成して、音響データベース４０内の登録特徴ワードと照合するようにする。 As described above, the related information search device generates H times as many feature words per unit time of acoustic data as the related information registration device (H = 5 in this embodiment). (Overall, however, the related information search device generates markedly fewer feature words than the related information registration device.) Unlike the data handled by the related information registration device, the search acoustic data to be searched is often an excerpt of a piece of music, because it may have been obtained by a user recording part of music being played. Its timing therefore does not necessarily coincide with that of the original acoustic data registered in the database, and a positional shift can occur. The feature words generated by the method of the related information registration device are already fairly robust to positional shift, because they are produced by averaging 11 acoustic frames. However, for search acoustic data with sharp rhythm changes, a shift of about 5 acoustic frames, roughly half of the 11 acoustic frames that form the feature word generation unit, produces markedly different feature words, and incorrect information is retrieved. For this reason, multiple search feature words are generated, each delayed by hs (= 2 acoustic frames) within the 11-frame analysis unit (that is, with the phase changed), and these are collated with the registered feature words in the acoustic database 40.

具体的には、関連情報登録装置では、フレーム１〜フレーム１１までで１つの特徴ワードを生成するが、関連情報検索装置では、フレーム１〜フレーム１１で特徴ワードを生成するとともに、２音響フレーム分、４音響フレーム分、６音響フレーム分、８音響フレーム分ずらした（位相を変更した）音響フレーム群からも特徴ワードを生成する。すなわち、フレーム３〜フレーム１３、フレーム５〜フレーム１５、フレーム７〜フレーム１７、フレーム９〜フレーム１９においても特徴ワードを生成する。結局、図１４のフローチャートに従った処理を各音響フレームに対して実行することにより、Ｘ個の検索特徴ワードを構成する配列として、［Ｆ(ｈ,ｘ), Ｖ(ｈ,ｘ)］（ｈ＝０,…,Ｈ−１；ｘ＝０,…,Ｘ−１)が得られることになる。 Specifically, the related information registration device generates one feature word from frames 1 to 11, whereas the related information search device generates a feature word from frames 1 to 11 and also from groups of acoustic frames shifted (phase-changed) by 2, 4, 6, and 8 acoustic frames. That is, feature words are also generated from frames 3 to 13, frames 5 to 15, frames 7 to 17, and frames 9 to 19. As a result, executing the processing of the flowchart of FIG. 14 on each acoustic frame yields [F(h, x), V(h, x)] (h = 0, ..., H−1; x = 0, ..., X−1) as the array making up the X search feature words.
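
The frame ranges listed above follow directly from T = 11, H = 5 and hs = 2. The short sketch below reproduces them using the 1-based frame numbers of the text; the function name `phase_windows` is hypothetical.

```python
def phase_windows(T=11, H=5):
    """Return (first_frame, last_frame) for each phase, numbered from 1.

    The phase step hs is T/H with the fractional part discarded,
    which is 2 for T = 11 and H = 5.
    """
    hs = T // H
    return [(1 + h * hs, T + h * hs) for h in range(H)]
```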

位相のズレを考慮してＨ個の検索特徴ワード群を生成したら、次に、代表検索特徴データ生成手段６０が、検索特徴ワード群を用いて代表検索特徴データを生成する。代表検索特徴データは、検索特徴ワードを基礎とする特徴成分の時間方向における平均値、標準偏差の組として構成される。図１５は、検索特徴ワードを基礎とする特徴成分の時間方向における平均値、標準偏差の算出処理を示すフローチャートである。まず、代表検索特徴データ生成手段６０は、検索特徴ワードを多値化処理することにより検索特徴データ配列を生成する（Ｓ２９）。具体的には、以下の〔数式２１〕に従った処理を実行することにより、検索特徴ワードを基礎とする特徴成分である検索特徴データ配列Ｑ(ｎ,ｈ,ｘ)（ｎ＝０, …,３１；ｈ＝０, …,Ｈ−１；ｘ＝０, …,Ｘ−１)を生成する。 Once the H search feature word groups have been generated to account for the phase shift, the representative search feature data generation means 60 generates representative search feature data from the search feature word groups. The representative search feature data consists of a pair: the mean and the standard deviation, in the time direction, of feature components based on the search feature words. FIG. 15 is a flowchart showing how this mean and standard deviation are calculated. First, the representative search feature data generation means 60 generates a search feature data array by converting the search feature words to multi-valued data (S29). Specifically, by executing processing according to [Formula 21] below, it generates the search feature data array Q(n, h, x) (n = 0, ..., 31; h = 0, ..., H−1; x = 0, ..., X−1), which is the feature component based on the search feature words.

〔数式２１〕 [Formula 21]
$$Q(n,h,x) = \begin{cases} V(h,x) & \text{if bit } n \text{ of } F(h,x) \text{ is } 1 \\ -V(h,x) & \text{if bit } n \text{ of } F(h,x) \text{ is } 0 \end{cases}$$
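
[Formula 21] amounts to attaching a sign to the volume value for each band, depending on the corresponding bit of the feature pattern. A minimal sketch, with the hypothetical function name `multi_value`:

```python
def multi_value(pattern, volume, bands=32):
    """Expand one feature word into 32 signed values Q(n).

    pattern is the 32-bit feature pattern F and volume is the 8-bit
    volume value V. Q(n) is +V where bit n is 1 and -V where it is 0.
    """
    return [volume if (pattern >> n) & 1 else -volume for n in range(bands)]
```

Each resulting value lies between −255 and 255, which is why a signed 16-bit representation is sufficient.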

次に、代表検索特徴データ生成手段６０は、検索特徴データ配列Ｑ(ｎ,ｈ,ｘ)を用いて平均値配列Ｃ（ｎ）、標準偏差配列Ｌ（ｎ）を算出する（Ｓ３０）。具体的には、以下の〔数式２２〕に従った処理を実行することにより、平均値配列Ｃ（ｎ）、標準偏差配列Ｌ（ｎ）を算出する。 Next, the representative search feature data generation means 60 calculates the mean array C(n) and the standard deviation array L(n) from the search feature data array Q(n, h, x) (S30). Specifically, C(n) and L(n) are calculated by executing processing according to [Formula 22] below.

〔数式２２〕 [Formula 22]
$$C(n) = \frac{1}{HX} \sum_{x=0}^{X-1} \sum_{h=0}^{H-1} Q(n,h,x)$$
$$L(n) = \left[ \frac{1}{HX} \sum_{x=0}^{X-1} \sum_{h=0}^{H-1} \bigl( Q(n,h,x) - C(n) \bigr)^2 \right]^{1/2}$$
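
[Formula 22] is a population mean and standard deviation taken over every phase h and every position x. The sketch below assumes the data is held as nested lists `q[h][x][n]` and that every phase has the same number X of feature words; the function name `mean_and_std` is hypothetical.

```python
import math

def mean_and_std(q):
    """Return (C, L): per-band mean and standard deviation of Q.

    q[h][x][n] holds Q(n, h, x). The divisor is the total number of
    feature words H*X, i.e. the population form of [Formula 22].
    """
    rows = [vector for phase in q for vector in phase]
    count = len(rows)
    bands = len(rows[0])
    mean = [sum(row[n] for row in rows) / count for n in range(bands)]
    std = [
        math.sqrt(sum((row[n] - mean[n]) ** 2 for row in rows) / count)
        for n in range(bands)
    ]
    return mean, std
```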

図１６は、Ｓ２９、Ｓ３０による代表検索特徴データの平均値、標準偏差の算出処理の概念図である。図１６は、ある特定の位相ｈに対する代表検索特徴データの平均値、標準偏差の算出処理を示している。図１６に示すように、検索特徴ワードは各時刻ｘ（ｘ＝０,…,Ｘ−１）に対応して３２ビットの特徴パターンと８ビットの音量データを有している。図１６においては、特徴パターンの３２個の各ビットをＢｉｔ０〜Ｂｉｔ３１で示している。 FIG. 16 is a conceptual diagram of the calculation of the mean and standard deviation of the representative search feature data in S29 and S30. It shows the calculation for one particular phase h. As shown in FIG. 16, the search feature word for each time x (x = 0, ..., X−1) has a 32-bit feature pattern and 8-bit volume data. In FIG. 16, the 32 bits of the feature pattern are labeled Bit0 to Bit31.

そして、上記〔数式２１〕に示したようにビットｎ（Ｂｉｔ０〜Ｂｉｔ３１）の値が０であるか１であるかにより、Ｑ(ｎ, ｈ, ｘ)の値を音量データの値そのままとするか、音量データの値に−１を乗じたものとするかを決定し、ビットｎに対応するＱ(ｎ, ｈ, ｘ)の値を定める。このとき、８ビットの音量データを負の値とする場合が生じるため、Ｑ(ｎ, ｈ, ｘ)は１６ビットで表現する。３２ビット特徴パターンでは、各バンドについて１ビットで表現されていたものが、１６ビット（２バイト）で表現されることになるので、各時刻ｘにおける検索特徴データは６４バイトとなる。 Then, as shown in [Formula 21], the value of Q (n, h, x) is left as it is as the value of the volume data depending on whether the value of bit n (Bit 0 to Bit 31) is 0 or 1. Or the value of the volume data is multiplied by −1, and the value of Q (n, h, x) corresponding to bit n is determined. At this time, since 8-bit volume data may be a negative value, Q (n, h, x) is expressed by 16 bits. In the 32-bit feature pattern, what is represented by 1 bit for each band is represented by 16 bits (2 bytes), so the search feature data at each time x is 64 bytes.

検索の目的とする検索音響データについて、検索特徴ワード群および代表検索特徴データが得られたら、実際に音響データベース４０を参照して照合処理を実行することになる。この際、まず、照合範囲決定手段７０が、登録特徴ワードの照合範囲を決定する処理を行う。検索音響データは、利用者が検索用に用いる音響データであるため、必ずしも楽曲全体を記録しているとは限らず、楽曲の一部だけを録音して音響データとして取得したような場合もある。そのため、照合範囲決定手段７０は、検索音響データが実際に元の楽曲のどの部分から取得されたものであるかがわからないことを前提として、効率的に照合を行うための照合範囲の決定を行う。照合範囲決定の概念図を図１７に示す。図１７における矩形の横幅は、特徴ワード群を構成する特徴ワード数を示している。 When the search feature word group and the representative search feature data are obtained for the search acoustic data to be searched, the collation process is actually executed with reference to the acoustic database 40. At this time, first, the collation range determining means 70 performs processing for determining the collation range of the registered feature word. The search sound data is the sound data used by the user for the search. Therefore, the entire music is not necessarily recorded, and only a part of the music may be recorded and acquired as the sound data. . For this reason, the collation range determination means 70 determines a collation range for efficient collation on the premise that it is not known from which part of the original music the retrieved acoustic data is actually acquired. . FIG. 17 shows a conceptual diagram for determining the collation range. The horizontal width of the rectangle in FIG. 17 indicates the number of feature words constituting the feature word group.

具体的には、まず、照合範囲決定手段７０は、検索特徴ワード群の検索特徴ワード数Ｘとあるレコードｒについての登録特徴ワード群の登録特徴ワード数Ｙ（ｒ）を比較する。ここでは、検索特徴ワード群の検索特徴ワード数は、再生時の長さに比例するものとして登録特徴ワード群の登録特徴ワード数との比較を行っている。したがって、検索特徴ワード数Ｘは位相を考慮しない場合のものとする。 Specifically, first, the collation range determination means 70 compares the number X of search feature words in the search feature word group with the number Y (r) of registered feature words in the registered feature word group for a certain record r. Here, the number of search feature words in the search feature word group is compared with the number of registered feature words in the registered feature word group as being proportional to the length at the time of reproduction. Therefore, the search feature word number X is assumed to be when the phase is not considered.

そして、検索特徴ワード群の検索特徴ワード数Ｘと登録特徴ワード群の登録特徴ワード数Ｙ（ｒ）の比較の結果、ＸがＹ（ｒ）より大きい場合、すなわち、図１７（ａ）に示すような場合、照合範囲決定手段７０は、そのレコードを検索対象から除外し、次のレコードに移行する。ＸがＹ（ｒ）より大きい場合、検索音響データの一部が原音響データより長いことになる。原音響データは音響データ全体を記録しているため、この両者は異なる原音響データに基づくものであることが明らかとなる。そこで、この場合は、そのレコードを検索対象から除外するのである。 Then, as a result of comparison between the search feature word number X of the search feature word group and the registered feature word number Y (r) of the registered feature word group, when X is larger than Y (r), that is, as shown in FIG. In such a case, the collation range determination unit 70 excludes the record from the search target and moves to the next record. When X is larger than Y (r), a part of the search sound data is longer than the original sound data. Since the original sound data records the entire sound data, it becomes clear that both are based on different original sound data. Therefore, in this case, the record is excluded from the search target.

検索特徴ワード群の検索特徴ワード数Ｘと登録特徴ワード群の登録特徴ワード数Ｙ（ｒ）の比較の結果、ＸがＹ（ｒ）より小さい場合、照合範囲決定手段７０は、検索特徴ワード群に、原音響データの先頭部分から作成した特徴ワードが含まれているかどうかを判断する。具体的には、ＲＡＭ３ｂの所定の領域を参照し、モード設定手段４５によりイントロ検索または全尺検索が設定されているかどうかを確認する。これらの検索モードが設定されている場合、検索特徴ワード群に、原音響データの先頭部分から作成した特徴ワードが含まれていることが明らかであるため、検索音響データと原音響データが同一であるとすれば、先頭の検索特徴ワードと登録特徴ワードが一致するはずである。そこで、検索特徴ワード群に、原音響データの先頭部分から作成した特徴ワードが含まれている場合は、照合範囲決定手段７０は、先頭の検索特徴ワードと登録特徴ワードを照合対象として設定する処理を行う。しかし、先頭の１つの特徴ワード同士を比較しただけでは、他の楽曲にも同様な特徴ワードをもつ可能性もあるため、正しい照合結果が得られなくなる。そこで、照合範囲決定手段７０は、所定の時間の長さに対応するα個の特徴ワードを照合対象とする処理を行う。本実施形態では、１２秒程度に相当する特徴ワード群を照合対象とする処理を行っている。上述のように、本実施形態では、１特徴ワードは約０．５０６秒であるので、１２秒は、特徴ワード２４個分に相当する。すなわち、この場合、検索特徴ワード群の先頭からの特徴ワード２４個と、登録特徴ワード群の先頭からの特徴ワード２４個が照合対象とされる。このときの状態を示したものが図１７（ｂ）である。 If, as a result of comparing the number X of search feature words in the search feature word group with the number Y(r) of registered feature words in the registered feature word group, X is smaller than Y(r), the collation range determination means 70 determines whether the search feature word group contains feature words created from the head portion of the original acoustic data. Specifically, it refers to the predetermined area of the RAM 3b and checks whether intro search or full-length search has been set by the mode setting means 45. When one of these search modes is set, the search feature word group clearly contains feature words created from the head portion of the original acoustic data, so if the search acoustic data and the original acoustic data are the same, the first search feature word and the first registered feature word should match. In that case, the collation range determination means 70 sets the first search feature word and the first registered feature word as the collation targets. However, comparing only the single first feature word of each does not give a reliable result, because other pieces of music may have a similar feature word. The collation range determination means 70 therefore sets α feature words, corresponding to a predetermined length of time, as the collation targets. In this embodiment, a feature word group corresponding to about 12 seconds is used. As described above, one feature word is about 0.506 seconds in this embodiment, so 12 seconds corresponds to 24 feature words. That is, the first 24 feature words of the search feature word group and the first 24 feature words of the registered feature word group are collated. FIG. 17(b) shows this state.

検索特徴ワード群に、原音響データの先頭部分から作成した特徴ワードが含まれていない場合で、かつ検索特徴ワード群の先頭部分からα（＜Ｘ）個を照合対象とする場合、照合範囲決定手段７０は、登録特徴ワード群における検索特徴ワード群の先頭検索特徴ワード（ｘ＝０）との照合範囲を、先頭（ｙ＝０）から登録特徴ワード群の最後尾登録特徴ワード（ｙ＝Ｙ（ｒ）−１）より検索特徴ワード群の個数（Ｘ）だけ前方に位置する範囲までとする。すなわち、登録特徴ワード群における検索特徴ワード群の先頭検索特徴ワード（ｘ＝０）との照合範囲は、図１７（ｃ）に示すように、先頭（ｙ＝０）から（Ｙ（ｒ）−Ｘ−１）番目の登録特徴ワードまでとなる。 When the search feature word group does not contain feature words created from the head portion of the original acoustic data, and α (< X) feature words from the head of the search feature word group are used as the collation targets, the collation range determination means 70 sets the range of registered feature words to be collated against the first search feature word (x = 0) to run from the head (y = 0) to the position that lies the number of search feature words (X) before the last registered feature word (y = Y(r)−1) of the registered feature word group. That is, as shown in FIG. 17(c), the collation range for the first search feature word (x = 0) runs from the head (y = 0) to the (Y(r)−X−1)-th registered feature word.

本実施形態では、標準設定として、検索特徴ワード群の先頭からの特徴ワードα個を照合対象としているが、検索特徴ワード群の最後尾からの特徴ワードα個、検索特徴ワード群の中央の特徴ワードα個を照合対象に設定しておくことも可能である。検索特徴ワード群の最後尾からの特徴ワードα個が照合対象として設定されている場合、登録特徴ワード群における検索特徴ワード群の先頭検索特徴ワード（ｘ＝０）との照合範囲は、図１７（ｄ）に示すように、（Ｘ−α）番目から（Ｙ（ｒ）−α）番目の登録特徴ワードまでとなる。 In the present embodiment, as a standard setting, α feature words from the beginning of the search feature word group are targeted for collation, but α feature words from the tail of the search feature word group and the feature at the center of the search feature word group are included. It is also possible to set α words as collation targets. When α feature words from the tail of the search feature word group are set as collation targets, the collation range of the search feature word group in the registered feature word group with the first search feature word (x = 0) is as shown in FIG. As shown in (d), from (X-α) th to (Y (r) -α) th registered feature word.

また、検索特徴ワード群の中央の特徴ワードα個が照合対象として設定されている場合、登録特徴ワード群における検索特徴ワード群の先頭検索特徴ワード（ｘ＝０）との照合範囲は、図１７（ｄ）に示すように、｛（Ｘ−α）／２｝番目から｛Ｙ（ｒ）−α−（Ｘ−α）／２｝番目の登録特徴ワードまでとなる。照合範囲決定手段７０は、以上のようにして決定した登録特徴ワードの照合範囲を、レコード別に特徴ワード照合手段９０に渡す。 Further, when α feature words in the center of the search feature word group are set as collation targets, the collation range with the first search feature word (x = 0) of the search feature word group in the registered feature word group is as shown in FIG. As shown in (d), from {(X−α) / 2} th to {Y (r) −α− (X−α) / 2} th registered feature words. The collation range determination unit 70 passes the collation range of the registered feature word determined as described above to the feature word collation unit 90 for each record.
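
The range rules described for FIG. 17 can be collected into one small function. This is a sketch under stated assumptions, not the device's actual logic: the function name `collation_range`, its parameters, and the `(first, last)` return convention are hypothetical; integer division is assumed for the central case, where the text does not say how an odd (X−α) is rounded; and the case X = Y(r), which the text does not cover, is handled by the same rules as X < Y(r).

```python
def collation_range(X, Y, alpha, segment="first", starts_at_head=False):
    """Return (first_y, last_y), the registered word positions collated
    against the first search feature word, or None if the record is excluded.

    X: number of search feature words (phase not counted)
    Y: number of registered feature words Y(r) of the record
    alpha: number of feature words actually collated (e.g. 24)
    segment: which alpha words of the query are used: "first", "last", "center"
    starts_at_head: True for intro search or full-length search
    """
    if X > Y:
        return None            # query longer than the registered piece
    if starts_at_head:
        return (0, 0)          # only the head of the record is tried
    if segment == "first":
        return (0, Y - X - 1)
    if segment == "last":
        return (X - alpha, Y - alpha)
    if segment == "center":
        offset = (X - alpha) // 2  # assumption: rounded down
        return (offset, Y - alpha - offset)
    raise ValueError("unknown segment: %r" % segment)
```

For a 60-word query against a 400-word record with α = 24, the default setting gives the range 0 to 339.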

登録特徴ワードの照合範囲が決定したら、次に、代表特徴データ照合手段８０が、代表検索特徴データを用いて、検索対象であるレコードの絞込みを行う。具体的には、代表検索特徴データと音響データベース４０に登録された代表登録特徴データとの比較を行い、所定の条件を満たすレコードのみを抽出する。そして、特徴ワード照合手段９０が、抽出されたレコードの中から、検索特徴ワード群を用いて、音響データベース４０に登録された原音響データの関連情報を検索する。具体的には、検索特徴ワード群と、音響データベース４０に登録された登録特徴ワード群との比較を行い、所定の条件を満たすレコードを抽出する。代表特徴データ照合手段８０による処理は、検索特徴ワード群を用いた比較処理の演算負荷が高いため、検索対象とするレコードを絞込むために行われる。 When the collation range of the registered feature word is determined, the representative feature data collating unit 80 next narrows down the records to be searched using the representative search feature data. Specifically, the representative search feature data is compared with the representative registered feature data registered in the acoustic database 40, and only records satisfying a predetermined condition are extracted. And the characteristic word collating means 90 searches the relevant information of the original acoustic data registered into the acoustic database 40 using the search characteristic word group from the extracted record. Specifically, the search feature word group and the registered feature word group registered in the acoustic database 40 are compared, and a record satisfying a predetermined condition is extracted. The processing by the representative feature data matching unit 80 is performed to narrow down the records to be searched because the calculation load of the comparison processing using the search feature word group is high.

ここで、代表特徴データ照合手段８０により行われる代表特徴データの概念について説明しておく。図１８は、代表特徴データの概念を示す図である。代表特徴データは、図６、図１６に示したように、３２個の統合周波数帯に応じた値を持つ。したがって、各統合周波数帯を１次元とする３２次元構造をとることになる。代表特徴データを用いた絞込みでは、３２次元空間にプロットした検索音響データの代表検索特徴データと交わる代表登録特徴データを有する原音響データのみを絞り込み結果として抽出する。 Here, the concept of the representative feature data used in the matching performed by the representative feature data matching unit 80 will be described. FIG. 18 illustrates the concept. As shown in FIGS. 6 and 16, the representative feature data has a value for each of the 32 integrated frequency bands, and therefore has a 32-dimensional structure in which each integrated frequency band is one dimension. In narrowing down with the representative feature data, only the original acoustic data whose representative registered feature data intersects the representative search feature data of the search acoustic data, when plotted in the 32-dimensional space, is extracted as the narrowed-down result.

図１８では、概念的に示すため、３２次元の代表特徴データのうち、ある２次元についてプロットしている。図１８において、黒丸は、音響データごとの平均値、黒丸を中心とした円は、黒丸から音響データごとの標準偏差を半径としている。そして、その円が検索音響データの円と交わらない原音響データを検索対象から除外し、その円が検索音響データの円と交わる原音響データのみを絞り込み結果として抽出する。そして、これを全３２次元について交わるものを抽出するか、所定数の次元で交わるものを抽出するかは、適宜設定することができる。本実施形態では、〔数式７〕〔数式２２〕に示したように、３２の周波数帯ごとに平均値と標準偏差で２次元化している。なお、代表特徴データ距離Ｕ（ｚ）の概念も同様である。 In FIG. 18, for the purpose of conceptual illustration, a certain two-dimensional plot is plotted among the 32-dimensional representative feature data. In FIG. 18, the black circle is an average value for each acoustic data, and the circle centered on the black circle has a radius that is a standard deviation for each acoustic data from the black circle. Then, the original sound data whose circle does not intersect with the circle of the search sound data is excluded from the search target, and only the original sound data whose circle intersects with the circle of the search sound data is extracted as the narrowing result. Then, it can be set as appropriate whether to extract what intersects all 32 dimensions or extract what intersects a predetermined number of dimensions. In this embodiment, as shown in [Formula 7] and [Formula 22], each of the 32 frequency bands is two-dimensionalized with an average value and a standard deviation. The concept of the representative feature data distance U (z) is the same.
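
One way to read the circle picture of FIG. 18 is as a per-dimension overlap test: two circles with centers at the mean values and radii equal to the standard deviations overlap along a dimension when the distance between the means does not exceed the sum of the standard deviations. The sketch below is an interpretation under that assumption only; it is not the representative feature data distance U of [Formula 12], which is not reproduced in this section. The function names and the `min_dims` parameter are hypothetical.

```python
def overlapping_dims(c, l, cd, ld):
    """Count dimensions in which the query and a record overlap.

    c, l: mean and standard deviation arrays of the search data
    cd, ld: mean and standard deviation arrays of one registered record
    Assumption: overlap means |c - cd| <= l + ld in that dimension.
    """
    return sum(
        1 for n in range(len(c)) if abs(c[n] - cd[n]) <= l[n] + ld[n]
    )

def passes_narrowing(c, l, cd, ld, min_dims=None):
    """Keep a record if it overlaps in at least min_dims dimensions.

    min_dims=None requires overlap in every dimension.
    """
    required = len(c) if min_dims is None else min_dims
    return overlapping_dims(c, l, cd, ld) >= required
```

The `min_dims` parameter mirrors the choice the text leaves open between requiring overlap in all 32 dimensions and requiring it in a predetermined number of them.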

続いて、代表特徴データ照合手段８０、特徴ワード照合手段９０による検索処理を、図１９〜図２３のフローチャートを用いて説明する。図１９〜図２３においては、各変数を以下のように定義する。 Next, the search processing by the representative feature data matching unit 80 and the feature word collation means 90 will be described with reference to the flowcharts of FIGS. 19 to 23. In FIGS. 19 to 23, the variables are defined as follows.

[登録特徴ワード] [Registered feature words]
Ｒ：レコード件数（音響データベース４０が管理する原音響データの数） R: number of records (the number of original acoustic data items managed by the acoustic database 40)
Ｙ（ｒ）：レコードｒ(ｒ＝０，…，Ｒ−１)の登録特徴ワード数 Y(r): number of registered feature words of record r (r = 0, ..., R−1)
Ｆｄ（ｒ，ｙ）：レコードｒ(ｒ＝０，…，Ｒ−１)の特徴パターン配列（ｙ＝０，…，Ｙ−１）、３２ビット Fd(r, y): feature pattern array of record r (r = 0, ..., R−1) (y = 0, ..., Y−1), 32 bits
Ｖｄ（ｒ，ｙ）：レコードｒ(ｒ＝０，…，Ｒ−１)の音量データ配列（ｙ＝０，…，Ｙ−１）、８ビット Vd(r, y): volume data array of record r (r = 0, ..., R−1) (y = 0, ..., Y−1), 8 bits
Ｃｄ（ｎ，ｒ）：レコードｒ(ｒ＝０，…，Ｒ−１)における登録特徴データ配列の平均値 Cd(n, r): mean of the registered feature data array in record r (r = 0, ..., R−1)
Ｌｄ（ｎ，ｒ）：レコードｒ(ｒ＝０，…，Ｒ−１)における登録特徴データ配列の標準偏差 Ld(n, r): standard deviation of the registered feature data array in record r (r = 0, ..., R−1)

[検索特徴ワード] [Search feature words]
Ｘ（ｈ）：位相番号ｈ（ｈ＝０，…，Ｈ−１）における検索特徴ワード数 X(h): number of search feature words for phase number h (h = 0, ..., H−1)
Ｆ（ｈ，ｘ）：特徴パターン配列（ｘ＝０，…，Ｘ（ｈ）−１）、３２ビット F(h, x): feature pattern array (x = 0, ..., X(h)−1), 32 bits
Ｖ（ｈ，ｘ）：音量データ配列（ｘ＝０，…，Ｘ（ｈ）−１）、８ビット V(h, x): volume data array (x = 0, ..., X(h)−1), 8 bits
Ｃ（ｎ）：検索特徴データ配列の平均値 C(n): mean of the search feature data array
Ｌ（ｎ）：検索特徴データ配列の標準偏差 L(n): standard deviation of the search feature data array

[照合変数] [Collation variables]
Ｗ：照合ワード数（ｗ＝０，…，Ｗ−１）、照合する登録特徴ワード、検索特徴ワードの数（例．Ｗ＝６） W: number of collation words (w = 0, ..., W−1), that is, the number of registered feature words and search feature words collated at a time (e.g. W = 6)
Ｕ（ｒ）：代表特徴データ距離 U(r): representative feature data distance
Ｄ（ｒ，ｙ＋ｗ，ｈ，ｘ＋ｗ）：特徴パターンのワード単位不一致ビット数（０以上３２以下） D(r, y+w, h, x+w): word-unit mismatch bit count of the feature pattern (0 to 32 inclusive)
Ｓ（ｒ，ｙ，ｈ，ｘ）：合算不一致ビット数、照合ワード数Ｗ個のワード単位不一致ビット数Ｄ（ｒ，ｙ＋ｗ，ｈ，ｘ＋ｗ）を合算したもの S(r, y, h, x): summed mismatch bit count, that is, the sum of the word-unit mismatch bit counts D(r, y+w, h, x+w) over the W collation words, $S(r,y,h,x) = \sum_{w=0}^{W-1} D(r, y+w, h, x+w)$
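
The two mismatch counts defined above can be sketched as follows. The method of FIG. 9 is not reproduced in this section, so the sketch assumes that the word-unit mismatch bit count is the number of differing bits between two 32-bit feature patterns (XOR followed by a bit count) and leaves the volume data out of the comparison. The function names are hypothetical.

```python
def word_mismatch(fd, f):
    """D: number of differing bits between two 32-bit feature patterns."""
    return bin((fd ^ f) & 0xFFFFFFFF).count("1")

def summed_mismatch(registered, search, y, x, W=6):
    """S: sum of D over W consecutive word pairs.

    registered: feature patterns Fd(r, .) of one record
    search: feature patterns F(h, .) of one phase
    The pair (y + w, x + w) is compared for w = 0..W-1.
    """
    return sum(
        word_mismatch(registered[y + w], search[x + w]) for w in range(W)
    )
```

Each D lies between 0 and 32, so with W = 6 the summed count S lies between 0 and 192.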

[判定しきい値（事前に設定）] [Determination thresholds (set in advance)]
ＭＭｕ：代表特徴データ距離Ｕ（ｒ）の判定しきい値 MMu: determination threshold for the representative feature data distance U(r)
ＭＭ１：合算不一致ビット数Ｓ（ｒ，ｙ，ｈｏ，ｘ）の判定しきい値 MM1: determination threshold for the summed mismatch bit count S(r, y, ho, x)
ＭＭ２：合算不一致ビット数Ｓ（ｒ，ｙ，ｈ，ｘ）の判定しきい値 MM2: determination threshold for the summed mismatch bit count S(r, y, h, x)
ＭＭｗ１：ワード単位不一致ビット数Ｄ（ｒ，ｙ，ｈｏ，ｘ）の判定しきい値 MMw1: determination threshold for the word-unit mismatch bit count D(r, y, ho, x)
ＭＭｗ２：ワード単位不一致ビット数Ｄ（ｒ，ｙ，ｈ，ｘ）の判定しきい値 MMw2: determination threshold for the word-unit mismatch bit count D(r, y, h, x)

判定しきい値ＭＭｕ、ＭＭ１、ＭＭ２、ＭＭｗ１、ＭＭｗ２については、事前に設定されるが、これらの判定しきい値は、以下のようにして原音響データごとに求め、最終的にその最大値を正式な判定しきい値とすることができる。ＭＭｕは登録対象の原音響データに対して登録特徴ワードおよび代表登録特徴データを生成してレコードｒに登録する際に、原音響データより無作為に数箇所切り出した部分音響データを複数個作成し、各部分音響データに対して部分特徴ワードと代表部分特徴データを作成し、各々上記〔数式１２〕に基づいて算出される代表特徴データ距離Ｕ（ｚ）を算出して、全てのレコードｒについて各々算出される複数の代表特徴データ距離Ｕ（ｚ）の最大値＋βをＭＭｕとして与える。（βは最大値より少し大きめに設定すると言う意味で、例えばβ＝１）。これにより、少なくとも該当するレコードｒに対する代表特徴データ距離Ｕ（ｚ）はＭＭｕを超えることは無いため、代表特徴データ距離Ｕ（ｚ）がＭＭｕを超えれば該当しないレコードであると即判断できる。 The determination thresholds MMu, MM1, MM2, MMw1, and MMw2 are set in advance. They can be obtained for each piece of original acoustic data as follows, with the maximum over all of them finally adopted as the official threshold. For MMu, when registered feature words and representative registered feature data are generated for original acoustic data and registered in record r, several pieces of partial acoustic data are cut out of the original acoustic data at random positions, partial feature words and representative partial feature data are created for each piece, and the representative feature data distance U(z) is calculated for each according to [Formula 12] above. The maximum of the distances U(z) calculated for all records r, plus β, is given as MMu (β serves to set the threshold slightly above the maximum; for example, β = 1). Since the representative feature data distance U(z) for the matching record r then never exceeds MMu, a record whose distance U(z) exceeds MMu can immediately be judged not to match.

ＭＭ１とＭＭｗ１については、登録特徴ワードと複数の部分特徴ワードの各々と、ｈ＝ｈｏ固定にして、ワード単位不一致ビット数Ｄ（ｙ，ｚ，ｈｏ，ｘ）を図９に示した手法で所定のワード数だけ連続して算出し、それらの総和を基に合算不一致ビット数Ｓ（ｙ，ｚ，ｈｏ，ｘ）を算出し、この値が最も小さくなるときの最小合算不一致ビット数Ｓｍｉｎ１を算出するとともに、その時の総和の各要素Ｄ（ｙ，ｚ，ｈｏ，ｘ）の最大ワード単位不一致ビット数Ｄｍａｘを求め、複数の検索特徴ワードごとに求めたＳｍｉｎ１の最大値＋βをＭＭ１とし、Ｄｍａｘの最大値＋βをＭＭｗ１とする。 For MM1 and MMw1, with h fixed at ho, the word-unit mismatch bit count D(y, z, ho, x) between the registered feature words and each of the partial feature word groups is calculated by the method shown in FIG. 9 for a predetermined number of consecutive words, and the summed mismatch bit count S(y, z, ho, x) is calculated from their total. The minimum summed mismatch bit count Smin1, at which this value is smallest, is obtained, together with the maximum word-unit mismatch bit count Dmax among the elements D(y, z, ho, x) of that sum. The maximum of the Smin1 values obtained for the individual search feature words, plus β, is set as MM1, and the maximum of the Dmax values, plus β, is set as MMw1.

For MM2 and MMw2, the registered feature words are compared with each of the partial feature words while h is varied over the range 0 to H−1. The word-unit mismatch bit count D(y, z, h, x) is calculated for a predetermined number of consecutive words by the method shown in FIG. 9, and the summed mismatch bit count S(y, z, h, x) is calculated from their sum. The minimum summed mismatch bit count Smin2, the smallest value of S, is obtained together with Dmax, the largest word-unit mismatch bit count among the elements D(y, z, h, x) of that sum. The maximum of the Smin2 values obtained for the individual partial feature words, plus β, is taken as MM2, and the maximum of the Dmax values, plus β, is taken as MMw2.

The above processing is performed by having a computer execute a dedicated program, which yields determination thresholds MMu, MM1, MM2, MMw1, and MMw2 that differ from record to record. The maximum of each threshold over all records is then set as the determination threshold common to all records. In this embodiment, MMu = 350, MM1 = 70, MM2 = 54, MMw1 = 20, and MMw2 = 13. For example, if record 1 has MMu = 350, MM1 = 60, MM2 = 54, MMw1 = 18, MMw2 = 13 and record 2 has MMu = 300, MM1 = 70, MM2 = 44, MMw1 = 20, MMw2 = 12, the thresholds common to both records are MMu = 350, MM1 = 70, MM2 = 54, MMw1 = 20, and MMw2 = 13.
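The step of taking the maximum of each threshold over all records can be sketched as follows. This is only an illustration: the dictionary layout and the function name `common_thresholds` are assumptions and do not appear in the original text; the numbers are those of the two-record example above.

```python
from typing import Dict, Iterable

THRESHOLD_NAMES = ("MMu", "MM1", "MM2", "MMw1", "MMw2")


def common_thresholds(per_record: Iterable[Dict[str, int]]) -> Dict[str, int]:
    """Return, for each threshold name, the maximum over all records."""
    records = list(per_record)
    if not records:
        raise ValueError("at least one record is required")
    return {name: max(rec[name] for rec in records) for name in THRESHOLD_NAMES}


# Per-record thresholds from the example in the text.
record1 = {"MMu": 350, "MM1": 60, "MM2": 54, "MMw1": 18, "MMw2": 13}
record2 = {"MMu": 300, "MM1": 70, "MM2": 44, "MMw1": 20, "MMw2": 12}
common = common_thresholds([record1, record2])
```

Here `common` holds the five shared thresholds stated in the text for records 1 and 2.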

FIG. 19 is a flowchart of the acoustic data search performed by the representative feature data matching unit 80 and the feature word matching unit 90 using the search feature word group. First, initialization is performed (S210). Specifically, the matching table Smin(c) is set to the initial value Big3, the matching table Rmin(c) to −1, the match count c to 0, and the record number r to 0. The initial value Big3 only needs to be sufficiently larger than any value the minimum mismatch bit count can take, and is set in advance.

Next, the representative feature data matching unit 80 calculates the representative feature data distance U(r) (S210). Specifically, it executes processing according to [Formula 23] below to calculate U(r), the distance between the representative registered feature data and the representative search feature data for record r.

[Formula 23]
$$U(r) = 256 \times \left[\sum_{n=0}^{31}\left\{\frac{Cd(n,r)-C(n)}{Ld(n,r)+L(n)}\right\}^{2}\right]^{1/2} \Bigg/ \left[\sum_{n=0}^{31}\left\{\frac{Cd(n,r)}{Ld(n,r)+L(n)}\right\}^{2}\right]^{1/4} \Bigg/ \left[\sum_{n=0}^{31}\left\{\frac{C(n)}{Ld(n,r)+L(n)}\right\}^{2}\right]^{1/4}$$

[Formula 23] contains three bracketed terms: the square root of the first is divided by the fourth root of the second and by the fourth root of the third, and the result is multiplied by the normalization coefficient 256. The normalization coefficient can be changed as appropriate to suit the normalization range. In [Formula 23], the summation over n = 0, …, 31 means the sum of the 32 terms obtained as n increases by 1 from 0 to 31. The root applied to each bracketed term can be changed as appropriate; in this embodiment, the square root, the fourth root, and the fourth root are used, as shown in [Formula 23].
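As a rough illustration, [Formula 23] can be evaluated as in the sketch below. The function name and the use of plain Python sequences are assumptions made for this example; `cd` and `ld` stand for Cd(n, r) and Ld(n, r) of one record, and `c` and `l` for C(n) and L(n) of the search data. The sketch also assumes that Ld(n, r) + L(n) is positive for every n and that neither denominator sum is zero, cases the text does not discuss.

```python
import math
from typing import Sequence


def representative_distance(
    cd: Sequence[float],
    ld: Sequence[float],
    c: Sequence[float],
    l: Sequence[float],
    scale: float = 256.0,
) -> float:
    """Evaluate U(r) of [Formula 23] for one record."""
    if not (len(cd) == len(ld) == len(c) == len(l)):
        raise ValueError("all four arrays must have the same length")
    diff_sum = reg_sum = search_sum = 0.0
    for cd_n, ld_n, c_n, l_n in zip(cd, ld, c, l):
        spread = ld_n + l_n  # sum of the two standard deviations
        diff_sum += ((cd_n - c_n) / spread) ** 2
        reg_sum += (cd_n / spread) ** 2
        search_sum += (c_n / spread) ** 2
    # Square root of the first term, divided by the fourth roots of the others.
    return scale * math.sqrt(diff_sum) / reg_sum ** 0.25 / search_sum ** 0.25


same = representative_distance([3.0] * 32, [1.0] * 32, [3.0] * 32, [1.0] * 32)
apart = representative_distance([2.0] * 32, [0.5] * 32, [1.0] * 32, [0.5] * 32)
```

Identical representative data give a distance of zero, and the distance grows as the normalized elements move apart.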

[Formula 23] expresses the distance between the representative search feature data and the representative registered feature data, and is defined on the basis of their normalized correlation coefficient. Specifically, each element n of the representative search feature data is defined as the average value of the search feature data array divided, for each of the 32 elements, by the sum of the standard deviation of the search feature data array and the standard deviation of the registered feature data array, that is, C(n) / (Ld(n, r) + L(n)). Likewise, each element of the representative registered feature data is defined as the average value of the registered feature data array divided by the same sum, that is, Cd(n, r) / (Ld(n, r) + L(n)). The normalized correlation coefficient between these 32 elements is given by
$$\left[\sum_{n=0}^{31}\left\{\frac{Cd(n,r)-C(n)}{Ld(n,r)+L(n)}\right\}^{2}\right] \Bigg/ \left[\sum_{n=0}^{31}\left\{\frac{Cd(n,r)}{Ld(n,r)+L(n)}\right\}^{2}\right]^{1/2} \Bigg/ \left[\sum_{n=0}^{31}\left\{\frac{C(n)}{Ld(n,r)+L(n)}\right\}^{2}\right]^{1/2}$$
and takes a real value from −1 to +1. In [Formula 23], this normalized correlation coefficient is multiplied by the predetermined integer 256 to obtain an integer representation, and its square root is taken to widen the range of the integer values and thereby increase the gap between corresponding and non-corresponding records. Whether to use the normalized correlation coefficient as is, its square root, or its fourth root is an operational design matter, and the most suitable method may be selected.

The representative feature data distance U(r) corresponds to the distance between the circle of the search acoustic data and the circle of the original acoustic data in FIG. 18. In practice, the criterion for selecting records as the narrowing-down result can be set as appropriate, for example requiring that the two circles in FIG. 18 intersect, or that they lie within a predetermined distance of each other. In this embodiment, the records to be narrowed down are determined by comparing the representative feature data distance U(r) calculated by [Formula 23] with the determination threshold.

Next, the representative feature data matching unit 80 corrects the calculated representative feature data distance U(r) (S220). Specifically, it executes processing according to [Formula 24] below to calculate the corrected value U′(r).

[Formula 24]
$$U'(r) = U(r)\cdot\left\{M^{S}u \,/\, Mu(r)\right\}^{1/2}$$

In [Formula 24], M^Su is a standard value equal to half of the determination threshold MMu, and Mu(r) is the individual determination reference value of record r registered in the acoustic database 40. The corrected value U′(r) is therefore obtained by multiplying the representative feature data distance U(r) by the square root of the standard value M^Su divided by the individual determination reference value Mu(r) of record r.
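The correction of [Formula 24] is small enough to sketch directly. The function and parameter names below are assumptions for illustration; `threshold` plays the role of MMu and `reference` the role of Mu(r), which is assumed to be positive.

```python
import math


def corrected_value(raw: float, threshold: float, reference: float) -> float:
    """Scale a raw distance by sqrt(standard / reference), where the
    standard value is half of the common determination threshold."""
    if reference <= 0:
        raise ValueError("the individual reference value must be positive")
    standard = threshold / 2.0
    return raw * math.sqrt(standard / reference)


unchanged = corrected_value(100.0, 350.0, 175.0)  # reference equals the standard value
halved = corrected_value(100.0, 350.0, 700.0)     # reference is four times the standard value
```

A record whose reference value equals the standard value is left unchanged, while a record with a larger reference value has its distance scaled down before the comparison with the common threshold.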

Next, the representative feature data matching unit 80 compares the corrected representative feature data distance U′(r) with the preset determination threshold MMu (S230). If U′(r) is greater than or equal to MMu, processing proceeds to S270, where r is incremented and the next record r + 1 is processed. In that case, no detailed matching calculation is performed for record r. In other words, in this embodiment the search targets are narrowed down according to the corrected value U′(r) of the representative feature data distance.

If the comparison in S230 shows that the corrected value U′(r) is smaller than the determination threshold MMu, the feature word matching unit 90 matches the registered feature word group registered in association with record r against the search feature word group (S240). For a record that satisfies the predetermined conditions in S240, the record number r is assigned to Rmin(c) and output. The details of S240 are described later.

Once Rmin(c) is obtained, it is determined whether Rmin(c) is 0 or greater (S250). If so, the record is judged to match and 1 is added to the match count c (S260). If Rmin(c) is less than 0, the record is judged not to match and c is not incremented. As described in detail later, in S240 Rmin(c) is initialized to a value less than 0, and the record number r, which is 0 or greater, is assigned to Rmin(c) only when the record is judged to match. A match can therefore be detected in S250 simply by checking whether Rmin(c) is 0 or greater.

Next, the variable r specifying the record is incremented, that is, increased by 1 (S270). It is then determined whether r has reached the total number of records R in the acoustic database 40 (S280); if not, processing returns to S240 and the next record r is matched. When every record has been processed and r reaches R, that is, when all R records have been processed, the matching table Rmin(c) is sorted in ascending order of the values in the matching table Smin(c) and output as a list together with the match count c (S290).
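The overall flow of FIG. 19 (screen each record by its corrected representative distance, run the detailed matching only for records that pass, then sort the hits) can be sketched as below. This is a simplified reading, not the patented implementation: the two callables are hypothetical stand-ins for the processing of S210 to S220 and S240, and a return value of `None` replaces the Rmin(c) = −1 sentinel used in the text.

```python
from typing import Callable, List, Optional, Tuple


def search_records(
    num_records: int,
    corrected_distance: Callable[[int], float],
    match_record: Callable[[int], Optional[float]],
    mmu: float,
) -> List[Tuple[float, int]]:
    """Return (Smin, record number) pairs sorted by ascending Smin."""
    hits: List[Tuple[float, int]] = []
    for r in range(num_records):
        # S230: skip the detailed matching when U'(r) >= MMu.
        if corrected_distance(r) >= mmu:
            continue
        s_min = match_record(r)      # S240: detailed feature word matching
        if s_min is not None:        # S250: did the record match?
            hits.append((s_min, r))  # S260: count the match
    hits.sort()                      # S290: ascending order of Smin
    return hits


# Toy data: record 0 is screened out, record 3 passes screening but does not match.
distances = {0: 400.0, 1: 100.0, 2: 50.0, 3: 10.0}
matches = {1: 30.0, 2: 10.0, 3: None}
result = search_records(4, distances.__getitem__, matches.get, mmu=350.0)
```

The length of `result` corresponds to the match count c, and the record numbers in it correspond to the sorted matching table Rmin(c).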

Next, the matching process for record r in S240 of FIG. 19 is described in detail using the flowchart of FIG. 20. First, initialization is performed (S241). Specifically, the matching table Smin(c) is set to the initial value Big3, the matching table Rmin(c) to −1, the variable x specifying the search feature word to 0, the variable y specifying the registered feature word to 0, and the variable h specifying the phase to ho. When H = 5, ho can be set to any of 0, 1, 2, 3, or 4, but it is normally set to ho = 0, which requires the least computation. Next, the summed mismatch bit count S(r, y, ho, x) is calculated (S242). The details of S242 are described later.

Next, the feature word matching unit 90 corrects the calculated summed mismatch bit count S(r, y, ho, x) (S243). Specifically, it executes processing according to [Formula 25] below to calculate the corrected value S′(r, y, ho, x).

[Formula 25]
$$S'(r, y, ho, x) = S(r, y, ho, x)\cdot\left\{M^{S}1 \,/\, M1(r)\right\}^{1/2}$$

In [Formula 25], M^S1 is a standard value equal to half of the determination threshold MM1, and M1(r) is the individual determination reference value of record r registered in the acoustic database 40. The corrected value S′(r, y, ho, x) is therefore obtained by multiplying the summed mismatch bit count S(r, y, ho, x) by the square root of the standard value M^S1 divided by the individual determination reference value M1(r) of record r.

Next, the feature word matching unit 90 compares the corrected value S′(r, y, ho, x) of the summed mismatch bit count with the preset determination threshold MM1 (S244). If S′(r, y, ho, x) is greater than MM1, processing proceeds to S246, where x is incremented and the next search feature word x + 1 is processed. If S′(r, y, ho, x) is less than or equal to MM1, the registered feature word y and the search feature word x at the identified position (x, y) are matched (S245).

Next, the variable x is incremented (S246), and x is compared with X(ho) (S247). If x has not reached X(ho), processing returns to S242 and the next search feature word x + 1 is processed. If x has reached X(ho), x is reset to 0 and y is incremented (S248). Then y is compared with Y(r) (S249). If y has not reached Y(r), processing returns to S242 and the next registered feature word y + 1 is processed.

If y has reached Y(r), all Y(r) registered feature words have been processed, so the matching tables Rmin(c) and Smin(c) at that point are output. These are the matching tables obtained in S240 of FIG. 19.

Next, the calculation of the summed mismatch bit count S(r, y, ho, x) in S242 of FIG. 20 is described in detail using the flowchart of FIG. 21. First, initialization is performed (S281). Specifically, the variable h specifying the phase is set to ho, the summed mismatch bit count S(r, y, ho, x) to 0, and the variable w indicating the number of feature words matched to 0. After initialization, it is determined whether the volume data Vd(r, y + w) of the registered feature word and the volume data V(h, x + w) of the search feature word are both greater than 0 (S282). Vd(r, y + w) and V(h, x + w) are obtained by executing processing according to [Formula 5] and [Formula 20] on the original acoustic data and the search acoustic data, respectively, and normalizing the resulting "Vol" and "Vol(h)". Only when S282 determines that Vd(r, y + w) and V(h, x + w) are both greater than 0 is the word-unit mismatch bit count D(r, y + w, ho, x + w), the number of mismatched bits between one registered feature word and one search feature word, calculated (S283).

Next, the feature word matching unit 90 corrects the calculated word-unit mismatch bit count D(r, y + w, ho, x + w) (S284). Specifically, it executes processing according to [Formula 26] below to calculate the corrected value D′(r, y + w, ho, x + w).

[Formula 26]
$$D'(r, y+w, ho, x+w) = D(r, y+w, ho, x+w)\cdot\left\{M^{S}w1 \,/\, Mw1(r)\right\}^{1/2}$$

In [Formula 26], M^Sw1 is a standard value equal to half of the determination threshold MMw1, and Mw1(r) is the individual determination reference value of record r registered in the acoustic database 40. The corrected value D′(r, y + w, ho, x + w) is therefore obtained by multiplying the word-unit mismatch bit count D(r, y + w, ho, x + w) by the square root of the standard value M^Sw1 divided by the individual determination reference value Mw1(r) of record r.

Next, the feature word matching unit 90 compares the corrected value D′(r, y + w, ho, x + w) of the word-unit mismatch bit count with the preset determination threshold MMw1 (S285). Only when S285 determines that D′(r, y + w, ho, x + w) is less than or equal to MMw1 is D′(r, y + w, ho, x + w) added to the summed mismatch bit count S(r, y, ho, x) (S286).

In S286, the variable w indicating the number of feature words matched is also incremented. It is then determined whether w has reached the predetermined number W (S287); if not, processing returns to S282 and the next feature word is processed. When every feature word has been processed and w reaches W, the summed mismatch bit count S(r, y, ho, x) at that point is output as the summed mismatch bit count for the given x, h (= ho), and y. This is the value obtained in S242 of FIG. 20. If the condition is judged not to be satisfied in S282 or S285, a calculation error for the summed mismatch bit count is output.
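The loop of FIG. 21 might be sketched as follows. Several points are assumptions made for this example rather than statements from the text: feature words are held as Python integers, the per-word count uses the unweighted bit comparison of the "Off" volume determination mode, the calculation error is represented by returning `None`, and the caller guarantees that the indices y + w and x + w stay in range.

```python
import math
from typing import Optional, Sequence


def summed_mismatch(
    reg_words: Sequence[int],
    reg_vol: Sequence[float],
    qry_words: Sequence[int],
    qry_vol: Sequence[float],
    y: int,
    x: int,
    num_words: int,
    mmw: float,
    mw_r: float,
) -> Optional[float]:
    """Sum the corrected per-word mismatch counts over num_words words.

    Returns None when a volume is not positive (S282) or when a corrected
    per-word count exceeds the threshold mmw (S285).
    """
    factor = math.sqrt((mmw / 2.0) / mw_r)  # correction factor of [Formula 26]
    total = 0.0
    for w in range(num_words):
        if reg_vol[y + w] <= 0 or qry_vol[x + w] <= 0:
            return None
        # S283: number of differing bits among the 32 bits of the two words.
        d = bin((reg_words[y + w] ^ qry_words[x + w]) & 0xFFFFFFFF).count("1")
        d_corr = d * factor                 # S284
        if d_corr > mmw:
            return None
        total += d_corr                     # S286
    return total


ok = summed_mismatch([0b1111, 0b0000], [1, 1], [0b1011, 0b0001], [1, 1], 0, 0, 2, 20, 10)
silent = summed_mismatch([0b1111, 0b0000], [1, 0], [0b1011, 0b0001], [1, 1], 0, 0, 2, 20, 10)
too_far = summed_mismatch([0xFFFFFFFF], [1], [0], [1], 0, 0, 1, 20, 10)
```

With mmw = 20 and mw_r = 10 the correction factor is 1, so `ok` is simply the number of differing bits over the two word pairs.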

Next, the intra-record matching process in S245 of FIG. 20 is described in detail using the flowchart of FIG. 22. First, initialization is performed (S321): the variable h specifying the phase is set to 0. Next, the summed mismatch bit count S(r, y, h, x) is calculated (S322). The details of S322 are described later.

Next, the feature word matching unit 90 corrects the calculated summed mismatch bit count S(r, y, h, x) (S323). Specifically, it executes processing according to [Formula 27] below to calculate the corrected value S′(r, y, h, x).

[Formula 27]
$$S'(r, y, h, x) = S(r, y, h, x)\cdot\left\{M^{S}2 \,/\, M2(r)\right\}^{1/2}$$

In [Formula 27], M^S2 is a standard value equal to half of the determination threshold MM2, and M2(r) is the individual determination reference value of record r registered in the acoustic database 40. The corrected value S′(r, y, h, x) is therefore obtained by multiplying the summed mismatch bit count S(r, y, h, x) by the square root of the standard value M^S2 divided by the individual determination reference value M2(r) of record r.

Next, the feature word matching unit 90 compares the corrected value S′(r, y, h, x) of the summed mismatch bit count with the preset determination threshold MM2 (S324). If S′(r, y, h, x) is less than or equal to MM2, it is then compared with the minimum summed mismatch bit count Smin(c) (S325).

Only when the corrected value S′(r, y, h, x) is smaller than the minimum summed mismatch bit count Smin(c) is the value of r set in the matching table Rmin(c) and S′(r, y, h, x) set in the matching table Smin(c) (S326). Next, the variable h is incremented (S327) and compared with H (S328). If h has not reached H, processing returns to S322 and the next phase number h + 1 is processed.

If h has reached H, all H phases have been processed, so the matching tables Rmin(c) and Smin(c) at that point are output. These are the matching tables obtained in S245 of FIG. 20.
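The phase loop of FIG. 22 can be illustrated with the sketch below. It is an interpretation under stated assumptions: `summed_for_phase` is a hypothetical callable standing in for S322, a phase for which it returns `None` (a calculation error) is simply skipped, and the function returns the updated minimum together with a flag instead of writing to the tables Smin(c) and Rmin(c).

```python
import math
from typing import Callable, Optional, Tuple


def best_over_phases(
    num_phases: int,
    summed_for_phase: Callable[[int], Optional[float]],
    mm2: float,
    m2_r: float,
    s_min: float,
) -> Tuple[float, bool]:
    """Try every phase h and keep the smallest corrected sum S'.

    s_min is the current minimum (initially a large value such as Big3).
    Returns the updated minimum and whether any phase improved it.
    """
    factor = math.sqrt((mm2 / 2.0) / m2_r)  # correction factor of [Formula 27]
    matched = False
    for h in range(num_phases):
        s = summed_for_phase(h)             # S322
        if s is None:
            continue
        s_corr = s * factor                 # S323
        # S324 and S325: must pass the threshold and beat the current minimum.
        if s_corr <= mm2 and s_corr < s_min:
            s_min = s_corr                  # S326
            matched = True
    return s_min, matched


BIG3 = 1.0e9
sums = {0: 80.0, 1: 40.0, 2: None, 3: 30.0, 4: 35.0}
found = best_over_phases(5, sums.get, 54.0, 27.0, BIG3)
missed = best_over_phases(5, {h: 80.0 for h in range(5)}.get, 54.0, 27.0, BIG3)
```

In the text, a successful update also records the record number r in Rmin(c); here the boolean flag carries that information.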

Next, the calculation of the summed mismatch bit count S(r, y, h, x) in S322 of FIG. 22 is described in detail using the flowchart of FIG. 23. First, initialization is performed (S381). Specifically, the summed mismatch bit count S(r, y, h, x) is set to 0 and the variable w indicating the number of feature words matched is set to 0. After initialization, it is determined whether the volume data Vd(r, y + w) of the registered feature word and the volume data V(h, x + w) of the search feature word are both greater than 0 (S382). Only when S382 determines that both are greater than 0 is the word-unit mismatch bit count D(r, y + w, h, x + w), the number of mismatched bits between one registered feature word and one search feature word, calculated (S383).

Next, the feature word matching unit 90 corrects the calculated word-unit mismatch bit count D(r, y + w, h, x + w) (S384). Specifically, it executes processing according to [Formula 28] below to calculate the corrected value D′(r, y + w, h, x + w).

[Formula 28]
$$D'(r, y+w, h, x+w) = D(r, y+w, h, x+w)\cdot\left\{M^{S}w2 \,/\, Mw2(r)\right\}^{1/2}$$

In [Formula 28], M^Sw2 is a standard value equal to half of the determination threshold MMw2, and Mw2(r) is the individual determination reference value of record r registered in the acoustic database 40. The corrected value D′(r, y + w, h, x + w) is therefore obtained by multiplying the word-unit mismatch bit count D(r, y + w, h, x + w) by the square root of the standard value M^Sw2 divided by the individual determination reference value Mw2(r) of record r.

Next, the feature word matching unit 90 compares the corrected value D′(r, y + w, h, x + w) of the word-unit mismatch bit count with the preset determination threshold MMw2 (S385). Only when S385 determines that D′(r, y + w, h, x + w) is less than or equal to MMw2 is D′(r, y + w, h, x + w) added to the summed mismatch bit count S(r, y, h, x) (S386). In S386, the variable w indicating the number of feature words matched is also incremented. It is then determined whether w has reached the predetermined number W (S387); if not, processing returns to S382 and the next feature word is processed. When every feature word has been processed and w reaches W, the summed mismatch bit count S(r, y, h, x) at that point is output as the summed mismatch bit count for the given x, h, and y. This is the value obtained in S322 of FIG. 22. If the condition is judged not to be satisfied in S382 or S385, a calculation error for the summed mismatch bit count is output.

<2.3. Calculation of the word-unit mismatch bit count>
The calculation of the word-unit mismatch bit count D(r, y + w, h, x + w) in S283 of FIG. 21 and S383 of FIG. 23 is described below. The specific processing depends on the volume determination mode set by the user. There are four volume determination modes: Off, Weight, Match, and Both.

<2.3.1. Volume determination mode "Off">
The volume determination mode "Off" applies no weighting. When it is set, the word-unit mismatch bit count D(r, y + w, h, x + w) is used as is as the word-unit dissimilarity D(r, y + w, h, x + w), which indicates the degree of difference between words. In this mode, D(r, y + w, h, x + w) is initialized to 0, the 32 bits of Fd(r, y + w) and F(h, x + w) are compared bit by bit in sequence, and 1 is added to D(r, y + w, h, x + w) for each pair of bits that differ. In both the registered feature word group and the search feature word group, the feature pattern of each feature word is created by the same rule and is a 32-bit value with the low-frequency components at the LSB end and the high-frequency components at the MSB end, so matching can be performed by checking whether the corresponding bit values agree.
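Counting the differing bits of two 32-bit feature patterns is equivalent to taking the population count of their exclusive OR. The following sketch shows this; the function name is an assumption, and the feature patterns are assumed to be held as non-negative Python integers.

```python
def word_mismatch_bits(fd: int, f: int) -> int:
    """Number of bit positions (out of 32) where the two patterns differ."""
    return bin((fd ^ f) & 0xFFFFFFFF).count("1")


two_bits = word_mismatch_bits(0b1010, 0b0110)
all_bits = word_mismatch_bits(0xFFFFFFFF, 0x00000000)
no_bits = word_mismatch_bits(0x12345678, 0x12345678)
```

The XOR form gives the same count as the bit-by-bit loop described in the text, without iterating over the 32 positions explicitly.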

<2.3.2. Volume determination mode "Weight">
The volume determination mode "Weight" applies weighting. When it is set, D(r, y + w, h, x + w) is initialized to 0, and the 32 bits of Fd(r, y + w) and F(h, x + w) are compared bit by bit in sequence. Based on the comparison results, processing according to [Formula 29] below is executed to determine the value of D(r, y + w, h, x + w). As a result, D(r, y + w, h, x + w) is no longer a word-unit mismatch bit count but a word-unit dissimilarity that indicates the degree of difference between words with the volume data taken into account. This mode gives appropriate matching results when the search acoustic data underlying the search feature words has undergone signal processing such as analog conversion, so that both the relative variation in volume and the absolute volume differ from those of the original acoustic data.

[Formula 29]
When the Fd(r, y+w) side is bit 1 and the F(h, x+w) side is bit 0:
D(r, y+w, h, x+w) ← D(r, y+w, h, x+w) + Vd(r, y+w) · 2 / {Vd(r, y+w) + V(h, x+w)}
When the Fd(r, y+w) side is bit 0 and the F(h, x+w) side is bit 1:
D(r, y+w, h, x+w) ← D(r, y+w, h, x+w) + V(h, x+w) · 2 / {Vd(r, y+w) + V(h, x+w)}
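Because every bit of one word shares the same volume values, the per-bit accumulation of [Formula 29] can be collapsed into two bit counts. The sketch below illustrates this; the function name is hypothetical, and the fallback to an unweighted count when both volumes are zero is an assumption, since the text does not say how that case is handled.

```python
MASK32 = 0xFFFFFFFF  # feature patterns are 32 bits wide


def dissimilarity_weight(fd: int, f: int, vd: float, v: float) -> float:
    """Word-unit dissimilarity in the style of mode "Weight".

    fd, f : 32-bit patterns Fd(r, y+w) and F(h, x+w)
    vd, v : volume values Vd(r, y+w) and V(h, x+w)
    """
    n10 = bin(fd & ~f & MASK32).count("1")  # Fd bit is 1, F bit is 0
    n01 = bin(~fd & f & MASK32).count("1")  # Fd bit is 0, F bit is 1
    total = vd + v
    if total == 0:
        # Assumption (not in the text): avoid dividing by zero by
        # returning the plain mismatch count.
        return float(n10 + n01)
    # Each mismatch adds the volume of the side holding the 1 bit,
    # scaled by 2 / (Vd + V).
    return (n10 * vd + n01 * v) * 2.0 / total
```

When the two volumes are equal, each mismatch contributes exactly 1, so the result coincides with the unweighted count of mode “Off”.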

<2.3.3. Volume determination mode “Match”>
When the volume determination mode “Match” is set, the processing for the mode “Off” is performed first to obtain D(r, y+w, h, x+w). The processing of [Formula 30] below is then executed, multiplying D(r, y+w, h, x+w) by a weight to determine its value. As a result, D(r, y+w, h, x+w) is no longer a word-unit mismatch bit number but a word-unit dissimilarity that indicates the degree of difference in word units with the difference between the fluctuation patterns of the volume data taken into account. This mode gives appropriate collation results when the search acoustic data underlying the search feature words has undergone signal processing such as various kinds of data compression, so that the relative change in volume differs little from that of the original acoustic data but the absolute value differs. This mode is the most recommended in the present embodiment.

[Formula 30]
When Vd(r, y+w) · Vd(r, y+w−1) > V(h, x+w) · V(h, x+w−1):
D(r, y+w, h, x+w) ← D(r, y+w, h, x+w) · Vd(r, y+w) · Vd(r, y+w−1) / {V(h, x+w) · V(h, x+w−1)}
When Vd(r, y+w) · Vd(r, y+w−1) < V(h, x+w) · V(h, x+w−1):
D(r, y+w, h, x+w) ← D(r, y+w, h, x+w) · V(h, x+w) · V(h, x+w−1) / {Vd(r, y+w) · Vd(r, y+w−1)}

When w = 0, Vd(r, y+w−1) = Vd(r, y+w) and V(h, x+w−1) = V(h, x+w) are used in [Formula 30] above.
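The scaling of [Formula 30], including the w = 0 rule, can be sketched as follows. The function name is hypothetical, the sequences are assumed to be already aligned so that index w corresponds to positions y+w and x+w, and the requirement that all volumes be strictly positive is an assumption made here to keep the divisions defined.

```python
def dissimilarity_match(fd, f, vd, v, w):
    """Word-unit dissimilarity in the style of mode "Match".

    fd, f : sequences of 32-bit patterns, indexed by w
    vd, v : sequences of volume values, indexed by w (assumed > 0)
    """
    # Step 1: unweighted mismatch count, as in mode "Off".
    d = float(bin((fd[w] ^ f[w]) & 0xFFFFFFFF).count("1"))
    # Step 2: for w = 0 the previous volume is taken equal to the current one.
    prev = w - 1 if w > 0 else w
    a = vd[w] * vd[prev]
    b = v[w] * v[prev]
    if a <= 0 or b <= 0:
        raise ValueError("volumes are assumed to be strictly positive")
    # Step 3: multiply by whichever ratio is larger than 1; equal products
    # leave d unchanged.
    if a > b:
        d *= a / b
    elif a < b:
        d *= b / a
    return d
```

The multiplier is never smaller than 1, so in this sketch a volume mismatch can only increase the dissimilarity relative to the plain mismatch count.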

<2.3.4. Volume determination mode “Both”>
When the volume determination mode “Both” is set, the processing for the mode “Weight” is performed first to obtain D(r, y+w, h, x+w). The processing of [Formula 31] below is then executed, multiplying D(r, y+w, h, x+w) by a weight to determine its value. As a result, D(r, y+w, h, x+w) is no longer a word-unit mismatch bit number but a word-unit dissimilarity that indicates the degree of difference in word units with the volume data taken into account. This mode gives appropriate collation results when the search acoustic data underlying the search feature words has undergone signal processing such as high-ratio data compression with waveform distortion or analog conversion, so that both the relative change in volume and the absolute value of the volume differ markedly from those of the original acoustic data.

[Formula 31]
When Vd(r, y+w) · V(h, x+w−1) > V(h, x+w) · Vd(r, y+w−1):
D(r, y+w, h, x+w) ← D(r, y+w, h, x+w) · Vd(r, y+w) · V(h, x+w−1) / {V(h, x+w) · Vd(r, y+w−1)}
When Vd(r, y+w) · V(h, x+w−1) < V(h, x+w) · Vd(r, y+w−1):
D(r, y+w, h, x+w) ← D(r, y+w, h, x+w) · V(h, x+w) · Vd(r, y+w−1) / {Vd(r, y+w) · V(h, x+w−1)}

When w = 0, Vd(r, y+w−1) = Vd(r, y+w) and V(h, x+w−1) = V(h, x+w) are used in [Formula 31] above.
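The second step of mode “Both” can be sketched as a scaling applied to a dissimilarity that has already been computed in mode “Weight”. The function name is hypothetical, the sequences are assumed to be aligned so that index w corresponds to positions y+w and x+w, and strictly positive volumes are an assumption made here.

```python
def apply_both_scaling(d_weight, vd, v, w):
    """Scale a mode-"Weight" dissimilarity in the style of [Formula 31].

    d_weight : value of D already obtained in mode "Weight"
    vd, v    : sequences of volume values, indexed by w (assumed > 0)
    """
    # For w = 0 the previous volume is taken equal to the current one.
    prev = w - 1 if w > 0 else w
    a = vd[w] * v[prev]   # Vd(r, y+w) * V(h, x+w-1)
    b = v[w] * vd[prev]   # V(h, x+w) * Vd(r, y+w-1)
    if a <= 0 or b <= 0:
        raise ValueError("volumes are assumed to be strictly positive")
    if a > b:
        return d_weight * a / b
    if a < b:
        return d_weight * b / a
    return d_weight
```

With the w = 0 substitution the two cross products are identical, so at w = 0 this step leaves the mode-“Weight” value unchanged.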

In modes other than “Off”, D(r, y+w, h, x+w) is not a word-unit mismatch bit number but a word-unit dissimilarity that indicates the degree of difference in word units with the volume data taken into account. Accordingly, S(r, y, h, x) in FIGS. 20 and 21 represents a total dissimilarity rather than a total mismatch bit number, and Smin(c) in FIGS. 19 and 20 represents a minimum dissimilarity rather than a minimum mismatch bit number.

In S290 of FIG. 19, the information output means 100 sorts the record number table Rmin(c) in ascending order of the values in the minimum dissimilarity table Smin(c) and outputs it as a list together with the number of matches c. Because the minimum dissimilarity table holds the corrected minimum dissimilarity S′(r, y, h, x), the entries are output in order of that value; that is, the smaller the corrected minimum dissimilarity S′(r, y, h, x), the higher the entry appears in the list. This list is the result of the search performed with the search acoustic data. The content of the list can be set as appropriate; for example, all of the related information may be output, or only the song titles. When one entry is selected from the output list, the information output means 100 extracts the corresponding record from the acoustic database 40. If an instruction to acquire the original acoustic data is given at that point, the information output means 100 uses the file name recorded in the record to output the related information on the original acoustic data stored in the storage device and an ID identifying the original acoustic data.
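The ordering step of S290 can be sketched as follows, assuming the two tables are held as parallel Python lists in which entry i of Smin belongs to entry i of Rmin. The function name and the list representation are illustrative assumptions.

```python
def rank_results(rmin, smin):
    """Return (record number, corrected minimum dissimilarity) pairs,
    ordered so that the smallest dissimilarity comes first.

    rmin : record numbers, standing in for Rmin(c)
    smin : corrected minimum dissimilarities, standing in for Smin(c)
    """
    order = sorted(range(len(rmin)), key=lambda i: smin[i])
    return [(rmin[i], smin[i]) for i in order]
```

For record numbers [7, 3, 9] with dissimilarities [12.5, 4.0, 8.0], record 3 is listed first, followed by records 9 and 7.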

The preferred embodiments of the present invention have been described above, but the present invention is not limited to those embodiments, and various modifications are possible. In the above embodiment, various processes are combined so that accurate searches can be performed while keeping the overall processing load low; however, one or more of the combined processes may be omitted. In that case the processing load increases slightly or the search accuracy decreases slightly, but the effect of the present invention is still sufficiently obtained. For example, in the above embodiment the collation range determining means 70 determines the collation range for all the cases shown in FIGS. 17(a) to 17(e), but it may instead execute processing corresponding to any one or more of FIGS. 17(a) to 17(e). Preferably, the two processes of FIGS. 17(a) and 17(b) are included.

In the above embodiment, as shown in FIGS. 20 and 22, the feature word collating means 90 performs a first collation with the phase h fixed and, when the resulting dissimilarity is smaller than the predetermined determination threshold MM1, performs a second collation while changing the phase h. Instead of this two-stage collation, only the second collation, in which the phase is varied, may be performed. Performing only the second collation increases the processing load but does not change the collation accuracy, so the effect of the present invention is still obtained through the restriction of the collation range by the collation range determining means 70.
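The relationship between the two-stage collation and the second-collation-only variant can be sketched as below. This is a simplified illustration, not the flow of FIGS. 20 and 22: the function name, the `score` callback that returns the dissimilarity for a given phase, and the `two_stage` switch are all hypothetical, and the strict "smaller than" comparisons follow the wording of the paragraph above.

```python
def collate_over_phases(score, phases, h0, mm1, mm2, two_stage=True):
    """Return (best phase, dissimilarity) if the candidate is accepted, else None.

    score     : hypothetical callable mapping a phase h to a dissimilarity
    phases    : iterable of all phases to try in the second collation
    h0        : the single phase used in the first collation
    mm1, mm2  : determination thresholds for the first and second collation
    two_stage : if False, skip the first collation entirely
    """
    # First collation: a cheap test at one fixed phase.
    if two_stage and not score(h0) < mm1:
        return None
    # Second collation: try every phase and keep the best-matching one.
    best_h = min(phases, key=score)
    best = score(best_h)
    return (best_h, best) if best < mm2 else None
```

In this sketch the first collation only decides whether the full scan over phases is run, so skipping it changes the amount of work per candidate rather than how an accepted candidate is scored.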

In the above embodiment, as shown in FIGS. 20 and 21, the feature word collating means 90 judges whether the correction value S′(r, y, ho, x) of the total mismatch bit number is equal to or less than the determination threshold MM1 when all W word-unit mismatch bit numbers D(r, y+w, ho, x+w) are smaller than the determination threshold MMw1. Likewise, as shown in FIGS. 22 and 23, it judges whether the correction value S′(r, y, h, x) of the total mismatch bit number is equal to or less than the determination threshold MM2 when all W correction values D′(r, y+w, h, x+w) of the word-unit mismatch bit numbers are smaller than the determination threshold MMw2. In either case, the judgment of whether the correction value D′ of the word-unit mismatch bit number is equal to or less than the determination threshold MMw1 or MMw2 may be omitted. If that judgment is omitted, candidates in which individual feature words differ greatly may be included, but as long as the overall dissimilarity is equal to or less than MM1 or MM2, the two can still be regarded as acceptably similar.
Alternatively, the judgment of whether all W correction values D′(r, y+w, ho, x+w) of the word-unit mismatch bit numbers are smaller than the determination threshold MMw1 and the judgment of whether the correction value S′(r, y, ho, x) of the total mismatch bit number is equal to or less than the determination threshold MM1 may both be omitted, so that only the judgments of whether all W correction values D′(r, y+w, h, x+w) are smaller than the determination threshold MMw2 and whether the correction value S′(r, y, h, x) is equal to or less than the determination threshold MM2 are made.
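The per-word and total threshold tests described above, and the option of omitting the per-word test, can be sketched as follows. The function and parameter names are hypothetical, and the corrected values D′ and S′ are taken as given inputs because their correction is not defined in this passage.

```python
def passes_thresholds(word_values, total_value, mm_word, mm_total,
                      check_words=True):
    """Decide whether one candidate position passes the threshold tests.

    word_values : the W per-word values (D or D')
    total_value : the corrected total S'
    mm_word     : per-word threshold (MMw1 or MMw2)
    mm_total    : total threshold (MM1 or MM2)
    check_words : set to False to omit the per-word test
    """
    # Per-word test: every word must be strictly below the threshold.
    if check_words and not all(d < mm_word for d in word_values):
        return False
    # Total test: the corrected total must be at or below the threshold.
    return total_value <= mm_total
```

With the per-word test omitted, a candidate containing one strongly differing word can still pass as long as the total stays within the threshold, which is the trade-off the paragraph describes.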

In the above embodiment, the related information registration device registers representative registered feature data in the acoustic database, the representative search feature data generation means 60 generates representative search feature data, and the representative feature data collating means 80 collates the representative registered feature data with the representative search feature data to narrow down the original acoustic data to be searched. If there is no particular need to reduce the processing load, this narrowing-down need not be performed.

In the above embodiment, the number of frequency bands after integration is 32 (n = 0 to 31), but it can be increased or decreased as appropriate for the situation.

In the above embodiment, four modes are selectable as the volume determination mode; however, only two or three of the four modes may be made selectable, or a single mode may be fixed.

In the above embodiment, the original acoustic data is not registered in the acoustic database because of copyright concerns. However, the original acoustic data itself may be registered in the acoustic database as part of the related information, so that the original acoustic data corresponding to a record extracted by the search can be obtained.

The present invention can be used in industries such as music copyright protection (monitoring of unauthorized copying) and the provision of music attribute information (song title search services), both in the field of packaged music on CDs, DVDs, and the like for consumer and professional listening and in the field of broadcast and network music distribution carried out for commercial purposes by broadcasters and others.

Description of symbols
2a, 3a ... CPU
2b, 3b ... RAM
2c, 3c ... Data storage device
2d, 3d ... Program storage device
2e, 3e ... Key input I/F
2f, 3f ... Data input/output I/F
2g, 3g ... Display output I/F
10 ... Registered feature word generation means
15 ... Partial feature word generation means
20 ... Representative registered feature data generation means
25 ... Representative partial feature data generation means
30 ... Determination reference value calculation means
35 ... Registration means
40 ... Acoustic database
45 ... Mode setting means
50 ... Search feature word generation means
60 ... Representative search feature data generation means
70 ... Collation range determining means
80 ... Representative feature data collating means
90 ... Feature word collating means
100 ... Information output means

Claims

A related information search device for acoustic data that performs a search by collating features of search acoustic data, which is given acoustic data, with features of original acoustic data registered in an acoustic database and identifying original acoustic data having features similar to the search acoustic data, the device comprising:
an acoustic database in which, for each piece of original acoustic data, a registered feature word group that is a set of registered feature words expressing features of the original acoustic data, an individual determination reference value for collation of the registered feature word group, and related information of the original acoustic data are registered;
search feature word generation means for generating, for each section of the search acoustic data, a search feature word expressing a feature of the search acoustic data in that section, and obtaining a search feature word group that is a set of the search feature words; and
feature word collating means for collating the search feature word group with a registered feature word group registered in the acoustic database while aligning their positional relationship, calculating a minimum dissimilarity that is the smallest degree of difference between the two groups, correcting the minimum dissimilarity based on the ratio between a set standard minimum dissimilarity and the individual determination reference value registered for that registered feature word group to obtain a corrected minimum dissimilarity, and identifying the original acoustic data corresponding to the registered feature word group as a selection target when the corrected minimum dissimilarity is smaller than a predetermined determination threshold.

The related information search device for acoustic data according to claim 1, further comprising information output means for outputting the related information on the original acoustic data identified as the selection target in an order based on the value of the corrected minimum dissimilarity.

The related information search device for acoustic data according to claim 1 or claim 2, wherein
the feature word has a feature pattern of a predetermined number of bits in which each component of a frequency spectrum obtained by analyzing acoustic data is expressed by a bit value in predetermined frequency units, and
the feature word collating means collates the registered feature word group with the search feature word group by comparing the feature patterns bit by bit to obtain a mismatch bit number, takes the position where the mismatch bit number is smallest as the position where the registered feature word group and the search feature word group match best, and obtains the minimum mismatch bit number at that position as the minimum dissimilarity.

The related information search device for acoustic data according to claim 1 or claim 2, wherein
the feature word has volume data and a feature pattern of a predetermined number of bits in which each component of a frequency spectrum obtained by analyzing acoustic data is expressed by a bit value in predetermined frequency units, and
the feature word collating means collates the registered feature word group with the search feature word group by comparing the feature patterns bit by bit to obtain a mismatch bit number, then calculates a weighted mismatch bit number by adding to or multiplying the mismatch bit number by a weight based on the volume data of each feature word, takes the position where the weighted mismatch bit number is smallest as the position where the registered feature word group and the search feature word group match best, and obtains the minimum weighted mismatch bit number at that position as the minimum dissimilarity.

The related information search device for acoustic data according to any one of claims 1 to 4, wherein
the individual determination reference value is calculated in advance by:
obtaining partial acoustic data by cutting out part of the original acoustic data corresponding to the registered feature word group, setting unit sections of a predetermined length on the partial acoustic data, and generating, based on information obtained by analyzing each unit section, a set of feature words expressing features of the partial acoustic data as a partial feature word group; and
collating the registered feature word group with the partial feature word group while shifting their temporal positional relationship, and calculating the individual determination reference value based on a minimum dissimilarity that is the smallest degree of difference between the registered feature word group and the partial feature word group.

The related information search device for acoustic data according to claim 5, wherein
the feature word has a feature pattern of a predetermined number of bits in which each component of a frequency spectrum obtained by analyzing acoustic data is expressed by a bit value in predetermined frequency units, and
the partial feature word group is generated for each of a plurality of pieces of partial acoustic data having different cut-out positions, the registered feature word group and the partial feature word group are collated by comparing the feature patterns bit by bit to obtain a mismatch bit number, the position where the mismatch bit number is smallest is taken as the position where the registered feature word group and the search feature word group match best, the minimum mismatch bit number at that position is obtained for each piece of partial acoustic data as the minimum dissimilarity, and the individual determination reference value is obtained based on the average of the minimum mismatch bit numbers over the pieces of partial acoustic data.

The related information search device for acoustic data according to claim 5, wherein
the feature word has volume data and a feature pattern of a predetermined number of bits in which each component of a frequency spectrum obtained by analyzing acoustic data is expressed by a bit value in predetermined frequency units, and
the partial feature word group is generated for each of a plurality of pieces of partial acoustic data having different cut-out positions, the registered feature word group and the partial feature word group are collated by comparing the feature patterns bit by bit to obtain a mismatch bit number, a weighted mismatch bit number is then calculated by adding to or multiplying the mismatch bit number by a weight based on the volume data of each feature word, the position where the weighted mismatch bit number is smallest is taken as the position where the registered feature word group and the partial feature word group match best, the minimum weighted mismatch bit number at that position is obtained for each piece of partial acoustic data as the minimum dissimilarity, and the individual determination reference value is obtained based on the average of the minimum weighted mismatch bit numbers over the pieces of partial acoustic data.

The related information search device for acoustic data according to any one of claims 1 to 7, wherein
the search feature word generation means generates, for the search acoustic data, H search feature word groups while changing a phase h, and
the feature word collating means performs, as a first collation, collation between the search feature word group for one specified phase h among the H phases and a registered feature word group registered in the acoustic database; when the corrected minimum dissimilarity resulting from the first collation is smaller than a first determination threshold, performs, as a second collation, collation between the search feature word groups and the registered feature word group registered in the acoustic database while changing the phase h; and, when the corrected minimum dissimilarity at the phase h where the registered feature word group and the search feature word group match best in the second collation is smaller than a second determination threshold, identifies the original acoustic data corresponding to the registered feature word group.

The related information search device for acoustic data according to any one of claims 1 to 8, wherein
the acoustic database further registers, for each piece of original acoustic data, representative registered feature data generated using the registered feature word group,
the device further comprises representative search feature data generating means for generating representative search feature data using the generated search feature word group, and representative feature data collating means for collating the representative search feature data with the representative registered feature data registered in the acoustic database, and
the feature word collating means collates the search feature word group with a registered feature word group registered in the acoustic database only when, as a result of the collation by the representative feature data collating means, a value obtained by correcting the degree of difference between the representative search feature data and the representative registered feature data using the individual determination reference value is determined to be within a predetermined range.

A program for causing a computer to function as the related information search device for acoustic data according to any one of claims 1 to 9.