JP6783443B2

JP6783443B2 - Information processing device, information processing system, information processing method, program, and recording medium

Info

Publication number: JP6783443B2
Application number: JP2016046084A
Authority: JP
Inventors: 満河本; 車谷　浩一; 浩一車谷; 明男幸島
Original assignee: National Institute of Advanced Industrial Science and Technology AIST
Current assignee: National Institute of Advanced Industrial Science and Technology AIST
Priority date: 2015-04-06
Filing date: 2016-03-09
Publication date: 2020-11-11
Anticipated expiration: 2036-03-09
Also published as: JP2016197406A

Description

The present invention relates to an information processing device, an information processing system, an information processing method, a program, and a recording medium.

Clustering methods that classify large amounts of data into characteristic categories are known in the art. The Affinity Propagation (AP) method is known as a recent clustering method (see, for example, Non-Patent Documents 1 and 2).

In the AP method, the parameter r(i,k) is a score indicating how appropriate it is for data point i in a cluster to be assigned to candidate k for the center point (exemplar) of that cluster, and is regarded as a message sent from data point i to exemplar candidate k. The parameter a(i,k) is a score indicating how appropriate it is for data point i to belong as a member to the cluster of exemplar candidate k, and is regarded as a message sent from exemplar candidate k to data point i, a prospective member of its cluster. In other words, the AP method is a message-passing clustering method.
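For reference, the message exchange described above can be sketched with the update rules published in Non-Patent Document 1. The following is a minimal pure-Python sketch of the standard AP method only, not of the improved method of this invention; the function name `ap_iteration`, the `damping` parameter, and the list-of-lists matrix layout are assumptions made for illustration.

```python
def ap_iteration(s, r, a, damping=0.5):
    """One round of the standard AP updates (sketch, requires n >= 2).

    s, r, a are n x n lists of lists: similarity, responsibility r(i,k),
    and availability a(i,k). Returns the updated (r, a).
    """
    n = len(s)

    # Responsibility: r(i,k) = s(i,k) - max over k' != k of (a(i,k') + s(i,k')).
    new_r = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(n):
            rival = max(a[i][kk] + s[i][kk] for kk in range(n) if kk != k)
            new_r[i][k] = s[i][k] - rival
    r = [[damping * r[i][k] + (1 - damping) * new_r[i][k] for k in range(n)]
         for i in range(n)]

    # Availability:
    #   a(k,k) = sum over i' != k of max(0, r(i',k))
    #   a(i,k) = min(0, r(k,k) + sum over i' not in {i,k} of max(0, r(i',k)))
    new_a = [[0.0] * n for _ in range(n)]
    for k in range(n):
        pos = [max(0.0, r[i][k]) for i in range(n)]
        total = sum(pos) - pos[k]  # positive responsibilities from i' != k
        for i in range(n):
            if i == k:
                new_a[k][k] = total
            else:
                new_a[i][k] = min(0.0, r[k][k] + total - pos[i])
    a = [[damping * a[i][k] + (1 - damping) * new_a[i][k] for k in range(n)]
         for i in range(n)]
    return r, a
```

In this sketch the diagonal entries s(k,k) of `s` play the role of the autocorrelation discussed later in the description; larger diagonal values make a data point more likely to be chosen as an exemplar.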

Non-Patent Document 1: Brendan J. Frey and Delbert Dueck, "Clustering by Passing Messages Between Data Points", University of Toronto, Science 315, 972-976, February 2007
Non-Patent Document 2: Inmar E. Givoni and Brendan J. Frey, "A Binary Variable Model for Affinity Propagation", Neural Computation, Vol. 21, Issue 6, pp. 1589-1600, June 2009

However, with the conventional technique described above, a cluster produced by clustering may contain outliers that differ from the representative value of the cluster, so there is room for improvement in classification accuracy. For example, when the AP method is applied to environmental sounds, a sound that differs from the representative sound of a cluster may be classified into that same cluster. There is therefore a need to improve the classification accuracy of the AP method.

The present invention therefore provides an information processing device, an information processing system, an information processing method, a program, and a recording medium that can improve the accuracy of classification into clusters with the AP method.

An information processing device according to one aspect of the present invention includes an acquisition unit that acquires a plurality of data,
an extraction unit that extracts a feature amount of each data, and
a classification unit that generates final clusters by repeatedly updating a first evaluation value and a second evaluation value, each based on a similarity matrix between the data that is generated using the extracted feature amounts. The autocorrelation of the similarity matrix is set using the similarity of outlier data in the clusters generated in the course of calculating the first evaluation value and the second evaluation value. The first evaluation value indicates how appropriate it is for candidate data for a cluster center to serve as the cluster center of member data belonging to the cluster, and the second evaluation value indicates how appropriate it is for the member data to belong to the cluster of the candidate data.

A conceptual diagram showing an example of the information processing system in the first embodiment.
A block diagram showing an example of the hardware configuration of the information processing device in the first embodiment.
A block diagram showing an example of the functional configuration of the information processing device in the first embodiment.
A diagram showing the place where the environmental sound was measured.
A diagram showing an example of a microphone array.
A diagram showing the positions of the microphones.
A diagram showing an example of a power spectrum.
A diagram showing an example of a basis vector.
A diagram showing an example of a coefficient vector.
A diagram showing an example of a power spectrum reconstructed using the basis vectors and coefficient vectors.
A diagram showing an example of the result of clustering sound data using the improved AP method.
A diagram showing an example of the accuracy results (part 1) of the improved AP method.
A diagram showing an example of the accuracy results (part 2) of the improved AP method.
A diagram showing an example of the occurrence ratio of environmental sounds over one day.
A diagram showing an example of the occurrence ratio of environmental sounds at event A from 10:00 to 13:00 on an event day.
A diagram showing an example of the occurrence ratio of environmental sounds at event A from 14:00 to 18:00 on an event day.
A diagram showing an example of the occurrence ratio of environmental sounds at event B from 14:00 to 18:00 on an event day.
A diagram showing an example of the occurrence ratio of environmental sounds at event C from 10:00 to 13:00 on an event day.
A diagram showing an example of the occurrence ratio of environmental sounds at event C from 14:00 to 18:00 on an event day.
A flowchart showing an example of the data analysis processing in the first embodiment.
A flowchart showing an example of the classification processing in the first embodiment.
A flowchart showing an example of the tagging processing in the first embodiment.
A flowchart showing an example of the analysis processing in the first embodiment.
A block diagram showing an example of the functional configuration of the information processing device in the second embodiment.
A diagram showing an example of a power spectrum.
A diagram showing an example of an edge-detected power spectrum.
A diagram showing an example of the classified categories.
A diagram for explaining the calculation of the loudness of a sound.
A diagram for explaining the calculation of the continuity of a sound.
A diagram showing the predetermined area in which the sound pattern at the experimental site is expressed.
A diagram showing the sound pattern in normal times.
A diagram showing the sound pattern during an event.
A flowchart showing an example of the sound pattern expression processing in the second embodiment.

Embodiments of the present invention are described below with reference to the drawings. The embodiments described below are merely examples, and are not intended to exclude various modifications or applications of techniques that are not explicitly described. That is, the present invention can be implemented with various modifications without departing from its spirit. In the following description of the drawings, identical or similar parts are denoted by identical or similar reference numerals. The drawings are schematic and do not necessarily match actual dimensions or ratios. The drawings may also differ from one another in their dimensional relationships and ratios.

[First Embodiment]
The information processing system according to the first embodiment of the present invention is described below. The information processing system according to the first embodiment can appropriately classify a plurality of acquired data. Using the appropriately classified clusters and data, the information processing system can also build a model suited to the data and analyze the data.

<System overview>
FIG. 1 is a conceptual diagram showing an example of the information processing system according to the first embodiment. As shown in FIG. 1, in the information processing system 1, for example, information processing devices (server devices) 10A and 10B that process data and acquisition devices 20A, 20B, and 20C that acquire data are connected via a network N. The information processing devices 10A and 10B may share the processing between them, or the processing may be performed by either one of the devices.

The information processing devices are denoted by reference numerals 10A and 10B when they are described individually, and by reference numeral 10 when they are described collectively without distinction. The reference numerals of the acquisition devices are used in the same way.

The acquisition device 20 acquires the data to be classified. For example, if the data is sound data, the acquisition device 20 is a microphone or microphone array capable of collecting sound; if the data is image data, the acquisition device 20 is an imaging device such as a camera; and if the data is character data, the acquisition device 20 is a device that acquires character data.

The information processing device 10 acquires a plurality of data from the acquisition device 20 and performs classification processing, that is, clustering, on the data. The information processing device 10 may also use the clustering result to label the data or to build a model of the target data, and may use this model to perform various analyses of the target data. The information processing device 10 may also perform clustering, model construction, and various analyses on data stored in its own storage unit.

The network N is, for example, the Internet, and includes wireless LAN access points, mobile phone base stations, and the like.

<Hardware configuration>
Next, the hardware configuration of the information processing device 10 is described. FIG. 2 is a block diagram showing an example of the hardware configuration of the information processing device 10 according to the first embodiment.

As shown in FIG. 2, the information processing device 10 includes a control unit 102, a communication interface 106, a storage unit 108, a display unit 114, and an input unit 116, which are connected to one another via a bus line.

The control unit 102 includes a CPU, a ROM, a RAM 104, and the like. By executing the processing program 110 and other programs stored in the storage unit 108, the control unit 102 is configured to provide, in addition to the functions of a general server device, functions such as a classification function, a labeling function, and a model construction function.

The RAM 104 temporarily holds various information and is used as a work area when the CPU executes various processes.

The communication interface 106 controls communication with the acquisition device 20 via the network N.

The storage unit 108 is, for example, an HDD. It stores the applications and data (not shown) needed to function as a general information processing device, and also stores the processing program 110. The storage unit 108 also has an information storage unit 112.

The processing program 110 is a program for performing processing such as classification; it acquires each data from the acquisition device 20 and processes it. The processing program 110 may be stored on a computer-readable recording medium, read from that recording medium, and stored in the storage unit 108.

The information storage unit 112 stores, in association with one another, items such as the data acquired from each acquisition device 20, information on classification, information on labels, and information on models.

The display unit 114 displays information to an administrator or other user of the information processing device 10. The input unit 116 accepts input and instructions from the administrator or other user. The information processing device 10 does not necessarily have to include the display unit 114 and the input unit 116; they may instead be connected to the information processing device 10 externally.

<Functional configuration>
FIG. 3 is a block diagram showing an example of the functional configuration of the information processing device 10 according to the first embodiment. The information processing device 10 shown in FIG. 3 includes a transmission unit 202, a reception unit 204, a processing control unit 206, and a storage unit 240.

The transmission unit 202 can be realized by, for example, the communication interface 106. The transmission unit 202 may transmit classification results, labeling results, constructed models, and the like to each device connected via the network.

The reception unit 204 can be realized by, for example, the communication interface 106. The reception unit 204 receives, for example, a plurality of data from each acquisition device 20. The received data is, for example, sound data of environmental sounds collected by a microphone or the like, and is output to the processing control unit 206.

The storage unit 240 stores various data, such as data used for classification, data used for tagging, and data used for model construction. For example, the storage unit 240 stores characteristic data used for labeling and the identification information to be attached as labels.

The processing control unit 206 can be realized by, for example, the control unit 102. The processing control unit 206 executes classification processing, labeling processing, model construction processing, and the like on the plurality of data, for example sound data, received by the reception unit 204.

To execute the processing described above, the processing control unit 206 includes an acquisition unit 210, an extraction unit 212, a classification unit 214, a search unit 224, an assigning unit 226, an analysis unit 228, and a visualization unit 234.

The acquisition unit 210 acquires a plurality of data from the reception unit 204. Besides the reception unit 204, the acquisition unit 210 may also acquire the data to be classified or otherwise processed from a storage unit in its own device.

The extraction unit 212 extracts a feature amount from each data. When the data is sound data, the extraction unit 212 converts the sound data into the frequency domain and extracts one or more feature amounts related to the power spectrum in the frequency domain. The extraction unit 212 need only calculate feature amounts suited to the data.
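The conversion of sound data into a power spectrum can be illustrated as follows. This is a minimal sketch that assumes a plain discrete Fourier transform over a single frame of samples; the framing, windowing, and normalization actually used are not specified here, and the function name `power_spectrum` is hypothetical.

```python
import cmath
import math


def power_spectrum(frame):
    """Return the one-sided power spectrum of one frame of real samples.

    Uses a direct O(n^2) DFT for clarity; a real system would use an FFT.
    The 1/n normalization is an assumption of this sketch.
    """
    n = len(frame)
    half = n // 2 + 1  # bins 0 .. n/2 are enough for a real-valued signal
    spectrum = []
    for f in range(half):
        acc = sum(x * cmath.exp(-2j * math.pi * f * t / n)
                  for t, x in enumerate(frame))
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum


# A sine wave that completes two cycles in a 16-sample frame
# concentrates its power in frequency bin 2.
frame = [math.sin(2 * math.pi * 2 * t / 16) for t in range(16)]
spec = power_spectrum(frame)
peak_bin = max(range(len(spec)), key=spec.__getitem__)
```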

More specifically, the feature amount extracted by the extraction unit 212 includes at least one of the following: a basis vector obtained by applying non-negative matrix factorization to the power spectrum; a variance matrix of the power spectrum with its dimensions aligned; a vector obtained by applying mel filter bank analysis to the power spectrum; and the first principal component vector of the power spectrum reconstructed from the coefficient vectors and basis vectors obtained by non-negative matrix factorization. The extracted feature amount of each data is output to the classification unit 214.

The classification unit 214 classifies the data using, for example, the feature amounts extracted by the extraction unit 212, following a procedure that improves on the AP method. The classification unit 214 generates the final clusters by repeatedly updating the first evaluation value and the second evaluation value, both described later. For example, the classification unit 214 determines candidate data for cluster centers and generates clusters until a value based on the first and second evaluation values, which are updated using the updated autocorrelation values of the similarity matrix, converges. The classification unit 214 sets the autocorrelation values of the similarity matrix using the similarity of outlier data in the clusters generated in the course of calculating the first and second evaluation values. To carry out this classification method, the classification unit 214 includes a generation unit 216, a calculation unit 218, an update unit 220, and a determination unit 222.

The generation unit 216 calculates a similarity matrix between the data using the feature amounts extracted by the extraction unit 212. The generation unit 216 generates the similarity matrix by calculating the similarity between each pair of data with a predetermined method.

For example, when the vectors and variance matrix described above are extracted from sound data, the generation unit 216 may calculate each similarity in the similarity matrix by weighting the similarities of the individual vectors or variance matrices included in the feature amount. The generated similarity matrix is output to the calculation unit 218.
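A weighted similarity of this kind can be sketched as follows. The per-feature similarity measure (negative squared Euclidean distance, a common choice for AP), the feature names, and the weights are all assumptions for illustration; the predetermined method and weights actually used are not stated here.

```python
def neg_sq_dist(u, v):
    """Per-feature similarity: negative squared Euclidean distance (assumed)."""
    return -sum((x - y) ** 2 for x, y in zip(u, v))


def similarity_matrix(features, weights):
    """Build an n x n similarity matrix as a weighted sum over features.

    features[i][name] is the feature vector `name` of data item i;
    weights[name] is the (hypothetical) weight given to that feature.
    """
    n = len(features)
    return [[sum(w * neg_sq_dist(features[i][name], features[k][name])
                 for name, w in weights.items())
             for k in range(n)]
            for i in range(n)]


# Two sound clips, each described by two hypothetical feature vectors.
features = [
    {"basis": [0.0, 0.0], "mel": [1.0]},
    {"basis": [3.0, 4.0], "mel": [2.0]},
]
weights = {"basis": 0.5, "mel": 2.0}
s = similarity_matrix(features, weights)
```

In this sketch the diagonal entries come out as zero; in the AP method they are normally overwritten with the autocorrelation s(k,k) before the evaluation values are calculated.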

The calculation unit 218 calculates, for example, the responsibility and availability used in the AP classification method. Here, the responsibility is denoted r(i,k) and referred to as the first evaluation value, and the availability is denoted a(i,k) and referred to as the second evaluation value.

Based on the similarity matrix generated by the generation unit 216, the calculation unit 218 calculates the first evaluation value, which indicates how appropriate it is for candidate data for a cluster center to serve as the cluster center of member data belonging to the cluster. Based on the autocorrelation of the similarity matrix, the calculation unit 218 also calculates the second evaluation value, which indicates how appropriate it is for the member data to belong to the cluster of the candidate data. When the calculation unit 218 obtains an updated autocorrelation value of the similarity matrix from the update unit 220, it recalculates and updates the first and second evaluation values based on the updated autocorrelation value and the new clusters.

Unlike the conventional AP method, the update unit 220 uses a novel technique for setting the value of s(k,k), which represents the autocorrelation of the similarity matrix. For example, the update unit 220 sets the autocorrelation using the similarity of outlier data in the clusters generated in the course of calculating the first and second evaluation values.

This is because, by the nature of clustering, even data with low similarity must be assigned to some cluster, so clusters end up containing low-similarity data. The first embodiment therefore applies appropriate processing to such low-similarity outliers, which yields an appropriate number of clusters and improves classification accuracy.

In the conventional AP method, the autocorrelation value s(k,k) of the similarity matrix, a parameter that influences the number of clusters, is not updated. The value of s(k,k) also stays fixed while the first evaluation value r(i,k) and the second evaluation value a(i,k) are updated. The conventional AP method therefore only yields the number of clusters determined by the given value of s(k,k), and the resulting clusters are quite likely to contain outliers.

The proposed method, by contrast, detects the outliers of the clusters obtained in the course of updating the first evaluation value r(i,k) and the second evaluation value a(i,k), and updates the autocorrelation of the similarity matrix corresponding to those outliers. That is, while r(i,k) and a(i,k) are being updated, the value of s(k,k) for the outlier data, which influences the number of clusters, is updated as well. Clusters may therefore be generated around the outlier data, and in the clustering result obtained when the updates finally converge, each cluster is less likely to contain outliers than with the conventional method. The first embodiment can therefore improve classification accuracy.

The update unit 220 may update the autocorrelation of the similarity matrix for outlier data in a cluster to another value. As this other value, the update unit 220 may use a value based on the column vector of the similarity matrix that corresponds to the outlier data. The update unit 220 may also sort this column vector in a predetermined order and use the value at a predetermined position as the other value. This allows an outlier that was once assigned to one cluster to be assigned to another cluster, and makes it easier to generate clusters composed of more appropriate data while the outlier data is being updated.
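The sort-and-pick update described above might look like the following sketch. The outlier test (a member whose similarity to its exemplar falls below `threshold`), the descending sort order, and the `rank` parameter are all hypothetical choices made for illustration; the criterion for detecting outliers and the predetermined order and position are not specified in this passage.

```python
def update_outlier_autocorrelation(s, exemplar_of, threshold, rank=0):
    """Return (updated copy of s, list of outlier indices).

    s           : n x n similarity matrix; s[k][k] is the autocorrelation.
    exemplar_of : exemplar_of[i] is the current cluster center of item i.
    threshold   : assumed outlier test, s[i][exemplar] < threshold.
    rank        : assumed position in the sorted column to pick.
    """
    n = len(s)
    updated = [row[:] for row in s]  # leave the input matrix untouched
    outliers = []
    for i in range(n):
        k = exemplar_of[i]
        if i != k and s[i][k] < threshold:
            outliers.append(i)
            # Column vector of the outlier, without its own diagonal entry,
            # sorted from most similar to least similar.
            column = sorted((s[j][i] for j in range(n) if j != i),
                            reverse=True)
            updated[i][i] = column[min(rank, len(column) - 1)]
    return updated, outliers


# Items 0 and 1 are close; item 2 is far away but currently sits in
# the cluster of exemplar 0, so its autocorrelation is raised.
s = [[-200.0, -1.0, -100.0],
     [-1.0, -200.0, -81.0],
     [-100.0, -81.0, -200.0]]
new_s, outliers = update_outlier_autocorrelation(s, [0, 0, 0], threshold=-50.0)
```

In this example the autocorrelation of item 2 rises from -200 to -81, which under the standard AP updates makes item 2 more likely to become the center of its own cluster in later iterations.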

The update unit 220 may also select a similarity within each column vector of the similarity matrix based on the standard deviation of the maximum values of the column vectors, and set the selected similarity as the initial value of the autocorrelation of the similarity matrix. In this way, outlier data can be identified in advance using the standard deviation of the column maxima, and the initial autocorrelation value for the outlier data can be set to a more appropriate value. Setting the initial values appropriately is by itself enough to reduce the outlier data within clusters and improve classification accuracy.

The determination unit 222 determines whether a value based on the first evaluation value and the second evaluation value calculated by the calculation unit 218 has converged. For example, if the sum of the first evaluation value and the second evaluation value has converged, the classification unit 214 takes the cluster center data at that time as the representative data. If the sum has not converged, the classification unit 214 re-determines the candidate data, reorganizes the clusters, and performs the classification again.
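One way to picture the convergence check is the sketch below. It follows the common AP convention of choosing, for each data item i, the candidate k that maximizes a(i,k) + r(i,k), and it treats the result as converged once these choices stay unchanged for several consecutive iterations. The `patience` parameter and both function names are assumptions; the exact convergence criterion is not detailed here.

```python
def current_exemplars(r, a):
    """For each item i, pick the candidate k maximizing a(i,k) + r(i,k)."""
    n = len(r)
    return [max(range(n), key=lambda k: a[i][k] + r[i][k]) for i in range(n)]


def has_converged(history, patience=3):
    """True if the last `patience` exemplar assignments are identical.

    history is a list of assignment lists, oldest first.
    """
    if len(history) < patience:
        return False
    latest = history[-1]
    return all(past == latest for past in history[-patience:])


r = [[1.0, -1.0], [-2.0, 2.0]]
a = [[0.0, 0.0], [0.0, 0.0]]
assignment = current_exemplars(r, a)
```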

Through the classification processing described above, the autocorrelation, which affects quantities such as the number of clusters, is updated for outliers within a cluster. This makes outliers more likely to be excluded from the cluster and improves classification accuracy.

Next, consider building a new model from the data appropriately classified by the processing described above. The data can be modeled by labeling the classified data with the characteristics of that data.

For example, in the first embodiment, the representative sound of each cluster obtained by classifying the sound data of environmental sounds is labeled, and a new model is built by learning from the feature amounts of the labeled data. This new model represents the environmental sound as a whole, unlike conventional models that estimate a specific sound or the direction of a sound.

Next, the labeling processing and the processing related to modeling are described. The storage unit 240 stores data of different types in association with identification information indicating each type. For sound data, for example, various characteristic sound data such as train sounds, applause, event sounds, and wind sounds are associated with identification information indicating the type of that sound data. The identification information, such as information indicating a train sound or information indicating applause, is a kind of label.

検索部２２４は、分類部２１４により生成されたクラスタ中心のデータにマッチングする、記憶部２４０に記憶されたデータを検索する。例えば、クラスタ中心の代表音が、記憶部２４０に記憶されたどの音データにマッチングするかを、類似度などを用いて検索する。なお、クラスタ中心のデータについては、クラスタの特徴を示す代表データであれば他のデータでもよい。 The search unit 224 searches for the data stored in the storage unit 240, which matches the cluster-centered data generated by the classification unit 214. For example, which sound data stored in the storage unit 240 matches with the representative sound at the center of the cluster is searched for by using the similarity or the like. Note that the cluster-centered data may be other data as long as it is representative data indicating the characteristics of the cluster.

付与部２２６は、検索部２２４により検索されたデータに関連付けられた識別情報を含むラベルを、クラスタ中心のデータに付与する。例えば、取得部２１０により取得されたデータが、マイクロフォンアレイにより取得された音データであれば、音の方向及び／又は音の発生時間が関連付けられているので、付与部２２６は、これらの情報をラベルに含めてもよい。 The assigning unit 226 assigns a label including the identification information associated with the data found by the search unit 224 to the cluster-center data. For example, if the data acquired by the acquisition unit 210 is sound data acquired by a microphone array, the direction of the sound and/or the time at which the sound occurred is associated with the data, so the assigning unit 226 may include this information in the label.

なお、検索部２２４及び付与部２２６を用いれば自動で代表データにラベルを付与することができるが、ユーザがクラスタ内のデータを実際に調べることで、適切なラベルを付与してもよい。例えば、ユーザが、クラスタ内の代表音を実際に聞くことで、その代表音が何かを判別し、代表音にラベルを付与してもよい。 Although labels can be assigned to the representative data automatically by using the search unit 224 and the assigning unit 226, an appropriate label may instead be assigned by a user who actually examines the data in the cluster. For example, the user may listen to the representative sound of a cluster, determine what the sound is, and assign a label to it.

解析部２２８は、ラベルに含まれる識別情報を用いて複数のデータを解析する。例えば、解析部２２８は、複数のデータにおける各識別情報の割合を算出し、この割合を解析結果にしてもよい。また、解析部２２８は、音の方向及び／又は音の発生時間を用いて、解析結果を細分化してもよい。 The analysis unit 228 analyzes a plurality of data using the identification information included in the label. For example, the analysis unit 228 may calculate the ratio of each identification information in a plurality of data and use this ratio as the analysis result. In addition, the analysis unit 228 may subdivide the analysis result by using the direction of the sound and / or the generation time of the sound.
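The ratio calculation described for the analysis unit 228 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the `(label, hour)` event layout, and the sample labels are all hypothetical, and the hour of occurrence stands in for the "generation time" used to subdivide the result.

```python
from collections import Counter, defaultdict

def label_ratios(labels):
    """Return the share of each label among all labeled data items."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}

def label_ratios_by_hour(events):
    """Subdivide the ratios by hour; `events` is a list of (label, hour) pairs."""
    by_hour = defaultdict(list)
    for label, hour in events:
        by_hour[hour].append(label)
    return {hour: label_ratios(ls) for hour, ls in sorted(by_hour.items())}

# Hypothetical labeled sound events: (identification label, hour of occurrence)
events = [("train", 9), ("train", 9), ("train", 9),
          ("applause", 10), ("wind", 10), ("train", 10)]
overall = label_ratios([label for label, _ in events])
hourly = label_ratios_by_hour(events)
```

Subdividing by sound direction would follow the same pattern, with a direction bin in place of the hour.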

また、解析部２２８は、分類されたデータを用いて解析モデルを構築するため、学習部２３０と、構築部２３２とを含む。 Further, the analysis unit 228 includes a learning unit 230 and a construction unit 232 in order to construct an analysis model using the classified data.

学習部２３０は、ラベル付けされたデータと、特徴量とを用いてアンサンブル学習を行う。アンサンブル学習とは、個々に学習した識別器を複数個用意し、それらを、単純には出力の平均を用いる等して、まとめ合わせて一つの識別器を構築する学習法である。アンサンブル学習には、ランダムサブスペース法（参考 T.K.Ho, "The Random Subspace Method for Constructing Decision Forests," IEEE Trans. On Pattern Analysis and Machine Intelligence, Vol. 20, No. 8, pp. 832-844, 1998）を用いたアンサンブル学習（参考 D. Opitz and R. Maclin, "Popular ensemble methods: An empirical study," Journal of Artificial Intelligence Research 11, pp. 169-196, 1999）を適用する。 The learning unit 230 performs ensemble learning using the labeled data and the feature amounts. Ensemble learning is a learning method in which a plurality of individually trained classifiers are prepared and combined into a single classifier, in the simplest case by averaging their outputs. As the ensemble learning, ensemble learning (reference: D. Opitz and R. Maclin, "Popular ensemble methods: An empirical study," Journal of Artificial Intelligence Research 11, pp. 169-196, 1999) using the random subspace method (reference: T. K. Ho, "The Random Subspace Method for Constructing Decision Forests," IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. 20, No. 8, pp. 832-844, 1998) is applied.

構築部２３２は、学習部２３０による学習により、データに関するモデルを構築する。モデル構築には、例えば、MATLAB関数（参考 http://jp.mathworks.com/help/stats/random-subspace.html）を用いる。 The construction unit 232 constructs a model related to data by learning by the learning unit 230. For model construction, for example, the MATLAB function (reference http://jp.mathworks.com/help/stats/random-subspace.html) is used.

可視化部２３４は、構築されたモデルを用いて解析された解析結果が表示部において可視化されるように処理する。例えば、可視化部２３４は、解析結果をグラフ化し、ユーザに把握しやすくすることができる。 The visualization unit 234 processes the analysis result analyzed using the constructed model so that the analysis result is visualized in the display unit. For example, the visualization unit 234 can graph the analysis result and make it easier for the user to understand.

以上により、このモデルを用いれば、例えば環境音に対して発生割合などの環境の雰囲気(活性度)を調べることができる。 As described above, this model makes it possible to examine the atmosphere (degree of activity) of an environment, for example through the occurrence ratio of each environmental sound.

＜具体例＞ <Specific example>
ここでは、第１実施形態の具体例として、データは、環境計測から得られる環境音の音データとする。解析手法においては、様々な環境音をどのように整理して、分かり易い情報に変換するのかを主な課題とし、その課題を解決する手法を以下に説明する。 Here, as a specific example of the first embodiment, the data is sound data of environmental sounds obtained from environmental measurement. The main problem for the analysis method is how to organize various environmental sounds and convert them into easy-to-understand information, and a method for solving this problem is described below.

≪環境音の計測場所及び取得装置≫ ≪Environmental sound measurement location and acquisition device≫
図４は、環境音を計測した場所を示す図である。図４に示すように、駅ＳＴの右側の建物ＢＬ１の所定の高さに、取得装置２０（マイクロフォンアレイ）が設置される。また、この建物ＢＬ１と建物ＢＬ２との間に広場ＡＲがあり、この広場ＡＲでは、イベント等が頻繁に開催される。また、建物ＢＬ１及び建物ＢＬ２は、ショッピングセンターである。 FIG. 4 is a diagram showing the place where the environmental sound was measured. As shown in FIG. 4, the acquisition device 20 (microphone array) is installed at a predetermined height on the building BL1 on the right side of the station ST. There is a plaza AR between the building BL1 and the building BL2, and events and the like are frequently held in this plaza AR. The building BL1 and the building BL2 are shopping centers.

図５は、マイクロフォンアレイの一例を示す図である。図６は、マイクロフォンの位置を示す図である。環境音計測には、図５、６に示すマイクロフォンアレイ２０Ａ〜Ｄ（以下、単にマイクロフォンアレイ２０と称す。）を用いる。 FIG. 5 is a diagram showing an example of a microphone array. FIG. 6 is a diagram showing the position of the microphone. The microphone arrays 20A to D (hereinafter, simply referred to as the microphone array 20) shown in FIGS. 5 and 6 are used for the environmental sound measurement.

ここで、マイクロフォンアレイ２０は、複数のマイクロフォンＭＣを使用した音計測機器であるため、音源方向や音源位置の推定が可能となり、それらの情報もデータとして利活用が可能となる。マイクロフォンアレイ２０により集音される環境音は、そのショッピングセンター内で発生した音である。より具体的には、環境音は、電車音、広場整備音、イベント音、ハシャギ声、風音などである。マイクロフォンアレイ２０により計測される音の長さは、例えば最長で2secとする。計測された環境音は、周波数領域に変換され、パワースペクトルが算出される。算出されたパワースペクトルから特徴量が抽出され、抽出された特徴量は、環境音識別の解析モデルの構築に用いられる。 Here, since the microphone array 20 is a sound measuring device that uses a plurality of microphones MC, it can estimate the direction and position of a sound source, and this information can also be used as data. The environmental sounds collected by the microphone array 20 are sounds generated within the shopping center. More specifically, the environmental sounds include train sounds, plaza maintenance sounds, event sounds, excited voices, wind sounds, and the like. The length of a sound measured by the microphone array 20 is, for example, at most 2 sec. The measured environmental sound is converted into the frequency domain, and its power spectrum is calculated. Feature amounts are extracted from the calculated power spectrum and are used to construct an analysis model for environmental sound identification.

≪概要≫ ≪Overview≫
まず、環境音の解析手法の概要について説明する。処理制御部２０６は、ある程度振幅を持った発生音データを対象にし、短時間フーリエ変換を適用し、得られたパワースペクトルPsに非負値行列因子分解（参考澤田宏：“非負値行列因子分解NMFの基礎とデータ／信号解析への応用,”電子情報通信学会誌，Vol.95，No.9，pp.829-833，2012）を施し、基底ベクトルPvと係数ベクトルPcを求める。 First, an outline of the environmental sound analysis method is given. The processing control unit 206 targets sound data having a certain amplitude, applies a short-time Fourier transform, and applies non-negative matrix factorization (reference: Hiroshi Sawada, "Basics of non-negative matrix factorization (NMF) and its application to data/signal analysis," Journal of the Institute of Electronics, Information and Communication Engineers, Vol. 95, No. 9, pp. 829-833, 2012) to the obtained power spectrum Ps to obtain the basis vectors Pv and the coefficient vectors Pc.

ここで、処理制御部２０６は、求めた値P（PはPs、Pv、Pcを含む）を環境音の特徴量とみなす。処理制御部２０６は、これらの特徴量を使って類似度行列を算出して生成する。次に、処理制御部２０６は、改良したAffinity Propagation（AP）法を用いてデータをカテゴリ（クラスタ）に分類する。これにより、本具体例では、適切なカテゴリを生成し、分類精度を向上させることができる。 Here, the processing control unit 206 regards the obtained value P (P includes Ps, Pv, and Pc) as the feature amount of the environmental sound. The processing control unit 206 calculates and generates a similarity matrix using these features. Next, the processing control unit 206 classifies the data into categories (clusters) using the improved Affinity Propagation (AP) method. Thereby, in this specific example, an appropriate category can be generated and the classification accuracy can be improved.

また、処理制御部２０６は、分類されたそれぞれのカテゴリにラベル付けを行う。処理制御部２０６は、ラベル付けされたデータを利用してランダムサブスペース法を用いたアンサンブル学習を行い、環境音を識別するための解析モデルを構築する。ここで、アンサンブル学習において、例えば、弱学習器にはk最近傍法(k-NN法)を用いる。 In addition, the processing control unit 206 labels each of the classified categories. The processing control unit 206 performs ensemble learning using the random subspace method using the labeled data, and constructs an analysis model for identifying environmental sounds. Here, in ensemble learning, for example, the k-nearest neighbor method (k-NN method) is used for the weak learner.

さらに、本具体例では、環境音識別の解析モデルにより、計測環境がどのような音環境を形成しているのかを把握することができる。また、本解析手法の応用例として、例えば、イベント時などで、イベント音、歓声、拍手などを区別することにより、イベントの盛り上りなどの活性度が音によって計測できるサービスを提供することができる。 Further, in this specific example, the analysis model for environmental sound identification makes it possible to grasp what kind of sound environment the measured environment forms. As an application of this analysis method, it is also possible to provide a service that measures the degree of activity, such as the excitement of an event, from sound by distinguishing event sounds, cheers, applause, and the like during the event.

≪解析モデル≫ ≪Analysis model≫
次に、解析モデルを構築するための分類処理、ラベル付け処理、モデル構築処理を順に説明する。 Next, the classification process, the labeling process, and the model construction process for constructing the analysis model are described in order.

（１）分類処理 (1) Classification process
（１．１）特徴量抽出 (1.1) Feature amount extraction
抽出部２１２は、環境音の音データから特徴量を抽出する。例えば、抽出部２１２は、ある環境音に短時間フーリエ変換を施し、パワースペクトルを求める。 The extraction unit 212 extracts feature amounts from the sound data of the environmental sound. For example, the extraction unit 212 applies a short-time Fourier transform to an environmental sound to obtain its power spectrum.

図７は、パワースペクトルの一例を示す図である。図７に示す例では、縦軸は、周波数（N個）を表し、横軸は、短時間フーリエ変換のシフト数で決まるフレーム数（Mフレーム）を表す。このとき、このパワースペクトルの値は、行列の要素と考えられ、N×M行列Psが考えられる。ここで、Nは固定で、256次元である。Mは、切り出す音の長さに応じて変化する。 FIG. 7 is a diagram showing an example of a power spectrum. In the example shown in FIG. 7, the vertical axis represents frequencies (N), and the horizontal axis represents the number of frames (M frames) determined by the number of shifts of the short-time Fourier transform. At this time, the value of this power spectrum is considered to be a matrix element, and an N × M matrix Ps is considered. Where N is fixed and has 256 dimensions. M changes according to the length of the sound to be cut out.
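The N × M power spectrum matrix Ps described above can be sketched with a plain short-time Fourier transform. This is an illustration only: the Hann window, the frame length of 16 samples (giving N = 8 bins rather than the 256 used in the text), and the hop size are assumptions chosen to keep the example small, and the function name is hypothetical.

```python
import cmath
import math

def power_spectrogram(signal, frame_len=16, hop=8):
    """Return the power spectrum as N rows (frequency bins) by M columns (frames)."""
    n_bins = frame_len // 2
    # Periodic Hann window (an assumed choice; the text does not name a window)
    window = [0.5 - 0.5 * math.cos(2 * math.pi * t / frame_len)
              for t in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        chunk = [signal[start + t] * window[t] for t in range(frame_len)]
        spectrum = []
        for k in range(n_bins):
            coeff = sum(chunk[t] * cmath.exp(-2j * math.pi * k * t / frame_len)
                        for t in range(frame_len))
            spectrum.append(abs(coeff) ** 2)  # power = squared magnitude
        frames.append(spectrum)
    # Transpose so that rows are frequency bins and columns are frames
    return [list(row) for row in zip(*frames)]

# A pure tone centred on frequency bin 4; longer signals give a larger M
signal = [math.sin(2 * math.pi * 4 * t / 16) for t in range(64)]
Ps = power_spectrogram(signal)
n_rows, n_cols = len(Ps), len(Ps[0])
```

As in the text, the number of rows depends only on the frame length, while the number of columns grows with the length of the sound that is cut out.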

抽出部２１２は、行列Psに、非負値行列因子分解を適用し、基底ベクトルと係数ベクトルとを求める。 The extraction unit 212 applies the non-negative matrix factorization to the matrix Ps to obtain the basis vector and the coefficient vector.

図８は、基底ベクトルの一例を示す図である。図９は、係数ベクトルの一例を示す図である。非負値行列因子分解には、例えばGamma Process非負値行列因子分解(Gap-NMF)（参考 M. Hoffman, D. Blei, and P. Cook, "Bayesian Nonparametric Matrix Factorization for Recorded Music," Proceedings of the 27th International Conference on Machine Learning, Haifa, 2010）を用いる。 FIG. 8 is a diagram showing an example of the basis vectors. FIG. 9 is a diagram showing an example of the coefficient vectors. For the non-negative matrix factorization, for example, Gamma Process non-negative matrix factorization (Gap-NMF) (reference: M. Hoffman, D. Blei, and P. Cook, "Bayesian Nonparametric Matrix Factorization for Recorded Music," Proceedings of the 27th International Conference on Machine Learning, Haifa, 2010) is used.

Gap-NMFは、特徴的な周波数が選択されるように基底ベクトルを決定する因子分解手法である。このとき、基底ベクトルの数は、パワースペクトルの形状によって異なる。また、係数ベクトルは、それぞれの基底ベクトルの時間的変化を表す値を要素に持ったベクトルとなる。図８、９に示す例では、濃い部分が大きな値を表しており、段階的に薄い色に変化していくに従って値が小さくなっていく。 Gap-NMF is a factorization method that determines the basis vector so that characteristic frequencies are selected. At this time, the number of basis vectors differs depending on the shape of the power spectrum. Further, the coefficient vector is a vector having a value representing the temporal change of each basis vector as an element. In the examples shown in FIGS. 8 and 9, the dark part represents a large value, and the value decreases as the color gradually changes to lighter.

図１０は、基底ベクトル及び係数ベクトルを用いて再構築したパワースペクトルの一例を示す図である。図１０に示すパワースペクトルは、図８に示す基底ベクトルと、図９に示す係数ベクトルとを乗算することで生成される。つまり、図１０に示すパワースペクトルは、図７に示すパワースペクトルを再構築したものである。 FIG. 10 is a diagram showing an example of a power spectrum reconstructed using a basis vector and a coefficient vector. The power spectrum shown in FIG. 10 is generated by multiplying the basis vector shown in FIG. 8 by the coefficient vector shown in FIG. That is, the power spectrum shown in FIG. 10 is a reconstruction of the power spectrum shown in FIG. 7.
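The factorization and reconstruction steps can be illustrated with a small sketch. Note the substitution: the text uses Gap-NMF, which chooses the number of basis vectors itself, whereas the code below uses the simpler multiplicative-update NMF of Lee and Seung with a fixed rank. The function names, the rank, and the toy matrix are hypothetical; W plays the role of the basis vectors and H that of the coefficient vectors, and the product W·H is the reconstructed spectrum.

```python
import random

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def squared_error(V, W, H):
    R = matmul(W, H)
    return sum((v - r) ** 2 for vr, rr in zip(V, R) for v, r in zip(vr, rr))

def nmf(V, rank, iterations=200, seed=0, eps=1e-12):
    """Factor a non-negative matrix V (N x M) into W (N x rank) and H (rank x M)."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), element-wise
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[h * a / (b + eps) for h, a, b in zip(hr, nr, dr)]
             for hr, nr, dr in zip(H, num, den)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[w * a / (b + eps) for w, a, b in zip(wr, nr, dr)]
             for wr, nr, dr in zip(W, num, den)]
    return W, H

# Toy non-negative "power spectrum" of rank 2 (rows: frequencies, columns: frames)
V = [[1.0, 2.0, 0.0, 1.0],
     [2.0, 4.0, 0.0, 2.0],
     [0.0, 1.0, 3.0, 1.0],
     [0.0, 3.0, 9.0, 3.0]]
W, H = nmf(V, rank=2)
reconstructed = matmul(W, H)  # counterpart of the reconstructed spectrum
```

Because the updates only multiply by non-negative ratios, W and H stay non-negative throughout, which is what allows the columns of W to be read as spectral shapes.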

Gap-NMFの特徴から、再構築されたパワースペクトル（図１０参照）は、元のパワースペクトル（図７参照）の形状の中で特徴的な周波数が強調されたパワースペクトルの形状になることが分かる。以下、再構築されたパワースペクトルをPr_sと表す。 Because of the characteristics of Gap-NMF, the reconstructed power spectrum (see FIG. 10) takes a shape in which the characteristic frequencies within the original power spectrum (see FIG. 7) are emphasized. Hereinafter, the reconstructed power spectrum is denoted by Pr_s.

抽出部２１２は、環境音の特徴量として、例えば以下の４つの値を算出する。
・元のパワースペクトルを以下の計算で正方行列にした行列Π₁、
・基底ベクトルの平均値、ベクトルf、
・以下の計算で得られる正方行列Π₂に主成分分析が適用され、得られた第１主成分ベクトルe、
・メルフィルタバンク分析によってP_sを20次元に圧縮した振幅スペクトルfa
ここで、faは、Mが異なるパワースペクトルに対して、パワースペクトルの時間変化を同次元（20次元）で比較できるようにする特徴量として取り扱われる。
The extraction unit 212 calculates, for example, the following four values as feature amounts of the environmental sound.
・the matrix Π₁, obtained by converting the original power spectrum into a square matrix by the following calculation,
・the vector f, the mean of the basis vectors,
・the first principal component vector e, obtained by applying principal component analysis to the square matrix Π₂ given by the following calculation,
・the amplitude spectrum fa, obtained by compressing P_s to 20 dimensions by mel filter bank analysis.
Here, fa is treated as a feature amount that allows the temporal change of power spectra with different M to be compared in the same number of dimensions (20 dimensions).

（１．２）分類手法 (1.2) Classification method
分類部２１４は、上述した４つの特徴量を用いて、一日に発生した環境音の分類を行う。ここで、マイクロフォンアレイ２０により計測された一日の音データ数をLとする。分類部２１４は、分類手法としてAP法と同様の方法を用いる。AP法は、音データそれぞれの類似度に応じて、以下の２つの式を使って、代表音(exemplar)と、それに類似する環境音とを求めることができるクラスタリング手法である。算出部２１８は、以下の式（３）（４）を用いて第１評価値及び第２評価値を算出する。 The classification unit 214 classifies the environmental sounds generated in one day using the four feature amounts described above. Here, let L be the number of sound data items measured by the microphone array 20 in one day. The classification unit 214 uses a method similar to the AP method as the classification method. The AP method is a clustering method that, based on the similarity between sound data items, uses the following two equations to obtain representative sounds (exemplars) and the environmental sounds similar to each of them. The calculation unit 218 calculates the first evaluation value and the second evaluation value using the following equations (3) and (4).

ここで、第１評価値r(i,k)は、AP法では、環境音データiからexemplar候補kに送られるメッセージを表す。また、第２評価値a(i,k)は、exemplar候補kからそのクラスタに含められる可能性がある環境音データiに送られるメッセージを表す。このとき、パラメータs(i,k)は、各要素間の関係（例えば類似度）を表したものである。パラメータa(k,k)は、以下の式を用いて更新する。 Here, in the AP method, the first evaluation value r(i,k) represents the message sent from environmental sound data i to exemplar candidate k. The second evaluation value a(i,k) represents the message sent from exemplar candidate k to environmental sound data i, which may be included in its cluster. The parameter s(i,k) represents the relationship (for example, the similarity) between the elements. The parameter a(k,k) is updated using the following equation.
このように、AP法は、exemplar候補kと音データi間でメッセージをやり取りする。つまり、分類部２１４は、式（３）と式（４）とを交互に求めることによって、exemplarが求まり、そのexemplarに分類される音データが決定するメッセージ交換型クラスタリング手法である。このとき、a(i,k)の初期値は０に設定される。 In this way, the AP method exchanges messages between exemplar candidate k and sound data i. That is, the classification unit 214 performs message-exchange clustering in which equations (3) and (4) are evaluated alternately, so that the exemplars are obtained and the sound data assigned to each exemplar are determined. The initial value of a(i,k) is set to 0.
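The message exchange described above can be sketched as follows. The update equations themselves are not reproduced in this text, so the sketch follows the standard AP updates published by Frey and Dueck, which is an assumption about what equations (3) to (5) contain. The damping factor, the iteration count, the negative squared distance used as similarity, and the toy data are likewise illustrative choices, and s(k,k) is set to the median similarity, the conventional setting rather than the improved one.

```python
import statistics

def affinity_propagation(S, damping=0.5, iterations=200):
    """Standard AP message passing; returns the exemplar index chosen by each point."""
    n = len(S)
    R = [[0.0] * n for _ in range(n)]  # responsibilities r(i,k)
    A = [[0.0] * n for _ in range(n)]  # availabilities a(i,k), initialised to 0
    for _ in range(iterations):
        for i in range(n):
            vals = [A[i][k] + S[i][k] for k in range(n)]
            first = max(range(n), key=vals.__getitem__)
            max1 = vals[first]
            max2 = max(v for k, v in enumerate(vals) if k != first)
            for k in range(n):
                # r(i,k) = s(i,k) - max over k' != k of (a(i,k') + s(i,k'))
                best_other = max2 if k == first else max1
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - best_other)
        for k in range(n):
            pos = [max(0.0, R[i][k]) for i in range(n)]
            total = sum(pos) - pos[k]  # sum over i' != k of max(0, r(i',k))
            for i in range(n):
                if i == k:
                    new = total  # self-availability a(k,k)
                else:
                    new = min(0.0, R[k][k] + total - pos[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    return [max(range(n), key=lambda k: A[i][k] + R[i][k]) for i in range(n)]

# Toy one-dimensional data forming two well-separated groups
points = [0.0, 0.1, 0.2, 10.0, 10.1, 10.2]
n = len(points)
S = [[-(points[i] - points[k]) ** 2 for k in range(n)] for i in range(n)]
preference = statistics.median(S[i][k] for i in range(n) for k in range(n) if i != k)
for k in range(n):
    S[k][k] = preference  # conventional setting: median of all similarities
labels = affinity_propagation(S)
```

Each entry of `labels` is the index k that maximizes a(i,k) + r(i,k) for point i, which is how the original AP paper identifies the exemplar of each point.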

ここで、AP法は、従来のクラスタリング手法、例えば、k-means法のように事前にクラスタ数の情報を必要とせず、データ依存でexemplarの数が決まる手法である。このexemplarの数は、s(k,k)、つまり、環境音データの自己相関の大きさで決まることが知られている。また、従来手法では、全類似度s(i,k)の中央値をs(k,k)に設定してAP法を動作させている例が多い。 Here, unlike conventional clustering methods such as the k-means method, the AP method does not require information on the number of clusters in advance; the number of exemplars is determined by the data. It is known that the number of exemplars is determined by s(k,k), that is, the magnitude of the autocorrelation of the environmental sound data. In many conventional examples, the AP method is run with s(k,k) set to the median of all similarities s(i,k).

しかしながら、例えば分類対象データが音データの場合、従来のs(k,k)の設定方法では、分類がうまくいかないことが少なくない。つまり、クラスタリングの性質上、データはどこかのクラスタに分類されてしまうので、クラスタ内のデータの中には、多くのデータに対して類似度が低いデータ（以下、外れデータとも称す。）が含まれてしまう。 However, for example, when the classification target data is sound data, the conventional s (k, k) setting method often fails in classification. In other words, due to the nature of clustering, data is classified into some cluster, so some of the data in the cluster has a low degree of similarity to many data (hereinafter, also referred to as outlier data). Will be included.

この欠点を解決するために、第１実施形態では、改良したs(k,k)の設定方法が用いられる。ここで、s(i,k)を要素として持つ類似度行列をSと表記する。また、その類似度行列の縦ベクトルをs_k（k=1,2,…,L）と表記する。このとき、それぞれのs_kに対する最大値をs_maxk（k=1,2,…,L）と表す。 To solve this drawback, the first embodiment uses an improved method of setting s(k,k). Here, the similarity matrix whose elements are s(i,k) is denoted by S, and its column vectors are denoted by s_k (k = 1, 2, ..., L). The maximum value of each s_k is denoted by s_maxk (k = 1, 2, ..., L).

ここでは、s(k,k)の設定方法（更新方法）として、以下の２つを例に挙げて説明する。１つ目の更新方法は、式（６）が用いられる。更新部２２０は、式（６）により、外れデータを求める。 Here, the following two methods are described as examples of how to set (update) s(k,k). The first update method uses equation (6). The update unit 220 finds outlier data by equation (6).

ここで、abs(x)は、ｘの絶対値を表し、mean(x)は、ｘの平均値を表し、std(x)は、xの標準偏差を表し、パラメータαは、正数を表す。つまり、クラスタ内データXの外れデータは、式（６）により検出することができる。更新部２２０は、検出された外れデータのデータポイントkのs(k,k)に対して、以下のルールを適用して更新する。 Here, abs(x) denotes the absolute value of x, mean(x) denotes the mean of x, std(x) denotes the standard deviation of x, and the parameter α is a positive number. That is, outlier data among the intra-cluster data X can be detected by equation (6). The update unit 220 updates s(k,k) of each data point k detected as outlier data by applying the following rule.

ここで、s_kは、類似度行列のk列ベクトルを表し、sort(s_k)_nthは、降順のk列ベクトルs_kの上位からn番目の値を表す。本具体例では、例えば４番目の値を用いるが、どの値を用いるかは与えられたデータの分類形態で変更する。 Here, s_k denotes the k-th column vector of the similarity matrix, and sort(s_k)_nth denotes the n-th value from the top of the column vector s_k sorted in descending order. In this specific example the fourth value is used, for example, but which value to use is changed according to how the given data are classified.
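The first update method can be sketched as follows. Equation (6) and the update rule are not reproduced in this text, so both are assumptions here: the outlier test is taken to be abs(x − mean(X)) > α·std(X), X is taken to be the similarities of the cluster members to their exemplar, the population standard deviation is used, the diagonal is excluded when sorting column s_k, and the rule is taken to be s(k,k) ← sort(s_k)_nth. All function names are hypothetical.

```python
import statistics

def find_outliers(values, alpha):
    """Positions whose value deviates from the mean by more than alpha * std."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)  # population std (assumed)
    return [i for i, x in enumerate(values) if abs(x - mu) > alpha * sigma]

def nth_largest_in_column(S, k, n=4):
    """n-th value of column k sorted in descending order (diagonal excluded)."""
    column = sorted((S[i][k] for i in range(len(S)) if i != k), reverse=True)
    return column[min(n, len(column)) - 1]

def update_self_similarity(S, members, exemplar, alpha=1.5, n=4):
    """Raise s(k,k) of outlier members so they can more easily become exemplars."""
    ids = [i for i in members if i != exemplar]
    if len(ids) < 2:
        return []
    sims = [S[i][exemplar] for i in ids]  # X: similarity of members to the exemplar
    updated = []
    for pos in find_outliers(sims, alpha):
        k = ids[pos]
        S[k][k] = nth_largest_in_column(S, k, n)
        updated.append(k)
    return updated

# Similarities of six cluster members to their exemplar; the last one is far off
similarities = [-1.0, -1.1, -0.9, -1.0, -1.05, -9.0]
outlier_positions = find_outliers(similarities, alpha=1.5)
```

A larger α flags fewer points, which matches the remark in the text that α is chosen according to how many outliers are expected.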

また、式（６）のパラメータαは、どの程度の外れデータが想定されて分類が行われるかによって変更される。s(k,k)の初期値は、従来のAP法で用いられている類似度行列のすべての要素の中央値が設定される。 Further, the parameter α of the equation (6) is changed depending on how much deviation data is assumed and the classification is performed. The initial value of s (k, k) is set to the median value of all the elements of the similarity matrix used in the conventional AP method.

次に、２つ目の更新方法について説明する。更新部２２０は、最大値s_maxk（k=1,2,…,L）の標準偏差σ_sを基準に、s(k,k)の初期値を決める方法を用いる。すなわち、更新部２２０は、以下の式（８）を閾値Thとして用いてs(k,k)を決める。 Next, the second update method is described. The update unit 220 determines the initial value of s(k,k) with reference to the standard deviation σ_s of the maximum values s_maxk (k = 1, 2, ..., L). That is, the update unit 220 determines s(k,k) using the following equation (8) as the threshold Th.
Th = σ_s / a   (8)

ここで、定数aは、任意の正の整数とする。閾値Thは、s_kの要素の中で、できるだけ最大値に近い値をs(k,k)に設定しようとして、発明者によって考えられた閾値である。 Here, the constant a is an arbitrary positive integer. The threshold Th was devised by the inventors with the aim of setting s(k,k) to a value as close as possible to the maximum among the elements of s_k.

更新部２２０は、各s_kの要素において、σ_s／aよりも大きい要素で、σ_s／aに最も近い要素の値をs(k,k)の初期値に設定する。もし、s_kが該当する要素を持たなければ、更新部２２０は、s_kの2番目の値をs(k,k)の初期値に設定する。 For each s_k, the update unit 220 sets the initial value of s(k,k) to the value of the element that is larger than σ_s/a and closest to σ_s/a. If s_k has no such element, the update unit 220 sets the second value of s_k as the initial value of s(k,k).
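The second update method follows directly from equation (8) and the rule above, and can be sketched as below. Some details are assumptions: the diagonal entry is excluded from each column s_k, the population standard deviation is used for σ_s, and "the second value of s_k" is read as the second-largest value. The function name and the toy similarity matrix are hypothetical.

```python
import statistics

def initial_self_similarity(S, a=3):
    """Return (Th, initial s(k,k) values) using the threshold Th = sigma_s / a."""
    n = len(S)
    columns = [[S[i][k] for i in range(n) if i != k] for k in range(n)]
    col_max = [max(c) for c in columns]        # s_maxk for each column
    th = statistics.pstdev(col_max) / a        # equation (8)
    initial = []
    for c in columns:
        above = [v for v in c if v > th]
        if above:
            initial.append(min(above))         # larger than Th and closest to it
        else:
            ranked = sorted(c, reverse=True)
            initial.append(ranked[1] if len(ranked) > 1 else ranked[0])
    return th, initial

# Toy symmetric similarity matrix with values in [0, 1]
S = [[0.0, 0.9, 0.2, 0.1],
     [0.9, 0.0, 0.3, 0.1],
     [0.2, 0.3, 0.0, 0.1],
     [0.1, 0.1, 0.1, 0.0]]
th, s_kk = initial_self_similarity(S, a=3)
```

In this toy matrix the last column has no element above the threshold, so the fallback branch is used for it.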

２つ目の更新方法の場合、分類部２１４は、改良されたAP法でクラスタリングした事後処理として、exemplarの値に応じて、クラスタ内の外れデータを求め、この外れデータをexemplar候補としてさらにクラスタリングすればよい。また、更新部２２０は、上述した２つの更新方法のいずれかを用いてもよいし、自己相関の初期値を２つ目の更新方法を用いて設定し、さらに、１つ目の更新方法を用いて、その初期値を更新してもよい。 In the case of the second update method, as post-processing after clustering with the improved AP method, the classification unit 214 may find outlier data within each cluster according to the value of the exemplar and perform further clustering with the outlier data as exemplar candidates. The update unit 220 may use either one of the two update methods described above, or may set the initial value of the autocorrelation with the second update method and then update that initial value with the first update method.

分類部２１４は、r(i,k)＋a(i,k)の加算値が収束するまで分類処理を繰り返す。収束とは、例えば、再分類を所定回数以上行っても、加算値が所定範囲内にあることをいう。これにより、本具体例では、クラスタ内の外れデータを処理する改良AP法におけるクラスタリングを用いることで、分類精度を向上させることができる。 The classification unit 214 repeats the classification process until the sum r(i,k) + a(i,k) converges. Convergence means, for example, that the sum stays within a predetermined range even after reclassification has been performed a predetermined number of times or more. Thus, in this specific example, the classification accuracy can be improved by clustering with the improved AP method, which handles outlier data within clusters.

（２）ラベル付け (2) Labeling
検索部２２４は、分類部２１４により分類された各カテゴリの代表音（例えばexemplarの音）に類似する音データを、記憶部２４０に記憶された音データから検索する。例えば、検索部２２４は、代表音との類似度が一番大きい音データを検索する。 The search unit 224 searches the sound data stored in the storage unit 240 for sound data similar to the representative sound (for example, the exemplar sound) of each category classified by the classification unit 214. For example, the search unit 224 retrieves the sound data with the highest similarity to the representative sound.

付与部２２６は、検索された音データに関連する識別情報を含むラベルを、代表音に付与する。なお、ラベルは、識別情報以外にも、音の発生位置や発生時間帯などを含んでもよい。また、付与部２２６は、代表音と同じラベルを、クラスタ内の音データに付与してもよい。これにより、全データに対して検索処理を行う必要がなくなり、処理負荷を軽減させることができる。 The assigning unit 226 assigns a label including the identification information associated with the retrieved sound data to the representative sound. In addition to the identification information, the label may include the position and time period in which the sound occurred. The assigning unit 226 may also assign the same label as the representative sound to the other sound data in the cluster. This eliminates the need to run the search on every data item and reduces the processing load.
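The search-and-assign steps above can be sketched as follows. This is a minimal illustration under stated assumptions: cosine similarity stands in for the unspecified similarity measure, the reference database is a plain list of (label, feature vector) pairs, and all names and feature values are hypothetical.

```python
import math

def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def label_clusters(exemplar_features, cluster_members, reference_db):
    """Label each exemplar by its most similar stored sound, then reuse that
    label for every member of the exemplar's cluster (no per-member search)."""
    labels = {}
    for exemplar_id, feature in exemplar_features.items():
        best_label, _ = max(reference_db,
                            key=lambda entry: cosine_similarity(feature, entry[1]))
        for member_id in cluster_members[exemplar_id]:
            labels[member_id] = best_label
    return labels

# Hypothetical stored sounds with identification information
reference_db = [("train", [1.0, 0.0, 0.0]), ("applause", [0.0, 1.0, 0.0])]
# Hypothetical exemplars (keyed by data index) and their cluster members
exemplar_features = {0: [0.9, 0.1, 0.0], 3: [0.1, 0.8, 0.1]}
cluster_members = {0: [0, 1, 2], 3: [3, 4]}
labels = label_clusters(exemplar_features, cluster_members, reference_db)
```

Only one search per cluster is performed, which is the load reduction the text attributes to propagating the exemplar's label.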

また、ラベル付けとして、ユーザが、代表音を聞き、代表音に合ったラベルを選択し、ラベル付けを行ってもよい。この場合、付与部２２６は、ユーザが付与した代表音のラベルを、代表音と同じカテゴリの音データに付与してもよい。 Alternatively, labeling may be performed by a user who listens to the representative sound and selects a label that suits it. In this case, the assigning unit 226 may assign the label that the user gave to the representative sound to the sound data in the same category as the representative sound.

（３）学習処理 (3) Learning process
解析部２２８は、ラベル付けされた各音データを教師信号とし、そのそれぞれに対する環境音の上述した特徴量（f',e,fa）を学習データとして用いて、一日の環境音を識別するための解析モデルを構築する。ここで、f'は、Pr_sの周波数平均値である。また、f'は、基底ベクトルの平均値、ベクトルfを用いてもよい。 The analysis unit 228 uses each labeled sound data item as a teacher signal and the above-described feature amounts (f', e, fa) of the corresponding environmental sound as training data to construct an analysis model for identifying the environmental sounds of one day. Here, f' is the frequency mean of Pr_s. Alternatively, the vector f, the mean of the basis vectors, may be used as f'.

本具体例では、解析部２２８は、アンサンブル学習を用いて環境音を識別するための解析モデルを構築する。学習部２３０は、アンサンブル学習として、上述したランダムサブスペース法を用いたアンサンブル学習を実行する。また、弱学習器には、k-最近傍法を用いてモデルを構築する。ここで、モデル構築には、例えばMATLAB関数(fitensemble)を用いることにする。 In this specific example, the analysis unit 228 builds an analysis model for identifying environmental sounds using ensemble learning. The learning unit 230 executes ensemble learning using the above-mentioned random subspace method as ensemble learning. For weak learners, a model is constructed using the k-nearest neighbor method. Here, for example, the MATLAB function (fitensemble) is used for model construction.
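The text builds the model with the MATLAB function fitensemble; as a language-neutral illustration of the same idea, the sketch below implements a random subspace ensemble with k-nearest-neighbor weak learners in plain Python. It is not a port of fitensemble: the class name, the number of learners, the subspace size, and the majority vote used to combine the learners (instead of averaging scores) are all assumptions, and the feature values are made up.

```python
import random
from collections import Counter

def knn_predict(train_X, train_y, x, dims, k=3):
    """k-NN vote using only the feature dimensions listed in `dims`."""
    def dist(row):
        return sum((row[d] - x[d]) ** 2 for d in dims)
    nearest = sorted(range(len(train_X)), key=lambda i: dist(train_X[i]))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]

class RandomSubspaceKNN:
    """Each weak learner is a k-NN classifier on a random subset of features."""

    def __init__(self, n_learners=15, subspace_dim=2, k=3, seed=0):
        self.n_learners, self.subspace_dim, self.k = n_learners, subspace_dim, k
        self.rng = random.Random(seed)

    def fit(self, X, y):
        self.X, self.y = X, y
        n_features = len(X[0])
        self.subspaces = [self.rng.sample(range(n_features), self.subspace_dim)
                          for _ in range(self.n_learners)]
        return self

    def predict(self, x):
        votes = Counter(knn_predict(self.X, self.y, x, dims, self.k)
                        for dims in self.subspaces)
        return votes.most_common(1)[0][0]  # majority vote over weak learners

# Hypothetical 4-dimensional feature vectors with their labels
X = [[0.0, 0.0, 0.0, 0.0], [0.2, 0.1, 0.0, 0.1], [0.1, 0.3, 0.2, 0.0],
     [5.0, 5.0, 5.0, 5.0], [5.2, 4.9, 5.1, 5.0], [4.8, 5.1, 5.0, 5.3]]
y = ["train", "train", "train", "applause", "applause", "applause"]
model = RandomSubspaceKNN().fit(X, y)
predicted = model.predict([0.1, 0.2, 0.0, 0.1])
```

Training each learner on a different feature subset makes the learners disagree in different places, which is what gives the combined vote its robustness.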

≪実験≫ ≪Experiment≫
ここで、発明者らが行った実験について説明する。発明者らは、図４に示す位置に、図５，６に示すマイクロフォンアレイ２０を設置し、マイクロフォンアレイ２０は、公共空間内に発生した環境音を計測する。 Here, the experiments conducted by the inventors are described. The inventors installed the microphone array 20 shown in FIGS. 5 and 6 at the position shown in FIG. 4, and the microphone array 20 measured the environmental sounds generated in the public space.

環境音のモデル構築には、平成26年7月7日(月)から平成26年7月11日(金)において計測された5日分の音データが用いられる。まず、分類部２１４による分類結果について説明する。この実験では、更新方法として２つ目を適用し、式（８）のパラメータaは３とする。 Five days' worth of sound data measured from July 7, 2014 (Monday) to July 11, 2014 (Friday) will be used to build the environmental sound model. First, the classification result by the classification unit 214 will be described. In this experiment, the second method is applied as the update method, and the parameter a in the equation (8) is set to 3.

また、類似度行列の要素s(i,k)は、それぞれの環境音データの特徴量を用いて、Itakura spectral距離（Di）とCOSH spectral距離（Dcosh）を用いて算出する。 The elements s(i,k) of the similarity matrix are calculated from the feature amounts of each environmental sound data item using the Itakura spectral distance (Di) and the COSH spectral distance (Dcosh).
パラメータCi(i=1,2,3,4)は、それぞれの距離の重み係数を表し、正数値が与えられる。 The parameters Ci (i = 1, 2, 3, 4) are the weighting coefficients of the respective distances and are given positive values.
図１１は、改良AP法を用いて音データをクラスタリングした結果の一例を示す図である。図１１に示す例では、データ数に対するクラスタ数が示されている。図１１に示す例では、圧縮率（＝クラスタ数／データ数）が46.2％となっている。 FIG. 11 is a diagram showing an example of the result of clustering sound data using the improved AP method. The example in FIG. 11 shows the number of clusters relative to the number of data items. In this example, the compression rate (= number of clusters / number of data items) is 46.2%.
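The similarity calculation from spectral distances mentioned above can be sketched as follows. The equation that combines the distances is not reproduced in this text, so several things are assumptions: the "Itakura spectral distance" is taken to be the Itakura-Saito divergence, the COSH distance is its symmetrized form, and the similarity is the negative weighted sum of the two with only two illustrative weights (the text uses four weights C1 to C4 whose pairing with features cannot be recovered here). The function names and sample spectra are hypothetical.

```python
import math

def itakura_saito(p, q):
    """Mean Itakura-Saito divergence between two strictly positive spectra."""
    return sum(a / b - math.log(a / b) - 1.0 for a, b in zip(p, q)) / len(p)

def cosh_distance(p, q):
    """COSH distance: the symmetrized Itakura-Saito divergence."""
    return 0.5 * (itakura_saito(p, q) + itakura_saito(q, p))

def similarity(p, q, c_itakura=1.0, c_cosh=1.0):
    """Negative weighted sum of distances, so closer spectra score higher."""
    return -(c_itakura * itakura_saito(p, q) + c_cosh * cosh_distance(p, q))

# Hypothetical power spectra (all values must be positive)
spec_a = [1.0, 2.0, 4.0, 2.0]
spec_b = [1.1, 1.9, 4.2, 2.1]
spec_c = [4.0, 1.0, 0.5, 3.0]
s_ab = similarity(spec_a, spec_b)
s_ac = similarity(spec_a, spec_c)
```

Both distances are zero for identical spectra and grow as the spectral shapes diverge, so similar sounds receive similarities near zero and dissimilar ones strongly negative values.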

図示しないが、従来手法で用いられている全s(i,k)の中央値をs(k,k)に設定する方法を用いたAP法では、圧縮率が91.7％となる。つまり、クラスタ数が少ないため、外れデータがクラスタ内に含まれている可能性が高い。なお、自己相関の更新において、１つ目の更新方法を用いた場合には、その更新方法に合わせた圧縮率になるように、事前実験等により適宜パラメータを調整すればよい。 Although not shown, the AP method using the method of setting the median value of all s (i, k) to s (k, k) used in the conventional method has a compression ratio of 91.7%. In other words, since the number of clusters is small, there is a high possibility that outlier data is included in the cluster. When the first update method is used in the update of the autocorrelation, the parameters may be appropriately adjusted by a preliminary experiment or the like so that the compression rate matches the update method.

次に、上述した処理で構築された解析モデルの精度の検証について説明する。検証に用いた音データは、平成26年7月14日から平成26年7月18日に計測された環境音である。それぞれの音データに対し、平成26年7月7日(月)から平成26年7月11日(金)の音データを用いて構築された解析モデルが適用される。このとき、曜日毎に解析モデルが適用され、間違ったラベルが出力された回数（代表音とは異なる音が、その代表音のクラスタに分類された数）がカウントされる。 Next, verification of the accuracy of the analysis model constructed by the above processing will be described. The sound data used for the verification is the environmental sound measured from July 14, 2014 to July 18, 2014. An analysis model constructed using the sound data from July 7, 2014 (Monday) to July 11, 2014 (Friday) is applied to each sound data. At this time, the analysis model is applied for each day of the week, and the number of times an incorrect label is output (the number of sounds different from the representative sound classified into the cluster of the representative sound) is counted.

図１２は、改良AP法の精度結果（その１）の一例を示す図である。図１２に示すように、各曜日の正解率の平均は９１％である。なお、誤解答数とは、クラスタに分類された音データを人が聞いて、その音データがクラスタの代表音とは異なる数を示す。 FIG. 12 is a diagram showing an example of the accuracy results (part 1) of the improved AP method. As shown in FIG. 12, the average correct answer rate over the days of the week is 91%. The number of incorrect answers is the number of sound data items that, when a person listened to the sound data classified into a cluster, differed from the representative sound of that cluster.

なお、図示しないが、従来のAP法によりクラスタリングされた代表音を使ってラベリングされたときの解析モデルを用いた場合では、正解率が約78％という結果が得られている。このことから、改良AP法により、そのクラスタリング結果は、適切にカテゴリが生成され、そのカテゴリの分類精度が増加したと考えることができる。 Although not shown, the correct answer rate is about 78% when the analysis model when labeled using the representative sounds clustered by the conventional AP method is used. From this, it can be considered that the improved AP method appropriately generated categories in the clustering result and increased the classification accuracy of the categories.

さらに、曜日毎ではなく、7月7日から7月11日の全ての音データを使って、解析モデルが構築された場合を検討する。7月14日から7月18日のそれぞれの環境音データに、この解析モデルを適用し、モデル構築に使用するデータ数が多くなると、分類精度が向上するのかどうかを発明者らは検証した。 Furthermore, we will consider the case where the analysis model is constructed using all the sound data from July 7th to July 11th, not every day of the week. This analysis model was applied to each environmental sound data from July 14 to July 18, and the inventors verified whether the classification accuracy would improve as the number of data used for model construction increased.

図１３は、改良AP法の精度結果（その２）の一例を示す図である。図１３に示すように、各曜日の正解率の平均は９３％である。図１３に示す例では、各曜日において、誤解答数が減り、平均正解率が向上している。このことから、多くのデータを使って解析モデルを構築した方が、モデルの信頼性が向上することが分かる。 FIG. 13 is a diagram showing an example of the accuracy result (No. 2) of the improved AP method. As shown in FIG. 13, the average accuracy rate for each day of the week is 93%. In the example shown in FIG. 13, the number of incorrect answers is reduced and the average correct answer rate is improved on each day of the week. From this, it can be seen that the reliability of the model is improved when the analysis model is constructed using a large amount of data.

次に、上述した解析モデルを用いて、可視化部２３４により分類された環境音の発生割合を示す。 Next, using the analysis model described above, the generation rate of environmental sounds classified by the visualization unit 234 is shown.

図１４は、一日の環境音の発生割合の一例を示す図である。図１４に示すように、本実験において、マイクロフォンアレイ２０を設置した場所は、すぐ近くに駅STがあることから（図４参照）、電車音の発生割合が一番高い。 FIG. 14 is a diagram showing an example of the generation rate of environmental sounds in one day. As shown in FIG. 14, in this experiment, the place where the microphone array 20 was installed has the station ST in the immediate vicinity (see FIG. 4), so that the train sound generation rate is the highest.

２番目は、広場ARのザワツキを表したような雑踏音の発生割合が高く、３番目は、広場の整備音の発生割合が高い。その他に、ハシャギ音、風音、イベント音などが発生している。 The second highest occurrence ratio is that of crowd noise, which reflects the bustle of the plaza AR, and the third is that of plaza maintenance sounds. Sounds of excited voices, wind sounds, event sounds, and the like also occur.

付与部２２６は、このような音にラベル付けを行い、可視化部２３４は、その発生割合を可視化することができる。図１４に示すグラフから、その日の音環境の雰囲気を理解することが可能になる。さらに、改良AP法を用いて生成された解析モデルは、図１２や図１３に示すように正答率が90％を超えているので、音環境の可視化は、信頼でき得るレベルに達していると言える。 The assigning unit 226 labels such sounds, and the visualization unit 234 can visualize their generation ratios. The graph shown in FIG. 14 makes it possible to understand the atmosphere of the sound environment on that day. Furthermore, since the analysis model generated using the improved AP method has a correct answer rate exceeding 90% as shown in FIGS. 12 and 13, the visualization of the sound environment can be said to have reached a reliable level.
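
The generation ratio visualized in FIG. 14 can be read as the share of each label among all classified sound segments. The following is a minimal sketch under that assumption; the function name `occurrence_ratios` and the sample labels are hypothetical and do not come from the specification.

```python
from collections import Counter


def occurrence_ratios(labels):
    """Return the share of each label among all classified sound segments."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    # most_common() orders the labels from the most to the least frequent.
    return {label: count / total for label, count in counts.most_common()}


# Hypothetical labels assigned to the segments of one day.
day_labels = ["train"] * 5 + ["crowd"] * 3 + ["maintenance"] * 2
ratios = occurrence_ratios(day_labels)
```

The resulting dictionary can be passed directly to any plotting routine to draw a bar or pie chart of the kind described for the visualization unit 234.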

上述したように、本具体例における解析モデルは、環境音の発生割合の可視化から、その音環境の雰囲気を把握することができる。そこで、この解析モデルの応用例として、広場ARで開催されたイベントに関して、イベント中に発生する音から、そのイベントの雰囲気を把握することができるのかを発明者らは検証した。 As described above, the analysis model in this specific example can grasp the atmosphere of the sound environment from the visualization of the generation rate of the environmental sound. Therefore, as an application example of this analysis model, the inventors verified whether the atmosphere of the event could be grasped from the sound generated during the event regarding the event held in the plaza AR.

解析モデルの作成に関しては、イベントが開催された日に発生した一日の環境音の音データを使って、イベント音（歌、音楽等）、拍手音に新たにラベルが付与され、以前作成した解析モデルに、新たなラベルが付与されたクラスタが追加されるという手順でモデルが構築された。 Regarding the creation of the analysis model, the event sounds (songs, music, etc.) and applause sounds were newly labeled using the sound data of the environmental sounds of the day generated on the day the event was held, and were created before. The model was built by adding a cluster with a new label to the analysis model.

図１５は、あるイベント日の１０時から１３時までのイベントＡにおける環境音発生割合の一例を示す図である。ここで、以下に示す全ての例は、モデル構築に使用されていない音データから得られた結果である。以降の図では、環境音に、ハシャギ音、イベント音、拍手音などが含まれる。 FIG. 15 is a diagram showing an example of the environmental sound generation ratio in the event A from 10:00 to 13:00 on a certain event day. All the examples shown below are results obtained from sound data that was not used for model construction. In the following figures, the environmental sounds include frolicking sounds, event sounds, applause sounds, and the like.

図１６は、あるイベント日の１４時から１８時までのイベントＡにおける環境音発生割合の一例を示す図である。図１６に示すイベントＡと図１５に示すイベントＡは同じイベントである。図１５及び図１６に示すグラフによれば、イベント中盤から後半にかけて、ハシャギ音、イベント音、拍手音がほぼ同じ割合で発生していることが分かる。 FIG. 16 is a diagram showing an example of the environmental sound generation ratio in the event A from 14:00 to 18:00 on a certain event day. Event A shown in FIG. 16 and event A shown in FIG. 15 are the same event. According to the graphs shown in FIGS. 15 and 16, frolicking sounds, event sounds, and applause sounds occur at almost the same ratio from the middle to the latter half of the event.

これは、実際のイベントによるアンケート結果などと照らし合わせて、このイベントの雰囲気や盛り上がりがどうだったかを、このグラフと照らし合わせて検討することが可能になる。 By comparing these graphs with, for example, the results of questionnaires collected at the actual event, it becomes possible to examine what the atmosphere and excitement of the event were like.

図１７は、あるイベント日の１４時から１８時までのイベントＢにおける環境音発生割合の一例を示す図である。図１７に示す例では、イベント音の発生割合が高く、これは、音楽や歌が中心のコンサート型のイベントであることが分かる。また、ハシャギ音や拍手音が発生しているが、イベント音の割合が圧倒的に多いことから、音楽や歌で盛り上ったイベントであったと推察することができる。 FIG. 17 is a diagram showing an example of the environmental sound generation ratio in the event B from 14:00 to 18:00 on a certain event day. In the example shown in FIG. 17, the generation ratio of event sounds is high, which indicates a concert-type event centered on music and songs. Frolicking sounds and applause sounds also occur, but because the proportion of event sounds is overwhelmingly large, it can be inferred that the event was enlivened by music and songs.

また、上述した環境音とは雰囲気が異なる例を図１８及び図１９を用いて説明する。図１８は、あるイベント日の１０時から１３時までのイベントＣにおける環境音発生割合の一例を示す図である。 Further, an example in which the atmosphere is different from the above-mentioned environmental sound will be described with reference to FIGS. 18 and 19. FIG. 18 is a diagram showing an example of the environmental sound generation ratio in the event C from 10:00 to 13:00 on a certain event day.

図１９は、あるイベント日の１４時から１８時までのイベントＣにおける環境音発生割合の一例を示す図である。図１８に示すイベントＣと図１９に示すイベントＣは同じイベントである。図１８及び図１９に示すグラフによれば、イベント開始時には、拍手音などが発生しており、盛り上りの雰囲気があったが、イベントの中盤、後半に向けて、徐々に盛り上りの雰囲気が減少しているような傾向を把握することができる。 FIG. 19 is a diagram showing an example of the environmental sound generation ratio in the event C from 14:00 to 18:00 on a certain event day. Event C shown in FIG. 18 and event C shown in FIG. 19 are the same event. According to the graphs shown in FIGS. 18 and 19, applause sounds and the like occurred at the start of the event and there was an atmosphere of excitement, but it can be seen that this excitement tended to decrease gradually toward the middle and the latter half of the event.

以上より、環境音を識別するための解析モデルを使うことによって、イベントの雰囲気を把握することができる。よって、この解析モデルのさらなる応用例として、イベント評価への活用にも期待できる。 From the above, the atmosphere of the event can be grasped by using the analysis model for identifying the environmental sound. Therefore, as a further application example of this analysis model, it can be expected to be used for event evaluation.

本具体例では、環境音データの分類方法、その分類結果から環境音を識別するための解析モデルの構築手法を提供することができる。本具体例では、環境音の識別を約90%以上の確率で実現することができ、環境音の発生場所や、音環境の雰囲気などを理解することができる。 In this specific example, it is possible to provide a method for classifying environmental sound data and a method for constructing an analysis model for identifying environmental sounds from the classification result. In this specific example, the identification of the environmental sound can be realized with a probability of about 90% or more, and the place where the environmental sound is generated and the atmosphere of the sound environment can be understood.

また、本具体例の応用例として、イベント時の発生音からイベントの雰囲気の把握、さらにはイベントの盛り上りなどの評価にも利用することが期待できる。また、イベント時の環境音発生割合から推察する結果と、実際の雰囲気とが一致しているかどうか、イベント参加者にアンケートなどで検証することができる。 In addition, as an application example of this specific example, it can be expected to be used for grasping the atmosphere of the event from the sound generated at the time of the event, and further for evaluating the excitement of the event. In addition, it is possible to verify whether or not the result inferred from the environmental sound generation rate at the time of the event matches the actual atmosphere by questionnaires to the event participants.

＜動作＞
次に、第１実施形態における情報処理システム１の動作について説明する。図２０は、第１実施形態におけるデータ解析処理の一例を示すフローチャートである。図２０に示すステップＳ１０２で、分類部２１４は、複数のデータに対して分類処理を行う。分類処理の詳細は、図２１を用いて説明する。 <Operation>
Next, the operation of the information processing system 1 in the first embodiment will be described. FIG. 20 is a flowchart showing an example of data analysis processing according to the first embodiment. In step S102 shown in FIG. 20, the classification unit 214 performs classification processing on a plurality of data. The details of the classification process will be described with reference to FIG. 21.

ステップＳ１０４で、検索部２２４及び付与部２２６は、ラベル付け処理を行う。ラベル付け処理の詳細は、図２２を用いて説明する。 In step S104, the search unit 224 and the assigning unit 226 perform the labeling process. The details of the labeling process will be described with reference to FIG. 22.

ステップＳ１０６で、解析部２２８は、分類されたデータを用いて、データ解析を行う。データ解析の詳細は、図２３を用いて説明する。 In step S106, the analysis unit 228 performs data analysis using the classified data. The details of the data analysis will be described with reference to FIG. 23.

図２１は、第１実施形態における分類処理の一例を示すフローチャートである。例えば図２１に示す処理は、定期的に必要なデータが集まったときに実行される。例えば音データの場合、一日ごとに、データの取得処理が終わった後に分類処理が開始されればよい。 FIG. 21 is a flowchart showing an example of the classification process in the first embodiment. For example, the process shown in FIG. 21 is executed when necessary data is periodically collected. For example, in the case of sound data, the classification process may be started every day after the data acquisition process is completed.

図２１に示すステップＳ２０２で、取得部２１０は、取得装置２０から受信した複数のデータを取得する。例えば、取得装置２０が外に設置されたマイクロフォンの場合、取得部２１０は、環境音の音データを取得する。また、複数のデータは、自装置の記憶部２４０から取得されてもよい。 In step S202 shown in FIG. 21, the acquisition unit 210 acquires a plurality of data received from the acquisition device 20. For example, in the case of a microphone in which the acquisition device 20 is installed outside, the acquisition unit 210 acquires the sound data of the environmental sound. Further, the plurality of data may be acquired from the storage unit 240 of the own device.

ステップＳ２０４で、抽出部２１２は、各データから特徴量を抽出する。抽出される特徴量は、データに応じて変更されればよい。例えば、音データの場合、周波数領域のパワースペクトルに関する特徴量が抽出される。 In step S204, the extraction unit 212 extracts the feature amount from each data. The feature amount to be extracted may be changed according to the data. For example, in the case of sound data, features related to the power spectrum in the frequency domain are extracted.

ステップＳ２０６で、生成部２１６は、抽出された特徴量を用いて、各データ間の類似度で表される類似度行列を生成する。 In step S206, the generation unit 216 uses the extracted features to generate a similarity matrix represented by the similarity between the data.

ステップＳ２０８で、算出部２１８は、分類の収束判定に用いる評価値を算出する。例えば、算出部２１８は、改良AP法が用いられる場合、第１評価値r(i,k)と、第２評価値a(i,k)とを算出する。さらに算出部２１８は、分類処理の終了判定パラメータとして、第１評価値と第２評価値との加算値を算出する。 In step S208, the calculation unit 218 calculates the evaluation value used for the convergence test of the classification. For example, the calculation unit 218 calculates the first evaluation value r (i, k) and the second evaluation value a (i, k) when the improved AP method is used. Further, the calculation unit 218 calculates the added value of the first evaluation value and the second evaluation value as the end determination parameter of the classification process.

ステップＳ２１０で、判定部２２２は、終了判定パラメータが収束したか否かを判定する。例えば、判定部２２２は、加算値が収束したか否かを判定する。終了判定パラメータが収束していれば（ステップＳ２１０−ＹＥＳ）処理はステップＳ２１４に進み、終了判定パラメータが収束していなければ（ステップＳ２１０−ＮＯ）処理はステップＳ２１２に進む。 In step S210, the determination unit 222 determines whether or not the end determination parameters have converged. For example, the determination unit 222 determines whether or not the added value has converged. If the end determination parameters have converged (step S210-YES), the process proceeds to step S214, and if the end determination parameters have not converged (step S210-NO), the process proceeds to step S212.

ステップＳ２１２で、更新部２２０は、クラスタ内の外れデータを検出し、検出された外れデータの自己相関を、他の値に更新する。自己相関は、類似度行列の中に含まれている。更新処理が終わると、処理はステップＳ２０８に戻り、更新後の自己相関を用いて第１評価値や第２評価値が更新される In step S212, the update unit 220 detects the outlier data in the cluster and updates the autocorrelation of the detected outlier data to another value. The autocorrelation is included in the similarity matrix. When the update process is completed, the process returns to step S208, and the first evaluation value and the second evaluation value are updated using the updated autocorrelation.

ステップＳ２１４で、分類部２１４は、判定対象パラメータが終了したときのクラスタを最終の結果として決定する。これにより、従来のAP法よりも、外れデータを考慮してクラスタリングを行うため、クラスタ内の外れデータの数を減らすことができ、分類精度を向上させることができる。 In step S214, the classification unit 214 determines the clusters obtained when the end determination parameter has converged as the final result. Because clustering is thus performed with outlier data taken into account, the number of outliers in each cluster can be reduced and the classification accuracy can be improved compared with the conventional AP method.
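
The loop of steps S208 to S214 can be sketched as follows. This is only an illustration of the conventional AP message updates with a convergence test on the sum r(i,k) + a(i,k); the outlier detection and self-similarity update of step S212 are not reproduced because their details are not given in this passage. The damping factor, tolerance, sample points, and preference value are all assumptions.

```python
def affinity_propagation(S, damping=0.5, max_iter=200, tol=1e-6):
    """Cluster with plain affinity propagation; S is an n x n similarity matrix (n >= 2)."""
    n = len(S)
    R = [[0.0] * n for _ in range(n)]  # first evaluation value r(i,k)
    A = [[0.0] * n for _ in range(n)]  # second evaluation value a(i,k)
    prev = [[0.0] * n for _ in range(n)]
    total = prev
    for _ in range(max_iter):
        # S208: update r(i,k) = s(i,k) - max over k' != k of (a(i,k') + s(i,k')).
        for i in range(n):
            vals = [A[i][k] + S[i][k] for k in range(n)]
            for k in range(n):
                rival = max(vals[j] for j in range(n) if j != k)
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - rival)
        # S208: update a(i,k) from the positive responsibilities sent to k.
        for k in range(n):
            pos = [max(0.0, R[i][k]) for i in range(n)]
            support = sum(pos) - pos[k]  # sum over i' != k
            for i in range(n):
                new = support if i == k else min(0.0, R[k][k] + support - pos[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new
        # S210: the end determination parameter is the sum r(i,k) + a(i,k).
        total = [[R[i][k] + A[i][k] for k in range(n)] for i in range(n)]
        delta = max(abs(total[i][k] - prev[i][k]) for i in range(n) for k in range(n))
        prev = total
        if delta < tol:
            break
    # S214: each point is assigned to the exemplar that maximizes r + a.
    return [max(range(n), key=lambda k: total[i][k]) for i in range(n)]


# Hypothetical one-dimensional data forming two well-separated groups.
points = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
preference = -10.0  # assumed self-similarity placed on the diagonal
S = [[preference if i == j else -(x - y) ** 2 for j, y in enumerate(points)]
     for i, x in enumerate(points)]
exemplars = affinity_propagation(S)
```

In the improved method described above, the diagonal entries of `S` for detected outliers would be changed in step S212 before the messages are recomputed; in this sketch the diagonal stays fixed.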

図２２は、第１実施形態におけるタグ付け処理の一例を示すフローチャートである。図２２に示すステップＳ３０２で、検索部２２４は、各クラスタから１つのクラスタを選択し、そのクラスタ内の代表データを選択する。代表データは、例えばクラスタ中心のデータである。 FIG. 22 is a flowchart showing an example of the tagging process according to the first embodiment. In step S302 shown in FIG. 22, the search unit 224 selects one cluster from each cluster and selects representative data in the cluster. The representative data is, for example, cluster-centered data.

ステップＳ３０４で、検索部２２４は、選択した代表データに対応するデータを、記憶部２４０の中から検索する。例えば、検索部２２４は、代表データに一番類似するデータを検索する。 In step S304, the search unit 224 searches the storage unit 240 for data corresponding to the selected representative data. For example, the search unit 224 searches for data that is most similar to the representative data.

ステップＳ３０６で、付与部２２６は、検索されたデータに関連付けられている識別情報をラベルに含めて、代表データに付与する。ラベルには、識別情報以外にも、そのデータに関する他の情報が含まれてもよい。 In step S306, the assigning unit 226 includes the identification information associated with the searched data in the label and assigns it to the representative data. In addition to the identification information, the label may contain other information about the data.

ステップＳ３０８で、付与部２２６は、全ての代表データにラベルを付与したか否かを判定する。代表データは、例えば各クラスタに１つずつ含まれる。全ての代表データにラベルが付与されていれば（ステップＳ３０８−ＹＥＳ）処理は終了し、全ての代表データにラベルが付与されていなければ（ステップＳ３０８−ＮＯ）処理はステップＳ３０２に戻り、他の代表データが選択される。 In step S308, the assigning unit 226 determines whether all the representative data have been labeled. Each cluster contains, for example, one piece of representative data. If all the representative data have been labeled (step S308-YES), the process ends; if any representative data remain unlabeled (step S308-NO), the process returns to step S302 and other representative data are selected.
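
Steps S302 to S308 amount to a nearest-neighbor lookup for each cluster's representative data. The sketch below assumes that feature amounts are numeric vectors and that similarity is measured by Euclidean distance; the specification does not fix the similarity measure here, and the record layout (`id`, `name`, `feature`) is hypothetical.

```python
import math


def label_exemplars(exemplars, reference_db):
    """Give each cluster's representative data the label of its most similar stored record."""
    labels = {}
    for cluster_id, feature in exemplars.items():  # S302: pick one cluster at a time
        # S304: search the stored data for the record closest to the representative data.
        nearest = min(reference_db, key=lambda rec: math.dist(feature, rec["feature"]))
        # S306: the label carries the identification information of that record.
        labels[cluster_id] = {"id": nearest["id"], "name": nearest["name"]}
    return labels  # S308: the loop ends once every representative is labeled


# Hypothetical stored records and representative features.
reference_db = [
    {"id": 1, "name": "train", "feature": (0.9, 0.1)},
    {"id": 2, "name": "crowd", "feature": (0.2, 0.8)},
]
exemplar_features = {"c0": (0.85, 0.2), "c1": (0.1, 0.7)}
exemplar_labels = label_exemplars(exemplar_features, reference_db)
```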

図２３は、第１実施形態における解析処理の一例を示すフローチャートである。図２３に示すステップＳ４０２で、学習部２３０及び構築部２３２は、ラベル付けされた各データを教師信号とし、各教師信号の特徴量を学習データとして用いてアンサンブル学習を行い、解析モデルを構築する。 FIG. 23 is a flowchart showing an example of the analysis process according to the first embodiment. In step S402 shown in FIG. 23, the learning unit 230 and the construction unit 232 perform ensemble learning using each labeled data item as a teacher signal and the feature amount of each teacher signal as training data, and construct an analysis model.

ステップＳ４０４で、分類部２１４は、この解析モデルを適用して、解析対象のデータを分類する。さらに、解析部２２８は、分類の正答率などを算出する場合には、事前に割り当てられたデータのラベルと、分類後に割り当てられたラベルとを比較し、ラベルが異なるデータをカウントする。また、人が分類後のデータを分析して、誤って分類されたデータの数をカウントしてもよい。 In step S404, the classification unit 214 applies this analysis model to classify the data to be analyzed. Further, when calculating the correct answer rate of the classification, the analysis unit 228 compares the label of the data assigned in advance with the label assigned after the classification, and counts the data having different labels. In addition, a person may analyze the classified data and count the number of misclassified data.
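
The correct answer rate described in step S404 follows from counting the data whose label after classification differs from the label assigned in advance. A minimal sketch, with hypothetical function and variable names:

```python
def correct_answer_rate(expected, predicted):
    """Return (number of mismatched labels, correct answer rate)."""
    if len(expected) != len(predicted):
        raise ValueError("label lists must have the same length")
    if not expected:
        raise ValueError("at least one labeled item is required")
    # Count the data whose label after classification differs from the prior label.
    wrong = sum(1 for e, p in zip(expected, predicted) if e != p)
    return wrong, 1 - wrong / len(expected)


# Hypothetical labels for ten test segments, one of which is misclassified.
expected = ["train"] * 6 + ["crowd"] * 4
predicted = ["train"] * 6 + ["crowd"] * 3 + ["wind"]
wrong, rate = correct_answer_rate(expected, predicted)
```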

ステップＳ４０６で、可視化部２３４は、解析結果を表示部において可視化されるように処理する。例えば、可視化部２３４は、解析結果をグラフ化し、把握しやすくする。 In step S406, the visualization unit 234 processes the analysis result so that it can be visualized on the display unit. For example, the visualization unit 234 graphs the analysis result and makes it easy to grasp.

以上、取得されたデータ全体に対して解析モデルを構築することができ、さらに、解析結果を一目で把握しやすいように可視化することができる。 As described above, an analysis model can be constructed for the entire acquired data, and further, the analysis result can be visualized so that it can be easily grasped at a glance.

以上、第１実施形態では、多くのデータを処理するのに悩めるサービス産業の手助けをするため、まずはデータの分類において分類精度を向上させることができる。さらに、データ利活用を実行するためのデータ解析手法を提案することができる。また、その解析から得られた結果をどのようなサービスに活かすことができるのかを提示することができる。 As described above, in the first embodiment, in order to help the service industry suffering from processing a large amount of data, it is possible to improve the classification accuracy in the data classification first. Furthermore, it is possible to propose a data analysis method for executing data utilization. In addition, it is possible to present what kind of service the results obtained from the analysis can be utilized.

例えば、第１実施形態によれば、音データを解析する場合、音の方向や位置を推定するだけではなく、その音の発生状況から、その音環境がどのような雰囲気になっているか、音環境模様を把握することができる。 For example, according to the first embodiment, when sound data is analyzed, it is possible not only to estimate the direction and position of a sound but also to grasp the sound environment pattern, that is, what kind of atmosphere the sound environment has, from how the sounds occur.

より具体的には、イベント時に、その音環境を計測し、どのような雰囲気のイベントであったのかを、盛り上がり、賑わいなどの音による客観的指標を提示することができる。また、共同住宅において、その周りの音環境を計測し、どのような音環境であるのかを、入居予定者に提示することができる。また、クラスタ内の外れデータを検出することで、この外れデータを分析し、何らかの特徴を把握してもよい。 More specifically, it is possible to measure the sound environment at the time of the event and present an objective index by sound such as excitement and liveliness as to what kind of atmosphere the event was. In addition, in an apartment house, the sound environment around it can be measured and the sound environment can be presented to prospective tenants. Further, by detecting the outlier data in the cluster, this outlier data may be analyzed and some characteristics may be grasped.

また、第１実施形態によれば、大量にあるデータから代表データを検出し、その代表データを用いて解析を行うことができるため、データ解析に時間がかからない。また、データのモデル構築については、適切な分類手法によりデータがカテゴリに分類されていれば、そのカテゴリに対してラベル付けを行い、モデルを構築することが可能である。 Further, according to the first embodiment, representative data can be detected from a large amount of data and analysis can be performed using the representative data, so that data analysis does not take time. Regarding data model construction, if data is classified into categories by an appropriate classification method, it is possible to label the categories and build a model.

［第２実施形態］
次に、第２実施形態について説明する。第２実施形態では、音環境を色模様としてモニタリングする技術に関する。第２実施形態で取得されるデータは、音データである。第２実施形態において、音データから１又は複数の特徴量を抽出し、抽出した特徴量を色で表現することにより、環境音などの音データを色模様（以下、音模様とも称す。）で表現することができる。 [Second Embodiment]
Next, the second embodiment will be described. The second embodiment relates to a technique for monitoring the sound environment as a color pattern. The data acquired in the second embodiment is sound data. In the second embodiment, one or a plurality of feature amounts are extracted from the sound data, and the extracted feature amounts are expressed by colors, so that the sound data such as environmental sound is expressed by a color pattern (hereinafter, also referred to as a sound pattern). Can be expressed.

例えば、第２実施形態における情報処理訴装置は、環境音の音データから、音の高低、大小、及び継続の３つの特徴を抽出し、それぞれに所定の色を割り当て、音模様を表現する。さらに、音の音源位置を推定することで、所定領域内における音模様を表現することができる。この音模様は、画面上に表示されることで、ユーザにとって視認可能となる。 For example, the information processing device in the second embodiment extracts three features from the sound data of environmental sound, namely pitch, loudness, and continuity, assigns a predetermined color to each of them, and expresses a sound pattern. Further, by estimating the sound source position, it can express the sound pattern within a predetermined region. Displaying this sound pattern on a screen makes it visible to the user.

なお、第２実施形態におけるシステム構成、及びハードウェア構成は、第１実施形態におけるシステム構成、及びハードウェア構成とそれぞれ同様であるため、その説明を省略する。また、第２実施形態におけるシステム構成及びハード構成については、図１及び図２と同じ符号を用いる。 Since the system configuration and the hardware configuration in the second embodiment are the same as the system configuration and the hardware configuration in the first embodiment, the description thereof will be omitted. Further, the same reference numerals as those in FIGS. 1 and 2 are used for the system configuration and the hardware configuration in the second embodiment.

＜機能構成＞
図２４は、第２実施形態における情報処理装置１０の機能構成の一例を示すブロック図である。図２４に示す情報処理装置１０は、送信部２０２、受信部２０４、処理制御部３００、及び記憶部３１０を含む。送信部２０２、及び受信部２０４は、第１実施形態と同様であるため、同じ符号を付し、その説明を省略する。記憶部３１０は、分類に用いるデータや、特徴量抽出などに用いるデータや、モデル構築に用いるデータなどの各種データを記憶する。例えば、記憶部３１０は、色表現に用いるデータや、色表現された画像データなどを記憶する。 <Functional configuration>
FIG. 24 is a block diagram showing an example of the functional configuration of the information processing apparatus 10 according to the second embodiment. The information processing device 10 shown in FIG. 24 includes a transmission unit 202, a reception unit 204, a processing control unit 300, and a storage unit 310. Since the transmitting unit 202 and the receiving unit 204 are the same as those in the first embodiment, they are designated by the same reference numerals and the description thereof will be omitted. The storage unit 310 stores various data such as data used for classification, data used for feature amount extraction, and data used for model construction. For example, the storage unit 310 stores data used for color expression, color-expressed image data, and the like.

処理制御部３００は、取得部２１０、抽出部２１２、分類部２１４、音特徴抽出部３０２、表現部３０４、及び推定部３０６を含む。なお、取得部２１０、抽出部２１２、分類部２１４は、第１実施形態と同様であるため、同じ符号を付し、その説明を省略する。 The processing control unit 300 includes an acquisition unit 210, an extraction unit 212, a classification unit 214, a sound feature extraction unit 302, an expression unit 304, and an estimation unit 306. Since the acquisition unit 210, the extraction unit 212, and the classification unit 214 are the same as those in the first embodiment, they are designated by the same reference numerals and the description thereof will be omitted.

音特徴抽出部３０２は、分類部２１４により分類された各カテゴリに基づく音データの第１特徴量を抽出する。例えば、音特徴抽出部３０２は、分類部２１４により改良AP法を用いて所定期間内の音データが特徴ごとに分類された各カテゴリに対し、各カテゴリの代表となる周波数（例えばカテゴリの重心の周波数）を用いて第１特徴量を抽出する。 The sound feature extraction unit 302 extracts the first feature amount of the sound data based on the categories classified by the classification unit 214. For example, for the categories into which the classification unit 214 has classified the sound data within a predetermined period by feature using the improved AP method, the sound feature extraction unit 302 extracts the first feature amount using the representative frequency of each category (for example, the frequency at the center of gravity of the category).

より具体的な例として、音特徴抽出部３０２は、代表周波数を大、中、小の３つに分類し、ヒストグラムを生成する。次に、音特徴抽出部３０２は、ヒストグラム内の度数に応じて、大、中、小ごとに０又は１の値を与える。音特徴抽出部３０２は、（大、中、小）の（０，０，１）から（１，１，１）までの７つのスコアを第１特徴量とする。 As a more specific example, the sound feature extraction unit 302 classifies the representative frequencies into three groups, large, medium, and small, and generates a histogram. Next, the sound feature extraction unit 302 gives a value of 0 or 1 to each of large, medium, and small according to its count in the histogram. The sound feature extraction unit 302 uses, as the first feature amount, the seven scores corresponding to the (large, medium, small) patterns from (0,0,1) to (1,1,1).

表現部３０４は、音データから、第１特徴量を、第１色により表現する。第１色は、例えばＧ（緑）とする。これにより、音データの特徴を色で表現することができ、音という視認しにくいものを視認しやすくすることができる。 The expression unit 304 expresses the first feature amount by the first color from the sound data. The first color is, for example, G (green). As a result, the characteristics of the sound data can be expressed by colors, and it is possible to make it easier to visually recognize a sound that is difficult to see.

また、音特徴抽出部３０２は、音データから第２特徴量及び第３特徴量を抽出してもよい。例えば、音特徴抽出部３０２は、音データの大小を示す第２特徴量と、音データの継続性を示す第３特徴量を求めてもよい。音特徴抽出部３０２は、取得された音データの大小を示す第２特徴量と、音データの継続性を示す第３特徴量とを生成する。 Further, the sound feature extraction unit 302 may extract the second feature amount and the third feature amount from the sound data. For example, the sound feature extraction unit 302 may obtain a second feature amount indicating the magnitude of the sound data and a third feature amount indicating the continuity of the sound data. The sound feature extraction unit 302 generates a second feature amount indicating the magnitude of the acquired sound data and a third feature amount indicating the continuity of the sound data.

第２特徴量として、音特徴抽出部３０２は、例えば、公知のラウドネス（参考 ITU-R BS.17701-1勧告http://www.itu.int/rec/R-REC-BS.1770/）の計算によって求められる音量（単位：ＬＫＦＳ）を基準とし、音の強弱を７段落のスコアで表す。なお、音特徴抽出部３０２は、このスコアが大きくなるほど、音量が大きくなるようにスコアを設定する。また、音特徴抽出部３０２は、単純に音データの波形のエンベロープを、所定の時間単位で求めて、その平均値を音の大きさにしてもよい。また、音特徴抽出部３０２は、パワースペクトルを周波数軸方向に足して、時間軸方向の平均値を用いて音の大きさの代わりとしてもよい。 As the second feature amount, the sound feature extraction unit 302 expresses the strength of the sound as a 7-level score based on, for example, the volume (unit: LKFS) obtained by the known loudness calculation (reference: ITU-R BS.17701-1 recommendation, http://www.itu.int/rec/R-REC-BS.1770/). The sound feature extraction unit 302 sets the score so that a larger score corresponds to a louder volume. Alternatively, the sound feature extraction unit 302 may simply obtain the envelope of the sound data waveform in predetermined time units and use its average value as the loudness. The sound feature extraction unit 302 may also sum the power spectrum along the frequency axis and use the average along the time axis as a substitute for the loudness.

第３特徴量として、音特徴抽出部３０２は、例えば、音データのパワースペクトルにおけるパワー最大の時刻を基準にして相互情報量からヒストグラムを求める（参考角田拓也，中川匡弘，"相互情報量と相互相関関数を用いた筋電と筋音」の時間差の推定，信学技法，MBE2015-60，pp.37-42，2015.）。また、音特徴抽出部３０２は、求めたヒストグラムの尖度（Kurtosis）を計算する。このとき、音特徴抽出部３０２は、尖度の値が大きいほど、突発的な音を表し、尖度の値が小さくなるほど、変動が少なく持続性が高い音を表すことになる。よって、音特徴抽出部３０２は、尖度値に応じた度合を音の継続度とする。例えば、音の継続度は、７段階のスコアで表される。 As the third feature amount, the sound feature extraction unit 302 obtains, for example, a histogram from the mutual information with reference to the time of maximum power in the power spectrum of the sound data (reference: Takuya Tsunoda and Masahiro Nakagawa, "Estimation of the time difference between myoelectric and mechanomyographic signals using mutual information and cross-correlation functions," IEICE Technical Report, MBE2015-60, pp. 37-42, 2015). The sound feature extraction unit 302 then calculates the kurtosis of the obtained histogram. A larger kurtosis value indicates a more sudden sound, and a smaller kurtosis value indicates a sound with less fluctuation and higher persistence. The sound feature extraction unit 302 therefore uses the degree corresponding to the kurtosis value as the continuity of the sound. For example, the continuity of the sound is expressed as a 7-level score.

上述したように、３つの特徴量が抽出される場合、表現部３０４は、第２特徴量を、第２色により表現し、かつ、第３特徴量を、第３色により表現する。第２色は、Ｒ（赤）、第３色は、Ｂ（青）として表す。例えば、それぞれの色は、０〜２５５の値で表される。これにより、環境音などの音に対して色を用いて表現することができる。なお、色の組み合わせは、光の三原色を用いたが、これに限られない。 As described above, when the three feature amounts are extracted, the expression unit 304 expresses the second feature amount by the second color and the third feature amount by the third color. The second color is represented as R (red), and the third color is represented as B (blue). For example, each color is represented by a value between 0 and 255. This makes it possible to express sounds such as environmental sounds using colors. The color combination used is not limited to the three primary colors of light.

推定部３０６は、音データに対して、所定領域内の音源位置を推定する。例えば、推定部３０６は、所定領域をブロック分割し、分割ごとに周波数選択を行って、その周波数を用いるＭＵＳＩＣ法（参考 R. O. Schmidt, "Multiple Emitter Location and Signal Parameter Estimation,"IEEE Trans. on Antennas and Propagation, Vol. 34, No. 3, pp. 276-280, 1986.）により、音源位置を推定することができる。また、推定部３０６は、主に音の到来方向が推定できる方法として、ビームフォーマ法、線形予測法や最小分散法などを用いてもよい（参考大賀寿郎、山崎芳男、金田豊著、音響システムとディジタル処理、電子情報通信学会出版）。 The estimation unit 306 estimates the sound source position within a predetermined region for the sound data. For example, the estimation unit 306 can divide the predetermined region into blocks, select a frequency for each block, and estimate the sound source position by the MUSIC method using that frequency (reference: R. O. Schmidt, "Multiple Emitter Location and Signal Parameter Estimation," IEEE Trans. on Antennas and Propagation, Vol. 34, No. 3, pp. 276-280, 1986). The estimation unit 306 may also use a beamformer method, a linear prediction method, a minimum variance method, or the like as a method that mainly estimates the direction of arrival of sound (reference: Juro Oga, Yoshio Yamazaki, and Yutaka Kaneda, "Acoustic Systems and Digital Processing," published by the Institute of Electronics, Information and Communication Engineers).

このとき、表現部３０４は、推定部３０６により推定された音源位置の音データに対して、第１特徴量を第１色で表現し、第２特徴量を第２色で表現し、第３特徴量を第３色で表現する。これにより、所定領域内の音を音源位置ごとにＲＧＢで表すことができ、色を用いて音模様を視覚的に表現することができる。 At this time, the expression unit 304 expresses the first feature amount in the first color and the second feature amount in the second color with respect to the sound data of the sound source position estimated by the estimation unit 306. The feature amount is expressed in the third color. As a result, the sound in the predetermined region can be expressed in RGB for each sound source position, and the sound pattern can be visually expressed using colors.

＜音特徴量の抽出＞
次に、音特徴量の抽出について説明する。まず、音特徴抽出部３０２は、第１音特徴量を求めるため、音データのパワースペクトルを求める。ここで、音特徴抽出部３０２は、所定期間内において、時間と共に変化しうるパワースペクトルを１つの画像として捉える。 <Extraction of sound features>
Next, the extraction of sound features will be described. First, the sound feature extraction unit 302 obtains the power spectrum of the sound data in order to obtain the first sound feature amount. Here, the sound feature extraction unit 302 captures a power spectrum that can change with time within a predetermined period as one image.

図２５は、パワースペクトルの一例を示す図である。図２５に示す例では、所定値以上のパワースペクトルが表示されている。所定値には、音が発生していることを示すための適切な閾値が設定される。音特徴抽出部３０２は、所定値以上のパワースペクトルが表現された図に対し、エッジ検出を行う。 FIG. 25 is a diagram showing an example of a power spectrum. In the example shown in FIG. 25, a power spectrum having a predetermined value or more is displayed. An appropriate threshold value for indicating that sound is being generated is set in the predetermined value. The sound feature extraction unit 302 performs edge detection on a diagram in which a power spectrum of a predetermined value or more is expressed.

図２６は、エッジ検出されたパワースペクトルの一例を示す図である。図２６に示す例では、エッジ検出で囲まれた領域のそれぞれの重心において、分類部２１４により、距離が近いものが１つのカテゴリ内に改良ＡＰ法により分類される。 FIG. 26 is a diagram showing an example of an edge-detected power spectrum. In the example shown in FIG. 26, in each center of gravity of the region surrounded by the edge detection, those having a short distance are classified into one category by the improved AP method by the classification unit 214.

ここで、改良ＡＰ法を用いる理由としては、分類が詳細になり、周波数の分布把握がより精度よくなるからである。 Here, the reason for using the improved AP method is that the classification becomes more detailed and the frequency distribution can be grasped more accurately.

図２７は、分類された各カテゴリの一例を示す図である。図２７に示す例では、カテゴリＣＴ１〜７までの７つのカテゴリに分類されている。音特徴抽出部３０２は、例えば、各カテゴリ内の重心の周波数（代表周波数）を大（４ｋＨｚ以上）、中（１ｋＨｚ以上４ｋＨｚ未満）、小（１ｋＨｚ未満）の各領域に分類する。各領域の閾値は一例であって、これに限定されない。 FIG. 27 is a diagram showing an example of each of the classified categories. In the example shown in FIG. 27, the categories are classified into seven categories CT1 to CT7. For example, the sound feature extraction unit 302 classifies the frequency (representative frequency) of the center of gravity in each category into large (4 kHz or more), medium (1 kHz or more and less than 4 kHz), and small (less than 1 kHz) regions. The threshold value of each region is an example and is not limited to this.

次に、音特徴抽出部３０２は、それぞれの周波数領域のカテゴリを、時間軸上において、いくつあるかをカウントする。例えば、大の領域において横軸のカテゴリの数をカウントすると２などである。音特徴抽出部３０２は、各領域のカウント数の最大値を用いて、カウント値を正規化する。例えば、音特徴抽出部３０２は、各カウント値を最大値で除算する。このとき、或る閾値（例えば０．４）以上の値は１に設定され、閾値未満の値は０に設定される。例えば、（大、中、小）が（１、０．２、０．５）の場合、（１、０、１）となる。 Next, the sound feature extraction unit 302 counts how many categories each frequency region contains along the time axis. For example, counting the categories along the horizontal axis in the large region gives 2. The sound feature extraction unit 302 normalizes the count values using the maximum count among the regions; for example, it divides each count value by the maximum value. A normalized value at or above a certain threshold (for example, 0.4) is then set to 1, and a value below the threshold is set to 0. For example, when (large, medium, small) is (1, 0.2, 0.5), the result is (1, 0, 1).
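
The band assignment, normalization, and thresholding described above can be sketched as follows. The band limits (1 kHz and 4 kHz) and the threshold 0.4 are the example values given in the text; the function names are hypothetical, and each category is simplified to a single representative frequency.

```python
def band_of(freq_hz):
    """Assign a representative frequency to the large, medium, or small region."""
    if freq_hz >= 4000:
        return "large"
    if freq_hz >= 1000:
        return "medium"
    return "small"


def band_pattern(representative_freqs_hz, threshold=0.4):
    """Return the (large, medium, small) 0/1 pattern for a list of category frequencies."""
    counts = {"large": 0, "medium": 0, "small": 0}
    for freq in representative_freqs_hz:
        counts[band_of(freq)] += 1
    peak = max(counts.values())
    if peak == 0:
        return (0, 0, 0)  # no categories at all
    # Normalize by the largest count, then binarize with the threshold.
    return tuple(1 if counts[band] / peak >= threshold else 0
                 for band in ("large", "medium", "small"))


# Hypothetical categories: 10 in the large, 2 in the medium, 5 in the small region,
# which normalizes to (1, 0.2, 0.5) as in the example above.
pattern = band_pattern([5000] * 10 + [2000] * 2 + [500] * 5)
```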

また、音特徴抽出部３０２は、（０，０，１）〜（１，１，１）からの各パターンについてスコア（第１特徴量）を付ける。例えば、音特徴抽出について、以下の処理が行われる。
パターン：スコア値
（０，０，１）：１
（０，１，０）：２
（０，１，１）：３
（１，０，０）：４
（１，０，１）：５
（１，１，０）：６
（１，１，１）：７
このように、音特徴抽出部３０２は、第１特徴量として、１〜７の数字を用いる。 Further, the sound feature extraction unit 302 assigns a score (first feature amount) to each pattern from (0,0,1) to (1,1,1). For example, the following processing is performed for sound feature extraction.
Pattern: Score value
(0,0,1): 1
(0,1,0): 2
(0,1,1): 3
(1,0,0): 4
(1,0,1): 5
(1,1,0): 6
(1,1,1): 7
As described above, the sound feature extraction unit 302 uses the numbers 1 to 7 as the first feature amount.
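
The score table above coincides with reading the (large, medium, small) pattern as a three-bit binary number, so it can be computed rather than looked up. This observation and the function name are not stated in the specification; the sketch is given only as a compact equivalent of the table.

```python
def pattern_score(pattern):
    """Map a (large, medium, small) 0/1 pattern to the first feature amount (1 to 7)."""
    large, medium, small = pattern
    # Weighted sum: large is the most significant bit, small the least.
    return 4 * large + 2 * medium + small


first_feature = pattern_score((1, 0, 1))
```

The all-zero pattern (0,0,0) would give 0, which lies outside the table; how that case is handled is not specified.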

図２８は、音の大小度の計算を説明するための図である。図２８に示す例では、横軸が時間で、縦軸が音の大きさである。音の大きさはラウドネスを用いるが、音の大きさが計算できれば、いずれの指標が用いられてもよい。 FIG. 28 is a diagram for explaining the calculation of loudness of sound. In the example shown in FIG. 28, the horizontal axis is time and the vertical axis is loudness. Loudness is used for the loudness, but any index may be used as long as the loudness can be calculated.

音特徴抽出部３０２は、音のレベルによって、例えば７段落のスコアを付ける。音特徴抽出部３０２は、平常時の音のレベル（例えば基準レベル）を１として、音環境の音レベルに合わせて段階的に閾値を設定し、７段階のスコア（第２特徴量）を付けるようにする。基準レベルや閾値は、適宜設定変更可能である。 The sound feature extraction unit 302 assigns, for example, a 7-level score according to the sound level. Taking the normal sound level (for example, a reference level) as 1, the sound feature extraction unit 302 sets thresholds stepwise according to the sound levels of the sound environment and assigns a 7-level score (second feature amount). The reference level and the thresholds can be changed as appropriate.

FIG. 29 is a diagram for explaining the calculation of the continuity of a sound. In the example shown in FIG. 29, the continuity is obtained using kurtosis. First, the sound feature extraction unit 302 calculates, using mutual information, a correlation value with respect to the maximum value of the power spectrum (the portion circled in FIG. 29).

Next, the sound feature extraction unit 302 generates a histogram of the obtained values using the hist function of MATLAB. The sound feature extraction unit 302 then calculates the kurtosis of the created histogram in order to characterize its shape.
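The text does not spell out how the kurtosis of the histogram is computed, so the following is only one possible reading, written in plain Python rather than MATLAB. It assumes 10 equal-width bins (the default of MATLAB's `hist`), treats the histogram as a distribution over its bin centers, and returns the non-excess (Pearson) kurtosis, for which a normal distribution gives 3. The function name `histogram_kurtosis` is hypothetical.

```python
def histogram_kurtosis(values, bins=10):
    """Kurtosis of the binned distribution of `values` (one possible reading)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("kurtosis is undefined when all values are equal")
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # The maximum value is placed in the last bin.
        counts[min(int((v - lo) / width), bins - 1)] += 1
    centers = [lo + (i + 0.5) * width for i in range(bins)]
    n = len(values)
    mean = sum(c * x for c, x in zip(counts, centers)) / n
    m2 = sum(c * (x - mean) ** 2 for c, x in zip(counts, centers)) / n
    m4 = sum(c * (x - mean) ** 4 for c, x in zip(counts, centers)) / n
    # Fourth central moment divided by the squared variance.
    return m4 / m2 ** 2
```

A flat histogram gives a kurtosis below 3 under this definition, while a sharply peaked one gives a larger value.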

Finally, the sound feature extraction unit 302 assigns one of seven scores according to the calculated kurtosis value, as follows.
Condition: Score value
Kurtosis ≧ 4.5: 1
4.5 > Kurtosis ≧ 3.5: 2
3.5 > Kurtosis ≧ 2.5: 3
2.5 > Kurtosis ≧ 1.5: 4
1.5 > Kurtosis ≧ 1.0: 5
1.0 > Kurtosis ≧ 0.5: 6
0.5 > Kurtosis: 7
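The condition table above translates directly into a lookup. In this sketch the function name `kurtosis_score` is hypothetical, while the boundaries are those listed in the table.

```python
def kurtosis_score(kurtosis):
    """Map a kurtosis value to the third feature amount (1 to 7)."""
    # Lower bounds for scores 1 to 6, taken from the table in the text.
    lower_bounds = (4.5, 3.5, 2.5, 1.5, 1.0, 0.5)
    for score, bound in enumerate(lower_bounds, start=1):
        if kurtosis >= bound:
            return score
    return 7  # kurtosis below 0.5


print(kurtosis_score(3.0))  # 3
```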

The expression unit 304 converts the seven-level score of each feature amount to a value from 0 to 255 and expresses it as a color. For example, when (first feature amount : second feature amount : third feature amount) is (5 : 2 : 1), the color (G : R : B) can be expressed as (255 × 1 : 255 × (2/5) : 255 × (1/5)).
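The worked example above scales each score by the largest of the three scores, which the sketch below follows. Scaling by the largest score is inferred from that single example, rounding to the nearest integer is an assumption, and the function name `scores_to_grb` is hypothetical. Note that the output order is (G, R, B), as in the text.

```python
def scores_to_grb(first, second, third):
    """Convert three seven-level scores to (G, R, B) values in 0..255."""
    scores = (first, second, third)
    if any(not 1 <= s <= 7 for s in scores):
        raise ValueError("each score must be between 1 and 7")
    peak = max(scores)
    # The largest score maps to 255; the others are scaled proportionally.
    return tuple(round(255 * s / peak) for s in scores)


print(scores_to_grb(5, 2, 1))  # (255, 102, 51)
```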

<Sound pattern experiment>
Here, an experiment on sound patterns conducted by the inventors will be described. FIG. 30 is a diagram showing the predetermined region in which the sound pattern is expressed at the experimental site. As shown in FIG. 30, the experimental site is the same as the one shown in FIG. 4, and the predetermined region is the region AR10 in the vicinity of the microphone array 20. Each sound data item is at most 2 seconds long and is an environmental sound with a certain amplitude. The region AR10 is divided into block regions of 5 m × 5 m, and an RGB value is obtained for each block region.

FIG. 31 is a diagram showing a sound pattern in normal times. FIG. 32 is a diagram showing a sound pattern during an event. As shown in FIGS. 31 and 32, when various sounds are occurring (during an event), the sound pattern is colorful (large differences in shade), whereas in normal times, when the sound changes little, the sound pattern consists mainly of blue and yellow-green (small differences in shade). That is, the sound pattern can detect changes in the state of the sound environment. It can therefore be said that the sound pattern using the three feature amounts described above is an effective monitoring method that is useful for detecting and grasping the state of the sound environment.

In the example described above, each of the three feature amounts is expressed as a seven-level score, and the values are converted to RGB values to draw the sound pattern. Seven levels are used because, when the pitch of the sound is calculated as the first feature amount, a histogram is created for three bands (high, medium, and low) and each histogram value is converted to 0 or 1 according to its size, so that the values for (high, medium, low) fall into seven cases from (0,0,1) to (1,1,1). Like the first feature amount, the other feature amounts also use a seven-level score according to their magnitude.

Here, the thresholds that divide the loudness value and the kurtosis value of the continuity into seven levels vary with the sound data acquired in the sound environment, so the thresholds may be determined case by case for each sound environment.

Further, by using sound patterns as training data and the dates on which the sound patterns were measured as teacher data, it is considered possible to build an identification model that supports understanding of the sound environment through sound patterns, for example, which day the current sound pattern resembles or whether it differs from the usual pattern. For the identification model, for example, a naive Bayes classifier or discriminant analysis (such as Fisher discriminant analysis; see C. M. Bishop, "Pattern Recognition and Machine Learning," Springer, 2006) can be used.

Further, by fusing data from other sensors, such as vibrometers or images, the sound pattern and identification model described above can be used to monitor the aging of a building (the process leading up to collapse).

Further, by storing sound patterns in a database in association with their dates, the system can search for sound patterns similar to a given sound pattern. The search for similar sound patterns may also be performed block by block.
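A date-keyed search for similar sound patterns might look like the sketch below. The text does not specify a similarity measure or a storage format, so the Euclidean distance over per-block RGB values, the dictionary used as the database, and the names `pattern_distance` and `most_similar_date` are all assumptions made for illustration.

```python
import math


def pattern_distance(a, b):
    """Euclidean distance between two sound patterns.

    Each pattern is a list of (R, G, B) tuples, one per block, in the same order.
    """
    if len(a) != len(b):
        raise ValueError("patterns must cover the same blocks")
    return math.sqrt(
        sum((x - y) ** 2 for pa, pb in zip(a, b) for x, y in zip(pa, pb))
    )


def most_similar_date(query, database):
    """Return the date key in `database` whose pattern is closest to `query`."""
    return min(database, key=lambda date: pattern_distance(query, database[date]))


# Hypothetical two-block patterns stored by date.
database = {
    "day-A": [(0, 0, 255), (50, 200, 50)],
    "day-B": [(255, 0, 0), (255, 255, 0)],
}
print(most_similar_date([(10, 0, 240), (60, 190, 60)], database))  # day-A
```

A block-by-block search would apply the same distance to a single block's RGB value instead of the whole list.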

<Operation>
Next, the operation of the information processing system 1 in the second embodiment will be described. FIG. 33 is a flowchart showing an example of the sound pattern expression processing in the second embodiment. In step S502 shown in FIG. 33, the acquisition unit 210 acquires sound data.

In step S504, the estimation unit 306 estimates, at predetermined intervals, the sound source position of the acquired sound data.

In step S506, the sound feature extraction unit 302 extracts each feature amount of the sound data corresponding to the estimated sound source position. To extract the first feature amount, the classification process of step S102 shown in FIG. 20 is executed.

In step S508, the expression unit 304 converts each extracted feature amount to a value indicating a color and expresses the sound in color. As described above, this allows the sound environment to be expressed visually in color as a sound pattern and grasped easily.

Note that the processing steps included in the processing flows of the first embodiment and the second embodiment described above may be executed in a different order or in parallel, as long as no contradiction arises in the processing contents, and other steps may be added between the processing steps. A step described as one step for convenience may be executed as a plurality of steps, while steps described as a plurality of steps for convenience may be regarded as one step.

Although embodiments of the present invention have been described above, the present invention is not limited to those embodiments, and various modifications and changes can be made based on the technical idea of the present invention.

1 Information processing system
10 Information processing device
20 Acquisition device
102 Control unit
108 Storage unit
206 Processing control unit
210 Acquisition unit
212 Extraction unit
214 Classification unit
224 Search unit
226 Assigning unit
228 Analysis unit
234 Visualization unit

Claims

An information processing device comprising:
an acquisition unit that acquires a plurality of sound data representing environmental sounds;
an extraction unit that extracts a feature amount of each sound data;
a classification unit that classifies the plurality of sound data into categories using the extracted feature amounts;
an analysis unit that analyzes the sound data based on the categories classified by the classification unit, using identification information of the sound data and the direction of the sound data and/or the generation time of the sound data; and
a visualization unit that visualizes the analysis result obtained by the analysis unit,
wherein the classification unit generates final clusters and classifies the sound data into the categories by repeatedly updating a first evaluation value and a second evaluation value based on a similarity matrix between the sound data generated using the extracted feature amounts, and sets the autocorrelation of the similarity matrix using the similarity with respect to outlier data in a cluster generated in the process of calculating the first evaluation value and the second evaluation value,
the first evaluation value is a value indicating the appropriateness of candidate data for a cluster center serving as the cluster center for member data belonging to the cluster, and
the second evaluation value is a value indicating the appropriateness of the member data belonging to the cluster of the candidate data.

The information processing device according to claim 1, further comprising:
a sound feature extraction unit that extracts a first feature amount, a second feature amount, and a third feature amount of the sound data based on each category; and
an expression unit that expresses the first feature amount in a first color, the second feature amount in a second color, and the third feature amount in a third color.

The information processing device according to claim 2, further comprising an estimation unit that estimates, for the sound data, a sound source position within a predetermined region,
wherein the expression unit expresses the sound data at the sound source position in the first color, the second color, and the third color.

The information processing device according to claim 2, wherein the sound feature extraction unit extracts the first feature amount indicating the pitch of the sound data based on the center frequency of the category, the second feature amount indicating the loudness of the sound data, and the third feature amount indicating the continuity of the sound data.

The information processing device according to claim 1, wherein the classification unit selects a similarity in a column vector based on the standard deviation of the maximum value in each column vector of the similarity matrix, and sets the selected similarity as the initial value of the autocorrelation of the similarity matrix.

The information processing device according to any one of claims 1 to 5, wherein the classification unit generates the final clusters by repeatedly updating the first evaluation value, the second evaluation value, and the autocorrelation value.

The information processing device according to any one of claims 1 to 6, wherein the classification unit comprises:
a generation unit that generates the similarity matrix using the feature amounts;
a calculation unit that calculates the first evaluation value and the second evaluation value;
an update unit that obtains the outlier data in a cluster generated in the process of calculating the first evaluation value and the second evaluation value, and updates the autocorrelation value of the outlier data to another value; and
a determination unit that determines whether a value based on the first evaluation value and the second evaluation value newly calculated by the calculation unit using the updated value has converged.

The information processing device according to claim 7, wherein the update unit sets, as the other value, a value based on the similarities in the column vector of the similarity matrix for the outlier data.

The information processing device according to claim 8, wherein the update unit sorts the similarities in the column vector in a predetermined order and sets a predetermined one of the similarities as the other value.

The information processing device according to any one of claims 1 to 9, wherein the sound data is sound data collected by a microphone, and
the feature amount is a feature amount relating to a power spectrum in the frequency domain.

The information processing device according to claim 10, wherein the feature amount includes at least one of: a basis vector obtained by applying non-negative matrix factorization to the power spectrum; a variance matrix for the same dimensions of the power spectrum; a vector obtained by performing mel filter bank analysis on the power spectrum; and a first principal component vector of the power spectrum reconstructed using a coefficient vector obtained by the non-negative matrix factorization and the basis vector.

The information processing device according to claim 11, wherein the classification unit calculates the similarities included in the similarity matrix by weighting the similarity of each vector included in the feature amount or of the variance matrix.

The information processing device according to any one of claims 1 to 12, further comprising:
a storage unit that stores sound data of different types, each in association with identification information indicating the type;
a search unit that searches the sound data stored in the storage unit for sound data that matches the sound data at the center of a cluster generated by the classification unit; and
an assigning unit that assigns, to the sound data at the center of the cluster, a label including the identification information associated with the retrieved sound data.

The information processing device according to claim 13, wherein the analysis unit analyzes the plurality of sound data using the identification information included in the label.

The information processing device according to claim 14, wherein the sound data acquired by the acquisition unit is sound data acquired by a microphone array, and
the label further includes the direction of the sound data and/or the generation time of the sound data.

The information processing device according to claim 15, wherein the analysis unit calculates the ratio of each identification information in the plurality of sound data and uses the ratio as the analysis result.

The information processing device according to claim 16, wherein the visualization unit graphs the analysis result using the direction of the sound data and/or the generation time of the sound data.

The information processing device according to claim 15, wherein the analysis unit comprises:
a learning unit that performs ensemble learning using the labeled sound data and the feature amounts; and
a construction unit that constructs a model relating to the sound data through the learning by the learning unit.

The information processing device according to claim 18, wherein the visualization unit processes the analysis result obtained using the model so that it is visualized on a display unit.

An information processing system comprising:
an acquisition unit that acquires a plurality of sound data representing environmental sounds;
an extraction unit that extracts a feature amount of each sound data;
a classification unit that classifies the plurality of sound data into categories using the extracted feature amounts;
an analysis unit that analyzes the sound data based on the categories classified by the classification unit, using identification information of the sound data and the direction of the sound data and/or the generation time of the sound data; and
a visualization unit that visualizes the analysis result obtained by the analysis unit,
wherein the classification unit generates final clusters and classifies the sound data into the categories by repeatedly updating a first evaluation value and a second evaluation value based on a similarity matrix between the sound data generated using the extracted feature amounts, and sets the autocorrelation of the similarity matrix using the similarity with respect to outlier data in a cluster generated in the process of calculating the first evaluation value and the second evaluation value,
the first evaluation value is a value indicating the appropriateness of candidate data for a cluster center serving as the cluster center for member data belonging to the cluster, and
the second evaluation value is a value indicating the appropriateness of the member data belonging to the cluster of the candidate data.

An information processing method in which a computer executes:
acquiring a plurality of sound data representing environmental sounds;
extracting a feature amount of each sound data;
classifying the plurality of sound data into categories using the extracted feature amounts;
analyzing the sound data based on the classified categories, using identification information of the sound data and the direction of the sound data and/or the generation time of the sound data; and
visualizing the analysis result,
wherein the classifying includes generating final clusters and classifying the sound data into the categories by repeatedly updating a first evaluation value and a second evaluation value based on a similarity matrix between the sound data generated using the extracted feature amounts, and setting the autocorrelation of the similarity matrix using the similarity with respect to outlier data in a cluster generated in the process of calculating the first evaluation value and the second evaluation value,
the first evaluation value is a value indicating the appropriateness of candidate data for a cluster center serving as the cluster center for member data belonging to the cluster, and
the second evaluation value is a value indicating the appropriateness of the member data belonging to the cluster of the candidate data.

A program that causes a computer to execute:
acquiring a plurality of sound data representing environmental sounds;
extracting a feature amount of each sound data;
classifying the plurality of sound data into categories using the extracted feature amounts;
analyzing the sound data based on the classified categories, using identification information of the sound data and the direction of the sound data and/or the generation time of the sound data; and
visualizing the analysis result,
wherein the classifying includes generating final clusters and classifying the sound data into the categories by repeatedly updating a first evaluation value and a second evaluation value based on a similarity matrix between the sound data generated using the extracted feature amounts, and setting the autocorrelation of the similarity matrix using the similarity with respect to outlier data in a cluster generated in the process of calculating the first evaluation value and the second evaluation value,
the first evaluation value is a value indicating the appropriateness of candidate data for a cluster center serving as the cluster center for member data belonging to the cluster, and
the second evaluation value is a value indicating the appropriateness of the member data belonging to the cluster of the candidate data.

A computer-readable recording medium on which is recorded a program that causes a computer to execute:
acquiring a plurality of sound data representing environmental sounds;
extracting a feature amount of each sound data;
classifying the plurality of sound data into categories using the extracted feature amounts;
analyzing the sound data based on the classified categories, using identification information of the sound data and the direction of the sound data and/or the generation time of the sound data; and
visualizing the analysis result,
wherein the classifying includes generating final clusters and classifying the sound data into the categories by repeatedly updating a first evaluation value and a second evaluation value based on a similarity matrix between the sound data generated using the extracted feature amounts, and setting the autocorrelation of the similarity matrix using the similarity with respect to outlier data in a cluster generated in the process of calculating the first evaluation value and the second evaluation value,
the first evaluation value is a value indicating the appropriateness of candidate data for a cluster center serving as the cluster center for member data belonging to the cluster, and
the second evaluation value is a value indicating the appropriateness of the member data belonging to the cluster of the candidate data.