JP2014048522A - Situation generation model creation apparatus and situation estimation apparatus

Info

Publication number: JP2014048522A
Application number: JP2012192225A
Authority: JP
Inventors: Keisuke Imoto; 桂右井本; Suehiro Shimauchi; 末廣島内; Naka Omuro; 仲大室; Yoichi Haneda; 陽一羽田
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2012-08-31
Filing date: 2012-08-31
Publication date: 2014-03-17
Anticipated expiration: 2032-08-31
Also published as: JP5818759B2

Abstract

PROBLEM TO BE SOLVED: To estimate new data accurately without overfitting the generation model to the data used for model calculation. SOLUTION: Learning information, which includes a set of time-series acoustic signal sequences and acoustic event information representing the acoustic events corresponding to those sequences, is used to obtain the probability P(acoustic event | situation) that a latent situation of the scene, defined by acoustic events, generates an acoustic event, and the probability P(situation | acoustic signal sequence) that an acoustic signal sequence generates a situation. A generation model is created from these probabilities.

Description

The present invention relates to a technique for creating a generation model of the situation of a scene using acoustic event information, and to a technique for estimating a situation using the created generation model.

In the prior art disclosed in Non-Patent Document 1, the input is an acoustic signal sequence with acoustic event labels, that is, an acoustic signal representing a situation in which each short-time segment (about 20 msec to 100 msec) carries a label indicating what sound it is (footsteps, the sound of running water; hereinafter, an "acoustic event"). A histogram of acoustic event labels is created from the labels of a finite number of consecutive frames. A situation model is then generated by applying a modeling technique such as a GMM (Gaussian Mixture Model), HMM (Hidden Markov Model), or SVM (Support Vector Machine) to the generated histograms.

Further, each situation model is compared with the histogram of acoustic events calculated from a newly input labeled acoustic signal sequence (for example, using the Euclidean distance or the cosine distance), and the situation model that best satisfies the decision criterion is determined to represent the situation corresponding to that acoustic signal sequence. In this way, the prior art can estimate a situation from an acoustic signal sequence.
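The prior-art comparison described above can be sketched as follows. This is a minimal illustration, assuming non-empty label lists and situation models stored directly as reference histograms; the function names, the label vocabulary, and the example values are hypothetical and do not come from Non-Patent Document 1.

```python
from collections import Counter
import math

def label_histogram(labels, vocabulary):
    """Relative frequency of each acoustic event label over a run of frames."""
    counts = Counter(labels)
    total = len(labels)  # assumed to be non-zero
    return [counts[v] / total for v in vocabulary]

def cosine_distance(p, q):
    """1 - cosine similarity of two histograms (both assumed non-zero)."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return 1.0 - dot / norm

def classify(labels, situation_models, vocabulary):
    """Return the situation whose stored histogram is closest to the input."""
    h = label_histogram(labels, vocabulary)
    return min(situation_models,
               key=lambda name: cosine_distance(h, situation_models[name]))

# Hypothetical vocabulary and reference histograms, for illustration only.
vocabulary = ["knife", "water", "footsteps", "vacuum"]
situation_models = {
    "cooking": [0.5, 0.4, 0.1, 0.0],
    "cleaning": [0.0, 0.1, 0.3, 0.6],
}
observed = ["knife", "water", "water", "knife", "footsteps"]
result = classify(observed, situation_models, vocabulary)
```

Because the reference histograms are fixed frequency vectors, an input whose label frequencies differ from them is simply scored as distant, which is the behavior the following paragraphs identify as the weakness of this approach.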

Non-Patent Document 1: Imoto et al., "User Action Identification Based on Frequency of Multiple Living Sounds and its Application to Communication", The 32nd VMA Research Meeting of the Institute of Image Electronics Engineers of Japan

The prior art directly models the occurrence frequencies of the acoustic events present in the data used for model calculation. Such a method cannot appropriately model the occurrence frequencies of other acoustic events, and the resulting situation model overfits the data used for model calculation. Consequently, when a situation is identified with a situation model calculated by the prior art, input data that differs even slightly from the data used for model calculation is judged to be a situation with very low similarity. However, the data used to calculate the situation model and the data input at estimation time are rarely almost identical. The use of prior-art situation models therefore degrades situation estimation accuracy.

The present invention provides a technique that enables accurate situation estimation for new data without the generated model overfitting the data used for model calculation.

In the present invention, learning information that includes a set of time-series acoustic signal sequences and acoustic event information representing the acoustic events corresponding to the acoustic signal sequences is used to obtain the probability P(acoustic event | situation) that a latent situation of the scene, defined by acoustic events, generates an acoustic event, and the probability P(situation | acoustic signal sequence) that an acoustic signal sequence generates a situation. A generation model is created from these probabilities.

The present invention does not directly model the occurrence frequencies of acoustic events. Instead, it treats the generative process of the generation model probabilistically through the probability P(acoustic event | situation) and the probability P(situation | acoustic signal sequence). This suppresses overfitting of the generation model to the data used for its calculation and enables accurate estimation for new data.

FIG. 1 is a block diagram of the situation generation model creation apparatus of the first embodiment.
FIG. 2 is a diagram illustrating an acoustic signal sequence with acoustic event labels.
FIG. 3 is a block diagram of the situation generation model creation apparatus of Modification 1 of the first embodiment.
FIG. 4 is a block diagram of the situation generation model creation apparatus of Modification 2 of the first embodiment.
FIG. 5 is a block diagram of the situation estimation apparatus of the second embodiment.
FIG. 6 is a block diagram of the situation estimation apparatus of Modification 1 of the second embodiment.
FIG. 7 is a block diagram of the situation estimation apparatus of Modification 2 of the second embodiment.
FIG. 8 is a block diagram of the situation estimation apparatus of the third embodiment.
FIG. 9 is a block diagram of the situation estimation apparatus of Modification 1 of the third and fourth embodiments.
FIG. 10 is a block diagram of the situation estimation apparatus of Modification 2 of the third and fourth embodiments.
FIG. 11 is a block diagram of the situation estimation apparatus of the fourth embodiment.

Embodiments of the present invention are described below with reference to the drawings.
<Definition of terms>
The terms used in the embodiments are defined as follows.
An "acoustic event" means a sound event. Specific examples of acoustic events include "knife sound", "sound of running water", "water sound", "ignition sound", "fire sound", "footsteps", and "vacuum cleaner exhaust sound".
A "situation" means a latent situation of the scene defined by acoustic events. The generation probability of a situation is defined by the action in the time interval in which the situation occurs, and the situation defines the generation probabilities of acoustic events in that time interval. That is, a situation can be expressed by an action and the generation probability defined by that action. Likewise, an acoustic event can be expressed by a situation and the generation probability defined by that situation. An "action" means some action performed by an agent such as a human, an animal, or a device. Specific examples of actions are "cooking" and "cleaning".
"The probability that X generates Y" means the probability that event Y occurs under the condition that event X occurs. It can also be expressed as "the conditional probability of Y given X" or "the conditional probability of Y under X".

<First Embodiment>
In the first embodiment, acoustic signals with acoustic event labels are input as learning information, and learning calculates two models: an acoustic signal-situation generation model, in which P(situation | acoustic signal sequence) is the probability that an acoustic signal sequence generates a situation, and a situation-acoustic event generation model, in which P(acoustic event | situation) is the probability that a situation generates an acoustic event.

As illustrated in FIG. 1, the situation generation model creation apparatus 100 of this embodiment includes an acoustic signal sequence synthesis unit 101, a situation modeling unit 102, and a storage unit 103. The situation generation model creation apparatus 100 is configured, for example, by loading a predetermined program into a known or dedicated computer.

First, acoustic signal sequences with acoustic event labels 11-1, ..., 11-S (where S is an integer of 1 or more) are input to the acoustic signal sequence synthesis unit 101. As illustrated in FIG. 2, each acoustic signal sequence with acoustic event labels 11-s (where s ∈ {1, ..., S}) includes a time-series acoustic signal sequence 11a-s, the acoustic signal sequence number corresponding to that sequence 11a-s, element numbers corresponding to the elements of the acoustic signal sequence divided into short time intervals (several tens of msec to several sec), and acoustic event labels (corresponding to "acoustic event information") determined and assigned for each short time interval. Each acoustic signal sequence 11a-s is a digital signal sequence representing sound. An acoustic event label represents the acoustic event corresponding to an element of the acoustic signal sequence, and one label is assigned to each element. One or more element numbers correspond to each acoustic signal sequence number.

When a plurality of acoustic signal sequences with acoustic event labels 11-1, ..., 11-S (hereinafter simply "labeled acoustic signal sequences 11-1, ..., 11-S") are input to the acoustic signal sequence synthesis unit 101, the unit joins them in the time-series direction to obtain and output a single acoustic signal sequence with acoustic event labels 11 (hereinafter simply "labeled acoustic signal sequence 11"). This is the synthesis processing. When only one sequence 11-1 is input, the acoustic signal sequence synthesis unit 101 outputs it as the labeled acoustic signal sequence 11. The labeled acoustic signal sequence output from the acoustic signal sequence synthesis unit 101 is input to the situation modeling unit 102. A single labeled acoustic signal sequence 11 may also be input directly to the situation modeling unit 102 without passing through the acoustic signal sequence synthesis unit 101.
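The labeled sequence and the synthesis processing can be pictured with a small data structure. This is only a sketch: the class name `LabeledElement`, its fields, and the function `synthesize` are hypothetical names chosen for illustration, and the audio samples of each element are omitted.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class LabeledElement:
    sequence_number: int  # acoustic signal sequence number s
    element_number: int   # element number within sequence s
    event_label: str      # acoustic event label of this short-time element

def synthesize(sequences: List[List[LabeledElement]]) -> List[LabeledElement]:
    """Join labeled sequences in time order; a single input passes through unchanged."""
    merged: List[LabeledElement] = []
    for seq in sequences:
        merged.extend(seq)
    return merged

seq1 = [LabeledElement(1, 1, "knife"), LabeledElement(1, 2, "water")]
seq2 = [LabeledElement(2, 1, "vacuum")]
merged = synthesize([seq1, seq2])
```

Each element keeps its sequence number after joining, so the per-sequence counts needed later by the situation modeling unit remain recoverable from the joined sequence.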

The situation modeling unit 102 generates an acoustic signal-situation generation model 12 and a situation-acoustic event generation model 13 from the input labeled acoustic signal sequence 11 according to the following procedure, and stores them in the storage unit 103.

[Logical explanation of the process by which acoustic events are generated from situations]
The situation modeling unit 102 calculates, from the labeled acoustic signal sequence 11, the probability P(situation | acoustic signal sequence) that an acoustic signal sequence generates a situation and the probability P(acoustic event | situation) that a situation generates an acoustic event. It generates the acoustic signal-situation generation model 12 from P(situation | acoustic signal sequence) and the situation-acoustic event generation model 13 from P(acoustic event | situation). In other words, this embodiment assumes that an acoustic signal sequence defines the generation probabilities of the latent situations of the scene and that a situation defines the generation probabilities of acoustic events, and it describes these relationships as the respective generation models.

Given the generation probability Θ, the generation probability Φ, and the set Ω of acoustic signal sequences, the generation probability P(e | Θ, Φ, S) of the acoustic event sequence e is as follows.

where S is the number of acoustic signal sequences 11a-s (s ∈ {1, ..., S}) included in the labeled acoustic signal sequence 11; T is the number of situation types; E is the number of acoustic event types; e is the sequence (vector) of acoustic events given to the labeled acoustic signal sequence 11; Θ is the S×T matrix whose (s, t) element is the probability P(t | s) that acoustic signal sequence 11a-s generates situation t (t ∈ {1, ..., T}); Φ is the T×E matrix whose (t, ε) element is the probability P(ε | t) that situation t generates acoustic event ε (ε ∈ {1, ..., E}); Ω is the set of sequences that can be the acoustic signal sequences 11a-1, ..., 11a-S; and e'_s is the sequence of acoustic events given to acoustic signal sequence 11a-s (an N_s-dimensional vector, where N_s is the number of acoustic events corresponding to acoustic signal sequence 11a-s).
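The expression for the generation probability of e does not survive in this text. As a hedged reconstruction, and not a formula confirmed by the source, assume that the acoustic event sequences e'_1, ..., e'_S are generated independently of one another given Θ and Φ, as in a standard latent-topic model. The probability would then factor over the acoustic signal sequences. The conditioning on Ω below follows the "given" clause of the sentence that introduces the expression.

```latex
% Assumed form: independence of the S event sequences given \Theta and \Phi
P(\mathbf{e} \mid \Theta, \Phi, \Omega) = \prod_{s=1}^{S} P(\mathbf{e}'_s \mid \Theta, \Phi)
```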

Given the generation probability Θ and the generation probability Φ, the generation probability P(e'_s | Θ, Φ) of the acoustic event sequence e'_s is as follows.

where e_i is the acoustic event represented by the acoustic event label corresponding to element number i of acoustic signal sequence 11a-s; N_s is the number of elements of acoustic signal sequence 11a-s (the maximum value of the element number i); z_i is the situation corresponding to element number i of acoustic signal sequence 11a-s; φ_t is the E-dimensional vector whose ε-th element is the probability P(ε | t) that situation t generates acoustic event ε (ε ∈ {1, ..., E}); θ_s is the T-dimensional vector whose t-th element is the probability P(t | s) that acoustic signal sequence 11a-s generates situation t (t ∈ {1, ..., T}); φ_ei,t (the subscript "ei,t" stands for "e_i, t") is the probability P(e_i | t) that situation t generates acoustic event e_i; θ_ts is the probability P(t | s) that acoustic signal sequence 11a-s generates situation t; α is the hyperparameter that determines the properties of the Dirichlet distribution followed by θ_s and θ_ts (it takes a non-negative value such as 0.01); and β is the hyperparameter that determines the properties of the Dirichlet distribution followed by φ_t and φ_ei,t (it takes a non-negative value such as 0.01). Here, P(φ_t | β) and P(θ_s | α) are assumed to follow Dirichlet distributions with parameters β and α, respectively. The probability density function of the Dirichlet distribution of order W−1 (W is an integer of 2 or more) is as follows.

where Γ denotes the gamma function.
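Neither expression survives in this text. The block below is a hedged sketch, not the source's own formulas. The first line assumes the usual mixture over the latent situation z_i of each element, written with the symbols defined above. The second is the standard Dirichlet density; the variable names x_w and a_w are introduced here for illustration, and in this model every a_w would equal α (for θ_s, with W = T) or β (for φ_t, with W = E).

```latex
% Assumed form of P(e'_s | \Theta, \Phi): each element picks a situation t, then an event
P(\mathbf{e}'_s \mid \Theta, \Phi)
  = \prod_{i=1}^{N_s} \sum_{t=1}^{T} P(e_i \mid z_i = t, \phi_t)\, P(z_i = t \mid \theta_s)
  = \prod_{i=1}^{N_s} \sum_{t=1}^{T} \phi_{e_i, t}\, \theta_{ts},
\qquad \theta_s \sim \mathrm{Dir}(\alpha), \quad \phi_t \sim \mathrm{Dir}(\beta)

% Standard density of the Dirichlet distribution of order W-1
P(x_1, \dots, x_W \mid a_1, \dots, a_W)
  = \frac{\Gamma\!\left(\sum_{w=1}^{W} a_w\right)}{\prod_{w=1}^{W} \Gamma(a_w)}
    \prod_{w=1}^{W} x_w^{\,a_w - 1},
\qquad x_w \ge 0, \quad \sum_{w=1}^{W} x_w = 1
```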

[Description of the generation model calculation process]
The situation modeling unit 102 calculates the probability P(situation | acoustic signal sequence) that an acoustic signal sequence generates a situation and the probability P(acoustic event | situation) that a situation generates an acoustic event, and from them calculates the corresponding acoustic signal-situation generation model 12 and situation-acoustic event generation model 13.

The acoustic signal-situation generation model 12 and the situation-acoustic event generation model 13 can be calculated with techniques such as Markov chain Monte Carlo (MCMC) methods or variational Bayes (VB) methods. MCMC methods include the Metropolis-Hastings (M-H) algorithm and Gibbs sampling. The calculation of the generation models by Gibbs sampling is described here.

[Example of a generation model calculation method]
A method of calculating the acoustic signal-situation generation model 12 and the situation-acoustic event generation model 13 using Gibbs sampling is illustrated below.

(I) The situation modeling unit 102 determines the hyperparameters α and β. Values stored in advance in a storage unit of the situation modeling unit 102 may be used, or the values may be chosen at random. Alternatively, values stored in advance in that storage unit or randomly chosen values may be used as initial values, and α and β may then be calculated and updated by the procedure below.

(II) The situation modeling unit 102 assigns a situation t to each element of the acoustic signal sequences 11a-1, ..., 11a-S included in the input labeled acoustic signal sequence 11 according to a uniform distribution. Alternatively, the situation modeling unit 102 may assign a situation t to each element of the acoustic signal sequences 11a-1, ..., 11a-S according to a method set in the situation modeling unit 102 in advance.

Further, when S ≥ 2, the situation modeling unit 102 renumbers the elements of the acoustic signal sequences 11a-1, ..., 11a-S included in the labeled acoustic signal sequence 11. That is, it assigns mutually distinct element numbers i (i ∈ {1, ..., U}) to all elements included in the labeled acoustic signal sequence 11, where U is the total number of elements of the labeled acoustic signal sequence 11 and satisfies U = N_1 + ... + N_S.

The situation modeling unit 102 takes the acoustic signal sequence in which a situation t has been assigned to each element and (when S ≥ 2) the element numbers i have been reassigned as the initial value of the "labeled acoustic signal sequence to be updated".

(III) For all elements (element numbers i ∈ {1, ..., N}) of the labeled acoustic signal sequence to be updated, the situation modeling unit 102 repeats (III-1) and (III-2) below either a prescribed number of times (a positive value, roughly 1 to 1000) or until a desired result is obtained (for example, until the change in situation assignments before and after an assignment round is at or below a fixed threshold such as 30%).

(III-1) For the labeled acoustic signal sequence to be updated, the situation modeling unit 102 updates, for all situations t, the probability distribution with which situation t is assigned to the acoustic event ε of element number i. The updated probability distribution P(z_i = t | e_i = ε, z_-i, e_-i, Ω, α, β) is shown below.

where C_εt^ET is the number of times situation t is assigned to acoustic event ε in the labeled acoustic signal sequence to be updated, and C_ts^TS is the number of times situation t is assigned to acoustic signal sequence 11a-s in the labeled acoustic signal sequence to be updated. Because of notational constraints these are written "C_εt^ET" and "C_ts^TS" here; properly, as in equation (3), the "ET" of "C_εt^ET" is written directly above "εt" and the "TS" of "C_ts^TS" directly above "ts". z_-i is the sequence of situations corresponding to the element numbers other than i, and e_-i is the sequence of acoustic events corresponding to the element numbers other than i.
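Equation (3) itself does not survive in this text. A hedged reconstruction, assuming the collapsed Gibbs sampling update commonly used for latent-topic models with symmetric Dirichlet priors, is given below. It is not confirmed by the source. The sketch additionally assumes that both counts exclude the current element i, which is the usual convention for this update; the summation indices ε' and t' are introduced here.

```latex
% Assumed form of update equation (3); counts are taken without element i
P(z_i = t \mid e_i = \varepsilon, \mathbf{z}_{-i}, \mathbf{e}_{-i}, \Omega, \alpha, \beta)
  \;\propto\;
  \frac{C^{ET}_{\varepsilon t} + \beta}{\sum_{\varepsilon'=1}^{E} C^{ET}_{\varepsilon' t} + E\beta}
  \cdot
  \frac{C^{TS}_{ts} + \alpha}{\sum_{t'=1}^{T} C^{TS}_{t's} + T\alpha}
```

The first factor favors situations that already explain event ε well, and the second favors situations that are already common in sequence 11a-s.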

(III-2) The situation modeling unit 102 randomly samples the situation to assign to each element number i according to the probability distribution P(z_i = t | e_i = ε, z_-i, e_-i, Ω, α, β) obtained with update equation (3) above. It assigns the sampled situation to the element with element number i, and thereby updates the labeled acoustic signal sequence to be updated.

(IV) Using the C_εt^ET and C_ts^TS finally obtained by repeating (III-1) and (III-2) above, the situation modeling unit 102 calculates the following.
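Equations (4) and (5) do not survive in this text. Assuming the standard point estimates obtained from the final counts under symmetric Dirichlet priors (a reconstruction, not confirmed by the source, and without claiming which expression carries which number), they would read:

```latex
% Assumed point estimates from the final assignment counts
\theta_{ts} = \frac{C^{TS}_{ts} + \alpha}{\sum_{t'=1}^{T} C^{TS}_{t's} + T\alpha},
\qquad
\phi_{\varepsilon t} = \frac{C^{ET}_{\varepsilon t} + \beta}{\sum_{\varepsilon'=1}^{E} C^{ET}_{\varepsilon' t} + E\beta}
```

With these forms, the θ_ts for a fixed s sum to 1 over t and the φ_εt for a fixed t sum to 1 over ε, so each is a valid probability distribution.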

The situation modeling unit 102 thereby obtains the set of probabilities θ_ts (s ∈ {1, ..., S}, t ∈ {1, ..., T}) that an acoustic signal sequence generates a situation and the set of probabilities φ_εt (t ∈ {1, ..., T}, ε ∈ {1, ..., E}) that a situation generates an acoustic event, and takes them as the acoustic signal-situation generation model 12 and the situation-acoustic event generation model 13, respectively. For example, the situation modeling unit 102 takes the S×T matrix whose (s, t) element is θ_ts as the acoustic signal-situation generation model 12 and the T×E matrix whose (t, ε) element is φ_εt as the situation-acoustic event generation model 13.
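Steps (I) to (IV) can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the apparatus's actual implementation: it uses the standard collapsed Gibbs update for symmetric Dirichlet priors, a fixed number of sweeps instead of a convergence test, and fixed hyperparameters. The function name `gibbs_situation_model`, its parameters, and the toy data are hypothetical. Events are integer indices in `range(E)` and situations are indices in `range(T)`.

```python
import random

def gibbs_situation_model(sequences, T, E, alpha=0.01, beta=0.01, sweeps=200, seed=0):
    """sequences[s] lists the event indices of acoustic signal sequence s.
    Returns (theta, phi): theta is S x T with P(t|s), phi is T x E with P(eps|t)."""
    rng = random.Random(seed)
    S = len(sequences)
    c_et = [[0] * T for _ in range(E)]  # c_et[eps][t]: event eps assigned to situation t
    c_ts = [[0] * S for _ in range(T)]  # c_ts[t][s]: situation t assigned within sequence s
    c_t = [0] * T                       # total number of elements assigned to situation t
    z = []
    # (II) initial assignment of a situation to every element, uniformly at random
    for s, seq in enumerate(sequences):
        z.append([rng.randrange(T) for _ in seq])
        for eps, t in zip(seq, z[s]):
            c_et[eps][t] += 1
            c_ts[t][s] += 1
            c_t[t] += 1
    # (III) repeat the update over all elements a fixed number of times
    for _ in range(sweeps):
        for s, seq in enumerate(sequences):
            for i, eps in enumerate(seq):
                t_old = z[s][i]
                # remove element i from the counts before computing its distribution
                c_et[eps][t_old] -= 1
                c_ts[t_old][s] -= 1
                c_t[t_old] -= 1
                # (III-1) unnormalized assignment probabilities; the denominator of
                # the second factor is the same for every t, so it is omitted
                weights = [(c_et[eps][t] + beta) / (c_t[t] + E * beta)
                           * (c_ts[t][s] + alpha) for t in range(T)]
                # (III-2) sample a new situation and record it
                t_new = rng.choices(range(T), weights=weights)[0]
                z[s][i] = t_new
                c_et[eps][t_new] += 1
                c_ts[t_new][s] += 1
                c_t[t_new] += 1
    # (IV) point estimates from the final counts
    theta = [[(c_ts[t][s] + alpha) / (len(sequences[s]) + T * alpha)
              for t in range(T)] for s in range(S)]
    phi = [[(c_et[eps][t] + beta) / (c_t[t] + E * beta)
            for eps in range(E)] for t in range(T)]
    return theta, phi

# Toy data: three sequences over E = 4 event types, modeled with T = 2 situations.
toy = [[0, 1, 0, 1, 0], [2, 3, 3, 2, 3], [0, 1, 1, 0, 2]]
theta, phi = gibbs_situation_model(toy, T=2, E=4)
```

Here `theta` plays the role of the acoustic signal-situation generation model 12 and `phi` that of the situation-acoustic event generation model 13. The situation indices carry no fixed meaning, so which index ends up describing which group of events depends on the random seed.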

Alternatively, during the iterations of (III-1), the situation modeling unit 102 may sample one or more values each of the probabilities θ_ts and φ_εt obtained in the course of calculating equation (3). Instead of using equations (4) and (5), it may then obtain the acoustic signal-situation generation model 12 and the situation-acoustic event generation model 13 from the mean of the sampled θ_ts and the mean of the sampled φ_εt.

The situation modeling unit 102 can also update the hyperparameters α and β with the following update formulas each time the update processing in (III) is performed once.

where α_next and β_next are the updated hyperparameters α and β, and ψ(z) denotes the digamma function. The digamma function is the logarithmic derivative of the gamma function Γ(z), that is, ψ(z) = Γ'(z) / Γ(z), where Γ'(z) is the derivative of the gamma function Γ(z).

Further, for cases where the updated values of α and β rise above or fall below thresholds δ_1 (> 0) and δ_2 (> 0), processing such as
if α < δ_1 then α = δ_1
if β < δ_2 then β = δ_2
may be added.

<Modification 1 of the first embodiment>
In Modification 1 of the first embodiment, acoustic signal sequences are input, and the acoustic signal-situation generation model 12 and the situation-acoustic event generation model 13 are calculated by learning. In the following, identical components are given the same reference numerals and their description is not repeated.

As illustrated in FIG. 3, the situation generation model creation apparatus 110 of this modification includes a feature calculation unit 111, an acoustic event determination unit 112, an acoustic event model database (DB) 113, the acoustic signal sequence synthesis unit 101, the situation modeling unit 102, and the storage unit 103. The situation generation model creation apparatus 110 is configured, for example, by loading a predetermined program into a known or dedicated computer.

First, unlabeled acoustic signal sequences 15-1, ..., 15-S are input to the feature calculation unit 111. Each unlabeled acoustic signal sequence 15-s (where s ∈ {1, ..., S}) consists of elements divided into short time intervals (several tens of msec to several sec), and each element carries an element number.

The feature calculation unit 111 calculates and outputs an acoustic feature sequence (vector) from each unlabeled acoustic signal sequence 15-s. For example, for each frame of the short duration mentioned above (several tens of msec to several sec) of the input unlabeled acoustic signal sequence 15-s, the feature calculation unit 111 calculates the sound pressure level, the acoustic power, MFCC (Mel-Frequency Cepstrum Coefficient) features, LPC (Linear Predictive Coding) features, and the like, and outputs these as the acoustic feature sequence. Acoustic features such as the rising characteristic, harmonicity, and temporal periodicity (see, for example, Non-Patent Document 1) may also be added to the acoustic feature sequence.
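As a small illustration of frame-wise feature calculation, the sketch below computes one of the simpler quantities named above, the acoustic power of each frame, expressed in decibels. It assumes non-overlapping frames and a signal given as a list of float samples; the function name `frame_power_db`, the frame length, and the floor `eps` are hypothetical choices, and MFCC or LPC features would need further processing not shown here.

```python
import math

def frame_power_db(signal, frame_length, eps=1e-12):
    """Mean power of each non-overlapping frame, in dB relative to unit amplitude.
    eps keeps the logarithm finite for silent frames."""
    features = []
    for start in range(0, len(signal) - frame_length + 1, frame_length):
        frame = signal[start:start + frame_length]
        power = sum(x * x for x in frame) / frame_length
        features.append(10.0 * math.log10(power + eps))
    return features

# Two frames of four samples: a loud frame followed by a quieter one.
example = [1.0] * 4 + [0.1] * 4
power_db = frame_power_db(example, frame_length=4)
```

A trailing partial frame is dropped in this sketch; padding it instead would be an equally reasonable choice.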

立ち上がり特性とは、数十から数百ミリ秒ごとにおける、音響信号の大きさを表す指標の増加の度合いを表す指標である。ここで、音響信号の大きさを表す指標とは、例えば、音響信号の振幅の絶対値、音響信号の振幅の絶対値の対数値、音響信号のパワー又は音響信号のパワーの対数値である。例えば、以下の式（１０）で得られる値が０以上であればその値が立ち上がり特性とされ、式（１０）で得られる値が０未満であれば０が立ち上がり特性とされる。

ただし、ｋはフレームをＫ個の微小な時間区間（例えば１ｍｓｅｃ程度）に区分した場合の各時間区間に対応し、ｐ￣_ｋはｋ番目の時間区間でのサンプルの大きさを表す指標の代表値又は平均値を表す。なお、「サンプルの大きさを表す指標」の例は、サンプルの振幅、サンプルの振幅の絶対値、サンプルの振幅の対数値、サンプルのエネルギー、サンプルのパワー、又はサンプルのパワーの対数値などである。「サンプル」は音響信号列の各音響信号を表す。また、Δｐ￣_ｋはｐ￣_ｋの変化率を表す。例えば、Δｐ⁻ _ｋ＝ｐ⁻ _ｋ−ｐ⁻ _ｋ−１である。Δｐ⁻ _ｋ＝ｐ⁻ _ｋ＋１−ｐ⁻ _ｋとしてもよい。また、最小二乗法等の近似手法を用いてｋ番目の時間区間におけるｐ⁻ _ｋを近似した直線を求め、その時間区間におけるその直線の傾きをΔｐ⁻ _ｋとしてもよい。また、ｋ番目の時間区間を含む複数の時間区間におけるｐ￣_ｋ-κ，・・・，ｐ￣_ｋ-1，ｐ⁻ _ｋ，ｐ￣_ｋ+1,...ｐ￣_ｋ-κ’の近時曲線を求め、そのｋ番目の時間区間に対応する点での傾き（微分値）をΔｐ⁻ _ｋとしてもよい。またχを任意の文字として、χの右肩の「−」は、χの上付きバーを意味する。また式（１０）の分子における（ｐ￣_ｎ）^２を（ｐ￣_ｎ）^ｍとし、ｍを任意の値としても良い。 The rising characteristic is an index representing the degree of increase in the index representing the magnitude of the acoustic signal every several tens to several hundreds of milliseconds. Here, the index representing the magnitude of the acoustic signal is, for example, an absolute value of the amplitude of the acoustic signal, a logarithmic value of the absolute value of the amplitude of the acoustic signal, a power of the acoustic signal, or a logarithmic value of the power of the acoustic signal. For example, if the value obtained by the following expression (10) is 0 or more, the value is the rising characteristic, and if the value obtained by the expression (10) is less than 0, 0 is the rising characteristic.

Here, k indexes the K minute time intervals (for example, about 1 msec each) into which the frame is divided, and p⁻_k is a representative value or the average value of the index representing the sample magnitude in the k-th time interval. Examples of the “index representing the sample magnitude” are the sample amplitude, the absolute value of the sample amplitude, the logarithm of the sample amplitude, the sample energy, the sample power, and the logarithm of the sample power. A “sample” is each acoustic signal in the acoustic signal sequence. Δp⁻_k represents the rate of change of p⁻_k. For example, Δp⁻_k = p⁻_k − p⁻_{k−1}. Alternatively, Δp⁻_k = p⁻_{k+1} − p⁻_k may be used. Alternatively, an approximation method such as the least squares method may be used to obtain a straight line approximating p⁻_k in the k-th time interval, and the slope of that straight line in that time interval may be used as Δp⁻_k. Alternatively, an approximating curve of p⁻_{k−κ}, ..., p⁻_{k−1}, p⁻_k, p⁻_{k+1}, ..., p⁻_{k+κ′} over a plurality of time intervals including the k-th time interval may be obtained, and its slope (derivative) at the point corresponding to the k-th time interval may be used as Δp⁻_k. For an arbitrary character χ, a “−” at the upper right of χ denotes χ with an overbar. Further, (p⁻_n)^2 in the numerator of Expression (10) may be replaced by (p⁻_n)^m, where m is an arbitrary value.
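The alternative definitions of Δp⁻_k listed above can be illustrated with a short sketch. It covers only the rate-of-change term, since Expression (10) itself is not reproduced in this text. The function names are hypothetical, the mean absolute amplitude is just one of the permitted magnitude indices, and the straight-line fit over neighboring intervals is shown as the simplest case of the approximating-curve variant.

```python
def interval_means(frame, K):
    """Split a frame into K equal intervals and return the mean absolute
    amplitude of each interval (one possible choice of the magnitude index)."""
    size = len(frame) // K
    return [sum(abs(x) for x in frame[k * size:(k + 1) * size]) / size
            for k in range(K)]

def backward_diff(p, k):
    """Rate of change as p_k - p_{k-1}."""
    return p[k] - p[k - 1]

def forward_diff(p, k):
    """Rate of change as p_{k+1} - p_k."""
    return p[k + 1] - p[k]

def ls_slope(p, k, half_width=1):
    """Slope of the least-squares line fitted to p over the intervals
    k-half_width .. k+half_width (clipped to the valid range)."""
    lo = max(0, k - half_width)
    hi = min(len(p) - 1, k + half_width)
    xs = list(range(lo, hi + 1))
    ys = p[lo:hi + 1]
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx
```

Indices in this sketch are 0-based, whereas the text numbers the intervals from 1.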

以下に調波性を例示する。

また、Ｎはフレームに含まれるサンプル数を表す１以上の整数、ｎはフレーム内の各サンプル点を表す１以上のＮ以下の整数、ｘ（ｎ）はサンプル点ｎでのサンプルの大きさを表す指標である。Ｒ_ｆｆ（τ）はｆ（ｎ）のラグτでの自己相関係数、ｍａｘ｛・｝は「・」の最大値を表す。ラグτは１以上Ｎ以下の整数である。Ｒ_ｆｆ（τ）は、例えば以下のように定義される。

The harmonic characteristics are exemplified below.

N is an integer of 1 or more representing the number of samples in the frame, n is an integer from 1 to N representing each sample point in the frame, and x(n) is an index representing the magnitude of the sample at sample point n. R_ff(τ) is the autocorrelation coefficient of f(n) at lag τ, and max{·} denotes the maximum value of “·”. The lag τ is an integer from 1 to N. R_ff(τ) is defined, for example, as follows.

以下に時間周期性を例示する。

ただし、Ｌは一周期とみなすサンプル数、Ｍは時間周期性の度合を計算するための周期数を表す１以上の整数、ｐ（・）はサンプルの大きさを表す指標を時間平滑化した値、ｐ￣はフレーム内でのサンプルの大きさを表す指標の平均値を表す。 The time periodicity is exemplified below.

Here, L is the number of samples regarded as one period, M is an integer of 1 or more representing the number of periods used to calculate the degree of time periodicity, p(·) is a time-smoothed value of the index representing the sample magnitude, and p￣ is the average value, within the frame, of the index representing the sample magnitude.

音響イベントモデルＤＢ１１３には、事前に算出された音響イベントモデルが複数保存されている。各音響イベントモデルは、音響イベントラベルが付された学習用の音響信号列から音響特徴量列を算出し、各音響イベントに対応する音響特徴量列をＧＭＭ，ＨＭＭ，ＳＶＭ等の周知のモデル化手法を用いてモデル化することで得られる（例えば参考文献：奥村学、高村大也、「言語処理のための機械学習入門」コロナ社）。 The acoustic event model DB 113 stores a plurality of acoustic event models calculated in advance. Each acoustic event model calculates an acoustic feature amount sequence from a learning acoustic signal sequence to which an acoustic event label is attached, and converts the acoustic feature amount sequence corresponding to each acoustic event into a well-known model such as GMM, HMM, or SVM. It is obtained by modeling using a technique (for example, reference: Manabu Okumura, Daiya Takamura, “Introduction to Machine Learning for Language Processing” Corona).

例えば、ＧＭＭの場合、音響イベントごとに音響特徴量の各種別に対応する音響イベントモデルが得られる。例えば、音響特徴量列がＦ種類（Ｆが１以上の整数）の音響特徴量ｙ_ι（ただし、ι∈｛１，・・・，Ｆ｝）からなる列ｙ_１，・・・，ｙ_Ｆである場合、各音響イベントに対応する音響イベントモデルは、それぞれ、以下のような確率モデルｐ（ｙ_ι）を要素とする列ｐ（ｙ_１），・・・，ｐ（ｙ_Ｆ）となる。

ただし、ｙ_ιは音響特徴量列（ベクトル）の要素、Ｊは正規分布の混合数、π_ｊは混合係数、Ｎ（・）は正規分布の確率密度関数、μ_ｊは分布の平均、Σ_ｊは分布の分散である。 For example, in the case of GMM, an acoustic event model corresponding to each type of acoustic feature is obtained for each acoustic event. For example, when the acoustic feature amount sequence is a sequence y_1, ..., y_F consisting of F types (F is an integer of 1 or more) of acoustic feature amounts y_ι (where ι ∈ {1, ..., F}), the acoustic event model corresponding to each acoustic event is a sequence p(y_1), ..., p(y_F) whose elements are probability models p(y_ι) of the following form.

Here, y_ι is an element of the acoustic feature amount sequence (vector), J is the number of mixture components (normal distributions), π_j is the mixing coefficient, N(·) is the probability density function of the normal distribution, μ_j is the mean of the distribution, and Σ_j is the variance of the distribution.

或いは、音響イベントごとに音響特徴量列が対応付けられたものが音響イベントモデルとされてもよい。 Alternatively, an acoustic event model may be obtained by associating an acoustic feature quantity sequence with each acoustic event.

特徴量算出部１１１から出力された音響特徴量列は音響イベント判定部１１２に入力される。音響イベント判定部１１２は、入力された音響特徴量列と、音響イベントモデルＤＢ１１３に記憶されている複数の音響イベントモデルとをそれぞれ比較し、各フレーム（各要素番号ｉに対応）の音響特徴量列に対応する音響イベントを決定する。例えばＧＭＭが音響イベントモデルとして用いられる場合、音響イベント判定部１１２は、フレーム（要素番号ｉ）ごとに、入力された音響特徴量列の各要素ρ_ι（ただし、ι∈｛１，・・・，Ｆ｝）を各音響イベントに対応する式（１３）の各確率モデルに代入し、各音響イベントに対応する確率ｐ（ρ_１）×・・・×ｐ（ρ_Ｆ）を最大にする音響イベントを決定する。或いは、例えば音響イベントごとに音響特徴量列が対応付けられた音響イベントモデルの場合、音響イベント判定部１１２は、フレーム（要素番号ｉ）ごとに、入力された音響特徴量列との距離（ユークリッド距離やコサイン距離）が最も近い音響イベントモデルに対応する音響イベントを選択する。 The acoustic feature amount sequence output from the feature amount calculation unit 111 is input to the acoustic event determination unit 112. The acoustic event determination unit 112 compares the input acoustic feature quantity sequence with a plurality of acoustic event models stored in the acoustic event model DB 113, and the acoustic feature quantity of each frame (corresponding to each element number i). Determine the acoustic event corresponding to the column. For example, when the GMM is used as an acoustic event model, the acoustic event determination unit 112, for each frame (element number i), each element ρ _ι (where ι∈ {1,... , F}) is substituted into each probability model of the equation (13) corresponding to each acoustic event, and the sound that maximizes the probability p (ρ ₁ ) ×... × p (ρ _F ) corresponding to each acoustic event. Determine the event. Alternatively, for example, in the case of an acoustic event model in which an acoustic feature string is associated with each acoustic event, the acoustic event determination unit 112 determines the distance (Euclidean) from the input acoustic feature string for each frame (element number i). The acoustic event corresponding to the acoustic event model with the closest distance or cosine distance) is selected.
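The two decision rules described above can be sketched as follows. Expression (13) is not reproduced in this text, so the sketch assumes the standard one-dimensional Gaussian mixture density, p(y) = Σ_j π_j N(y; μ_j, Σ_j), built from the symbols defined earlier. All function names and the data layout are hypothetical. The product p(ρ_1) × ... × p(ρ_F) is maximized through the equivalent sum of logarithms to avoid numerical underflow.

```python
import math

def normal_pdf(y, mean, var):
    """Probability density of a 1-D normal distribution."""
    return math.exp(-(y - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def gmm_pdf(y, components):
    """Mixture density; components is a list of (weight, mean, variance)."""
    return sum(w * normal_pdf(y, m, v) for w, m, v in components)

def decide_event_gmm(features, event_models):
    """Pick the event maximizing p(rho_1) x ... x p(rho_F) for one frame.
    event_models maps an event name to a list of F mixtures, one per feature."""
    def log_score(mixtures):
        # 1e-300 is an assumed floor that keeps log() finite.
        return sum(math.log(gmm_pdf(y, comps) + 1e-300)
                   for y, comps in zip(features, mixtures))
    return max(event_models, key=lambda e: log_score(event_models[e]))

def decide_event_nearest(features, templates):
    """Alternative rule: pick the event whose stored feature vector is
    closest in Euclidean distance to the frame's feature vector."""
    return min(templates, key=lambda e: math.dist(features, templates[e]))
```

Applying either function to every frame i yields the sequence of acoustic event labels that is attached to the signal sequence.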

音響イベント判定部１１２は、各要素番号ｉに対して決定した音響イベントを表す音響イベントラベルを、ラベルなし音響信号列１５−ｓの各要素番号ｉの要素に付与する。音響イベント判定部１１２は、この処理を入力されたラベルなし音響信号列１５−１，・・・，１５−Ｓのすべての要素（すべての要素番号ｉ）について行い、その結果得られる音響イベントラベル付き音響信号列１１−１，・・・，１１−Ｓを出力する。 The acoustic event determination unit 112 assigns an acoustic event label representing the acoustic event determined for each element number i to the element with element number i of the unlabeled acoustic signal sequence 15-s. The acoustic event determination unit 112 performs this process on all elements (all element numbers i) of the input unlabeled acoustic signal sequences 15-1, ..., 15-S, and outputs the resulting acoustic signal sequences 11-1, ..., 11-S with acoustic event labels.

音響イベントラベル付き音響信号列１１−１，・・・，１１−Ｓは、音響信号列合成部１０１に入力される。以降の処理は第１実施形態と同じである。 The acoustic signal strings with acoustic event labels 11-1,..., 11 -S are input to the acoustic signal string synthesizing unit 101. The subsequent processing is the same as in the first embodiment.

なお、音響信号列合成部１０１で音響イベントラベル付き音響信号列１１−１，・・・，１１−Ｓの合成処理を行うことに代えて、特徴量算出部１１１の前段でラベルなし音響信号列１５−１，・・・，１５−Ｄの合成処理を行っても良いし、音響イベント判定部１１２の前段で音響特徴量列の合成処理を行っても良い。 Note that, instead of having the acoustic signal sequence synthesis unit 101 combine the acoustic signal sequences 11-1, ..., 11-S with acoustic event labels, the unlabeled acoustic signal sequences 15-1, ..., 15-D may be combined in the stage preceding the feature amount calculation unit 111, or the acoustic feature amount sequences may be combined in the stage preceding the acoustic event determination unit 112.

＜第１実施形態の変形例２＞
第１実施形態の変形例２では、ラベルなし音響特徴量列を入力として、学習によって、音響信号−状況生成モデル１２、及び状況−音響イベント生成モデル１３を算出する。 <Modification 2 of the first embodiment>
In the second modification of the first embodiment, the acoustic signal-situation generation model 12 and the situation-acoustic event generation model 13 are calculated by learning using an unlabeled acoustic feature string as an input.

図４に例示するように、本形態の状況生成モデル作成装置１２０は、音響イベント判定部１１２、音響イベントモデルデータベース（ＤＢ）１１３、音響信号列合成部１０１、状況モデル化部１０２、及び記憶部１０３を有する。状況生成モデル作成装置１２０は、例えば、公知又は専用のコンピュータに所定のプログラムが読み込まれることで構成される。 As illustrated in FIG. 4, the situation generation model creation device 120 of this embodiment includes an acoustic event determination unit 112, an acoustic event model database (DB) 113, an acoustic signal sequence synthesis unit 101, a situation modeling unit 102, and a storage unit. 103. The situation generation model creation device 120 is configured by, for example, a predetermined program being read into a known or dedicated computer.

まず音響イベント判定部１１２に、ラベルなし音響特徴量列１６−１，・・・，１６−Ｓが入力される。各ラベルなし音響特徴量列１６−ｓ（ただし、ｓ∈｛１，・・・，Ｓ｝）は、短時間（数１０ｍｓｅｃ〜数ｓｅｃ）ごとに区分された時系列の音響信号列、短時間ごとに区分された音響信号列の各要素に対応する要素番号、及び音響信号列の短時間ごとの音響特徴量列を含む。音響特徴量列の具体例は、第１実施形態で説明した通りである。 First, unlabeled acoustic feature quantity sequences 16-1,..., 16-S are input to the acoustic event determination unit 112. Each unlabeled acoustic feature sequence 16-s (where sε {1,..., S}) is a time-series acoustic signal sequence divided for each short time (several tens of milliseconds to several seconds). The element number corresponding to each element of the acoustic signal sequence divided for each and the acoustic feature amount sequence for each short time of the acoustic signal sequence are included. A specific example of the acoustic feature amount sequence is as described in the first embodiment.

音響イベント判定部１１２は、入力されたラベルなし音響特徴量列１６−ｓの音響特徴量列と、音響イベントモデルＤＢ１１３に記憶されている複数の音響イベントモデルを、第１実施形態の変形例１で説明したようにそれぞれ比較し、各フレーム（各要素番号ｉに対応）の音響特徴量列に対応する音響イベントを決定する。音響イベント判定部１１２は、各要素番号ｉに対して決定した音響イベントを表す音響イベントラベルを、ラベルなし音響特徴量列１６−ｓの各要素番号ｉの要素に付与する。音響イベント判定部１１２は、この処理をラベルなし音響特徴量列１６−１，・・・，１６−Ｓのすべての要素（すべての要素番号ｉ）について行い、その結果得られる音響イベントラベル付き音響信号列１１−１，・・・，１１−Ｓを出力する。 The acoustic event determination unit 112 compares the acoustic feature amount sequence of the input unlabeled acoustic feature amount sequence 16-s with each of the plurality of acoustic event models stored in the acoustic event model DB 113, as described in Modification 1 of the first embodiment, and determines the acoustic event corresponding to the acoustic feature amount sequence of each frame (corresponding to each element number i). The acoustic event determination unit 112 assigns an acoustic event label representing the acoustic event determined for each element number i to the element with element number i of the unlabeled acoustic feature amount sequence 16-s. The acoustic event determination unit 112 performs this process on all elements (all element numbers i) of the unlabeled acoustic feature amount sequences 16-1, ..., 16-S, and outputs the resulting acoustic signal sequences 11-1, ..., 11-S with acoustic event labels.

なお、音響信号列合成部１０１で合成処理を行うことに代えて、音響イベント判定部１１２の前段でラベルなし音響特徴量列１６−１，・・・，１６−Ｓの合成処理を行っても良い。 Instead of performing the synthesis process in the acoustic signal sequence synthesizing unit 101, the unlabeled acoustic feature sequence 16-1, ..., 16-S may be synthesized in the preceding stage of the acoustic event determination unit 112. good.

＜第２実施形態＞
第２実施形態では、第１実施形態で説明したように得られた状況−音響イベント生成モデル１３を用い、新たに入力された音響イベントラベル付き音響信号列から状況を推定する。 Second Embodiment
In the second embodiment, the situation-acoustic event generation model 13 obtained as described in the first embodiment is used to estimate the situation from a newly input acoustic signal sequence with acoustic event labels.

図５に例示するように、本形態の状況推定装置２００は、記憶部１０３及び生成モデル比較部２０１を有する。状況推定装置２００は、例えば、公知又は専用のコンピュータに所定のプログラムが読み込まれることで構成される。 As illustrated in FIG. 5, the situation estimation apparatus 200 according to this embodiment includes a storage unit 103 and a generated model comparison unit 201. The situation estimation apparatus 200 is configured by, for example, reading a predetermined program into a known or dedicated computer.

まず生成モデル比較部２０１に音響イベントラベル付き音響信号列２１（「音響イベントを表す音響イベント情報を含む入力情報」に相当）が入力される。音響イベントラベル付き音響信号列２１は、短時間（数１０ｍｓｅｃ〜数ｓｅｃ）ごとに区分された時系列の音響信号列、短時間ごとに区分された音響信号列の各要素に対応する要素番号、及び短時間ごとに決定されて付与された音響イベントラベル（「音響イベント情報」に相当）を含む。要素番号及び音響イベントラベルは、音響信号列の要素ごとに付与される。 First, an acoustic signal sequence 21 with acoustic event labels (corresponding to “input information including acoustic event information representing an acoustic event”) is input to the generation model comparison unit 201. The acoustic signal sequence 21 with acoustic event labels includes a time-series acoustic signal sequence divided into short intervals (several tens of milliseconds to several seconds), an element number corresponding to each element of the divided acoustic signal sequence, and an acoustic event label determined and assigned for each short interval (corresponding to “acoustic event information”). An element number and an acoustic event label are assigned to each element of the acoustic signal sequence.

生成モデル比較部２０１は、入力された音響イベントラベル付き音響信号列２１と、記憶部１０３に格納された状況−音響イベント生成モデル１３とを比較し、音響イベントラベル付き音響信号列２１に対し、最も適切であると判断した状況、又は最も適切なものから順番に複数個の状況を決定し、それらを判定結果として出力する。 The generation model comparison unit 201 compares the input acoustic signal sequence 21 with acoustic event labels with the situation-acoustic event generation model 13 stored in the storage unit 103, determines, for the acoustic signal sequence 21 with acoustic event labels, either the situation judged most appropriate or a plurality of situations in order starting from the most appropriate one, and outputs them as the determination result.

［比較方法の例１］
音響イベントラベル付き音響信号列２１と状況−音響イベント生成モデル１３との比較方法を例示する。この例では、まず生成モデル比較部２０１が、入力された音響イベントラベル付き音響信号列２１から、以下のようにｐ（ε）（ただし、ε∈｛１，・・・，Ｅ｝）を算出する。

ただし、γは事前に設定された緩和パラメータ（例えば０．０１などの非負値）を表し、Ｃ_εは、音響イベントラベル付き音響信号列２１で音響イベントεを表す音響イベントラベルが付された要素の個数を表し、Ｎ_ｓ’は音響イベントラベル付き音響信号列２１が含む音響信号列の要素数を表す。Ｎ_ｓ’＝Ｎ_ｓであってもよいし、Ｎ_ｓ’≠Ｎ_ｓであってもよい。 [Comparative Method Example 1]
A method for comparing the acoustic signal sequence 21 with acoustic event labels with the situation-acoustic event generation model 13 is illustrated here. In this example, the generation model comparison unit 201 first calculates p(ε) (where ε ∈ {1, ..., E}) from the input acoustic signal sequence 21 with acoustic event labels as follows.

Here, γ is a relaxation parameter set in advance (a non-negative value such as 0.01), C_ε is the number of elements of the acoustic signal sequence 21 with acoustic event labels that carry the acoustic event label representing acoustic event ε, and N_s′ is the number of elements of the acoustic signal sequence included in the acoustic signal sequence 21 with acoustic event labels. Either N_s′ = N_s or N_s′ ≠ N_s may hold.
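The calculation of p(ε) can be sketched as a smoothed label histogram. Expression (14) is not reproduced in this text, so the additive form (C_ε + γ) / (N_s′ + γE) used below is an assumption chosen to be consistent with the symbols γ, C_ε, and N_s′ defined above. The function name `event_distribution` is hypothetical.

```python
def event_distribution(labels, num_events, gamma=0.01):
    """Smoothed relative frequency of each acoustic event.

    labels     : acoustic event labels e_i, integers in 1..num_events
    num_events : E, the number of acoustic event types
    gamma      : relaxation parameter (non-negative)
    Assumed form: p(eps) = (C_eps + gamma) / (N + gamma * E)."""
    counts = [0] * num_events
    for e in labels:
        counts[e - 1] += 1  # events are numbered from 1
    total = len(labels) + gamma * num_events
    return [(c + gamma) / total for c in counts]
```

With γ > 0, events that never occur in the input still receive a small non-zero probability, which keeps a later logarithm or ratio well defined.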

次に生成モデル比較部２０１は、ｐ（ε）と状況−音響イベント生成モデル１３を、下記に記すカルバックライブラー情報量（Kullback-Leibler divergence: KL divergence）やイェンセンシャノン情報量（Jensen-Shannon divergence: JS divergence）などの情報量基準に基づいて比較することで、入力された音響イベントラベル付き音響信号列２１に対応する状況を推定する。

Next, the generation model comparison unit 201 compares p(ε) with the situation-acoustic event generation model 13 on the basis of an information criterion such as the Kullback-Leibler divergence (KL divergence) or the Jensen-Shannon divergence (JS divergence) given below, and thereby estimates the situation corresponding to the input acoustic signal sequence 21 with acoustic event labels.

式（１５）又は（１６）の例の場合、生成モデル比較部２０１は、Ｐ（ε）にｐ（ε）（ただし、ε∈｛１，・・・，Ｅ｝）を代入し、Ｑ（ε）に式（５）のφ_εｔ（ただし、ε∈｛１，・・・，Ｅ｝，ｔ∈｛１，・・・，Ｔ｝）を代入する。これにより、生成モデル比較部２０１は、各状況ｔ∈｛１，・・・，Ｔ｝に対応する情報量（合計Ｔ個の情報量）を得る。生成モデル比較部２０１は、各状況ｔ∈｛１，・・・，Ｔ｝について算出された情報量のうち、最も小さな情報量に対応する状況、又は、最も小さな情報量から順番に選択した複数個の情報量に対応する複数個の状況を、音響イベントラベル付き音響信号列２１に対応する状況として決定して出力する。 In the case of Expression (15) or (16), the generation model comparison unit 201 substitutes p(ε) (where ε ∈ {1, ..., E}) for P(ε), and substitutes φ_εt of Expression (5) (where ε ∈ {1, ..., E}, t ∈ {1, ..., T}) for Q(ε). The generation model comparison unit 201 thereby obtains an information amount for each situation t ∈ {1, ..., T} (T information amounts in total). Of the information amounts calculated for the situations t ∈ {1, ..., T}, the generation model comparison unit 201 determines the situation corresponding to the smallest information amount, or the plurality of situations corresponding to a plurality of information amounts selected in ascending order from the smallest, as the situation corresponding to the acoustic signal sequence 21 with acoustic event labels, and outputs it.
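The selection by smallest information amount can be sketched as follows. Expressions (15) and (16) are not reproduced in this text, so the sketch assumes the standard definitions KL(P‖Q) = Σ_ε P(ε) log(P(ε)/Q(ε)) and JS(P, Q) = ½ KL(P‖M) + ½ KL(Q‖M) with M = (P + Q)/2. The function names and the layout of `phi` (one event distribution per situation) are hypothetical.

```python
import math

def kl(p, q):
    """Kullback-Leibler divergence KL(p || q).
    Assumes q[i] > 0 wherever p[i] > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js(p, q):
    """Jensen-Shannon divergence between p and q."""
    m = [(a + b) / 2.0 for a, b in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def rank_situations(p, phi, divergence=kl, top=1):
    """Return the numbers (1..T) of the `top` situations whose event
    distribution phi[t-1] has the smallest divergence from p."""
    scored = sorted((divergence(p, phi_t), t)
                    for t, phi_t in enumerate(phi, start=1))
    return [t for _, t in scored[:top]]
```

Passing `top=1` returns the single best situation, while a larger value returns several situations in ascending order of information amount, matching the two output options described above.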

［比較方法の例２］
以下のように状況−音響イベント生成モデル１３と音響イベントラベル付き音響信号列２１との比較を行ってもよい。この手法では、生成モデル比較部２０１が、入力されたラベル付き音響信号列２１に対し、状況−音響イベント生成モデル１３のもとでの状況の尤度の和や積を求める。以下に具体例を示す。 [Example 2 of comparison method]
The situation-acoustic event generation model 13 and the acoustic signal sequence with acoustic event label 21 may be compared as follows. In this method, the generation model comparison unit 201 obtains the sum or product of the likelihood of the situation under the situation-acoustic event generation model 13 for the input labeled acoustic signal sequence 21. Specific examples are shown below.

≪状況−音響イベント生成モデル１３のもとでの状況の尤度の和の例≫

<< Situation-Example of sum of likelihood of situation under acoustic event generation model 13 >>

≪状況−音響イベント生成モデル１３のもとでの状況の尤度の積の例≫

<< Situation-Example of the product of the likelihood of a situation under the acoustic event generation model 13 >>

ただし、式（１９）（２０）のｅ_ｉは、入力された音響イベントラベル付き音響信号列２１の要素番号ｉに対応する音響イベントラベルが表す音響イベントを表す。式（１９）（２０）は、式（５）の確率φ_εｔと、入力された音響イベントラベル付き音響信号列２１のｅ_ｉとから算出できる。 Here, e_i in Expressions (19) and (20) represents the acoustic event indicated by the acoustic event label corresponding to element number i of the input acoustic signal sequence 21 with acoustic event labels. Expressions (19) and (20) can be calculated from the probability φ_εt of Expression (5) and e_i of the input acoustic signal sequence 21 with acoustic event labels.

生成モデル比較部２０１は、各状況について算出した尤度のうち、最も尤度の高い状況、又は、最も尤度の高いものから順番に選択した複数個の状況を、入力された音響イベントラベル付き音響信号列２１に対応する状況として決定して出力する。 Of the likelihoods calculated for the situations, the generation model comparison unit 201 determines the situation with the highest likelihood, or a plurality of situations selected in descending order of likelihood, as the situation corresponding to the input acoustic signal sequence 21 with acoustic event labels, and outputs it.
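The likelihood-based comparison can be sketched as follows. Expressions (19) and (20) are not reproduced in this text, so the sketch assumes that the sum is Σ_i φ_{e_i t} and the product is Π_i φ_{e_i t}, taken over the element numbers i of the input sequence. The function names and the layout of `phi` are hypothetical. The product is evaluated as a sum of logarithms, which gives the same ranking while avoiding underflow on long sequences.

```python
import math

def likelihood_sum(labels, phi_t):
    """Assumed sum form: sum over i of phi[e_i] for one situation."""
    return sum(phi_t[e - 1] for e in labels)

def log_likelihood_product(labels, phi_t):
    """Logarithm of the assumed product form: sum over i of log phi[e_i].
    Requires phi_t[e - 1] > 0 for every observed label."""
    return sum(math.log(phi_t[e - 1]) for e in labels)

def most_likely_situations(labels, phi, score=log_likelihood_product, top=1):
    """Return the numbers (1..T) of the `top` situations in descending
    order of the chosen likelihood score."""
    ranked = sorted(range(1, len(phi) + 1),
                    key=lambda t: score(labels, phi[t - 1]),
                    reverse=True)
    return ranked[:top]
```

Here `labels` holds the acoustic events e_i (numbered from 1) and `phi[t-1]` holds the probabilities φ_εt for situation t.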

＜第２実施形態の変形例１＞
第２実施形態の変形例１では、第１実施形態で説明したように得られた状況−音響イベント生成モデル１３を用い、新たに入力された音響信号列から状況を推定する。 <Modification Example 1 of Second Embodiment>
In the first modification of the second embodiment, the situation is estimated from the newly input acoustic signal sequence using the situation-acoustic event generation model 13 obtained as described in the first embodiment.

図６に例示するように、本形態の状況推定装置２１０は、特徴量算出部２１１、音響イベント判定部２１２、音響イベントモデルＤＢ１１３、記憶部１０３、生成モデル比較部２０１を有する。状況推定装置２１０は、例えば、公知又は専用のコンピュータに所定のプログラムが読み込まれることで構成される。 As illustrated in FIG. 6, the situation estimation apparatus 210 according to the present exemplary embodiment includes a feature amount calculation unit 211, an acoustic event determination unit 212, an acoustic event model DB 113, a storage unit 103, and a generation model comparison unit 201. The situation estimation device 210 is configured, for example, by reading a predetermined program into a known or dedicated computer.

まず特徴量算出部２１１にラベルなし音響信号列２２が入力される。ラベルなし音響信号列２２は、短時間（数１０ｍｓｅｃ〜数ｓｅｃ）ごとに区分された時系列の音響信号列、及び短時間ごとに区分された音響信号列の各要素に対応する要素番号を含む。 First, the unlabeled acoustic signal sequence 22 is input to the feature amount calculation unit 211. The unlabeled acoustic signal sequence 22 includes a time-series acoustic signal sequence divided every short time (several tens of milliseconds to several seconds) and an element number corresponding to each element of the acoustic signal sequence divided every short time. .

特徴量算出部２１１は、ラベルなし音響信号列２２から音響特徴量列（ベクトル）を算出して出力する。例えば特徴量算出部２１１は、第１実施形態で説明した特徴量算出部１１１と同じ方法で音響特徴量列を算出する。 The feature amount calculation unit 211 calculates and outputs an acoustic feature amount sequence (vector) from the unlabeled acoustic signal sequence 22. For example, the feature amount calculation unit 211 calculates an acoustic feature amount sequence by the same method as the feature amount calculation unit 111 described in the first embodiment.

音響イベント判定部２１２は、第１実施形態の音響イベント判定部１１２と同じ方法で、特徴量算出部２１１から出力された音響特徴量列と、音響イベントモデルＤＢ１１３に記憶されている複数の音響イベントモデルとをそれぞれ比較し、ラベルなし音響信号列２２の全ての要素について音響イベントを決定する。音響イベント判定部２１２は、決定した音響イベントを表す音響イベントラベルをラベルなし音響信号列２２の各要素に付することで、音響イベントラベル付き音響信号列２１を生成して出力する。 Using the same method as the acoustic event determination unit 112 of the first embodiment, the acoustic event determination unit 212 compares the acoustic feature amount sequence output from the feature amount calculation unit 211 with each of the plurality of acoustic event models stored in the acoustic event model DB 113, and determines an acoustic event for every element of the unlabeled acoustic signal sequence 22. The acoustic event determination unit 212 attaches an acoustic event label representing the determined acoustic event to each element of the unlabeled acoustic signal sequence 22, thereby generating and outputting the acoustic signal sequence 21 with acoustic event labels.

音響イベントラベル付き音響信号列２１は、生成モデル比較部２０１に入力される。以降の処理は第２実施形態と同じである。 The acoustic signal sequence with acoustic event label 21 is input to the generation model comparison unit 201. The subsequent processing is the same as in the second embodiment.

＜第２実施形態の変形例２＞
第２実施形態の変形例２では、第１実施形態で説明したように得られた状況−音響イベント生成モデル１３を用い、新たに入力された音響特徴量列から状況を推定する。 <Modification 2 of the second embodiment>
In the second modification of the second embodiment, the situation is estimated from the newly input acoustic feature quantity sequence using the situation-acoustic event generation model 13 obtained as described in the first embodiment.

図７に例示するように、本形態の状況推定装置２２０は、音響イベント判定部２１２、音響イベントモデルＤＢ１１３、記憶部１０３、生成モデル比較部２０１を有する。状況推定装置２２０は、例えば、公知又は専用のコンピュータに所定のプログラムが読み込まれることで構成される。 As illustrated in FIG. 7, the situation estimation apparatus 220 according to the present embodiment includes an acoustic event determination unit 212, an acoustic event model DB 113, a storage unit 103, and a generation model comparison unit 201. The situation estimation device 220 is configured, for example, by reading a predetermined program into a known or dedicated computer.

まず音響イベント判定部２１２にラベルなし音響特徴量列２３が入力される。ラベルなし音響特徴量列２３は、短時間（数１０ｍｓｅｃ〜数ｓｅｃ）ごとに区分された時系列の音響信号列、短時間ごとに区分された音響信号列の各要素に対応する要素番号、及び音響信号列の音響特徴量列を含む。音響特徴量列の具体例は、第１実施形態で説明した通りである。 First, the unlabeled acoustic feature string 23 is input to the acoustic event determination unit 212. The unlabeled acoustic feature column 23 includes time-series acoustic signal sequences divided every short time (several tens of milliseconds to several seconds), element numbers corresponding to the elements of the acoustic signal sequence divided every short time, and The acoustic feature amount sequence of the acoustic signal sequence is included. A specific example of the acoustic feature amount sequence is as described in the first embodiment.

音響イベント判定部２１２は、第１実施形態の音響イベント判定部１１２と同じ方法で、ラベルなし音響特徴量列２３の音響特徴量列と、音響イベントモデルＤＢ１１３に記憶されている複数の音響イベントモデルとをそれぞれ比較し、ラベルなし音響特徴量列２３が含む全ての要素について音響イベントを決定する。音響イベント判定部２１２は、決定した各要素の音響イベントを表す音響イベントラベルを、ラベルなし音響特徴量列２３が含む音響信号列に付することで、音響イベントラベル付き音響信号列２１を生成して出力する。 Using the same method as the acoustic event determination unit 112 of the first embodiment, the acoustic event determination unit 212 compares the acoustic feature amount sequence of the unlabeled acoustic feature amount sequence 23 with each of the plurality of acoustic event models stored in the acoustic event model DB 113, and determines an acoustic event for every element included in the unlabeled acoustic feature amount sequence 23. The acoustic event determination unit 212 attaches the acoustic event label representing the acoustic event determined for each element to the acoustic signal sequence included in the unlabeled acoustic feature amount sequence 23, thereby generating and outputting the acoustic signal sequence 21 with acoustic event labels.

＜第３実施形態＞
本形態は第１実施形態と第２実施形態の組み合わせである。
本形態では、音響イベントラベル付き音響信号列２１を入力として状況を推定することに加え、音響イベントラベル付き音響信号列１１−１，・・・，１１−Ｓを入力とし、音響信号−状況生成モデル１２、及び状況−音響イベント生成モデル１３の算出も行う。 <Third Embodiment>
This embodiment is a combination of the first embodiment and the second embodiment.
In this embodiment, in addition to estimating the situation using the acoustic signal sequence 21 with acoustic event labels as input, the acoustic signal-situation generation model 12 and the situation-acoustic event generation model 13 are also calculated using the acoustic signal sequences 11-1, ..., 11-S with acoustic event labels as input.

図８に例示するように、本形態の状況推定装置３００は、記憶部１０３，３０３、音響信号列合成部３０１、状況モデル化部１０２、及び生成モデル比較部２０１を有する。状況推定装置３００は、例えば、公知又は専用のコンピュータに所定のプログラムが読み込まれることで構成される。 As illustrated in FIG. 8, the situation estimation apparatus 300 according to this embodiment includes storage units 103 and 303, an acoustic signal sequence synthesis unit 301, a situation modeling unit 102, and a generated model comparison unit 201. The situation estimation apparatus 300 is configured by, for example, reading a predetermined program into a known or dedicated computer.

記憶部３０３には、第１実施形態で説明した音響イベントラベル付き音響信号列１１−１，・・・，１１−Ｓ、及び第２実施形態で説明した音響イベントラベル付き音響信号列２１が格納されている。 The storage unit 303 stores the acoustic signal sequence with acoustic event labels 11-1,..., 11-S described in the first embodiment and the acoustic signal sequence with acoustic event label 21 described in the second embodiment. Has been.

音響イベントラベル付き音響信号列１１−１，・・・，１１−Ｓ，２１は音響信号列合成部３０１に入力される。音響信号列合成部３０１は、音響イベントラベル付き音響信号列１１−１，・・・，１１−Ｓ，２１を時系列方向につなぎ合わせて一つの音響イベントラベル付き音響信号列を生成し、状況モデル化部１０２に送出する。状況モデル化部１０２は、第１実施形態で説明したように、入力された音響イベントラベル付き音響信号列から、音響信号−状況生成モデル１２、及び状況−音響イベント生成モデル１３を生成し、それらを記憶部１０３に格納する。 The acoustic signal strings with acoustic event labels 11-1,..., 11 -S, 21 are input to the acoustic signal string synthesizing unit 301. The acoustic signal sequence synthesizing unit 301 generates a single acoustic event label-attached acoustic signal sequence by connecting the acoustic event-labeled acoustic signal sequences 11-1,..., 11-S, 21 in the time series direction. Send to the modeling unit 102. As described in the first embodiment, the situation modeling unit 102 generates the acoustic signal-situation generation model 12 and the situation-acoustic event generation model 13 from the input acoustic signal label-attached acoustic signal sequence. Is stored in the storage unit 103.

音響イベントラベル付き音響信号列２１は生成モデル比較部２０１に入力される。生成モデル比較部２０１は、第２実施形態で説明したように、入力された音響イベントラベル付き音響信号列２１と、記憶部１０３に格納された状況−音響イベント生成モデル１３とを比較し、音響イベントラベル付き音響信号列２１に対し、最も適切であると判断した状況、又は最も適切なものから順番に複数個の状況を選択し、それらを判定結果として出力する。 The acoustic signal sequence 21 with acoustic event labels is input to the generation model comparison unit 201. As described in the second embodiment, the generation model comparison unit 201 compares the input acoustic signal sequence 21 with acoustic event labels with the situation-acoustic event generation model 13 stored in the storage unit 103, selects, for the acoustic signal sequence 21 with acoustic event labels, either the situation judged most appropriate or a plurality of situations in order starting from the most appropriate one, and outputs them as the determination result.

また、生成モデル比較部２０１の処理及び状況モデル化部１０２の処理のどちらを先に行っても良い。ただし、状況モデル化部１０２の処理を行う前にモデル比較部２０１の処理を行う場合、記憶部１０３に予め得られた各生成モデルが格納されていることが必要である。 Further, either the processing of the generation model comparison unit 201 or the processing of the situation modeling unit 102 may be performed first. However, when the processing of the model comparison unit 201 is performed before the processing of the situation modeling unit 102, it is necessary that each generated model obtained in advance is stored in the storage unit 103.

また、音響イベントラベル付き音響信号列２１’が、新たに入力された音響イベントラベル付き音響信号列とともに音響信号列合成部３０１に入力されてもよい。この場合、音響信号列合成部３０１がこれらを時系列方向につなぎ合わせ、状況モデル化部１０２に送出してもよい。
その他の処理は第１実施形態及び第２実施形態と同様とする。 Moreover, the acoustic signal sequence with acoustic event label 21 ′ may be input to the acoustic signal sequence synthesizing unit 301 together with the newly input acoustic signal sequence with acoustic event label. In this case, the acoustic signal sequence synthesizing unit 301 may connect them in the time series direction and send them to the situation modeling unit 102.
Other processes are the same as those in the first embodiment and the second embodiment.

＜第３実施形態の変形例１＞
本形態は第１実施形態の変形例１と第２実施形態の変形例１の組み合わせである。
本形態では、ラベルなし音響信号列１５−１，・・・，１５−Ｓ，２２を入力として、学習によって、音響信号−状況生成モデル１２、及び状況−音響イベント生成モデル１３を算出する。さらに本形態では、状況−音響イベント生成モデル１３を用い、ラベルなし音響信号列２２から状況を推定する。 <Modification 1 of 3rd Embodiment>
This embodiment is a combination of the first modification of the first embodiment and the first modification of the second embodiment.
In this embodiment, the acoustic signal-situation generation model 12 and the situation-acoustic event generation model 13 are calculated by learning using the unlabeled acoustic signal sequences 15-1,..., 15-S, 22 as input. Further, in the present embodiment, the situation is estimated from the unlabeled acoustic signal sequence 22 using the situation-acoustic event generation model 13.

図９に例示するように、本形態の状況推定装置３１０は、特徴量算出部１１１−１，・・・，１１１−Ｓ，２１１、音響イベント判定部１１２−１，・・・，１１２−Ｓ，２１２、音響イベントモデルＤＢ１１３、及び第３実施形態の状況推定装置３００（図８参照）を有する。 As illustrated in FIG. 9, the situation estimation device 310 according to the present embodiment includes a feature amount calculation unit 111-1,..., 111-S, 211, and an acoustic event determination unit 112-1,. 212, the acoustic event model DB 113, and the situation estimation apparatus 300 (see FIG. 8) of the third embodiment.

ラベルなし音響信号列１５−１，・・・，１５−Ｓは、それぞれ特徴量算出部１１１−１，・・・，１１１−Ｓに入力される。特徴量算出部１１１−１，・・・，１１１−Ｓは、第１実施形態の変形例１で説明したように、ラベルなし音響信号列１５−１，・・・，１５−Ｓから、それぞれ音響特徴量列を得て出力する。音響イベント判定部１１２−１，・・・，１１２−Ｓは、それぞれ、第１実施形態の変形例１の音響イベント判定部１１２と同様に、入力された音響特徴量列と音響イベントモデルＤＢ１１３に記憶されている複数の音響イベントモデルとから、音響イベントラベル付き音響信号列１１−１，・・・，１１−Ｓ（「学習用情報」に相当）を生成して出力する。 The unlabeled acoustic signal sequences 15-1, ..., 15-S are input to the feature amount calculation units 111-1, ..., 111-S, respectively. As described in Modification 1 of the first embodiment, the feature amount calculation units 111-1, ..., 111-S obtain and output acoustic feature amount sequences from the unlabeled acoustic signal sequences 15-1, ..., 15-S, respectively. In the same manner as the acoustic event determination unit 112 of Modification 1 of the first embodiment, each of the acoustic event determination units 112-1, ..., 112-S generates and outputs the acoustic signal sequences 11-1, ..., 11-S with acoustic event labels (corresponding to “learning information”) from the input acoustic feature amount sequence and the plurality of acoustic event models stored in the acoustic event model DB 113.

ラベルなし音響信号列２２は特徴量算出部２１１に入力される。特徴量算出部２１１は、第２実施形態の変形例１で説明したように、ラベルなし音響信号列２２から音響特徴量列（ベクトル）を算出して出力する。音響イベント判定部２１２は、第２実施形態の変形例１で説明したように、入力された音響特徴量列と音響イベントモデルＤＢ１１３に記憶されている複数の音響イベントモデルとから、音響イベントラベル付き音響信号列２１（「入力情報」に相当）を生成して出力する。 The unlabeled acoustic signal sequence 22 is input to the feature amount calculation unit 211. As described in the first modification of the second embodiment, the feature amount calculation unit 211 calculates and outputs an acoustic feature amount sequence (vector) from the unlabeled acoustic signal sequence 22. As described in the first modification of the second embodiment, the acoustic event determination unit 212 includes an acoustic event label from the input acoustic feature quantity sequence and the plurality of acoustic event models stored in the acoustic event model DB 113. An acoustic signal sequence 21 (corresponding to “input information”) is generated and output.

音響イベントラベル付き音響信号列１１−１，・・・，１１−Ｓ、２１は、記憶部３０３（図８）に格納される。以降の処理は第３実施形態と同じである。 Acoustic signal strings 11-1,..., 11-S, 21 with acoustic event labels are stored in the storage unit 303 (FIG. 8). The subsequent processing is the same as in the third embodiment.

<Modification 2 of the Third Embodiment>
This embodiment is a combination of the second modification of the first embodiment and the second modification of the second embodiment.
In this embodiment, the acoustic signal-situation generation model 12 and the situation-acoustic event generation model 13 are calculated by learning, with the unlabeled acoustic feature amount sequences 16-1, ..., 16-S, 23 as input. Furthermore, the situation is estimated from the unlabeled acoustic feature amount sequence 23 using the acoustic signal-situation generation model 12 and the situation-acoustic event generation model 13.

As illustrated in FIG. 10, the situation estimation apparatus 320 of this embodiment includes the acoustic event determination units 112-1, ..., 112-S, 212, the acoustic event model DB 113, and the situation estimation apparatus 300 of the third embodiment (see FIG. 8).

The unlabeled acoustic feature amount sequences 16-1, ..., 16-S are input to the acoustic event determination units 112-1, ..., 112-S, respectively. Each of the acoustic event determination units 112-1, ..., 112-S, in the same manner as the acoustic event determination unit 112 of the second modification of the first embodiment, generates and outputs one of the acoustic signal sequences with acoustic event labels 11-1, ..., 11-S (corresponding to the “learning information”) from the input acoustic feature amount sequence and the plurality of acoustic event models stored in the acoustic event model DB 113.

The unlabeled acoustic feature amount sequence 23 is input to the acoustic event determination unit 212. As in the second modification of the second embodiment, the acoustic event determination unit 212 generates and outputs the acoustic signal sequence with acoustic event labels 21 (corresponding to the “input information”) from the input unlabeled acoustic feature amount sequence 23 and the plurality of acoustic event models stored in the acoustic event model DB 113.

The acoustic signal sequences with acoustic event labels 11-1, ..., 11-S and the acoustic signal sequence with acoustic event labels 21 are stored in the storage unit 303 (FIG. 8). The subsequent processing is the same as in the third embodiment.

<Fourth Embodiment>
This embodiment is a modification of the third embodiment.
In this embodiment, the acoustic signal sequences with acoustic event labels 11-1, ..., 11-S (corresponding to the “first learning information”) and the acoustic signal sequence with acoustic event labels 21 (corresponding to the “second learning information”) are taken as input. In addition to calculating the acoustic signal-situation generation model 12 and the situation-acoustic event generation model 13, the situation corresponding to the acoustic signal sequence with acoustic event labels 21 is estimated.

As illustrated in FIG. 11, the situation estimation apparatus 400 of this embodiment includes storage units 103 and 303, an acoustic signal sequence synthesis unit 401, a situation modeling unit 402, and a generation model comparison unit 403. The situation estimation apparatus 400 is configured, for example, by loading a predetermined program into a known or dedicated computer.

The acoustic signal sequences with acoustic event labels 11-1, ..., 11-S (corresponding to the “first learning information”) and the acoustic signal sequence with acoustic event labels 21 (corresponding to the “second learning information”) are input to the acoustic signal sequence synthesis unit 401. The acoustic signal sequence synthesis unit 401 joins the acoustic signal sequences with acoustic event labels 11-1, ..., 11-S and 21 in the time-series direction, thereby obtaining and outputting a single acoustic signal sequence with acoustic event labels 41 (hereinafter simply the “labeled acoustic signal sequence 41”). The labeled acoustic signal sequence 41 is input to the situation modeling unit 402. If the labeled acoustic signal sequence 41 has already been obtained in advance from the acoustic signal sequences with acoustic event labels 11-1, ..., 11-S, 21, it may be input directly to the situation modeling unit 402 without passing through the acoustic signal sequence synthesis unit 401.
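
The joining of labeled sequences in the time-series direction can be sketched in Python as follows. The sketch assumes each sequence is a list of (element number, label) pairs and that element numbers are reassigned in the joined sequence so that they remain unique and increasing; this renumbering is an assumption of the example, not a requirement stated in the description. The function name `concatenate_sequences` and the event names are hypothetical.

```python
def concatenate_sequences(sequences):
    """Join labeled sequences end to end along the time axis.

    sequences: list of lists of (element_number, label) pairs.
    Element numbers are reassigned (an assumption of this sketch) so that
    they are unique and increasing in the joined sequence.
    """
    joined = []
    for seq in sequences:
        for _, label in seq:
            joined.append((len(joined), label))
    return joined

# Assumed toy data: two training sequences and one target sequence.
training = [[(0, "footsteps"), (1, "door")], [(0, "running_water")]]
target = [(0, "footsteps"), (1, "footsteps")]
sequence_41 = concatenate_sequences(training + [target])
print(sequence_41)
```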

Using the input labeled acoustic signal sequence 41, the situation modeling unit 402 calculates the acoustic signal-situation generation model 12 and the situation-acoustic event generation model 13 by the same method as the situation modeling unit 102 of the first embodiment. The acoustic signal-situation generation model 12 and the situation-acoustic event generation model 13 are stored in the storage unit 103.
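
The calculation method of the first embodiment is not reproduced in this part of the description. Purely as an illustration of how two probability tables of the form P (situation | acoustic signal sequence) and P (acoustic event | situation) could be estimated together, the following Python sketch uses probabilistic latent semantic analysis (PLSA) fitted by expectation-maximization as a stand-in technique. It assumes the labeled data has already been summarized as event-count histograms, one per sequence or segment; the function name `fit_plsa` and all parameter values are hypothetical.

```python
import random

def fit_plsa(counts, n_situations, n_iter=50, seed=0):
    """Fit PLSA by EM. counts[s][e] = occurrences of event e in sequence s.

    Returns (p_z_s, p_e_z), where p_z_s[s][z] approximates
    P(situation z | sequence s) and p_e_z[z][e] approximates
    P(event e | situation z).
    """
    rng = random.Random(seed)
    n_seq, n_evt = len(counts), len(counts[0])

    def normalize(row):
        total = sum(row)
        return [v / total for v in row]

    # Random positive initialization of both tables.
    p_z_s = [normalize([rng.random() + 0.1 for _ in range(n_situations)])
             for _ in range(n_seq)]
    p_e_z = [normalize([rng.random() + 0.1 for _ in range(n_evt)])
             for _ in range(n_situations)]

    for _ in range(n_iter):
        # Small floor keeps every probability strictly positive.
        new_z_s = [[1e-12] * n_situations for _ in range(n_seq)]
        new_e_z = [[1e-12] * n_evt for _ in range(n_situations)]
        for s in range(n_seq):
            for e in range(n_evt):
                if counts[s][e] == 0:
                    continue
                # E-step: posterior over situations for this (sequence, event).
                post = normalize([p_z_s[s][z] * p_e_z[z][e]
                                  for z in range(n_situations)])
                # M-step accumulation of expected counts.
                for z in range(n_situations):
                    w = counts[s][e] * post[z]
                    new_z_s[s][z] += w
                    new_e_z[z][e] += w
        p_z_s = [normalize(r) for r in new_z_s]
        p_e_z = [normalize(r) for r in new_e_z]
    return p_z_s, p_e_z

# Assumed toy histograms: 2 sequences, 3 event types.
counts = [[8, 2, 0], [0, 1, 9]]
p_situation_given_seq, p_event_given_situation = fit_plsa(counts, n_situations=2)
```

In this sketch the first returned table plays the role of the acoustic signal-situation generation model 12 and the second plays the role of the situation-acoustic event generation model 13.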

The acoustic signal sequence with acoustic event labels 21 is input to the generation model comparison unit 403. In the same manner as the generation model comparison unit 201 of the second embodiment, the generation model comparison unit 403 compares the input acoustic signal sequence with acoustic event labels 21 against the situation-acoustic event generation model 13 stored in the storage unit 103, determines the situation judged most appropriate for the acoustic signal sequence with acoustic event labels 21, or a plurality of situations in descending order of appropriateness, and outputs them as the determination result.
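
One way to picture this comparison is to score each situation by the log-likelihood of the observed event labels under P (acoustic event | situation) and to return the best-scoring situations. The Python sketch below does this under the assumption that events are scored independently; the comparison criterion actually used by the generation model comparison unit 201 is defined in the second embodiment and is not reproduced here. The function name `rank_situations`, the situation and event names, and the probability values are hypothetical.

```python
import math

def rank_situations(labels, p_event_given_situation, top_n=1, floor=1e-12):
    """Rank situations by the log-likelihood of the observed event labels.

    labels: event labels taken from the labeled sequence to be judged.
    p_event_given_situation: dict situation -> dict event -> probability.
    floor: small probability substituted for unseen events (an assumption).
    Returns the top_n situations, most appropriate first.
    """
    scores = {}
    for situation, p_event in p_event_given_situation.items():
        scores[situation] = sum(
            math.log(max(p_event.get(label, 0.0), floor)) for label in labels
        )
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:top_n]

# Assumed toy model, standing in for the situation-acoustic event model 13.
model_13 = {
    "cooking": {"running_water": 0.6, "footsteps": 0.1, "chopping": 0.3},
    "walking": {"running_water": 0.05, "footsteps": 0.9, "chopping": 0.05},
}
observed = ["footsteps", "footsteps", "running_water"]
print(rank_situations(observed, model_13, top_n=2))
```

Setting `top_n=1` corresponds to outputting only the most appropriate situation, and a larger value corresponds to outputting several situations in descending order of appropriateness.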

The acoustic signal sequence with acoustic event labels 21 may also be input to the acoustic signal sequence synthesis unit 401 together with a newly input acoustic signal sequence with acoustic event labels. The acoustic signal sequence synthesis unit 401 may join these in the time-series direction and send the result to the situation modeling unit 402.
Other processing is the same as in the first, second, and third embodiments.

<Modification 1 of the Fourth Embodiment>
This embodiment is a modification of the first modification of the third embodiment.
In this embodiment, the acoustic signal-situation generation model 12 and the situation-acoustic event generation model 13 are calculated by learning, with the unlabeled acoustic signal sequences 15-1, ..., 15-S, 22 as input. Furthermore, the situation corresponding to the unlabeled acoustic signal sequence 22 is estimated using the acoustic signal-situation generation model 12 and the situation-acoustic event generation model 13.

As illustrated in FIG. 9, the situation estimation apparatus 410 of this embodiment includes the feature amount calculation units 111-1, ..., 111-S, 211, the acoustic event determination units 112-1, ..., 112-S, 212, the acoustic event model DB 113, and the situation estimation apparatus 400 of the fourth embodiment (see FIG. 11).

The unlabeled acoustic signal sequences 15-1, ..., 15-S are input to the feature amount calculation units 111-1, ..., 111-S, respectively. As described in the first modification of the first embodiment, the feature amount calculation units 111-1, ..., 111-S obtain acoustic feature amount sequences from the unlabeled acoustic signal sequences 15-1, ..., 15-S, respectively, and output them. Each of the acoustic event determination units 112-1, ..., 112-S, in the same manner as the acoustic event determination unit 112 of the first modification of the first embodiment, generates and outputs one of the acoustic signal sequences with acoustic event labels 11-1, ..., 11-S (corresponding to the “first learning information”) from the input acoustic feature amount sequence and the plurality of acoustic event models stored in the acoustic event model DB 113.

The unlabeled acoustic signal sequence 22 is input to the feature amount calculation unit 211. As described in the first modification of the second embodiment, the feature amount calculation unit 211 calculates an acoustic feature amount sequence (vector) from the unlabeled acoustic signal sequence 22 and outputs it. As described in the first modification of the second embodiment, the acoustic event determination unit 212 generates and outputs the acoustic signal sequence with acoustic event labels 21 (corresponding to the “second learning information”) from the input acoustic feature amount sequence and the plurality of acoustic event models stored in the acoustic event model DB 113.

The acoustic signal sequences with acoustic event labels 11-1, ..., 11-S and 21 are stored in the storage unit 303 (FIG. 11). The subsequent processing is the same as in the fourth embodiment.

<Modification 2 of the Fourth Embodiment>
This embodiment is a modification of the second modification of the third embodiment.
In this embodiment, the acoustic signal-situation generation model 12 and the situation-acoustic event generation model 13 are calculated by learning, with the unlabeled acoustic feature amount sequences 16-1, ..., 16-S, 23 as input. Furthermore, the situation corresponding to the unlabeled acoustic feature amount sequence 23 is estimated using the acoustic signal-situation generation model 12 and the situation-acoustic event generation model 13.

As illustrated in FIG. 10, the situation estimation apparatus 420 of this embodiment includes the acoustic event determination units 112-1, ..., 112-S, 212, the acoustic event model DB 113, and the situation estimation apparatus 400 of the fourth embodiment (see FIG. 11).

The unlabeled acoustic feature amount sequences 16-1, ..., 16-S are input to the acoustic event determination units 112-1, ..., 112-S, respectively. Each of the acoustic event determination units 112-1, ..., 112-S, in the same manner as the acoustic event determination unit 112 of the second modification of the first embodiment, generates and outputs one of the acoustic signal sequences with acoustic event labels 11-1, ..., 11-S (corresponding to the “first learning information”) from the input acoustic feature amount sequence and the plurality of acoustic event models stored in the acoustic event model DB 113.

The unlabeled acoustic feature amount sequence 23 is input to the acoustic event determination unit 212. As in the second modification of the second embodiment, the acoustic event determination unit 212 generates and outputs the acoustic signal sequence with acoustic event labels 21 (corresponding to the “second learning information”) from the input unlabeled acoustic feature amount sequence 23 and the plurality of acoustic event models stored in the acoustic event model DB 113.

The acoustic signal sequences with acoustic event labels 11-1, ..., 11-S and 21 are stored in the storage unit 303 (FIG. 11). The subsequent processing is the same as in the fourth embodiment.

<Modifications>
The present invention is not limited to the embodiments described above. For example, the processing of the situation generation model creation apparatus or the situation estimation apparatus may be distributed across a plurality of apparatuses, and the data stored in a storage unit or DB in the above embodiments may be distributed across a plurality of storage units or DBs. For example, the acoustic signal-situation generation model 12 and the situation-acoustic event generation model 13 may be stored in different storage units. In addition, if the acoustic signal sequence is input in time-series order and processed sequentially, the acoustic signal sequence with acoustic event labels need not include the element numbers corresponding to the elements of the acoustic signal sequence divided into short-time segments.

The various processes described above may be executed not only in time series in the order described but also in parallel or individually, depending on the processing capability of the apparatus executing them or as needed. It goes without saying that other changes may be made as appropriate without departing from the spirit of the present invention.

When the above configuration is realized by a computer, the processing of the functions that each apparatus should have is described by a program. Executing this program on the computer realizes the above processing functions on the computer. Data obtained as a result of the processing in each unit is stored in memory as it is produced, and is read out and used as needed.

The program describing this processing can be recorded on a computer-readable recording medium. An example of a computer-readable recording medium is a non-transitory recording medium. Examples of such recording media include magnetic recording devices, optical disks, magneto-optical recording media, and semiconductor memories.

The program is distributed, for example, by selling, transferring, or lending a portable recording medium, such as a DVD or CD-ROM, on which the program is recorded. The program may also be distributed by storing it in a storage device of a server computer and transferring it from the server computer to other computers via a network.

A computer that executes such a program first stores, for example, the program recorded on the portable recording medium or transferred from the server computer in its own storage device. When executing the processing, the computer reads the program stored in its own storage device and executes processing in accordance with the read program. As another form of execution, the computer may read the program directly from the portable recording medium and execute processing in accordance with it. Furthermore, each time a program is transferred from the server computer to the computer, the computer may execute processing in accordance with the received program.

In the above embodiments, the processing functions of the apparatus are realized by executing a predetermined program on a computer, but at least some of these processing functions may be realized by hardware.

100, 110, 120 Situation generation model creation apparatus
200, 210, 220, 300, 310, 320, 400, 410, 420 Situation estimation apparatus

Claims

A situation generation model creation apparatus having a situation modeling unit that, using learning information including a set of time-series acoustic signal sequences and acoustic event information representing acoustic events corresponding to the acoustic signal sequences, obtains a probability P (acoustic event | situation) that a latent situation of a field, defined by acoustic events, generates an acoustic event and a probability P (situation | acoustic signal sequence) that an acoustic signal sequence generates a situation, and creates a generation model.

The situation generation model creation apparatus according to claim 1,
A feature amount calculation unit for calculating an acoustic feature amount from the acoustic signal sequence;
An acoustic event determination unit that determines an acoustic event using the acoustic feature amount; and
The acoustic event information included in the learning information represents the acoustic event determined by the acoustic event determination unit.
A situation generation model creation apparatus characterized by that.

The situation generation model creation apparatus according to claim 1,
Using an acoustic feature amount corresponding to the acoustic signal sequence, and having an acoustic event determination unit for determining an acoustic event;
The acoustic event information included in the learning information represents an acoustic event determined by the acoustic event determination unit.
A situation generation model creation apparatus characterized by that.

A situation estimation apparatus having a generation model comparison unit that estimates a situation corresponding to input information including acoustic event information representing an acoustic event, using a situation-acoustic event generation model corresponding to a probability P (acoustic event | situation) that a latent situation of a field, defined by acoustic events, generates an acoustic event.

The situation estimation apparatus according to claim 4, wherein
A feature amount calculation unit for calculating an acoustic feature amount from a time-series acoustic signal sequence;
An acoustic event determination unit that determines an acoustic event using the acoustic feature amount; and
The acoustic event information included in the input information represents the acoustic event determined by the acoustic event determination unit.
A situation estimation apparatus characterized by that.

The situation estimation apparatus according to claim 4, wherein
An acoustic event determination unit that determines an acoustic event using an acoustic feature amount,
The acoustic event information included in the input information represents the acoustic event determined by the acoustic event determination unit.
A situation estimation apparatus characterized by that.

A situation modeling unit that, using learning information including a set of time-series acoustic signal sequences and acoustic event information representing acoustic events corresponding to the acoustic signal sequences, obtains a probability P (acoustic event | situation) that a latent situation of a field, defined by acoustic events, generates an acoustic event and a probability P (situation | acoustic signal sequence) that an acoustic signal sequence generates a situation;
A generation model comparison unit that estimates a situation corresponding to input information including acoustic event information representing an acoustic event using a situation-acoustic event generation model corresponding to the probability P (acoustic event | situation);
A situation estimation apparatus.

The situation estimation apparatus according to claim 7,
A first feature quantity calculation unit for calculating a first acoustic feature quantity from a first time-series acoustic signal sequence;
A first acoustic event determination unit that determines an acoustic event using the first acoustic feature amount;
A second feature amount calculation unit for calculating a second acoustic feature amount from a second time-series acoustic signal sequence;
A second acoustic event determination unit that determines an acoustic event using the second acoustic feature amount;
The acoustic signal sequence included in the learning information is the first acoustic signal sequence,
The acoustic event information included in the learning information represents the acoustic event determined by the first acoustic event determination unit,
The acoustic event information included in the input information represents the acoustic event determined by the second acoustic event determination unit.
A situation estimation apparatus characterized by that.

The situation estimation apparatus according to claim 7,
A first acoustic event determination unit that determines an acoustic event using the first acoustic feature amount;
A second acoustic event determination unit that determines an acoustic event using the second acoustic feature amount;
The acoustic signal sequence included in the learning information corresponds to the first acoustic feature amount,
The acoustic event information included in the learning information represents the acoustic event determined by the first acoustic event determination unit,
The acoustic event information included in the input information represents the acoustic event determined by the second acoustic event determination unit.
A situation estimation apparatus characterized by that.

A situation modeling unit that, using first and second learning information including a set of time-series acoustic signal sequences and acoustic event information representing acoustic events corresponding to the acoustic signal sequences, obtains a probability P (acoustic event | situation) that a latent situation of a field, defined by acoustic events, generates an acoustic event and a probability P (situation | acoustic signal sequence) that an acoustic signal sequence generates a situation;
A generation model comparison unit that estimates a situation corresponding to the acoustic event information included in the second learning information using a situation-acoustic event generation model corresponding to the probability P (acoustic event | situation);
A situation estimation apparatus.

The situation estimation apparatus according to claim 10, wherein
A first feature quantity calculation unit for calculating a first acoustic feature quantity from a first time-series acoustic signal sequence;
A first acoustic event determination unit that determines an acoustic event using the first acoustic feature amount;
A second feature amount calculation unit for calculating a second acoustic feature amount from a second time-series acoustic signal sequence;
A second acoustic event determination unit that determines an acoustic event using the second acoustic feature amount;
The acoustic signal sequence included in the first learning information is the first acoustic signal sequence,
The acoustic event information included in the first learning information represents the acoustic event determined by the first acoustic event determination unit,
The acoustic signal sequence included in the second learning information is the second acoustic signal sequence,
The acoustic event information included in the second learning information represents the acoustic event determined by the second acoustic event determination unit.
A situation estimation apparatus characterized by that.

The situation estimation apparatus according to claim 10, wherein
A first acoustic event determination unit that determines an acoustic event using the first acoustic feature amount;
A second acoustic event determination unit that uses the second acoustic feature amount to determine an acoustic event,
The acoustic signal sequence included in the first learning information corresponds to the first acoustic feature amount,
The acoustic event information included in the first learning information represents the acoustic event determined by the first acoustic event determination unit,
The acoustic signal sequence included in the second learning information corresponds to the second acoustic feature amount,
The acoustic event information included in the second learning information represents the acoustic event determined by the second acoustic event determination unit.
A situation estimation apparatus characterized by that.