JP5777567B2

JP5777567B2 - Acoustic feature quantity calculation device and method, specific situation model database creation device, specific element sound model database creation device, situation estimation device, calling suitability notification device, and program

Info

Publication number: JP5777567B2
Application number: JP2012116346A
Authority: JP
Inventors: 井本 桂右; 島内 末廣; 大室 仲; 羽田 陽一
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2012-05-22
Filing date: 2012-05-22
Publication date: 2015-09-09
Anticipated expiration: 2032-05-22
Also published as: JP2013242462A

Description

The present invention relates to a technique for extracting feature quantities of an acoustic signal, a technique for estimating a situation using the extracted feature quantities, and a technique for notifying whether placing a call is appropriate.

The technique described in Non-Patent Document 1 is known as a technique for calculating the rising characteristic, which is one of the feature quantities of an acoustic signal. The rising characteristic is an index representing the degree to which the magnitude of the acoustic signal increases over each interval of several tens to several hundreds of milliseconds.

In the technique described in Non-Patent Document 1, the value of LAT (Log Attack Time), defined by the following equation, is calculated as the rising characteristic (see, for example, Non-Patent Document 1). T_start is the time at which the amplitude of the acoustic signal starts to increase, and T_stop is the time at which the amplitude of the acoustic signal reaches its maximum.

LAT = log₁₀(T_stop - T_start)
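As a minimal illustration of the LAT equation above, the following Python sketch evaluates the formula for given times. The function name `log_attack_time` and the use of seconds as the time unit are assumptions made for this example; how T_start and T_stop are detected is not part of the sketch.

```python
import math

def log_attack_time(t_start: float, t_stop: float) -> float:
    """Return LAT = log10(T_stop - T_start) for times given in seconds."""
    if t_stop <= t_start:
        raise ValueError("t_stop must be later than t_start")
    return math.log10(t_stop - t_start)

# An attack that starts at 0.20 s and peaks at 0.30 s lasts about 0.1 s.
lat = log_attack_time(0.20, 0.30)
```

The function refuses a non-positive attack duration, since the logarithm is undefined there.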

Non-Patent Document 1: Hyoung-Gook Kim, Nicolas Moreau, Thomas Sikora, "MPEG-7 AUDIO AND BEYOND: audio content indexing and retrieval", WILEY

However, it can be difficult to identify the time T_start at which the amplitude of the acoustic signal starts to increase and the time T_stop at which the amplitude reaches its maximum. In such cases, it is difficult to calculate LAT with the technique described in Non-Patent Document 1.

An object of the present invention is to provide an acoustic feature quantity calculation device and method, a specific situation model database creation device, a specific element sound model database creation device, a situation estimation device, a call suitability notification device, and a program that can be applied to a wider variety of acoustic signals than conventional techniques.

An acoustic feature quantity calculation device according to one aspect of the present invention includes: a frame division unit that divides an input acoustic signal into frames of a predetermined time length; and a feature quantity extraction unit including a rising characteristic calculation unit that divides the acoustic signal of each frame into K sections and, where p̄_k (p with an overbar) is the mean value of an index representing the magnitude of the acoustic signal in the k-th section of each frame, Δp̄_k is the rate of change of p̄_k in the k-th section of each frame, and m is a predetermined integer of 2 or more, calculates the value defined by the following expression, takes that value as the rising characteristic of each frame when the value is 0 or more, and sets the rising characteristic of each frame to 0 when the value is less than 0.

A specific situation model database creation device according to one aspect of the present invention includes: the acoustic feature quantity calculation device; a specific element sound model database that stores specific element sound models of a plurality of specific element sounds; an element sound model comparison unit that compares the feature quantity calculated by the acoustic feature quantity calculation device with the specific element sound models stored in the specific element sound model database and outputs either the label of the most similar specific element sound model or a labeled acoustic signal sequence in which the label of that specific element sound model is attached to the acoustic signal sequence; an element sound histogram generation unit that receives the label of the specific element sound model or the labeled acoustic signal sequence output by the element sound model comparison unit and creates an element sound histogram, which is the frequency of appearance of each specific element sound model label within a histogram frame that groups a predetermined number of frames; and a specific situation modeling unit that receives the element sound histogram and applies a modeling technique to it to generate a specific situation model corresponding to a specific place.

A specific element sound model database creation device according to one aspect of the present invention includes: the acoustic feature quantity calculation device; and a specific element sound modeling unit that receives the feature quantity calculated by the acoustic feature quantity calculation device and applies a modeling technique to that feature quantity to generate a specific element sound model.

A situation estimation device according to one aspect of the present invention includes: the acoustic feature quantity calculation device; a specific element sound model database that stores the specific element sound models generated by the specific element sound model database creation device; an element sound model comparison unit that compares each specific element sound model with the feature quantity calculated by the acoustic feature quantity calculation device, determines the closest model to be the element sound of each short-time acoustic signal, and attaches an element sound label to each frame; an element sound histogram generation unit that receives the labeled acoustic signal sequence and creates an element sound histogram of the specific element sound model labels and their frequencies; a specific situation model database that stores a plurality of specific situation models and situation classification models generated by the specific situation model database creation device; and a situation determination model comparison unit that compares the element sound histogram with the specific situation models or the situation classification models, estimates that the situation is the one represented by the most similar specific situation model or situation classification model, and outputs a situation estimation result.

A call suitability notification device according to one aspect of the present invention includes: the acoustic feature quantity calculation device; a call recommendation model storage unit that stores call recommendation models associated with the degree to which calls are likely to occur; and a call recommendation situation determination unit that receives the feature quantity calculated by the acoustic feature quantity calculation device, refers to the call recommendation model that matches that feature quantity, determines whether the situation on the receiver side is one in which calls occur frequently or one in which calls rarely occur, and transmits call suitability notification information to the caller side.

The invention can be applied to a wider variety of acoustic signals than conventional techniques.

Functional block diagram of the acoustic feature quantity calculation device 1 of the first embodiment.
Diagram showing the operation flow of the acoustic feature quantity calculation device 1.
Diagram for explaining an example of the processing of the rising characteristic calculation unit 131.
Functional block diagram of the acoustic feature quantity calculation device 1 of the second embodiment.
Functional block diagram of the acoustic feature quantity calculation device 1 of the third embodiment.
Functional block diagram of the specific situation model database creation device 100.
Diagram showing the operation flow of the specific situation model database creation device 100.
Diagram showing the relationship between frames and histogram frames.
Diagram showing an example of an element sound histogram.
Functional block diagram of the specific situation model database creation device 200.
Diagram showing the operation flow of the specific situation model database creation device 200.
Functional block diagram of the specific element sound model database creation device 300.
Functional block diagram of the specific element sound model database creation device 400.
Functional block diagram of the situation estimation device 500.
Diagram showing the operation flow of the situation estimation device 500.
Functional block diagram of the communication system 2000 incorporating the call recommendation model generation device 600.
Diagram showing an example of a communication history table.
Functional block diagram of the communication system 2000 incorporating the call recommendation model generation device 610.
Functional block diagram of the communication system 2000 incorporating the call recommendation model generation device 620.
Functional block diagram of the call recommendation model generation device 630.
Functional block diagram of the call recommendation model generation device 640.
Functional block diagram of the call recommendation model generation device 650.
Functional block diagram of the call recommendation model generation device 660.
Functional block diagram of the communication system 2000 including the call terminal 2010 to which the call suitability notification device 700 is connected.
Functional block diagram of the communication system 2000 including the call terminal 2010 to which the call suitability notification device 710 is connected.
Functional block diagram of the communication system 2000 including the call terminal 2010 to which the call suitability notification device 720 is connected.
Functional block diagram of the call suitability notification device 730.
Functional block diagram of the call suitability notification device 740.
Functional block diagram of the call suitability notification device 750.
Functional block diagram of the call suitability notification device 760.

Embodiments of the present invention are described below with reference to the drawings.

The first to third embodiments are embodiments of the acoustic feature quantity calculation device and method; the fourth and fifth embodiments are embodiments of the specific situation model database creation device; the sixth and seventh embodiments are embodiments of the specific element sound model database creation device; the eighth embodiment is an embodiment of the situation estimation device; the ninth to thirteenth embodiments are embodiments of the call recommendation model generation device; and the fourteenth to eighteenth embodiments are embodiments of the call suitability notification device.

[First Embodiment]
As shown in FIG. 1, the acoustic feature quantity calculation device 1 of the first embodiment includes, for example, a frame division unit 11, a quantization unit 12, and a feature quantity extraction unit 13. FIG. 2 shows the operation flow of the acoustic feature quantity calculation device 1 of the first embodiment.

The frame division unit 11 divides the input acoustic signal into frames of a predetermined time length (step A1). The predetermined time length is, for example, about 50 milliseconds. Two consecutive frames may or may not overlap. The acoustic signal divided into frames is output to the feature quantity extraction unit 13.
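The frame division of step A1 can be sketched as follows. This is a minimal illustration, not the device's actual implementation: the function name `split_into_frames`, the `hop_ms` parameter used to express overlap, and the choice to drop a trailing partial frame are all assumptions of this example.

```python
def split_into_frames(signal, sample_rate, frame_ms=50.0, hop_ms=None):
    """Split a sequence of samples into fixed-length frames.

    hop_ms is the spacing between frame starts; when it is smaller than
    frame_ms, consecutive frames overlap. A trailing partial frame is
    dropped (an assumption of this sketch).
    """
    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    hop_len = frame_len if hop_ms is None else int(round(sample_rate * hop_ms / 1000.0))
    if frame_len <= 0 or hop_len <= 0:
        raise ValueError("frame and hop lengths must be at least one sample")
    return [list(signal[start:start + frame_len])
            for start in range(0, len(signal) - frame_len + 1, hop_len)]

# 1000 samples at 8 kHz: 50 ms frames are 400 samples long.
signal = list(range(1000))
frames = split_into_frames(signal, 8000)                    # no overlap
overlapped = split_into_frames(signal, 8000, hop_ms=25.0)   # 50 % overlap
```

Leaving `hop_ms` unset gives non-overlapping frames, while a smaller hop gives overlapping ones; both cases are permitted by the description.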

In the example of FIG. 1, the input acoustic signal is a discrete signal that has been quantized by the quantization unit 12 at fixed time intervals and in fixed sound-pressure steps. The processing of the quantization unit 12 may of course instead be performed after the processing of the frame division unit 11 or after the processing of the feature quantity extraction unit 13.

The feature quantity extraction unit 13 includes a rising characteristic calculation unit 131. The rising characteristic calculation unit 131 calculates the rising characteristic of each frame (step A2).

The rising characteristic is an index representing the degree to which an index of the magnitude of the acoustic signal increases over each interval of several tens to several hundreds of milliseconds. Here, the index of the magnitude of the acoustic signal is, for example, the absolute value of the amplitude of the acoustic signal, the logarithm of the absolute value of the amplitude, the power of the acoustic signal, or the logarithm of the power.
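The four magnitude indices listed above can be computed per sample as in the sketch below. The function name `magnitude_index`, the base-10 logarithm, the use of the squared sample value as power, and the `floor` value that guards against taking the logarithm of zero are assumptions of this example; the description does not fix any of them.

```python
import math

def magnitude_index(samples, kind="power", floor=1e-12):
    """Map each sample to one of the four magnitude indices.

    kind: "abs", "log_abs", "power" or "log_power".
    floor: hypothetical lower bound applied before taking a logarithm,
           so that silent samples do not produce log(0).
    """
    if kind == "abs":
        return [abs(x) for x in samples]
    if kind == "log_abs":
        return [math.log10(max(abs(x), floor)) for x in samples]
    if kind == "power":
        return [x * x for x in samples]
    if kind == "log_power":
        return [math.log10(max(x * x, floor)) for x in samples]
    raise ValueError(f"unknown kind: {kind}")

amplitude = magnitude_index([-2.0, 0.5], "abs")
power = magnitude_index([-2.0, 0.5], "power")
```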

To calculate the rising characteristic, the rising characteristic calculation unit 131 first divides the acoustic signal of each frame into K sections, where K is a predetermined positive integer. The value of K is set so that each section is about 1 millisecond long.

Next, the rising characteristic calculation unit 131 calculates the value defined by the following expression. When that value is 0 or more, it is taken as the rising characteristic of the frame; when it is less than 0, the rising characteristic of the frame is set to 0. The calculated rising characteristic is output from the feature quantity extraction unit 13 as a feature quantity.

p̄_k is the mean value of the index representing the magnitude of the acoustic signal in the k-th section of the frame, and Δp̄_k is the rate of change of p̄_k in the k-th section of the frame. In the original notation, a "-" at the upper right of an arbitrary character x denotes x with an overbar. m is a predetermined integer of 2 or more; for example, m = 2.

For example, Δp̄_k = p̄_k - p̄_{k-1}. Alternatively, Δp̄_k = p̄_{k+1} - p̄_k may be used. A straight line approximating p̄_k may also be obtained with an approximation method such as least squares, and the slope of that line in the k-th section may be used as Δp̄_k.

When p̄_k is the power of the acoustic signal and Δp̄_k = p̄_{k+1} - p̄_k, then, as shown in FIG. 3, Δp̄_2 = p̄_3 - p̄_2.
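The ingredients of the rising characteristic described above (the section means, the two difference-based definitions of the rate of change, and the rule that replaces negative values with 0) can be sketched as follows. The expression that combines these quantities using m is not reproduced in this text, so the sketch deliberately stops short of it. The function names, the near-equal split into K sections, and the omission of the undefined boundary difference are assumptions of this example.

```python
def section_means(values, K):
    """Split one frame's magnitude-index values into K sections and return
    the mean of each section (p_bar_1 ... p_bar_K, stored at indices 0..K-1)."""
    n = len(values)
    if not 1 <= K <= n:
        raise ValueError("K must satisfy 1 <= K <= len(values)")
    bounds = [i * n // K for i in range(K + 1)]
    return [sum(values[a:b]) / (b - a) for a, b in zip(bounds, bounds[1:])]

def delta_backward(p_bar):
    """delta_k = p_bar_k - p_bar_{k-1}; entry i holds the value for k = i + 2."""
    return [p_bar[i] - p_bar[i - 1] for i in range(1, len(p_bar))]

def delta_forward(p_bar):
    """delta_k = p_bar_{k+1} - p_bar_k; entry i holds the value for k = i + 1."""
    return [p_bar[i + 1] - p_bar[i] for i in range(len(p_bar) - 1)]

def clamp_at_zero(value):
    """Keep a value that is 0 or more; replace a negative value with 0."""
    return value if value >= 0 else 0.0

# Power values of one short frame, split into K = 4 sections.
power = [1.0, 1.0, 3.0, 3.0, 7.0, 7.0, 5.0, 5.0]
p_bar = section_means(power, 4)    # section means p_bar_1 .. p_bar_4
forward = delta_forward(p_bar)     # forward[1] is delta for k = 2
```

With the forward difference, `forward[1]` equals the third section mean minus the second, which matches the case illustrated for k = 2.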

Calculating the rising characteristic in this way makes it unnecessary to identify the time T_start at which the amplitude of the acoustic signal starts to increase and the time T_stop at which the amplitude reaches its maximum. The rising characteristic can therefore be calculated even when these times T_start and T_stop are difficult to identify.

In addition, because the increase characteristic can be emphasized when it is extracted, the rising characteristic can be extracted effectively even from acoustic signals for which the conventional technique had difficulty extracting the rising characteristic alone.

[Second Embodiment]
The acoustic feature quantity calculation device 1 of the second embodiment differs from that of the first embodiment in that it also calculates acoustic feature quantities other than the rising characteristic. The following description focuses on the differences from the first embodiment; description of parts that are the same as in the first embodiment is omitted.

In addition to the rising characteristic calculation unit 131, the feature quantity extraction unit 13 of the acoustic feature quantity calculation device 1 of the second embodiment further includes at least one of a suddenness calculation unit 132, a time diffusivity calculation unit 133, a narrow-band property calculation unit 134, a band diffusivity calculation unit 135, a pitch characteristic calculation unit 136, and an amplitude unevenness calculation unit 137. FIG. 4 shows a functional block diagram of the acoustic feature quantity calculation device 1 of the second embodiment in the case where the feature quantity extraction unit 13 includes all of these units.

In addition to the frame division unit 11, the quantization unit 12, and the feature quantity extraction unit 13, the acoustic feature quantity calculation device 1 of the second embodiment further includes a vector generation unit 14.

The suddenness calculation unit 132 calculates the suddenness, which indicates the degree to which the acoustic signal of each frame is concentrated in the time domain. The suddenness is, for example, the value defined by the following expression. μ̄_n is the mean value of the acoustic energy envelope in the n-th section, and σ̄_n is the variance of the acoustic energy envelope in the n-th section.

The time diffusivity calculation unit 133 calculates the time diffusivity, which indicates the degree to which the acoustic signal of each frame is spread in the time domain. The time diffusivity is, for example, the value defined by the following expression. x_n is the distance from the start position of the calculation frame in the time domain, and x̄_n is the position in the time domain at which the acoustic energy envelope takes its mean value.

The narrow-band property calculation unit 134 calculates the narrow-band property, which indicates the degree to which the acoustic signal of each frame is concentrated in the frequency domain. The narrow-band property is, for example, the value defined by the following expression. f is the frequency, F is the number of frequency bins, p̄(f) is the mean value of the acoustic energy at frequency f, μ̄_f is the frequency at which the distribution of the acoustic energy envelope takes its mean value, and σ̄_f is the variance of the distribution of the acoustic energy envelope.

The band diffusivity calculation unit 135 calculates the band diffusivity, which indicates the degree to which the acoustic signal of each frame is spread in the frequency domain. The band diffusivity is, for example, the value defined by the following expression.

The pitch characteristic calculation unit 136 calculates the pitch characteristic, which indicates the degree to which the energy of the acoustic signal of each frame is unevenly distributed in the frequency domain. The pitch characteristic is, for example, the value defined by the following expression. p(f) is the acoustic energy at frequency f.

The amplitude unevenness calculation unit 137 calculates the amplitude unevenness, which indicates the degree to which the distribution of amplitude values of the acoustic signal of each frame is uneven. The amplitude unevenness is, for example, the value defined by the following expression. p_n is the amplitude value of the n-th sample.

The feature quantities calculated by the feature quantity extraction unit 13 are vectorized by the vector generation unit 14. Here, the feature quantities calculated by the feature quantity extraction unit 13 are the rising characteristic together with at least one of the suddenness, time diffusivity, narrow-band property, band diffusivity, pitch characteristic, and amplitude unevenness.

[Third Embodiment]
The acoustic feature quantity calculation device 1 of the third embodiment differs from that of the first or second embodiment in that it calculates acoustic feature quantities other than those calculated by the feature quantity extraction unit 13 of the first or second embodiment. The following description focuses on the differences from the first or second embodiment; description of parts that are the same as in the first or second embodiment is omitted.

The feature quantity extraction unit 13 of the acoustic feature quantity calculation device 1 of the third embodiment further includes an acoustic feature quantity calculation unit 138. FIG. 5 is a functional block diagram of the acoustic feature quantity calculation device 1 of the third embodiment in the case where the feature quantity extraction unit 13 includes all of the units described in the second embodiment: the suddenness calculation unit 132, the time diffusivity calculation unit 133, the narrow-band property calculation unit 134, the band diffusivity calculation unit 135, the pitch characteristic calculation unit 136, and the amplitude unevenness calculation unit 137.

The acoustic feature quantity calculation unit 138 calculates acoustic feature quantities such as MFCC (Mel-Frequency Cepstrum Coefficient) and the power spectrum. The acoustic feature quantity calculation unit 138 may of course calculate acoustic feature quantities obtained by other existing techniques.

The acoustic feature quantities calculated by the acoustic feature quantity calculation unit 138 are output to the vector generation unit 14 as feature quantities calculated by the feature quantity extraction unit 13.

[Fourth Embodiment]
The specific situation model database creation device 100 of the fourth embodiment creates a specific situation model database using the acoustic feature quantity calculation device 1 of the first to third embodiments.

FIG. 6 shows an example of a functional block diagram of the specific situation model database creation device 100 of the fourth embodiment, and FIG. 7 shows an example of its operation flow. The specific situation model database creation device 100 includes the acoustic feature quantity calculation device 1, a specific element sound model database 20, an element sound model comparison unit 30, an element sound histogram generation unit 40, and a specific situation modeling unit 50. The specific situation model database creation device 100 is realized, for example, by loading a predetermined program into a computer composed of a ROM, a RAM, a CPU, and the like, and having the CPU execute that program.

The acoustic feature quantity calculation device 1 is the acoustic feature quantity calculation device 1 of any one of the first to third embodiments. Using the method described in the first to third embodiments, the acoustic feature quantity calculation device 1 divides an acoustic signal sequence containing a plurality of element sounds in a specific place into short-time frames and extracts feature quantities for each frame (step S10). Step S10 corresponds to steps A1 and A2 in FIG. 7. The feature quantities calculated by the acoustic feature quantity calculation device 1 are output to the element sound model comparison unit 30. Here, an acoustic signal sequence containing a plurality of element sounds in a specific place is an acoustic signal sequence that represents the situation of a specific place, such as a situation in which a person is cooking or a situation in which a person is reading. In other words, it is an acoustic signal recorded in a specific place with a duration of, for example, about 5 to 20 seconds. That acoustic signal is divided into frames of 20 ms to 100 ms, a feature quantity is calculated for each frame, and the calculated feature quantity is taken as the feature quantity of an element sound.

The feature quantities calculated by the acoustic feature quantity calculation device 1, in other words the feature quantities extracted by the feature quantity extraction unit 13, are acoustic feature quantities such as the rising characteristic, suddenness, time diffusivity, narrow-band property, band diffusivity, pitch characteristic, amplitude unevenness, MFCC (Mel-Frequency Cepstrum Coefficient), and power spectrum.

The element sound model comparison unit 30 compares the feature quantity output by the feature quantity extraction unit 13 of the acoustic feature quantity calculation device 1 with each of the plurality of specific element sound models stored in the specific element sound model database 20, and outputs either the label of the specific element sound model at the smallest distance (for example, Euclidean distance or cosine distance) or a labeled acoustic signal sequence in which the label of that specific element sound model is attached to the acoustic signal sequence frame by frame (step S30). The labeled feature quantities of the specific element sound models are created by the specific element sound model database creation device 300 described later.
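The nearest-model comparison of step S30 can be sketched as follows. This is only an illustration under a strong simplifying assumption: each specific element sound model is represented here by a single reference feature vector, whereas the description does not say what form a model takes. The function names and the example labels and vectors are likewise hypothetical.

```python
import math

def euclidean_distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_distance(a, b):
    """One minus the cosine similarity of two feature vectors."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)

def nearest_label(feature, models, distance=euclidean_distance):
    """Return the label of the model at the smallest distance from feature."""
    return min(models, key=lambda label: distance(feature, models[label]))

def label_frames(features, models, distance=euclidean_distance):
    """Attach one label per frame, giving a labeled sequence."""
    return [nearest_label(f, models, distance) for f in features]

# Hypothetical two-dimensional models, assumed for illustration only.
models = {"footsteps": [1.0, 0.0], "knife": [0.0, 1.0]}
labels = label_frames([[0.9, 0.2], [0.1, 0.8]], models)
```

Passing `cosine_distance` as the `distance` argument switches the comparison to the other distance named in the description.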

The element sound histogram generation unit 40 receives the label of the specific element sound model or the labeled acoustic signal sequence output from the element sound model comparison unit 30 and creates an element sound histogram, which is the frequency of appearance of each specific element sound model label within a histogram frame that groups a predetermined number of the above frames (step S40). FIG. 8 shows the relationship between frames and histogram frames.

FIG. 8 is an example in which the specific place is "a situation where a person is cooking". The labeled acoustic signal sequence is a signal sequence in which a specific element sound model label is attached to each frame, where a frame has a time width of, for example, 20 msec to 100 msec.

In the example of FIG. 8, labels of specific element sound models for a cooking scene are attached: the first frame f₁ is a person's footsteps, the second frame f₂ is the sound of cutting food with a kitchen knife, the third frame f₃ is a person's footsteps, and so on. A histogram frame is a group of P frames, where P is, for example, 100 to 1000. Frames f₁ through f_P form the first histogram frame H₁. The second histogram frame H₂ consists of frames f₂ through f_{P+1}. When the labeled acoustic signal sequence is M frames long, M−P+1 histogram frames are created.

The element sound histogram generation unit 40 creates, for each histogram frame, an element sound histogram giving the appearance frequency of each specific element sound model label. FIG. 9 shows an example of an element sound histogram. The horizontal axis is the label of the specific element sound model. The vertical axis is, for example, the number of times each specific element sound appears within one histogram frame, or the sum over the histogram frame of the per-frame likelihood of each specific element sound.
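
The sliding histogram frames described above can be sketched as below. The sketch uses the count-based vertical axis (number of appearances per label), and the function name `element_sound_histograms` and the example labels are hypothetical.

```python
from collections import Counter


def element_sound_histograms(labels, P):
    """Build one label-count histogram per histogram frame.

    Histogram frame H_i covers frames f_i .. f_{i+P-1}, so a sequence of
    M labels yields M - P + 1 histograms (none if P is not in 1..M).
    """
    M = len(labels)
    if P <= 0 or P > M:
        return []
    return [Counter(labels[i:i + P]) for i in range(M - P + 1)]


# Hypothetical per-frame labels; real values of P would be 100 to 1000.
frame_labels = ["footsteps", "knife", "footsteps", "water", "knife"]
histograms = element_sound_histograms(frame_labels, P=3)
```

With M = 5 and P = 3 the sketch produces 5 − 3 + 1 = 3 histograms, matching the M−P+1 count given in the text.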

The specific situation modeling unit 50 takes as input the element sound histograms output by the element sound histogram generation unit 40 and applies a modeling method to them to generate a specific situation model corresponding to the specific place (step S50). When a GMM (Gaussian Mixture Model) is used, for example, the modeling method fits the generated feature quantities to a probability model p(x) given by a mixture of Gaussians, such as that of Expression (1), using the EM (Expectation Maximization) algorithm or a similar procedure.
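
Expression (1) is not reproduced in this text. A standard Gaussian mixture density consistent with the symbols used in this description would read as follows; the upper limit K, written here for the number of mixture components, is notation assumed for this sketch (the description itself uses k for that count).

```latex
p(x) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}\!\left(x \mid \mu_k, \Sigma_k\right),
\qquad \pi_k \ge 0, \quad \sum_{k=1}^{K} \pi_k = 1
```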

Here, x is the feature quantity (vector), k is the number of mixed normal distributions, π_k is the mixing coefficient, N is the probability density function of the normal distribution, μ_k is the mean of the distribution, and Σ_k is the variance of the distribution. Feature quantities can also be modeled with an HMM (Hidden Markov Model), which fits the calculated feature quantities to a probability distribution under the condition that the component at the next time is selected depending on previously observed signal components, or with an SVM (Support Vector Machine), which models the feature quantities by determining separation boundaries that maximize the margin between clusters. GMM, HMM, and SVM are well known (see, for example, Manabu Okumura and Hiroya Takamura, "Introduction to Machine Learning for Language Processing", Corona Publishing).

For example, when the specific situation model is generated using a GMM, each of the M−P+1 histogram frames created by the element sound histogram generation unit 40 has labels of N specific element sound models. The specific situation model may be output as is, or may be stored in the specific situation model database 60.

Under this assumption, the specific situation modeling unit 50 obtains the mean and variance from the plural histogram frames obtained from one or more long-duration acoustic signals representing a specific situation such as cooking. If R kinds of specific situation models are to be calculated, R means and R variances are calculated, and each mean and variance pair serves as one specific situation model.
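
The mean and variance computation can be sketched as follows. This is an illustration under stated assumptions: each histogram frame is flattened into a count vector over a fixed label order, and a per-dimension population variance is used, which the text does not specify. The names `histogram_to_vector` and `mean_and_variance` and the example data are hypothetical.

```python
def histogram_to_vector(hist, label_order):
    """Turn a label -> count mapping into a fixed-order count vector."""
    return [hist.get(label, 0) for label in label_order]


def mean_and_variance(vectors):
    """Per-dimension mean and population variance of equal-length vectors."""
    n = len(vectors)
    dim = len(vectors[0])
    mean = [sum(v[d] for v in vectors) / n for d in range(dim)]
    var = [sum((v[d] - mean[d]) ** 2 for v in vectors) / n for d in range(dim)]
    return mean, var


# Hypothetical histogram frames recorded in one situation ("cooking").
label_order = ["footsteps", "knife", "water"]
cooking_histograms = [
    {"footsteps": 2, "knife": 1},
    {"footsteps": 1, "knife": 1, "water": 1},
    {"knife": 1, "water": 2},
]
vectors = [histogram_to_vector(h, label_order) for h in cooking_histograms]
cooking_model = mean_and_variance(vectors)  # one (mean, variance) pair per situation
```

Repeating this for each of the R situations would give the R mean and variance pairs mentioned above.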

As described above, the specific situation model database creation device 100 of this invention identifies specific element sounds in an acoustic signal sequence that contains plural element sounds recorded at a specific place, and generates a specific situation model that characterizes the place from the distribution obtained by forming histograms of the identification results. Unlike the single fragmentary feature quantity of the prior art, this specific situation model is obtained from plural specific element sounds, so it is effective as a model for estimating the situation of a place that is characterized only by a combination of different sounds (for example, a place where cooking is in progress).

[Fifth embodiment]
FIG. 10 shows an example of a functional block diagram of the specific situation model database creation device 200 of the fifth embodiment, and FIG. 11 shows an example of its operation flow. The specific situation model database creation device 200 differs from the specific situation model database creation device 100 described above in two respects. First, the input acoustic signal sequence need not represent a specific place; an acoustic signal recorded at an unspecified place is acceptable. Second, it includes a distribution clustering processing unit 210, which classifies the element sound histograms created by the element sound histogram generation unit 40 by the shape of their distributions, and a situation classification modeling unit 220, which generates situation classification models from the output of that unit. Like the specific situation model database creation device 100, the specific situation model database creation device 200 is realized by loading a predetermined program into a computer composed of, for example, a ROM, a RAM, and a CPU, and having the CPU execute the program.

The specific situation model database creation device 200 includes the acoustic feature quantity calculation device 1, the specific element sound model database 20, the element sound model comparison unit 30, the element sound histogram generation unit 40, the distribution clustering processing unit 210, and the situation classification modeling unit 220. As the reference numerals indicate, the acoustic feature quantity calculation device 1, the specific element sound model database 20, the element sound model comparison unit 30, and the element sound histogram generation unit 40 are the same as those of the specific situation model database creation device 100.

The distribution clustering processing unit 210 takes as input the plural element sound histograms created by the element sound histogram generation unit 40 and classifies each of them by the shape of its distribution (step S210). That is, the M−P+1 histograms are sorted so that histograms with similar distribution shapes fall together, producing B groups of histograms. B is a preset "number of element sounds to be classified". The same methods used to generate the specific situation model described above can be used for classification by distribution shape: a classification method such as GMM or SVM sorts the M−P+1 histograms into B groups (sets). Each group of histograms with similar distribution shapes corresponds to one particular place. The situation classification modeling unit 220 applies a modeling method such as GMM, HMM, or SVM to the B groups of histograms to generate B kinds of situation classification models (step S220). The method for generating the situation classification models is the same as the method for generating the specific situation model described above.
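
The grouping of histograms by distribution shape can be illustrated with a small sketch. Note that it substitutes plain k-means for the GMM or SVM classification named in the text, purely to keep the example short and dependency-free; the function name `kmeans`, the deterministic initialisation, and the example vectors are all assumptions of this sketch.

```python
def kmeans(vectors, B, iterations=20):
    """Group equal-length vectors into B clusters with plain k-means.

    Stand-in for the GMM/SVM classification in the text. The first B
    vectors are used as initial centroids so the result is deterministic.
    """
    centroids = [list(v) for v in vectors[:B]]
    assignment = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector goes to its nearest centroid.
        assignment = [
            min(
                range(B),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(v, centroids[c])),
            )
            for v in vectors
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(B):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids


# Hypothetical histogram count vectors over three element sound labels.
histogram_vectors = [[5, 0, 1], [0, 5, 1], [4, 1, 1], [1, 4, 1]]
groups, centers = kmeans(histogram_vectors, B=2)
```

Each resulting group could then be passed to the modeling step (for example, a per-group mean and variance) to obtain the B situation classification models.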

[Sixth embodiment]
As a sixth embodiment, a specific element sound model database creation device 300 is described. It creates the specific element sound model database 20, which is a component of the specific situation model database creation devices 100 and 200.

FIG. 12 shows a functional block diagram of the specific element sound model database creation device 300. The specific element sound model database creation device 300 includes the acoustic feature quantity calculation device 1 and a specific element sound modeling unit 320.

The acoustic feature quantity calculation device 1 takes as input an acoustic signal sequence of a specific sound, divides it into short-time frames, and extracts a feature quantity for each frame. For example, when there are plural acoustic signals of footsteps as the specific sound, feature quantities (vectors) are calculated for all of those acoustic signals. If there are n footstep acoustic signals and each can be divided into m short-time frames, n×m feature quantities (vectors) are calculated.

The specific element sound modeling unit 320 applies a modeling method to the n×m feature quantities (vectors) to generate one specific element sound model. The modeling method is the same as the one the specific situation modeling unit 50 uses to generate the specific situation model. The generated specific element sound model is stored in the specific element sound model database 20. As described above, the specific element sound model database 20 is a component of the specific situation model database creation device 100. Another embodiment of the specific element sound model database creation device is described next.

[Seventh embodiment]
FIG. 13 shows a functional block diagram of a specific element sound model database creation device 400 according to the seventh embodiment. The specific element sound model database creation device 400 differs from the specific element sound model database creation device 300 in that it includes a feature quantity clustering unit 410 and an element sound classification modeling unit 420, and in that the input acoustic signal sequence contains plural element sounds.

The acoustic feature quantity calculation device 1 is the same as that of the specific element sound model database creation device 300, except that its input is an acoustic signal sequence containing plural element sounds. The feature quantity clustering unit 410 classifies the feature quantities output by the acoustic feature quantity calculation device 1 and creates sets of feature quantities. A method such as GMM or SVM is used for the classification, and the acoustic signal sequence is sorted into C groups (sets). C is a preset "number of feature quantities to be classified".

The element sound classification modeling unit 420 takes as input the C sets of feature quantities output by the feature quantity clustering unit 410 and applies a modeling method to those sets to generate element sound classification models. The modeling method is the same as the one the specific situation modeling unit 50 uses to generate the specific situation model from the element sound histograms.

The specific element sound model database creation device 400 thus classifies an acoustic signal sequence containing plural element sounds by its feature quantities and generates element sound classification models from the resulting groups (sets).

[Eighth embodiment]
FIG. 14 shows a functional block diagram of the situation estimation device 500 of the eighth embodiment, and FIG. 15 shows its operation flow. The situation estimation device 500 estimates the situation represented by an acoustic signal sequence using two databases: the specific element sound model database 20, which stores the specific element sound models generated by the specific element sound model database creation device 300 described above, and the specific situation model database 60, which stores the specific situation models and situation classification models generated by the specific situation model database creation devices 100 and 200 described above. The situation estimation device 500 is realized by loading a predetermined program into a computer composed of, for example, a ROM, a RAM, and a CPU, and having the CPU execute the program.

The situation estimation device 500 includes the acoustic feature quantity calculation device 1, the specific element sound model database 20, the element sound model comparison unit 30, the element sound histogram generation unit 40, a situation determination model comparison unit 510, and the specific situation model database 60. The acoustic feature quantity calculation device 1 divides the input acoustic signal sequence into short-time frames and extracts a feature quantity for each frame (step S10). The element sound model comparison unit 30 compares the feature quantity output by the acoustic feature quantity calculation device 1 with the specific element sound models or element sound classification models stored in the specific element sound model database 20, judges the closest one to be the element sound of each short-time acoustic signal, and attaches an element sound label to the acoustic signal sequence frame by frame (step S30). The element sound histogram generation unit 40 takes as input the label of the specific element sound model, or the labeled acoustic signal sequence, output by the element sound model comparison unit 30 and, for each histogram frame grouping a predetermined number of the above frames, creates an element sound histogram giving the appearance frequency of each specific element sound model label (step S40). The operation up to this point is the same as that of the specific situation model database creation device 100 or 200 described above.

The situation determination model comparison unit 510 compares the element sound histogram with the specific situation models or situation classification models stored in the specific situation model database 60, estimates that the situation is the one represented by the closest model, and outputs the estimation result. The comparison uses, for example, the Euclidean distance or cosine distance between each of the plural specific situation models and the element sound histogram.

The situation of the place is estimated, for example, as the situation of the model at the shortest distance. Alternatively, the model at the shortest distance may be taken as the situation only when that distance is smaller than a predetermined threshold, with "other situation" being estimated when no model is closer than the threshold.
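
The nearest-model decision with the optional threshold can be sketched as follows. The sketch assumes each situation model is reduced to a single reference vector compared by Euclidean distance; the function name `estimate_situation`, the label strings, and the example vectors are hypothetical.

```python
import math


def estimate_situation(histogram_vec, situation_models, threshold=None):
    """Return the label of the nearest situation model.

    If `threshold` is given and no model is closer than it, the result is
    "other situation". Models are single reference vectors here, which is
    an assumption of this sketch.
    """
    best_label, best_dist = None, math.inf
    for label, model_vec in situation_models.items():
        d = math.dist(histogram_vec, model_vec)  # Euclidean distance (Python 3.8+)
        if d < best_dist:
            best_label, best_dist = label, d
    if best_label is None:
        return "other situation"
    if threshold is not None and best_dist >= threshold:
        return "other situation"
    return best_label


# Hypothetical situation models over three element sound labels.
situation_models = {"cooking": [3, 2, 0], "cleaning": [0, 1, 4]}
near = estimate_situation([3, 1, 0], situation_models, threshold=2.0)
far = estimate_situation([10, 10, 10], situation_models, threshold=2.0)
unbounded = estimate_situation([10, 10, 10], situation_models)
```

Without a threshold the function always returns the nearest model, which corresponds to the first estimation rule in the text.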

With the situation estimation device 500 described above, the situation of a place that is characterized only by a combination of different sounds can be estimated from an acoustic signal. Furthermore, introducing clustering into the generation of the element sound judgment models makes it possible to estimate the situation of a place without preparing in advance acoustic signals labeled with specific sounds and specific situations.

The specific element sound model database creation device 300, which creates the specific element sound model database 20 used by the situation estimation device 500, generates a specific element sound model by applying a modeling method to the feature quantities (vectors) of the acoustic signal of a specific sound. The following sections describe several devices that apply this technical idea to communication. The first is a call recommendation model generation device 600, which generates a call recommendation model used to find out whether a communication partner at a remote location is in a situation where communication is possible.

[Ninth embodiment]
The call recommendation model generation device 600 of the ninth embodiment is used within a communication system. It models the acoustic signals observed when calls occur often and the acoustic signals observed when calls rarely occur. FIG. 16 shows a functional block diagram of the call recommendation model generation device 600, together with a functional block diagram of a communication system 2000 in which the call recommendation model generation device 600 is connected to one of the communication terminals.

The communication system 2000 consists of a network 2020, such as a telephone network or the Internet, and communication terminals 2010 and 2030 located on either side of the network 2020. For example, the communication terminal 2010 is the receiving side and the communication terminal 2030 is the sending side. The call recommendation model generation device 600 is connected to the communication terminal 2010. The communication terminal 2010 has an audio/video signal presentation unit 2011 and an audio/video signal acquisition unit 2012. The audio/video signal presentation unit and acquisition unit of the communication terminal 2030 are not shown.

The call recommendation model generation device 600 includes the acoustic feature quantity calculation device 1, a call history extraction unit 602, a call recommendation model generation unit 603, and a call recommendation model storage unit 604. The acoustic feature quantity calculation device 1 divides the acoustic signal sequence acquired from the audio/video signal acquisition unit 2012 of the communication terminal 2010 into short-time frames and extracts a feature quantity for each frame. The extracted feature quantities are output to the call recommendation model generation unit 603.

The call history extraction unit 602 receives the call history from the communication terminal 2010 as it is updated, sends the call recommendation model generation unit 603 an incoming/outgoing call signal indicating that a new outgoing or incoming call has occurred, and creates a call history table. FIG. 17 shows an example of the call history table. The call history table consists of, for example, the following items: outgoing/incoming time, call end time, call duration, outgoing/incoming, partner number, and history address. In FIG. 17, the null call end time at history address 0002 indicates a call that arrived from the other communication terminal 2030 but for which the party at the communication terminal 2010 did not go off-hook. The null at history address 0004 indicates a call that was placed from the communication terminal 2010 but for which the other party did not go off-hook.

In response to the incoming/outgoing call signal output by the call history extraction unit 602, the call recommendation model generation unit 603 identifies the feature quantity classification of the feature quantity of the acoustic signal immediately preceding that signal. The identification is performed, for example, according to the range in which a distance such as the Euclidean distance or cosine distance falls. The call recommendation model generation unit 603 then creates a call history model table. Table 1 shows an example of the call history model table.

In this example, history addresses 0003 and 0005 shown in FIG. 17 are classified into feature quantity classification a, and history addresses 0001 and 0006 into feature quantity classification d.

After classifying the feature quantity of the acoustic signal immediately preceding the incoming/outgoing call signal, the call recommendation model generation unit 603 assigns a degree to each feature quantity classification based on the call history table entries for the corresponding history addresses. The degree is assigned so that the value expressing how readily calls occur becomes larger when calls occur often and smaller when calls rarely occur. For example, the degree is assigned as follows.

For the side that placed a call, the time at which the call was placed is regarded as a case where calls occur often, and 1 is added to the degree T of how readily calls occur. When an incoming call arrives but the party does not go off-hook, this is regarded as a case where calls rarely occur, and 1 is subtracted from T. When a call takes place, a value from 0.0 to 2.0 is added to T according to the call duration. Even when a call takes place, if its duration is within a predetermined time (for example, 60 seconds), this is regarded as a case where calls rarely occur, and 0.5 is subtracted from T. Adjusting the value of T in this way separates the feature quantity classifications into those for which calls occur readily and those for which they do not. For example, it can be judged that calls occur often if T is 10 or more, and that calls rarely occur if T is −10 or less. The feature quantity classifications classified in this way are associated with the occurrence degree T to form the call recommendation model. The call recommendation model is stored in the call recommendation model storage unit 604. In other words, the call recommendation model is a table that associates feature quantities (vectors) with the occurrence degree T. Therefore, by evaluating the feature quantity of the acoustic signal on the receiving side against the call recommendation model, it is possible to know whether the receiving side is in a situation where a call is likely to take place.
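
The update of the degree T can be sketched as below. Several points are assumptions of this sketch rather than statements of the text: the 0.0 to 2.0 bonus is taken to grow linearly with call duration up to a hypothetical 600 seconds, a call of 60 seconds or less receives the 0.5 penalty instead of a bonus, and every placed call counts +1 whether or not it was answered. The names `degree_delta` and `judge` are hypothetical.

```python
SHORT_CALL_S = 60.0   # "predetermined time" example from the text
FULL_BONUS_S = 600.0  # hypothetical duration at which the bonus reaches 2.0


def degree_delta(outgoing, duration_s):
    """Change in T for one call history entry.

    outgoing   -- True for a placed call, False for a received call
    duration_s -- call duration in seconds, or None if nobody went off-hook
    """
    delta = 0.0
    if outgoing:
        delta += 1.0              # a call was placed
    elif duration_s is None:
        delta -= 1.0              # incoming call that was not answered
    if duration_s is not None:
        if duration_s <= SHORT_CALL_S:
            delta -= 0.5          # the call took place but was short
        else:
            delta += min(2.0, 2.0 * duration_s / FULL_BONUS_S)
    return delta


def judge(T):
    """Apply the example thresholds of 10 and -10 from the text."""
    if T >= 10:
        return "calls occur often"
    if T <= -10:
        return "calls rarely occur"
    return "undetermined"


# Hypothetical history entries that fell into one feature quantity classification.
entries = [(True, 300.0), (False, None), (False, 30.0)]
T = sum(degree_delta(outgoing, duration) for outgoing, duration in entries)
```

Accumulating T separately for each feature quantity classification yields the table of classifications and occurrence degrees that the text calls the call recommendation model.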

The call recommendation model may be generated using only the acoustic signal, as described above, or may also use the video signal acquired by the audio/video signal acquisition unit 2012. It may also be generated using information acquired by other sensors (605, shown by a broken line in FIG. 16), such as illuminance, temperature, or acceleration information. In addition, only one of the two models may be generated: either the model for cases where calls occur often or the model for cases where calls rarely occur.

[Tenth embodiment]
FIG. 18 shows a functional block diagram of the call recommendation model generation device 610 of the tenth embodiment. The communication terminal 2010 to which the call recommendation model generation device 610 is connected is part of the communication system 2000. The call recommendation model generation device 610 prepares in advance models that associate specific element sounds, such as footsteps or the sound of glass breaking, with feature quantities. It identifies the specific element sound from the feature quantity extracted by the acoustic feature quantity calculation device 1 of the call recommendation model generation device 600 described above, and generates a call recommendation model using the identified specific element sound.

The call recommendation model generation device 610 differs from the call recommendation model generation device 600 in that it further includes a specific element sound model database 611 and an element sound specifying unit 612. The specific element sound model database 611 stores element sound specifying models, in which a feature quantity (vector) obtained by the acoustic feature quantity calculation device 1 is associated with the element sound it represents (for example, footsteps, the sound of glass breaking, or speech). The specific element sound model database 611 is created in the same way as in the seventh embodiment.

The element sound specifying unit 612 takes as input the feature quantity extracted by the acoustic feature quantity calculation device 1, compares it with each of the specific element sound models stored in the specific element sound model database 611, identifies the model at the shortest distance (for example, Euclidean distance or cosine distance) as the element sound of that frame, and outputs the identification result to the call recommendation model generation unit 603.

The call recommendation model generation unit 603 classifies the history addresses by the element sound immediately preceding the incoming/outgoing call signal, and assigns a degree to each element sound based on the call history table entries for the corresponding history addresses. The degree is assigned to element sounds in the same way as described above.

The call recommendation model generation unit 603 associates the identification results output by the element sound specifying unit 612 with the history addresses of the call history table, and generates call recommendation models for the case where calls occur frequently and for the case where calls rarely occur. The call recommendation model, in which each identified element sound is associated with an occurrence degree T, is stored in the call recommendation model storage unit 613. The modeling uses the same technique as described above.

According to the call recommendation model generation device 610, a call recommendation model can be generated in which each identified element sound is associated with an occurrence degree T. Table 2 shows an example.

Table 2 shows, for example, that a call is likely to occur when a human voice is heard on the receiving side, and unlikely to occur when a door opening or closing sound is heard.

[Eleventh embodiment]
FIG. 19 shows a functional block diagram of the call recommendation model generation device 620 of the eleventh embodiment. The call recommendation model generation device 620 clusters feature quantities collected over a certain period of time, uses the models generated for each cluster to determine the class to which a feature quantity extracted by the acoustic feature quantity calculation device 1 belongs, and generates a call recommendation model using the determined element sound class.

The call recommendation model generation device 620 differs from the call recommendation model generation device 610 in that it includes a specific element sound model database 621 and an element sound cluster determination unit 622 in place of the specific element sound model database 611 and the element sound specifying unit 612. To generate models corresponding to element sounds, the call recommendation model generation device 610 requires feature quantities corresponding to sounds such as footsteps or breaking glass to be prepared in advance. However, it is difficult to prepare in advance feature quantities for every element sound that may occur.

The call recommendation model generation device 620 therefore generates models without requiring the association between element sounds and feature quantities to be prepared in advance. The specific element sound model database 621 stores element sound classification models created by classifying feature quantities (vectors) over a certain period of time, of the kind expected to be obtained by the acoustic feature quantity calculation device 1, using a technique such as GMM, HMM, or SVM. The specific element sound model database 621 is created in the same way as in the seventh embodiment.

The element sound cluster determination unit 622 compares the feature quantity (vector) acquired from the acoustic feature quantity calculation device 1 with the element sound classification models stored in the specific element sound model database 621, determines the classification class to which the feature quantity belongs, and outputs the classification result to the call recommendation model generation unit 624.
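The class determination performed by the element sound cluster determination unit 622 might look like the following sketch. It assumes, as a simplification of the GMM option mentioned in the text, that each classification class is modeled by a single diagonal-covariance Gaussian; the class IDs, means, and variances are hypothetical.

```python
import math


def diag_gaussian_log_likelihood(x, mean, var):
    """Log-likelihood of vector x under a diagonal-covariance Gaussian."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def determine_class(feature, class_models):
    """Return the class ID whose model gives the highest log-likelihood."""
    return max(
        class_models,
        key=lambda c: diag_gaussian_log_likelihood(feature, *class_models[c]),
    )


# Hypothetical classification models: class ID -> (mean vector, variance vector).
# The classes carry no names such as "footsteps"; they are only cluster indices.
class_models = {
    0: ([0.0, 0.0], [1.0, 1.0]),
    1: ([5.0, 5.0], [1.0, 1.0]),
}

print(determine_class([0.5, -0.2], class_models))
print(determine_class([4.0, 6.0], class_models))
```

Because the classes are unnamed cluster indices, no labeled training sounds are needed, which is the point of the eleventh embodiment.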

The call recommendation model generation unit 624 associates the classification classes output by the element sound cluster determination unit 622 with the history addresses of the call history table, and generates call recommendation models for the case where calls occur frequently and for the case where calls rarely occur. The modeling uses the same technique as described above.

According to the call recommendation model generation device 620, a call recommendation model can be generated in which each element sound classification class is associated with an occurrence degree T.

[Twelfth embodiment]
The call recommendation model can also be a model that associates the call occurrence degree T with other information. For example, rather than using the feature quantity itself, the call recommendation model may associate the call occurrence degree T with action/behavior information that can be estimated from the feature quantity, or with action/behavior classification information obtained by classifying that action/behavior information.

FIG. 20 shows a functional block diagram of the call recommendation model generation device 630 of the twelfth embodiment, which generates a call recommendation model that associates action/behavior information with the call occurrence degree T. The call recommendation model generation device 630 is obtained by adding an action/behavior specifying model storage unit 631 and an action/behavior specifying unit 632 to the configuration of the call recommendation model generation device 610 described above.

The call recommendation model generation device 630 prepares a model that associates the element sounds identified by the element sound specifying unit 612, such as footsteps or the sound of glass breaking, with the actions or behaviors that produce them. Using this model, it identifies the action or behavior that produces an element sound and generates a call recommendation model using the identified action or behavior.

The action/behavior specifying model storage unit 631 stores action/behavior specifying models that associate the identification results output by the element sound specifying unit 612 with actions/behaviors (for example, cooking, reading, or sleeping). The action/behavior specifying model storage unit 631 is created in the same way as the specific situation model database of the fourth embodiment. An action/behavior specifying model is obtained, for example, as follows. The output of the element sound specifying unit 612 for each frame with a time width of 20 msec to 100 msec is taken as input. For each histogram frame, which groups P such frames, an element sound histogram giving the appearance frequency of each specific element sound model label is created. The shape of that histogram is then modeled using a modeling technique.
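The element sound histogram construction can be sketched as follows. This is an illustration only, assuming non-overlapping histogram frames, relative frequencies (counts divided by P), and discarding of any trailing frames that do not fill a histogram frame; none of these details are fixed by the text, and the labels are hypothetical.

```python
from collections import Counter


def element_sound_histograms(frame_labels, labels, frames_per_histogram):
    """Build one element sound histogram per group of P consecutive frames.

    frame_labels: per-frame element sound labels (output of unit 612).
    labels: ordered list of all specific element sound model labels.
    frames_per_histogram: P, the number of frames per histogram frame.
    """
    histograms = []
    last_start = len(frame_labels) - frames_per_histogram
    for start in range(0, last_start + 1, frames_per_histogram):
        window = frame_labels[start:start + frames_per_histogram]
        counts = Counter(window)
        # Relative appearance frequency of each label within the window.
        histograms.append(
            [counts[label] / frames_per_histogram for label in labels]
        )
    return histograms


labels = ["footsteps", "speech", "door"]
frame_labels = [
    "speech", "speech", "door", "footsteps",
    "speech", "speech", "speech", "door",
]
print(element_sound_histograms(frame_labels, labels, frames_per_histogram=4))
```

With P = 4, the eight frames above yield two histograms, one per histogram frame.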

The action/behavior specifying unit 632 converts the identification results output by the element sound specifying unit 612 into a histogram and compares it with the action/behavior specifying models stored in the action/behavior specifying model storage unit 631. It identifies the action/behavior by finding the most similar model, and outputs action/behavior information to the call recommendation model generation unit 603.
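The comparison step can be sketched as follows. The text does not fix the modeling technique or the similarity measure, so this sketch substitutes a deliberately simple stand-in: each action/behavior is represented by one reference histogram, and similarity is measured by L1 distance. All labels and values are hypothetical.

```python
def l1_distance(h1, h2):
    """Sum of absolute bin differences between two histograms."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def identify_action(histogram, action_models):
    """Return the action whose reference histogram is most similar."""
    return min(
        action_models,
        key=lambda action: l1_distance(histogram, action_models[action]),
    )


# Bin order (hypothetical): footsteps, speech, dishes, page_turn.
action_models = {
    "cooking": [0.2, 0.1, 0.7, 0.0],
    "reading": [0.05, 0.05, 0.0, 0.9],
}

observed = [0.25, 0.0, 0.75, 0.0]
print(identify_action(observed, action_models))
```

A GMM, HMM, or SVM trained on histograms would replace `l1_distance` and the reference histograms in a fuller implementation.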

The call recommendation model generation unit 634 associates the action/behavior information output by the action/behavior specifying unit 632 with the history addresses of the call history table, and generates call recommendation models for the case where calls occur frequently and for the case where calls rarely occur. The modeling uses the same technique as described above, and the resulting model is stored in the call recommendation model storage unit 633.

According to the call recommendation model generation device 630, a call recommendation model can be generated in which action/behavior information is associated with an occurrence degree T. The action/behavior information is, for example, information such as "cooking" or "reading", and the call recommendation model associates a value of the occurrence degree T with each such item.

A call recommendation model generation device 640 is also conceivable, in which the action/behavior specifying model storage unit 631 and the action/behavior specifying unit 632 are added to the configuration of the call recommendation model generation device 620 described above. FIG. 21 shows a functional block diagram of the call recommendation model generation device 640.

The call recommendation model generation device 640 prepares a model that associates the element sound classes determined by the element sound cluster determination unit 622 with the actions or behaviors that produce the element sounds. Using this model, it identifies the action or behavior that produces an element sound and generates a call recommendation model using the identified action or behavior.

The call recommendation model generation unit 634 associates the action/behavior information output by the action/behavior specifying unit 632 with the history addresses of the call history table, and generates call recommendation models for the case where calls occur frequently and for the case where calls rarely occur. The modeling uses the same technique as described above. The call recommendation model generation device 640 can likewise generate a call recommendation model in which action/behavior information is associated with an occurrence degree T.

[Thirteenth embodiment]
FIG. 22 shows a functional block diagram of the call recommendation model generation device 650 of the thirteenth embodiment, in which an action/behavior classification model storage unit 651 and an action/behavior cluster determination unit 652 are added to the configuration of the call recommendation model generation device 610 described above.

The call recommendation model generation device 650 determines the class of action or behavior represented by the element sounds identified by the element sound specifying unit 612, and generates a call recommendation model using the determined action or behavior class.

The action/behavior classification model storage unit 651 stores action/behavior classification models that associate element sound histograms, generated by histogram processing from the element sound identification results over multiple frames acquired from the element sound specifying unit 612, with actions/behaviors (for example, cooking, reading, or sleeping). The action/behavior classification model storage unit 651 is created in the same way as the specific situation model database of the fifth embodiment. For example, the output of the element sound specifying unit 612 for each frame with a time width of 20 msec to 100 msec is taken as input, and for each histogram frame, which groups P such frames, an element sound histogram giving the appearance frequency of each specific element sound model label is created. Histograms with similar shapes are then grouped together to form B groups of histograms, and a modeling technique such as GMM, HMM, or SVM is applied to these B groups to generate B types of action/behavior classification models.
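The grouping of similarly shaped histograms into B groups can be sketched as follows. The text does not name a clustering method, so this sketch assumes plain k-means with squared Euclidean distance, initialized from the first B histograms; the subsequent GMM, HMM, or SVM modeling of each group is not shown. The data values are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two histograms."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def group_histograms(histograms, num_groups, iterations=20):
    """Group histograms into num_groups (B) clusters with basic k-means.

    Returns (assignments, centroids). Centroids are initialized from the
    first num_groups histograms, which is an assumption made so that the
    result is deterministic.
    """
    centroids = [list(h) for h in histograms[:num_groups]]
    assignments = [0] * len(histograms)
    for _ in range(iterations):
        # Assignment step: each histogram joins its nearest centroid.
        assignments = [
            min(range(num_groups),
                key=lambda g: squared_distance(h, centroids[g]))
            for h in histograms
        ]
        # Update step: each centroid becomes the mean of its members.
        for g in range(num_groups):
            members = [h for h, a in zip(histograms, assignments) if a == g]
            if members:
                centroids[g] = [sum(col) / len(members)
                                for col in zip(*members)]
    return assignments, centroids


histograms = [[0.9, 0.1], [0.1, 0.9], [0.8, 0.2], [0.2, 0.8]]
assignments, centroids = group_histograms(histograms, num_groups=2)
print(assignments)
```

Each resulting group would then be handed to the chosen modeling technique to produce one action/behavior classification model per group.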

The action/behavior cluster determination unit 652 calculates a frequency feature quantity from the element sound identification results acquired from the element sound specifying unit 612 and compares it with the action/behavior classification models stored in the action/behavior classification model storage unit 651. It identifies the most similar action/behavior classification model as the action/behavior classification represented by those identification results, and outputs action/behavior classification information to the call recommendation model generation unit 603.

The call recommendation model generation unit 654 associates the action/behavior classification information output by the action/behavior cluster determination unit 652 with the history addresses of the call history table, and generates call recommendation models for the case where calls occur frequently and for the case where calls rarely occur. The call recommendation model, in which the action/behavior classification information is associated with an occurrence degree T, is stored in the call recommendation model storage unit 653.

According to the call recommendation model generation device 650, a call recommendation model can be generated in which action/behavior classification information is associated with an occurrence degree T.

A call recommendation model generation device 660 is also conceivable, in which the action/behavior classification model storage unit 651 and the action/behavior cluster determination unit 652 are added to the configuration of the call recommendation model generation device 620 described above. FIG. 23 shows a functional block diagram of the call recommendation model generation device 660.

The call recommendation model generation device 660 determines the class of action or behavior represented by the element sound class determined by the element sound cluster determination unit 622, and generates a call recommendation model using the determined action or behavior class.

The call recommendation model generation unit 654 associates the action/behavior classification information output by the action/behavior cluster determination unit 652 with the history addresses of the call history table, and generates call recommendation models for the case where calls occur frequently and for the case where calls rarely occur. The modeling uses the same technique as described above. The call recommendation model generation device 660 can likewise generate a call recommendation model in which action/behavior classification information is associated with an occurrence degree T.

In the tenth to thirteenth embodiments, the call recommendation model generation devices were described using examples in which only acoustic signals are used to generate the call recommendation model. As with the call recommendation model generation device 600 of the ninth embodiment, however, the call recommendation model may also be generated using the video signal acquired by the audio/video signal acquisition unit 2012, or information acquired by other sensors, such as illuminance, temperature, or acceleration information. Likewise, only one of the two call recommendation models may be generated: either the model for the case where calls occur frequently or the model for the case where calls rarely occur.

The above has described call recommendation model generation devices that generate a call recommendation model used to find out whether a remote communication partner is in a situation where communication is possible. Next, call suitability notification devices are described. Using the call recommendation model generation devices 600 to 660, these devices address the problem of a call being placed to a receiver even though that remote communication partner is not in a state where a call is possible.

[Fourteenth embodiment]
FIG. 24 shows a functional block diagram of the communication system 2000 including the call suitability notification device 700 of the fourteenth embodiment. The functional configuration of the communication system 2000 is the same as described above. The call suitability notification device 700 is connected to the call terminal 2010 on the receiving side and uses the call recommendation model generated by the call recommendation model generation device 600 to notify whether the receiver side is currently in a situation where a call is possible.

The call suitability notification device 700 includes the acoustic feature quantity calculation device 1, a call recommendation model storage unit 604, and a call recommendation status determination unit 701. The acoustic feature quantity calculation device 1 and the call recommendation model storage unit 604 are the same as those of the call recommendation model generation device 600.

The call recommendation status determination unit 701 collates the feature quantity that the acoustic feature quantity calculation device 1 extracts from the acoustic signal sequence with the call recommendation model stored in the call recommendation model storage unit 604, in which feature quantities (vectors) are associated with occurrence degrees T, and outputs call suitability notification information that notifies the transmitting side whether a call is appropriate. Whether a call is appropriate is judged by comparing the call occurrence degree T described above with thresholds: for example, a call is judged suitable if T is 10 or more, and unsuitable if T is -10 or less.
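The threshold comparison can be sketched as follows, using the example thresholds of 10 and -10 from the text. The text does not say how values of T between the two thresholds are handled, so the "undetermined" result is an assumption, as are the function name and the result strings.

```python
def judge_call_suitability(occurrence_degree,
                           suitable_threshold=10.0,
                           unsuitable_threshold=-10.0):
    """Judge call suitability from the call occurrence degree T."""
    if occurrence_degree >= suitable_threshold:
        return "suitable"
    if occurrence_degree <= unsuitable_threshold:
        return "unsuitable"
    # Assumption: values strictly between the thresholds give no verdict.
    return "undetermined"


for T in (15.0, 10.0, 0.0, -10.0, -25.0):
    print(T, judge_call_suitability(T))
```

Keeping the thresholds as parameters reflects that the values 10 and -10 are given only as examples.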

The call suitability notification information is input to the call terminal 2010 and transmitted via the network 2020 to the call terminal 2030 on the transmitting side. On receiving the call suitability notification information, the transmitting-side call terminal 2030 displays whether the receiver is in a situation where a call is possible. The display is provided by call recommendation information display means (not shown), such as the lighting of an LED lamp or a display on a liquid crystal panel.

The output interval of the call suitability notification information is a design matter; the information is output, for example, at intervals of several seconds to several minutes. The functions of the call suitability notification device 700 and the call recommendation information display means may be integrated into the call terminals (2010, 2030). The method of specifying the transmitting-side call terminals to which the call suitability notification information is sent is not an essential part of the present invention, so its description is omitted; it can be realized simply by registering the call terminals of multiple callers in advance.

[Fifteenth embodiment]
FIG. 25 shows a functional block diagram of the communication system 2000 including the call suitability notification device 710 of the fifteenth embodiment. The call suitability notification device 710 is connected to the call terminal 2010 on the receiving side and uses the call recommendation model generated by the call recommendation model generation device 610 to notify whether the receiver side is currently in a situation where a call is possible.

The call suitability notification device 710 includes the acoustic feature quantity calculation device 1, the specific element sound model database 611, the element sound specifying unit 612, the call recommendation model storage unit 613, and a call recommendation status determination unit 711. The acoustic feature quantity calculation device 1, the specific element sound model database 611, the element sound specifying unit 612, and the call recommendation model storage unit 613 are the same as those of the call recommendation model generation device 610.

The call recommendation status determination unit 711 collates the element sound output by the element sound specifying unit 612 with the call recommendation model, in which element sounds such as footsteps or the sound of glass breaking are associated with call occurrence degrees T, and outputs call suitability notification information that notifies the transmitting side whether a call is appropriate. The subsequent operation is the same as that of the call suitability notification device 700.

[Sixteenth embodiment]
FIG. 26 shows a functional block diagram of the communication system 2000 including the call suitability notification device 720 of the sixteenth embodiment. The call suitability notification device 720 is connected to the call terminal 2010 on the receiving side and uses the call recommendation model generated by the call recommendation model generation device 620 to notify whether the receiver side is currently in a situation where a call is possible.

The call suitability notification device 720 includes the acoustic feature quantity calculation device 1, the specific element sound model database 621, the element sound cluster determination unit 622, a call recommendation model storage unit 623, and a call recommendation status determination unit 721. The acoustic feature quantity calculation device 1, the specific element sound model database 621, the element sound cluster determination unit 622, and the call recommendation model storage unit 623 are the same as those of the call recommendation model generation device 620.

The call recommendation status determination unit 721 collates the classification class output by the element sound cluster determination unit 622 with the call recommendation model, in which classification classes are associated with call occurrence degrees T, and outputs call suitability notification information that notifies the transmitting side whether a call is appropriate. The subsequent operation is the same as that of the call suitability notification device 700.

[Seventeenth embodiment]
The call suitability notification device can also be configured using a call recommendation model that associates the call occurrence degree T with other information. For example, rather than using the feature quantity itself, the call recommendation model may associate the call occurrence degree T with action/behavior information that can be estimated from the feature quantity, or with action/behavior classification information obtained by classifying that action/behavior information.

FIG. 27 shows a functional block diagram of the call suitability notification device 730 for that case. The call suitability notification device 730 of the seventeenth embodiment includes the acoustic feature quantity calculation device 1, an association unit 732, the call recommendation model storage unit 633, and a call recommendation status determination unit 731. The acoustic feature quantity calculation device 1 divides the receiver-side acoustic signal sequence, which contains a plurality of element sounds, into short-time frames and extracts a feature quantity for each frame. The association unit 732 outputs association information that associates the feature quantity with the receiver-side action/behavior information. The call recommendation model storage unit 633 stores a call recommendation model that associates the action/behavior information identified from the feature quantity with the degree to which calls are likely to occur. The call recommendation status determination unit 731 takes the association information as input, refers to the call recommendation model whose action/behavior information matches, determines whether the receiver side is in a situation where calls occur frequently or one where calls rarely occur, and transmits the resulting call suitability notification information to the caller side.

The call suitability notification device 730 uses the call recommendation model generated by the call recommendation model generation device 630 to notify whether the receiver side is currently in a situation where a call is possible.

The call suitability notification device 730 includes the acoustic feature quantity calculation device 1, the association unit 732, the call recommendation model storage unit 633, and the call recommendation status determination unit 731. The association unit 732 consists of the specific element sound model database 611, the element sound specifying unit 612, the action/behavior specifying model storage unit 631, and the action/behavior specifying unit 632. This configuration is the same as that of the call recommendation model generation device 630.

The call recommendation status determination unit 731 takes as input the association information output by the association unit 732, which associates the feature quantity with the receiver-side action/behavior information. It refers to the call recommendation model whose action/behavior information matches, determines whether the receiver side is in a situation where calls occur frequently or one where calls rarely occur, and transmits the resulting call suitability notification information to the caller side.

FIG. 28 shows a functional block diagram of the call suitability notification device 740. The call suitability notification device 740 uses the call recommendation model generated by the call recommendation model generation device 640 to notify whether the receiver side is currently in a situation where a call is possible.

The call suitability notification device 740 includes the acoustic feature quantity calculation device 1, an association unit 742, the call recommendation model storage unit 633, and a call recommendation status determination unit 741. The association unit 742 consists of the specific element sound model database 621, the element sound cluster determination unit 622, the action/behavior specifying model storage unit 631, and the action/behavior specifying unit 632. This configuration is the same as that of the call recommendation model generation device 630.

The call recommendation status determination unit 741 takes as input the association information output by the association unit 742, which associates the feature amount with the action/behavior information. It refers to the call recommendation model whose action/behavior information matches, determines whether the receiver side is in a situation where calls often occur or a situation where calls rarely occur, and transmits the resulting call suitability notification information to the caller side.

[Eighteenth embodiment]
FIG. 29 shows a functional block diagram of the call suitability notification device 750 of the eighteenth embodiment. The call suitability notification device 750 uses the call recommendation model generated by the call recommendation model generation device 650 to notify whether or not the receiver side is currently in a situation where a call is possible.

The call suitability notification device 750 includes the acoustic feature quantity calculation device 1, an association unit 752, a call recommendation model storage unit 653, and a call recommendation status determination unit 751. The association unit 752 consists of a specific element sound model database 611, an element sound specifying unit 612, an action/behavior classification model storage unit 651, and an action/behavior cluster specifying unit 652. This configuration is the same as that of the call recommendation model generation device 650.

The call recommendation status determination unit 751 takes as input the feature amount and the action/behavior classification information output by the association unit 752. It refers to the call recommendation model whose action/behavior classification information matches, determines whether the receiver side is in a situation where calls often occur or a situation where calls rarely occur, and transmits the resulting call suitability notification information to the caller side.

FIG. 30 shows a functional block diagram of the call suitability notification device 760. The call suitability notification device 760 uses the call recommendation model generated by the call recommendation model generation device 660 to notify whether or not the receiver side is currently in a situation where a call is possible.

The call suitability notification device 760 includes the acoustic feature quantity calculation device 1, an association unit 762, a call recommendation model storage unit 653, and a call recommendation status determination unit 761. The association unit 762 consists of a specific element sound model database 621, an element sound cluster determination unit 622, an action/behavior classification model storage unit 651, and an action/behavior cluster determination unit 652. This configuration is the same as that of the call recommendation model generation device 660.

The call recommendation status determination unit 761 takes as input the association information output by the association unit 762, which associates the feature amount with the action/behavior classification information. It refers to the call recommendation model whose action/behavior classification information matches, determines whether the receiver side is in a situation where calls often occur or a situation where calls rarely occur, and transmits the resulting call suitability notification information to the caller side.

As described above, the call suitability notification devices 700 to 760 of the present invention can determine, on the receiver side, whether or not the receiver is currently in a situation where a call is possible, based on any of the feature amount, the element sound, the classification class, the action/behavior information, or the action/behavior classification information, and can notify the caller side of the determination result (call suitability notification information).

The processes described for each of the above devices and methods need not be executed in time series in the order described; they may also be executed in parallel or individually, depending on the processing capability of the device that executes them or as otherwise required.

When the processing means of the above devices are implemented by a computer, the processing contents of the functions that each device should have are described by a program. By executing this program on the computer, the processing means of each device are realized on the computer.

The program describing these processing contents can be recorded on a computer-readable recording medium. The computer-readable recording medium may be of any kind, for example a magnetic recording device, an optical disc, a magneto-optical recording medium, or a semiconductor memory. Specifically, for example, a hard disk device, a flexible disk, or a magnetic tape can be used as the magnetic recording device; a DVD (Digital Versatile Disc), a DVD-RAM (Random Access Memory), a CD-ROM (Compact Disc Read Only Memory), or a CD-R (Recordable)/RW (ReWritable) as the optical disc; an MO (Magneto Optical disc) as the magneto-optical recording medium; and an EEP-ROM (Electronically Erasable and Programmable-Read Only Memory) as the semiconductor memory.

The program is distributed, for example, by selling, transferring, or lending a portable recording medium such as a DVD or CD-ROM on which the program is recorded. The program may also be distributed by storing it in a storage device of a server computer and transferring it from the server computer to other computers via a network.

Each means may be configured by executing a predetermined program on a computer, or at least part of these processing contents may be implemented in hardware.

Claims

A frame dividing unit that divides the input acoustic signal into frames of a predetermined time length;
A feature amount extraction unit including a rising characteristic calculation unit that divides the acoustic signal of each frame into K sections and, where p̄_k is the average value of an index representing the magnitude of the acoustic signal in the k-th section of each frame, Δp̄_k is the rate of change of p̄_k in the k-th section of each frame, and m is a predetermined integer of 2 or more, calculates the value defined by the following equation, sets that value as the rising characteristic of each frame if the value is 0 or more, and sets the rising characteristic of each frame to 0 if the value is less than 0;

An acoustic feature quantity calculation device including
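The processing flow recited above (frame division, division into K sections, averaging, rate of change, and clamping negative values to 0) can be sketched as follows. This is a sketch under stated assumptions, not the claimed computation: the equation itself is not reproduced in this text, so it is left to the caller as the hypothetical parameter `value_fn`, which is also where the integer m would enter. Mean power as the magnitude index and a first difference as the rate of change are assumptions, and all function names are invented for illustration.

```python
# Hypothetical sketch. The claimed equation is not reproduced in this text,
# so it is supplied by the caller as `value_fn`; nothing here defines it.
from typing import Callable, List, Sequence


def divide_into_frames(signal: Sequence[float], frame_len: int) -> List[List[float]]:
    """Split a signal into non-overlapping frames; a trailing partial frame is dropped."""
    count = len(signal) // frame_len
    return [list(signal[i * frame_len:(i + 1) * frame_len]) for i in range(count)]


def section_means(frame: Sequence[float], K: int) -> List[float]:
    """Mean power of each of K equal-length sections (power is an assumed magnitude index)."""
    sec_len = len(frame) // K
    if sec_len == 0:
        raise ValueError("frame is shorter than K samples")
    return [
        sum(x * x for x in frame[k * sec_len:(k + 1) * sec_len]) / sec_len
        for k in range(K)
    ]


def rising_characteristic(frame: Sequence[float], K: int,
                          value_fn: Callable[[List[float]], float]) -> float:
    """Apply `value_fn` to the section-wise rates of change and clamp negatives to 0."""
    p = section_means(frame, K)
    # Assumed rate of change: first difference between adjacent sections.
    delta = [p[k] - p[k - 1] for k in range(1, K)]
    value = value_fn(delta)
    # As recited: keep the value if it is 0 or more, otherwise use 0.
    return value if value >= 0 else 0.0


# Toy stand-in for the equation (requires K >= 2), used only to exercise the clamp.
toy_value_fn = max
frames = divide_into_frames([0, 0, 1, 1, 2, 2, 0, 0], frame_len=4)
results = [rising_characteristic(f, K=2, value_fn=toy_value_fn) for f in frames]
```

With the toy stand-in, the first frame grows in power and keeps its positive value, while the second frame decays and is clamped to 0.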

In the acoustic feature quantity calculation device according to claim 1,
A suddenness calculation unit calculates suddenness indicating the degree of concentration in the time domain of the acoustic signal of each frame,
A time diffusivity calculation unit calculates time diffusivity indicating the degree of diffusion in the time domain of the acoustic signal of each frame,
A narrowband calculation unit calculates narrowbandness indicating the degree of concentration in the frequency domain of the acoustic signal of each frame,
A band spread calculation unit calculates band spread indicating the degree of spread in the frequency domain of the acoustic signal of each frame,
A pitch characteristic calculation unit calculates a pitch characteristic indicating the degree of uneven distribution of energy in the frequency domain of the acoustic signal of each frame,
An amplitude unevenness calculation unit calculates amplitude unevenness indicating the degree of unevenness in the distribution of amplitude values of the acoustic signal of each frame, and
The feature amount extraction unit further includes at least one of the suddenness calculation unit, the time diffusivity calculation unit, the narrowband calculation unit, the band spread calculation unit, the pitch characteristic calculation unit, and the amplitude unevenness calculation unit,
Acoustic feature quantity calculation device.

The acoustic feature quantity calculation device according to claim 1 or 2,
A specific element sound model database storing specific element sound models of a plurality of specific element sounds;
An element sound model comparison unit that compares the feature amount calculated by the acoustic feature quantity calculation device with the specific element sound models stored in the specific element sound model database, and outputs either the label of the most similar specific element sound model or a labeled acoustic signal sequence in which that label is assigned to the acoustic signal sequence;
An element sound histogram generation unit that takes as input the label of the specific element sound model or the labeled acoustic signal sequence output from the element sound model comparison unit, and creates an element sound histogram of the appearance frequency of each label of the specific element sound model within a histogram frame, which is a group of a predetermined number of the frames;
A specific situation modeling unit that takes the element sound histogram as input and generates a specific situation model corresponding to the specific field using a modeling method for the element sound histogram;
A specific situation model database creation device comprising:
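The element sound histogram recited in this claim, namely the appearance frequency of each label within a histogram frame made up of a predetermined number of frames, can be illustrated with a short sketch. The function name, the non-overlapping grouping of frames, and the example labels are assumptions made only for illustration.

```python
# Hypothetical sketch: non-overlapping histogram frames and all names are assumptions.
from collections import Counter
from typing import Dict, List, Sequence


def element_sound_histograms(labels: Sequence[str],
                             frames_per_histogram: int) -> List[Dict[str, int]]:
    """Count label occurrences in consecutive groups of `frames_per_histogram` frames.

    A trailing group with fewer frames than `frames_per_histogram` is dropped.
    """
    histograms = []
    last_start = len(labels) - frames_per_histogram
    for start in range(0, last_start + 1, frames_per_histogram):
        window = labels[start:start + frames_per_histogram]
        histograms.append(dict(Counter(window)))
    return histograms


# One label per frame, as output by an element sound model comparison step.
frame_labels = ["door", "voice", "voice", "step", "step", "step"]
histograms = element_sound_histograms(frame_labels, frames_per_histogram=3)
```

Each resulting dictionary corresponds to one histogram frame and could serve as the input to whatever modeling method is used for the specific situation model.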

The acoustic feature quantity calculation device according to claim 1 or 2,
A specific element sound model database storing specific element sound models of a plurality of specific element sounds;
An element sound model comparison unit that compares the feature amount calculated by the acoustic feature quantity calculation device with the specific element sound models stored in the specific element sound model database, and outputs either the label of the most similar specific element sound model or a labeled acoustic signal sequence in which that label is assigned to the acoustic signal sequence;
An element sound histogram generation unit that takes as input the label of the specific element sound model or the labeled acoustic signal sequence output from the element sound model comparison unit, and creates an element sound histogram of the appearance frequency of each label of the specific element sound model within a histogram frame, which is a group of a predetermined number of the frames;
A distribution clustering processing unit for creating an element sound classification obtained by classifying the plurality of element sound histograms according to the shape of the distribution;
With the element sound classification as an input, a situation classification modeling unit that generates a situation classification model using a modeling method for the element sound classification;
A specific situation model database creation device comprising:

A specific element sound model database creation device for creating the specific element sound model database of the specific situation model database creation device according to claim 3 or 4,
The acoustic feature quantity calculation device according to claim 1 or 2,
A specific element sound modeling unit that generates a specific element sound model using a modeling method for the feature amount, using the feature amount calculated by the acoustic feature amount calculation device;
A specific element sound model database creation device comprising:

A specific element sound model database creation device for creating the specific element sound model database of the specific situation model database creation device according to claim 3 or 4,
The acoustic feature quantity calculation device according to claim 1 or 2,
A feature amount clustering unit that classifies the feature amounts calculated by the acoustic feature amount calculation device and creates a feature amount classification;
An element sound classification modeling unit that generates the element sound classification model using the modeling method for the feature quantity classification, using the feature quantity classification as an input;
A specific element sound model database creation device comprising:

The acoustic feature quantity calculation device according to claim 1 or 2,
A specific element sound model database storing the specific element sound model generated by the specific element sound model database creation device according to claim 5;
An element sound model comparison unit that compares the specific element sound model with the feature amount calculated by the acoustic feature quantity calculation device, determines the closest model to be the element sound of each short-time acoustic signal, and assigns an element sound label to each frame;
An element sound histogram generation unit that takes the labeled acoustic signal sequence as input and creates an element sound histogram of the labels of the specific element sound model and their frequencies;
A specific situation model database storing a plurality of specific situation models and situation classification models generated by the specific situation model database creation device according to claim 3;
A situation determination model comparison unit that compares the element sound histogram with the specific situation model or the situation classification model, estimates the situation represented by the most similar specific situation model or situation classification model, and outputs a situation estimation result;
A situation estimation apparatus comprising:
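The final comparison step of this claim, choosing the stored model most similar to the observed element sound histogram, can be sketched as below. The claim does not specify the model form or the similarity measure, so this sketch assumes each model is a reference label histogram and swaps in histogram intersection on normalized counts as the similarity. All names and example values are hypothetical.

```python
# Hypothetical sketch: models as reference histograms and histogram intersection
# as the similarity measure are assumptions; the claim specifies neither.
from typing import Dict


def normalize(hist: Dict[str, float]) -> Dict[str, float]:
    """Scale counts so they sum to 1; an empty or all-zero histogram becomes {}."""
    total = sum(hist.values())
    return {k: v / total for k, v in hist.items()} if total else {}


def histogram_intersection(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Sum of bin-wise minima; 1.0 for identical normalized histograms, 0.0 for disjoint ones."""
    return sum(min(a.get(k, 0.0), b.get(k, 0.0)) for k in set(a) | set(b))


def estimate_situation(hist: Dict[str, float],
                       models: Dict[str, Dict[str, float]]) -> str:
    """Return the name of the most similar model; raises ValueError if `models` is empty."""
    h = normalize(hist)
    return max(models, key=lambda name: histogram_intersection(h, normalize(models[name])))


# Example models and observation are invented for illustration only.
situation_models = {
    "cooking": {"water": 5, "knife": 5},
    "meeting": {"voice": 9, "paper": 1},
}
estimated = estimate_situation({"voice": 3, "paper": 1}, situation_models)
```

Any other similarity or likelihood computed by the chosen modeling method could replace `histogram_intersection` without changing the surrounding structure.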

The acoustic feature quantity calculation device according to claim 1 or 2,
A call recommendation model storage unit that stores a call recommendation model that correlates the degree of ease of occurrence of a call;
A call recommendation status determination unit that takes as input the feature amount calculated by the acoustic feature quantity calculation device, refers to the call recommendation model whose feature amount matches, determines whether the receiver side is in a situation where calls often occur or a situation where calls rarely occur, and transmits the resulting call suitability notification information to the caller side;
A call suitability notification device comprising:

A frame dividing step in which the frame dividing unit divides the input acoustic signal into frames of a predetermined time length;
A feature amount extraction step including a rising characteristic calculation step in which the feature amount extraction unit divides the acoustic signal of each frame into K sections and, where p̄_k is the average value of an index representing the magnitude of the acoustic signal in the k-th section of each frame, Δp̄_k is the rate of change of p̄_k in the k-th section of each frame, and m is a predetermined integer of 2 or more, calculates the value defined by the following equation, sets that value as the rising characteristic of each frame if the value is 0 or more, and sets the rising characteristic of each frame to 0 if the value is less than 0;

A method for calculating acoustic features including

A program for causing a computer to function as the acoustic feature quantity calculation device according to claim 1 or 2.

A program for causing a computer to function as the situation estimation apparatus according to claim 7.

A program for causing a computer to function as the call suitability notification device according to claim 8.