JP6467915B2

JP6467915B2 - Feature sound extraction method, feature sound extraction device, computer program, distribution system

Info

Publication number: JP6467915B2
Application number: JP2014266102A
Authority: JP
Inventors: 成幸小田嶋; 美和岡林
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2014-12-26
Filing date: 2014-12-26
Publication date: 2019-02-13
Anticipated expiration: 2034-12-26
Also published as: JP2016126479A

Description

The present invention relates to techniques such as extracting characteristic portions from living sounds.

In recent years, services that watch over elderly people by sensing and presenting living sounds, as well as communication services, have been studied. A core technology common to these services is living sound summarization, which lets a user understand the situation without listening to all of the recorded sound. Living sound summarization extracts characteristic portions, such as a door opening or closing or people laughing, from the collected living sounds and presents them.

Conventionally, one technique for extracting characteristic portions of a signal acquires steady-state data in advance and detects anomalies based on deviation from that steady state (for example, Patent Document 1).

Patent Document 1: JP 2004-295861 A

However, living sounds contain multiple characteristic portions with different degrees of distinctiveness, such as the sound of an air-conditioner fan, the sound of running water, and the sound of someone clearing their throat. Therefore, a method that compares the degree of deviation from the steady state with a threshold to decide whether to present a portion either presents every characteristic portion or never presents portions whose distinctiveness is weak.

An object of one aspect is to provide a feature sound extraction method and the like that, even when a living sound may contain multiple characteristic portions, identifies the most characteristic portion and extracts the sound associated with that portion.

In one aspect, a feature sound extraction method executed by a computer divides sound data into segments of a predetermined duration and calculates, for each segment, a feature amount that includes frequency components of the sound data. To one or more component values x of that feature amount, the method applies a function f(x) whose derivative or subderivative with respect to x is monotonically decreasing in the interval x ≥ a_b (a_b ≥ 0), and for which a lower bound T of the function value, as defined by an accompanying expression, exists in the interval 0 ≤ x ≤ a_b. Based on the result, the method obtains the feature amount for each segment and extracts feature sound data based on the per-segment feature amounts.
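The two conditions on f(x) in this aspect can be checked numerically for a candidate function. The following Python sketch is not part of the patent: it approximates the derivative by a central difference, treats "monotonically decreasing" as non-increasing on a sampled grid over [a_b, x_max], and checks that f stays finite on [0, a_b]. The helper names, the grid, and the default a_b = 1 are illustrative assumptions.

```python
import math


def derivative(f, x, h=1e-6):
    """Central-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)


def satisfies_conditions(f, a_b=1.0, x_max=100.0, n=1000):
    """Hypothetical sampled check of the two conditions on f(x).

    1. f' is (weakly) decreasing for x >= a_b.
    2. f has a finite lower bound on 0 <= x <= a_b.
    """
    # Condition 1: derivative non-increasing on the grid [a_b, x_max]
    xs = [a_b + (x_max - a_b) * i / (n - 1) for i in range(n)]
    ds = [derivative(f, x) for x in xs]
    decreasing = all(d2 <= d1 + 1e-9 for d1, d2 in zip(ds, ds[1:]))

    # Condition 2: f finite on [0, a_b]; math.log(0) raises ValueError
    try:
        values = [f(a_b * i / (n - 1)) for i in range(n)]
    except ValueError:
        return False
    bounded = all(math.isfinite(v) for v in values)
    return decreasing and bounded
```

Under these assumptions, `x ** 0.5` and `x ** 0.25` pass, `math.log` fails the lower-bound condition at x = 0, and `x ** 2` fails the decreasing-derivative condition.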

According to one aspect of the method, the most characteristic portion can be extracted from living sounds.

FIG. 1 is a block diagram showing the hardware configuration of a feature sound extraction apparatus.
FIG. 2 is a block diagram showing an example functional configuration of the feature sound extraction apparatus.
FIG. 3 is an explanatory diagram showing an example record layout of the living sound DB.
FIG. 4 is an explanatory diagram showing an example record layout of the sound feature DB.
FIG. 5 is an explanatory diagram showing an example record layout of the sound cluster DB.
FIG. 6 is an explanatory diagram showing an example record layout of the digest display DB.
FIG. 7 is an explanatory diagram showing an example record layout of the feature sound DB.
FIG. 8 is a flowchart showing an example procedure of the main process.
FIG. 9 is a flowchart showing an example procedure of the feature score calculation process.
FIG. 10 is a block diagram showing an example of the feature calculation process.
FIG. 11 is a flowchart showing an example procedure of the feature calculation process.
FIG. 12 is a graph showing an example input/output relationship of the filter.
FIG. 13 is a graph showing examples of feature amounts.
FIG. 14 is a flowchart showing an example procedure of the output suppression process.
FIG. 15 is a flowchart showing an example procedure of the presentation process.
FIG. 16 is an explanatory diagram showing specific examples of the output suppression process and the presentation process.
FIG. 17 is a flowchart showing an example procedure of the presence/absence determination process.
FIG. 18 is an explanatory diagram showing a specific example of the presence/absence determination process.
FIG. 19 is an explanatory diagram showing an example of the digest display.
FIG. 20 is a flowchart showing an example procedure of the occurrence frequency calculation process.
FIG. 21 is a flowchart showing an example procedure of the output suppression process.
FIG. 22 is an explanatory diagram showing a specific example of the output suppression process.
FIG. 23 is an explanatory diagram showing another example of the output suppression process.
FIG. 24 is a graph showing an example input/output relationship of the filter.
FIG. 25 is an explanatory diagram showing an example configuration of the distribution system.

Embodiment 1
Embodiments of the present invention are described below with reference to the drawings. FIG. 1 is a block diagram showing the hardware configuration of the feature sound extraction apparatus 1. The feature sound extraction apparatus 1 is a general-purpose computer, a workstation, a desktop PC (personal computer), a notebook PC, or the like. It includes a CPU (Central Processing Unit) 11, a RAM (Random Access Memory) 12, a ROM (Read Only Memory) 13, a mass storage device 14, an input unit 15, an output unit 16, a communication unit 17 (transmission unit), and a reading unit 18. These components are connected by a bus.

The CPU 11 controls each hardware component according to the control program 1P stored in the ROM 13. The RAM 12 is, for example, SRAM (Static RAM), DRAM (Dynamic RAM), or flash memory, and temporarily stores data generated while the CPU 11 executes the program.

The mass storage device 14 is, for example, a hard disk or an SSD (Solid State Drive). It stores the various databases described later. The control program 1P may also be stored in the mass storage device 14.

The input unit 15 includes a keyboard, a mouse, and the like for entering data into the feature sound extraction apparatus 1. A microphone 15a, for example, is also connected to collect living sounds; the living sounds collected by the microphone 15a are converted into electrical signals and input to the input unit 15.
In this specification, "sound" is not limited to sound in the narrow sense, that is, vibration in air picked up by a microphone. It is a broad concept that also covers, for example, vibration propagating through air, solids, or liquids as measured by a device such as a microphone, a piezoelectric element, or a laser micro-displacement meter.

The output unit 16 outputs images from the feature sound extraction apparatus 1 to the display device 16a and outputs audio to a speaker or the like.

The communication unit 17 communicates with other computers via a network. The reading unit 18 reads a portable storage medium 1a such as a CD (Compact Disc)-ROM or a DVD (Digital Versatile Disc)-ROM. The CPU 11 may read the control program 1P from the portable storage medium 1a via the reading unit 18 and store it in the mass storage device 14. Alternatively, the CPU 11 may download the control program 1P from another computer via the network and store it in the mass storage device 14, or read the control program 1P from a semiconductor memory 1b.

FIG. 2 is a block diagram showing an example functional configuration of the feature sound extraction apparatus 1. Each functional unit is realized by the control program 1P stored in the mass storage device 14 operating in cooperation with hardware resources such as the CPU 11 and the RAM 12.

The feature sound extraction apparatus 1 includes an input unit 110, a feature score calculation unit 120, a clustering unit 130, a digest display generation unit 140, an output suppression unit 150, and a presentation unit 160, as well as a living sound DB 14a, a sound feature DB 14b, a sound cluster DB 14c, a digest display DB 14d, and a feature sound DB 14e.

The input unit 110 includes a living sound input unit 111 and is the functional unit corresponding to the hardware input unit 15. The feature score calculation unit 120 includes a sound feature calculation unit 121 (calculation unit, filter unit) and a sound cluster matching/score calculation unit 122 (feature amount calculation unit). The clustering unit 130 includes a clustering processing unit 131 and a cluster occurrence frequency calculation unit 132. The digest display generation unit 140 includes a presence/absence determination unit 141 (counting unit, determination unit). The output suppression unit 150 includes a feature location output suppression unit 151 (extraction unit). The presentation unit 160 includes a GUI display unit 161 and an audio presentation unit 162.

The living sound DB 14a, the sound feature DB 14b, the sound cluster DB 14c, the digest display DB 14d, and the feature sound DB 14e are stored in the mass storage device 14.

FIG. 3 is an explanatory diagram showing an example record layout of the living sound DB 14a. The living sound DB 14a has a time stamp column and a sound file name column. The time stamp column stores the time at which the living sound was acquired, for example the start or end time of the living sound stored in the sound file. The sound file name column stores the name of that file.

FIG. 4 is an explanatory diagram showing an example record layout of the sound feature DB 14b. The sound feature DB 14b has a time stamp column and a feature amount column. The time stamp column stores the time stamp of the sound data, and the feature amount column stores the value of its feature amount.

FIG. 5 is an explanatory diagram showing an example record layout of the sound cluster DB 14c. The sound cluster DB 14c has columns for cluster ID, feature amount, and occurrence frequency. The cluster ID identifies each cluster. The feature amount stores a representative value of each cluster, such as its center coordinates or the median of the data it contains. The occurrence frequency stores how often each cluster occurs.

FIG. 6 is an explanatory diagram showing an example record layout of the digest display DB 14d. The digest display DB 14d has columns for start time, end time, and digest ID. The start and end times delimit a time period in which a person was present or absent, and the digest ID indicates which: for example, ID = 1 indicates presence and ID = 0 indicates absence.

FIG. 7 is an explanatory diagram showing an example record layout of the feature sound DB 14e. The feature sound DB 14e has columns for time stamp, score, and cluster ID. The time stamp stores the date and time at which the sound data was observed, the score stores the occurrence frequency of the sound data, and the cluster ID stores the ID of the cluster to which the feature sound belongs.

Next, an overview of the operation of the feature sound extraction apparatus 1 is given. The living sound input unit 111 stores the sound collected by the microphone 15a in the living sound DB 14a as data (sound data). The audio data may be stored in an uncompressed format such as WAV (RIFF Waveform Audio Format) or AIFF (Audio Interchange File Format), or in a compressed format such as MP3 (MPEG-1 Audio Layer-3) or WMA (Windows Media (registered trademark) Audio). The living sound input unit 111 also passes the sound data to the sound feature calculation unit 121.

The sound feature calculation unit 121 divides the audio data with a time window and calculates a feature amount for each resulting segment. The calculated feature amounts are stored in the sound feature DB 14b. The clustering processing unit 131 clusters the feature amounts stored in the sound feature DB 14b at predetermined intervals, each time the sound feature DB 14b is updated, or at similar timings. The cluster occurrence frequency calculation unit 132 calculates the occurrence frequency of each cluster and stores it in the sound cluster DB 14c. The sound feature calculation unit 121 also passes the calculated feature amounts to the sound cluster matching/score calculation unit 122.

The sound cluster matching/score calculation unit 122 matches the feature amount received from the sound feature calculation unit 121 against the feature amount of each cluster stored in the sound cluster DB 14c and determines the cluster to which the sound being processed should belong. The occurrence frequency of that cluster may be used, for example, as the occurrence frequency of the segmented audio data; alternatively, a weighted sum of the occurrence frequencies of the clusters near the feature amount may be used. As the result of this processing, the unit stores, for each segment of audio data, the ID of the cluster it belongs to, the occurrence frequency, and the segment's time stamp in the feature sound DB 14e.
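A minimal sketch of the matching step follows. It assumes Euclidean distance to each cluster's representative feature, occurrence frequencies stored as relative frequencies, Gaussian weights for the weighted-sum variant, and a score of 1/frequency (one way to realize a score inversely proportional to occurrence frequency). None of these choices, nor the function names, come from the patent.

```python
import numpy as np


def match_cluster(feature, centroids, cluster_ids, frequencies):
    """Return (cluster ID, occurrence frequency) of the nearest cluster."""
    dists = np.linalg.norm(centroids - feature, axis=1)  # Euclidean distance (assumed)
    idx = int(np.argmin(dists))
    return cluster_ids[idx], float(frequencies[idx])


def weighted_frequency(feature, centroids, frequencies, sigma=1.0):
    """Alternative: weighted sum of nearby clusters' frequencies (Gaussian weights assumed)."""
    d2 = np.sum((centroids - feature) ** 2, axis=1)
    w = np.exp(-(d2 - d2.min()) / (2 * sigma ** 2))  # shift so the nearest weight is 1
    w /= w.sum()                                      # normalize weights to sum to 1
    return float(np.dot(w, frequencies))


def rarity_score(frequency, eps=1e-9):
    """Score inversely proportional to occurrence frequency (rarer -> higher)."""
    return 1.0 / (frequency + eps)
```

The shift by the minimum squared distance keeps the largest weight at 1, so the normalization never divides by zero even when every cluster is far away.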

The feature location output suppression unit 151 takes a fixed span of data, for example 30 minutes, from the data stored in the feature sound DB 14e, identifies the sound data with the lowest occurrence frequency, and outputs its time stamp.

The presence/absence determination unit 141 takes a fixed span of data from the temporary storage area described above and uses it as input data. It also creates a background sound table (not shown) from the sound cluster DB 14c. The background sound table assigns each sound cluster a type, either background sound or non-background sound, determined from the cluster's occurrence frequency. The presence/absence determination unit 141 counts the occurrences of non-background sounds in the input data; if the count exceeds a predetermined threshold, it determines that a person was present, and if the count is at or below the threshold, it determines that no one was present. The determination result is stored in the digest display DB 14d.
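The presence/absence rule can be sketched as follows. The text only says that background and non-background sounds are distinguished by occurrence frequency; the specific rule used here (clusters at or above a frequency cutoff `bg_min_freq` are background), the function names, and the thresholds are assumptions. The return value follows the digest ID convention of the digest display DB 14d (1 = present, 0 = absent).

```python
def label_background(cluster_freq, bg_min_freq):
    """Mark clusters whose occurrence frequency is at least bg_min_freq as background (rule assumed)."""
    return {cid: freq >= bg_min_freq for cid, freq in cluster_freq.items()}


def presence_digest_id(window_cluster_ids, is_background, count_threshold):
    """Return 1 (present) if non-background sounds occur more than count_threshold times, else 0 (absent)."""
    n_foreground = sum(1 for cid in window_cluster_ids if not is_background[cid])
    return 1 if n_foreground > count_threshold else 0
```

For example, with cluster frequencies {0: 0.6, 1: 0.3, 2: 0.05, 3: 0.05} and a cutoff of 0.2, clusters 2 and 3 are non-background; a window with cluster IDs [0, 0, 2, 1, 3, 2] contains three non-background sounds, so it is judged "present" for a threshold of 2 and "absent" for a threshold of 3.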

The audio presentation unit 162 presents only those sound data output by the feature location output suppression unit 151 whose score exceeds a predetermined threshold. The score here may be, for example, inversely proportional to the occurrence frequency.

Based on the digest display DB 14d, the GUI display unit 161 causes the display device 16a to show a screen indicating the presence/absence determination results.

Next, the operation of the feature sound extraction apparatus 1 is described in detail. FIG. 8 is a flowchart showing an example procedure of the main process. The CPU 11 of the feature sound extraction apparatus 1 first performs the feature score calculation process (step S1), then the output suppression process (step S2), and finally the presentation process (step S3).

FIG. 9 is a flowchart showing an example procedure of the feature score calculation process. The CPU 11 first calculates feature amounts (step S11). FIG. 10 is a block diagram outlining an example of the feature amount calculation process. The process may perform, in order, high-frequency emphasis, FFT (Fast Fourier Transform), noise removal, mel filtering, and raising the spectral components to a power. A distinctive point of this embodiment is that the power operation is applied to the spectral components after a feature amount containing frequency components (for example, a mel spectrum) has been calculated from the sound data. The details of the feature amount calculation are described with reference to FIG. 11. The following example uses a mel spectrum as the feature amount containing frequency components calculated from the sound data.

FIG. 11 is a flowchart showing an example procedure of the feature amount calculation process. The CPU 11 divides the input sound data into segments of a predetermined duration and processes each segment: it sets a time window, processes the data within the window, and when done, shifts the window and repeats the same processing. The CPU 11 acquires the data within the time window (step S21) and applies high-frequency emphasis (step S22), FFT (step S23), amplitude calculation (step S24), noise removal (step S25), and mel spectrum extraction (step S26) to it. These are known techniques, so their description is omitted.

Next, the CPU 11 applies a filter to the mel spectrum obtained in step S26 (step S27), outputs the filtered mel spectrum as the feature amount (step S28), and returns to the caller.
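Steps S21 to S28 can be sketched in Python with NumPy as below. This is an illustrative sketch, not the patent's implementation: the frame length, hop size, Hann window, pre-emphasis coefficient 0.97, simple spectral subtraction for noise removal, the number of mel bands, and the common 2595·log10(1 + f/700) mel scale are all assumptions, since the text treats steps S22 to S26 as known techniques.

```python
import numpy as np


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, sr):
    """Triangular filters spaced evenly on the mel scale (standard construction)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):
            fb[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fb[m - 1, k] = (right - k) / max(right - center, 1)
    return fb


def feature_frames(signal, sr, frame_len=512, hop=256, n_mels=24, p=0.5,
                   pre_emph=0.97, noise_floor=None):
    """Return one power-filtered mel spectrum per time window (all parameters assumed)."""
    # S22: high-frequency emphasis (first-order pre-emphasis)
    emphasized = np.append(signal[0], signal[1:] - pre_emph * signal[:-1])
    fb = mel_filterbank(n_mels, frame_len, sr)
    window = np.hanning(frame_len)
    feats = []
    # S21: slide the time window over the signal
    for start in range(0, len(emphasized) - frame_len + 1, hop):
        frame = emphasized[start:start + frame_len] * window
        amplitude = np.abs(np.fft.rfft(frame))     # S23 FFT, S24 amplitude
        if noise_floor is not None:                # S25 noise removal (spectral subtraction, assumed)
            amplitude = np.maximum(amplitude - noise_floor, 0.0)
        mel = fb @ amplitude                       # S26 mel spectrum (non-negative)
        feats.append(mel ** p)                     # S27 power filter with p < 1
    return np.array(feats)                         # S28 feature amount per window
```

For a 1-second, 16 kHz, 1000 Hz sine wave, this yields 61 windows of 24 non-negative values, and every window peaks in the same mel band.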

Here, the filter is a power function whose exponent p is less than 1, for example:

$$f(x) = x^{p}, \quad p < 1$$

FIG. 12 is a graph showing an example input/output relationship of the filter. The horizontal axis is the input and the vertical axis is the output; both are dimensionless. Graph f1, with p = 1, is shown for reference only and is not used as a filter. Graph f2 corresponds to p = 0.5 and graph f3 to p = 0.25. As FIG. 12 shows, a power filter with an exponent less than 1 compresses inputs of 1 or more, and its output is guaranteed to be 0 or more even for inputs of 1 or less. This solves two problems: with raw mel-spectrum features, small differences in volume produce large differences in the shape of the feature amount; and with a log filter, the output becomes negative for inputs below 1 and diverges as the input approaches 0. Furthermore, because the feature amount reflects volume and frequency components at the same time, they need not be handled by separate processes, which simplifies processing.
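The behavior shown in FIG. 12 can be reproduced with a few lines of NumPy. This is a sketch: the function name and the extra requirement p > 0 are assumptions (the text states only p < 1). It contrasts the power filter with a natural-log filter on the same inputs; scaling an input by 100 scales the p = 0.5 output by only 10, which is the volume compression the text describes.

```python
import numpy as np


def power_filter(x, p=0.5):
    """Apply f(x) = x**p elementwise to non-negative spectral values.

    The text requires p < 1; requiring p > 0 as well is an assumption here,
    so that f(0) = 0 stays finite.
    """
    if not 0 < p < 1:
        raise ValueError("p must satisfy 0 < p < 1")
    return np.power(np.asarray(x, dtype=float), p)


mel = np.array([0.0, 0.01, 1.0, 100.0])
filtered = power_filter(mel, 0.5)   # finite and non-negative for every input
with np.errstate(divide="ignore"):
    logged = np.log(mel)            # -inf at 0 and negative below 1
```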

FIG. 13 is a graph showing examples of feature amounts. The horizontal axis is frequency in kHz, and the vertical axis is the dimensionless spectral value. FIG. 13A shows the feature amount obtained from a person clearing their throat, and FIG. 13B that obtained from fan noise. Comparing the two shows that their feature shapes (spectral values) differ greatly, so this feature amount is well suited to distinguishing non-background sounds (throat clearing) from background sounds (fan noise).

Returning to FIG. 9, the CPU 11 stores the sound feature (feature amount) obtained by the feature amount calculation in the sound feature DB 14b (step S12). The CPU 11 then matches this feature amount against the feature amount of each cluster stored in the sound cluster DB 14c (step S13), outputs the ID and occurrence frequency of the matched sound cluster (step S14), and returns to the caller. Returning to FIG. 8, the CPU 11 performs the output suppression process (step S2).

The output suppression process is described next. In Embodiment 1, it is assumed to be a batch process applied to sound data already accumulated over a fixed period. Each sound data item to be processed includes a sound cluster ID, an occurrence frequency score, and a time stamp. FIG. 14 is a flowchart showing an example procedure of the output suppression process. The CPU 11 sets the suppression flag of every sound data item to False and initializes the feature location list F to the empty set (step S31). The suppression flag indicates whether the corresponding sound is suppressed: True means the output is suppressed and the sound is not output, and False means the sound is output. The feature location list F lists the locations that contain feature sounds; a feature location is, for example, the time stamp of a feature sound.

The CPU 11 refers to the sound cluster DB 14c and sorts the sound clusters in ascending order of occurrence frequency (step S32), storing the resulting sequence of IDs in the RAM 12 or elsewhere as a sound cluster list. The CPU 11 selects the first not-yet-processed cluster in the sound cluster list as the cluster to process (step S33), then collects the sound data whose cluster ID matches the selected cluster from the data being processed and stores them in a list L (step S34).

The CPU 11 takes the next sound data item s from the head of list L (step S35) and determines whether the suppression flag of s is False and its occurrence frequency score exceeds a predetermined threshold (step S36). If so (YES in step S36), the CPU 11 adds the time stamp of s to the feature location list F (step S37) and sets the suppression flags of the sound data within a fixed time before and after s to True (step S38). If the suppression flag is True or the score is at or below the threshold (NO in step S36), the process proceeds directly to step S39. In step S39, the CPU 11 determines whether list L contains unprocessed sound data; if it does (YES in step S39), the process returns to step S35. If it does not (NO in step S39), the CPU 11 determines whether the sound cluster DB 14c contains an unprocessed cluster (step S40), and if it does (YES in step S40), the process returns to step S33.
If there is no unprocessed cluster (NO in step S40), the CPU 11 outputs the feature location list (step S41), ends the output suppression process, and returns to the caller. Returning again to FIG. 8, the CPU 11 executes the presentation process (step S3) and ends the main process.
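Steps S31 to S41 can be sketched in Python as below. The data class, the field names, the use of period indices as time stamps, and reading "a fixed time before and after s" as `|t - t_s| <= window` are assumptions for illustration. The occurrence frequency score is assumed to be higher for rarer sounds, as in the FIG. 16 example.

```python
from dataclasses import dataclass


@dataclass
class SoundFrame:
    timestamp: float         # time stamp (here simply a period index, assumed)
    cluster_id: int          # matched sound cluster ID
    score: float             # occurrence frequency score (higher = rarer, assumed)
    suppressed: bool = False


def suppress_output(frames, cluster_freq, threshold, window):
    """Return the feature location list F (time stamps) following FIG. 14."""
    for fr in frames:                       # S31: clear all suppression flags
        fr.suppressed = False
    features = []                           # S31: F = empty
    # S32-S33: visit clusters from the lowest occurrence frequency upward
    for cid in sorted(cluster_freq, key=cluster_freq.get):
        members = [fr for fr in frames if fr.cluster_id == cid]   # S34: list L
        for s in members:                                         # S35
            if not s.suppressed and s.score > threshold:          # S36
                features.append(s.timestamp)                      # S37
                for other in frames:                              # S38: suppress neighbours
                    if abs(other.timestamp - s.timestamp) <= window:
                        other.suppressed = True
    return features                                               # S41
```

Using the left-hand data of FIG. 16 and assuming K4 belongs to a rare cluster (score 0.5, other periods 0.1), `window = 2` suppresses K2 to K6 around K4. With `threshold = 0.2` only period 4 is returned; lowering it to 0.05 returns `[4, 1, 7]`, because K1 and K7 lie outside that suppressed neighbourhood.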

Next, the presentation process will be described. FIG. 15 is a flowchart illustrating an example of the procedure of the presentation process. Based on the feature location list obtained by the output suppression process, the CPU 11 of the feature sound extraction device 1 acquires the sound data corresponding to a feature location (feature sound data) from the feature sound DB 14e (step S51). The CPU 11 determines whether the score of the acquired feature sound data exceeds a threshold Ti (step S52). The threshold Ti may be set in advance or when the presentation process is executed; it should be small when many feature sounds are to be reviewed and large otherwise. If the score exceeds the threshold (YES in step S52), the CPU 11 reproduces the feature sound (step S53). If the score is equal to or less than the threshold (NO in step S52), the process proceeds to step S54. The CPU 11 then determines whether unprocessed feature sound data remains (step S54). If it does (YES in step S54), the process returns to step S51; if it does not (NO in step S54), the CPU 11 ends the presentation process and returns to the caller.
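Steps S51 to S54 reduce to a filter over the feature sound data followed by playback. A minimal sketch, assuming each feature sound is a dict with hypothetical `id` and `score` keys and that playback is delegated to a caller-supplied `play` callback (no real audio API is implied):

```python
def present(feature_sounds, ti, play):
    """Steps S51-S54: reproduce only the feature sounds whose score exceeds Ti."""
    played = []
    for sound in feature_sounds:        # S51, repeated while data remains (S54)
        if sound["score"] > ti:         # S52
            play(sound)                 # S53: hand the sound to the player callback
            played.append(sound["id"])
    return played

sounds = [{"id": "door", "score": 0.8}, {"id": "fan", "score": 0.1},
          {"id": "cough", "score": 0.5}]
print(present(sounds, ti=0.3, play=lambda s: None))  # ['door', 'cough']
```

Raising `ti` to 0.6 would leave only the door sound, which matches the guidance that a larger Ti presents fewer feature sounds.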

Next, specific examples of the output suppression process and the presentation process will be described. FIG. 16 is an explanatory diagram illustrating such examples. FIG. 16 shows the waveform of an audio signal, with time on the horizontal axis and amplitude on the vertical axis. The rectangles overlapping the waveform indicate the periods delimited by the time window, and the value above each period is its feature degree score. FIG. 16 shows two examples. In each, seven periods delimited by the time window are shown (K1 to K7 and K11 to K17), and the output suppression process is applied over five of them. The feature degree score in FIG. 16 is inversely proportional to the occurrence frequency, so it is larger for rarer sounds. In the left example, the score of period K4 is 0.5, and the scores of the other periods K1 to K3 and K5 to K7 are all 0.1. Among the periods K2 to K6 subject to output suppression, K4 has the largest score (0.5), so the sound of period K4 is presented. In the right example, the score of period K15 is 1.0, and the scores of the other periods K11 to K14, K16, and K17 are all 0.5. Among the periods K12 to K16 subject to output suppression, K15 has the largest score (1.0), so the sound of period K15 is presented. Because the output suppression process judges feature sounds relative to the other periods in the same range, a sound with a score of 0.5 is presented in one case (the left example) but not in another (the right example). In other words, only a small number of characteristic portions are extracted, which avoids the all-or-nothing outcome of a uniform threshold decision, in which either every candidate or no candidate is output.

Next, the determination of whether a person is present or absent will be described. FIG. 17 is a flowchart illustrating an example of the procedure of the presence/absence determination process. The CPU 11 of the feature sound extraction device 1 acquires, from the feature sound DB 14e, the sound data extracted within a fixed period in the past (step S61). The CPU 11 counts the number C of acquired sound data items that are not background sounds (step S62) and determines whether C exceeds a threshold (step S63). If C exceeds the threshold (YES in step S63), the CPU 11 sets the return value to "present" (step S64); if C is equal to or less than the threshold (NO in step S63), it sets the return value to "absent" (step S65). The CPU 11 then ends the presence/absence determination process. By repeating this process an appropriate number of times, the CPU 11 creates the digest display DB 14d. The background sound table is assumed to have been created before the presence/absence determination process is executed.

FIG. 18 is an explanatory diagram showing a specific example of the presence/absence determination process. The waveforms and other elements in FIG. 18 are drawn in the same manner as in FIG. 16. The left side of FIG. 18 shows example sound data, and the right side conceptually represents the background sound table. The sound data covers ten periods, K21 to K30, and contains three kinds of sound: coughing, television, and a fan. According to the background sound table, coughing and television sounds are non-background sounds, and the fan sound is a background sound. In FIG. 18, periods K21, K22, and K24 contain the fan sound; periods K23 and K29 contain coughing; and periods K25 to K28 and K30 contain the television sound. There are therefore seven non-background sounds (circled in FIG. 18) and three background sounds (marked with crosses). If the threshold is 6, for example, C = 7 exceeds it, and a person is determined to have been present during periods K21 to K30. The sound data and background sound definitions in FIG. 18 are only examples; depending on the environment, other sounds may be defined as background sounds.
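Steps S62 to S65 can be checked against the FIG. 18 example. The sketch below assumes each period is represented by a sound type label (hypothetical strings such as `"fan"`) and that the background sound table is a set of labels; it is an illustration, not the device's data format.

```python
def judge_presence(labels, background, threshold):
    """Steps S62-S65: count non-background sounds and compare with a threshold."""
    c = sum(1 for label in labels if label not in background)  # S62: count C
    return "present" if c > threshold else "absent"            # S63-S65

# Sound types of periods K21 to K30 as listed for FIG. 18.
fig18 = ["fan", "fan", "cough", "fan", "tv", "tv", "tv", "tv", "cough", "tv"]
print(judge_presence(fig18, background={"fan"}, threshold=6))  # present
```

The comparison is strict, as in step S63: with a threshold of 7, C = 7 no longer exceeds it and the result becomes "absent".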

FIG. 19 is an explanatory diagram showing an example of the digest display, which covers the past 24 hours. In the example of FIG. 19, hatched periods represent absence periods 19a, and unhatched periods represent presence periods 19b. Within the presence periods 19b, arrows mark the points 19c at which feature sounds occurred, and arrow 19d indicates the current time.
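To draw hatched absence periods 19a and unhatched presence periods 19b, the repeated presence/absence results must be merged into contiguous intervals. The section does not say how the digest display DB 14d stores this, so the following is only a sketch under an assumed layout: one boolean result per fixed-length time slot (`slot_minutes` is a hypothetical parameter), run-length encoded into intervals.

```python
from itertools import groupby

def to_intervals(flags, slot_minutes):
    """Collapse per-slot presence flags into (start_min, end_min, present) intervals."""
    intervals = []
    start = 0
    for present, run in groupby(flags):       # consecutive equal flags form one run
        length = sum(1 for _ in run) * slot_minutes
        intervals.append((start, start + length, present))
        start += length
    return intervals

print(to_intervals([False, False, True, True, True, False], slot_minutes=60))
# [(0, 120, False), (120, 300, True), (300, 360, False)]
```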

As described above, the presence/absence determination process makes it possible to show relatively long-term trends together with detailed information about individual moments.

FIG. 20 is a flowchart illustrating an example of the procedure of the occurrence frequency calculation process. This process is performed by the clustering unit 130 and updates the sound cluster DB 14c. The CPU 11 acquires feature quantity data from the sound feature DB 14b (step S71) and clusters it (step S72). The clustering may use, for example, a hard clustering method with a fixed number of clusters, such as k-means; a hierarchical clustering method, such as Ward's method; a hard clustering method that determines the number of clusters dynamically, such as DP-means; or a soft clustering method, such as one based on optimizing a Gaussian mixture model with the EM algorithm or one based on a hierarchical Bayesian model estimated by Markov chain Monte Carlo.

The CPU 11 calculates the occurrence frequency of each obtained cluster (step S73) and stores the processing results, namely the cluster ID, the occurrence frequency of the cluster, and the feature quantity (for example, the coordinates of the cluster center), in the sound cluster DB 14c (step S74). The CPU 11 then ends the occurrence frequency calculation process.
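Steps S72 to S74 can be sketched with k-means, the first option listed above, written in plain Python so the example is self-contained. Assumptions: the centers start at the first k points (chosen only to make the example deterministic), the score is taken as the reciprocal of the occurrence frequency (the text only says it is inversely proportional), and the record field names are hypothetical.

```python
import math

def kmeans(points, k, iters=100):
    """Plain k-means; centers start at the first k points (deterministic, for illustration)."""
    centers = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        new_labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        # Update step: each center moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centers

def frequency_table(points, k):
    """Steps S73-S74: occurrence frequency and a rarity score per cluster."""
    labels, centers = kmeans(points, k)
    table = []
    for j in range(k):
        freq = labels.count(j) / len(points)
        score = 1.0 / freq if freq > 0 else math.inf  # assumed: score = 1 / frequency
        table.append({"cluster_id": j, "frequency": freq, "score": score, "center": centers[j]})
    return labels, table

pts = [(0.0, 0.0), (5.0, 5.0), (0.1, 0.0), (0.0, 0.1), (5.1, 5.0)]
labels, table = frequency_table(pts, k=2)
print(labels)  # [0, 1, 0, 0, 1]
```

The rarer cluster (two of five windows) receives the higher score, which is the property the output suppression and presentation processes rely on.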

The occurrence frequency calculation process may be executed each time a predetermined amount of sound data has been obtained, or each time a feature quantity has been calculated for a time window.

As described above, in the first embodiment, adopting a power function with an exponent of 1 or less as the filter solves two problems: with mel-spectrum features, small differences in volume change the shape of the feature quantity significantly, and with a log filter, the value diverges for small inputs of 1 or less. In addition, the output suppression process extracts only a small number of characteristic portions, avoiding the all-or-nothing outcome of a uniform threshold decision, in which either every candidate or no candidate is output. That is, the most characteristic portions can be extracted from living sounds. Furthermore, the presence/absence determination process makes it possible to show relatively long-term trends together with detailed information about individual moments.

Embodiment 2
In the second embodiment, the output suppression process is performed online. The configuration of the feature sound extraction device 1 in the second embodiment is the same as in the first embodiment, so its description is omitted. The processing performed by the feature sound extraction device 1 is also the same as in the first embodiment except for the output suppression process, so the following description focuses mainly on the differences from the first embodiment.

FIG. 21 is a flowchart illustrating an example of the procedure of the output suppression process in this embodiment. The CPU 11 of the feature sound extraction device 1 acquires input sound information R, which describes the sound in the newly input time window (step S81). The input sound information R includes a sound cluster ID, an occurrence frequency score, and a time stamp. The CPU 11 takes out the element at the index position of the ring buffer and sets it in a structure E (step S82). The CPU 11 then determines whether E is the same as the maximum value element (step S83). The maximum value element may be, for example, the element with the largest score, where the score is inversely proportional to the occurrence frequency.

If E is the same as the maximum value element (YES in step S83), the CPU 11 stores E in the feature sound DB 14e (step S84) and clears the maximum value element, that is, sets it to NULL (step S85). The CPU 11 then registers the input sound information R in the ring buffer (step S86). If E is not the same as the maximum value element (NO in step S83), the process proceeds directly to step S86.

The CPU 11 determines whether the maximum value element has been cleared or the score of the maximum value element is larger than the score of the input sound information R (step S87). If either condition holds (YES in step S87), the CPU 11 sets R as the maximum value element (step S88) and ends the output suppression process. If the maximum value element is not NULL and its score is equal to or less than the score of R (NO in step S87), the CPU 11 ends the output suppression process.
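Steps S81 to S88 can be sketched as a fixed-size ring buffer that emits the maximum value element when it is about to be overwritten. Assumptions in this sketch: the index advances by one slot after each registration (not stated in the text), "the same as" means the identical stored entry, and `feature_db` stands in for the feature sound DB 14e; class and field names are hypothetical. Step S87 as worded sets R as the new maximum when the current maximum's score is the larger one; the sketch instead replaces the maximum when R's score is larger, which is the reading under which the element stays the highest-scoring entry, so treat that comparison as an interpretation.

```python
from dataclasses import dataclass

@dataclass
class SoundInfo:
    score: float      # occurrence frequency score
    cluster_id: str
    timestamp: str

class OnlineSuppressor:
    def __init__(self, size):
        self.buffer = [None] * size
        self.index = 0
        self.max_elem = None
        self.feature_db = []  # stands in for feature sound DB 14e

    def push(self, r):
        e = self.buffer[self.index]                        # S82
        if e is not None and e is self.max_elem:          # S83
            self.feature_db.append(e)                      # S84
            self.max_elem = None                           # S85
        self.buffer[self.index] = r                        # S86
        self.index = (self.index + 1) % len(self.buffer)   # assumed: index advances
        # S87/S88 (interpreted): R becomes the maximum if cleared or R scores higher.
        if self.max_elem is None or r.score > self.max_elem.score:
            self.max_elem = r

s = OnlineSuppressor(size=3)
# The 12:31 and 12:32 entries are hypothetical fillers; 12:30 and 12:33 follow FIG. 22.
for info in [SoundInfo(2.0, "02", "12:30"), SoundInfo(0.5, "01", "12:31"),
             SoundInfo(0.5, "01", "12:32"), SoundInfo(1.0, "01", "12:33")]:
    s.push(info)
print([e.timestamp for e in s.feature_db], s.max_elem.timestamp)  # ['12:30'] 12:33
```

As in FIG. 22, the maximum value element (score 2.0, 12:30) is output when its slot is reused, and the maximum is then reset to the new input R.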

In the presentation process, among the data stored in the ring buffer, if exactly one entry has the same score as the maximum value element and that entry has not yet been presented, the sound data corresponding to it is reproduced.

FIG. 22 is an explanatory diagram showing a specific example of the output suppression process. The waveform and other elements in the upper part of FIG. 22 are drawn in the same manner as in FIG. 16. The middle and lower parts of FIG. 22 show the state of the ring buffer Ri. In the example of FIG. 22, the ring buffer Ri consists of three slots, R1, R2, and R3. The upward arrow indicates the index position, and the values arranged vertically in each slot are, from top to bottom, the score, the cluster ID, and the time stamp. FIG. 22 shows the case in which the data for period K36 is input as the input sound information R: its score S is 1.0, its sound cluster ID is 01, and its time stamp is 12:33. The maximum value element has a score S of 2.0, a sound cluster ID of 02, and a time stamp of 12:30. As shown in the middle part of FIG. 22, the element at index I of the ring buffer Ri is the same as the maximum value element, so the maximum value element is first cleared and then updated to the values of the input sound information R (see the lower part of FIG. 22).

In the second embodiment, performing the process online allows feature sounds to be presented almost in real time.

Embodiment 3
In the third embodiment, the output suppression process differs from that of the first embodiment. The configuration of the feature sound extraction device 1 in the third embodiment is the same as in the first embodiment, so its description is omitted. The processing performed by the feature sound extraction device 1 is also the same as in the first embodiment except for the output suppression process, so the following description focuses mainly on the differences from the first embodiment.

FIG. 23 is an explanatory diagram showing another example of the output suppression process. The waveform and other elements in FIG. 23 are drawn in the same manner as in FIG. 16. In the third embodiment, rather than simply selecting the single location with the maximum score, the feature section is selected so that the sum of the evaluation scores within it is maximized. In the example of FIG. 23, three periods form one group, and the sum of the evaluation scores is calculated for each group. As shown on the right side of FIG. 23, the sound data in this example is of three types, coughing, television, and a fan, with evaluation scores of 2.0, 1.0, and 0.0, respectively. Group g1 contains one coughing sound, no television sound, and one fan sound, so the sum S(W) of its evaluation scores is calculated as follows:
S(W) = 2.0 × 1 + 1.0 × 0 + 0.0 × 1 = 2.0

Calculating in the same way gives an evaluation score of 1.0 for group g2 and 3.0 for group g3. Therefore, group g3 becomes the presentation candidate.
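The group scoring of FIG. 23 can be sketched as follows. Assumptions: groups are consecutive, non-overlapping runs of `group_size` periods, and the multipliers 1 and 0 in the S(W) calculation are read as presence indicators (whether a sound type occurs in the group), which matches the stated aim of presenting as many sound types as possible. The period sequence below is illustrative, not the data of FIG. 23; it was chosen so that the group sums match the values in the text (2.0, 1.0, 3.0).

```python
def best_group(labels, type_scores, group_size):
    """Sum the evaluation scores of the sound types present in each group; pick the largest."""
    sums = []
    for start in range(0, len(labels) - group_size + 1, group_size):
        present = set(labels[start:start + group_size])   # each type counted once per group
        sums.append(sum(type_scores[t] for t in present))
    best = max(range(len(sums)), key=sums.__getitem__)
    return best, sums

scores = {"cough": 2.0, "tv": 1.0, "fan": 0.0}
periods = ["cough", "fan", "fan", "tv", "fan", "fan", "cough", "tv", "fan"]
print(best_group(periods, scores, group_size=3))  # (2, [2.0, 1.0, 3.0])
```

Index 2 corresponds to group g3, the presentation candidate. Sliding (overlapping) windows would be an equally plausible reading; only the `range` step would change.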

In the third embodiment, multiple locations are grouped and the output suppression process is performed per group, so the presented sounds can include as many types of sound as possible.

Examples of filters other than the power filter
The filter described above is not limited to a power filter; any function satisfying the following two requirements can be used. The first requirement is that the function takes its minimum value T at or below some threshold $a_b \ge 0$; this prevents the value from diverging when x is small. The second requirement is that, in the range above the threshold $a_b$, the derivative (or subderivative) with respect to x is monotonically decreasing; this reduces the influence of loud sounds.

FIG. 24 shows graphs of examples of the input/output relationship of the filter. The horizontal axis is the input and the vertical axis is the output; both are dimensionless. FIG. 24A shows a power function with an exponent of 1 or less. FIG. 24B shows a filter function whose minimum value is not 0. FIG. 24C shows a filter function whose minimum is not at x = 0. FIG. 24D shows a filter function whose value fluctuates greatly in places. Since all of these filter functions satisfy the above requirements, any of the filters shown in FIG. 24 may be used instead of the power filter in the first to third embodiments. The filters shown in FIG. 24 are only examples, and the filter is not limited to them.
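The two filter requirements can be checked numerically on a sampled grid. The sketch below is evidence only, not a proof: it tests whether the minimum is attained at some x ≤ a_b and whether finite-difference slopes never increase above a_b. The exponent p = 0.5 is a hypothetical choice, and the shifted square root is only loosely modeled on the FIG. 24C type. A log filter would fail the first requirement outright, since log x tends to minus infinity as x approaches 0.

```python
def power_filter(x, p=0.5):
    """Power filter x**p with p < 1 (p = 0.5 is a hypothetical choice)."""
    return x ** p

def satisfies_requirements(f, a_b, xs, tol=1e-12):
    """Grid check of the two filter requirements (evidence only, not a proof)."""
    ys = [f(x) for x in xs]
    # Requirement 1: the minimum value T is attained at some x <= a_b.
    i_min = min(range(len(ys)), key=ys.__getitem__)
    if xs[i_min] > a_b:
        return False
    # Requirement 2: above a_b, the finite-difference slope never increases.
    slopes = [(ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i])
              for i in range(len(xs) - 1) if xs[i] >= a_b]
    return all(later <= earlier + tol for earlier, later in zip(slopes, slopes[1:]))

grid = [i * 0.01 for i in range(1001)]  # 0.0 to 10.0
print(satisfies_requirements(power_filter, 0.0, grid))                  # True
print(satisfies_requirements(lambda x: x ** 2, 0.0, grid))              # False
print(satisfies_requirements(lambda x: max(x, 1.0) ** 0.5, 1.0, grid))  # True
```

The squared function fails because its slope grows with x, which would amplify loud sounds instead of damping them.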

Embodiment 4
The feature sound extraction device 1 of the first to third embodiments can also be combined with a place server (distribution device) to form a distribution system. FIG. 25 is an explanatory diagram showing an example of the configuration of the distribution system. The distribution system includes the feature sound extraction device 1, a place server 2, a terminal 3, and a router 4. The place server 2 and the terminal 3 are each, for example, a general-purpose computer, a workstation, a desktop PC (personal computer), a notebook PC, a tablet PC, a smartphone, or the like. The feature sound extraction device 1 and the terminal 3 are connected to the place server 2 via the router 4 and a network N. The router 4 is not essential; the feature sound extraction device 1 and the terminal 3 may be connected directly to the network N. The number of terminals 3 may be chosen as appropriate. In the example of FIG. 25, the feature sound extraction device 1, the terminal 3, and the router 4 are installed in the same space SP.

The place server 2 includes a control unit 21, a storage unit 22, and a communication unit 23 (receiving unit and distribution unit). The control unit 21 includes a CPU, a RAM, a ROM, and the like, and controls each hardware component. The storage unit 22 stores computer programs to be distributed to the feature sound extraction device 1 and the terminal 3. The communication unit 23 communicates with the feature sound extraction device 1 and the terminal 3 via the network N and the router 4.

The control unit 21 of the place server 2 receives, via the communication unit 23, the result of the person presence/absence determination from the presence/absence determination unit 141 (determination unit) of the feature sound extraction device 1. Based on the received result, the control unit 21 distributes a computer program to the feature sound extraction device 1 via the communication unit 23. The feature sound extraction device 1 forwards the received computer program to the terminal 3, which executes it. The terminal 3 thereby operates according to whether a person is present.

One example application of the distribution system is CAI (computer-assisted instruction or computer-aided instruction). Suppose the space SP, in which the feature sound extraction device 1, the terminal 3, and the router 4 are installed, is a classroom. When the feature sound extraction device 1 determines that no one is present in the space SP, the terminal 3 is made to execute a computer program that keeps it in a sleep state, saving power. When the feature sound extraction device 1 determines that a person is present in the space SP, a computer program is distributed to the terminal 3 so that the terminal 3 can be used. If the space SP is a school computer room, timetable information may be stored in the place server 2 so that an educational program for the appropriate subject is distributed according to the time of day.
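The message flow of this embodiment (determination result to place server, program from place server to the feature sound extraction device, then relayed to the terminals) can be sketched in-process. Everything here is a hypothetical stand-in: the class names, the timetable keyed by hour, and program names such as `"keep-sleep"` represent the distributed computer programs, and no real networking is implied.

```python
from dataclasses import dataclass, field

@dataclass
class Terminal:
    name: str
    running: str = "idle"

    def execute(self, program: str) -> None:
        self.running = program

@dataclass
class FeatureSoundDevice:
    terminals: list = field(default_factory=list)

    def relay(self, program: str) -> None:
        """Forward a program received from the place server to every terminal."""
        for terminal in self.terminals:
            terminal.execute(program)

@dataclass
class PlaceServer:
    timetable: dict  # hour -> educational program name (hypothetical)

    def on_result(self, present: bool, hour: int, device: FeatureSoundDevice) -> str:
        """Choose a program from the presence/absence result and distribute it."""
        if present:
            program = self.timetable.get(hour, "free-use")
        else:
            program = "keep-sleep"  # keeps terminals asleep to save power
        device.relay(program)
        return program

room = FeatureSoundDevice([Terminal("pc-1"), Terminal("pc-2")])
server = PlaceServer({9: "math", 10: "science"})
print(server.on_result(False, 9, room), [t.running for t in room.terminals])
# keep-sleep ['keep-sleep', 'keep-sleep']
```

Keeping the program choice on the server side mirrors the text, where timetable information is stored in the place server 2 rather than in the classroom.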

In the fourth embodiment, the operation of the terminal 3 can be changed according to whether a person is present, so the user or administrator of the terminal 3 does not need to change its operation manually.

The technical features (constituent elements) described in the embodiments can be combined with one another, and such combinations can form new technical features.
The embodiments disclosed herein are illustrative in all respects and should not be considered restrictive. The scope of the present invention is defined not by the above description but by the claims, and is intended to include all modifications within the meaning and scope equivalent to the claims.

The following supplementary notes are further disclosed with respect to the above embodiments.

(Appendix 1)
A feature sound extraction method in which a computer executes a process comprising:
dividing sound data into periods of a predetermined time;
calculating, for each divided period, a feature quantity including frequency components of the sound data;
applying, to one or more component values x of the feature quantity calculated for each period, a function f(x) whose derivative or subderivative with respect to x is monotonically decreasing on the interval $a_b \le x \le a_t$ ($0 \le a_b < a_t \le \infty$) and for which there exists a lower bound T of the function value satisfying the following expression;
obtaining the feature quantity for each period based on the result of the application; and
extracting feature sound data based on the obtained feature quantity for each period.

(Appendix 2)
The feature sound extraction method according to Appendix 1, wherein the function f(x) is a power function, given by the following expression, whose exponent p is less than 1.

(Appendix 3)
The feature sound extraction method according to Appendix 1 or 2, wherein:
the feature quantities of the periods are clustered to classify the sound data into types; and
feature sound data is extracted based on the occurrence frequency of each sound type.

(Appendix 4)
The feature sound extraction method according to Appendix 3, wherein an evaluation index for each type of sound data is calculated from the occurrence frequency, the type of sound data whose evaluation index is largest among a plurality of the periods is determined to be feature sound data, and the determined feature sound data is extracted.

(Appendix 5)
The feature sound extraction method according to Appendix 4, wherein the feature sound data is not extracted when the value of the evaluation index of the sound data determined to be feature sound data is smaller than a threshold.

(Appendix 6)
The feature sound extraction method according to Appendix 4 or 5, comprising:
calculating an evaluation index for each type of sound data from the occurrence frequency;
summing the evaluation indices of the sound data included in each evaluation section, each evaluation section containing a predetermined number of the periods;
identifying the evaluation section with the largest summed evaluation index; and
extracting the sound data included in the identified evaluation section as feature sound data.

(Appendix 7)
The feature sound extraction method according to any one of Appendices 4 to 6, wherein:
the sound data is collected in a space where a person may be present;
the number of occurrences of the feature sound data within a predetermined time span is counted; and
it is determined that a person is present in the space if the number of occurrences is equal to or greater than a predetermined value, and that no person is present otherwise.

(Appendix 8)
A feature sound extraction device comprising:
a calculation unit that divides sound data into periods of a predetermined time and calculates, for each divided period, a feature quantity including frequency components of the sound data;
a filter unit that applies, to one or more component values x of the feature quantity calculated for each period, a function f(x) whose derivative or subderivative with respect to x is monotonically decreasing on the interval $a_b \le x \le a_t$ ($0 \le a_b < a_t \le \infty$) and for which there exists a lower bound T of the function value satisfying the following expression;
a feature quantity calculation unit that calculates the feature quantity for each period based on the result of the application; and
an extraction unit that extracts feature sound data based on the obtained feature quantity for each period.

（付記９）
音データを所定時間毎に区切り、
区切られた期間毎に音データの周波数成分を含む特徴量を算出し、
前記期間毎に算出した音データの周波数成分を含む特徴量の１以上の成分値ｘに対して、
区間ａ_ｂ≦ｘ≦ａ_ｔ（０≦ａ_ｂ＜ａ_ｔ≦∞）において、ｘで微分又は劣微分した関数が単調減少であり、下記の式を満たす関数値の下界Ｔが存在する関数ｆ（ｘ）を作用させ、
作用させた結果に基づき、前記期間毎の特徴量を求め、
求めた期間毎の特徴量に基づいて、特徴音データを抽出する
処理をコンピュータに実行させるコンピュータプログラム。

(Appendix 9)
A computer program that causes a computer to execute a process of:
dividing sound data at predetermined time intervals;
calculating, for each divided period, a feature amount including frequency components of the sound data;
applying, to one or more component values x of the feature amount calculated for each period, a function f(x) whose derivative or subderivative with respect to x is monotonically decreasing in the interval $a_b \le x \le a_t$ ($0 \le a_b < a_t \le \infty$) and for which there exists a lower bound T of the function values satisfying the following formula;
obtaining the feature amount for each period based on the result of applying the function; and
extracting feature sound data based on the obtained feature amount for each period.
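The final extraction step is not specified in detail in these appendices; the list of reference signs mentions a clustering unit (130) and a sound cluster matching score calculation unit (122), which the following sketch does not reproduce. As a stand-in, this Python sketch scores each period by the Euclidean distance of its filtered spectrum from the median spectrum and keeps the k highest-scoring periods, which avoids the single fixed threshold criticized in the background. The function name `extract_feature_periods` and the top-k rule are assumptions:

```python
import numpy as np


def extract_feature_periods(features: np.ndarray, k: int) -> np.ndarray:
    """features: (n_periods, n_bins) array of filtered spectra.

    Hypothetical stand-in scoring: distance of each period from the
    median spectrum over all periods. Returns the indices of the k
    farthest periods, in chronological order.
    """
    baseline = np.median(features, axis=0)            # "typical" period
    scores = np.linalg.norm(features - baseline, axis=1)
    k = min(k, len(scores))
    top = np.argsort(scores)[::-1][:k]                 # highest scores first
    return np.sort(top)                                # back to time order
```

Ranking rather than thresholding means the k most distinctive periods are always returned, whether the day's sounds are strongly or only weakly distinctive.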

（付記１０）
人が存在しうる空間で収集された音データを所定時間毎に区切り、区切られた期間毎に音データの周波数成分を含む特徴量を算出する算出部と、
前記期間毎に算出した音データの周波数成分を含む特徴量の１以上の成分値ｘに対して、区間ａ_ｂ≦ｘ≦ａ_ｔ（０≦ａ_ｂ＜ａ_ｔ≦∞）において、ｘで微分又は劣微分した関数が単調減少であり、下記の式を満たす関数値の下界Ｔが存在する関数ｆ（ｘ）を作用させるフィルタ部と、
作用させた結果に基づき、前記期間毎の特徴量を算出する特徴量算出部と、
求めた期間毎の特徴量に基づいて、特徴音データを抽出する抽出部と、
所定の時間幅に対して、抽出した特徴音データの出現回数をカウントする計数部と、
出現回数が所定以上であれば、前記空間に人が存在すると判定し、そうでなければ存在しないと判定する判定部と、
判定した結果を送信する送信部と
を有する特徴音抽出装置、及び
前記判定した結果を受信する受信部と、
受信した結果に基づいて、前記特徴音抽出装置に所定のコンピュータプログラムを配信する配信部と
を有する配信装置を
備える配信システム。

(Appendix 10)
A distribution system comprising:
a feature sound extraction device having
a calculation unit that divides sound data, collected in a space where a person may be present, at predetermined time intervals and calculates, for each divided period, a feature amount including frequency components of the sound data;
a filter unit that applies, to one or more component values x of the feature amount calculated for each period, a function f(x) whose derivative or subderivative with respect to x is monotonically decreasing in the interval $a_b \le x \le a_t$ ($0 \le a_b < a_t \le \infty$) and for which there exists a lower bound T of the function values satisfying the following formula;
a feature amount calculation unit that calculates the feature amount for each period based on the result of applying the function;
an extraction unit that extracts feature sound data based on the obtained feature amount for each period;
a counting unit that counts the number of appearances of the extracted feature sound data for a predetermined time width;
a determination unit that determines that a person is present in the space if the number of appearances is equal to or greater than a predetermined number, and otherwise determines that no person is present; and
a transmission unit that transmits the determination result; and
a distribution device having
a reception unit that receives the determination result; and
a distribution unit that distributes a predetermined computer program to the feature sound extraction device based on the received result.
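The message flow of the distribution system in Appendix 10 can be sketched as a minimal in-process Python model, assuming that the determination result is a single boolean per device and that the distribution device maps each result to a program identifier through a lookup table. `PresenceReport`, `DistributionServer`, `device_report`, `policy` and the program name are hypothetical, and no network transport is modeled:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class PresenceReport:
    """Determination result sent by the transmission unit of a device."""
    device_id: str
    present: bool


@dataclass
class DistributionServer:
    """Hypothetical distribution device: reception unit + distribution unit."""
    # Which program (if any) to distribute for each determination result.
    policy: Dict[bool, Optional[str]]
    sent: List[Tuple[str, str]] = field(default_factory=list)

    def receive(self, report: PresenceReport) -> Optional[str]:
        """Receive a result and 'distribute' the mapped program, if any."""
        program = self.policy.get(report.present)
        if program is not None:
            self.sent.append((report.device_id, program))
        return program


def device_report(device_id: str, appearance_count: int, min_count: int) -> PresenceReport:
    """Device side: turn an appearance count into a presence report."""
    return PresenceReport(device_id, appearance_count >= min_count)
```

For example, with `policy={True: "program_A", False: None}`, a device reporting four appearances against a threshold of three receives `"program_A"`, while a device reporting one appearance receives nothing.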

１特徴音抽出装置
１１ＣＰＵ
１２ＲＡＭ
１３ＲＯＭ
１４大容量記憶装置
１４ａ生活音ＤＢ
１４ｂ音特徴ＤＢ
１４ｃ音クラスタＤＢ
１４ｄダイジェスト表示ＤＢ
１４ｅ特徴音ＤＢ
１５入力部
１５ａマイク
１６出力部
１６ａ表示装置
１７通信部
１８読取り部
１Ｐ制御プログラム
１ａ可搬型記憶媒体
１ｂ半導体メモリ
１１０入力部
１１１生活音入力部
１２０特徴スコア計算部
１２１音特徴計算部
１２２音クラスタマッチング・スコア計算部
１３０クラスタリング部
１３１クラスタリング処理部
１３２クラスタ発生頻度計算部
１４０ダイジェスト表示生成部
１４１在／不在判定部
１５０出力抑制部
１５１特徴箇所出力抑制部
１６０提示部
１６１ＧＵＩ表示部
１６２音声提示部
２プレイスサーバ
３端末機
４ルータ
Ｎネットワーク

DESCRIPTION OF SYMBOLS
1 Feature sound extraction device
11 CPU
12 RAM
13 ROM
14 Mass storage device
14a Living sound DB
14b Sound feature DB
14c Sound cluster DB
14d Digest display DB
14e Feature sound DB
15 Input unit
15a Microphone
16 Output unit
16a Display device
17 Communication unit
18 Reading unit
1P Control program
1a Portable storage medium
1b Semiconductor memory
110 Input unit
111 Living sound input unit
120 Feature score calculation unit
121 Sound feature calculation unit
122 Sound cluster matching score calculation unit
130 Clustering unit
131 Clustering processing unit
132 Cluster occurrence frequency calculation unit
140 Digest display generation unit
141 Presence/absence determination unit
150 Output suppression unit
151 Feature location output suppression unit
160 Presentation unit
161 GUI display unit
162 Voice presentation unit
2 Place server
3 Terminal
4 Router
N Network

Claims

A feature sound extraction method in which a computer executes a process of:
dividing sound data at predetermined time intervals;
calculating, for each divided period, a feature amount including frequency components of the sound data;
applying, to one or more component values x of the feature amount calculated for each period, a function f(x) whose derivative or subderivative with respect to x is monotonically decreasing in the interval $x \ge a_b$ ($a_b \ge 0$) and for which a lower bound T of the function values satisfying the following formula exists in the interval $0 \le x \le a_b$;
obtaining the feature amount for each period based on the result of applying the function; and
extracting feature sound data based on the obtained feature amount for each period.
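Claim 1 splits the conditions on f(x) differently from the appendices: the derivative need only decrease monotonically for $x \ge a_b$, while on $0 \le x \le a_b$ the function values need only be bounded below. One function of this shape, offered purely as an illustration, is a logarithm floored at $a_b > 0$: $\log x$ alone diverges to $-\infty$ as $x \to 0$, but $\log(\max(x, a_b))$ stays at $\log a_b$ on $[0, a_b]$. The Python sketch below checks both properties numerically. Since the formula that T must satisfy is not reproduced in this text, "bounded below" is the only lower-bound condition tested, and the helper names are hypothetical:

```python
import numpy as np


def floored_log(x, a_b: float):
    """Illustrative f(x) = log(max(x, a_b)) with a_b > 0: derivative 1/x
    decreases for x >= a_b, and f equals log(a_b) on [0, a_b]."""
    return np.log(np.maximum(x, a_b))


def derivative_is_decreasing(f, lo: float, hi: float, n: int = 1000) -> bool:
    """Numerically check that the derivative of f is non-increasing on [lo, hi]."""
    xs = np.linspace(lo, hi, n)
    d = np.gradient(f(xs), xs)  # finite-difference derivative on the grid
    return bool(np.all(np.diff(d) <= 1e-12))


def min_on_interval(f, lo: float, hi: float, n: int = 1000) -> float:
    """Smallest sampled value of f on [lo, hi]; a finite result indicates
    (but does not prove) that f is bounded below there."""
    return float(np.min(f(np.linspace(lo, hi, n))))
```

A convex function such as $x^2$ fails the first check, because its derivative $2x$ increases.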

The feature sound extraction method according to claim 1, wherein the function f(x) is a power function whose exponent p, represented by the following expression, is less than 1.
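Claim 2 can be related to claim 1 by a short derivation. The expression that defines p is not reproduced in this text; the derivation assumes $0 < p < 1$, because for $p < 0$ the second derivative $p(p-1)x^{p-2}$ would be positive and $f'$ would increase instead:

```latex
\begin{align*}
f(x)   &= x^{p}, \qquad 0 < p < 1,\ x > 0,\\
f'(x)  &= p\,x^{p-1} > 0,\\
f''(x) &= p\,(p-1)\,x^{p-2} < 0 \qquad (p > 0,\ p - 1 < 0).
\end{align*}
```

Hence $f'$ is strictly decreasing on any interval $x \ge a_b > 0$. On $0 \le x \le a_b$, $f(x) = x^p \ge 0 = f(0)$, so the function values are bounded below by 0; whether this bound also satisfies the unreproduced formula for T cannot be checked from this text.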

A feature sound extraction apparatus comprising:
a calculation unit that divides sound data at predetermined time intervals and calculates, for each divided period, a feature amount including frequency components of the sound data;
a filter unit that applies, to one or more component values x of the feature amount calculated for each period, a function f(x) whose derivative or subderivative with respect to x is monotonically decreasing in the interval $x \ge a_b$ ($a_b \ge 0$) and for which a lower bound T of the function values satisfying the following formula exists in the interval $0 \le x \le a_b$;
a feature amount calculation unit that calculates the feature amount for each period based on the result of applying the function; and
an extraction unit that extracts feature sound data based on the obtained feature amount for each period.

A computer program that causes a computer to execute a process of:
dividing sound data at predetermined time intervals;
calculating, for each divided period, a feature amount including frequency components of the sound data;
applying, to one or more component values x of the feature amount calculated for each period, a function f(x) whose derivative or subderivative with respect to x is monotonically decreasing in the interval $x \ge a_b$ ($a_b \ge 0$) and for which a lower bound T of the function values satisfying the following formula exists in the interval $0 \le x \le a_b$;
obtaining the feature amount for each period based on the result of applying the function; and
extracting feature sound data based on the obtained feature amount for each period.

A distribution system comprising:
a feature sound extraction device having
a calculation unit that divides sound data, collected in a space where a person may be present, at predetermined time intervals and calculates, for each divided period, a feature amount including frequency components of the sound data;
a filter unit that applies, to one or more component values x of the feature amount calculated for each period, a function f(x) whose derivative or subderivative with respect to x is monotonically decreasing in the interval $x \ge a_b$ ($a_b \ge 0$) and for which a lower bound T of the function values satisfying the following formula exists in the interval $0 \le x \le a_b$;
a feature amount calculation unit that calculates the feature amount for each period based on the result of applying the function;
an extraction unit that extracts feature sound data based on the obtained feature amount for each period;
a counting unit that counts the number of appearances of the extracted feature sound data for a predetermined time width;
a determination unit that determines that a person is present in the space if the number of appearances is equal to or greater than a predetermined number; and
a transmission unit that transmits the determination result; and
a distribution device having
a reception unit that receives the determination result; and
a distribution unit that distributes a predetermined computer program to the feature sound extraction device based on the received result.