JP4252028B2

JP4252028B2 - Traffic sound identification device, traffic sound determination program for causing computer to function as traffic sound identification device, recording medium, and traffic sound determination method

Info

Publication number: JP4252028B2
Application number: JP2004341045A
Authority: JP
Inventors: 剛史森田; 理服部; 健二天目; 和夫能勢; 綾子平松
Original assignee: Sumitomo Electric Industries Ltd; Osaka Sangyo University
Current assignee: Sumitomo Electric Industries Ltd; Osaka Sangyo University
Priority date: 2004-11-25
Filing date: 2004-11-25
Publication date: 2009-04-08
Anticipated expiration: 2024-11-25
Also published as: JP2006154961A

Description

The present invention relates to a traffic sound identification device, a traffic sound determination program for causing a computer to function as a traffic sound identification device, a recording medium, and a traffic sound determination method. More particularly, it relates to a traffic sound identification device that determines the type of a traffic sound, and to a corresponding traffic sound determination program, recording medium, and traffic sound determination method.

To ensure traffic safety and reduce accidents, it is essential to prevent secondary disasters and to speed up post-accident handling by means of a system that detects and reports accidents at an early stage, and also to analyze the mechanisms by which accidents occur.

Analyzing the mechanism of accident occurrence requires detecting accidents and near-miss events. Accidents and similar events are therefore detected on the basis of traffic sounds, such as collision sounds and sudden-braking sounds, that they produce.

However, many traffic sounds have characteristics similar to those of the sound produced when an accident occurs. Examples include noise from the cargo bed of a large vehicle and metallic sounds generated during construction work. It is therefore very difficult to distinguish accident sounds from other sounds (hereinafter also referred to as non-accident sounds).

Techniques for accurately identifying accident sounds using a neural network are disclosed in Japanese Patent Application Laid-Open No. 2000-275096 (Patent Document 1) and Japanese Patent Application Laid-Open No. 2001-33304 (Patent Document 2).
Patent Document 1: JP 2000-275096 A
Patent Document 2: JP 2001-33304 A

However, with the techniques disclosed in Japanese Patent Application Laid-Open No. 2000-275096 (Patent Document 1) and Japanese Patent Application Laid-Open No. 2001-33304 (Patent Document 2), a traffic sound cannot be accurately identified unless the output values of the neural network used to identify it form a combination of integers such as "0" and "1".

In general, when traffic sound data that is very difficult to identify is input to a neural network, the values output from its output layer are often decimals rather than integers, even if the network has been sufficiently trained. Therefore, with the techniques disclosed in Patent Document 1 and Patent Document 2, the likelihood of accurately identifying a traffic sound is low when the output values from the output layer of the neural network are decimals.

Moreover, because the techniques disclosed in Patent Document 1 and Patent Document 2 use only one neural network, an output value from the output layer that lies midway between "0" and "1" (for example, 0.4) yields no result, which makes identifying the traffic sound even more difficult.

The present invention has been made to solve the above problems, and one object of the present invention is to provide a traffic sound identification device capable of identifying traffic sounds with very high accuracy.

Another object of the present invention is to provide a traffic sound determination program for causing a computer to function as a traffic sound identification device capable of identifying traffic sounds with very high accuracy.

Still another object of the present invention is to provide a recording medium on which is recorded a traffic sound determination program for causing a computer to function as a traffic sound identification device capable of identifying traffic sounds with very high accuracy.

Still another object of the present invention is to provide a traffic sound determination method capable of identifying traffic sounds with very high accuracy.

To solve the above problems, according to one aspect of the present invention, there is provided a traffic sound identification device that determines which of a plurality of predetermined types of traffic sound a collected traffic sound is. The traffic sound identification device includes: an arithmetic circuit that calculates a power spectrum of the traffic sound; a subband circuit that divides the power spectrum calculated by the arithmetic circuit into a plurality of subbands and generates I (a natural number) input data based on the power spectrum divided into the plurality of subbands; L (a natural number) neural networks, ordered in advance from first to L-th, for determining the type of the traffic sound based on the power spectrum divided into the plurality of subbands; and a learning circuit for training the L neural networks. The learning circuit performs, L times, a process of randomly selecting N traffic sound data from N (a natural number) traffic sound data for learning, with duplicates allowed, thereby sequentially generating L sets of learning data, and performs a learning process in which the L neural networks are trained on these sets in sequence. Each of the L trained neural networks outputs a determination result for the type of the traffic sound based on the I input data. The traffic sound identification device further includes a determination circuit that determines the type of the traffic sound based on a majority vote over the L determination results output by the L trained neural networks.
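The sampling and voting described in this aspect can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names, the fixed seed, and the placeholder data are assumptions made for illustration, and training of the networks themselves is omitted.

```python
import random
from collections import Counter

def bootstrap_sets(data, num_networks, seed=0):
    """Draw num_networks learning sets; each holds len(data) items
    sampled at random from data with duplicates allowed."""
    rng = random.Random(seed)  # fixed seed is an assumption, for repeatability
    n = len(data)
    return [[data[rng.randrange(n)] for _ in range(n)]
            for _ in range(num_networks)]

def majority_vote(decisions):
    """Return the decision given by the most networks
    (ties go to the decision that appears first)."""
    return Counter(decisions).most_common(1)[0][0]

# Placeholder learning data: (feature id, category label) pairs.
samples = [("x%d" % i, i % 3) for i in range(6)]
learning_sets = bootstrap_sets(samples, num_networks=5)

# Decisions that L = 3 trained networks might output for one sound.
decision = majority_vote(["collision", "braking", "collision"])
print(len(learning_sets), len(learning_sets[0]), decision)
```

Because each learning set is drawn with duplicates allowed, the L networks see different mixtures of the same N samples, which is what lets their votes differ.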

According to another aspect of the present invention, there is provided a traffic sound identification device that determines which of a plurality of predetermined types of traffic sound a collected traffic sound is. The traffic sound identification device includes: an arithmetic circuit that calculates a power spectrum of the traffic sound; a subband circuit that divides the power spectrum calculated by the arithmetic circuit into a plurality of subbands; L (a natural number) neural networks, ordered in advance from first to L-th, for determining the type of the traffic sound based on the power spectrum divided into the plurality of subbands; and a learning circuit for training the L neural networks. The learning circuit performs, L times, a data selection process of randomly selecting N traffic sound data from N (a natural number) traffic sound data for learning, with duplicates allowed, thereby sequentially generating L sets of learning data, and performs a learning process in which the L neural networks are trained on these sets in sequence. In the data selection process, when the learning process is to be performed for the (k+1)-th neural network of the L neural networks (k being a natural number smaller than L), the learning circuit preferentially selects traffic sound data that was difficult to learn in the learning process of the k-th neural network. Each of the L trained neural networks outputs a determination result for the type of the traffic sound based on the power spectrum divided into the plurality of subbands. The traffic sound identification device further includes a determination circuit that determines the type of the traffic sound based on a weighted calculation over the L determination results output by the L trained neural networks.
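A rough Python sketch of preferential selection and weighted voting follows. The passage above does not state the weighting formulas, so the update rule (multiplying the sampling weight of a misclassified sample by a hypothetical `boost` factor) and the per-network vote weights are assumptions chosen only to show the shape of the computation.

```python
import random
from collections import defaultdict

def reweight(weights, was_misclassified, boost=2.0):
    """Raise the sampling weight of samples that network k got wrong,
    then normalise so the weights sum to 1. The boost factor is an
    illustrative assumption, not a value from the patent."""
    raised = [w * boost if bad else w
              for w, bad in zip(weights, was_misclassified)]
    total = sum(raised)
    return [w / total for w in raised]

def draw_learning_set(data, weights, rng):
    """Sample len(data) items with duplicates allowed,
    favouring items that carry a high weight."""
    return rng.choices(data, weights=weights, k=len(data))

def weighted_vote(decisions, network_weights):
    """Add each network's weight to the class it chose and
    return the class with the largest total."""
    score = defaultdict(float)
    for label, w in zip(decisions, network_weights):
        score[label] += w
    return max(score, key=score.get)

data = ["s0", "s1", "s2", "s3"]
weights = reweight([0.25] * 4, [False, True, False, False])
next_set = draw_learning_set(data, weights, random.Random(0))
winner = weighted_vote(["collision", "braking", "braking"], [0.9, 0.3, 0.4])
print(weights, next_set, winner)
```

In this example two of three networks vote "braking", yet "collision" wins because its single vote carries more weight (0.9 against 0.7), which is the difference from a plain majority vote.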

Preferably, when the learning circuit trains the L neural networks, M (a natural number) types of traffic sound are used for the learning determination, and the M types of traffic sound are divided into Q groups (Q being a natural number smaller than M). When the determination circuit determines a traffic sound, it determines which of the Q groups the traffic sound belongs to.
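As a small illustration of folding M trained types into Q groups, the following Python sketch uses M = 4 and Q = 2. The type names echo sounds mentioned in the background section, but the particular assignment of types to groups is an assumption made for this example only.

```python
# Hypothetical mapping: M = 4 trained sound types, Q = 2 groups.
GROUP_OF_TYPE = {
    "collision": "accident",
    "sudden_braking": "accident",
    "truck_bed_noise": "non_accident",
    "construction_metal": "non_accident",
}

def group_of(sound_type):
    """Return the group to which a determined sound type belongs."""
    return GROUP_OF_TYPE[sound_type]

num_types = len(GROUP_OF_TYPE)                 # M
num_groups = len(set(GROUP_OF_TYPE.values()))  # Q
print(group_of("truck_bed_noise"), num_types, num_groups)
```

Training on the finer M types while reporting only the coarser Q groups means that confusing two types within the same group does not change the reported result.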

According to another aspect of the present invention, there is provided a traffic sound determination program executed by a computer equipped with a microphone for collecting traffic sounds. The traffic sound determination program causes the computer to execute the steps of: calculating a power spectrum of a traffic sound; dividing the calculated power spectrum into a plurality of subbands and generating I (a natural number) input data based on the power spectrum divided into the plurality of subbands; performing, L times, a data selection process of randomly selecting N traffic sound data from N (a natural number) traffic sound data for learning, with duplicates allowed, thereby sequentially generating L sets of learning data, and sequentially constructing L neural networks trained in sequence on the generated learning data; causing each of the L trained neural networks to output a determination result for the type of the traffic sound based on the I input data; and determining the type of the traffic sound based on a majority vote over the L determination results output by the L trained neural networks.

According to another aspect of the present invention, there is provided a traffic sound determination program executed by a computer equipped with a microphone for collecting traffic sounds. The traffic sound determination program causes the computer to execute the steps of: calculating a power spectrum of a traffic sound; dividing the calculated power spectrum into a plurality of subbands; performing, L times, a data selection process of randomly selecting N traffic sound data from N (a natural number) traffic sound data for learning, with duplicates allowed, thereby sequentially generating L sets of learning data, and sequentially constructing L neural networks trained in sequence on the generated learning data; causing each of the L trained neural networks to output a determination result for the type of the traffic sound based on the power spectrum divided into the plurality of subbands; and determining the type of the traffic sound based on a weighted calculation over the L determination results output by the L trained neural networks. The traffic sound determination program further causes the computer to execute, in the data selection process, a step of preferentially selecting traffic sound data that was difficult for the k-th neural network of the L neural networks to learn when the learning process is to be performed for the (k+1)-th neural network (k being a natural number smaller than L).

According to still another aspect of the present invention, a recording medium is a medium on which the traffic sound determination program is recorded.

According to another aspect of the present invention, there is provided a traffic sound determination method for determining which of a plurality of predetermined types of traffic sound a collected traffic sound is. The traffic sound determination method includes the steps of: calculating a power spectrum of the traffic sound; dividing the calculated power spectrum into a plurality of subbands and generating I (a natural number) input data based on the power spectrum divided into the plurality of subbands; performing, L times, a data selection process of randomly selecting N traffic sound data from N (a natural number) traffic sound data for learning, with duplicates allowed, thereby sequentially generating L sets of learning data, and sequentially constructing L neural networks trained in sequence on the generated learning data; causing each of the L trained neural networks to output a determination result for the type of the traffic sound based on the I input data; and determining the type of the traffic sound based on a majority vote over the L determination results output by the L trained neural networks.

According to another aspect of the present invention, there is provided a traffic sound determination method for determining which of a plurality of predetermined types of traffic sound a collected traffic sound is. The traffic sound determination method includes the steps of: calculating a power spectrum of the traffic sound; dividing the calculated power spectrum into a plurality of subbands; performing, L times, a data selection process of randomly selecting N traffic sound data from N (a natural number) traffic sound data for learning, with duplicates allowed, thereby sequentially generating L sets of learning data, and sequentially constructing L neural networks trained in sequence on the generated learning data; causing each of the L trained neural networks to output a determination result for the type of the traffic sound based on the power spectrum divided into the plurality of subbands; and determining the type of the traffic sound based on a weighted calculation over the L determination results output by the L trained neural networks. In the data selection process, when the learning process is to be performed for the (k+1)-th neural network of the L neural networks (k being a natural number smaller than L), traffic sound data that was difficult for the k-th neural network to learn is preferentially selected.

The traffic sound identification device according to the present invention includes L neural networks for determining the type of a traffic sound and a learning circuit for training the L neural networks, and determines the type of the traffic sound based on a majority vote over the L determination results output by the L trained neural networks.

Because the type of the traffic sound is determined by a majority vote over a plurality of determination results output by a plurality of neural networks, it is possible to provide a traffic sound identification device capable of identifying traffic sounds with very high accuracy.

The traffic sound identification device according to the present invention also includes L neural networks for determining the type of a traffic sound and a learning circuit for training the L neural networks, and determines the type of the traffic sound based on a weighted calculation over the L determination results output by the L trained neural networks.

Because the type of the traffic sound is determined by a weighted calculation over a plurality of determination results output by a plurality of neural networks, it is possible to provide a traffic sound identification device capable of identifying traffic sounds with very high accuracy.

The traffic sound determination program according to the present invention sequentially generates L sets of learning data, sequentially constructs L neural networks trained in sequence on the generated learning data, and determines the type of a traffic sound based on a majority vote over the L determination results output by the L trained neural networks.

Because the type of the traffic sound is determined by a majority vote over a plurality of determination results output by a plurality of trained neural networks, it is possible to provide a traffic sound determination program for causing a computer to function as a traffic sound identification device capable of identifying traffic sounds with very high accuracy.

The traffic sound determination program according to the present invention also sequentially generates L sets of learning data, sequentially constructs L neural networks trained in sequence on the generated learning data, and determines the type of a traffic sound based on a weighted calculation over the L determination results output by the L trained neural networks.

Because the type of the traffic sound is determined by a weighted calculation over a plurality of determination results output by a plurality of trained neural networks, it is possible to provide a traffic sound determination program for causing a computer to function as a traffic sound identification device capable of identifying traffic sounds with very high accuracy.

The recording medium according to the present invention records a traffic sound determination program. The traffic sound determination program sequentially generates L sets of learning data, sequentially constructs L neural networks trained in sequence on the generated learning data, and determines the type of a traffic sound based on a majority vote over the L determination results output by the L trained neural networks.

Because the type of the traffic sound is determined by a majority vote over a plurality of determination results output by a plurality of trained neural networks, it is possible to provide a recording medium for causing a computer to function as a traffic sound identification device capable of identifying traffic sounds with very high accuracy.

The recording medium according to the present invention also records a traffic sound determination program that sequentially generates L sets of learning data, sequentially constructs L neural networks trained in sequence on the generated learning data, and determines the type of a traffic sound based on a weighted calculation over the L determination results output by the L trained neural networks.

Because the type of the traffic sound is determined by a weighted calculation over a plurality of determination results output by a plurality of trained neural networks, it is possible to provide a recording medium for causing a computer to function as a traffic sound identification device capable of identifying traffic sounds with very high accuracy.

The traffic sound determination method according to the present invention sequentially generates L sets of learning data, sequentially constructs L neural networks trained in sequence on the generated learning data, and determines the type of a traffic sound based on a majority vote over the L determination results output by the L trained neural networks.

Because the type of the traffic sound is determined by a majority vote over a plurality of determination results output by a plurality of trained neural networks, it is possible to provide a traffic sound determination method capable of identifying traffic sounds with very high accuracy.

The traffic sound determination method according to the present invention also sequentially generates L sets of learning data, sequentially constructs L neural networks trained in sequence on the generated learning data, and determines the type of a traffic sound based on a weighted calculation over the L determination results output by the L trained neural networks.

Because the type of the traffic sound is determined by a weighted calculation over a plurality of determination results output by a plurality of trained neural networks, it is possible to provide a traffic sound determination method capable of identifying traffic sounds with very high accuracy.

Hereinafter, embodiments of the present invention will be described with reference to the drawings. In the following description, identical parts are denoted by identical reference numerals. Their names and functions are also the same. Detailed descriptions of them will therefore not be repeated.

<First Embodiment>
FIG. 1 is a block diagram showing the configuration of a traffic sound identification device 1000 according to the present embodiment.

Referring to FIG. 1, the traffic sound identification device 1000 includes a microphone 100, a sound pressure calculation circuit 110, a power spectrum calculation circuit 120, a subband calculation circuit 130, and a neural network unit 200.

The microphone 100 collects traffic sounds near a road (for example, near an intersection). The sound pressure calculation circuit 110 measures the sound pressure values of the traffic sound collected by the microphone 100 and performs predetermined calculation processing on them. The power spectrum calculation circuit 120 calculates a power spectrum based on the sound pressure values processed by the sound pressure calculation circuit 110. The subband calculation circuit 130 divides the power spectrum calculated by the power spectrum calculation circuit 120 into a plurality of frequency bands (subbands). The neural network unit 200 determines the type of the traffic sound based on the power spectrum data divided into the plurality of subbands by the subband calculation circuit 130, and outputs the determination result.
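The power spectrum and subband stages of this chain can be sketched in Python as follows. The text does not specify the transform, the band edges, or how each band is reduced to one value, so the naive discrete Fourier transform, the equal-width bands, and the per-band summation used here are assumptions for illustration.

```python
import cmath
import math

def power_spectrum(samples):
    """Naive DFT power spectrum |X[k]|^2 for k = 0 .. N/2 - 1.
    O(N^2), intended only to illustrate the calculation."""
    n = len(samples)
    spectrum = []
    for k in range(n // 2):
        acc = sum(s * cmath.exp(-2j * math.pi * k * t / n)
                  for t, s in enumerate(samples))
        spectrum.append(abs(acc) ** 2)
    return spectrum

def subband_energies(spectrum, num_bands):
    """Split the spectrum into num_bands equal-width bands and sum the
    power in each; leftover bins at the top are ignored when the length
    is not a multiple of num_bands."""
    width = len(spectrum) // num_bands
    return [sum(spectrum[b * width:(b + 1) * width])
            for b in range(num_bands)]

# Test signal: a pure tone falling exactly on DFT bin 5 of a 64-sample frame.
frame = [math.sin(2 * math.pi * 5 * t / 64) for t in range(64)]
spectrum = power_spectrum(frame)        # 32 bins
bands = subband_energies(spectrum, 8)   # 8 bands of 4 bins each
loudest_band = bands.index(max(bands))
print(len(spectrum), len(bands), loudest_band)
```

The band values would then serve as the input data for each neural network; the embodiment described later uses 32 input values.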

FIG. 2 is a diagram showing the internal configuration of the neural network unit 200 in the present embodiment.

Referring to FIG. 2, the neural network unit 200 includes neural networks 200.1, 200.2, ..., 200.L, ordered in advance from first to L-th (L being a natural number). Hereinafter, a neural network is also written simply as "NN".

FIG. 3 is a diagram showing the detailed internal configuration of the neural network 200.1 in the present embodiment.

Referring to FIG. 3, the neural network 200.1 has an input layer 200.1A, a hidden layer 200.1B, and an output layer 200.1C.

The input layer 200.1A has a plurality of input units. A "unit" is a model of a "neuron", a nerve cell of the human brain. A plurality of data are input to the plurality of input units, respectively. In the present embodiment, the number of input units is assumed to be 32, as an example. The number of input units is not limited to 32 and may be any number.

The hidden layer 200.1B has a plurality of hidden units.

The output layer 200.1C has a plurality of output units.

The input units and the hidden units are connected to each other, as are the hidden units and the output units. The output units respectively output a plurality of result data OT1, OT2, ..., OTc produced by the neural network 200.1. The result data OT1, OT2, ..., OTc are ordered in advance from first to c-th. The result data OT1, OT2, ..., OTc are also collectively referred to as a determination result.

The result data OT1, OT2, ..., OTc are each associated with one of the plurality of types of traffic sound. For example, the result data OT1 is associated with "collision sound".
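The three-layer structure described above can be sketched as a forward pass in Python. The sigmoid activation, the hidden layer size of 8, the value c = 3, and the random untrained weights are all assumptions for illustration; the text only fixes the example of 32 input units.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def layer(inputs, weights, biases):
    """Fully connected layer: every unit sees every input."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def forward(inputs, hidden_w, hidden_b, out_w, out_b):
    """Input layer -> hidden layer -> output layer (OT1 .. OTc)."""
    return layer(layer(inputs, hidden_w, hidden_b), out_w, out_b)

rng = random.Random(0)
NUM_IN, NUM_HIDDEN, NUM_OUT = 32, 8, 3   # hidden size and c = 3 are assumed

def random_matrix(rows, cols):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

hidden_w = random_matrix(NUM_HIDDEN, NUM_IN)
out_w = random_matrix(NUM_OUT, NUM_HIDDEN)
hidden_b = [0.0] * NUM_HIDDEN
out_b = [0.0] * NUM_OUT

inputs = [rng.random() for _ in range(NUM_IN)]  # stand-in for 32 subband values
outputs = forward(inputs, hidden_w, hidden_b, out_w, out_b)
predicted_index = outputs.index(max(outputs))   # 0 would correspond to OT1
print(outputs, predicted_index)
```

With a sigmoid output, every value lies strictly between 0 and 1, which is why a trained network rarely emits exact integers for hard-to-identify sounds.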

Each of the neural networks 200.2, ..., 200.L in FIG. 2 has the same configuration and functions as the neural network 200.1 in FIG. 3.

Referring again to FIG. 1, the traffic sound identification device 1000 further includes a learning circuit 210 and a determination circuit 250.

Based on the plurality of determination results output from the neural network unit 200, the learning circuit 210 trains the neural networks 200.1, 200.2, ..., 200.L in the neural network unit 200 by inputting teacher data TD1, TD2, ..., TDL to them, respectively. The process of training a plurality of neural networks is also referred to as group learning processing. Group learning processing is performed by the learning circuit 210.

The determination circuit 250 determines the type of the traffic sound based on the plurality of determination results output from the neural network unit 200, and outputs result data RT that identifies the determined traffic sound.

次に、複数のニューラルネットワークを学習させ、当該学習させた複数のニューラルネットワークを使用して、入力データの判定を行なう手法について説明する。当該手法の代表的なものとしては、バギングと、アダブーストとがある。 Next, a method of learning a plurality of neural networks and determining input data using the learned neural networks will be described. Typical examples of the method include bagging and Adaboost.

本実施の形態では、バギングの集団学習処理（以下においては、バギング集団学習処理とも称する）について説明する。 In the present embodiment, a bagging group learning process (hereinafter also referred to as a bagging group learning process) will be described.

バギング集団学習処理では、たとえば、Ｎ個の交通音データからなる学習セットＳを利用する。以下においては、学習セットＳを学習データとも称する。学習セットＳは、次の（１）式によって表される。 In the bagging group learning process, for example, a learning set S composed of N pieces of traffic sound data is used. Hereinafter, the learning set S is also referred to as learning data. The learning set S is represented by the following equation (1).

（１）式のｘ_１，ｘ_２，・・・，ｘ_Ｎは、入力ベクトルを表す。ｙ_ｉは、入力ベクトルｘ_ｉのカテゴリラベルを表す。本実施の形態において、入力ベクトルｘ_ｉは、３２個のデータを示す。なお、入力ベクトルｘ_ｉの示すデータの個数は、３２個に限定されることなく、任意であってよい。また、本実施の形態において、カテゴリラベルｙ_ｉは、整数であるとする。たとえば、カテゴリラベルｙ_ｉが、“２”なら、第２番目の結果データＯＴ２が、“１”となるように、ニューラルネットワークを学習させることになる。また、ｃは、カテゴリラベルまたは後述するグループの数を示す。 In equation (1), x_1, x_2, ..., x_N represent input vectors, and y_i represents the category label of the input vector x_i. In the present embodiment, the input vector x_i holds 32 data values. The number of data values held by the input vector x_i is not limited to 32 and may be arbitrary. In the present embodiment, the category label y_i is an integer. For example, if the category label y_i is “2”, the neural network is trained so that the second result data OT2 becomes “1”. In addition, c denotes the number of category labels, or the number of groups described later.
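
Equation (1) itself is not reproduced in this text. From the definitions given above, it presumably has the following form; the exact notation is an assumption rather than the original formula.

```latex
S = \{ (x_i, y_i) \mid i = 1, 2, \ldots, N \}, \qquad y_i \in \{ 1, 2, \ldots, c \}
```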

図４は、バギング集団学習処理の流れを示すフローチャートである。 FIG. 4 is a flowchart showing the flow of the bagging group learning process.

図４を参照して、ステップＳ１００では、初期化処理が行なわれる。初期化処理では、学習させるニューラルネットワークの数、その他パラメータ等が設定される。また、ｋの初期値が、“１”に設定される。本実施の形態では、ニューラルネットワークの数は、Ｌ個であるとする。その後、ステップＳ１１０に進む。 Referring to FIG. 4, in step S100, an initialization process is performed. In the initialization process, the number of neural networks to be learned and other parameters are set. The initial value of k is set to “1”. In the present embodiment, the number of neural networks is assumed to be L. Then, it progresses to step S110.

ステップＳ１１０では、ｋ番目の学習セットＳ_ｋの生成処理（以下においては、学習セット生成処理とも称する）が行なわれる。 In step S110, a process of generating the k-th learning set S_k (hereinafter also referred to as a learning set generation process) is performed.

図５は、学習セット生成処理を説明するための図である。 FIG. 5 is a diagram for explaining learning set generation processing.

図５（Ａ）は、前述の（１）式を示す。 FIG. 5A shows the above-described equation (1).

図５（Ｂ）、図５（Ｃ）および図５（Ｄ）は、学習セット生成処理により生成された学習セットＳ_１、Ｓ_２、Ｓ_Ｌを示す。学習セットＳ_１、Ｓ_２、Ｓ_Ｌは、Ｎ個の交通音データからなる学習セットＳ（学習データ）から、重複を妨げないで一定の確率（たとえば、１／Ｎ）でランダムにＮ個の交通音データを選択することによって生成される。 FIGS. 5B, 5C, and 5D show the learning sets S_1, S_2, and S_L generated by the learning set generation process. Each of the learning sets S_1, S_2, and S_L is generated by randomly selecting N pieces of traffic sound data, with a fixed probability (for example, 1/N) and with duplication allowed, from the learning set S (learning data) consisting of N pieces of traffic sound data.

すなわち、生成された学習セットには、同じ交通音データが複数ある場合もある。また、生成された学習セットに含まれる交通音データは、全て同じ場合もある。また、生成された学習セットＳ_１、Ｓ_２、Ｓ_Ｌの少なくとも２つが同じ場合もある。 That is, a generated learning set may contain the same traffic sound data more than once. All of the traffic sound data in a generated learning set may even be identical. Furthermore, at least two of the generated learning sets S_1, S_2, and S_L may be identical.
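
The learning set generation process described above amounts to sampling with replacement. The following is a minimal sketch under that reading; the function name make_learning_set and the representation of each traffic sound datum as an (input vector, category label) pair are assumptions made for illustration.

```python
import random


def make_learning_set(learning_data, rng):
    """Draw N items from learning_data uniformly at random, with replacement.

    Each draw picks any of the N items with probability 1/N, so the same
    item may appear several times in the result.
    """
    n = len(learning_data)
    return [learning_data[rng.randrange(n)] for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy learning set S: ten (input vector, category label) pairs.
    S = [((0.1 * i,), i % 3 + 1) for i in range(10)]
    S_1 = make_learning_set(S, rng)
    # The generated set has the same size as S but usually fewer distinct items.
    print(len(S_1), len(set(S_1)))
```

Because every draw is independent, nothing prevents two generated learning sets from being identical, which matches the cases enumerated above.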

再び図４を参照して、ステップＳ１１０において、ｋ番目の学習セットＳ_ｋの生成処理が終了すると、ステップＳ１２０に進む。 Referring again to FIG. 4, when the process of generating the k-th learning set S_k ends in step S110, the process proceeds to step S120.

ステップＳ１２０では、学習セットＳ_ｋをｋ番目のニューラルネットワーク（たとえば、ニューラルネットワーク２００．１）で学習させるＮＮ学習処理が行なわれる。 In step S120, an NN learning process is performed in which the k-th neural network (for example, the neural network 200.1) is trained on the learning set S_k.

このとき、ｋ＝１なので、学習処理では、図３のニューラルネットワーク２００．１の複数の入力ユニットに、学習セットＳ_１の１番目の入力ベクトルｘ_１（図５（Ｂ）参照）の値が入力される。 At this point k = 1, so in the learning process the values of the first input vector x_1 of the learning set S_1 (see FIG. 5B) are input to the plurality of input units of the neural network 200.1 in FIG. 3.

ニューラルネットワーク２００．１は、入力された値に対して出力する結果データＯＴ１，ＯＴ２，・・・，ＯＴｃと、学習セットＳ_１の１番目のカテゴリラベルｙ_１（図５（Ｂ）参照）とに基づいて、ニューラルネットワーク２００．１の各ユニット間の結合の重みを変化させる処理（以下においては、結合重み変化処理とも称する）を行なう。 Based on the result data OT1, OT2, ..., OTc that it outputs for the input values and on the first category label y_1 of the learning set S_1 (see FIG. 5B), the neural network 200.1 performs a process of changing the connection weights between its units (hereinafter also referred to as a connection weight change process).

具体的には、結合重み変化処理では、入力ベクトルｘ_１に対応するカテゴリラベルと、入力ベクトルｘ_１によりニューラルネットワークが出力したデータとの誤差の値を、各ユニット間の結合の重みに対して、フィードバックさせる。これにより、各ユニット間の結合の重みを変化させることができる。なお、この変化させる割合を「学習率」という。 Specifically, in the connection weight change process, the error between the category label corresponding to the input vector x_1 and the data that the neural network outputs for the input vector x_1 is fed back to the connection weights between the units. The connection weights between the units can thereby be changed. The rate of this change is called the “learning rate”.

次に、学習セットＳ_１の２番目からＮ番目までの各入力ベクトルに対しても、前述と同様な処理を繰返すことで、ニューラルネットワーク２００．１を学習させる。その後、ステップＳ１３０に進む。 Next, the same process is repeated for each of the second through N-th input vectors of the learning set S_1, thereby training the neural network 200.1. Thereafter, the process proceeds to step S130.
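
One possible form of the connection weight change process is sketched below. The text does not specify the activation function, the error measure, or the update rule, so sigmoid units, a squared error, and a gradient-descent (backpropagation) update are assumptions made for illustration, as are all function names. The target is “1” for the output unit matching the category label and “0” for the others, following the OT2 example given for equation (1).

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_network(n_in, n_hidden, n_out, rng):
    """Small random weights; the last entry of each row is a bias weight."""
    w_h = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    w_o = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)] for _ in range(n_out)]
    return w_h, w_o


def forward(net, x):
    """Return (hidden activations, result data OT1..OTc), each value in (0, 1)."""
    w_h, w_o = net
    xb = list(x) + [1.0]
    h = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in w_h]
    hb = h + [1.0]
    out = [sigmoid(sum(w * v for w, v in zip(row, hb))) for row in w_o]
    return h, out


def train_step(net, x, label, rate=0.2):
    """Change the weights once for a single (input vector, category label) pair.

    Returns the squared error measured before the update.
    """
    w_h, w_o = net
    h, out = forward(net, x)
    target = [1.0 if j == label - 1 else 0.0 for j in range(len(out))]
    # Error terms of the output units (squared error through a sigmoid).
    d_out = [(t - o) * o * (1.0 - o) for t, o in zip(target, out)]
    # Error terms of the hidden units, fed back through the current output weights.
    d_hid = [hj * (1.0 - hj) * sum(d * w_o[k][j] for k, d in enumerate(d_out))
             for j, hj in enumerate(h)]
    hb = h + [1.0]
    xb = list(x) + [1.0]
    for k, d in enumerate(d_out):
        for j, v in enumerate(hb):
            w_o[k][j] += rate * d * v  # rate plays the role of the learning rate
    for j, d in enumerate(d_hid):
        for i, v in enumerate(xb):
            w_h[j][i] += rate * d * v
    return sum((t - o) ** 2 for t, o in zip(target, out))
```

Calling train_step once for each of the N pairs of a learning set corresponds to one pass of the NN learning process in step S120.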

ステップＳ１３０では、ｋ番目のニューラルネットワーク（ニューラルネットワーク２００．１）を識別器の集団に追加する。ここで、識別器とは、ニューラルネットワークのことである。また、識別器の集団とは、後述する交通音の種類を判定させる処理において使用する学習済みのニューラルネットワークの集団のことである。なお、初めて、ステップＳ１３０の処理が行なわれる前の識別器の集団には、何も含まれていない。その後、ステップＳ１４０に進む。 In step S130, the k-th neural network (neural network 200.1) is added to the group of classifiers. Here, a classifier is a neural network. The group of classifiers is the group of trained neural networks used in the process, described later, of determining the type of traffic sound. Before step S130 is performed for the first time, the group of classifiers is empty. Thereafter, the process proceeds to step S140.

ステップＳ１４０では、ｋが１インクリメントされる。その後、ステップＳ１５０の処理が行なわれる。 In step S140, k is incremented by one. Thereafter, the process of step S150 is performed.

ステップＳ１５０では、ｋがＬより大きいか否かが判定される。ステップＳ１５０において、ＮＯの場合、再度、ステップＳ１１０に進む。 In step S150, it is determined whether k is greater than L. If NO in step S150, the process proceeds to step S110 again.

以上説明した、ステップＳ１１０，Ｓ１２０，Ｓ１３０，Ｓ１４０の処理が、ステップＳ１５０の条件を満たすまで繰り返されることにより、ニューラルネットワーク２００．１，２００．２，・・・，２００．Ｌの全てが学習済みとなる。このとき、識別器の集団には、学習済みのニューラルネットワーク２００．１，２００．２，・・・，２００．Ｌが含まれる。 The processes of steps S110, S120, S130, and S140 described above are repeated until the condition of step S150 is satisfied, so that all of the neural networks 200.1, 200.2, ..., 200.L become trained. At this point, the group of classifiers contains the trained neural networks 200.1, 200.2, ..., 200.L.

そして、ステップＳ１５０において、ＹＥＳの場合、このバギング集団学習処理は終了する。 If YES in step S150, the bagging group learning process ends.
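
The flow of steps S100 to S150 can be sketched as the loop below. This is an illustrative outline only: the function name bagging_train and the two callables make_network and train_network are hypothetical placeholders for whatever network construction and NN learning process is actually used.

```python
import random


def bagging_train(learning_data, num_networks, make_network, train_network, rng):
    """Train num_networks classifiers, each on its own resampled learning set."""
    classifiers = []          # the group of classifiers starts out empty
    n = len(learning_data)
    k = 1                     # S100: initialization
    while True:
        # S110: generate the k-th learning set by sampling with replacement.
        s_k = [learning_data[rng.randrange(n)] for _ in range(n)]
        # S120: train the k-th neural network on S_k.
        net = make_network()
        train_network(net, s_k)
        # S130: add the trained network to the group of classifiers.
        classifiers.append(net)
        # S140: increment k.
        k += 1
        # S150: finish once k exceeds the number of networks L.
        if k > num_networks:
            break
    return classifiers
```

The loop body runs exactly L times for L >= 1, since k starts at 1 and the test k > L is made after the increment.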

次に、バギングにおいて、収集した交通音の種類を判定させる処理（以下においては、バギング判定処理とも称する）について説明する。 Next, processing for determining the type of collected traffic sound in bagging (hereinafter also referred to as bagging determination processing) will be described.

図６は、バギング判定処理の流れを示すフローチャートである。 FIG. 6 is a flowchart showing the flow of the bagging determination process.

図７は、バギング判定処理を説明するための図である。 FIG. 7 is a diagram for explaining the bagging determination process.

図７（Ａ）は、マイク１００から入力された交通音をグラフで表した図である。図７（Ａ）では、横軸が経過時間、縦軸が音圧値を表す。 FIG. 7A is a graph showing traffic sound input from the microphone 100. In FIG. 7A, the horizontal axis represents the elapsed time, and the vertical axis represents the sound pressure value.

次に、図１、図６、図７を参照して、バギング判定処理を説明する。 Next, the bagging determination process will be described with reference to FIG. 1, FIG. 6, and FIG.

ステップＳ２００では、交通音収集処理が行なわれる。交通音収集処理では、マイク１００から交通音の収集が行なわれる。収集されたアナログデータとしての交通音は、音圧演算回路１１０が、４８ｋＨｚでサンプリングする。なお、サンプリング周波数は、４８ｋＨｚに限定されることなく、他の値であってもよい。 In step S200, traffic sound collection processing is performed. In the traffic sound collection process, traffic sound is collected from the microphone 100. The sound pressure calculation circuit 110 samples the collected traffic sound as analog data at 48 kHz. Note that the sampling frequency is not limited to 48 kHz, and may be another value.

そして、音圧演算回路１１０は、４８ｋＨｚで連続してサンプリングした１０２４個のデータのある区間を１区間とし、当該１区間の音圧値の合計値を求める。以下においては、当該合計値を区間音圧値とも称する。図７（Ａ）では、区間Ｊ，Ｋ，Ｌ，Ｍ，Ｎの各々が、前述の１区間に相当する。 Then, the sound pressure calculation circuit 110 sets a certain section of 1024 data continuously sampled at 48 kHz as one section, and obtains the total value of the sound pressure values in the one section. Hereinafter, the total value is also referred to as a section sound pressure value. In FIG. 7A, each of the sections J, K, L, M, and N corresponds to the above-described one section.

次に、音圧演算回路１１０は、隣あう２つの区間（たとえば、区間Ｊおよび区間Ｋ）を第１の区間および第２の区間に設定する。ここで、第１の区間は、第２の区間よりも前の時間の区間（区間Ｊ）である。第２の区間は、第１の区間よりも後の時間の区間（区間Ｋ）である。 Next, the sound pressure calculation circuit 110 sets two adjacent sections (for example, the section J and the section K) as the first section and the second section. Here, the first section is a section (section J) of a time before the second section. The second section is a section (section K) of a time later than the first section.

音圧演算回路１１０は、第２の区間の区間音圧値から第１の区間の区間音圧値を減算することで、第１および第２の区間の区間音圧値の差を求める処理を行なう。当該処理は、所定時間の経過毎に音圧演算回路１１０により行なわれる。すなわち、交通音の音圧値（図７（Ａ）の波形）は、音圧演算回路１１０により、常に監視される。その後、ステップＳ２１０に進む。 The sound pressure calculation circuit 110 performs a process for obtaining a difference between the section sound pressure values of the first and second sections by subtracting the section sound pressure value of the first section from the section sound pressure value of the second section. Do. This process is performed by the sound pressure calculation circuit 110 every time a predetermined time elapses. That is, the sound pressure value of the traffic sound (the waveform in FIG. 7A) is always monitored by the sound pressure calculation circuit 110. Thereafter, the process proceeds to step S210.

ステップＳ２１０では、所定レベル以上の交通音が発生したか否かが判定される処理（以下においては、大交通音判定処理とも称する）が行なわれる。 In step S210, a process for determining whether or not a traffic sound of a predetermined level or higher has occurred (hereinafter also referred to as a heavy traffic sound determination process) is performed.

大交通音判定処理では、音圧演算回路１１０が、第１および第２の区間の区間音圧値の差が所定のしきい値より大きいか否かを判定する。 In the heavy traffic sound determination process, the sound pressure calculation circuit 110 determines whether or not the difference between the section sound pressure values of the first and second sections is larger than a predetermined threshold value.

ステップＳ２１０において、ＮＯの場合、再度、ステップＳ２００に進む。一方、ステップＳ２１０において、ＹＥＳの場合、ステップＳ２２０に進む。 If NO in step S210, the process proceeds to step S200 again. On the other hand, if YES at step S210, the process proceeds to step S220.
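
Steps S200 and S210 can be sketched as follows. The text defines the section sound pressure value as the total of the sound pressure values in one section of 1024 samples; treating each sound pressure value as the absolute value of a sample is an assumption, as are the function names and the choice to return the start index of the second section.

```python
SECTION_LEN = 1024  # samples per section, as in the embodiment


def section_sound_pressure(samples):
    """Total of the sound pressure values in one section (absolute values assumed)."""
    return sum(abs(s) for s in samples)


def find_loud_onset(samples, threshold, section_len=SECTION_LEN):
    """Return the start index of the first 'second section' whose section sound
    pressure value exceeds that of the preceding section by more than
    threshold, or None if no such pair of adjacent sections exists."""
    num_sections = len(samples) // section_len
    values = [
        section_sound_pressure(samples[i * section_len:(i + 1) * section_len])
        for i in range(num_sections)
    ]
    for i in range(1, num_sections):
        # Second section minus first section, compared with the threshold.
        if values[i] - values[i - 1] > threshold:
            return i * section_len
    return None
```

A result of None corresponds to the NO branch of step S210, in which collection simply continues.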

ステップＳ２２０では、パワースペクトル演算処理が行なわれる。 In step S220, power spectrum calculation processing is performed.

パワースペクトル演算処理では、パワースペクトル演算回路１２０が、音圧演算回路１１０により設定された第２の区間と、第２の区間（区間Ｋ）から連続する３つの区間（区間Ｌ，Ｍ，Ｎ）とに対し、ＦＦＴ（Fast Fourier Transform）演算処理を行なう。 In the power spectrum calculation process, the power spectrum calculation circuit 120 performs FFT (Fast Fourier Transform) processing on the second section set by the sound pressure calculation circuit 110 and on the three sections (sections L, M, and N) that follow the second section (section K).

図７（Ｂ）、図７（Ｃ）、図７（Ｄ）および図７（Ｅ）は、パワースペクトル演算回路１２０が、それぞれ、区間Ｋ、区間Ｌ、区間Ｍ、区間Ｍおよび区間Ｎの波形をＦＦＴ演算処理することによって得られたパワースペクトルである。 FIGS. 7B, 7C, 7D, and 7E show the power spectra obtained by the power spectrum calculation circuit 120 performing FFT processing on the waveforms of sections K, L, M, and N, respectively.

そして、パワースペクトル演算回路１２０は、区間Ｋ、区間Ｌ、区間Ｍおよび区間Ｎを、それぞれＦＦＴ処理することによって得られたグラフを合成する。 Then, the power spectrum calculation circuit 120 synthesizes the graphs obtained by performing the FFT processing on the section K, the section L, the section M, and the section N, respectively.

図７（Ｆ）は、パワースペクトル演算回路１２０により合成された波形を示す図である。パワースペクトル演算回路１２０は、合成された波形の各周波数における最大値に基づいて合成波形を生成する。合成波形は、図７（Ｆ）の太線の波形となる。その後、ステップＳ２３０に進む。 FIG. 7F shows the waveforms combined by the power spectrum calculation circuit 120. The power spectrum calculation circuit 120 generates a composite waveform from the maximum value at each frequency of the combined waveforms. The composite waveform is the thick-line waveform in FIG. 7F. Thereafter, the process proceeds to step S230.
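
The power spectrum calculation process can be sketched as below: each section is transformed by an FFT, and the composite waveform takes the maximum power at each frequency across the sections. The radix-2 FFT written out here, the use of squared magnitude as the power, and the function names are assumptions made for illustration; the text does not describe how the circuit computes the transform. Keeping only the first half of the bins gives 512 values for a 1024-sample section.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 FFT; the length of x must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def power_spectrum(section):
    """Squared magnitude of the first half of the FFT bins."""
    spec = fft(section)
    return [abs(c) ** 2 for c in spec[:len(section) // 2]]


def max_hold_spectrum(sections):
    """Composite waveform: per-frequency maximum over the sections' power spectra."""
    spectra = [power_spectrum(s) for s in sections]
    return [max(values) for values in zip(*spectra)]
```

For the embodiment, sections would hold the samples of sections K, L, M, and N, each 1024 samples long.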

ステップＳ２３０では、サブバンド演算処理が行なわれる。 In step S230, subband calculation processing is performed.

サブバンド演算処理では、生成された合成波形に対し、各々が異なり、連続した複数の周波数帯域（たとえば、１ｋＨｚ〜１０ｋＨｚ）の各々に含まれるパワースペクトルの平均値が、サブバンド演算回路１３０により演算される。これにより、パワースペクトルが複数の周波数帯域（サブバンド）に分割される。以下においては、各サブバンドに対応する平均値を、特徴データと称する。本実施の形態における特徴データの数は５１２個であるとする。なお、特徴データの数は、５１２個に限定されない。その後、ステップＳ２４０に進む。 In the subband calculation process, the subband calculation circuit 130 calculates, for the generated composite waveform, the average value of the power spectrum contained in each of a plurality of mutually distinct, contiguous frequency bands (for example, 1 kHz to 10 kHz). The power spectrum is thereby divided into a plurality of frequency bands (subbands). Hereinafter, the average value corresponding to each subband is referred to as feature data. In the present embodiment, the number of feature data is 512. The number of feature data is not limited to 512. Thereafter, the process proceeds to step S240.

ステップＳ２４０では、平均化演算処理が行なわれる。特徴データの数が５１２個だとニューラルネットワークの入力データ数としては大きいので、平均化演算処理では、サブバンド演算回路１３０が、５１２個の特徴データのうち、連続する１６個の特徴データ毎に平均値を求める。また、サブバンド演算回路１３０は、当該平均値を、“０”〜“１”の範囲に収まるように正規化処理する。本実施の形態では、当該正規化処理された平均値（以下においては、正規化済平均値とも称する）をニューラルネットワークの入力データとする。当該入力データの数は、５１２／１６より３２個である。 In step S240, an averaging calculation process is performed. Because 512 feature data is a large number of inputs for a neural network, in the averaging calculation process the subband calculation circuit 130 computes an average over every 16 consecutive feature data of the 512. The subband calculation circuit 130 also normalizes each average so that it falls within the range “0” to “1”. In the present embodiment, these normalized averages (hereinafter also referred to as normalized average values) are used as the input data of the neural network. The number of input data is 512/16 = 32.

なお、特徴データ毎の平均値を求める際は、前述の１６個に限定されることはない。また、ニューラルネットワークの入力データ数は、５１２個のままであってもよい。すなわち、ステップＳ２４０の処理はなくてもよい。その後、ステップＳ２５０に進む。 Note that the number of feature data averaged together is not limited to the 16 described above. The number of input data of the neural network may also remain 512; that is, the process of step S240 may be omitted. Thereafter, the process proceeds to step S250.
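
The averaging calculation process of step S240 can be sketched as follows. The text states only that the averages are normalized into the range “0” to “1”; the min-max scaling used here is an assumption, as are the function names.

```python
def average_groups(features, group_size=16):
    """Average every group_size consecutive feature data (512 -> 32 for 16)."""
    if len(features) % group_size != 0:
        raise ValueError("number of features must be a multiple of group_size")
    return [
        sum(features[i:i + group_size]) / group_size
        for i in range(0, len(features), group_size)
    ]


def normalize_unit_range(values):
    """Scale values linearly into [0, 1] (min-max scaling, an assumed method)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values equal: map everything to 0 to avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


if __name__ == "__main__":
    feature_data = [float(i) for i in range(512)]   # stand-in feature data
    inputs = normalize_unit_range(average_groups(feature_data))
    print(len(inputs))  # 32 input data for the neural network
```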

ステップＳ２５０では、ニューラルネットワーク２００．１，２００．２，・・・，２００．Ｌの各々が有する３２個の入力ユニットに、ステップＳ２４０で求めた３２個の入力データが、それぞれ入力される。ニューラルネットワーク２００．１，２００．２，・・・，２００．Ｌは、入力データに応じて、出力データＯＵＴ１，ＯＵＴ２，・・・，ＯＵＴＬをそれぞれ出力する。出力データＯＵＴ１，ＯＵＴ２，・・・，ＯＵＴＬは、判定回路２５０へ入力される。 In step S250, the 32 input data obtained in step S240 are input to the 32 input units of each of the neural networks 200.1, 200.2, ..., 200.L. The neural networks 200.1, 200.2, ..., 200.L output the output data OUT1, OUT2, ..., OUTL, respectively, according to the input data. The output data OUT1, OUT2, ..., OUTL are input to the determination circuit 250.

ここで、ニューラルネットワークが出力する出力データについて詳細に説明する。一例として、ニューラルネットワーク２００．１について説明する。ニューラルネットワーク２００．１では、３２個の入力ユニットに、ステップＳ２４０で求めた３２個の入力データが、それぞれ入力されると、複数の出力ユニットから、複数の結果データＯＴ１，ＯＴ２，・・・，ＯＴｃがそれぞれ出力される。なお、結果データＯＴ１，ＯＴ２，・・・，ＯＴｃは、“０”〜“１”の間の小数（たとえば、０．２、０．８等）である。また、結果データＯＴ１，ＯＴ２，・・・，ＯＴｃの少なくとも２つが、同じ値となることもある。 Here, the output data output from the neural network will be described in detail. As an example, the neural network 200.1 will be described. In the neural network 200.1, when the 32 input data obtained in step S240 are input to 32 input units, respectively, a plurality of result data OT1, OT2,. OTc is output respectively. The result data OT1, OT2,..., OTc are decimal numbers between “0” and “1” (for example, 0.2, 0.8, etc.). Further, at least two of the result data OT1, OT2,..., OTc may have the same value.

ニューラルネットワーク２００．１から出力される結果データＯＴ１，ＯＴ２，・・・，ＯＴｃから構成されたデータが、出力データＯＵＴ１である。 Data composed of result data OT1, OT2,..., OTc output from the neural network 200.1 is output data OUT1.

ニューラルネットワーク２００．２，・・・，２００．Ｌの各々も、ニューラルネットワーク２００．１と同様に、３２個の入力ユニットに、ステップＳ２４０で求めた３２個の入力データが、それぞれ入力されると、複数の出力ユニットから、複数の結果データＯＴ１，ＯＴ２，・・・，ＯＴｃをそれぞれ出力する。ニューラルネットワーク２００．２から出力される結果データＯＴ１，ＯＴ２，・・・，ＯＴｃから構成されたデータが、出力データＯＵＴ２である。また、ニューラルネットワーク２００．Ｌから出力される結果データＯＴ１，ＯＴ２，・・・，ＯＴｃから構成されたデータが、出力データＯＵＴＬある。その後、ステップＳ２６０に進む。 Like the neural network 200.1, each of the neural networks 200.2, ..., 200.L outputs a plurality of result data OT1, OT2, ..., OTc from its plurality of output units when the 32 input data obtained in step S240 are input to its 32 input units. The data composed of the result data OT1, OT2, ..., OTc output from the neural network 200.2 is the output data OUT2. Likewise, the data composed of the result data OT1, OT2, ..., OTc output from the neural network 200.L is the output data OUTL. Thereafter, the process proceeds to step S260.

ステップＳ２６０では、交通音判定処理が行なわれる。交通音判定処理とは、Ｓ２００で収集した交通音が、予め定めた複数種類の交通音のいずれであるかの判定を行なう処理である。 In step S260, a traffic sound determination process is performed. The traffic sound determination process determines which of a plurality of predetermined types of traffic sound the traffic sound collected in S200 is.

図８は、交通音判定処理の流れを示すフローチャートである。 FIG. 8 is a flowchart showing the flow of traffic sound determination processing.

図８を参照して、ステップＳ２６２では、判定回路２５０が、入力された出力データＯＵＴ１，ＯＵＴ２，・・・，ＯＵＴＬに対し、データ処理を行なう。 Referring to FIG. 8, in step S262, determination circuit 250 performs data processing on input output data OUT1, OUT2,..., OUTL.

一例として、ニューラルネットワーク２００．１が出力する出力データＯＵＴ１のデータ処理について説明する。判定回路２５０は、出力データＯＵＴ１を構成する結果データＯＴ１，ＯＴ２，・・・，ＯＴｃのうち、所定のしきい値（たとえば、０．５）以上を示す結果データを有効とする。したがって、有効な結果データは、２つ以上のときもある。 As an example, data processing of output data OUT1 output from the neural network 200.1 will be described. The determination circuit 250 validates the result data indicating a predetermined threshold value (for example, 0.5) or more among the result data OT1, OT2,... OTc constituting the output data OUT1. Thus, there may be more than one valid result data.

判定回路２５０は、有効な結果データに対応づけられた交通音を判定する。なお、ニューラルネットワーク２００．１が出力した複数の結果データのうち、有効な結果データが２つ以上である場合、２つの有効な結果データにそれぞれ対応する２つの交通音（たとえば、「衝突音」、「急ブレーキ音」）が判定されたとする。 The determination circuit 250 determines the traffic sound associated with the valid result data. Note that when two or more of the result data output by the neural network 200.1 are valid, each corresponding traffic sound is regarded as determined; with two valid result data, for example, the two corresponding traffic sounds (for example, “collision sound” and “sudden braking sound”) are both regarded as determined.

ニューラルネットワーク２００．２，・・・，Ｌがそれぞれ出力する出力データＯＵＴ２，・・・，ＯＵＴＬについても、判定回路２５０は、前述のニューラルネットワーク２００．１と同様にデータ処理するので詳細な説明は繰り返さない。その後、ステップＳ２６４に進む。 As for the output data OUT2,..., OUTL output from the neural network 200.2,..., L, respectively, the determination circuit 250 performs data processing in the same manner as the above-described neural network 200.1. Do not repeat. Thereafter, the process proceeds to step S264.

ステップＳ２６４では、ニューラルネットワーク２００．２，・・・，２００．Ｌの各々の判定結果の多数決がとられる。具体的には、判定回路２５０が、ニューラルネットワーク２００．１，・・・，２００．Ｌの各々で判定された交通音の集計を行なう。たとえば、「衝突音」が、２つのニューラルネットワークで判定されたなら、「衝突音」を２件とする。たとえば、「急ブレーキ音」が、３つのニューラルネットワークで判定されたなら、「急ブレーキ音」を３件とする。このような集計を行ない、最終的に、最多件数の交通音を、ニューラルネットワーク部２００の判定結果とする。なお、本実施の形態では、最多件数の交通音が複数の場合、当該複数の最多件数の交通音の中に、マイク１００から収集させた交通音としての入力ベクトルｘ_ｉに対応するカテゴリラベルｙ_ｉに対応する交通音が含まれていれば、ニューラルネットワーク部２００は、正しい判定を行なったとする。そして、この交通音判定処理は、終了し、バギング判定処理に戻る。 In step S264, a majority vote is taken over the determination results of the neural networks 200.2, ..., 200.L. Specifically, the determination circuit 250 tallies the traffic sounds determined by each of the neural networks 200.1, ..., 200.L. For example, if “collision sound” is determined by two neural networks, “collision sound” is counted as two. Likewise, if “sudden braking sound” is determined by three neural networks, “sudden braking sound” is counted as three. After this tally, the traffic sound with the highest count is taken as the determination result of the neural network unit 200. In the present embodiment, when several traffic sounds tie for the highest count, the neural network unit 200 is regarded as having made a correct determination if the tied traffic sounds include the traffic sound corresponding to the category label y_i of the input vector x_i representing the traffic sound collected from the microphone 100. The traffic sound determination process then ends, and the process returns to the bagging determination process.
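
Steps S262 and S264 can be sketched as follows, with each output data OUTk represented as a list of result data values and each traffic sound identified by its 1-based position. The function names and the return of every tied label are choices made for illustration.

```python
from collections import Counter


def valid_labels(output, threshold=0.5):
    """S262: 1-based indices of the result data at or above the threshold."""
    return [j + 1 for j, value in enumerate(output) if value >= threshold]


def majority_vote(outputs, threshold=0.5):
    """S264: tally the traffic sounds determined by every network and return
    the label or labels with the highest count (several labels on a tie)."""
    counts = Counter()
    for output in outputs:
        counts.update(valid_labels(output, threshold))
    if not counts:
        return []  # no network produced a valid result
    best = max(counts.values())
    return sorted(label for label, n in counts.items() if n == best)


if __name__ == "__main__":
    outs = [
        [0.8, 0.2, 0.1],   # network 1: label 1 valid
        [0.6, 0.7, 0.1],   # network 2: labels 1 and 2 valid
        [0.1, 0.9, 0.2],   # network 3: label 2 valid
        [0.3, 0.6, 0.4],   # network 4: label 2 valid
    ]
    print(majority_vote(outs))  # label 1 has 2 votes, label 2 has 3
```

Under the correctness rule stated above, a tie counts as correct when the true category label is among the returned labels.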

再び、図５を参照して、ステップＳ２６０の処理の後、このバギング判定処理は終了する。 Referring again to FIG. 5, after the process of step S260, the bagging determination process ends.

なお、本実施の形態におけるバギング判定処理においては、複数のニューラルネットワークの各々が出力する結果データが所定のしきい値以上を示す結果データを有効なデータとして処理したが、複数のニューラルネットワークの各々が出力する結果データの出力値を、対応する交通音毎に合計し、最も大きな値に対応する交通音を判定された交通音としてもよい。 In the bagging determination process of the present embodiment, result data at or above a predetermined threshold, among the result data output by each of the plurality of neural networks, are treated as valid data. Alternatively, the output values of the result data output by each of the plurality of neural networks may be summed for each corresponding traffic sound, and the traffic sound with the largest sum may be taken as the determined traffic sound.
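
The alternative described above, summing the output values per traffic sound instead of thresholding, can be sketched as follows. The function name sum_vote is hypothetical, and resolving ties in favor of the lowest-numbered traffic sound is an assumption, since the text does not address ties for this variant.

```python
def sum_vote(outputs):
    """Sum the result data of all networks per traffic sound and return
    (1-based label with the largest total, list of totals)."""
    totals = [sum(column) for column in zip(*outputs)]
    best = max(range(len(totals)), key=lambda j: totals[j])
    return best + 1, totals


if __name__ == "__main__":
    outs = [
        [0.40, 0.45, 0.10],
        [0.40, 0.45, 0.20],
        [0.90, 0.30, 0.10],
    ]
    label, totals = sum_vote(outs)
    print(label)
```

In this toy input no single value except 0.90 reaches a threshold of 0.5, yet the summed totals still favor the first traffic sound, which illustrates how the two schemes can weigh the same outputs differently.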

次に、実際に具体的なデータを使用した場合の、前述したバギングによる処理の説明と、処理の結果について説明する。 Next, description will be given of the above-described processing by bagging and the results of processing when specific data is actually used.

図９は、交通音の判定に使用したデータテーブルＴ１００を示す図である。 FIG. 9 is a diagram showing a data table T100 used for determination of traffic sound.

図９を参照して、本実施の形態では、１１９個の交通音データを使用した。１１９個の交通音データは、カテゴリＮ１〜Ｎ１１、すなわち、１１個のカテゴリのいずれかに属する。また、１１９／１１＝１０．８により、１カテゴリあたり約１１件の交通音データとなる。なお、使用する交通音のデータの個数は、１１９個に限定されることはなく任意の個数であってよい。また、カテゴリの個数も１１個に限定されることはなく任意の個数であってよい。 Referring to FIG. 9, 119 traffic sound data are used in the present embodiment. The 119 traffic sound data belong to categories N1 to N11, that is, any of 11 categories. Further, with 119/11 = 10.8, approximately 11 traffic sound data per category are obtained. The number of traffic sound data to be used is not limited to 119, and may be any number. Also, the number of categories is not limited to 11 and may be any number.

一例として、カテゴリＮ１について説明する。カテゴリＮ１は、「衝突音」のカテゴリである。また、カテゴリＮ１には、１０個の交通音データが含まれる。 As an example, the category N1 will be described. The category N1 is a category of “collision sound”. The category N1 includes 10 pieces of traffic sound data.

また、カテゴリＮ１〜Ｎ１１は、複数のグループに分けられる。本件では、２つのグループ（グループＡ，グループＢ）に分けた。グループＡは「事故音」、グループＢは事故音以外の「非事故音」とした。なお、グループの数は、２個に限定されることはなく任意の個数であってよい。 The categories N1 to N11 are divided into a plurality of groups. In this case, it was divided into two groups (Group A and Group B). Group A was “accident sound” and Group B was “non-accident sound” other than accident sound. Note that the number of groups is not limited to two and may be an arbitrary number.

以下に、実験条件を説明する。１１９個の交通音データのうち、１００個を学習に用い、残りの１９個を判定に用いた。また、ニューラルネットワークの個数は１０個とした。また、ニューラルネットワークの入力ユニットの数は３２個とした。また、ニューラルネットワークの隠れ層の数は、１つとした。また、ニューラルネットワークの隠れユニットの数は２０個とした。また、ニューラルネットワークの学習率は、“０．２”とした。また、ニューラルネットワークから出力される結果データの値の範囲は、“０”〜“１”となるように設定される。 The experimental conditions are described below. Of the 119 traffic sound data, 100 were used for learning and the remaining 19 were used for determination. The number of neural networks was 10. The number of input units of the neural network is 32. The number of hidden layers in the neural network is one. The number of hidden units in the neural network was 20. The learning rate of the neural network was set to “0.2”. The range of the value of the result data output from the neural network is set to be “0” to “1”.

本実施の形態では、条件Ａ、条件Ｂ、条件Ｃの３つの条件で実験を行なった。条件Ａは、ニューラルネットワークに学習させる場合、学習回路２１０が、交通音データが、１１個のカテゴリのうちどのカテゴリに属するかを判定してニューラルネットワークを学習させる。学習させたニューラルネットワークで交通音を判定するときは、判定回路２５０が、交通音データが、１１個のカテゴリのうちどのカテゴリに属するかを判定する。 In the present embodiment, experiments were performed under three conditions: condition A, condition B, and condition C. Under condition A, when training the neural networks, the learning circuit 210 determines which of the 11 categories the traffic sound data belongs to and trains the neural networks accordingly. When judging traffic sound with the trained neural networks, the determination circuit 250 determines which of the 11 categories the traffic sound data belongs to.

条件Ｂは、ニューラルネットワークに学習させる場合、学習回路２１０が、交通音データが、２つのグループのうちどのグループに属するかを判定してニューラルネットワークを学習させる。学習させたニューラルネットワークで交通音を判定するときは、判定回路２５０が、交通音データが、２つのグループのうちどのグループに属するかを判定する。 Under condition B, when training the neural networks, the learning circuit 210 determines which of the two groups the traffic sound data belongs to and trains the neural networks accordingly. When judging traffic sound with the trained neural networks, the determination circuit 250 determines which of the two groups the traffic sound data belongs to.

条件Ｃは、ニューラルネットワークに学習させる場合は、条件Ａと同様である。学習させたニューラルネットワークで交通音を判定するときは、条件Ｂと同様である。 Condition C is the same as condition A for training the neural networks, and the same as condition B for judging traffic sound with the trained neural networks.
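
Condition C can be read as training on the 11 categories and then scoring at the level of the two groups. The sketch below illustrates that reading. The assignment of categories to groups is hypothetical: the text states only that category N1 is “collision sound” and does not list which categories belong to group A, so ACCIDENT_CATEGORIES is a placeholder and not data from the experiment.

```python
# Hypothetical assignment of category numbers to group A ("accident sound").
# Only N1 ("collision sound") is named in the text; 2 and 3 are placeholders.
ACCIDENT_CATEGORIES = {1, 2, 3}


def category_to_group(category):
    """Map a category number (1..11) to group "A" or group "B"."""
    return "A" if category in ACCIDENT_CATEGORIES else "B"


def is_correct_condition_c(predicted_category, true_category):
    """A determination counts as correct when both categories share a group."""
    return category_to_group(predicted_category) == category_to_group(true_category)
```

Under this reading, confusing two categories inside the same group does not count as an error, whereas the same confusion would be an error under condition A.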

まず、バギングでの条件Ａにおける、データテーブルＴ１００に基づいたバギング集団学習処理を説明する。 First, the bagging collective learning process based on the data table T100 under the bagging condition A will be described.

再び、図４を参照して、ステップＳ１００では、ニューラルネットワークの数が１０個に設定される。また、各ニューラルネットワークの入力ユニットの数は３２個に設定される。また、ニューラルネットワークの隠れ層の数は、１つに設定される。また、ニューラルネットワークの隠れユニットの数は２０個に設定される。また、ニューラルネットワークの学習率は、“０．２”に設定される。また、ニューラルネットワークから出力される結果データの値の範囲は、“０”〜“１”となるように設定される。 Referring to FIG. 4 again, in step S100, the number of neural networks is set to ten. Further, the number of input units of each neural network is set to 32. The number of hidden layers of the neural network is set to one. The number of hidden units in the neural network is set to 20. The learning rate of the neural network is set to “0.2”. The range of the value of the result data output from the neural network is set to be “0” to “1”.

また、条件Ａにより、ニューラルネットワークの出力ユニットの数が１１個に設定される。したがって、学習用の１００個の交通音データの入力ベクトルｘ_１，・・・，ｘ_1００にそれぞれ対応するカテゴリラベルｙ_１，・・・，ｙ_1００の示すデータは、“１”〜“１１”のいずれかに設定される。なお、カテゴリラベルｙ_ｉの示す値“１”〜“１１”は、カテゴリＮ１〜Ｎ１１にそれぞれ対応する。その後、ステップＳ１１０に進む。 In addition, under condition A, the number of output units of each neural network is set to 11. Accordingly, the category labels y_1, ..., y_100 corresponding to the input vectors x_1, ..., x_100 of the 100 traffic sound data used for learning are each set to one of “1” to “11”. The values “1” to “11” of the category label y_i correspond to the categories N1 to N11, respectively. Thereafter, the process proceeds to step S110.

ステップＳ１１０では、前述した学習セット生成処理が行なわれる。その後、ステップＳ１２０に進む。 In step S110, the learning set generation process described above is performed. Thereafter, the process proceeds to step S120.

ステップＳ１２０では、前述のニューラルネットワーク学習処理が行なわれる。その後、ステップＳ１３０、Ｓ１４０の処理が順に行なわれる。 In step S120, the above-described neural network learning process is performed. Thereafter, the processes of steps S130 and S140 are performed in order.

ステップＳ１３０、Ｓ１４０の処理は、前述のステップＳ１３０、Ｓ１４０の処理と、それぞれ同様な処理が行なわれるので詳細な説明は繰り返さない。その後、ステップＳ１５０に進む。 Since the processes in steps S130 and S140 are the same as the processes in steps S130 and S140 described above, detailed description will not be repeated. Thereafter, the process proceeds to step S150.

ステップＳ１５０では、ｋがＬ（１０）より大きいか否かが判定される。ステップＳ１５０において、ＮＯの場合、再度、ステップＳ１１０に進む。 In step S150, it is determined whether k is larger than L (10). If NO in step S150, the process proceeds to step S110 again.

以上説明した、ステップＳ１１０，Ｓ１２０，Ｓ１３０，Ｓ１４０の処理が、ステップＳ１５０の条件を満たすまで繰り返されることにより、ニューラルネットワーク２００．１，２００．２，・・・，２００．１０の全てが学習済みとなる。このとき、識別器の集団には、学習済みのニューラルネットワーク２００．１，２００．２，・・・，２００．１０が含まれる。 All of the neural networks 200.1, 200.2,..., 200.10 have been learned by repeating the processing of steps S110, S120, S130, and S140 described above until the condition of step S150 is satisfied. It becomes. At this time, the group of classifiers includes learned neural networks 200.1, 200.2, ..., 200.10.

次に、バギングでの条件Ａにおける、データテーブルＴ１００に基づいたバギング判定処理を説明する。ここでは、判定用に１９個の交通音データが用いられ、入力ベクトルは、ｘ_１０１〜ｘ_１１９となる。 Next, the bagging determination process based on the data table T100 under condition A of bagging will be described. Here, 19 traffic sound data are used for determination, and the input vectors are x_101 to x_119.

再び図６を参照して、ステップＳ２００，Ｓ２１０，Ｓ２２０，Ｓ２３０，Ｓ２４０の処理が順に行なわれる。ステップＳ２００，Ｓ２１０，Ｓ２２０，Ｓ２３０，Ｓ２４０の処理は、前述のステップＳ２００，Ｓ２１０，Ｓ２２０，Ｓ２３０，Ｓ２４０の処理と、それぞれ同様な処理が行なわれるので、詳細な説明は繰り返さない。その後、ステップＳ２５０に進む。 Referring to FIG. 6 again, steps S200, S210, S220, S230, and S240 are performed in order. Since the processes in steps S200, S210, S220, S230, and S240 are respectively similar to the processes in steps S200, S210, S220, S230, and S240 described above, detailed description thereof will not be repeated. Thereafter, the process proceeds to step S250.

ステップＳ２５０では、ニューラルネットワーク２００．１，２００．２，・・・，２００．１０の各々が有する３２個の入力ユニットに、ステップＳ２４０で求めた３２個の入力データが、それぞれ入力される。ニューラルネットワーク２００．１，２００．２，・・・，２００．１０は、入力データに応じて、出力データＯＵＴ１，ＯＵＴ２，・・・，ＯＵＴ１０をそれぞれ出力する。出力データＯＵＴ１，ＯＵＴ２，・・・，ＯＵＴ１０は、判定回路２５０へ入力される。 In step S250, the 32 input data obtained in step S240 are input to the 32 input units included in each of the neural networks 200.1, 200.2,. The neural networks 200.1, 200.2,..., 200.10 output output data OUT1, OUT2,. The output data OUT1, OUT2,..., OUT10 are input to the determination circuit 250.

なお、カテゴリの数が１１個なので、ＯＵＴ１，ＯＵＴ２，・・・，ＯＵＴ１０の各々は、結果データＯＴ１，ＯＴ２，・・・，ＯＴ１１から構成される。その後、ステップＳ２６０に進む。 Since the number of categories is 11, each of OUT1, OUT2,..., OUT10 is composed of result data OT1, OT2,. Thereafter, the process proceeds to step S260.

ステップＳ２６０では、交通音判定処理が行なわれる。 In step S260, a traffic sound determination process is performed.

交通音判定処理では、前述したステップＳ２６２，Ｓ２６４において、Ｌ＝１０とし、ｃ＝１１としたきの処理と同様なので詳細な説明は繰り返さない。そして、バギング判定処理は終了する。 Since the traffic sound determination process is the same as the process when L = 10 and c = 11 in steps S262 and S264 described above, detailed description will not be repeated. Then, the bagging determination process ends.

そして、１１９個の交通音データのうち学習に用いる１００個の交通音データのランダムな選定処理、１１９個の交通音データのうち判定に用いる１９個の交通音データのランダムな選定処理、当該１００個の交通音データの学習処理および当該１９個の交通音データの判定処理を２０回繰返し行ない、正しい判定が行なわれた確率（以下においては、「正解率」とも称する。）を求める。 Then, random selection processing of 100 traffic sound data used for learning out of 119 traffic sound data, random selection processing of 19 traffic sound data used for determination out of 119 traffic sound data, 100 The learning process for each piece of traffic sound data and the judgment process for the 19 pieces of traffic sound data are repeated 20 times, and the probability that correct judgment is made (hereinafter also referred to as “correct answer rate”) is obtained.
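
The evaluation procedure described above can be sketched as a repeated random split. The function name repeated_holdout and the callable train_and_judge, which stands in for the combined learning and determination processes, are hypothetical. Pooling the counts over all rounds is an assumption; because every round judges the same number of data, it gives the same value as averaging the per-round rates.

```python
import random


def repeated_holdout(data, n_train, n_rounds, train_and_judge, rng):
    """Repeat a random train/judge split n_rounds times and return the
    fraction of correct determinations over all rounds.

    data            list of (input vector, category label) pairs
    train_and_judge callable taking the training pairs and returning a
                    function that maps an input vector to a label
    """
    correct = 0
    total = 0
    for _ in range(n_rounds):
        shuffled = data[:]
        rng.shuffle(shuffled)
        train, held_out = shuffled[:n_train], shuffled[n_train:]
        judge = train_and_judge(train)
        for x, y in held_out:
            if judge(x) == y:
                correct += 1
            total += 1
    return correct / total
```

For the experiment described here, the call would use n_train=100 and n_rounds=20 on the 119 traffic sound data, so that each round judges the remaining 19.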

図１０は、実験結果を示すデータテーブルＴ２００を示す図である。 FIG. 10 is a diagram showing a data table T200 showing experimental results.

図１０を参照して、ＮＮ（ニューラルネットワーク）の列は、複数のニューラルネットワークではなく、従来の１つのニューラルネットワークで、学習、判定処理を行なった場合の正解率を示す。前述したバギングでの条件Ａにおける、データテーブルＴ１００に基づいたバギング判定処理の正解率は、０．５５５となった。条件Ａでは、バギングの正解率は、１つのニューラルネットワークの正解率（０．５５４）とほぼ同等という結果となった。 Referring to FIG. 10, the column of NN (neural network) indicates a correct answer rate when learning and determination processing are performed by one conventional neural network instead of a plurality of neural networks. The accuracy rate of the bagging determination process based on the data table T100 under the above-described bagging condition A was 0.555. Under condition A, the correct answer rate of bagging was almost equal to the correct answer rate (0.554) of one neural network.

次に、バギングでの条件Ｂにおける、データテーブルＴ１００に基づいたバギング集団学習処理を説明する。 Next, the bagging group learning process based on the data table T100 under the bagging condition B will be described.

再び、図４を参照して、ステップＳ１００では、条件Ａのときと比較して、ニューラルネットワークの出力ユニットの数が２個に設定される点のみが異なる。したがって、学習用の１００個の交通音データの入力ベクトルｘ_１，・・・，ｘ_１００にそれぞれ対応するカテゴリラベルｙ_１，・・・，ｙ_１００の示すデータは、“１”，“２”のいずれかに設定される。なお、カテゴリラベルｙ_ｉの示す値“１”，“２”は、グループＡ，Ｂにそれぞれ対応する。それ以外は、条件ＡのときのステップＳ１００の処理と同様なので詳細な説明は繰り返さない。その後、ステップＳ１１０，Ｓ１２０，Ｓ１３０，Ｓ１４０，Ｓ１５０の処理が順に行なわれる。 Referring again to FIG. 4, step S100 differs from condition A only in that the number of output units of the neural network is set to two. Accordingly, the category labels y_1, ..., y_100 corresponding to the input vectors x_1, ..., x_100 of the 100 traffic sound data for learning are each set to either “1” or “2”. The values “1” and “2” of the category label y_i correspond to groups A and B, respectively. Otherwise, the process is the same as step S100 under condition A, and the detailed description will not be repeated. Thereafter, the processes of steps S110, S120, S130, S140, and S150 are performed in order.

ステップＳ１１０，Ｓ１２０，Ｓ１３０，Ｓ１４０，Ｓ１５０の処理は、前述の条件ＡのときのステップＳ１１０，Ｓ１２０，Ｓ１３０，Ｓ１４０，Ｓ１５０の処理とそれぞれ、同様処理が行なわれるので詳細な説明は繰り返さない。そして、バギング集団学習処理は終了する。 Since the processes in steps S110, S120, S130, S140, and S150 are the same as the processes in steps S110, S120, S130, S140, and S150 for the condition A described above, detailed description will not be repeated. Then, the bagging group learning process ends.

次に、バギングでの条件Ｂにおける、データテーブルＴ１００に基づいたバギング判定処理を説明する。ここでは、判定用に１９個の交通音データが用いられ、入力ベクトルはｘ_１０１〜ｘ_1１９となる。 Next, the bagging determination process based on the data table T100 under the bagging condition B will be described. Here, 19 traffic sound data are used for determination, and the input vectors are x ₁₀₁ to x ₁₁₉ .

再び図５を参照して、ステップＳ２００，Ｓ２１０，Ｓ２２０，Ｓ２３０，Ｓ２４０の処理が順に行なわれる。ステップＳ２００，Ｓ２１０，Ｓ２２０，Ｓ２３０，Ｓ２４０の処理は、条件ＡのときのステップＳ２００，Ｓ２１０，Ｓ２２０，Ｓ２３０，Ｓ２４０の処理と、それぞれ同様な処理が行なわれるので、詳細な説明は繰り返さない。その後、ステップＳ２５０に進む。 Referring to FIG. 5 again, steps S200, S210, S220, S230, and S240 are performed in order. Since the processes in steps S200, S210, S220, S230, and S240 are the same as the processes in steps S200, S210, S220, S230, and S240 for condition A, detailed description thereof will not be repeated. Thereafter, the process proceeds to step S250.

ステップＳ２５０では、条件ＡのときのＳ２５０の処理と同様な処理が行なわれる。そして、出力データＯＵＴ１，ＯＵＴ２，・・・，ＯＵＴ１０は、判定回路２５０へ入力される。 In step S250, a process similar to the process of S250 for condition A is performed. The output data OUT1, OUT2,..., OUT10 are input to the determination circuit 250.

なお、グループ数が２なので、ＯＵＴ１，ＯＵＴ２，・・・，ＯＵＴ１０の各々は、結果データＯＴ１，ＯＴ２から構成される。その後、ステップＳ２６０に進む。 Since the number of groups is 2, each of OUT1, OUT2,..., OUT10 includes result data OT1 and OT2. Thereafter, the process proceeds to step S260.

交通音判定処理では、前述したステップＳ２６２，Ｓ２６４において、Ｌ＝１０とし、ｃ＝２としたきの処理と同様なので詳細な説明は繰り返さない。前述した処理でバギング判定処理は終了する。 Since the traffic sound determination process is the same as the process when L = 10 and c = 2 in steps S262 and S264 described above, detailed description will not be repeated. The bagging determination process ends with the above-described process.

そして、１１９個の交通音データのうち学習に用いる１００個の交通音データのランダムな選定処理、１１９個の交通音データのうち判定に用いる１９個の交通音データのランダムな選定処理、当該１００個の交通音データの学習処理および当該１９個の交通音データの判定処理を２０回繰返し行ない、前述の正解率を求める。 Then, the random selection of 100 of the 119 traffic sound data for learning, the random selection of 19 of the 119 traffic sound data for determination, the learning process on those 100 traffic sound data, and the determination process on those 19 traffic sound data are repeated 20 times to obtain the above-mentioned correct answer rate.

再び図１０を参照して、前述したバギングでの条件Ｂにおける、データテーブルＴ１００に基づいたバギング判定処理の正解率は、０．８２４となった。条件Ｂでは、バギング判定処理の正解率は、１つのニューラルネットワークの正解率（０．７６６）より、約６％も向上している。したがって、本発明における交通音識別装置は、交通音の識別を非常に高い精度で行なうことができたといえる。 Referring to FIG. 10 again, the accuracy rate of the bagging determination process based on the data table T100 in the above-described bagging condition B is 0.824. Under condition B, the accuracy rate of the bagging determination process is improved by about 6% from the accuracy rate of one neural network (0.766). Therefore, it can be said that the traffic sound identification device according to the present invention was able to identify traffic sounds with very high accuracy.

次に、バギングでの条件Ｃにおける、データテーブルＴ１００に基づいたバギング集団学習処理を説明する。 Next, the bagging collective learning process based on the data table T100 under the bagging condition C will be described.

再び、図４を参照して、条件Ｃにおいては、条件ＡのときのステップＳ１００，Ｓ１１０，Ｓ１２０，Ｓ１３０，Ｓ１４０，Ｓ１５０の処理が順に行なわれる。 Referring to FIG. 4 again, in condition C, the processes of steps S100, S110, S120, S130, S140, and S150 for condition A are performed in order.

ステップＳ１００，Ｓ１１０，Ｓ１２０，Ｓ１３０，Ｓ１４０，Ｓ１５０の処理は、前述の条件ＡのときのステップＳ１００，Ｓ１１０，Ｓ１２０，Ｓ１３０，Ｓ１４０，Ｓ１５０の処理と、それぞれ同様な処理が行なわれるので詳細な説明は繰り返さない。 The processes of steps S100, S110, S120, S130, S140, and S150 are the same as the corresponding processes of steps S100, S110, S120, S130, S140, and S150 under condition A described above, and the detailed description will not be repeated.

次に、バギングでの条件Ｃにおける、データテーブルＴ１００に基づいたバギング判定処理を説明する。ここでは、判定用に１９個の交通音データが用いられ、入力ベクトルはｘ_１０１〜ｘ_1１９となる。 Next, the bagging determination process based on the data table T100 under the bagging condition C will be described. Here, 19 traffic sound data are used for determination, and the input vectors are x ₁₀₁ to x ₁₁₉ .

再び図５を参照して、条件Ｃにおいては、条件ＢのときのステップＳ２００，Ｓ２１０，Ｓ２２０，Ｓ２３０，Ｓ２４０，Ｓ２５０，Ｓ２６０の処理が順に行なわれる。ステップＳ２００，Ｓ２１０，Ｓ２２０，Ｓ２３０，Ｓ２４０，Ｓ２５０，Ｓ２６０の処理は、条件ＢのときのステップＳ２００，Ｓ２１０，Ｓ２２０，Ｓ２３０，Ｓ２４０，Ｓ２５０，Ｓ２６０の処理とそれぞれ同様な処理が行なわれるので、詳細な説明は繰り返さない。そして、バギング判定処理は終了する。 Referring to FIG. 5 again, under condition C, the processes of steps S200, S210, S220, S230, S240, S250, and S260 for condition B are performed in order. The processes in steps S200, S210, S220, S230, S240, S250, and S260 are the same as the processes in steps S200, S210, S220, S230, S240, S250, and S260 in the case of condition B. The explanation is not repeated. Then, the bagging determination process ends.

再び図１０を参照して、前述したバギングでの条件Ｃにおける、データテーブルＴ１００に基づいたバギング判定処理の正解率は、０．８４３となった。条件Ｃでは、バギング判定処理の正解率は、１つのニューラルネットワークの正解率（０．８００）より、約４％も向上している。 Referring to FIG. 10 again, the accuracy rate of the bagging determination process based on the data table T100 under the above-described bagging condition C is 0.843. Under condition C, the accuracy rate of the bagging determination process is improved by about 4% from the accuracy rate of one neural network (0.800).

また、グループレベルで学習を行ない、グループレベルで判定を行なう条件Ｂの正解率（０．８２４）より、カテゴリレベルで学習を行ない、グループレベルで判定を行なう条件Ｃの方が、正解率が０．８４３と、約２％も向上した。したがって、条件Ｃのバギングは、交通音の識別に非常に有効であるといえる。したがって、本発明における交通音識別装置は、交通音の識別を非常に高い精度で行なうことができたといえる。 In addition, the correct answer rate of condition C in which learning is performed at the category level and determination is performed at the group level is 0, compared to the correct answer rate (0.824) of condition B in which learning is performed at the group level and determination is performed at the group level. .843, an improvement of about 2%. Therefore, it can be said that the bagging of the condition C is very effective for identifying traffic sounds. Therefore, it can be said that the traffic sound identification device according to the present invention was able to identify traffic sounds with very high accuracy.

＜第２の実施の形態＞ <Second Embodiment>
第１の実施の形態では、バギングについて説明したが、本実施の形態では、アダブーストについて説明する。アダブーストでは、詳細は後述するが、学習セットＳ_ｋの作成方法がバギングと異なる。 The first embodiment described bagging; this embodiment describes AdaBoost. As detailed later, AdaBoost differs from bagging in how the learning sets S_k are created.

本実施の形態では使用する交通音識別装置は、第１の実施の形態の交通音識別装置１０００と同じであるので、構成および機能については、詳細な説明は繰り返さない。 Since the traffic sound identification device used in the present embodiment is the same as traffic sound identification device 1000 of the first embodiment, detailed description of the configuration and functions will not be repeated.

次に、アダブーストの集団学習処理（以下においては、アダブースト集団学習処理とも称する）について説明する。 Next, the Adaboost collective learning process (hereinafter also referred to as Adaboost collective learning process) will be described.

アダブースト集団学習処理では、バギング集団学習処理と同様、前述の（１）式で表される学習セットＳを利用する。 In the AdaBoost group learning process, the learning set S expressed by the above-described equation (1) is used as in the bagging group learning process.

図１１は、アダブースト集団学習処理の流れを示すフローチャートである。 FIG. 11 is a flowchart showing the flow of the AdaBoost group learning process.

図１１を参照して、ステップＳ３００では、ステップＳ１００と同様な初期化処理が行なわれる。ステップＳ３００では、さらに、学習セットＳ_ｋを作成する際に使用する分布ｗ_ｋの初期化が行なわれる。分布ｗ_ｋは、学習セットＳ_ｋを作成する毎に更新される。具体的には、１番目の分布Ｗ_１が求められる。分布Ｗ_１は、次の（２）式によって求められる。 Referring to FIG. 11, in step S300, an initialization process similar to that of step S100 is performed. In step S300, the distribution w_k used to create the learning set S_k is also initialized. The distribution w_k is updated each time a learning set S_k is created. Specifically, the first distribution W_1 is obtained. The distribution W_1 is given by the following equation (2).

Ｂは、全てのミスラベルの集合である。ここで、ミスラベルとは、事例ｉが入力ベクトルｘ_ｉのインデックスである場合、事例ｉの正しくないラベルｙのことである。すなわち、（ｉ，ｙ）は、正しくないラベルのペアを示す。Ｂは、次の（３）式で表される。 B is the set of all mislabels. Here, a mislabel is an incorrect label y for case i, where case i is the index of the input vector x_i. That is, (i, y) denotes a pair of a case and an incorrect label for it. B is expressed by the following equation (3).

（２）式の｜Ｂ｜は、ミスラベル集合の要素の数であり、次の（４）式によって表される。 | B | in the equation (2) is the number of elements of the mislabel set, and is represented by the following equation (4).

ここで、ｃは、前述のカテゴリラベルまたはグループの数を示す。 Here, c indicates the number of the category labels or groups described above.

（２），（３），（４）式により、分布Ｗ_１が求められる。その後、ステップＳ３１０に進む。 The distribution W ₁ is obtained from the equations (2), (3), and (4). Thereafter, the process proceeds to step S310.
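The equation images for (2) to (4) are not reproduced in this text. Given the definitions above (B is the set of all mislabels, c is the number of category labels or groups, and W_1 is a constant), they presumably take the following form, which matches the initialization of the standard AdaBoost.M2 algorithm. This is a reconstruction under that assumption, not the patent's verbatim equations.

```latex
\begin{align}
W_1(i, y) &= \frac{1}{|B|} \quad \text{for } (i, y) \in B \tag{2} \\
B &= \{\, (i, y) : i \in \{1, \dots, N\},\ y \neq y_i \,\} \tag{3} \\
|B| &= N (c - 1) \tag{4}
\end{align}
```

Equation (4) follows directly from (3): each of the N cases has c − 1 incorrect labels.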

ステップＳ３１０では、分布ｗ_ｋを使用してｋ番目の学習セットＳ_ｋの生成処理が行なわれる。このとき、ｋの初期値は、“１”に設定されている。 In step S310, the k-th learning set S_k is generated using the distribution w_k. At this point, the initial value of k is set to “1”.

再び、図５を参照して、学習セットＳ_１、Ｓ_２、Ｓ_Ｌは、Ｎ個の交通音データからなる学習セットＳ（学習データ）から、Ｗ_１、Ｗ_２、Ｗ_Ｌにそれぞれ基づいて、重複を妨げないでランダムにＮ個の交通音データを選択することによって生成される。 Referring again to FIG. 5, the learning sets S_1, S_2, and S_L are generated from the learning set S (learning data), which consists of N traffic sound data, by randomly selecting N traffic sound data with duplicates allowed (sampling with replacement), based on W_1, W_2, and W_L, respectively.

なお、Ｗ_１は定数なので、１番目の学習セットＳ_１のみは、学習セットＳから、重複を妨げないで一定の確率でランダムにＮ個の交通音データを選択することによって生成される。 Since W_1 is a constant, only the first learning set S_1 is generated by randomly selecting N traffic sound data from the learning set S with uniform probability, with duplicates allowed.
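The sampling step can be illustrated with a short Python sketch. It assumes that the distribution is stored per mislabel pair (i, y) and that a sample's selection probability is the sum of the weights of its mislabel pairs; the text only says that S_k is drawn "based on" W_k, so this marginalization is an assumption, as are the function names and the 0-based labels.

```python
import random

def initial_distribution(labels, c):
    """Uniform weight 1/|B| on every mislabel pair (i, y) with y != labels[i]."""
    size_b = len(labels) * (c - 1)
    return {(i, y): 1.0 / size_b
            for i, y_true in enumerate(labels)
            for y in range(c) if y != y_true}

def draw_training_set(dist, n, rng):
    """Draw n sample indices with replacement, weighted by each sample's total mislabel weight."""
    marginal = [0.0] * n
    for (i, _y), w in dist.items():
        marginal[i] += w  # assumption: per-sample weight = sum over its mislabels
    return rng.choices(range(n), weights=marginal, k=n)

labels = [0, 2, 1, 1]  # toy category labels (0-based), c = 3
dist = initial_distribution(labels, c=3)
subset = draw_training_set(dist, n=len(labels), rng=random.Random(0))
```

With the uniform initial distribution every sample is equally likely, so the first draw behaves like the bootstrap sampling used in bagging.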

再び図１１を参照して、ステップＳ３１０において、ｋ番目の学習セットＳ_ｋの生成処理が終了すると、ステップＳ３２０に進む。 Referring again to FIG. 11, when the generation of the k-th learning set S_k is completed in step S310, the process proceeds to step S320.

ステップＳ３２０では、学習セットＳ_ｋをｋ番目のニューラルネットワーク（たとえば、ニューラルネットワーク２００．１）で学習させるＮＮ学習処理が行なわれる。ＮＮ学習処理は、前述のステップＳ１２０のＮＮ学習処理と同様なので詳細な説明は繰り返さない。その後、ステップＳ３３０に進む。 In step S320, an NN learning process is performed in which the k-th neural network (for example, the neural network 200.1) learns the learning set S_k. The NN learning process is the same as the NN learning process of step S120 described above, and the detailed description will not be repeated. Thereafter, the process proceeds to step S330.

ステップＳ３３０では、ｋ番目のニューラルネットワークの擬似ロスε_ｋの演算処理が行なわれる。擬似ロスε_ｋは、次の（５）式によって求められる。 In step S330, the pseudo-loss ε_k of the k-th neural network is calculated. The pseudo-loss ε_k is obtained by the following equation (5).

ｈ_ｋ（ｘ_ｉ、ｙ_ｉ）は、ｋ番目のニューラルネットワークにｘ_ｉを入力した場合のｙ_ｉの出力ＯＴ_ｙｉの値（たとえば、０．８）であり、ｈ_ｋ（ｘ_ｉ、ｙ）は、ｋ番目のニューラルネットワークにｘ_ｉを入力した場合のｙ_ｉ以外のカテゴリラベルの出力の値の合計（たとえば、０．３）である。擬似ロスε_ｋの値が大きいほど、交通音データの学習が困難であったことを示す。擬似ロスε_ｋの値が小さい（“０”に近い）ほど、交通音データの学習が容易であったことを示す。その後、ステップＳ３４０に進む。 h_k(x_i, y_i) is the value of the output OT_yi for y_i (for example, 0.8) when x_i is input to the k-th neural network, and h_k(x_i, y) is the sum of the output values for the category labels other than y_i (for example, 0.3) when x_i is input to the k-th neural network. A larger pseudo-loss ε_k indicates that the traffic sound data was harder to learn. A smaller pseudo-loss ε_k (closer to “0”) indicates that the traffic sound data was easier to learn. Thereafter, the process proceeds to step S340.
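As an illustration of how such a pseudo-loss behaves, the following Python sketch uses the pseudo-loss of the standard AdaBoost.M2 algorithm, in which each mislabel pair (i, y) contributes its own term. Equation (5) is not reproduced in this text, so the formula in the docstring is an assumption, and it uses the per-label output h(x_i, y) rather than the summed form described above. The function name and toy values are hypothetical.

```python
def pseudo_loss(dist, outputs, labels):
    """eps = 1/2 * sum over mislabels (i, y) of w(i, y) * (1 - h(x_i, y_i) + h(x_i, y))."""
    total = 0.0
    for (i, y), w in dist.items():
        total += w * (1.0 - outputs[i][labels[i]] + outputs[i][y])
    return 0.5 * total

labels = [0, 1]  # correct labels (0-based) of two toy samples
# outputs[i][y]: network output for label y on sample i, each in [0, 1]
outputs = [[0.8, 0.1, 0.2],
           [0.3, 0.4, 0.6]]
# Uniform distribution over the four mislabel pairs.
dist = {(0, 1): 0.25, (0, 2): 0.25, (1, 0): 0.25, (1, 2): 0.25}
eps = pseudo_loss(dist, outputs, labels)
```

A network that outputs 1 for every correct label and 0 for every incorrect label would give a pseudo-loss of 0, consistent with the statement that values near 0 indicate easy data.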

ステップＳ３４０では、分布ｗ_ｋの更新が行なわれる。分布ｗ_ｋの更新は、次の、（６）式によって行なわれる。 In step S340, the distribution w _k is updated. The distribution w _k is updated by the following equation (6).

Ｚ_ｋは、ｗ_ｋ＋１が確率分布になるようにするための正規化定数である。β_ｋは、次の（７）式によって求められる。 Z_k is a normalization constant that makes w_{k+1} a probability distribution. β_k is obtained by the following equation (7).

（５）式、（６）式、（７）式により、分布ｗ_ｋを更新した分布ｗ_ｋ＋１が求められる。その後、ステップＳ３５０に進む。 From equations (5), (6), and (7), the distribution w_{k+1}, which is the updated version of the distribution w_k, is obtained. Thereafter, the process proceeds to step S350.
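The equation images for (5) to (7) are not reproduced in this text. Assuming they follow the standard AdaBoost.M2 formulation, which uses the same quantities named here (pseudo-loss ε_k, β_k, and normalization constant Z_k), they would read as follows. This is a reconstruction under that assumption and may differ from the patent's equations in detail, in particular in how h_k(x_i, y) is defined.

```latex
\begin{align}
\varepsilon_k &= \frac{1}{2} \sum_{(i, y) \in B} w_k(i, y)
    \bigl( 1 - h_k(x_i, y_i) + h_k(x_i, y) \bigr) \tag{5} \\
w_{k+1}(i, y) &= \frac{w_k(i, y)}{Z_k} \,
    \beta_k^{\frac{1}{2} \left( 1 + h_k(x_i, y_i) - h_k(x_i, y) \right)} \tag{6} \\
\beta_k &= \frac{\varepsilon_k}{1 - \varepsilon_k} \tag{7}
\end{align}
```

When ε_k is below 1/2, β_k is below 1, so pairs the network already separates well (large exponent) lose weight relative to pairs it confuses.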

ステップＳ３５０では、ｋ番目の学習済みのニューラルネットワーク（ニューラルネットワーク２００．１）を前述した識別器の集団に追加する。なお、初めて、ステップＳ３５０の処理が行なわれる前の識別器の集団には、何も含まれていない。その後、ステップＳ３６０に進む。 In step S350, the k-th learned neural network (neural network 200.1) is added to the group of classifiers described above. Note that for the first time, nothing is included in the group of classifiers before the process of step S350 is performed. Thereafter, the process proceeds to step S360.

ステップＳ３６０では、ｋが１インクリメントされる。その後、ステップＳ３７０の処理が行なわれる。 In step S360, k is incremented by one. Thereafter, the process of step S370 is performed.

ステップＳ３７０では、ｋがＬより大きいか否かが判定される。ステップＳ３７０において、ＮＯの場合、再度、ステップＳ３１０に進む。 In step S370, it is determined whether k is greater than L. If NO in step S370, the process proceeds to step S310 again.

以上説明した、ステップＳ３１０，Ｓ３２０，Ｓ３３０，Ｓ３４０，Ｓ３５０，Ｓ３６０の処理が、ステップＳ３７０の条件を満たすまで繰り返されることにより、ニューラルネットワーク２００．１，２００．２，・・・，２００．Ｌの全てが学習済みとなる。このとき、識別器の集団には、学習済みのニューラルネットワーク２００．１，２００．２，・・・，２００．Ｌが含まれる。 The processes of steps S310, S320, S330, S340, S350, and S360 described above are repeated until the condition of step S370 is satisfied, whereby all of the neural networks 200.1, 200.2, ..., 200.L are trained. At this point, the group of classifiers includes the trained neural networks 200.1, 200.2, ..., 200.L.

以上の処理により、Ｓ３１０では、ｋの値を大きくする毎に、分布ｗ_ｋを変化させ、学習セットＳ_ｋを生成する。学習セットＳ_ｋは、ｋの値が大きい程、学習が困難であった交通音データを含む。すなわち、ｋの値が大きい学習セットＳ_ｋを入力データとするニューラルネットワーク２００．ｋは、交通音の判定をより正確にできるようになる。 Through the above processing, each time k is incremented, step S310 changes the distribution w_k and generates the learning set S_k. The larger the value of k, the more the learning set S_k contains traffic sound data that was difficult to learn. That is, the neural network 200.k trained on a learning set S_k with a larger k becomes able to determine traffic sounds more accurately.
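The shift of weight toward hard data can be seen in a small Python sketch of one distribution update. It assumes the AdaBoost.M2 update rule, since the patent's equations (6) and (7) are not reproduced in this text; the function name, the toy outputs, and the pseudo-loss value 0.35 are illustrative.

```python
def update_distribution(dist, outputs, labels, eps):
    """w'(i, y) = w(i, y) * beta ** (0.5 * (1 + h(x_i, y_i) - h(x_i, y))) / Z, beta = eps / (1 - eps)."""
    beta = eps / (1.0 - eps)
    raw = {(i, y): w * beta ** (0.5 * (1.0 + outputs[i][labels[i]] - outputs[i][y]))
           for (i, y), w in dist.items()}
    z = sum(raw.values())  # normalization constant Z_k
    return {pair: w / z for pair, w in raw.items()}, beta

labels = [0, 1]
outputs = [[0.8, 0.1, 0.2],   # sample 0: correct label scores highest (easy)
           [0.3, 0.4, 0.6]]   # sample 1: an incorrect label scores highest (hard)
dist = {(0, 1): 0.25, (0, 2): 0.25, (1, 0): 0.25, (1, 2): 0.25}
new_dist, beta = update_distribution(dist, outputs, labels, eps=0.35)
```

After the update, the pairs belonging to the hard sample carry more than their original 0.25 each, so that sample is more likely to be drawn into the next learning set.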

そして、ステップＳ３７０において、ＹＥＳの場合、このアダブースト集団学習処理は終了する。 If YES in step S370, this Adaboost group learning process ends.

次に、アダブーストにおいて、収集した交通音の種類を判定させる処理（以下においては、アダブースト判定処理とも称する）について説明する。 Next, a process for determining the type of collected traffic sound in the AdaBoost (hereinafter also referred to as an AdaBoost determination process) will be described.

図１２は、アダブースト判定処理の流れを示すフローチャートである。 FIG. 12 is a flowchart showing the flow of the AdaBoost determination process.

次に、図１、図１２を参照して、アダブースト判定処理を説明する。 Next, the AdaBoost determination process will be described with reference to FIGS. 1 and 12.

ステップＳ４００では、ステップＳ２００と同様な処理が行なわれるので詳細な説明は繰り返さない。その後、ステップＳ４１０に進む。 In step S400, the same processing as in step S200 is performed, and thus detailed description will not be repeated. Thereafter, the process proceeds to step S410.

ステップＳ４１０では、ステップＳ２１０と同様な処理が行なわれるので、詳細な説明は繰り返さない。ステップＳ４１０において、ＮＯの場合、再度、ステップＳ４００に進む。一方、ステップＳ４１０において、ＹＥＳの場合、ステップＳ４２０，Ｓ４３０，Ｓ４４０，Ｓ４５０の処理が順に行なわれる。 In step S410, processing similar to that in step S210 is performed, and thus detailed description will not be repeated. If NO in step S410, the process proceeds to step S400 again. On the other hand, if YES in step S410, the processes in steps S420, S430, S440, and S450 are performed in order.

ステップＳ４２０，Ｓ４３０，Ｓ４４０，Ｓ４５０の処理は、ステップＳ２２０，Ｓ２３０，Ｓ２４０，Ｓ２５０の処理とそれぞれ同様な処理が行なわれるので、詳細な説明は繰り返さない。その後、ステップＳ４６０に進む。 Since the processes of steps S420, S430, S440, and S450 are the same as the processes of steps S220, S230, S240, and S250, detailed description will not be repeated. Thereafter, the process proceeds to step S460.

ステップＳ４６０では、交通音判定処理が行なわれる。交通音判定処理では、判定回路２５０が、ニューラルネットワーク２００．１，２００．２，・・・，２００．Ｌからそれぞれ出力された出力データＯＵＴ１，ＯＵＴ２，・・・，ＯＵＴＬの重み付け演算により交通音を判定する。具体的には、判定回路２５０が、次の重み付け演算の（８）式によって求められたカテゴリラベルｈ_ｆｉｎ（ｘ）により、交通音を判定する。 In step S460, a traffic sound determination process is performed. In the traffic sound determination process, the determination circuit 250 determines the traffic sound by a weighted calculation over the output data OUT1, OUT2, ..., OUTL output from the neural networks 200.1, 200.2, ..., 200.L, respectively. Specifically, the determination circuit 250 determines the traffic sound from the category label h_fin(x) obtained by the following weighting equation (8).

たとえば、カテゴリラベルｈ_ｆｉｎ（ｘ）が、“Ｎ２”であれば、判定する交通音は、カテゴリＮ２の急ブレーキ音と判定されたことになる。 For example, if the category label h _fin (x) is “N2”, the traffic sound to be determined is determined to be the sudden brake sound of the category N2.
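A weighted determination of this kind can be sketched in Python as follows. Equation (8) is not reproduced in this text; the sketch assumes the AdaBoost.M2 final hypothesis, in which each network's outputs are weighted by log(1/β_k) and the label with the largest weighted sum is selected. The function name and the toy values are hypothetical.

```python
import math

def final_label(ensemble_outputs, betas):
    """h_fin(x) = argmax over y of sum_k log(1 / beta_k) * h_k(x, y)."""
    n_labels = len(ensemble_outputs[0])
    scores = [0.0] * n_labels
    for out_k, beta_k in zip(ensemble_outputs, betas):
        weight = math.log(1.0 / beta_k)  # smaller beta_k gives a larger vote
        for y, value in enumerate(out_k):
            scores[y] += weight * value
    return max(range(n_labels), key=scores.__getitem__)

# OUT1..OUT3 for one input x, each holding OT1..OT3 (toy values)
ensemble_outputs = [[0.7, 0.2, 0.1],
                    [0.1, 0.6, 0.3],
                    [0.2, 0.7, 0.1]]
betas = [0.1, 0.5, 0.6]
winner = final_label(ensemble_outputs, betas)
```

In this toy case an unweighted sum would favor the second label, while the weighted sum selects the first label because the first network has a much smaller β_k.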

その後、このアダブースト判定処理は終了する。 Thereafter, this Adaboost determination process ends.

次に、実際に具体的なデータを使用した場合の、前述したアダブーストによる処理の説明と、処理の結果について説明する。 Next, description will be given of the above-described processing by Adaboost and the processing results when specific data is actually used.

なお、本実施の形態における実験条件は、実施の形態１における実験条件と同じであるので詳細な説明は繰り返さない。 Since the experimental conditions in the present embodiment are the same as the experimental conditions in the first embodiment, detailed description will not be repeated.

まず、アダブーストでの条件Ａにおける、データテーブルＴ１００に基づいたアダブースト集団学習処理を説明する。 First, the AdaBoost group learning process based on the data table T100 under AdaBoost condition A will be described.

再び、図１１を参照して、ステップＳ３００では、条件ＡでのステップＳ１００と同様な処理が行なわれるので詳細な説明は繰り返さない。その後、ステップＳ３１０，Ｓ３２０，Ｓ３３０，Ｓ３４０，Ｓ３５０，Ｓ３６０の処理が順に行なわれる。 Referring to FIG. 11 again, in step S300, the same processing as step S100 in condition A is performed, and therefore detailed description will not be repeated. Thereafter, steps S310, S320, S330, S340, S350, and S360 are sequentially performed.

ステップＳ３１０，Ｓ３２０，Ｓ３３０，Ｓ３４０，Ｓ３５０，Ｓ３６０の処理は、前述のステップＳ３１０，Ｓ３２０，Ｓ３３０，Ｓ３４０，Ｓ３５０，Ｓ３６０とそれぞれ、同様な処理が行なわれるので詳細な説明は繰り返さない。その後、ステップＳ３７０に進む。 Since the processes in steps S310, S320, S330, S340, S350, and S360 are the same as those in steps S310, S320, S330, S340, S350, and S360, detailed description will not be repeated. Thereafter, the process proceeds to step S370.

ステップＳ３７０では、ｋがＬ（１０）より大きいか否かが判定される。ステップＳ３７０において、ＮＯの場合、再度、ステップＳ３１０に進む。 In step S370, it is determined whether k is larger than L (10). If NO in step S370, the process proceeds to step S310 again.

以上説明した、ステップＳ３１０，Ｓ３２０，Ｓ３３０，Ｓ３４０，Ｓ３５０，Ｓ３６０の処理が、ステップＳ３７０の条件を満たすまで繰り返されることにより、ニューラルネットワーク２００．１，２００．２，・・・，２００．１０の全てが学習済みとなる。このとき、識別器の集団には、学習済みのニューラルネットワーク２００．１，２００．２，・・・，２００．１０が含まれる。 The processes of steps S310, S320, S330, S340, S350, and S360 described above are repeated until the condition of step S370 is satisfied, whereby the neural networks 200.1, 200.2,. Everything has been learned. At this time, the group of classifiers includes learned neural networks 200.1, 200.2, ..., 200.10.

次に、アダブーストでの条件Ａにおける、データテーブルＴ１００に基づいたアダブースト判定処理を説明する。 Next, the AdaBoost determination process based on the data table T100 under the AdaBoost condition A will be described.

再び図１２を参照して、ステップＳ４００，Ｓ４１０，Ｓ４２０，Ｓ４３０，Ｓ４４０の処理が順に行なわれる。ステップＳ４００，Ｓ４１０，Ｓ４２０，Ｓ４３０，Ｓ４４０の処理は、前述のステップＳ４００，Ｓ４１０，Ｓ４２０，Ｓ４３０，Ｓ４４０の処理と、それぞれ同様な処理が行なわれるので、詳細な説明は繰り返さない。その後、ステップＳ４５０に進む。 Referring to FIG. 12 again, steps S400, S410, S420, S430, and S440 are performed in order. Since the processes in steps S400, S410, S420, S430, and S440 are the same as the processes in steps S400, S410, S420, S430, and S440 described above, detailed description will not be repeated. Thereafter, the process proceeds to step S450.

ステップＳ４５０では、ニューラルネットワーク２００．１，２００．２，・・・，２００．１０の各々が有する３２個の入力ユニットに、ステップＳ２４０で求めた３２個の入力データが、それぞれ入力される。ニューラルネットワーク２００．１，２００．２，・・・，２００．１０は、入力データに応じて、出力データＯＵＴ１，ＯＵＴ２，・・・，ＯＵＴ１０をそれぞれ出力する。出力データＯＵＴ１，ＯＵＴ２，・・・，ＯＵＴ１０は、判定回路２５０へ入力される。 In step S450, the 32 input data obtained in step S240 are input to the 32 input units of each of the neural networks 200.1, 200.2, ..., 200.10. The neural networks 200.1, 200.2, ..., 200.10 output the output data OUT1, OUT2, ..., OUT10, respectively, according to the input data. The output data OUT1, OUT2, ..., OUT10 are input to the determination circuit 250.

なお、カテゴリの数が１１個なので、ＯＵＴ１，ＯＵＴ２，・・・，ＯＵＴ１０の各々は、結果データＯＴ１，ＯＴ２，・・・，ＯＴ１１から構成される。その後、ステップＳ４６０に進む。 Since there are 11 categories, each of OUT1, OUT2, ..., OUT10 consists of result data OT1, OT2, ..., OT11. Thereafter, the process proceeds to step S460.

ステップＳ４６０では、交通音判定処理が行なわれる。 In step S460, traffic sound determination processing is performed.

交通音判定処理では、前述したステップＳ４６０において、Ｌ＝１０とし、ｃ＝１１としたきの処理と同様なので詳細な説明は繰り返さない。そして、アダブースト判定処理は終了する。 Since the traffic sound determination process is the same as the process when L = 10 and c = 11 in step S460 described above, detailed description will not be repeated. Then, the Adaboost determination process ends.

そして、前述したバギングの場合と同様に、学習処理および判定処理を２０回繰返し行ない、前述の正解率を求める。 Then, as in the case of the above-described bagging, the learning process and the determination process are repeated 20 times to obtain the above-described accuracy rate.

再び図１０を参照して、前述したアダブーストでの条件Ａにおける、データテーブルＴ１００に基づいたアダブースト判定処理の正解率は、０．５５６となった。条件Ａでは、アダブースト判定処理の正解率は、１つのニューラルネットワークの正解率（０．５５４）およびバギングの正解率（０．５５５）とほぼ同等という結果となった。 Referring again to FIG. 10, the correct answer rate of the AdaBoost determination process based on the data table T100 under AdaBoost condition A was 0.556. Under condition A, the correct answer rate of the AdaBoost determination process was almost equal to that of a single neural network (0.554) and that of bagging (0.555).

次に、アダブーストでの条件Ｂにおける、データテーブルＴ１００に基づいたアダブースト集団学習処理を説明する。 Next, the AdaBoost group learning process based on the data table T100 under the AdaBoost condition B will be described.

再び、図１１を参照して、ステップＳ３００では、条件Ａのときと比較して、ニューラルネットワークの出力ユニットの数が２個に設定される点のみが異なり、それ以外は、前述のバギングの場合と同様なので詳細な説明は繰り返さない。その後、ステップＳ３１０，Ｓ３２０，Ｓ３３０，Ｓ３４０，Ｓ３５０，Ｓ３６０，Ｓ３７０の処理が順に行なわれる。 Referring to FIG. 11 again, in step S300, the difference is that the number of output units of the neural network is set to two as compared with the case of condition A. Otherwise, the above-described case of bagging is performed. Detailed description will not be repeated. Thereafter, steps S310, S320, S330, S340, S350, S360, and S370 are performed in order.

ステップＳ３１０，Ｓ３２０，Ｓ３３０，Ｓ３４０，Ｓ３５０，Ｓ３６０，Ｓ３７０の処理は、前述の条件ＡのときのステップＳ３１０，Ｓ３２０，Ｓ３３０，Ｓ３４０，Ｓ３５０，Ｓ３６０，Ｓ３７０の処理と、それぞれ同様な処理が行なわれるので詳細な説明は繰り返さない。そして、アダブースト集団学習処理は終了する。 The processes in steps S310, S320, S330, S340, S350, S360, and S370 are the same as the processes in steps S310, S320, S330, S340, S350, S360, and S370, respectively, under the condition A described above. Therefore, detailed description will not be repeated. Then, the AdaBoost group learning process ends.

次に、アダブーストでの条件Ｂにおける、データテーブルＴ１００に基づいたアダブースト判定処理を説明する。 Next, the AdaBoost determination process based on the data table T100 under the AdaBoost condition B will be described.

再び図１２を参照して、ステップＳ４００，Ｓ４１０，Ｓ４２０，Ｓ４３０，Ｓ４４０の処理が順に行なわれる。ステップＳ４００，Ｓ４１０，Ｓ４２０，Ｓ４３０，Ｓ４４０の処理は、条件ＡのときのステップＳ４００，Ｓ４１０，Ｓ４２０，Ｓ４３０，Ｓ４４０の処理と、それぞれ同様な処理が行なわれるので、詳細な説明は繰り返さない。その後、ステップＳ４５０に進む。 Referring to FIG. 12 again, steps S400, S410, S420, S430, and S440 are performed in order. Since the processes in steps S400, S410, S420, S430, and S440 are the same as the processes in steps S400, S410, S420, S430, and S440 for condition A, detailed description will not be repeated. Thereafter, the process proceeds to step S450.

ステップＳ４５０では、条件ＡのときのＳ４５０の処理と同様な処理が行なわれる。そして、出力データＯＵＴ１，ＯＵＴ２，・・・，ＯＵＴ１０は、判定回路２５０へ入力される。 In step S450, a process similar to the process of S450 in the case of condition A is performed. The output data OUT1, OUT2,..., OUT10 are input to the determination circuit 250.

なお、グループ数が２なので、ＯＵＴ１，ＯＵＴ２，・・・，ＯＵＴ１０の各々は、結果データＯＴ１，ＯＴ２から構成される。その後、ステップＳ４６０に進む。 Since the number of groups is 2, each of OUT1, OUT2,..., OUT10 includes result data OT1 and OT2. Thereafter, the process proceeds to step S460.

交通音判定処理では、前述した条件ＡのときのステップＳ４６０において、Ｌ＝１０とし、ｃ＝２としたきの処理と同様なので詳細な説明は繰り返さない。そして、アダブースト判定処理は終了する。 The traffic sound determination process is the same as the process when L = 10 and c = 2 in step S460 for the condition A described above, and thus detailed description will not be repeated. Then, the Adaboost determination process ends.

そして、前述したバギングの場合と同様に、学習処理および判定処理を２０回繰返し行ない、正解率を求める。 Then, as in the case of bagging described above, the learning process and the determination process are repeated 20 times to obtain the correct answer rate.

再び図１０を参照して、前述したアダブーストでの条件Ｂにおける、データテーブルＴ１００に基づいたアダブースト判定処理の正解率は、０．７９３となった。条件Ｂでは、アダブースト判定処理の正解率は、バギングの正解率（０．８２４）よりも約３％悪いものの、１つのニューラルネットワークの正解率（０．７６６）よりは、約３％も向上している。したがって、本発明における交通音識別装置は、交通音の識別を非常に高い精度で行なうことができたといえる。 Referring to FIG. 10 again, the accuracy rate of the AdaBoost determination process based on the data table T100 under the above-described Condition B under AdaBoost is 0.793. Under condition B, the accuracy rate of the AdaBoost determination process is about 3% worse than the accuracy rate of bagging (0.824), but is about 3% higher than the accuracy rate of one neural network (0.766). ing. Therefore, it can be said that the traffic sound identification device according to the present invention was able to identify traffic sounds with very high accuracy.

次に、アダブーストでの条件Ｃにおける、データテーブルＴ１００に基づいたアダブースト集団学習処理を説明する。 Next, the Adaboost collective learning process based on the data table T100 under the Adaboost condition C will be described.

再び、図１１を参照して、条件Ｃにおいては、条件ＡのときのステップＳ３００，Ｓ３１０，Ｓ３２０，Ｓ３３０，Ｓ３４０，Ｓ３５０，Ｓ３６０，Ｓ３７０の処理が順に行なわれる。 Referring to FIG. 11 again, under condition C, steps S300, S310, S320, S330, S340, S350, S360, and S370 for condition A are sequentially performed.

ステップＳ３００，Ｓ３１０，Ｓ３２０，Ｓ３３０，Ｓ３４０，Ｓ３５０，Ｓ３６０，Ｓ３７０の処理は、前述の条件ＡのときのステップＳ３００，Ｓ３１０，Ｓ３２０，Ｓ３３０，Ｓ３４０，Ｓ３５０，Ｓ３６０，Ｓ３７０の処理と、それぞれ同様な処理が行なわれるので詳細な説明は繰り返さない。 The processes of steps S300, S310, S320, S330, S340, S350, S360, and S370 are the same as the processes of steps S300, S310, S320, S330, S340, S350, S360, and S370, respectively, under the above-described condition A. Since the process is performed, detailed description will not be repeated.

次に、アダブーストでの条件Ｃにおける、データテーブルＴ１００に基づいたアダブースト判定処理を説明する。 Next, the AdaBoost determination process based on the data table T100 under the AdaBoost condition C will be described.

再び図１２を参照して、条件Ｃにおいては、条件ＢのときのステップＳ４００，Ｓ４１０，Ｓ４２０，Ｓ４３０，Ｓ４４０，Ｓ４５０，Ｓ４６０の処理が順に行なわれる。ステップＳ４００，Ｓ４１０，Ｓ４２０，Ｓ４３０，Ｓ４４０，Ｓ４５０，Ｓ４６０の処理は、条件ＢのときのステップＳ４００，Ｓ４１０，Ｓ４２０，Ｓ４３０，Ｓ４４０，Ｓ４５０，Ｓ４６０の処理と、それぞれ同様な処理が行なわれるので、詳細な説明は繰り返さない。そして、アダブースト判定処理は終了する。 Referring to FIG. 12 again, under condition C, steps S400, S410, S420, S430, S440, S450, and S460 in the case of condition B are sequentially performed. Since the processes of steps S400, S410, S420, S430, S440, S450, and S460 are the same as the processes of steps S400, S410, S420, S430, S440, S450, and S460 for the condition B, respectively. Detailed description will not be repeated. Then, the Adaboost determination process ends.

再び図１０を参照して、前述したアダブーストでの条件Ｃにおける、データテーブルＴ１００に基づいたアダブーストの正解率は、０．８０５となった。条件Ｃでは、アダブーストの正解率は、１つのニューラルネットワークの正解率（０．８００）とほぼ同等という結果となった。 Referring to FIG. 10 again, the correct rate of Adaboost based on the data table T100 under the above-mentioned Condition C in Adaboost is 0.805. Under the condition C, the correct answer rate of AdaBoost was almost equal to the correct answer rate (0.800) of one neural network.

また、グループレベルで学習を行ない、グループレベルで判定を行なう条件Ｂの正解率（０．７９３）より、カテゴリレベルで学習を行ない、グループレベルで判定を行なう条件Ｃの方が、正解率が０．８０５と、約１％も向上した。したがって、条件Ｃのアダブーストは、交通音の識別に非常に有効であるといえる。したがって、本発明における交通音識別装置は、交通音の識別を非常に高い精度で行なうことができたといえる。 In addition, the correct answer rate of the condition C in which the learning is performed at the category level and the determination at the group level is 0 than the correct answer rate (0.793) of the condition B in which the learning is performed at the group level and the determination is performed at the group level. .805, an improvement of about 1%. Therefore, it can be said that the Adaboost of the condition C is very effective for identifying traffic sounds. Therefore, it can be said that the traffic sound identification device according to the present invention was able to identify traffic sounds with very high accuracy.
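The correct answer rates quoted in the text for FIG. 10 can be collected in one place to compare the methods. The following Python snippet only restates those quoted figures and computes the differences in percentage points; the dictionary layout and function name are illustrative, and the full contents of data table T200 are not reproduced here.

```python
# Correct answer rates quoted in the text for conditions A, B, and C (FIG. 10).
accuracy = {
    "A": {"NN": 0.554, "bagging": 0.555, "adaboost": 0.556},
    "B": {"NN": 0.766, "bagging": 0.824, "adaboost": 0.793},
    "C": {"NN": 0.800, "bagging": 0.843, "adaboost": 0.805},
}

def gain_over_single_nn(condition, method):
    """Improvement over the single neural network, in percentage points."""
    row = accuracy[condition]
    return round((row[method] - row["NN"]) * 100, 1)

gains = {(cond, method): gain_over_single_nn(cond, method)
         for cond in accuracy
         for method in ("bagging", "adaboost")}
```

The "about 6%", "about 4%", and "about 3%" figures in the text correspond to these differences in percentage points (5.8, 4.3, and 2.7), not to relative improvements.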

＜第３の実施の形態＞ <Third Embodiment>
本発明は、マイクが接続されたパーソナルコンピュータ（以下においては、ＰＣ（Personal Computer）とも称する）においても、適用可能である。その場合、交通音判定プログラム１８０Ａが記録された記録媒体から交通音判定プログラム１８０Ａを読出して、ＰＣにインストールさせ、交通音判定プログラム１８０Ａに基づいてＰＣを動作させればよい。以下に詳細な説明を行なう。 The present invention is also applicable to a personal computer (hereinafter also referred to as a PC) to which a microphone is connected. In that case, the traffic sound determination program 180A is read from the recording medium on which it is recorded and installed on the PC, and the PC is operated based on the traffic sound determination program 180A. A detailed description follows.

図１３は、本実施の形態におけるＰＣ５００の内部の構成を示すブロック図である。なお、図１３には、説明のために、通信部５７０、記録媒体５５５も示している。また、図１３は、交通音判定プログラム１８０ＡをＰＣ５００にインストールするときの構成を示す。この場合、ＰＣ５００は、交通音識別装置１０００として動作する。 FIG. 13 is a block diagram showing an internal configuration of PC 500 in the present embodiment. Note that FIG. 13 also shows a communication unit 570 and a recording medium 555 for explanation. FIG. 13 shows a configuration when the traffic sound determination program 180A is installed in the PC 500. In this case, the PC 500 operates as the traffic sound identification device 1000.

図１３を参照して、通信部５７０は、ネットワークと有線または無線で、データの授受を行なう。また、通信部５７０は、イーサネット（登録商標）を利用した通信用インターフェース（たとえば、ルータ）である。 Referring to FIG. 13, the communication unit 570 exchanges data with a network by wire or wirelessly. The communication unit 570 is a communication interface (for example, a router) using Ethernet (registered trademark).

また、通信部５７０は、無線ＬＡＮの規格であるＩＥＥＥ８０２．１１ａ、ＩＥＥＥ８０２．１１ｂ、ＩＥＥＥ８０２．１１ｇ、その他無線技術を利用してデータ通信を行なう通信用インターフェースのいずれであってもよい。 The communication unit 570 may also be any communication interface that performs data communication using the wireless LAN standards IEEE802.11a, IEEE802.11b, or IEEE802.11g, or another wireless technology.

ＰＣ５００には、表示部５３０と、マウス５４２と、キーボード５４４とマイク５４６とが接続される。 A display unit 530, a mouse 542, a keyboard 544, and a microphone 546 are connected to the PC 500.

表示部５３０は、ＰＣ５００から出力された画像データに基づいた画像を表示する。表示部５３０は、制御部５１０からの指示に応じた画像を表示する。表示部５３０は、液晶ディスプレイ（ＬＣＤ（Liquid Crystal Display））、ＣＲＴ（Cathode Ray Tube）、ＦＥＤ（Field Emission Display）、ＰＤＰ（Plasma Display Panel）、有機ＥＬディスプレイ（Organic Electro luminescence Display）、ドットマトリクス等その他の画像表示方式の表示機器のいずれであってもよい。 Display unit 530 displays an image based on the image data output from PC 500. Display unit 530 displays an image in accordance with an instruction from control unit 510. The display unit 530 is a liquid crystal display (LCD (Liquid Crystal Display)), a CRT (Cathode Ray Tube), an FED (Field Emission Display), a PDP (Plasma Display Panel), an organic EL display (Organic Electroluminescence Display), a dot matrix, or the like. Any other display device of an image display method may be used.

マウス５４２は、ユーザがＰＣ５００を操作するためのインターフェースである。キーボード５４４は、ユーザがＰＣ５００を操作するためのインターフェースである。マイク５４６は、前述のマイク１００と同様、交通音を収集する機能を有する。 The mouse 542 is an interface for the user to operate the PC 500. The keyboard 544 is an interface for the user to operate the PC 500. The microphone 546 has a function of collecting traffic sound, similar to the microphone 100 described above.

ＰＣ５００は、制御部５１０と、データ一時記憶部５２２と、記憶部５２０と、通信部５６０と、ＶＤＰ（Video Display Processor）５３２と、ＣＧＲＯＭ（Character Graphic Read Only Memory）５３４と、ＶＲＡＭ（Video Random Access Memory）５３６と、入力部５４０と、記録媒体アクセス部５５０とを含む。 The PC 500 includes a control unit 510, a data temporary storage unit 522, a storage unit 520, a communication unit 560, a VDP (Video Display Processor) 532, a CGROM (Character Graphic Read Only Memory) 534, and a VRAM (Video Random Access). Memory) 536, an input unit 540, and a recording medium access unit 550.

ＣＧＲＯＭ５３４には、フォントデータ、図形データなど、表示部５３０で表示される画像を生成するための画像データが記憶されている。 The CGROM 534 stores image data for generating an image to be displayed on the display unit 530 such as font data and graphic data.

The storage unit 520 stores the traffic sound determination program 180A for causing the control unit 510 to perform predetermined processing, as well as other various data. The storage unit 520 is accessed by the control unit 510.

The storage unit 520 is a hard disk capable of storing a large amount of data. Note that the storage unit 520 is not limited to a hard disk and may be any medium that retains data without a power supply (for example, a flash memory).

That is, the storage unit 520 may be any of an EPROM (Erasable Programmable Read Only Memory), whose contents can be erased and rewritten any number of times, an EEPROM (Electrically Erasable and Programmable Read Only Memory), whose contents can be rewritten electrically, a UV-EPROM (Ultra-Violet Erasable Programmable Read Only Memory), whose contents can be erased with ultraviolet light and rewritten any number of times, or any other circuit configured to hold data in a nonvolatile manner.

The control unit 510 has a function of performing various types of processing for each device inside the PC 500, arithmetic processing, and the like, in accordance with the traffic sound determination program 180A stored in the storage unit 520. The control unit 510 may be any of a microprocessor, an FPGA (Field Programmable Gate Array), which is a programmable LSI (Large Scale Integration), an ASIC (Application Specific Integrated Circuit), which is an integrated circuit designed and manufactured for a specific application, or any other circuit having an arithmetic function.

In accordance with the traffic sound determination program 180A stored in the storage unit 520, the control unit 510 also issues to the VDP 532 an instruction to generate an image and display it on the display unit 530 (hereinafter also referred to as a "drawing instruction").

ＶＤＰ５３２は表示部５３０と接続されている。ＶＤＰ５３２は、制御部５１０からの描画指示に応じて、ＣＧＲＯＭ５３４から必要な画像データを読出し、ＶＲＡＭ５３６を利用して画像を生成する。そして、ＶＤＰ５３２は、ＶＲＡＭ５３６に記憶された画像データを読出し、表示部５３０に、当該画像データに基づく画像を表示させる。 The VDP 532 is connected to the display unit 530. The VDP 532 reads necessary image data from the CGROM 534 in accordance with a drawing instruction from the control unit 510 and generates an image using the VRAM 536. The VDP 532 reads out the image data stored in the VRAM 536 and causes the display unit 530 to display an image based on the image data.

ＶＲＡＭ５３６は、ＶＤＰ５３２が生成した画像を一時的に記憶する機能を有する。 The VRAM 536 has a function of temporarily storing an image generated by the VDP 532.

The data temporary storage unit 522 is accessed by the control unit 510 and is used as a work memory for temporarily storing data.

The data temporary storage unit 522 may be any of a RAM (Random Access Memory), an SRAM (Static Random Access Memory), a DRAM (Dynamic Random Access Memory), an SDRAM (Synchronous DRAM), a DDR-SDRAM (Double Data Rate SDRAM), which is an SDRAM with a high-speed data transfer function called double data rate mode, an RDRAM (Rambus Dynamic Random Access Memory), which is a DRAM that adopts the high-speed interface technology developed by Rambus, a Direct-RDRAM (Direct Rambus Dynamic Random Access Memory), or any other circuit configured to hold data temporarily in a volatile manner.

入力部５４０には、マウス５４２と、キーボード５４４と、マイク５４６とが接続されている。マウス５４２またはキーボード５４４からの入力指示は、入力部５４０を介して制御部５１０に伝達される。制御部５１０は、入力部５４０からの入力指示に基づいて所定の処理を行なう。また、マイク５４６から収集された交通音は、入力部５４０を介して制御部５１０に伝達される。 A mouse 542, a keyboard 544, and a microphone 546 are connected to the input unit 540. An input instruction from the mouse 542 or the keyboard 544 is transmitted to the control unit 510 via the input unit 540. Control unit 510 performs predetermined processing based on an input instruction from input unit 540. The traffic sound collected from the microphone 546 is transmitted to the control unit 510 via the input unit 540.

For the traffic sound input from the input unit 540, the control unit 510 performs processing similar to that of the sound pressure calculation circuit 110, the power spectrum calculation circuit 120, the subband calculation circuit 130, the neural networks 200.1, 200.2, ..., 200.L, the learning circuit 210, and the determination circuit 250 described in the first and second embodiments.
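As a rough illustration of the power spectrum and subband steps that the control unit 510 carries out in software, the following sketch computes a one-sided power spectrum with a naive discrete Fourier transform and reduces it to a fixed number of subband values. The function names, the equal-width band split, and the use of mean power per band are assumptions made for this example only; this passage does not specify them.

```python
import cmath
import math


def power_spectrum(samples):
    """One-sided power spectrum |X[k]|^2 for k = 0 .. n//2 via a naive DFT."""
    n = len(samples)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(samples))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def subband_inputs(spectrum, num_subbands):
    """Split the spectrum into contiguous bands and return the mean power of each.

    Equal-width bands and mean power are illustrative assumptions.
    """
    if not 1 <= num_subbands <= len(spectrum):
        raise ValueError("num_subbands must be between 1 and len(spectrum)")
    # Integer band boundaries; each band holds at least one spectral bin.
    bounds = [i * len(spectrum) // num_subbands for i in range(num_subbands + 1)]
    return [sum(spectrum[a:b]) / (b - a) for a, b in zip(bounds, bounds[1:])]


# Hypothetical input: a pure tone that falls in DFT bin 4 of a 32-sample frame.
frame = [math.sin(2 * math.pi * 4 * t / 32) for t in range(32)]
features = subband_inputs(power_spectrum(frame), 4)
```

The resulting list plays the role of the input data supplied to each neural network; a practical implementation would normally use an FFT rather than the quadratic-time transform shown here.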

記録媒体アクセス部５５０は、交通音判定プログラム１８０Ａが記録された記録媒体５５５から、交通音判定プログラム１８０Ａを読出す機能を有する。記録媒体５５５に記憶されている交通音判定プログラム１８０Ａは、制御部５１０の動作（インストール処理）により、記録媒体アクセス部５５０から読み出され、記憶部５２０に記憶される。 The recording medium access unit 550 has a function of reading the traffic sound determination program 180A from the recording medium 555 on which the traffic sound determination program 180A is recorded. The traffic sound determination program 180A stored in the recording medium 555 is read from the recording medium access unit 550 and stored in the storage unit 520 by the operation (installation process) of the control unit 510.

The recording medium 555 may be any of a DVD-ROM (Digital Versatile Disk Read Only Memory), a CD-ROM (Compact Disk Read Only Memory), an MO (Magneto Optical Disk), a floppy (registered trademark) disk, a CF (Compact Flash) card, an SM (Smart Media (registered trademark)), an MMC (Multi Media Card), an SD (Secure Digital) memory card, a Memory Stick (registered trademark), an xD picture card, a USB memory, a magnetic tape, or any other nonvolatile memory.

通信部５６０は、制御部５１０とデータの授受を行なう。また、通信部５６０は、通信部１３５と有線または無線で、データの授受を行なう。通信部５６０は、前述の通信部５７０と同様なインターフェースである。 Communication unit 560 exchanges data with control unit 510. Communication unit 560 exchanges data with communication unit 135 in a wired or wireless manner. The communication unit 560 is an interface similar to the communication unit 570 described above.

また、通信部５６０は、ＵＳＢ（Universal Serial Bus）１．１、ＵＳＢ２．０、その他シリアル転送を行なう通信用インターフェースのいずれであってもよい。また、通信部５６０は、セントロニクス仕様、ＩＥＥＥ１２８４（Institute of Electrical and Electronic Engineers 1284）、その他パラレル転送を行なう通信用インターフェースのいずれであってもよい。また、通信部５６０は、ＩＥＥＥ１３９４、その他ＳＣＳＩ規格を利用した通信用インターフェースのいずれであってもよい。 The communication unit 560 may be any one of USB (Universal Serial Bus) 1.1, USB 2.0, and other communication interfaces for performing serial transfer. The communication unit 560 may be any of a Centronics specification, IEEE 1284 (Institute of Electrical and Electronic Engineers 1284), and other communication interfaces that perform parallel transfer. The communication unit 560 may be any one of communication interfaces using IEEE 1394 or other SCSI standards.

以上説明したように、交通音判定プログラム１８０ＡがインストールされたＰＣにおいても、第１および第２の実施の形態と同様な処理を行なうことができ、本発明を適用可能となる。 As described above, even with a PC on which the traffic sound determination program 180A is installed, the same processing as in the first and second embodiments can be performed, and the present invention can be applied.

したがって、本実施の形態においても、第１および第２の実施の形態と同様な効果を得ることができる。 Therefore, also in this embodiment, the same effect as in the first and second embodiments can be obtained.

Note that the device on which the traffic sound determination program 180A is recorded is not limited to a PC. It may be any device to which a device capable of collecting traffic sound is connected, on which the traffic sound determination program 180A can be installed, and which operates based on the traffic sound determination program 180A.

As described above, in the present invention, bagging or AdaBoost is performed using neural networks. However, the present invention is not limited to this, and a technique similar to bagging or AdaBoost may be used.
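The data selection and decision steps of the two ensemble techniques can be sketched as follows. This is a minimal illustration and not the implementation of the embodiments: the function names, the per-sample weights, the per-network weights, and the labels "collision" and "braking" are all hypothetical, and training of the neural networks themselves is omitted.

```python
import random
from collections import Counter


def bootstrap_sets(data, num_sets, rng):
    """Bagging-style selection: draw len(data) items with duplicates allowed, num_sets times."""
    n = len(data)
    return [[data[rng.randrange(n)] for _ in range(n)] for _ in range(num_sets)]


def majority_vote(labels):
    """Return the label output by the largest number of networks."""
    return Counter(labels).most_common(1)[0][0]


def weighted_resample(data, sample_weights, rng):
    """Boosting-style selection: items with larger weights (assumed to mark data
    that the previous network found hard to learn) are drawn more often."""
    return rng.choices(data, weights=sample_weights, k=len(data))


def weighted_vote(labels, network_weights):
    """Return the label with the largest total network weight."""
    scores = {}
    for label, weight in zip(labels, network_weights):
        scores[label] = scores.get(label, 0.0) + weight
    return max(scores, key=scores.get)


rng = random.Random(0)
learning_sets = bootstrap_sets(list(range(10)), 3, rng)   # L = 3 sets of N = 10
bagged = majority_vote(["collision", "braking", "collision"])
boosted = weighted_vote(["collision", "braking", "braking"], [0.9, 0.3, 0.4])
hard_only = weighted_resample(["a", "b"], [1.0, 0.0], rng)
```

In the weighted example, a single network with a large weight outvotes two networks with small weights, which is the practical difference from a plain majority decision.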

Note that the accuracy rates in the experimental results described in the first and second embodiments are not very high because only 100 traffic sound data were used for learning. Increasing the number of traffic sound data used for learning is expected to improve the accuracy rate.

今回開示された実施の形態はすべての点で例示であって制限的なものではないと考えられるべきである。本発明の範囲は上記した説明ではなくて特許請求の範囲によって示され、特許請求の範囲と均等の意味および範囲内でのすべての変更が含まれることが意図される。 The embodiment disclosed this time should be considered as illustrative in all points and not restrictive. The scope of the present invention is defined by the terms of the claims, rather than the description above, and is intended to include any modifications within the scope and meaning equivalent to the terms of the claims.

A block diagram showing the configuration of the traffic sound identification device in the present embodiment.
A diagram showing the internal configuration of the neural network unit in the present embodiment.
A diagram showing the detailed internal configuration of a neural network in the present embodiment.
A flowchart showing the flow of the bagging ensemble learning process.
A diagram for explaining the learning set generation process.
A flowchart showing the flow of the bagging determination process.
A diagram for explaining the bagging determination process.
A flowchart showing the flow of the traffic sound determination process.
A diagram showing the data table used for traffic sound determination.
A diagram showing a data table of experimental results.
A flowchart showing the flow of the AdaBoost ensemble learning process.
A flowchart showing the flow of the AdaBoost determination process.
A block diagram showing the internal configuration of the PC in the present embodiment.

Explanation of symbols

100: microphone; 110: sound pressure calculation circuit; 120: power spectrum calculation circuit; 130: subband calculation circuit; 180A: traffic sound determination program; 200: neural network unit; 200.1, 200.2, ..., 200.L: neural networks; 210: learning circuit; 250: determination circuit; 500: PC; 555: recording medium; 1000: traffic sound identification device.

Claims

A traffic sound identification device that determines which of a plurality of predetermined types a collected traffic sound belongs to, comprising:
an arithmetic circuit that calculates a power spectrum of the traffic sound;
a subband circuit that divides the power spectrum calculated by the arithmetic circuit into a plurality of subbands and generates I (I: a natural number) input data based on the power spectrum divided into the plurality of subbands;
L neural networks, ordered in advance from first to L-th (L: a natural number), for determining the type of the traffic sound based on the power spectrum divided into the plurality of subbands; and
a learning circuit that trains the L neural networks, wherein
the learning circuit sequentially generates L sets of learning data by performing a process of randomly selecting N traffic sound data, allowing duplicates, from N (N: a natural number) traffic sound data for learning, and performs a learning process of sequentially training the L neural networks with the generated sets,
each of the trained L neural networks outputs a determination result of the traffic sound type based on the I input data, and
the traffic sound identification device further comprises a determination circuit that determines the type of the traffic sound based on a majority decision over the L determination results respectively output from the trained L neural networks.

A traffic sound identification device that determines which of a plurality of predetermined types a collected traffic sound belongs to, comprising:
an arithmetic circuit that calculates a power spectrum of the traffic sound;
a subband circuit that divides the power spectrum calculated by the arithmetic circuit into a plurality of subbands;
L neural networks, ordered in advance from first to L-th (L: a natural number), for determining the type of the traffic sound based on the power spectrum divided into the plurality of subbands; and
a learning circuit that trains the L neural networks, wherein
the learning circuit sequentially generates L sets of learning data by performing, L times, a data selection process of randomly selecting N traffic sound data, allowing duplicates, from N (N: a natural number) traffic sound data for learning, and performs a learning process of sequentially training the L neural networks with the generated sets,
in the data selection process, when performing the learning process of the (k+1)-th neural network (k: a natural number smaller than L) of the L neural networks, the learning circuit selects traffic sound data that was difficult to learn in the learning process of the k-th neural network of the L neural networks,
each of the trained L neural networks outputs a determination result of the traffic sound type based on the power spectrum divided into the plurality of subbands, and
the traffic sound identification device further comprises a determination circuit that determines the type of the traffic sound based on a weighted calculation over the L determination results respectively output from the trained L neural networks.

The traffic sound identification device according to claim 1, wherein
when the learning circuit trains the L neural networks, the number of traffic sound types used for learning and determination is M (M: a natural number),
the M traffic sound types are divided into Q groups (Q: a natural number smaller than M), and
when determining the traffic sound, the determination circuit determines which of the Q groups the traffic sound to be determined belongs to.

A traffic sound determination program executed by a computer having a microphone for collecting traffic sound, the program causing the computer to execute the steps of:
calculating a power spectrum of the traffic sound;
dividing the calculated power spectrum into a plurality of subbands and generating I (I: a natural number) input data based on the power spectrum divided into the plurality of subbands;
sequentially generating L sets of learning data by performing a data selection process of randomly selecting N traffic sound data, allowing duplicates, from N (N: a natural number) traffic sound data for learning, and sequentially constructing L neural networks each trained with the sequentially generated learning data;
causing each of the trained L neural networks to output a determination result of the traffic sound type based on the I input data; and
determining the type of the traffic sound based on a majority decision over the L determination results respectively output from the trained L neural networks.

A traffic sound determination program executed by a computer having a microphone for collecting traffic sound, the program causing the computer to execute the steps of:
calculating a power spectrum of the traffic sound;
dividing the calculated power spectrum into a plurality of subbands;
sequentially generating L sets of learning data by performing a data selection process of randomly selecting N traffic sound data, allowing duplicates, from N (N: a natural number) traffic sound data for learning, and sequentially constructing L neural networks each trained with the sequentially generated learning data;
causing each of the trained L neural networks to output a determination result of the traffic sound type based on the power spectrum divided into the plurality of subbands; and
determining the type of the traffic sound based on a weighted calculation over the L determination results respectively output from the trained L neural networks,
the program further causing the computer to execute, in the data selection process, a step of preferentially selecting, when the learning process of the (k+1)-th neural network (k: a natural number smaller than L) of the L neural networks is performed, traffic sound data that was difficult to learn with the k-th neural network of the L neural networks.

A recording medium on which the traffic sound determination program according to claim 4 or claim 5 is recorded.

A traffic sound determination method for determining which of a plurality of predetermined types a collected traffic sound belongs to, comprising the steps of:
calculating a power spectrum of the traffic sound;
dividing the calculated power spectrum into a plurality of subbands and generating I (I: a natural number) input data based on the power spectrum divided into the plurality of subbands;
sequentially generating L sets of learning data by performing a data selection process of randomly selecting N traffic sound data, allowing duplicates, from N (N: a natural number) traffic sound data for learning, and sequentially constructing L neural networks each trained with the sequentially generated learning data;
causing each of the trained L neural networks to output a determination result of the traffic sound type based on the I input data; and
determining the type of the traffic sound based on a majority decision over the L determination results respectively output from the trained L neural networks.

A traffic sound determination method for determining which of a plurality of predetermined types a collected traffic sound belongs to, comprising the steps of:
calculating a power spectrum of the traffic sound;
dividing the calculated power spectrum into a plurality of subbands;
sequentially generating L sets of learning data by performing a data selection process of randomly selecting N traffic sound data, allowing duplicates, from N (N: a natural number) traffic sound data for learning, and sequentially constructing L neural networks each trained with the sequentially generated learning data;
causing each of the trained L neural networks to output a determination result of the traffic sound type based on the power spectrum divided into the plurality of subbands; and
determining the type of the traffic sound based on a weighted calculation over the L determination results respectively output from the trained L neural networks, wherein
in the data selection process, when the learning process of the (k+1)-th neural network (k: a natural number smaller than L) of the L neural networks is performed, traffic sound data that was difficult to learn with the k-th neural network of the L neural networks is preferentially selected.