JP5283268B2

JP5283268B2 - Voice utterance state judgment device

Info

Publication number: JP5283268B2
Application number: JP2009067315A
Authority: JP
Inventors: 日出夫杉本
Original assignee: 株式会社大日電子
Priority date: 2009-03-19
Filing date: 2009-03-19
Publication date: 2013-09-04
Anticipated expiration: 2029-03-19
Also published as: JP2010217807A

Description

The present invention relates to a technique for collecting, with a microphone, voice information emitted from a speaker and checking the audio quality state of the emitted voice information.

Conventionally, when information is collected during a disaster or similar event and voice information urging evacuation is broadcast over speakers based on that information, there has been no means of confirming whether the correct voice information is actually being broadcast from the installed speakers. For example, the voice information may not reach the surroundings as expected because of faulty speaker equipment or distortion of the sound caused by strong winds. During a disaster, important voice information may also be drowned out by ambient noise. In addition, human hearing has psychoacoustic characteristics. One such characteristic is that, when a loud sound is present, a listener cannot perceive whether a quieter sound at nearby frequencies is present.

On the other hand, as a means of picking up reproduced sound and confirming that it is reproduced correctly, there is a technique in which a microphone is installed near the speaker inside headphones used for foreign-language listening tests and the like, in order to confirm whether the audio of the test questions is being reproduced correctly (see Patent Document 1). This technique is used only to confirm whether the audio is reproduced correctly. It is not intended to confirm whether ambient noise is mixed into the reproduced audio, as can happen during a disaster or similar event.

Patent Document 1: JP 2007-221511 A

When voice information is announced to the surroundings, the side sending the voice information cannot tell whether the announcement actually reached the surroundings, because this depends on conditions at the speaker installation site. Noise and distortion can also arise in the audio because of broken lines connected to the speaker, faulty equipment such as amplifiers, microphones, and speakers, or electromagnetic interference. Because an urgent broadcast such as disaster prevention information can affect human life, a mechanism for determining whether the voice is being uttered correctly is desired.
In view of the above, an object of the present invention is to provide a device that can determine whether voice information announced to the surroundings during a disaster or similar event has been correctly broadcast to the surroundings.

To achieve the above object, the voice utterance state determination device according to the present invention is a device that announces voice information to the surroundings by voice utterance means such as a speaker, and comprises at least:
1) voice information holding means for storing and holding the voice information to be announced;
2) sound collecting means for collecting sound around the device;
3) comparison means for comparing the collected sound information collected by the sound collecting means with the voice information; and
4) determination means for determining the quality state of the voice output from the voice utterance means, based on the result of comparing the collected sound information with the voice information.
5) The comparison means of 3) has:
5-1) an input level determination step of performing the comparison processing when the input volume level of the signal spectrum of the collected sound information is equal to or higher than a predetermined threshold;
5-2) a normalization step of normalizing the signal spectra of the voice information and the collected sound information;
5-3) a correlation calculation step of calculating a correlation value between the frequency and amplitude of a predetermined voice band in the signal spectrum of the collected sound information and the frequency and amplitude of the voice information; and
5-4) an S/N ratio calculation step of calculating the S/N ratio between the spectrum in the voice band and the spectrum outside the voice band in the collected sound information, and the S/N ratio of the original sound spectrum in the voice band.
6) The determination means of 4) has a determination step of determining that the quality of the voice output from the voice utterance means is good when the correlation value in the correlation calculation step of 5-3) is equal to or higher than a predetermined value and each S/N ratio in the S/N ratio calculation step of 5-4) is equal to or higher than a predetermined value.

With this configuration, it is possible to determine whether the voice information announced to the surroundings has been correctly broadcast to the surroundings.
Here, voice information refers to spoken instructions that people can hear, as well as warning sounds and sirens that urge evacuation. It includes not only speech produced by humans but also computer-generated speech.
The sound collecting means refers to a device that can collect ambient sound, such as a microphone.

The voice quality state is determined as follows. When the input volume level of the signal spectrum of the collected sound information is equal to or higher than the threshold, the signal spectra of the voice information and the collected sound information are normalized. A correlation value is then calculated between the frequency and amplitude of the voice band in the signal spectrum of the collected sound information and the frequency and amplitude of the voice information, and the S/N ratio between the spectrum in the voice band and the spectrum outside the voice band in the collected sound information is calculated. The voice quality is determined to be good when the correlation value is equal to or higher than a predetermined value and the S/N ratio is equal to or higher than a predetermined value.

The spectrum of the voice band is the spectrum of sound waves in the 300 to 3400 Hz band, which is the voice band audible to humans. The S/N ratio between the spectrum in the voice band and the spectrum outside the voice band is calculated by treating sound waves within the 300 to 3400 Hz band as the voice information signal and regarding sound waves outside that band as noise. The S/N ratio generally expresses the ratio of the signal (S) to the noise (N) on a logarithmic scale, and is used, for example, to describe the performance of electric circuits such as amplifiers. The output level when a reference signal is input (signal level) is expressed in dB (decibels) relative to the output level when there is no input (noise level). A larger dB value means less noise and a higher-quality signal. Therefore, calculating the S/N ratio between the spectrum in the voice band and the spectrum outside the voice band gives the ratio between the in-band spectrum, where the voice information exists, and the out-of-band spectrum, where noise other than the voice information exists. When the S/N ratio is small, the voice information is strongly affected by noise. As described above, human hearing has psychoacoustic characteristics: when a loud noise is present, a listener cannot perceive whether a quieter sound at nearby frequencies is present. For this reason, when the S/N ratio is small, it is determined that the voice information could not be announced normally to the surroundings of the device.
Noise is broadly divided into two types: intermittent noise, and noise such as ambient din that arrives at a constant frequency. When the S/N ratio is small and noise is present, the voice information is regarded as not having been announced normally to the surroundings.
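The in-band versus out-of-band S/N ratio described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `band_snr_db`, the representation of the spectrum as (frequency, power) pairs, and the use of summed power per band are assumptions made for the example. Only the 300 to 3400 Hz band and the dB scale come from the description.

```python
import math

# Voice band stated in the description (Hz).
VOICE_BAND_HZ = (300.0, 3400.0)

def band_snr_db(spectrum, band=VOICE_BAND_HZ):
    """Return 10*log10(in-band power / out-of-band power).

    spectrum: iterable of (frequency_hz, power) pairs (assumed format).
    In-band components are treated as signal, all others as noise.
    """
    lo, hi = band
    signal = sum(p for f, p in spectrum if lo <= f <= hi)
    noise = sum(p for f, p in spectrum if not (lo <= f <= hi))
    if noise == 0:
        return math.inf   # no out-of-band energy at all
    if signal == 0:
        return -math.inf  # nothing in the voice band
    return 10.0 * math.log10(signal / noise)

# Illustrative spectrum: strong in-band components, weak out-of-band ones.
spectrum = [(100.0, 0.5), (500.0, 40.0), (1000.0, 50.0),
            (3000.0, 10.0), (5000.0, 0.5)]
snr = band_snr_db(spectrum)
```

With these illustrative values the in-band power is 100 times the out-of-band power, which corresponds to 20 dB.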

In the above configuration, the voice utterance state determination device according to the present invention further comprises voice information receiving means for receiving voice information by wired or wireless communication, and the voice information received by the voice information receiving means is stored and held by the voice information holding means.
With this configuration, the voice information to be announced can be sent from a remote location. Wired communication refers to network communication such as a LAN or WAN, and wireless communication refers to network communication such as a wireless LAN used over a wide area, for example business radio or disaster prevention radio.

The voice utterance state determination device according to the present invention also comprises retry means that, for voice information not determined to be good in the determination step of the determination means in the above configuration, announces that voice information to the surroundings again by the voice utterance means.
With this configuration, when the quality state of the announced voice information is not good, the voice information can be uttered to the surroundings again. This makes it possible to reliably convey important announcement messages to nearby residents in disaster prevention situations.
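The retry behavior can be pictured with the following sketch. The callables `play` and `judge_is_good` and the attempt limit `max_attempts` are hypothetical; the description states only that voice information not judged good is announced again, and does not specify a limit.

```python
def announce_with_retry(play, judge_is_good, max_attempts=3):
    """Announce, judge, and repeat until the quality is judged good.

    play():          hypothetical callable that broadcasts the stored message.
    judge_is_good(): hypothetical callable returning True when the
                     determination step judges the quality good.
    max_attempts:    assumed upper bound (not given in the description).
    Returns the number of attempts used, or None if never judged good.
    """
    for attempt in range(1, max_attempts + 1):
        play()
        if judge_is_good():
            return attempt
    return None

# Simulated run: the first two announcements are judged not good.
results = iter([False, False, True])
played = []
attempts = announce_with_retry(lambda: played.append("msg"),
                               lambda: next(results))
```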

In the above configuration, the voice utterance state determination device according to the present invention further comprises result output means for externally outputting the result of the comparison means and/or the determination means.
With this configuration, it is possible to report whether an important announcement message in a disaster prevention situation was reliably conveyed to nearby residents.
Here, external output means printing the comparison result or determination result, outputting it as data by communicating with an external computer device, or displaying it on a display device connected to the device. The result output means is, for example, a device that outputs to a paper medium, such as a printer, or a device that communicates with the outside, such as a radio.

The voice utterance state determination device of the present invention has the effect of making it possible to determine whether voice information announced to the surroundings during a disaster or similar event has been correctly broadcast to the surroundings.

FIG. 1 is a system configuration diagram of the voice utterance state determination device according to Embodiment 1. The remaining drawings show: the processing flow inside the device main body 1; the correlation calculation algorithm; the S/N ratio calculation algorithm; the quality state determination algorithm; an explanatory diagram of the S/N ratio calculation; and a voice quality determination criterion table (an example).

Embodiments of the present invention are described in detail below with reference to the drawings.

FIG. 1 shows the system configuration of the voice utterance state determination device according to Embodiment 1. As shown in FIG. 1, the device consists of a device main body 1, a speaker 2 serving as the voice utterance means, and a microphone 3 serving as the sound collecting means and installed beside the speaker 2. The device main body 1 contains a CPU 11, which serves as the comparison means and the determination means, together with memory in the form of a RAM 12 and a ROM 13. It also has a LAN I/F 14 and an I/O controller 15 as external interfaces through which voice information and external trigger signals can be input.
The CPU 11 performs processing based on the operating system (OS) and on applications such as the comparison program and the determination program. The RAM 12 and the ROM 13 provide a work area for the CPU 11. The ROM 13 stores and holds the operating system (OS) and the application programs.
The device also has terminals (23, 24) for the audio input and output of the speaker 2 and the microphone 3, a CODEC 22 that inputs and outputs the audio signals, and an RTP controller 20 that controls it. Amplifiers (25, 26) are provided between the CODEC 22 and the input/output terminals (23, 24).
The LAN I/F 14 interfaces with a wireless or wired communication network. The I/O controller 15 interfaces with external analog input/output and digital input/output (contacts). The device further has a DSP (digital signal processor) 30, a processor dedicated to computation. These components are connected via an internal bus 16.

Next, the processing inside the device main body 1 is described with reference to FIG. 2. The voice information uttered from the device main body 1 through the speaker 2 is stored and held in the RAM 12 or the ROM 13 in the device main body 1. This voice information may be stored in the memory in advance, or it may be sent through the LAN I/F 14, an external interface such as a communication network interface or a wireless interface. In response to a signal trigger from the LAN I/F 14, the device main body 1 reproduces the voice information stored in the memory and utters it from the speaker 2.
The microphone 3 picks up the direct sound from the speaker 2. The sound picked up by the microphone 3 is then checked for voice quality in multiple stages by the following processing, and its quality state is determined.

First, following the procedure of the comparison program, the CPU 11 compares the collected sound information picked up by the microphone 3 with the voice information stored and held in the memory.
(Process 1) The comparison processing is performed when the input volume level of the signal spectrum of the collected sound information is equal to or higher than a predetermined threshold (input level determination step; S11).
This is because, when the input volume level is below the predetermined threshold, the announcement is judged not to have sufficiently reached the surroundings.
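Process 1 can be illustrated with the sketch below. It measures the level as the time-domain RMS in dB relative to full scale, which is an assumption: the description refers only to the "input volume level of the signal spectrum" and does not say how it is measured. The function names and the -40 dB default threshold are likewise illustrative.

```python
import math

def rms_level_db(samples):
    """RMS level in dB relative to full scale (samples assumed in -1.0..1.0)."""
    if not samples:
        return -math.inf
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return -math.inf  # digital silence
    return 10.0 * math.log10(mean_square)

def passes_input_level(samples, threshold_db=-40.0):
    """Gate for Process 1: continue only if the level reaches the threshold.

    threshold_db is a hypothetical value; the description says only
    'a predetermined threshold'.
    """
    return rms_level_db(samples) >= threshold_db

loud = [0.5, -0.5] * 100      # about -6 dB
quiet = [0.001, -0.001] * 100 # about -60 dB
```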

(Process 2) The signal spectra of the voice information and the collected sound information are normalized (normalization step; S12).
Normalization is performed so that only the signal spectrum components are compared, without being affected by, for example, differences in level of the radio that is the source of the voice information.
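One simple way to realize Process 2 is peak normalization, sketched below. The description does not state which normalization is used, so scaling each magnitude spectrum to a peak of 1.0 is an assumption, as is the function name.

```python
def normalize_spectrum(magnitudes):
    """Scale a magnitude spectrum so that its largest bin equals 1.0.

    Peak normalization is an assumed choice; it removes overall level
    differences so that only the spectral shape is compared.
    """
    peak = max(magnitudes, default=0.0)
    if peak == 0:
        return list(magnitudes)  # all-zero or empty spectrum: nothing to scale
    return [m / peak for m in magnitudes]

# The same spectral shape at two different levels normalizes identically.
reference = normalize_spectrum([1.0, 4.0, 2.0])
collected = normalize_spectrum([0.25, 1.0, 0.5])
```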

(Process 3) A correlation value is calculated between the frequency and amplitude of a predetermined voice band in the signal spectrum of the collected sound information and the frequency and amplitude of the voice information (correlation calculation step; S13).
The voice information uttered from the speaker 2 and the collected sound information picked up by the microphone 3 are affected by the characteristics of the speaker 2 and the microphone 3, the installation environment, and so on, so it is difficult to make the two match exactly. In addition, a simple comparison of audio levels (the magnitude of the waveform) cannot distinguish the announcement from environmental noise once such noise is mixed in. The quality is therefore checked by examining the correlation between the two signals, that is, the collected sound information signal and the voice information signal.
Only the frequency and amplitude of a predetermined voice band in the signal spectrum of the collected sound information are processed, in order to exclude background sounds such as truck noise, wind, cheering crowds, aircraft, general noise, and animal cries.
Specifically, for the signal spectra of the two signals, a correlation function of the frequency and amplitude in the voice band is calculated to quantify the similarity of the two signals. Signals with a large correlation value are similar to each other.
Here, the voice band refers to the frequency band from 50 Hz to 10 kHz. Within this range, ordinary speech lies mainly in the 300 Hz to 3400 Hz band. The frequency range of the voice band is a tunable parameter of the device of the present invention.
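Process 3 can be sketched as a correlation coefficient computed over the voice-band bins of the two spectra. The description does not name the exact correlation function, so the Pearson coefficient used here is an assumption, as are the function name and the bin layout. The band is passed as a parameter because the description calls it tunable.

```python
import math

def band_correlation(freqs_hz, ref_mag, col_mag, band=(300.0, 3400.0)):
    """Pearson correlation of two magnitude spectra within a voice band.

    freqs_hz: bin center frequencies shared by both spectra (assumed).
    ref_mag:  spectrum of the stored voice information.
    col_mag:  spectrum of the collected sound information.
    Returns a value in -1.0..1.0, or 0.0 when it is undefined.
    """
    lo, hi = band
    pairs = [(r, c) for f, r, c in zip(freqs_hz, ref_mag, col_mag)
             if lo <= f <= hi]
    n = len(pairs)
    if n < 2:
        return 0.0
    mean_r = sum(r for r, _ in pairs) / n
    mean_c = sum(c for _, c in pairs) / n
    cov = sum((r - mean_r) * (c - mean_c) for r, c in pairs)
    var_r = sum((r - mean_r) ** 2 for r, _ in pairs)
    var_c = sum((c - mean_c) ** 2 for _, c in pairs)
    if var_r == 0 or var_c == 0:
        return 0.0
    return cov / math.sqrt(var_r * var_c)

freqs = [100.0, 500.0, 1000.0, 2000.0, 3000.0, 5000.0]
ref = [0.0, 1.0, 3.0, 2.0, 1.0, 0.0]
# Same in-band shape at half the level, plus strong out-of-band noise.
similar = [9.0, 0.5, 1.5, 1.0, 0.5, 9.0]
# A different in-band shape.
different = [0.0, 3.0, 1.0, 1.0, 3.0, 0.0]
```

Because only in-band bins enter the calculation, the out-of-band noise in `similar` does not lower its correlation with `ref`, which matches the stated purpose of excluding background sounds.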

(Process 4) The S/N ratio between the spectrum in the voice band and the spectrum outside the voice band in the collected sound information, and the S/N ratio of the original sound spectrum in the voice band, are calculated (S/N ratio calculation step; S14).
By taking the S/N ratio between the signal spectrum in the voice band and the signal spectrum outside the voice band, sound with a high S/N ratio is regarded as sound with high intelligibility.
When the signal spectrum in the voice band contains a signal that acts as noise, the S/N ratio of the original sound spectrum in the voice band is taken, and sound with a high S/N ratio is again regarded as sound with high intelligibility.
FIG. 4 is an explanatory diagram of the S/N ratio calculation.
FIG. 4 (1) shows a case where the S/N ratio is high, and FIG. 4 (2) shows an example where there is a peak in a frequency band Z1 (Hz) outside the voice band and the S/N ratio is low.
FIG. 4 (3) shows a case where there is a noise signal with a peak in a frequency band Z2 (Hz) inside the voice band. In this case, the S/N ratio of the original sound spectrum in the voice band is calculated, and processing proceeds based on the determination levels of the voice quality determination criteria described later.
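The in-band case of FIG. 4 (3) can be pictured with the sketch below. The description does not spell out how the "S/N ratio of the original sound spectrum in the voice band" is computed. This example assumes one plausible reading: within the voice band, the original spectrum is the signal and any collected power in excess of the original is noise. It also assumes both spectra are already on the same scale. All names are illustrative.

```python
import math

def in_band_snr_db(freqs_hz, original_power, collected_power,
                   band=(300.0, 3400.0)):
    """S/N ratio inside the voice band, using the original as reference.

    Signal: in-band power of the original (stored) spectrum.
    Noise:  in-band collected power exceeding the original (assumed reading).
    """
    lo, hi = band
    signal = 0.0
    noise = 0.0
    for f, orig, col in zip(freqs_hz, original_power, collected_power):
        if lo <= f <= hi:
            signal += orig
            noise += max(col - orig, 0.0)
    if noise == 0:
        return math.inf   # no excess in-band power
    if signal == 0:
        return -math.inf  # original has no in-band power
    return 10.0 * math.log10(signal / noise)

freqs = [500.0, 1000.0, 2000.0, 3000.0]
original = [40.0, 50.0, 10.0, 0.0]
# Collected spectrum with an in-band noise peak at 3000 Hz.
with_peak = [40.0, 50.0, 10.0, 10.0]
snr_peak = in_band_snr_db(freqs, original, with_peak)
```

With these illustrative values the original in-band power is 10 times the excess power, which corresponds to 10 dB.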

Then, following the procedure of the determination program, the CPU 11 determines the quality state of the voice output from the speaker 2 based on the result of comparing the collected sound information with the voice information (quality state determination step; S15).
The voice quality is judged good when the correlation value in the correlation calculation step (S13) is equal to or higher than a predetermined value and the S/N ratio in the S/N ratio calculation step (S14) is equal to or higher than a predetermined value.
Using this criterion, the voice quality can be judged in multiple stages.
FIG. 5 shows an example of the voice quality determination criterion table. The criteria shown in FIG. 5 are divided into five determination levels. Level "5" is the best voice quality, and the quality worsens as the number decreases. Level "1" means a silent state in which no voice information is conveyed (no sound can be heard). Levels 5 to 2 are states in which the voice emitted from the speaker can be heard. At level "5", the original sound emitted from the speaker and the sound collected by the microphone are completely similar, which is a clear voice quality state. At level "4", slight background noise is audible in the sound collected by the microphone compared with the original sound, but the voice is clear. In terms of the S/N ratio, this corresponds to 20 dB or more. At level "3", the voice can be distinguished in the sound collected by the microphone. In terms of the S/N ratio, this corresponds to 10 dB or more and less than 20 dB. At level "2", the voice can be distinguished in the sound collected by the microphone, but both noise and voice are heard. In terms of the S/N ratio, this corresponds to 6 dB or less.
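The five-level criteria can be sketched as a simple mapping. The S/N boundaries of 20 dB and 10 dB come from the description of FIG. 5. Several points are assumptions made only for this example: level 1 is tied to the input level gate failing, level 5 is separated from level 4 by a hypothetical correlation threshold `clear_correlation`, and the range between 6 dB and 10 dB, which the description leaves unassigned, is mapped to level 2.

```python
def judgment_level(audible, correlation, snr_db, clear_correlation=0.95):
    """Map measurements to a determination level from 1 (silent) to 5 (clear).

    audible:           False when the input level gate fails (assumed link
                       to level 1).
    clear_correlation: hypothetical threshold separating level 5 from 4.
    """
    if not audible:
        return 1
    if snr_db >= 20.0:
        return 5 if correlation >= clear_correlation else 4
    if snr_db >= 10.0:
        return 3
    # Below 10 dB; the description states 6 dB or less for level 2, so
    # assigning the 6 to 10 dB range here is an assumption.
    return 2

levels = [
    judgment_level(True, 0.99, 30.0),
    judgment_level(True, 0.80, 22.0),
    judgment_level(True, 0.80, 15.0),
    judgment_level(True, 0.80, 5.0),
    judgment_level(False, 0.0, 0.0),
]
```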

The collected sound information picked up by the microphone 3 covers only the direct sound from the speaker 2. If there is reverberation caused by surrounding structures, such as in a hall, the quality state can still be determined by performing a calculation that takes the delay into account.
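The description does not say how the delay is taken into account. One common approach, shown here purely as an assumed illustration, is to estimate the lag between the stored signal and the collected signal by finding the shift that maximizes their cross-correlation, and then to align the signals before comparison. The function name and the `max_lag` search limit are hypothetical.

```python
def estimate_delay(reference, collected, max_lag):
    """Return the lag in samples (0..max_lag) that best aligns the signals.

    For each candidate lag, the reference is multiplied sample by sample
    with the collected signal shifted by that lag; the lag with the
    largest sum is returned.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        score = sum(r * c for r, c in zip(reference, collected[lag:]))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

reference = [0.0, 1.0, 0.0, -1.0, 2.0, 0.0, 0.0, 1.0]
# The collected signal is the reference delayed by three samples.
collected = [0.0, 0.0, 0.0] + reference
lag = estimate_delay(reference, collected, max_lag=5)
aligned = collected[lag:]
```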

All of the collected sound information picked up by the microphone 3 is stored in the memory 12 of the device main body 1. The CPU 11 adds date and time information to the results of the comparison program and the determination program and outputs them to a printer via the external interface 13.

The present invention is useful as a voice announcement device in, for example, wide-area monitoring systems that use business radio and disaster prevention systems that use disaster prevention radio.

1 Device main body
2 Speaker
3 Microphone
11 CPU
12 RAM
13 ROM
14 LAN interface (I/F)
15 I/O controller

Claims

1. A voice utterance state determination device that announces voice information to the surroundings by voice utterance means such as a speaker, the device comprising at least:
voice information holding means for storing and holding the voice information to be announced;
sound collecting means for collecting sound around the device;
comparison means for comparing the collected sound information collected by the sound collecting means with the voice information; and
determination means for determining a quality state of the voice output from the voice utterance means, based on a result of comparing the collected sound information with the voice information,
wherein the comparison means has:
an input level determination step of performing comparison processing when the input volume level of the signal spectrum of the collected sound information is equal to or higher than a predetermined threshold;
a normalization step of normalizing the signal spectra of the voice information and the collected sound information;
a correlation calculation step of calculating a correlation value between the frequency and amplitude of a predetermined voice band in the signal spectrum of the collected sound information and the frequency and amplitude of the voice information; and
an S/N ratio calculation step of calculating an S/N ratio between the spectrum in the voice band and the spectrum outside the voice band in the collected sound information, and an S/N ratio of the original sound spectrum in the voice band, and
wherein the determination means has a determination step of determining that the quality of the voice output from the voice utterance means is good when the correlation value in the correlation calculation step is equal to or higher than a predetermined value and each S/N ratio in the S/N ratio calculation step is equal to or higher than a predetermined value.

2. The voice utterance state determination device according to claim 1, further comprising voice information receiving means for receiving voice information by wired or wireless communication, wherein the voice information received by the voice information receiving means is stored and held by the voice information holding means.

3. The voice utterance state determination device according to claim 2, further comprising retry means for announcing again to the surroundings, by the voice utterance means, the voice information that was not determined to be good in the determination step of the determination means.

4. The voice utterance state determination device according to claim 1, further comprising result output means for externally outputting the result of the comparison means and/or the determination means.

5. A disaster prevention system comprising the voice utterance state determination device according to any one of claims 1 to 4.