JP4745916B2 - Noise suppression speech quality estimation apparatus, method and program - Google Patents

Noise suppression speech quality estimation apparatus, method and program

Info

Publication number
JP4745916B2
JP4745916B2 JP2006225158A JP2006225158A JP4745916B2 JP 4745916 B2 JP4745916 B2 JP 4745916B2 JP 2006225158 A JP2006225158 A JP 2006225158A JP 2006225158 A JP2006225158 A JP 2006225158A JP 4745916 B2 JP4745916 B2 JP 4745916B2
Authority
JP
Japan
Prior art keywords
noise
speech
signal
suppressed
speech signal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
JP2006225158A
Other languages
Japanese (ja)
Other versions
JP2008015443A (en)
Inventor
則次 恵木
仁志 青木
玲 高橋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nippon Telegraph and Telephone Corp
Original Assignee
Nippon Telegraph and Telephone Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nippon Telegraph and Telephone Corp
Priority to JP2006225158A
Publication of JP2008015443A
Application granted
Publication of JP4745916B2
Expired - Fee Related
Anticipated expiration

Landscapes

  • Telephone Function (AREA)

Description

The present invention relates to a technique for evaluating speech quality in voice communication services that use noise suppression processing, and more particularly to a technique for evaluating speech quality in communication environments that are strongly affected by ambient noise.

In voice communication in an environment with high ambient noise, noise enters the transmitter, so the listener hears speech with noise superimposed on it. In hands-free communication, the distance between the talker's mouth and the microphone is longer than with a handset or headset, so the microphone picks up sound over a wider range and is more easily affected by ambient noise. Mobile phones are also often used outdoors, where ambient noise is high, so handset communication is likewise susceptible to ambient noise. Noise suppression processing is therefore important in these forms of communication.

Noise suppression processing techniques based on a variety of methods have been developed. To provide a high-quality voice communication service, it is important to grasp the performance of a noise suppression technique accurately, and to use that knowledge to optimize its parameters and to select among methods. A quality evaluation method for noise-suppressed speech is therefore desired.
The basis of speech quality evaluation is subjective quality evaluation, that is, psychological evaluation by people who actually listen to speech or hold a conversation. Subjective quality evaluation directly measures the quality that users perceive, but it is not simple to carry out: it requires a sufficient number of subjects and dedicated facilities, and it takes considerable cost and time.

A technique is therefore desired that estimates subjective quality efficiently from physical quantities of the speech signal instead of from human subjective evaluation. Such a technique is called objective quality evaluation. The feature quantity most widely used today as an objective index of noise suppression performance is the amount of noise removed, but it does not necessarily correspond well to subjective quality. The noise suppression process distorts both the speech and the noise, and this distortion affects subjective quality, so it must also be taken into account to estimate subjective quality properly.

PESQ (Perceptual evaluation of speech quality), disclosed in Non-Patent Document 1, is an objective quality evaluation technique that can evaluate speech distortion. PESQ takes as input an original speech signal and a degraded speech signal processed by the coding scheme or device under evaluation, and measures the quality of the evaluation target from the difference between the two signals. FIGS. 8 and 9 are block diagrams showing configuration examples of a quality evaluation system using PESQ. FIG. 8 shows the configuration for evaluating the quality of a speech signal on which no noise is superimposed, and FIG. 9 shows the configuration for evaluating the quality of a speech signal on which noise is superimposed. In FIGS. 8 and 9, reference numeral 100 denotes the evaluation target device, 101 denotes a PESQ device, and 102 denotes a speech adder.

ITU-T Recommendation P.862, "Perceptual evaluation of speech quality (PESQ), an objective method for end-to-end speech quality assessment of narrow-band telephone networks and speech codecs", Feb. 2001

As shown in FIG. 9, PESQ takes as input the speech signal before noise is superimposed and the noise-suppressed speech signal, so it can evaluate the quality of noise-superimposed speech in a way that accounts for speech distortion. However, PESQ has no input for the noise before suppression, so it cannot account for noise distortion. The PESQ disclosed in Non-Patent Document 1 therefore cannot evaluate noise-suppressed speech accurately.

The present invention has been made to solve the above problem, and its object is to provide a noise-suppressed speech quality estimation apparatus, method, and program that can evaluate noise-suppressed speech accurately.

The present invention is a noise-suppressed speech quality estimation apparatus that objectively estimates the quality of noise-suppressed speech, comprising: detection means for detecting feature quantities of quality factors of a noise-suppressed speech signal output from a noise suppression processing device under evaluation when a noise-superimposed speech signal is given as input to that device; and estimation means for estimating the quality of the noise-suppressed speech signal based on the feature quantities detected by the detection means. The detection means comprises discrimination means for determining whether each segment obtained by dividing the noise-suppressed speech signal at fixed time intervals is a speech segment or a non-speech segment, speech-segment feature detection means for detecting feature quantities of quality factors of the noise-suppressed speech signal in the speech segments, and non-speech-segment feature detection means for detecting feature quantities of quality factors of the noise-suppressed speech signal in the non-speech segments. The speech-segment feature detection means has a speech distortion measurement unit that detects speech distortion as a feature quantity of a quality factor of the noise-suppressed speech signal by comparing the speech signal before noise is superimposed with the corresponding noise-suppressed speech signal in the speech segments, and a volume measurement unit that detects, as a feature quantity of a quality factor, the volume of the noise-suppressed speech signal in the speech segments. The non-speech-segment feature detection means has a noise distortion measurement unit that detects noise distortion as a feature quantity of a quality factor of the noise-suppressed speech signal by comparing the noise-suppressed speech signal in the non-speech segments with the corresponding noise-superimposed speech signal or with the noise signal from which that noise-superimposed speech signal was made, and a noise amount measurement unit that detects, as a feature quantity of a quality factor, the noise amount of the noise-suppressed speech signal in the non-speech segments. The noise distortion is the PESQ value or Wideband-PESQ value obtained when the noise-suppressed speech signal in the non-speech segments is taken as the degraded speech signal and the corresponding noise-superimposed speech signal, or the noise signal from which it was made, is taken as the reference signal. The estimation means estimates the quality of the noise-suppressed speech signal based on the speech distortion, the volume of the noise-suppressed speech signal, the noise distortion, and the noise amount of the noise-suppressed speech signal.

One configuration example of the noise-suppressed speech quality estimation apparatus of the present invention further comprises a speech database in which speech signals are registered in advance, speech playback means for playing back a speech signal from the speech database in an actual call environment, and speech recording means for outputting, as the noise-superimposed speech signal, the signal obtained when the played-back speech is picked up.
Another configuration example of the noise-suppressed speech quality estimation apparatus of the present invention further comprises a speech database in which speech signals are registered in advance, speech recording means for picking up a noise signal in an actual call environment, and speech addition means for outputting, as the noise-superimposed speech signal, the sum of a speech signal from the speech database and the noise signal picked up by the speech recording means.
A further configuration example of the noise-suppressed speech quality estimation apparatus of the present invention further comprises a speech database in which speech signals are registered in advance, a noise database in which noise signals are registered in advance, and speech addition means for outputting, as the noise-superimposed speech signal, the sum of a speech signal from the speech database and a noise signal from the noise database.
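
The speech addition means in the second and third configuration examples can be illustrated with a short sketch. It assumes both signals are lists of samples at the same sampling rate; the function name `superimpose` and the `noise_gain` parameter are hypothetical and do not appear in the source, which only states that the two signals are added.

```python
def superimpose(speech, noise, noise_gain=1.0):
    """Add a noise signal to a speech signal sample by sample.

    noise_gain is an assumed convenience parameter for scaling the noise;
    the source describes plain addition (noise_gain = 1.0).
    The result is truncated to the shorter of the two inputs.
    """
    n = min(len(speech), len(noise))
    return [speech[i] + noise_gain * noise[i] for i in range(n)]


# Usage: a three-sample speech signal plus a two-sample noise signal.
noise_superimposed = superimpose([1.0, 2.0, 3.0], [0.5, -0.5])
```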

The noise-suppressed speech quality estimation method of the present invention comprises: a detection procedure for detecting feature quantities of quality factors of a noise-suppressed speech signal output from a noise suppression processing device under evaluation when a noise-superimposed speech signal is given as input to that device; and an estimation procedure for estimating the quality of the noise-suppressed speech signal based on the feature quantities detected by the detection procedure. The detection procedure consists of a discrimination procedure for determining whether each segment obtained by dividing the noise-suppressed speech signal at fixed time intervals is a speech segment or a non-speech segment, a speech-segment feature detection procedure for detecting feature quantities of quality factors of the noise-suppressed speech signal in the speech segments, and a non-speech-segment feature detection procedure for detecting feature quantities of quality factors of the noise-suppressed speech signal in the non-speech segments. The speech-segment feature detection procedure consists of a speech distortion measurement procedure that detects speech distortion as a feature quantity of a quality factor of the noise-suppressed speech signal by comparing the speech signal before noise is superimposed with the corresponding noise-suppressed speech signal in the speech segments, and a volume measurement procedure that detects, as a feature quantity of a quality factor, the volume of the noise-suppressed speech signal in the speech segments. The non-speech-segment feature detection procedure consists of a noise distortion measurement procedure that detects noise distortion as a feature quantity of a quality factor of the noise-suppressed speech signal by comparing the noise-suppressed speech signal in the non-speech segments with the corresponding noise-superimposed speech signal or with the noise signal from which that noise-superimposed speech signal was made, and a noise amount measurement procedure that detects, as a feature quantity of a quality factor, the noise amount of the noise-suppressed speech signal in the non-speech segments. The noise distortion is the PESQ value or Wideband-PESQ value obtained when the noise-suppressed speech signal in the non-speech segments is taken as the degraded speech signal and the corresponding noise-superimposed speech signal, or the noise signal from which it was made, is taken as the reference signal. The estimation procedure estimates the quality of the noise-suppressed speech signal based on the speech distortion, the volume of the noise-suppressed speech signal, the noise distortion, and the noise amount of the noise-suppressed speech signal.

The noise-suppressed speech quality estimation program of the present invention causes a computer to execute: a detection procedure for detecting feature quantities of quality factors of a noise-suppressed speech signal output from a noise suppression processing device under evaluation when a noise-superimposed speech signal is given as input to that device; and an estimation procedure for estimating the quality of the noise-suppressed speech signal based on the feature quantities detected by the detection procedure. The detection procedure consists of a discrimination procedure for determining whether each segment obtained by dividing the noise-suppressed speech signal at fixed time intervals is a speech segment or a non-speech segment, a speech-segment feature detection procedure for detecting feature quantities of quality factors of the noise-suppressed speech signal in the speech segments, and a non-speech-segment feature detection procedure for detecting feature quantities of quality factors of the noise-suppressed speech signal in the non-speech segments. The speech-segment feature detection procedure consists of a speech distortion measurement procedure that detects speech distortion as a feature quantity of a quality factor of the noise-suppressed speech signal by comparing the speech signal before noise is superimposed with the corresponding noise-suppressed speech signal in the speech segments, and a volume measurement procedure that detects, as a feature quantity of a quality factor, the volume of the noise-suppressed speech signal in the speech segments. The non-speech-segment feature detection procedure consists of a noise distortion measurement procedure that detects noise distortion as a feature quantity of a quality factor of the noise-suppressed speech signal by comparing the noise-suppressed speech signal in the non-speech segments with the corresponding noise-superimposed speech signal or with the noise signal from which that noise-superimposed speech signal was made, and a noise amount measurement procedure that detects, as a feature quantity of a quality factor, the noise amount of the noise-suppressed speech signal in the non-speech segments. The noise distortion is the PESQ value or Wideband-PESQ value obtained when the noise-suppressed speech signal in the non-speech segments is taken as the degraded speech signal and the corresponding noise-superimposed speech signal, or the noise signal from which it was made, is taken as the reference signal. The estimation procedure estimates the quality of the noise-suppressed speech signal based on the speech distortion, the volume of the noise-suppressed speech signal, the noise distortion, and the noise amount of the noise-suppressed speech signal.

According to the present invention, a noise-superimposed speech signal is given as input to the noise suppression processing device under evaluation, and feature quantities of quality factors of the noise-suppressed speech signal output from that device are detected. In doing so, at least noise distortion is detected as a feature quantity by comparing the noise-suppressed speech signal with the noise-superimposed speech signal or with the noise signal from which the noise-superimposed speech signal was made, and the quality of the noise-suppressed speech signal is estimated from these feature quantities. This makes it possible to estimate the quality of noise-suppressed speech in a way that matches the user's experience, and to configure noise suppression techniques appropriately and compare their performance cheaply and easily. For example, by supplying speech and noise signals that match the environment in which calls are made, the performance of a noise suppression technique in each environment can be determined. Developers of noise suppression techniques can learn the performance of the techniques they develop. Designers of voice communication terminals who build noise suppression into a terminal can select and configure the best technique for the expected usage environment.
In the present invention, when the noise amount of the noise-suppressed speech signal is measured, the volume is measured at fixed time intervals so that sudden noise is captured. The noise amount can thus be measured in a way that accounts for the effect of sudden noise on perceived quality.
In the present invention, when the speech distortion of the noise-suppressed speech signal is measured, the influence of distortion caused by noise is removed, based on the noise volume, from the magnitude of the distortion detected by comparing the noise-suppressed speech signal with the speech signal from which the noise-superimposed speech signal was made. The pure speech distortion of the noise-suppressed speech signal can thus be measured.

Further, in the present invention, a speech signal from the speech database is played back in an actual call environment, and the signal obtained when the played-back speech is picked up is output as the noise-superimposed speech signal. This makes it possible to estimate, accurately and easily, the quality of noise-suppressed speech in the actual call environment in a way that matches the user's experience.

Further, in the present invention, a noise signal is picked up in an actual call environment, and the sum of a speech signal from the speech database and the picked-up noise signal is output as the noise-superimposed speech signal. This makes it possible to estimate, accurately and easily, the quality of noise-suppressed speech in the actual call environment in a way that matches the user's experience.

Further, in the present invention, the sum of a speech signal from the speech database and a noise signal from the noise database is output as the noise-superimposed speech signal. This makes it possible to estimate, accurately and easily, the quality of noise-suppressed speech under various noise environments in a way that matches the user's experience.

[First Embodiment]
Embodiments of the present invention are described below with reference to the drawings. FIG. 1 is a block diagram showing a configuration example of a noise-suppressed speech quality estimation apparatus according to the first embodiment of the present invention.
As shown in FIG. 1, the noise-suppressed speech quality estimation apparatus 1 includes a speech database unit 2, a speech playback unit 3, a speaker 4, a speech recording unit 5, a microphone 6, a speech segment detection unit 7 (discrimination means), delay correction units 8-1, 8-2-1, and 8-2-2, a switch control unit 9, speech concatenation units 10 (10-1 to 10-4), a speech distortion measurement unit 11, a noise distortion measurement unit 12, a volume measurement unit 13, a noise amount measurement unit 14, a speech quality estimation unit 15 (estimation means), a speech quality output unit 16, and switches 20, 21, and 22.

The speech segment detection unit 7, the delay correction units 8-1, 8-2-1, and 8-2-2, the switch control unit 9, the speech concatenation units 10, the speech distortion measurement unit 11, the noise distortion measurement unit 12, the volume measurement unit 13, the noise amount measurement unit 14, and the switches 20 to 22 together constitute the detection means that detects feature quantities of quality factors of the noise-suppressed speech signal.

The speech playback unit 3 acquires a speech signal A from the speech database unit 2 and outputs it from the speaker 4 in the environment in which voice calls are made.
The microphone 6 picks up the sound output from the speaker 4. Because noise is superimposed on the sound picked up by the microphone 6, the signal output from the microphone 6 is referred to as the noise-superimposed speech signal B. The speech recording unit 5 acquires the noise-superimposed speech signal B picked up by the microphone 6 and inputs it to the evaluation target device 100 (the noise suppression processing device) and to the delay correction unit 8-2-2. The noise-suppressed speech signal that the evaluation target device 100 outputs when the noise-superimposed speech signal B is input to it is denoted C. A voice communication terminal is one example of the evaluation target device 100.

The delay correction unit 8-1 takes the speech signal A and the noise-suppressed speech signal C as input and measures the delay of the noise-suppressed speech signal C relative to the speech signal A. It does so by finding the time at which the short-time cross-correlation coefficient between the speech signal A and the noise-suppressed speech signal C is maximized. The delay correction unit 8-2-1 delays the speech signal A by the delay time measured by the delay correction unit 8-1 and outputs a speech signal A' that is time-synchronized with the noise-suppressed speech signal C. The delay correction unit 8-2-2 delays the noise-superimposed speech signal B by the same measured delay time and outputs a noise-superimposed speech signal B' that is time-synchronized with the noise-suppressed speech signal C.
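
The delay measurement and correction described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the apparatus's actual implementation: it evaluates one normalized cross-correlation over the whole signal rather than a short-time one, it searches only non-negative lags, and the names `estimate_delay`, `apply_delay`, and `max_lag` are hypothetical.

```python
import math


def estimate_delay(reference, degraded, max_lag):
    """Return the lag in samples (0..max_lag) that maximizes the normalized
    cross-correlation between reference[n] and degraded[n + lag]."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        n = min(len(reference), len(degraded) - lag)
        if n <= 0:
            break
        ref = reference[:n]
        deg = degraded[lag:lag + n]
        num = sum(r * d for r, d in zip(ref, deg))
        den = math.sqrt(sum(r * r for r in ref) * sum(d * d for d in deg))
        score = num / den if den > 0 else 0.0
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def apply_delay(signal, delay):
    """Delay a signal by prepending zeros, keeping the original length."""
    return ([0.0] * delay + list(signal))[:len(signal)]


# Usage: C is A delayed by 3 samples and attenuated; A' is A re-aligned to C.
A = [0.0, 1.0, -2.0, 3.0, -1.0, 0.5, 0.0, 0.0, 0.0, 0.0]
C = [0.5 * x for x in apply_delay(A, 3)]
delay = estimate_delay(A, C, max_lag=5)
A_sync = apply_delay(A, delay)
```

The same measured delay would be applied to the noise-superimposed speech signal B to obtain B'.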

The speech segment detection unit 7 divides the speech signal A' into fixed short segments (20 ms) and uses VAD (Voice Activity Detection) to determine, for each segment, whether it is a speech segment in which speech is present or a non-speech segment. The speech segment detection unit 7 sends the type information obtained in this way for each segment of the speech signal A' (speech segment or non-speech segment) to the switch control unit 9.
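
The per-segment classification can be sketched as below. The source does not say which VAD algorithm is used, so a simple mean-power threshold stands in for it here as an assumption; the function name `classify_segments` and the `threshold` value are likewise hypothetical. Only the 20 ms segment length comes from the description.

```python
def classify_segments(signal, sample_rate, segment_ms=20, threshold=1e-3):
    """Split a signal into fixed-length segments and label each one.

    Returns a list of booleans: True for a speech segment, False for a
    non-speech segment. The mean-power threshold is an assumed stand-in
    for the VAD; any trailing partial segment is ignored.
    """
    segment_len = sample_rate * segment_ms // 1000
    labels = []
    for start in range(0, len(signal) - segment_len + 1, segment_len):
        segment = signal[start:start + segment_len]
        power = sum(x * x for x in segment) / segment_len
        labels.append(power > threshold)
    return labels


# Usage: at 8 kHz a 20 ms segment is 160 samples.
A_sync = [0.0] * 160 + [0.5] * 160 + [0.0] * 160
labels = classify_segments(A_sync, sample_rate=8000)
```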

The switch control unit 9 controls the switches 20 to 22 based on the type of each segment of the speech signal A' reported by the speech segment detection unit 7.
FIG. 2 shows the switch control when a short segment x1 of the speech signal A' is determined to be a speech segment. When type information indicating that the short segment x1 of the speech signal A' is a speech segment is input, the switch control unit 9 sets the switch 20 so that the short segment x1 of the speech signal A' is input to the speech concatenation unit 10-1 and, at the same time, sets the switch 21 so that the short segment of the noise-suppressed speech signal C is input to the speech concatenation unit 10-3. Because the noise-suppressed speech signal C is synchronized with the speech signal A', the short segment of C that corresponds to the short segment x1 of A' is input to the speech concatenation unit 10-3. The switch control unit 9 also controls the switch 22 so that no signal is input to the speech concatenation unit 10-2.

FIG. 3 shows the switch control when a short segment x2 of the speech signal A' is determined to be a non-speech segment. When type information indicating that the short segment x2 of the speech signal A' is a non-speech segment is input, the switch control unit 9 sets the switch 21 so that the short segment of the noise-suppressed speech signal C is input to the speech concatenation unit 10-4 and, at the same time, sets the switch 22 so that the short segment of the noise-superimposed speech signal B' is input to the speech concatenation unit 10-2. Because the noise-superimposed speech signal B' and the noise-suppressed speech signal C are synchronized with the speech signal A', the short segment of B' that corresponds to the short segment x2 of A' is input to the speech concatenation unit 10-2, and the short segment of C that corresponds to x2 is input to the speech concatenation unit 10-4. The switch control unit 9 also controls the switch 20 so that no signal is input to the speech concatenation unit 10-1.

The speech concatenation unit 10-1 stores the first short-section signal it receives. Thereafter, each time a short-section signal is input, it appends that signal to the end of the signal currently stored and stores the concatenated result as the new stored signal. In this way the speech concatenation unit 10-1 concatenates and stores all the short sections of speech signal A′ that are input to it. The speech concatenation units 10-2, 10-3, and 10-4 concatenate and store their input short-section signals in the same way.

As a result, the speech concatenation unit 10-1 stores a speech signal a formed by concatenating all the speech short sections of speech signal A′; the speech concatenation unit 10-2 stores a noise signal b formed by concatenating the short sections of the noise-superimposed speech signal B′ that correspond to all the non-speech short sections of A′; the speech concatenation unit 10-3 stores a speech signal c1 formed by concatenating the short sections of the noise-suppressed speech signal C that correspond to all the speech short sections of A′; and the speech concatenation unit 10-4 stores a noise signal c2 formed by concatenating the short sections of C that correspond to all the non-speech short sections of A′. FIG. 4 shows how speech signal A′, the noise-superimposed speech signal B′, and the noise-suppressed speech signal C relate to the speech signals a and c1 and the noise signals b and c2. In FIG. 4, the vertical axis represents signal level and the horizontal axis represents time.
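The routing performed by the switches 20 to 22 and the concatenation performed by the units 10-1 to 10-4 can be illustrated with a minimal Python sketch. It assumes that the three signals are already time-aligned lists of samples and that a speech/non-speech flag is available for every short section; the function name `split_and_concatenate` and the parameter `section_len` are hypothetical and do not appear in the specification.

```python
from typing import List, Sequence, Tuple


def split_and_concatenate(
    a_ref: Sequence[float],     # speech signal A' (delay-corrected clean speech)
    b_noisy: Sequence[float],   # noise-superimposed speech signal B'
    c_supp: Sequence[float],    # noise-suppressed speech signal C
    is_speech: Sequence[bool],  # one flag per short section (type information)
    section_len: int,           # samples per short section (hypothetical parameter)
) -> Tuple[List[float], List[float], List[float], List[float]]:
    """Return (a, b, c1, c2) as described for units 10-1 to 10-4."""
    a: List[float] = []    # unit 10-1: speech sections of A'
    b: List[float] = []    # unit 10-2: non-speech sections of B'
    c1: List[float] = []   # unit 10-3: speech sections of C
    c2: List[float] = []   # unit 10-4: non-speech sections of C
    for k, speech in enumerate(is_speech):
        lo, hi = k * section_len, (k + 1) * section_len
        if speech:
            # Switches 20 and 21 route A' and C; unit 10-2 receives nothing.
            a.extend(a_ref[lo:hi])
            c1.extend(c_supp[lo:hi])
        else:
            # Switches 22 and 21 route B' and C; unit 10-1 receives nothing.
            b.extend(b_noisy[lo:hi])
            c2.extend(c_supp[lo:hi])
    return a, b, c1, c2
```

Because each section is appended in arrival order, the four outputs keep the original time order within each class of section, which matches the append-to-the-end behavior described for the concatenation units.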

The speech signals a and c1 are input to the speech distortion measurement unit 11, which measures the speech distortion of the noise-suppressed speech signal C by comparing a with c1. In this embodiment, the known PESQ method is used for the comparison, with speech signal a as the reference signal and speech signal c1 as the degraded speech signal. PESQ outputs a distortion feature as a PESQ value. The speech distortion measurement unit 11 transmits the measured PESQ value to the speech quality estimation unit 15 as the speech distortion amount x1.

The noise signals b and c2 are input to the noise distortion measurement unit 12, which measures the noise distortion of the noise-suppressed speech signal C by comparing b with c2. In this embodiment, as in the speech distortion measurement, the known PESQ method is used for the comparison, with noise signal b as the reference signal and noise signal c2 as the degraded speech signal. The noise distortion measurement unit 12 transmits the measured PESQ value to the speech quality estimation unit 15 as the noise distortion amount x2.

The speech signal c1 is input to the volume measurement unit 13, which measures the volume of the noise-suppressed speech signal C by measuring the volume of c1. In this embodiment, the method standardized in ISO 532 is used to measure the volume. The volume measurement unit 13 transmits the measured volume x3 to the speech quality estimation unit 15.

The noise signal c2 is input to the noise amount measurement unit 14, which measures the noise amount of the noise-suppressed speech signal C by measuring the volume of c2. In this embodiment, as in the volume measurement, the method standardized in ISO 532 is used. The noise amount measurement unit 14 transmits the measured noise amount x4 to the speech quality estimation unit 15.

The speech quality estimation unit 15 estimates the subjective quality of the noise-suppressed speech signal C from the speech distortion amount, noise distortion amount, volume, and noise amount of C that are input from the speech distortion measurement unit 11, the noise distortion measurement unit 12, the volume measurement unit 13, and the noise amount measurement unit 14, and transmits the estimate to the speech quality output unit 16. The speech quality estimation unit 15 can estimate the subjective quality using, for example, an estimation formula obtained by the following method.

First, to obtain the estimation formula, speech samples covering a range of feature values for speech distortion, noise distortion, volume, and noise amount are prepared in advance, and multiple subjects rate each sample on a five-grade absolute category scale. The average of the ratings obtained in this subjective quality evaluation is called the MOS (Mean Opinion Score) value. On the MOS scale, 5 means very good and 1 means very bad.

Then, based on the MOS value of each speech sample, multiple regression analysis is used to obtain a formula that estimates subjective quality with the feature values of the four quality factors (speech distortion, noise distortion, volume, and noise amount) as variables. This yields equation (1):
$$y = \alpha_1 x_1 + \alpha_2 x_2 + \alpha_3 x_3 + \alpha_4 x_4 + \alpha_5 \tag{1}$$
Here, x1 is the speech distortion amount, x2 is the noise distortion amount, x3 is the volume, x4 is the noise amount, and y is the MOS value (subjective quality estimate). α1, α2, α3, α4, and α5 are constants. The speech quality estimation unit 15 uses equation (1) to obtain the subjective quality estimate.
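As an illustration of how the constants of equation (1) could be obtained, the following Python sketch fits them by ordinary least squares, which is one standard way to carry out multiple regression. The function names `fit_quality_model` and `estimate_mos`, and the choice of solving the normal equations by Gauss-Jordan elimination, are assumptions made for this sketch; the specification states only that multiple regression analysis is used.

```python
from typing import List, Sequence


def fit_quality_model(features: Sequence[Sequence[float]],
                      mos: Sequence[float]) -> List[float]:
    """Least-squares fit of equation (1); returns [a1, a2, a3, a4, a5].

    features: one (x1, x2, x3, x4) tuple per speech sample.
    mos:      the MOS value obtained for each sample.
    """
    rows = [list(x) + [1.0] for x in features]  # trailing 1 models the intercept a5
    n = len(rows[0])
    # Normal equations (X^T X) alpha = X^T y, stored as an augmented matrix.
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(n)]
        + [sum(r[i] * y for r, y in zip(rows, mos))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("samples do not determine the constants uniquely")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [v - f * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def estimate_mos(alpha: Sequence[float], x: Sequence[float]) -> float:
    """Evaluate equation (1) for one set of features (x1, x2, x3, x4)."""
    return sum(a * v for a, v in zip(alpha[:-1], x)) + alpha[-1]
```

At least five samples with linearly independent feature rows are needed for the fit to be unique; in practice far more rated samples would be used so that rating noise averages out.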

The speech quality output unit 16 outputs the estimate of the subjective quality of the noise-suppressed speech signal C, input from the speech quality estimation unit 15, as the output value of the noise-suppressed speech quality estimation apparatus 1.

As described above, to solve the problems of the conventional techniques, this embodiment detects the noise distortion of noise-suppressed speech by comparing the noise before and after noise suppression processing. For this purpose, the noise-superimposed speech signal before suppression processing is used as the information on the noise signal before noise suppression processing. In addition, to obtain the noise after noise suppression processing, the noise-suppressed speech signal is divided into speech sections and non-speech sections. The noise distortion of the noise-suppressed speech can then be detected accurately from the difference between the noise before and after noise suppression processing in the non-speech sections. This embodiment also detects the noise amount by measuring the volume of the noise-suppressed speech in the non-speech sections. Furthermore, it detects the speech distortion from the difference between the original speech and the noise-suppressed speech in the speech sections of the noise-suppressed speech signal, and detects the volume by measuring the volume of the noise-suppressed speech in those speech sections.

The quality of the noise-suppressed speech as experienced by the user is estimated from the quality factors detected in this way: speech distortion, noise distortion, volume, and noise amount. In this embodiment, an estimation formula obtained in advance is used to estimate the quality of the noise-suppressed speech. The formula is derived by preparing noise-suppressed speech with various feature values for each quality factor, obtaining a subjective quality rating for each through subjective quality evaluation experiments, and relating the ratings obtained to the feature values of the quality factors.

Thus, this embodiment makes it easy to evaluate the quality of noise-suppressed speech while taking noise distortion into account, which was not possible with conventional objective quality evaluation. As a result, this embodiment can produce estimates closer to the quality experienced by the user than the conventional techniques can. In addition, in this embodiment the speech signal from the speech database unit 2 is reproduced in an actual call environment, and the signal obtained by picking up the reproduced speech is used as the noise-superimposed speech signal. Because this noise-superimposed speech signal arises in an actual call environment, the quality of the noise-suppressed speech can be estimated accurately and easily in a way that matches the user's experience.

[Second Embodiment]
A second embodiment of the present invention is described below with reference to the drawings. FIG. 5 is a block diagram showing a configuration example of a noise-suppressed speech quality estimation apparatus according to the second embodiment of the present invention; components identical to those in FIG. 1 are given the same reference numerals.
In this embodiment as well, the configuration of the noise-suppressed speech quality estimation apparatus is almost the same as in the first embodiment, so only the differences from the first embodiment are described.

First, in the first embodiment the noise amount measurement unit 14 uses the method standardized in ISO 532 to measure the volume of the noise signal c2; when applying that method, the following procedure can also be used. The noise amount measurement unit 14 substitutes the volumes B1, B2, ..., Bn, measured at fixed intervals of t (ms), into equation (2) to calculate a volume x5 [dB] of the noise c2 that takes into account the effect of sudden noise on the experienced quality. In this embodiment t = 120 and p = 4, but the values are not limited to these. The noise amount measurement unit 14 transmits the measured volume x5 to the speech quality estimation unit 15 as the noise amount of the noise c2. Separately from the noise amount x5, the noise amount measurement unit 14 also calculates the volume x4 [dB] of the noise c2 using the method standardized in ISO 532, and transmits the measured volume x4 to the speech distortion measurement unit 11.

[Equation (2): published as an image (Figure 0004745916); not available in this text]

Next, although the first embodiment uses PESQ for the comparison in the speech distortion measurement unit 11, Wideband-PESQ may be used instead. Wideband-PESQ extends the scope of the known PESQ method from the telephone band to wideband and allows speech distortion to be evaluated over a band of up to 7 kHz. In this embodiment, in addition to the speech signals a and c1, the speech distortion measurement unit 11 receives the volume x3 from the volume measurement unit 13 and the volume x4 from the noise amount measurement unit 14.

The speech distortion measurement unit 11 obtains the speech distortion of the noise-suppressed speech C by comparing speech a with speech c1. In this embodiment, Wideband-PESQ is used to calculate a W-PESQ value x6′ with speech a as the reference signal and speech c1 as the degraded speech. However, because Wideband-PESQ also treats superimposed noise as distortion, the W-PESQ value x6′ is corrected on the basis of the speech volume x3 in the speech sections and the noise volume x4 in the non-speech sections. Substituting the W-PESQ value x6′ and the volumes x3 and x4 into equation (3) gives the corrected value x6.

$$x_6 = \min\left(\alpha_1,\ \frac{x_6'}{1 - x_6' / \alpha_2^{\,\alpha_3 (x_3 - x_4)}}\right) \tag{3}$$

Equation (3) means that the smaller of α1 and x6′/(1 − x6′/α2^(α3(x3 − x4))) is taken as the corrected value x6. Here α1, α2, and α3 are constants. In this embodiment α1 = 4.644, α2 = 3, and α3 = 0.07, but the values are not limited to these. The speech distortion measurement unit 11 transmits the value x6 to the speech quality estimation unit 15 as the speech distortion amount of the speech c1.
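The correction described for equation (3) can be sketched in Python as follows. The sketch reads the expression as α2 raised to the power α3(x3 − x4) and uses the constants of this embodiment as defaults. As an assumption that is not stated in the specification, it returns the upper limit α1 whenever the denominator is zero or negative, since the quotient is not a meaningful quality value in that case. The function name `correct_wpesq` is hypothetical.

```python
def correct_wpesq(x6_raw: float, x3: float, x4: float,
                  alpha1: float = 4.644,
                  alpha2: float = 3.0,
                  alpha3: float = 0.07) -> float:
    """Correct a W-PESQ value for residual noise, following equation (3).

    x6_raw: uncorrected W-PESQ value x6'
    x3:     volume of the speech sections [dB]
    x4:     volume of the noise in the non-speech sections [dB]
    """
    denominator = 1.0 - x6_raw / (alpha2 ** (alpha3 * (x3 - x4)))
    if denominator <= 0.0:
        # Assumption of this sketch: saturate at alpha1 instead of
        # returning a negative or undefined quotient.
        return alpha1
    return min(alpha1, x6_raw / denominator)
```

When the speech is much louder than the residual noise, the power term becomes large and the corrected value approaches x6′; as the two volumes get closer, the correction raises the value until it is capped at α1.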

Next, although the first embodiment uses PESQ for the comparison in the noise distortion measurement unit 12, Wideband-PESQ may be used instead. The noises b and c2 are input to the noise distortion measurement unit 12, which obtains the noise distortion of the noise-suppressed speech C by comparing noise b with noise c2. In this embodiment, Wideband-PESQ is used to calculate a W-PESQ value x7 with noise b as the reference signal and noise c2 as the degraded speech. The noise distortion measurement unit 12 transmits the calculated W-PESQ value x7 to the speech quality estimation unit 15 as the noise distortion amount of the speech c1.

When Wideband-PESQ is used in place of PESQ for the comparison, the speech quality estimation unit 15 estimates the subjective quality of the noise-suppressed speech signal C from the input volume x3, noise amount x5, speech distortion amount x6, and noise distortion amount x7. The speech quality estimation unit 15 calculates a value Q by substituting the volume x3, noise amount x5, speech distortion amount x6, and noise distortion amount x7 into equation (4).

[Equation (4): published as an image (Figure 0004745916); not available in this text]

In equation (4), x3 − x5 ≥ 10 is imposed as a constraint. Here β1 to β11 are constants. In this embodiment β1 = 3.7, β2 = 0.215, β3 = 0.4, β4 = 3, β5 = 1.1, β6 = 0.9, β7 = 0.05, β8 = 0.2, β9 = 4, β10 = 0.0002, and β11 = 24, but the values are not limited to these. The speech quality estimation unit 15 transmits the calculated value Q to the speech quality output unit 16 as the estimate of the subjective quality of the noise-suppressed speech signal C.
The rest of the configuration is the same as in the first embodiment. This embodiment therefore provides the same effects as the first embodiment.

[Third Embodiment]
A third embodiment of the present invention is described below with reference to the drawings. FIG. 6 is a block diagram showing a configuration example of a noise-suppressed speech quality estimation apparatus according to the third embodiment of the present invention; components identical to those in FIG. 1 are given the same reference numerals.
As shown in FIG. 6, the noise-suppressed speech quality estimation apparatus 17 includes a speech database unit 2, a speech recording unit 5, a microphone 6, a speech section detection unit 7, delay correction units 8-1, 8-2-1, and 8-2-2, a switch control unit 9, speech concatenation units 10 (10-1 to 10-4), a speech distortion measurement unit 11, a noise distortion measurement unit 12, a volume measurement unit 13, a noise amount measurement unit 14, a speech quality estimation unit 15, a speech quality output unit 16, a speech addition unit 18, and switches 20 to 22.

In this embodiment, the real-environment noise picked up by the microphone 6 and acquired by the speech recording unit 5 in the call environment is used as the noise signal B. The speech recording unit 5 inputs the noise signal B to the speech addition unit 18 and the delay correction unit 8-2-2.
The speech addition unit 18 receives the speech signal A acquired from the speech database unit 2 and the noise signal B, and adds them to generate a noise-superimposed speech signal. This noise-superimposed speech signal is input to the evaluation target apparatus 100, and the noise-suppressed speech signal output from the evaluation target apparatus 100 is denoted C.

As in the first embodiment, the delay correction unit 8-1 receives the speech signal A and the noise-suppressed speech signal C and measures the delay of C relative to A. The delay correction unit 8-2-1 delays the speech signal A by the delay time measured by the delay correction unit 8-1 and thereby outputs a speech signal A′ that is time-synchronized with the noise-suppressed speech signal C. The delay correction unit 8-2-2 delays the noise signal B by the same measured delay time and thereby outputs a noise signal B′ that is time-synchronized with the noise-suppressed speech signal C.
As in the first embodiment, the speech section detection unit 7 divides the speech signal A′ into short sections and transmits the type information of each section to the switch control unit 9.
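This passage does not state how the delay correction unit 8-1 measures the delay. One common approach, shown here purely as an assumption, is to choose the lag that maximizes the cross-correlation between the speech signal A and the noise-suppressed speech signal C, and then to delay A and B by that lag. The function names `estimate_delay` and `apply_delay` and the parameter `max_lag` are hypothetical.

```python
from typing import List, Sequence


def estimate_delay(ref: Sequence[float], out: Sequence[float], max_lag: int) -> int:
    """Return the lag in samples (0..max_lag) at which `out` best matches `ref`.

    The score for each candidate lag is the raw cross-correlation
    sum(ref[i] * out[i + lag]); the lag with the largest score wins.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        n = min(len(ref), len(out) - lag)  # overlapping samples at this lag
        score = sum(ref[i] * out[i + lag] for i in range(n))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def apply_delay(signal: Sequence[float], lag: int) -> List[float]:
    """Delay `signal` by `lag` samples by prepending zeros; length is preserved."""
    return ([0.0] * lag + list(signal))[:len(signal)]
```

In terms of the units above, `estimate_delay(A, C, max_lag)` plays the role of the delay correction unit 8-1, while `apply_delay(A, lag)` and `apply_delay(B, lag)` correspond to the units 8-2-1 and 8-2-2 that produce A′ and B′.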

The switch control unit 9 and the speech concatenation units 10 perform the same processing as in the first embodiment, except that the signal input to the speech concatenation unit 10-2 via the switch 22 is the noise signal B′ rather than the noise-superimposed speech signal. As a result, the speech concatenation unit 10-2 stores a noise signal b formed by concatenating the short sections of the noise signal B′ that correspond to all the non-speech short sections of speech signal A′. Otherwise, as in the first embodiment, the speech concatenation unit 10-1 stores a speech signal a formed by concatenating all the speech short sections of speech signal A′, the speech concatenation unit 10-3 stores a speech signal c1 formed by concatenating the short sections of the noise-suppressed speech signal C that correspond to all the speech short sections of A′, and the speech concatenation unit 10-4 stores a noise signal c2 formed by concatenating the short sections of C that correspond to all the non-speech short sections of A′.

The speech distortion measurement unit 11, the noise distortion measurement unit 12, the volume measurement unit 13, the noise amount measurement unit 14, the speech quality estimation unit 15, and the speech quality output unit 16 perform the same processing as in the first embodiment.
With the above configuration, this embodiment provides the same effects as the first embodiment.

[Fourth Embodiment]
A fourth embodiment of the present invention is described below with reference to the drawings. FIG. 7 is a block diagram showing a configuration example of a noise-suppressed speech quality estimation apparatus according to the fourth embodiment of the present invention; components identical to those in FIG. 1 are given the same reference numerals.
The noise-suppressed speech quality estimation apparatus 19 of this embodiment uses a noise database unit 23 in place of the speech recording unit 5 and the microphone 6 of the noise-suppressed speech quality estimation apparatus 17 of the third embodiment.

In this embodiment, a noise signal registered in advance in the noise database unit 23 is used as the noise signal B. This noise signal B is input to the speech addition unit 18 and the delay correction unit 8-2-2. The subsequent operation is the same as in the third embodiment.
With the above configuration, this embodiment provides the same effects as the first embodiment. Furthermore, because a signal registered in advance in the noise database unit 23 is used as the noise signal B, the quality of noise-suppressed speech can be estimated accurately and easily, in a way that matches the user's experience, under a variety of noise environments.

Needless to say, the third and fourth embodiments may also be applied to the second embodiment.
The noise-suppressed speech quality estimation apparatuses of the first to fourth embodiments can be implemented by a computer having a CPU, a storage device, and an external interface, together with a program that controls these hardware resources. For such a computer, the noise-suppressed speech quality estimation program that implements the noise-suppressed speech quality estimation method of the present invention is provided recorded on a recording medium such as a flexible disk, CD-ROM, DVD-ROM, or memory card. The CPU writes the program read from the recording medium into the storage device and executes the processing described in the first to fourth embodiments in accordance with the program.

The present invention can be applied to speech quality evaluation techniques.

FIG. 1 is a block diagram showing a configuration example of the noise-suppressed speech quality estimation apparatus according to the first embodiment of the present invention.
FIG. 2 is a diagram showing the switch control performed when a short section of the speech signal is determined to be a speech section in the first embodiment of the present invention.
FIG. 3 is a diagram showing the switch control performed when a short section of the speech signal is determined to be a non-speech section in the first embodiment of the present invention.
FIG. 4 is a waveform diagram showing the relationship between the speech signal, the noise-superimposed speech signal, and the noise-suppressed speech signal in the first embodiment of the present invention and the speech signals and noise signals obtained by dividing these signals into short sections and concatenating them.
FIG. 5 is a block diagram showing a configuration example of the noise-suppressed speech quality estimation apparatus according to the second embodiment of the present invention.
FIG. 6 is a block diagram showing a configuration example of the noise-suppressed speech quality estimation apparatus according to the third embodiment of the present invention.
FIG. 7 is a block diagram showing a configuration example of the noise-suppressed speech quality estimation apparatus according to the fourth embodiment of the present invention.
FIG. 8 is a block diagram showing a configuration example of a quality evaluation system for noise-free speech signals using PESQ.
FIG. 9 is a block diagram showing a configuration example of a quality evaluation system for noise-superimposed speech signals using PESQ.

Explanation of Reference Numerals

1, 17, 19: noise-suppressed speech quality estimation apparatus; 2: speech database unit; 3: speech reproduction unit; 4: speaker; 5: speech recording unit; 6: microphone; 7: speech section detection unit; 8-1, 8-2-1, 8-2-2: delay correction unit; 9: switch control unit; 10: speech concatenation unit; 11: speech distortion measurement unit; 12: noise distortion measurement unit; 13: volume measurement unit; 14: noise amount measurement unit; 15: speech quality estimation unit; 16: speech quality output unit; 18: speech addition unit; 20, 21, 22: switch; 23: noise database unit.

Claims (6)

A noise-suppressed speech quality estimation apparatus that objectively estimates the quality of noise-suppressed speech, comprising:
detection means for detecting feature quantities of quality factors of a noise-suppressed speech signal that is output from a noise suppression processing device under evaluation when a noise-superimposed speech signal is given as input to the noise suppression processing device; and
estimation means for estimating the quality of the noise-suppressed speech signal based on the feature quantities detected by the detection means,
wherein the detection means comprises:
discrimination means for determining whether each section, obtained by dividing the noise-suppressed speech signal at fixed time intervals, is a speech section or a non-speech section;
speech section feature detection means for detecting feature quantities of quality factors of the noise-suppressed speech signal in the speech sections; and
non-speech section feature detection means for detecting feature quantities of quality factors of the noise-suppressed speech signal in the non-speech sections,
wherein the speech section feature detection means comprises:
a speech distortion measurement unit that detects speech distortion, as a feature quantity of a quality factor of the noise-suppressed speech signal, by comparing the speech signal before noise is superimposed with the corresponding noise-suppressed speech signal in the speech sections; and
a volume measurement unit that detects, as a feature quantity of a quality factor of the noise-suppressed speech signal, the volume of the noise-suppressed speech signal in the speech sections,
wherein the non-speech section feature detection means comprises:
a noise distortion measurement unit that detects noise distortion, as a feature quantity of a quality factor of the noise-suppressed speech signal, by comparing the noise-suppressed speech signal in the non-speech sections with the corresponding noise-superimposed speech signal or with the noise signal from which the noise-superimposed speech signal was produced; and
a noise amount measurement unit that detects, as a feature quantity of a quality factor of the noise-suppressed speech signal, the noise amount of the noise-suppressed speech signal in the non-speech sections,
wherein the noise distortion is a PESQ value or a Wideband-PESQ value obtained by using the noise-suppressed speech signal in the non-speech sections as the degraded speech signal and using the corresponding noise-superimposed speech signal, or the noise signal from which the noise-superimposed speech signal was produced, as the reference signal, and
wherein the estimation means estimates the quality of the noise-suppressed speech signal based on the speech distortion, the volume of the noise-suppressed speech signal, the noise distortion, and the noise amount of the noise-suppressed speech signal.
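The section discrimination and the two level features in the claim above (speech volume and noise amount) can be sketched as follows. This is a minimal illustration, not the patented implementation: the claim does not say how sections are classified or how levels are computed, so the fixed level threshold, the dB-relative-to-full-scale measure, and all function names here are assumptions.

```python
import math

def split_frames(signal, frame_len):
    """Cut the signal into consecutive fixed-length frames, dropping any tail."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, frame_len)]

def frame_level_db(frame, floor=1e-12):
    """Mean-square level of one frame in dB relative to a full scale of 1.0."""
    mean_sq = sum(x * x for x in frame) / len(frame)
    return 10.0 * math.log10(max(mean_sq, floor))

def classify_frames(signal, frame_len, threshold_db):
    """Label each frame True (speech) or False (non-speech).

    Assumption: a plain level threshold stands in for the discrimination means.
    """
    return [frame_level_db(f) > threshold_db
            for f in split_frames(signal, frame_len)]

def section_levels(signal, frame_len, threshold_db):
    """Return (speech volume, noise amount) as mean frame levels in dB.

    Either value is None when no frame of that kind was found.
    """
    frames = split_frames(signal, frame_len)
    labels = classify_frames(signal, frame_len, threshold_db)
    speech = [frame_level_db(f) for f, s in zip(frames, labels) if s]
    noise = [frame_level_db(f) for f, s in zip(frames, labels) if not s]
    volume = sum(speech) / len(speech) if speech else None
    noise_amount = sum(noise) / len(noise) if noise else None
    return volume, noise_amount
```

A real evaluation would more likely use a dedicated voice activity detector and a standardized level meter; the threshold approach is kept only because it makes the split between the two feature groups easy to see.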
The noise-suppressed speech quality estimation apparatus according to claim 1, further comprising:
a speech database in which speech signals are registered in advance;
speech playback means for playing back a speech signal from the speech database in an actual call environment; and
speech recording means for outputting, as the noise-superimposed speech signal, the signal obtained by picking up the played-back speech.
The noise-suppressed speech quality estimation apparatus according to claim 1, further comprising:
a speech database in which speech signals are registered in advance;
speech recording means for picking up a noise signal in an actual call environment; and
speech addition means for outputting, as the noise-superimposed speech signal, a signal obtained by adding the speech signal from the speech database and the noise signal picked up by the speech recording means.
The noise-suppressed speech quality estimation apparatus according to claim 1, further comprising:
a speech database in which speech signals are registered in advance;
a noise database in which noise signals are registered in advance; and
speech addition means for outputting, as the noise-superimposed speech signal, a signal obtained by adding the speech signal from the speech database and the noise signal from the noise database.
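The speech addition described in the two preceding claims (speech from a database plus recorded or stored noise) might look like the sketch below. The claims only require that the two signals be added; scaling the noise to a target signal-to-noise ratio and looping short noise recordings are assumptions added for illustration, and `mix_at_snr` is a hypothetical name.

```python
import math

def power(samples):
    """Mean-square power of a list of samples."""
    return sum(v * v for v in samples) / len(samples)

def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech so that the mixture has the requested SNR in dB.

    Assumption: the noise is looped or truncated to the speech length and
    scaled by a single gain; the claims themselves only specify an addition.
    """
    if not speech or not noise:
        raise ValueError("speech and noise must be non-empty")
    repeats = -(-len(speech) // len(noise))          # ceiling division
    noise = (list(noise) * repeats)[:len(speech)]
    p_speech, p_noise = power(speech), power(noise)
    if p_noise == 0.0:
        raise ValueError("noise signal has zero power")
    gain = math.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
    return [s + gain * n for s, n in zip(speech, noise)]
```

Keeping the clean speech and the scaled noise available separately is useful here, because the later measurements compare the suppressed output against both of them.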
A noise-suppressed speech quality estimation method for objectively estimating the quality of noise-suppressed speech, comprising:
a detection procedure for detecting feature quantities of quality factors of a noise-suppressed speech signal that is output from a noise suppression processing device under evaluation when a noise-superimposed speech signal is given as input to the noise suppression processing device; and
an estimation procedure for estimating the quality of the noise-suppressed speech signal based on the feature quantities detected by the detection procedure,
wherein the detection procedure comprises:
a discrimination procedure for determining whether each section, obtained by dividing the noise-suppressed speech signal at fixed time intervals, is a speech section or a non-speech section;
a speech section feature detection procedure for detecting feature quantities of quality factors of the noise-suppressed speech signal in the speech sections; and
a non-speech section feature detection procedure for detecting feature quantities of quality factors of the noise-suppressed speech signal in the non-speech sections,
wherein the speech section feature detection procedure comprises:
a speech distortion measurement procedure for detecting speech distortion, as a feature quantity of a quality factor of the noise-suppressed speech signal, by comparing the speech signal before noise is superimposed with the corresponding noise-suppressed speech signal in the speech sections; and
a volume measurement procedure for detecting, as a feature quantity of a quality factor of the noise-suppressed speech signal, the volume of the noise-suppressed speech signal in the speech sections,
wherein the non-speech section feature detection procedure comprises:
a noise distortion measurement procedure for detecting noise distortion, as a feature quantity of a quality factor of the noise-suppressed speech signal, by comparing the noise-suppressed speech signal in the non-speech sections with the corresponding noise-superimposed speech signal or with the noise signal from which the noise-superimposed speech signal was produced; and
a noise amount measurement procedure for detecting, as a feature quantity of a quality factor of the noise-suppressed speech signal, the noise amount of the noise-suppressed speech signal in the non-speech sections,
wherein the noise distortion is a PESQ value or a Wideband-PESQ value obtained by using the noise-suppressed speech signal in the non-speech sections as the degraded speech signal and using the corresponding noise-superimposed speech signal, or the noise signal from which the noise-superimposed speech signal was produced, as the reference signal, and
wherein the estimation procedure estimates the quality of the noise-suppressed speech signal based on the speech distortion, the volume of the noise-suppressed speech signal, the noise distortion, and the noise amount of the noise-suppressed speech signal.
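The estimation procedure takes the four detected features and produces one quality value. The claim does not state the mapping, so the sketch below assumes a simple weighted sum clipped to a 1 to 5 opinion-score range; the `Features` type, the weights, and the bias are hypothetical placeholders that would have to be fitted to subjective test data.

```python
from dataclasses import dataclass

@dataclass
class Features:
    speech_distortion: float   # speech-section distortion score
    speech_volume_db: float    # speech-section level of the suppressed signal
    noise_distortion: float    # PESQ or Wideband-PESQ value on non-speech sections
    noise_amount_db: float     # non-speech-section level of the suppressed signal

def estimate_quality(features, weights, bias):
    """Map the four features to one score with a clipped linear model.

    Assumption: a linear combination is used only for illustration; the
    claim says the estimate is based on these four features, nothing more.
    """
    w_sd, w_vol, w_nd, w_na = weights
    raw = (bias
           + w_sd * features.speech_distortion
           + w_vol * features.speech_volume_db
           + w_nd * features.noise_distortion
           + w_na * features.noise_amount_db)
    return min(5.0, max(1.0, raw))
```

For example, `estimate_quality(Features(3.0, -26.0, 2.0, -60.0), (0.5, 0.01, 0.3, -0.02), 1.0)` combines the four inputs into a single value inside the clipped range; the coefficients in this call are arbitrary.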
A noise-suppressed speech quality estimation program that causes a computer to operate as a noise-suppressed speech quality estimation apparatus that objectively estimates the quality of noise-suppressed speech, the program causing the computer to execute:
a detection procedure for detecting feature quantities of quality factors of a noise-suppressed speech signal that is output from a noise suppression processing device under evaluation when a noise-superimposed speech signal is given as input to the noise suppression processing device; and
an estimation procedure for estimating the quality of the noise-suppressed speech signal based on the feature quantities detected by the detection procedure,
wherein the detection procedure comprises:
a discrimination procedure for determining whether each section, obtained by dividing the noise-suppressed speech signal at fixed time intervals, is a speech section or a non-speech section;
a speech section feature detection procedure for detecting feature quantities of quality factors of the noise-suppressed speech signal in the speech sections; and
a non-speech section feature detection procedure for detecting feature quantities of quality factors of the noise-suppressed speech signal in the non-speech sections,
wherein the speech section feature detection procedure comprises:
a speech distortion measurement procedure for detecting speech distortion, as a feature quantity of a quality factor of the noise-suppressed speech signal, by comparing the speech signal before noise is superimposed with the corresponding noise-suppressed speech signal in the speech sections; and
a volume measurement procedure for detecting, as a feature quantity of a quality factor of the noise-suppressed speech signal, the volume of the noise-suppressed speech signal in the speech sections,
wherein the non-speech section feature detection procedure comprises:
a noise distortion measurement procedure for detecting noise distortion, as a feature quantity of a quality factor of the noise-suppressed speech signal, by comparing the noise-suppressed speech signal in the non-speech sections with the corresponding noise-superimposed speech signal or with the noise signal from which the noise-superimposed speech signal was produced; and
a noise amount measurement procedure for detecting, as a feature quantity of a quality factor of the noise-suppressed speech signal, the noise amount of the noise-suppressed speech signal in the non-speech sections,
wherein the noise distortion is a PESQ value or a Wideband-PESQ value obtained by using the noise-suppressed speech signal in the non-speech sections as the degraded speech signal and using the corresponding noise-superimposed speech signal, or the noise signal from which the noise-superimposed speech signal was produced, as the reference signal, and
wherein the estimation procedure estimates the quality of the noise-suppressed speech signal based on the speech distortion, the volume of the noise-suppressed speech signal, the noise distortion, and the noise amount of the noise-suppressed speech signal.
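The noise distortion measurement pairs the non-speech sections of the suppressed signal (degraded input) with the same sections of the noise-superimposed or noise-only signal (reference input). The sketch below shows only that pairing step. It assumes the frame labels come from an earlier discrimination step, and it takes the scoring function as a parameter; `mean_abs_difference` is a trivial stand-in used so the example runs and is not PESQ or Wideband-PESQ.

```python
def gather_sections(signal, labels, frame_len, want_speech):
    """Concatenate the frames whose speech/non-speech label matches want_speech."""
    out = []
    for i, is_speech in enumerate(labels):
        if is_speech == want_speech:
            out.extend(signal[i * frame_len:(i + 1) * frame_len])
    return out

def noise_distortion(suppressed, reference, labels, frame_len, score_fn):
    """Score the non-speech sections of `suppressed` against `reference`.

    `score_fn(reference_samples, degraded_samples)` is where a PESQ or
    Wideband-PESQ implementation would be plugged in.
    """
    degraded = gather_sections(suppressed, labels, frame_len, False)
    ref = gather_sections(reference, labels, frame_len, False)
    if not degraded:
        raise ValueError("no non-speech sections to score")
    return score_fn(ref, degraded)

def mean_abs_difference(ref, deg):
    """Placeholder metric for the demo only; NOT a perceptual measure."""
    return sum(abs(r - d) for r, d in zip(ref, deg)) / len(ref)
```

Passing the metric in as a callable keeps the section selection independent of whichever perceptual model is actually used.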
JP2006225158A 2006-06-07 2006-08-22 Noise suppression speech quality estimation apparatus, method and program Expired - Fee Related JP4745916B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2006225158A JP4745916B2 (en) 2006-06-07 2006-08-22 Noise suppression speech quality estimation apparatus, method and program

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
JP2006158229 2006-06-07
JP2006158229 2006-06-07
JP2006225158A JP4745916B2 (en) 2006-06-07 2006-08-22 Noise suppression speech quality estimation apparatus, method and program

Publications (2)

Publication Number Publication Date
JP2008015443A JP2008015443A (en) 2008-01-24
JP4745916B2 true JP4745916B2 (en) 2011-08-10

Family

ID=39072488

Family Applications (1)

Application Number Title Priority Date Filing Date
JP2006225158A Expired - Fee Related JP4745916B2 (en) 2006-06-07 2006-08-22 Noise suppression speech quality estimation apparatus, method and program

Country Status (1)

Country Link
JP (1) JP4745916B2 (en)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8949120B1 (en) 2006-05-25 2015-02-03 Audience, Inc. Adaptive noise cancelation
JP5157852B2 (en) 2008-11-28 2013-03-06 富士通株式会社 Audio signal processing evaluation program and audio signal processing evaluation apparatus
JP5293329B2 (en) 2009-03-26 2013-09-18 富士通株式会社 Audio signal evaluation program, audio signal evaluation apparatus, and audio signal evaluation method
US20110178800A1 (en) 2010-01-19 2011-07-21 Lloyd Watts Distortion Measurement for Noise Suppression System
JP5606764B2 (en) * 2010-03-31 2014-10-15 クラリオン株式会社 Sound quality evaluation device and program therefor
US9558755B1 (en) 2010-05-20 2017-01-31 Knowles Electronics, Llc Noise suppression assisted automatic speech recognition
JP5623961B2 (en) * 2011-03-30 2014-11-12 クラリオン株式会社 Voice communication device and in-vehicle device
JP5849758B2 (en) * 2012-02-20 2016-02-03 株式会社Jvcケンウッド Communication device, status notification method
US9640194B1 (en) 2012-10-04 2017-05-02 Knowles Electronics, Llc Noise suppression for speech processing based on machine-learning mask estimation
EP2733700A1 (en) * 2012-11-16 2014-05-21 Nederlandse Organisatie voor toegepast -natuurwetenschappelijk onderzoek TNO Method of and apparatus for evaluating intelligibility of a degraded speech signal
US9536540B2 (en) 2013-07-19 2017-01-03 Knowles Electronics, Llc Speech signal separation and synthesis based on auditory scene analysis and speech modeling
US9799330B2 (en) 2014-08-28 2017-10-24 Knowles Electronics, Llc Multi-sourced noise suppression
CN112967735A (en) * 2021-02-23 2021-06-15 北京达佳互联信息技术有限公司 Training method of voice quality detection model and voice quality detection method

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2004056612A (en) * 2002-07-23 2004-02-19 Nippon Telegr & Teleph Corp <Ntt> Method, device and program for evaluating objective quality, and recording medium having objective quality evaluation program recorded thereon
JP2005328527A (en) * 2004-05-14 2005-11-24 Agilent Technol Inc Measurement noise reduction for signal quality evaluation

Also Published As

Publication number Publication date
JP2008015443A (en) 2008-01-24

Similar Documents

Publication Publication Date Title
JP4745916B2 (en) Noise suppression speech quality estimation apparatus, method and program
RU2651616C2 (en) Method and apparatus for audio interference estimation
Falk et al. A non-intrusive quality and intelligibility measure of reverberant and dereverberated speech
KR101430321B1 (en) Method and system for determining a perceived quality of an audio system
US9959886B2 (en) Spectral comb voice activity detection
US20080312918A1 (en) Voice performance evaluation system and method for long-distance voice recognition
WO2009145192A1 (en) Voice detection device, voice detection method, voice detection program, and recording medium
JP2018156044A (en) Voice recognition device, voice recognition method, and voice recognition program
JP2012027186A (en) Sound signal processing apparatus, sound signal processing method and program
CN104919525B (en) For the method and apparatus for the intelligibility for assessing degeneration voice signal
JP2011033717A (en) Noise suppression device
CN108885864B (en) Measuring device, filter generating device, measuring method, and filter generating method
CN109600697A (en) The outer playback matter of terminal determines method and device
Sun et al. Investigations into the relationship between measurable speech quality and speech recognition rate for telephony speech
JP5627440B2 (en) Acoustic apparatus, control method therefor, and program
Moeller et al. Objective estimation of speech quality for communication systems
JP2004325127A (en) Sound source detection method, sound source separation method, and apparatus for executing them
JP4113481B2 (en) Voice quality objective evaluation apparatus and voice quality objective evaluation method
Reimes et al. The relative approach algorithm and its applications in new perceptual models for noisy speech and echo performance
Gierlich et al. Advances in perceptual modeling of speech quality in telecommunications
JP4495704B2 (en) Sound image localization emphasizing reproduction method, apparatus thereof, program thereof, and storage medium thereof
JP2020190606A (en) Sound noise removal device and program
Ghimire Speech intelligibility measurement on the basis of ITU-T Recommendation P. 863
Rund et al. Objective quality assessment for the acoustic zoom
JP3490380B2 (en) Apparatus and method for evaluating signal transmission quality of signal transmission medium, and information recording medium

Legal Events

Date Code Title Description
A621 Written request for application examination; Free format text: JAPANESE INTERMEDIATE CODE: A621; Effective date: 20080725
A977 Report on retrieval; Free format text: JAPANESE INTERMEDIATE CODE: A971007; Effective date: 20101224
A131 Notification of reasons for refusal; Free format text: JAPANESE INTERMEDIATE CODE: A131; Effective date: 20110201
A521 Request for written amendment filed; Free format text: JAPANESE INTERMEDIATE CODE: A523; Effective date: 20110404
TRDD Decision of grant or rejection written
A01 Written decision to grant a patent or to grant a registration (utility model); Free format text: JAPANESE INTERMEDIATE CODE: A01; Effective date: 20110510
A01 Written decision to grant a patent or to grant a registration (utility model); Free format text: JAPANESE INTERMEDIATE CODE: A01
A61 First payment of annual fees (during grant procedure); Free format text: JAPANESE INTERMEDIATE CODE: A61; Effective date: 20110512
FPAY Renewal fee payment (event date is renewal date of database); Free format text: PAYMENT UNTIL: 20140520; Year of fee payment: 3
FPAY Renewal fee payment (event date is renewal date of database); Free format text: PAYMENT UNTIL: 20140520; Year of fee payment: 3
S531 Written request for registration of change of domicile; Free format text: JAPANESE INTERMEDIATE CODE: R313531
R350 Written notification of registration of transfer; Free format text: JAPANESE INTERMEDIATE CODE: R350
LAPS Cancellation because of no payment of annual fees