JPH09331391A - Speech quality objective estimation device - Google Patents

Speech quality objective estimation device

Info

Publication number
JPH09331391A
Authority
JP
Japan
Prior art keywords
speech
similarity
voice signal
voice
quality
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP15106796A
Other languages
Japanese (ja)
Inventor
Tetsuro Yamazaki
哲朗 山崎
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nippon Telegraph and Telephone Corp
Original Assignee
Nippon Telegraph and Telephone Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nippon Telegraph and Telephone Corp filed Critical Nippon Telegraph and Telephone Corp
Priority to JP15106796A priority Critical patent/JPH09331391A/en
Publication of JPH09331391A publication Critical patent/JPH09331391A/en
Pending legal-status Critical Current


Abstract

PROBLEM TO BE SOLVED: To improve speech quality estimation accuracy by incorporating a section that identifies the coding system of a voice signal, so that the standard degraded voice signal and the weighting coefficients are selected to match the coding system of the test voice signal.

SOLUTION: A coded speech identification section 1 identifies the coding system of a coded voice signal containing code errors. A similarity calculation section 3 obtains the time series of the similarity between the analysis result produced by a speech analysis section 2 and a standard degraded voice signal selected according to the identification result of the coded speech identification section 1. A weighting-coefficient database 6 stores the inter-unit weighting coefficients obtained by neural-network learning from the subjective evaluation values of various degraded voice signals and their similarity time series. A quality estimate calculation section 4 receives the similarity time series from section 3 together with the weighting coefficients selected from database 6 according to the identification result of section 1, and estimates the speech quality.

Description

Detailed Description of the Invention

[0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a speech quality objective estimation apparatus, and more particularly to an apparatus that objectively estimates, from physical quantities of the voice signal, the speech quality of a voice signal degraded by distortion and noise arising in a telephone transmission system.

[0002]

2. Description of the Related Art

A conventional example will be described with reference to FIG. 1, in which the portion surrounded by the chain line shows the conventional arrangement. A speech quality objective estimation apparatus that objectively estimates speech quality using pattern matching and a neural network feeds the test voice signal, i.e., the voice signal whose quality is to be estimated, into a speech analysis section 2. A similarity calculation section 3 then performs pattern matching between this analysis result and standard degraded voice signals, stored in advance in a standard degraded voice database 5, which represent the features of degraded voice signals as short-time parameter sequences. The resulting similarity time series is input to a quality estimate calculation section 4 forming a neural network, which estimates the speech quality. The similarity calculation in section 3 and the learning process that creates the standard degraded voice signals are performed in the same manner as in the speech quality objective measuring method of Japanese Patent Application No. 3-118924. The neural network, which takes the similarity time series as input and outputs the MOS (Mean Opinion Score), a measure of speech quality, consists of three layers: an input layer, a hidden layer, and an output layer. Its learning and quality estimation proceed as follows.

[0003] In the learning process, the time series of similarity between the training voice signal and a standard degraded voice signal, computed for each short interval (hereafter, frame), is first divided evenly into five segments, and the similarities within each segment are averaged and fed to the input layer of the neural network. The hidden layer has five units. The single-unit output layer is given the MOS of the training voice signal. The weighting coefficients between the units of each layer are then learned by back-propagation (D. E. Rumelhart, J. L. McClelland, and the PDP Research Group, Parallel Distributed Processing, Vol. 1, MIT Press, pp. 318-362, 1986). A sigmoid function is used as the input/output function of each unit in the hidden and output layers. After every learning pass, the quality of the training samples and the held-out samples is estimated and compared with the subjective quality measurements; training stops at the iteration where the difference between the training samples' quality estimates and the subjective quality has stabilized at a small value and the difference for the held-out samples reaches its minimum. The inter-unit weighting coefficients obtained from this learning process are stored in a weighting-coefficient database and used during quality estimation. In the quality estimation process, the time series of per-frame similarities between the test voice and the standard degraded voice signal is divided evenly into five segments, the similarities in each segment are averaged, and the averages are input to a neural network with the same structure as the one used in learning; the MOS of the test voice is determined using the learned inter-unit weighting coefficients and the sigmoid function.

[0004]

PROBLEMS TO BE SOLVED BY THE INVENTION

The speech quality objective estimation apparatus described above determines the standard degraded voice signals and the inter-unit weighting coefficients using training samples coded with the same coding system as the test voice signal, and can therefore estimate quality with sufficient accuracy. When estimating the quality of a test voice signal whose coding system is unknown, however, the apparatus may use a standard degraded voice signal and weighting coefficients that do not match the coding system. With a mismatched standard degraded voice signal, the similarity between the standard degraded voice signal and the test voice signal cannot be computed accurately; and because the weighting coefficients learned by the neural network depend on the coding system, a correct quality estimate cannot be obtained when the coding system differs.

[0005] The present invention incorporates into this speech quality objective estimation apparatus a section that identifies the coding system of the voice signal, so that a standard degraded voice signal and weighting coefficients matched to the coding system of the test voice are selected. It thereby provides a speech quality objective estimation apparatus with good estimation accuracy that solves the problems described above.

[0006]

MEANS FOR SOLVING THE PROBLEMS

In a speech quality objective estimation apparatus that obtains the similarity between a standard degraded voice signal and a test voice signal and feeds the similarity time series to a neural network to objectively estimate the speech quality of the test voice signal, the invention provides: a speech analysis section 2 that frequency-analyzes the test voice signal; a standard degraded voice database 5 that stores various degraded voice signals in advance; a coded speech identification section 1 that identifies the coding system of a coded voice signal containing code errors; a similarity calculation section 3 that obtains the time series of similarity between the analysis result of the speech analysis section 2 and the standard degraded voice signal selected from database 5 according to the identification result of section 1; a weighting-coefficient database 6 that stores in advance the inter-unit weighting coefficients obtained by neural-network learning from the subjective evaluation values of various degraded voice signals and their similarity time series; and a quality estimate calculation section 4, forming a neural network, that estimates the speech quality from the similarity time series obtained by section 3 and the weighting coefficients selected from database 6 according to the identification result of section 1.

[0007] Further, the coded speech identification section 1 frequency-analyzes the test voice signal, determines the spectral envelope of the test voice signal from the analysis result, and identifies the coding system of the test voice signal from that spectral envelope.

[0008]

BEST MODE FOR CARRYING OUT THE INVENTION

In one embodiment of the speech quality objective estimation apparatus of this invention, the test voice is first fast-Fourier-transformed frame by frame, the resulting spectrum is divided evenly into several bands, and the spectrum is averaged within each band. The band averages are then further averaged over the length of the utterance, i.e., over all frames. Next, the difference between the lowest-band spectrum and the mean of the all-band spectrum is divided by the standard deviation of the all-band spectrum. The quotient is taken as the "physical quantity used for identification", and the coding system is identified by comparing this quantity with a threshold. The threshold is determined from the distribution of this quantity computed over samples of voice signals containing code errors.
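The statistic described above can be written out directly; the sketch below is a minimal pure-Python version, where the function names and the assumption that per-frame magnitude spectra are already available are illustrative.

```python
import math

def band_average(spectrum, n_bands=9):
    """Split one frame's magnitude spectrum into equal bands and
    average the spectrum within each band."""
    size = len(spectrum) // n_bands
    return [sum(spectrum[b * size:(b + 1) * size]) / size
            for b in range(n_bands)]

def identification_quantity(frame_spectra, n_bands=9):
    """Compute the 'physical quantity used for identification':
    band-average each frame, average the bands over all frames,
    then normalise the lowest band's gap from the overall mean
    by the standard deviation of the band profile."""
    banded = [band_average(s, n_bands) for s in frame_spectra]
    n_frames = len(banded)
    profile = [sum(f[b] for f in banded) / n_frames
               for b in range(n_bands)]
    mean = sum(profile) / n_bands
    var = sum((v - mean) ** 2 for v in profile) / n_bands
    return (profile[0] - mean) / math.sqrt(var)
```

Comparing the returned quantity with a threshold calibrated on error-corrupted samples then yields the coding-system decision.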

[0009]

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT

An embodiment of the invention will be described with reference to FIG. 1. In FIG. 1, reference numeral 1 denotes the coded speech identification section added by this invention, which receives the test voice signal and identifies the coding system of a coded voice signal containing code errors. Its identification result is used to select a standard degraded voice signal from the standard degraded voice database 5 and to select, from the weighting-coefficient database 6, the weighting coefficients obtained by the neural network. The speech analysis section 2 computes the LPC cepstrum coefficients of the test voice signal frame by frame. The similarity calculation section 3 performs frame-by-frame pattern matching between the test voice signal and the standard degraded voice signal selected from database 5 according to the identification result of section 1, yielding the similarity time series; here, the physical quantity representing the features of both the test voice signal and the standard degraded voice signal is the LPC cepstrum coefficients. The quality estimate calculation section 4 receives the similarity time series divided evenly into five segments, with the similarities in each segment averaged, together with the weighting coefficients selected from database 6 according to the coding-system identification result, and computes the quality (MOS: mean opinion score). The standard degraded voice database 5 is a database of degraded voice signals representative of various kinds of degradation. The weighting-coefficient database 6 is a database of the inter-unit weighting coefficients obtained by neural-network learning from the subjective evaluation values of various degraded voice signals and their similarity time series.
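The frame-by-frame pattern matching can be illustrated as follows. The patent defers the exact similarity measure to the earlier application (No. 3-118924); the Euclidean cepstral distance mapped through exp(-d) below is only one plausible stand-in, and the frame alignment is assumed already done.

```python
import math

def cepstral_distance(c_test, c_ref):
    """Euclidean distance between two LPC cepstrum coefficient vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(c_test, c_ref)))

def similarity_series(test_frames, ref_frames):
    """Per-frame similarity time series between the test signal and a
    standard degraded signal; exp(-d) maps distance 0 to similarity 1,
    with similarity falling toward 0 as the frames diverge."""
    return [math.exp(-cepstral_distance(t, r))
            for t, r in zip(test_frames, ref_frames)]
```

The resulting series is what is split into five segments and averaged before entering the quality estimate calculation section.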

[0010] FIG. 2 shows the processing flow of the coded speech identification section 1. First, in step 7, each frame of the test voice signal is fast-Fourier-transformed at 512 points to obtain a 256-point spectrum. In step 8, since the target is telephone-band voice (3400 Hz and below), 225 of the 256 spectrum points are divided evenly into 9 bands and averaged within each band. In step 9, this 9-band spectrum is averaged over the length of the utterance, i.e., over all frames. In step 10, the difference between the lowest-band spectrum and the mean of the 9-band spectrum is divided by the standard deviation of the 9-band spectrum; the quotient is the "physical quantity used for identification", and the coding system is identified by comparing it with a threshold. That is, in step 11, the value obtained in step 10 is compared with a threshold that separates ADPCM- and LD-CELP-coded voice signals at BER = 10^-2 within a set of ADPCM- and LD-CELP-coded voice signals containing code errors at three error rates (BER = 0, 10^-3, 10^-2). If the test voice signal is an ADPCM-coded voice signal at BER = 10^-2, the identification result output to the similarity calculation section 3 and the quality estimate calculation section 4 is "1"; if it is an LD-CELP-coded voice signal at BER = 10^-2, the result is "2"; otherwise it is "3". When the result is "1", sections 3 and 4 select the standard degraded voice data and weighting coefficients for estimating the speech quality of ADPCM-coded voice at BER = 10^-2; likewise, when the result is "2", those for LD-CELP-coded voice at BER = 10^-2; and when the result is "3", those for voice other than ADPCM- and LD-CELP-coded voice at BER = 10^-2.
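Step 11 can be sketched as a simple threshold decision. The patent says only that the threshold comes from the distribution of the identification quantity over error-corrupted samples; the midpoint rule and the use of two ordered thresholds for the three-way decision below are assumptions made for illustration.

```python
def calibrate_threshold(values_a, values_b):
    """One simple way to place a threshold between two sample
    distributions of the identification quantity: the midpoint
    of the two class means (an illustrative choice)."""
    mean_a = sum(values_a) / len(values_a)
    mean_b = sum(values_b) / len(values_b)
    return (mean_a + mean_b) / 2.0

def classify_coding(quantity, th_adpcm, th_ldcelp):
    """Map the identification quantity to the labels of step 11:
    1 = ADPCM at BER 10^-2, 2 = LD-CELP at BER 10^-2, 3 = other.
    Assumes th_adpcm > th_ldcelp on the quantity axis."""
    if quantity >= th_adpcm:
        return 1
    if quantity >= th_ldcelp:
        return 2
    return 3
```

The returned label (1, 2, or 3) is what drives the selection of the standard degraded voice data and weighting coefficients.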

[0011] In an experiment on 576 sentences of ADPCM- and LD-CELP-coded voice containing code errors at three error rates (BER = 0, 10^-3, 10^-2), in which ADPCM- and LD-CELP-coded voice signals at BER = 10^-2 were to be identified, the correct-identification rate was 97.9% for LD-CELP-coded voice at BER = 10^-2, 83.3% for ADPCM-coded voice at BER = 10^-2, and 75.0% for the other voice signals, giving an average over the three categories of 87.3%. A quality measurement experiment with the identification section incorporated showed an accuracy improvement of about 0.25 compared with the apparatus without it.

[0012]

EFFECTS OF THE INVENTION

As described above, by incorporating into a speech quality objective estimation apparatus based on pattern matching and a neural network a section that identifies the coding system of the voice signal, the coding system of the test voice is identified, and speech quality estimation is performed using a standard degraded voice signal and weighting coefficients matched to that coding system, so the accuracy of speech quality estimation can be improved.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a diagram illustrating the speech quality objective estimation apparatus.

FIG. 2 is a diagram illustrating the processing procedure of the coded speech identification section.

EXPLANATION OF SYMBOLS

1 coded speech identification section
2 speech analysis section
3 similarity calculation section
4 quality estimate calculation section
5 standard degraded voice database
6 weighting-coefficient database

Claims (2)

1. A speech quality objective estimation apparatus that obtains the similarity between a standard degraded voice signal and a test voice signal and feeds the time series of the similarity to a neural network to objectively estimate the speech quality of the test voice signal, comprising: a speech analysis section that frequency-analyzes the test voice signal; a standard degraded voice database that stores various degraded voice signals in advance; a coded speech identification section that identifies the coding system of a coded voice signal containing code errors; a similarity calculation section that obtains the time series of similarity between the analysis result of the speech analysis section and the standard degraded voice signal selected from the standard degraded voice database according to the identification result of the coded speech identification section; a weighting-coefficient database that stores in advance the inter-unit weighting coefficients obtained by neural-network learning from the subjective evaluation values of various degraded voice signals and their similarity time series; and a quality estimate calculation section, forming a neural network, that estimates the speech quality from the similarity time series obtained by the similarity calculation section and the weighting coefficients selected from the weighting-coefficient database according to the identification result of the coded speech identification section.
2. The speech quality objective estimation apparatus according to claim 1, wherein the coded speech identification section frequency-analyzes the test voice signal, determines the spectral envelope of the test voice signal from the analysis result, and identifies the coding system of the test voice signal from the spectral envelope.
JP15106796A 1996-06-12 1996-06-12 Speech quality objective estimation device Pending JPH09331391A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP15106796A JPH09331391A (en) 1996-06-12 1996-06-12 Speech quality objective estimation device


Publications (1)

Publication Number Publication Date
JPH09331391A true JPH09331391A (en) 1997-12-22

Family

ID=15510588

Family Applications (1)

Application Number Title Priority Date Filing Date
JP15106796A Pending JPH09331391A (en) 1996-06-12 1996-06-12 Speech quality objective estimation device

Country Status (1)

Country Link
JP (1) JPH09331391A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2009525633A (en) * 2006-01-31 2009-07-09 テレフオンアクチーボラゲット エル エム エリクソン(パブル) Non-intrusive signal quality assessment
JP2012516591A (en) * 2009-01-30 2012-07-19 テレフオンアクチーボラゲット エル エム エリクソン(パブル) Audio signal quality prediction
JP2021015137A (en) * 2019-07-10 2021-02-12 三菱電機株式会社 Information processing device, program, and information processing method

