JP2005077970A - Device and method for speech quality objective evaluation - Google Patents


Info

Publication number
JP2005077970A
JP2005077970A (application JP2003311090A)
Authority
JP
Japan
Prior art keywords
point
speech
distortion amount
evaluation
signal
Prior art date
Legal status
Granted
Application number
JP2003311090A
Other languages
Japanese (ja)
Other versions
JP4113481B2 (en)
Inventor
Rei Takahashi
Atsuko Kurashima
Current Assignee
Nippon Telegraph and Telephone Corp
Original Assignee
Nippon Telegraph and Telephone Corp
Priority date
Application filed by Nippon Telegraph and Telephone Corp filed Critical Nippon Telegraph and Telephone Corp
Priority: JP2003311090A
Publication of JP2005077970A
Application granted
Publication of JP4113481B2
Status: Expired - Fee Related



Landscapes

  • Telephonic Communication Services (AREA)

Abstract

PROBLEM TO BE SOLVED: To accurately estimate the subjective quality of speech in a real telephone call by taking into account the influence that the speaking state of the party evaluating the quality (the receiving side) has on the quality evaluation.

SOLUTION: A distortion amount measurement unit 12 compares the point A speech signal obtained from a speech DB 11 with the degraded speech obtained by passing that signal through an evaluation target system 2, and quantifies the distortion of the degraded speech as a time series. A double-talk section detection unit 13 detects double-talk sections by comparing the point A and point B speech signals obtained from the speech DB 11. A weighting unit 14 computes a weighted average of the distortion time series output by the distortion amount measurement unit 12, applying a smaller weight to the quality degradation in double-talk sections than in single-talk sections.

COPYRIGHT: (C)2005, JPO&NCIPI

Description

The present invention relates to speech quality evaluation techniques, and more particularly to a speech quality objective evaluation device and a speech quality objective evaluation method that estimate subjective quality from measurements of the physical features of a speech signal, without conducting a subjective assessment test in which human listeners rate the quality of the speech they hear.

FIG. 7 shows the block configuration of a conventional speech quality objective evaluation device. In FIG. 7, reference numeral 2 denotes the evaluation target system, that is, the system whose transmission-induced degradation of speech quality is to be evaluated, and 4 denotes the speech quality objective evaluation device. Within the device 4, reference numeral 41 denotes a speech database (DB) holding evaluation sound sources, 42 denotes a distortion amount measurement unit that measures the amount of distortion introduced by the evaluation target system 2, and 43 denotes an averaging unit that time-averages the distortion time series measured by the distortion amount measurement unit 42. The evaluation target system 2 may be, for example, a fixed telephone system, a mobile telephone system, or an IP telephone system.

In the conventional speech quality objective evaluation device 4, using the input signal supplied from the speech DB 41 to the evaluation target system 2 and the output signal of the evaluation target system 2 (hereinafter referred to as degraded speech), the distortion amount measurement unit 42 calculates a time series of distortion amounts based on an objective speech quality evaluation algorithm (for example, the objective evaluation method specified in ITU-T Recommendation P.862).

Specifically, the objective evaluation method specified in ITU-T Recommendation P.862 performs frequency spectrum analysis of the evaluation sound source and of the degraded speech that has passed through the evaluation target system 2, computes the difference between the two, and applies weighting based on human auditory characteristics, thereby quantifying the perceived amount of distortion.
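The full P.862 (PESQ) model is elaborate, but the spectral-difference idea it rests on can be sketched in a few lines. The following is a simplified stand-in, not the recommendation itself: a frame-wise log-spectral distance between the reference and the degraded signal, with the frame length and floor constant chosen for illustration.

```python
import numpy as np

def distortion_time_series(reference, degraded, frame=256, hop=128):
    """Frame-wise log-spectral distance between a reference and a
    degraded signal. A crude proxy for the perceptually weighted
    distortion of ITU-T P.862, for illustration only."""
    n = min(len(reference), len(degraded))
    win = np.hanning(frame)
    d = []
    for start in range(0, n - frame + 1, hop):
        ref_spec = np.abs(np.fft.rfft(reference[start:start + frame] * win))
        deg_spec = np.abs(np.fft.rfft(degraded[start:start + frame] * win))
        # Log-magnitude difference; a small floor avoids log(0).
        diff = np.log10(ref_spec + 1e-10) - np.log10(deg_spec + 1e-10)
        d.append(float(np.sqrt(np.mean(diff ** 2))))
    return np.array(d)
```

An undistorted signal yields a distortion series of zeros, and adding noise raises every frame's value, which is the monotonic behavior the averaging step below relies on.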

In this case, only the one-way speech signal of a two-way call is used as the test signal. The averaging unit 43 obtains the final objective evaluation value by time-averaging the distortion time series output by the distortion amount measurement unit 42.

In general, a two-way call (a call between point A and point B) is classified into the following four states, as shown in FIG. 2:
(1) mutual silence section;
(2)-A point A single-talk section;
(2)-B point B single-talk section;
(3) double-talk section (both parties talking).

In the conventional method, focusing on point A for example, the averaging unit 43 shown in FIG. 7 distinguishes point A speech sections (that is, point A single-talk sections and double-talk sections) from point A silent sections (that is, point B single-talk sections and mutual silence sections), and quantifies the degradation consistently with subjective quality characteristics by applying different weights to the distortion amount of the speech signal in the two kinds of section.

The objective speech quality evaluation algorithm used in the distortion amount measurement unit 42 of FIG. 7 is described, for example, in Non-Patent Document 1 below. This algorithm has been adopted as ITU-T Recommendation P.862.
A. W. Rix, J. G. Beerends, M. P. Hollier, and A. P. Hekstra, "Perceptual evaluation of speech quality (PESQ) - a new method for speech quality assessment of telephone networks and codecs," Proceedings of the 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '01), vol. 2, 7-11 May 2001, pp. 749-752.

In practice, however, even within point A speech sections, the sensitivity of the listener at point B to the speech quality of the talker at point A differs between point A single-talk sections and double-talk sections. In a double-talk section, the talker at point B is also speaking, so even if the speech from point A is distorted, it is masked by the point B talker's own voice and the distortion is harder to perceive; as a result, for the same amount of distortion, the subjective quality degradation is smaller than in a point A single-talk section. Because the conventional method does not take this effect into account, the estimated subjective quality is more severe than the subjective quality actually perceived by the talker at point B in a real call, which is a problem in terms of estimation accuracy.

An object of the present invention is to provide a speech quality objective evaluation device and a speech quality objective evaluation method that can accurately estimate the subjective quality of speech in an actual call by taking into account the influence that the speaking state of the party evaluating the quality has on the quality evaluation.

To solve the above problem, the main feature of the present invention is that, when evaluating the speech quality of point A in a call between points A and B (that is, the speech quality perceived by the talker at point B), the speaking state of the talker at point B is taken into account and the distortion amount of the point A talker's speech is weighted accordingly.

That is, in a speech quality objective evaluation device that estimates the subjective quality of a speech signal (the quality a person perceives when listening to the signal) from measurements of its physical features, the present invention reduces the weight applied to the quality degradation in double-talk sections relative to single-talk sections. Furthermore, noting that the double-talk rate is correlated with the end-to-end transmission delay, the double-talk rate of the speech signal used for evaluation is determined from the transmission delay time of the evaluation target system.

The conventional method quantifies the distortion amount and estimates subjective quality based solely on analysis of the point A talker's speech signal; this is the difference from the present invention. Because the present invention evaluates the speech quality of the point A talker as perceived by the point B talker while taking the point B talker's speaking state into account, it enables more accurate subjective quality estimation than the conventional method, which ignores this factor.

According to the speech quality objective evaluation device of the present invention, objective evaluation of speech quality that accounts for the influence of the evaluating talker's speaking state becomes possible, and as a result the subjective quality of speech in a real call can be estimated accurately. In particular, by determining the double-talk rate of the evaluation speech signal according to the transmission delay time and using a speech signal with that double-talk rate as the evaluation sound source, still more accurate subjective quality estimation becomes possible.

FIG. 1 shows the block configuration of a first embodiment of the present invention. In FIG. 1, reference numeral 1 denotes the speech quality objective evaluation device according to the present invention. As mentioned above, the evaluation target system 2 is assumed to be, for example, an IP telephone system, a fixed telephone system, or a mobile telephone system.

The speech quality objective evaluation device 1 comprises: a speech DB 11 holding the speech signals of both points A and B; a distortion amount measurement unit 12 that quantifies the distortion of the degraded speech as a time series by comparing the point A speech signal obtained from the speech DB 11 with the degraded speech obtained by passing that signal through the evaluation target system 2; a double-talk section detection unit 13 that detects double-talk sections by comparing the point A and point B speech signals obtained from the speech DB 11; and a weighting unit 14 that weights the distortion time series output by the distortion amount measurement unit 12 based on the information obtained from the double-talk section detection unit 13.

The speech DB 11 stores the speech signals of the talkers at each point. As a concrete speech signal, the ITU-T Recommendation P.59 artificial conversational speech can be used. This artificial speech signal consists of a two-channel speech signal as shown in FIG. 2.

FIG. 2 illustrates the characteristics of a two-way call. The speech sections at point A and point B are shown hatched. In FIG. 2, section (1) is a mutual silence section, section (2)-A is a point A single-talk section, section (2)-B is a point B single-talk section, and section (3) is a double-talk section.

For the calculation of the distortion amount in the distortion amount measurement unit 12, the algorithm specified in ITU-T Recommendation P.862, for example, is applied. The double-talk section detection unit 13 detects double-talk sections by comparing the powers of the two channel signals (the point A and point B speech signals shown in FIG. 2), and provides this information to the weighting unit 14.
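The power comparison just described can be sketched as a frame-by-frame classifier over the four conversation states of FIG. 2. This is a minimal sketch; the frame length and power threshold are illustrative assumptions, not values taken from the patent.

```python
import numpy as np

def double_talk_sections(chan_a, chan_b, frame=160, threshold=1e-4):
    """Classify each frame of a two-channel conversation into one of
    the four states by comparing short-time power per channel."""
    n_frames = min(len(chan_a), len(chan_b)) // frame
    states = []
    for i in range(n_frames):
        pa = np.mean(chan_a[i * frame:(i + 1) * frame] ** 2)
        pb = np.mean(chan_b[i * frame:(i + 1) * frame] ** 2)
        a_on, b_on = pa > threshold, pb > threshold
        if a_on and b_on:
            states.append("double-talk")      # section (3)
        elif a_on:
            states.append("A-single-talk")    # section (2)-A
        elif b_on:
            states.append("B-single-talk")    # section (2)-B
        else:
            states.append("silence")          # section (1)
    return states
```

The list of frame states is exactly the information the weighting unit needs to tell single-talk distortion samples apart from double-talk ones.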

Based on the information obtained from the double-talk section detection unit 13, the weighting unit 14 distinguishes, within the point A speech sections, the point A single-talk sections ((2)-A in FIG. 2) from the double-talk sections ((3) in FIG. 2), and computes a weighted average of the distortion time series output by the distortion amount measurement unit 12.

With Ωs the set of point A single-talk sections used in the evaluation, Ωd the set of double-talk sections, Ωe the set of mutual silence sections, and D(t) (t: time) the distortion time series, the objective evaluation value Y is determined, for example, as follows.

[The equation is published only as images in JP2005077970A and is not reproduced here.]

The specific weighting coefficient α is optimized in advance so that the correlation between subjective evaluation values (training data) obtained from subjective evaluation experiments and the objective evaluation value Y above is maximized.
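The published form of the equation is available only as an image, but a natural reading of the surrounding description is a weighted time average in which double-talk samples of D(t) receive a reduced weight α (0 ≤ α ≤ 1) relative to single-talk samples. A sketch under that assumption — the exact published formula may differ:

```python
def objective_value(d, labels, alpha=0.5):
    """Weighted average of a distortion time series d, where labels[t]
    marks each sample as 'single' (point A single-talk, weight 1),
    'double' (double-talk, weight alpha), or anything else (ignored).
    A hypothetical reading of the patent's equation, for illustration."""
    weights = {"single": 1.0, "double": alpha}
    num = sum(weights[s] * x for x, s in zip(d, labels) if s in weights)
    den = sum(weights[s] for s in labels if s in weights)
    return num / den if den else 0.0
```

Setting alpha = 1 recovers the conventional uniform average over all point A speech sections; alpha < 1 implements the masking-based reduction the invention proposes.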

FIG. 3 shows an example of the speech quality objective evaluation processing flow according to the first embodiment of the present invention. First, the distortion amount measurement unit 12 quantifies the distortion of the degraded speech as a time series by comparing the point A speech signal obtained from the speech DB 11 with the degraded speech obtained by passing it through the evaluation target system 2 (step S1).

The double-talk section detection unit 13 detects double-talk sections by comparing the point A and point B speech signals obtained from the speech DB 11 (step S2). Then, based on the information obtained from the double-talk section detection unit 13, the weighting unit 14 computes the weighted average of the distortion time series output by the distortion amount measurement unit 12 according to the above equation, and calculates the objective evaluation value Y (step S3).

FIG. 4 shows the block configuration of a second embodiment of the present invention. The speech quality objective evaluation device 3 comprises: a test signal database (DB) 31 holding the test signal used to measure the transmission delay time of the evaluation target system 2 between points A and B; a delay time measurement unit 32 that measures the transmission delay time by comparing the time at which the test signal is transmitted from point A with the time at which it is received at point B; a double-talk rate table 34 holding correspondence information between transmission delay time and double-talk rate; a double-talk rate determination unit 33 that determines the double-talk rate by looking up the table 34 with the transmission delay time output by the delay time measurement unit 32; a speech signal generation unit 35 that generates a two-channel speech signal realizing that double-talk rate; a distortion amount measurement unit 36 that quantifies the distortion of the degraded speech as a time series by comparing the point A speech signal with the degraded speech obtained by passing it through the evaluation target system; a double-talk section detection unit 37 that detects double-talk sections by comparing the point A and point B speech signals obtained from the speech signal generation unit 35; and a weighting unit 38 that weights the distortion time series output by the distortion amount measurement unit 36 based on the information obtained from the double-talk section detection unit 37.

The first embodiment assumed a fixed double-talk rate and used speech signals realizing that rate, stored in advance in the speech DB 11 shown in FIG. 1. In general, however, the longer the transmission delay between points A and B, the harder conversation becomes and the more likely speech collisions are. That is, the longer the transmission delay time, the higher the double-talk rate.

Accordingly, in this embodiment, the relationship between transmission delay time and double-talk rate is prepared in advance as a table (the double-talk rate table 34), and an appropriate double-talk rate is determined based on the result of measuring the transmission delay time.

FIG. 5 shows an example data configuration of the double-talk rate table 34. As shown in FIG. 5, the table stores correspondence information between transmission delay time (msec) and double-talk rate (%). This table can be created by conducting conversation experiments with the transmission delay time as a parameter, analyzing the double-talk rate of the recorded conversational speech, examining the correspondence between the two, and tabulating the results.
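The table lookup itself is a simple step function over the measured delay. FIG. 5 is not reproduced here, so the delay/rate pairs below are illustrative placeholders, not the patent's experimental values; only the lookup mechanism is the point.

```python
import bisect

# Hypothetical delay / double-talk-rate pairs standing in for FIG. 5.
DELAY_MS = [0, 100, 200, 400, 800]    # transmission delay (msec)
DOUBLE_TALK_PCT = [5, 6, 8, 12, 20]   # double-talk rate (%)

def double_talk_rate(delay_ms):
    """Return the tabulated double-talk rate for a measured delay,
    using the largest tabulated delay not exceeding the measurement."""
    i = bisect.bisect_right(DELAY_MS, delay_ms) - 1
    return DOUBLE_TALK_PCT[max(i, 0)]
```

Interpolating between table rows instead of stepping would be an equally plausible reading; the patent only requires that the mapping be fixed in advance.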

To obtain a speech signal realizing the double-talk rate determined by the double-talk rate determination unit 33, the speech signal generation unit 35 generates an artificial two-channel speech signal according to an algorithm conforming, for example, to ITU-T Recommendation P.59. In this generation, the state transition probabilities among the four states (talker A single-talk, talker B single-talk, double-talk, and mutual silence) are set so as to satisfy the target double-talk rate, and the artificial speech signal for each state is generated using the method defined in ITU-T Recommendation P.50, so that speech features such as the long-term average spectrum, instantaneous amplitude distribution, and pitch frequency have average characteristics.
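The four-state structure described above is a Markov chain whose transition probabilities control the double-talk rate. The sketch below generates only the state sequence (not the speech itself), with illustrative transition probabilities that are assumptions, not the P.59 parameters; raising the barge-in probability raises the fraction of double-talk frames.

```python
import random

STATES = ["A_single", "B_single", "double", "silence"]

def state_sequence(n_frames, p_enter_double, seed=0):
    """Generate a four-state conversation state sequence with a simple
    Markov chain; p_enter_double is the per-frame chance that the
    listening party barges in during the other's single-talk."""
    rng = random.Random(seed)
    seq, state = [], "silence"
    for _ in range(n_frames):
        r = rng.random()
        if state in ("A_single", "B_single"):
            if r < p_enter_double:
                state = "double"          # other party barges in
            elif r < p_enter_double + 0.05:
                state = "silence"         # talker falls silent
        elif state == "double":
            if r < 0.3:                   # collision resolves
                state = rng.choice(["A_single", "B_single"])
        else:  # silence
            if r < 0.2:
                state = rng.choice(["A_single", "B_single"])
        seq.append(state)
    return seq

def measured_double_talk_rate(seq):
    """Fraction of double-talk frames, in percent."""
    return 100.0 * seq.count("double") / len(seq)
```

In practice one would search for transition probabilities whose long-run double-talk fraction matches the rate read from the table, then synthesize per-state speech with average P.50 characteristics.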

Instead of providing the speech signal generation unit 35, it is also possible to prepare two-channel speech signals with various double-talk rates in advance as a speech database, and to select and use a speech signal with the appropriate double-talk rate.

The subsequent operation follows the first embodiment. That is, the double-talk section detection unit 37 detects double-talk sections by comparing the point A and point B speech signals obtained from the speech signal generation unit 35, and the weighting unit 38 weights the distortion time series output by the distortion amount measurement unit 36 based on the information obtained from the double-talk section detection unit 37.

FIG. 6 shows an example of the speech quality objective evaluation processing flow according to the second embodiment of the present invention. First, the delay time measurement unit 32 measures the transmission delay time by comparing the time at which the test signal is transmitted from point A with the time at which it is received at point B (step S11).

The double-talk rate determination unit 33 determines the double-talk rate from the measured transmission delay time by referring to the double-talk rate table 34 (step S12). Based on the result, the speech signal generation unit 35 generates a two-channel speech signal (the point A and point B speech signals) realizing the determined double-talk rate (step S13).

The distortion amount measurement unit 36 quantifies the distortion of the degraded speech as a time series by comparing the point A speech signal with the degraded speech obtained by passing it through the evaluation target system (step S14). The double-talk section detection unit 37 detects double-talk sections by comparing the point A and point B speech signals obtained from the speech signal generation unit 35 (step S15). Based on the information from the double-talk section detection unit 37, the weighting unit 38 computes the weighted average of the distortion time series output by the distortion amount measurement unit 36 according to the equation given above, and calculates the objective evaluation value Y (step S16).

FIG. 1 is a block diagram of the first embodiment of the present invention.
FIG. 2 illustrates the characteristics of a two-way call.
FIG. 3 shows an example of the speech quality objective evaluation processing flow (first embodiment).
FIG. 4 is a block diagram of the second embodiment of the present invention.
FIG. 5 shows an example data configuration of the double-talk rate table.
FIG. 6 shows an example of the speech quality objective evaluation processing flow (second embodiment).
FIG. 7 is a block diagram of a conventional speech quality objective evaluation device.

Explanation of Symbols

1, 3, 4 Speech quality objective evaluation device
2 Evaluation target system
11, 41 Speech database (DB)
12, 36, 42 Distortion amount measurement unit
13, 37 Double-talk section detection unit
14, 38 Weighting unit
31 Test signal database (DB)
32 Delay time measurement unit
33 Double-talk rate determination unit
34 Double-talk rate table
35 Speech signal generation unit
43 Averaging unit

Claims (4)

第1地点から第2地点までの評価対象系を通した音声信号の物理的特徴量の測定結果から音声品質を客観評価する音声品質客観評価装置であって,
第1地点の音声信号とこれを評価対象系に通して得られる劣化音声とを比較することにより劣化音声の歪量を測定し,歪量時系列として定量化する歪量測定手段と,
第1地点の音声信号と第2地点の音声信号とを比較することにより双方発話区間を検出する双方発話区間検出手段と,
前記検出された双方発話区間の歪量に対する重みを,単独発話区間に比べて軽減した重み付け値を用いて,前記歪量測定手段が出力する歪量時系列の重み付け平均を算出する重み付け手段とを備える
ことを特徴とする音声品質客観評価装置。
An audio quality objective evaluation device for objectively evaluating audio quality from measurement results of physical features of audio signals through an evaluation target system from a first point to a second point,
A distortion amount measuring means for measuring the distortion amount of the deteriorated voice by comparing the voice signal of the first point with the deteriorated voice obtained by passing the voice signal through the evaluation target system, and quantifying the distortion amount as a time series;
A both-speaking section detecting means for detecting a both-speaking section by comparing the sound signal of the first point and the sound signal of the second point;
Weighting means for calculating a weighted average of distortion amount time series output from the distortion amount measuring means using a weighting value obtained by reducing the weight for the distortion amount of the detected both utterance intervals compared to a single utterance interval; A voice quality objective evaluation device characterized by comprising:
第1地点から第2地点までの評価対象系を通した音声信号の物理的特徴量の測定結果から音声品質を客観評価する音声品質客観評価装置であって,
前記第1地点から第2地点までの評価対象系の伝送遅延時間を測定する遅延時間測定手段と,
前記測定された伝送遅延時間に基づいて,予め定められた伝送遅延時間と通話における双方発話率との対応情報から双方発話率を決定する双方発話率決定手段と,
前記決定された双方発話率を実現する音声信号を評価音源として生成または予め用意された音声データベースから選択する音声信号生成/選択手段と,
前記評価音源の第1地点の音声信号とこれを評価対象系に通して得られる劣化音声とを比較することにより劣化音声の歪量を測定し,歪量時系列として定量化する歪量測定手段と,
前記評価音源の第1地点の音声信号と第2地点の音声信号とを比較することにより双方発話区間を検出する双方発話区間検出手段と,
前記検出された双方発話区間の歪量に対する重みを,単独発話区間に比べて軽減した重み付け値を用いて,前記歪量測定手段が出力する歪量時系列の重み付け平均を算出する重み付け手段とを備える
ことを特徴とする音声品質客観評価装置。
An audio quality objective evaluation device for objectively evaluating audio quality from measurement results of physical features of audio signals through an evaluation target system from a first point to a second point,
A delay time measuring means for measuring a transmission delay time of an evaluation target system from the first point to the second point;
Based on the measured transmission delay time, a both-side speech rate determining means for determining a two-way speech rate from correspondence information between a predetermined transmission delay time and a two-way speech rate in a call;
An audio signal generating / selecting means for generating an audio signal for realizing the determined bilateral speech rate as an evaluation sound source or selecting from an audio database prepared in advance;
Distortion amount measuring means for measuring the distortion amount of the deteriorated speech by comparing the sound signal of the first point of the evaluation sound source with the deteriorated speech obtained by passing this through the evaluation target system, and quantifying it as a distortion time series When,
A both-speaking section detecting means for detecting a both-speaking section by comparing a voice signal of the first point and a second point of the evaluation sound source;
Weighting means for calculating a weighted average of distortion amount time series output from the distortion amount measuring means using a weighting value obtained by reducing the weight for the distortion amount of the detected both utterance intervals compared to a single utterance interval; A voice quality objective evaluation device characterized by comprising:
第1地点から第2地点までの評価対象系を通した音声信号の物理的特徴量の測定結果から音声品質を客観評価する音声品質客観評価方法であって,
第1地点の音声信号とこれを評価対象系に通して得られる劣化音声とを比較することにより劣化音声の歪量を測定し,歪量時系列として定量化する歪量測定ステップと,
第1地点の音声信号と第2地点の音声信号とを比較することにより双方発話区間を検出する双方発話区間検出ステップと,
前記検出された双方発話区間の歪量に対する重みを,単独発話区間に比べて軽減した重み付け値を用いて,前記歪量時系列の重み付け平均を算出する重み付けステップとを有する
ことを特徴とする音声品質客観評価方法。
A speech quality objective evaluation method that objectively evaluates speech quality from the measured physical features of a speech signal passed through an evaluation target system from a first point to a second point, comprising:
a distortion amount measuring step of measuring the distortion amount of degraded speech by comparing the first-point speech signal with the degraded speech obtained by passing it through the evaluation target system, and quantifying the result as a distortion amount time series;
a two-way speech section detecting step of detecting two-way speech sections by comparing the first-point speech signal with the second-point speech signal; and
a weighting step of calculating a weighted average of the distortion amount time series using weighting values in which the weight on the distortion amount of the detected two-way speech sections is reduced relative to single speech sections.
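One simple way to realize the two-way speech section detecting step above is frame-wise energy thresholding on both channels: a frame belongs to a two-way speech section when both the first-point and the second-point signal carry speech energy. A minimal sketch — the `frame_len` and `threshold` values are illustrative assumptions, and the patent does not prescribe a particular detector:

```python
def frame_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)

def detect_two_way_speech(sig_a, sig_b, frame_len=160, threshold=1e-4):
    """Frame-wise two-way speech detection: a frame is marked True when
    the short-time energy of BOTH channels exceeds a silence threshold.
    frame_len=160 corresponds to 20 ms at 8 kHz; both values are
    illustrative choices, not taken from the patent."""
    n = min(len(sig_a), len(sig_b)) // frame_len
    mask = []
    for i in range(n):
        fa = sig_a[i * frame_len:(i + 1) * frame_len]
        fb = sig_b[i * frame_len:(i + 1) * frame_len]
        mask.append(frame_energy(fa) > threshold and frame_energy(fb) > threshold)
    return mask
```

The resulting boolean mask, one entry per frame, aligns with the distortion amount time series and feeds the weighting step.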
A speech quality objective evaluation method that objectively evaluates speech quality from the measured physical features of a speech signal passed through an evaluation target system from a first point to a second point, comprising:
a delay time measuring step of measuring the transmission delay time of the evaluation target system from the first point to the second point;
a two-way speech rate determining step of determining, based on the measured transmission delay time, a two-way speech rate from predetermined correspondence information between transmission delay time and the two-way speech rate in a call;
a speech signal generating/selecting step of generating, as an evaluation sound source, a speech signal that realizes the determined two-way speech rate, or selecting such a signal from a speech database prepared in advance;
a distortion amount measuring step of measuring the distortion amount of degraded speech by comparing the first-point speech signal of the evaluation sound source with the degraded speech obtained by passing it through the evaluation target system, and quantifying the result as a distortion amount time series;
a two-way speech section detecting step of detecting two-way speech sections by comparing the first-point speech signal of the evaluation sound source with the second-point speech signal; and
a weighting step of calculating a weighted average of the distortion amount time series using weighting values in which the weight on the distortion amount of the detected two-way speech sections is reduced relative to single speech sections.
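The weighting step in these claims can be sketched as a weighted mean over the distortion amount time series, with frames inside detected two-way speech sections given a smaller weight than single speech frames. A minimal illustration — the default weight of 0.3 is an assumption for the example, not a value taken from the patent:

```python
def weighted_distortion_average(distortion, two_way_mask, two_way_weight=0.3):
    """Average a per-frame distortion time series, down-weighting frames
    inside two-way speech sections.

    distortion     : per-frame distortion values
    two_way_mask   : booleans, True where both parties speak
    two_way_weight : weight applied to two-way speech frames, relative
                     to the weight 1.0 used for single speech frames
                     (0.3 is an illustrative choice, not from the patent)
    """
    # Reduce the weight of two-way speech frames, reflecting that
    # degradation there affects perceived quality less.
    weights = [two_way_weight if m else 1.0 for m in two_way_mask]
    return sum(w * d for w, d in zip(weights, distortion)) / sum(weights)
```

With `two_way_weight=1.0` this reduces to the plain frame average used by conventional objective measures; smaller values shift the estimate toward the distortion observed in single speech sections.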
JP2003311090A 2003-09-03 2003-09-03 Voice quality objective evaluation apparatus and voice quality objective evaluation method Expired - Fee Related JP4113481B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2003311090A JP4113481B2 (en) 2003-09-03 2003-09-03 Voice quality objective evaluation apparatus and voice quality objective evaluation method

Publications (2)

Publication Number Publication Date
JP2005077970A true JP2005077970A (en) 2005-03-24
JP4113481B2 JP4113481B2 (en) 2008-07-09

Family

ID=34412740

Family Applications (1)

Application Number Title Priority Date Filing Date
JP2003311090A Expired - Fee Related JP4113481B2 (en) 2003-09-03 2003-09-03 Voice quality objective evaluation apparatus and voice quality objective evaluation method

Country Status (1)

Country Link
JP (1) JP4113481B2 (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11544545B2 (en) 2017-04-04 2023-01-03 Hailo Technologies Ltd. Structured activation based sparsity in an artificial neural network
US10387298B2 (en) 2017-04-04 2019-08-20 Hailo Technologies Ltd Artificial neural network incorporating emphasis and focus techniques
US11238334B2 (en) 2017-04-04 2022-02-01 Hailo Technologies Ltd. System and method of input alignment for efficient vector operations in an artificial neural network
US11551028B2 (en) 2017-04-04 2023-01-10 Hailo Technologies Ltd. Structured weight based sparsity in an artificial neural network
US11615297B2 (en) 2017-04-04 2023-03-28 Hailo Technologies Ltd. Structured weight based sparsity in an artificial neural network compiler

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2011250289A (en) * 2010-05-28 2011-12-08 Nippon Telegr & Teleph Corp <Ntt> Speech quality estimation method, speech quality estimation device, and speech quality estimation system
CN111276161A (en) * 2020-03-05 2020-06-12 公安部第三研究所 Voice quality scoring system and method
CN111276161B (en) * 2020-03-05 2023-03-10 公安部第三研究所 Voice quality scoring system and method
CN114486286A (en) * 2022-01-12 2022-05-13 中国重汽集团济南动力有限公司 Method and equipment for evaluating quality of door closing sound of vehicle
CN114486286B (en) * 2022-01-12 2024-05-17 中国重汽集团济南动力有限公司 Method and equipment for evaluating quality of door closing sound of vehicle


Similar Documents

Publication Publication Date Title
JP2009050013A (en) Echo detection and monitoring
EP1979900B1 (en) Apparatus for estimating sound quality of audio codec in multi-channel and method therefor
Hines et al. ViSQOL: The virtual speech quality objective listener
JP4745916B2 (en) Noise suppression speech quality estimation apparatus, method and program
KR101430321B1 (en) Method and system for determining a perceived quality of an audio system
Rix Perceptual speech quality assessment-a review
JP2011501206A (en) Method and system for measuring voice comprehension of audio transmission system
KR20190111134A (en) Methods and devices for improving call quality in noisy environments
US8566082B2 (en) Method and system for the integral and diagnostic assessment of listening speech quality
Ding et al. Non-intrusive single-ended speech quality assessment in VoIP
JP4113481B2 (en) Voice quality objective evaluation apparatus and voice quality objective evaluation method
US20090161882A1 (en) Method of Measuring an Audio Signal Perceived Quality Degraded by a Noise Presence
JP2007013674A (en) Comprehensive speech communication quality evaluating device and comprehensive speech communication quality evaluating method
Moeller et al. Objective estimation of speech quality for communication systems
JP4761391B2 (en) Listening quality evaluation method and apparatus
US7412375B2 (en) Speech quality assessment with noise masking
JP4116955B2 (en) Voice quality objective evaluation apparatus and voice quality objective evaluation method
Gierlich et al. Advanced speech quality testing of modern telecommunication equipment: An overview
Brachmański Estimation of logatom intelligibility with the STI method for polish speech transmitted via communication channels
Ghimire Speech intelligibility measurement on the basis of ITU-T Recommendation P. 863
JP2015106896A (en) Speech quality estimation method, speech quality estimation device, and program
Egi et al. Objective quality evaluation method for noise-reduced speech
Somek et al. Speech quality assessment
Lingapuram Measuring speech quality of laptop microphone system using PESQ
Hedlund et al. Quantification of audio quality loss after wireless transfer

Legal Events

Date Code Title Description
A621 Written request for application examination

Free format text: JAPANESE INTERMEDIATE CODE: A621

Effective date: 20050715

A977 Report on retrieval

Free format text: JAPANESE INTERMEDIATE CODE: A971007

Effective date: 20080317

TRDD Decision of grant or rejection written
A01 Written decision to grant a patent or to grant a registration (utility model)

Free format text: JAPANESE INTERMEDIATE CODE: A01

Effective date: 20080408

A61 First payment of annual fees (during grant procedure)

Free format text: JAPANESE INTERMEDIATE CODE: A61

Effective date: 20080411

R150 Certificate of patent or registration of utility model

Free format text: JAPANESE INTERMEDIATE CODE: R150

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20110418

Year of fee payment: 3

RD02 Notification of acceptance of power of attorney

Free format text: JAPANESE INTERMEDIATE CODE: R3D02

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20120418

Year of fee payment: 4

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20130418

Year of fee payment: 5

LAPS Cancellation because of no payment of annual fees