JP4514149B2 - Speech quality estimation apparatus and speech quality estimation method

Info

Publication number: JP4514149B2
Application number: JP2005254533A
Authority: JP
Inventors: 仁志青木; 玲高橋
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2005-09-02
Filing date: 2005-09-02
Publication date: 2010-07-28
Anticipated expiration: 2025-09-02
Also published as: JP2007065547A

Description

The present invention relates to a speech quality estimation technique that estimates the quality of speech as perceived by a user from physical characteristics of the speech and the like, without involving the user.

The E-model disclosed in Non-Patent Document 1 is a conventional speech quality estimation technique. The E-model takes quality parameters (speech codec, delay, loudness, echo level, and so on) as input and outputs a communication quality index called the R value.
Speech quality may need to be estimated for various speech bands. Typical bands for speech communication services are the telephone band (300 to 3400 Hz) and the wideband (50 to 7000 Hz). The telephone band is widely used for telephone services on telephone service networks and IP networks, while the wideband is used for video conference services and high-quality telephone services on those networks. The E-model is used to estimate speech quality in the telephone band.

Non-Patent Document 1: "The E-model, a computational model for use in transmission planning", ITU-T Recommendation G.107, 2003

Because the E-model was developed for telephone-band speech, it does not achieve sufficient accuracy when estimating the quality of wideband speech, whose band extends below and above the telephone band. Specifically, the maximum communication quality index value in the E-model is 93, so no value above 93 is ever output, even for wideband communication that delivers clearer, higher-quality speech. The E-model therefore cannot be said to adequately estimate the quality of high-quality wideband speech communication.

Designing the quality of networks and terminals so that users are satisfied is important when providing speech communication services. However, because the E-model targets telephone-band speech, using it for network and terminal quality design allows design only up to the best quality attainable in the telephone band. Quality design for higher-quality wideband speech communication services has therefore not been possible.

An object of the present invention is to derive, by using a speech quality estimation technique developed for the telephone band such as the E-model, a speech quality estimate that is sufficiently accurate for wider-band speech communication as well.

The present invention is a speech quality estimation apparatus for estimating the quality of a target speech in a target speech frequency band, comprising: quality estimation means for estimating the quality of the target speech in a reference speech frequency band different from the target speech frequency band and outputting a quality index value; a database that stores in advance a speech evaluation value for each speech frequency band; reading means for acquiring, from the database, the speech evaluation value of each of the target speech frequency band and the reference speech frequency band; calculation means for calculating, based on the speech evaluation value of the target speech frequency band and the speech evaluation value of the reference speech frequency band, a correction value for correcting the quality index value to a value in the target speech frequency band; and correction means for correcting the quality index value by the correction value.
In one configuration example of the speech quality estimation apparatus of the present invention, the database stores, for each speech frequency band, the speech evaluation value obtained in advance by a subjective quality evaluation experiment that evaluates the relationship between each speech frequency band and speech quality.
In another configuration example of the speech quality estimation apparatus of the present invention, the database stores, for each speech frequency band, the speech evaluation value obtained in advance by objective quality measurement that measures the relationship between each speech frequency band and speech quality.
A further configuration example of the speech quality estimation apparatus of the present invention additionally comprises target speech frequency band determination means for obtaining the target speech frequency band from a speech signal output by a communication device when the target speech frequency band is unknown.

The speech quality estimation method of the present invention comprises: a quality estimation procedure of estimating the quality of the target speech in a reference speech frequency band different from the target speech frequency band and outputting a quality index value; an evaluation value acquisition procedure of obtaining a speech evaluation value for each of the target speech frequency band and the reference speech frequency band, based on a previously obtained relationship between each speech frequency band and speech quality; a correction value calculation procedure of calculating, based on the speech evaluation value of the target speech frequency band and the speech evaluation value of the reference speech frequency band, a correction value for correcting the quality index value to a value in the target speech frequency band; and a correction procedure of correcting the quality index value by the correction value.

According to the present invention, a correction value is calculated from the speech evaluation value of the target speech frequency band and the speech evaluation value of the reference speech frequency band, and the quality index value of the target speech in the reference speech frequency band is corrected by this correction value. The quality index value in the reference speech frequency band can thereby be extended to a quality index value in the target speech frequency band. As a result, by registering the relationship between each speech frequency band and speech quality in a database in advance as speech evaluation values, the present invention can estimate with high accuracy a quality index value corresponding to the subjective quality of the target speech in a target speech frequency band wider than the reference speech frequency band, even when a conventional quality estimation means such as the E-model is used.

Furthermore, by providing the target speech frequency band determination means, the present invention can estimate a quality index value corresponding to the subjective quality of the target speech in the target speech frequency band even when that band is unknown.

[First Embodiment]
Embodiments of the present invention are described below with reference to the drawings. FIG. 1 is a block diagram showing the configuration of a speech quality estimation apparatus according to the first embodiment of the present invention. The speech quality estimation apparatus of this embodiment comprises: an overall quality estimation unit 1 that estimates the quality of a target speech Ain in the reference speech frequency band and outputs a quality index value R; a subjective quality database 2 that stores in advance a subjective speech evaluation value for each speech frequency band; a MOS data reading unit 3 that acquires from the subjective quality database 2 the subjective speech evaluation values of the target speech frequency band, for which quality is to be estimated, and of the reference speech frequency band; an interval scale conversion unit 4 that converts the subjective speech evaluation values acquired by the MOS data reading unit 3 to an interval scale; a calculation unit 5 that calculates, from the evaluation values converted to the interval scale, a correction value for correcting the quality index value R to a value in the target speech frequency band; and a scaling unit 6 that corrects the quality index value R by the correction value.

The target speech Ain whose quality is to be estimated is input to the overall quality estimation unit 1 from a communication device (not shown). The overall quality estimation unit 1 estimates the overall quality of the target speech Ain in the reference speech frequency band (in this embodiment the telephone band, 300 to 3400 Hz) and outputs the quality index value R. This quality index value R corresponds to the subjective quality of the target speech Ain in the reference speech frequency band. One example of the overall quality estimation unit 1 is the E-model disclosed in Non-Patent Document 1. With the E-model, the quality parameters of the target speech Ain (speech codec, delay, loudness, echo level, and so on) are obtained first, and the quality index value R is then derived from those parameters.
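As a rough illustration of how an E-model style estimator arrives at R, ITU-T G.107 combines its impairment terms additively as R = Ro − Is − Id − Ie,eff + A. The sketch below assumes that the individual terms have already been derived from the quality parameters; the class name, field names, and numeric inputs are hypothetical and are not taken from this patent.

```python
from dataclasses import dataclass


@dataclass
class EModelTerms:
    """Already-computed E-model terms (hypothetical container)."""
    ro: float       # basic signal-to-noise ratio
    is_: float      # simultaneous impairment factor
    id_: float      # delay impairment factor
    ie_eff: float   # effective equipment impairment factor
    a: float = 0.0  # advantage factor


def quality_index(terms: EModelTerms) -> float:
    """Combine the terms as R = Ro - Is - Id - Ie,eff + A."""
    return terms.ro - terms.is_ - terms.id_ - terms.ie_eff + terms.a


# Illustrative numbers only; real values come from the G.107 parameter formulas.
r = quality_index(EModelTerms(ro=94.0, is_=1.0, id_=2.0, ie_eff=11.0))
```

Deriving each term from the codec, delay, loudness, and echo parameters is the part that G.107 specifies in detail and is deliberately left out here.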

Subjective speech evaluation values for each speech frequency band are registered in advance in the subjective quality database 2. One example of such a value is the mean opinion score (MOS), which is obtained by having subjects listen to speech signal samples and rate them on a five-point scale. In this embodiment, speech signal samples that differ only in bandwidth and have no other degradation are prepared for each speech frequency band, and the MOS of each sample is obtained by, for example, the global test proposed in A. Takahashi, A. Kurashima, and H. Yoshino, "Subjective quality index for compatibly evaluating narrowband and wideband speech", MESAQIN 2005, June 2005. The resulting values are registered in the subjective quality database 2. FIG. 2 shows a configuration example of the subjective quality database 2: the lower limit frequency and upper limit frequency of each speech frequency band are registered in association with the MOS of that band.

The MOS data reading unit 3 accepts, for example from a user of the speech quality estimation apparatus, a specification of the lower limit frequency Fl (for example, 50 Hz) and upper limit frequency Fh (for example, 7000 Hz) of the target speech frequency band for quality design. The MOS data reading unit 3 then acquires from the subjective quality database 2 the subjective speech evaluation values of the specified target speech frequency band (50 to 7000 Hz) and of the predetermined reference speech frequency band (300 to 3400 Hz). In the following, the subjective speech evaluation value of the target speech frequency band is denoted MOSt and that of the reference speech frequency band is denoted MOSb. In the example of FIG. 2, MOSt is 4.3 and MOSb is 3.5.
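The lookup performed by the MOS data reading unit can be sketched as a table keyed by band limits. Only the two rows quoted from the FIG. 2 example are included; the dictionary layout and the names `SUBJECTIVE_QUALITY_DB`, `REFERENCE_BAND`, and `read_mos` are assumptions made for illustration.

```python
# (lower limit Hz, upper limit Hz) -> MOS; the two values quoted for FIG. 2.
SUBJECTIVE_QUALITY_DB = {
    (300, 3400): 3.5,   # telephone band (reference)
    (50, 7000): 4.3,    # wideband
}
REFERENCE_BAND = (300, 3400)


def read_mos(fl_hz, fh_hz, db=SUBJECTIVE_QUALITY_DB, reference_band=REFERENCE_BAND):
    """Return (MOSt, MOSb) for the requested target band and the reference band."""
    target_band = (fl_hz, fh_hz)
    if target_band not in db:
        # This sketch does no interpolation between registered bands.
        raise KeyError(f"band {target_band} is not registered")
    return db[target_band], db[reference_band]


mos_t, mos_b = read_mos(50, 7000)
```

A real database would hold more bands than these two; how unregistered bands should be handled is not stated in the description, so the sketch simply rejects them.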

Because MOS is a quantity on a psychological scale, the relationship between ratios of scale-value differences and quality is not constant on the MOS scale. For example, quality is very good near the maximum scale value of 5 and very poor near the minimum scale value of 1, and in those regions the psychological scale saturates, so that quality changes little even when the scale value changes. The interval scale conversion unit 4 therefore converts the MOSt and MOSb acquired by the MOS data reading unit 3 to an interval scale on which the relationship between ratios of scale-value differences and quality is constant. In the following, the evaluation values obtained by converting MOSt and MOSb are denoted Rt and Rb, respectively. One possible implementation of the interval scale conversion unit 4 uses the conversion formula, proposed in Appendix I of Non-Patent Document 1, that converts MOS to the quality index value R.
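A minimal sketch of such a conversion follows. Instead of the closed-form expression in Appendix I of G.107, it numerically inverts the R-to-MOS mapping given in G.107 by bisection, which is a substitution made here for brevity. The function names and the search interval of 6.5 to 100 (where the mapping is increasing) are assumptions of this sketch.

```python
def mos_from_r(r):
    """R -> MOS mapping from ITU-T G.107."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    return 1.0 + 0.035 * r + r * (r - 60.0) * (100.0 - r) * 7e-6


def r_from_mos(mos, lo=6.5, hi=100.0, iterations=60):
    """Invert mos_from_r by bisection; valid for 1.0 <= mos <= 4.5."""
    if not 1.0 <= mos <= 4.5:
        raise ValueError("MOS must lie between 1.0 and 4.5")
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if mos_from_r(mid) < mos:
            lo = mid   # the R we want is above mid
        else:
            hi = mid   # the R we want is at or below mid
    return (lo + hi) / 2.0


rt = r_from_mos(4.3)  # MOSt from the FIG. 2 example
rb = r_from_mos(3.5)  # MOSb from the FIG. 2 example
```

With the FIG. 2 values this yields Rt of roughly 88.5 and Rb of roughly 68, so the two bands end up about 20 points apart on the R scale.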

The calculation unit 5 calculates an extension coefficient λ, which is the correction value for the quality index value R, from the evaluation values Rt and Rb converted by the interval scale conversion unit 4, according to equation (1).
λ = Rt − Rb ... (1)

Finally, the scaling unit 6 corrects the quality index value R output from the overall quality estimation unit 1 by the extension coefficient λ, as shown in equation (2), to obtain the overall quality index value R′ of the target speech Ain in the target speech frequency band. This overall quality index value R′ corresponds to the subjective quality of the target speech Ain in the target speech frequency band.
R′ = R + λ ... (2)
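Equations (1) and (2) amount to a single offset on the R scale, as the following sketch shows. The function names are hypothetical. The inputs Rt = 100 and Rb = 80 are the interval scale values quoted later for the FIG. 6 example, and R = 93 is the telephone-band maximum mentioned in the background.

```python
def extension_coefficient(rt, rb):
    """Equation (1): lambda = Rt - Rb."""
    return rt - rb


def corrected_quality_index(r, lam):
    """Equation (2): R' = R + lambda."""
    return r + lam


lam = extension_coefficient(rt=100.0, rb=80.0)       # 20.0
r_prime = corrected_quality_index(r=93.0, lam=lam)   # 113.0
```

The corrected value exceeds the telephone-band ceiling of 93, which is the extension toward wideband quality that the embodiment is after.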

As described above, in this embodiment the extension coefficient λ is calculated from the evaluation value Rt of the target speech frequency band and the evaluation value Rb of the reference speech frequency band, and the quality index value R of the target speech in the reference speech frequency band is corrected by this extension coefficient λ. The quality index value R can thereby be extended to the quality index value R′ in the target speech frequency band.

This embodiment uses the subjective quality database 2, which stores subjective speech evaluation values obtained by a subjective quality evaluation experiment that evaluates the relationship between each speech frequency band and speech quality. In place of the database 2, a database may be used that stores, for each speech frequency band, speech evaluation values obtained by objective quality measurement of the relationship between each speech frequency band and speech quality.

[Second Embodiment]
Next, a second embodiment of the present invention is described. FIG. 3 is a block diagram showing the configuration of a speech quality estimation apparatus according to the second embodiment of the present invention; components identical to those in FIG. 1 are denoted by the same reference numerals. The target speech Ain input to the overall quality estimation unit 1 described in the first embodiment is output from a voice communication device 11. This embodiment is an example for the case where the speech frequency band used by the voice communication device 11 is unknown.

This embodiment includes a target speech frequency bandwidth determination unit 7. When the target speech frequency band of the voice communication device 11 is measured in advance, this unit inputs a test signal Stest to the voice communication device 11 and obtains the target speech frequency band from the voice communication device output speech Atest that the device outputs in response to the test signal Stest. A signal containing components across the entire frequency band, such as white noise or pink noise, can be used as the test signal Stest. The target speech frequency band can be determined, for example, by taking the frequencies at which the power falls x dB (x being an arbitrary value) below the average power of the passband as the lower limit frequency Fl and upper limit frequency Fh of the target speech frequency band.

An example of determining the target speech frequency band is described with reference to FIG. 4. FIG. 4(A) shows the frequency characteristics of the test signal Stest, and FIG. 4(B) shows the frequency characteristics of the voice communication device output speech Atest. In this embodiment, white noise having flat characteristics over the entire frequency band, as shown in FIG. 4(A), is used as the test signal Stest. When white noise is input to the voice communication device 11, only the signal within the passband passes through the device, so the band-limited voice communication device output speech Atest shown in FIG. 4(B) is output. If the frequencies at which the power falls 3 dB below the average power of the passband are taken as the lower limit frequency Fl and upper limit frequency Fh of the target speech frequency band, the lower limit frequency Fl is determined to be 50 Hz and the upper limit frequency Fh to be 7000 Hz.
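The band determination can be sketched as a threshold search over a measured power spectrum of Atest. The description does not say how the passband is delimited before its average power is taken, so this sketch assumes that every bin within `passband_margin_db` of the peak belongs to the passband. That parameter, the function name, and the idealized spectrum are all hypothetical.

```python
import math


def estimate_band(freqs_hz, levels_db, drop_db=3.0, passband_margin_db=10.0):
    """Return (Fl, Fh): the outermost bins at or above (passband average - drop_db)."""
    peak_db = max(levels_db)
    # Assumed passband: bins within passband_margin_db of the peak level.
    passband = [lvl for lvl in levels_db if lvl >= peak_db - passband_margin_db]
    # Average the power in the linear domain, then convert back to dB.
    mean_power = sum(10.0 ** (lvl / 10.0) for lvl in passband) / len(passband)
    threshold_db = 10.0 * math.log10(mean_power) - drop_db
    inside = [f for f, lvl in zip(freqs_hz, levels_db) if lvl >= threshold_db]
    return min(inside), max(inside)


# Idealized spectrum: flat at 0 dB from 50 Hz to 7000 Hz, -40 dB elsewhere.
freqs = list(range(0, 8001, 50))
levels = [0.0 if 50 <= f <= 7000 else -40.0 for f in freqs]
fl, fh = estimate_band(freqs, levels)  # (50, 7000) for this idealized spectrum
```

A measured spectrum would have sloping band edges, so the result would depend on the frequency resolution and on the chosen x dB drop.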

The lower limit frequency Fl and upper limit frequency Fh of the target speech frequency band obtained by the target speech frequency bandwidth determination unit 7 are notified to the overall quality estimation unit 1 through a reception unit 8.
After the measurement of the target speech frequency band is complete, the target speech frequency bandwidth determination unit 7 stops inputting the test signal Stest to the voice communication device 11 and returns the voice communication device 11 to its normal state. Thereafter, the target speech Ain is input from the voice communication device 11 to the overall quality estimation unit 1. The operations of the overall quality estimation unit 1, subjective quality database 2, MOS data reading unit 3, interval scale conversion unit 4, calculation unit 5, and scaling unit 6 are the same as in the first embodiment.

As described above, by providing the target speech frequency bandwidth determination unit 7, this embodiment can estimate the quality of the target speech in the target speech frequency band even when that band is unknown.

[Third Embodiment]
Next, a third embodiment of the present invention is described. FIG. 5 is a block diagram showing the configuration of a speech quality estimation apparatus according to the third embodiment of the present invention; components identical to those in FIG. 1 are denoted by the same reference numerals. This embodiment includes an interval scale quality database 9 and a quality data reading unit 10 in place of the subjective quality database 2, MOS data reading unit 3, and interval scale conversion unit 4 of the first embodiment.

Evaluation values obtained by converting the subjective speech evaluation value of each speech frequency band to the interval scale in advance are registered in the interval scale quality database 9. As in the first embodiment, MOS is one example of the subjective speech evaluation value, and MOS can be converted to the interval scale with the conversion formula described for the interval scale conversion unit 4 of the first embodiment. FIG. 6 shows a configuration example of the interval scale quality database 9: the lower limit frequency and upper limit frequency of each speech frequency band are registered in association with the interval scale value of that band.

The quality data reading unit 10 accepts from the user a specification of the lower limit frequency Fl (for example, 50 Hz) and upper limit frequency Fh (for example, 7000 Hz) of the target speech frequency band, acquires from the interval scale quality database 9 the interval scale values Rt and Rb of the specified target speech frequency band and of the predetermined reference speech frequency band (300 to 3400 Hz), and outputs them. In the example of FIG. 6, the interval scale value Rt of the target speech frequency band is 100 and the interval scale value Rb of the reference speech frequency band is 80.

The operations of the overall quality estimation unit 1, calculation unit 5, and scaling unit 6 are the same as in the first embodiment.
As described above, in this embodiment the subjective speech evaluation values are converted to the interval scale in advance, so the interval scale conversion unit 4 can be omitted.

The present invention can be applied to speech quality estimation techniques.

FIG. 1 is a block diagram showing the configuration of the speech quality estimation apparatus according to the first embodiment of the present invention.
FIG. 2 is a diagram showing a configuration example of the subjective quality database in the first embodiment of the present invention.
FIG. 3 is a block diagram showing the configuration of the speech quality estimation apparatus according to the second embodiment of the present invention.
FIG. 4 is a diagram showing the frequency characteristics of the test signal and of the voice communication device output speech that the voice communication device outputs in response to the test signal.
FIG. 5 is a block diagram showing the configuration of the speech quality estimation apparatus according to the third embodiment of the present invention.
FIG. 6 is a diagram showing a configuration example of the interval scale quality database in the third embodiment of the present invention.

Explanation of symbols

1 ... overall quality estimation unit, 2 ... subjective quality database, 3 ... MOS data reading unit, 4 ... interval scale conversion unit, 5 ... calculation unit, 6 ... scaling unit, 7 ... target speech frequency bandwidth determination unit, 8 ... reception unit, 9 ... interval scale quality database, 10 ... quality data reading unit, 11 ... voice communication device.

Claims

A speech quality estimation apparatus for estimating the quality of a target speech in a target speech frequency band, comprising:
quality estimation means for estimating the quality of the target speech in a reference speech frequency band different from the target speech frequency band and outputting a quality index value;
a database that stores in advance a speech evaluation value for each speech frequency band;
reading means for acquiring, from the database, the speech evaluation value of each of the target speech frequency band and the reference speech frequency band;
calculation means for calculating, based on the speech evaluation value of the target speech frequency band and the speech evaluation value of the reference speech frequency band, a correction value for correcting the quality index value to a value in the target speech frequency band; and
correction means for correcting the quality index value by the correction value.

The speech quality estimation apparatus according to claim 1, wherein
the database stores, for each speech frequency band, the speech evaluation value obtained in advance by a subjective quality evaluation experiment that evaluates the relationship between each speech frequency band and speech quality.

The speech quality estimation apparatus according to claim 1, wherein
the database stores, for each speech frequency band, the speech evaluation value obtained in advance by objective quality measurement that measures the relationship between each speech frequency band and speech quality.

The speech quality estimation apparatus according to claim 1,
further comprising target speech frequency band determination means for obtaining the target speech frequency band based on a speech signal output from a communication device when the target speech frequency band is unknown.

A speech quality estimation method for estimating the quality of a target speech in a target speech frequency band, comprising:
a quality estimation procedure of estimating the quality of the target speech in a reference speech frequency band different from the target speech frequency band and outputting a quality index value;
an evaluation value acquisition procedure of obtaining, based on a previously obtained relationship between each speech frequency band and speech quality, a speech evaluation value of each of the target speech frequency band and the reference speech frequency band;
a correction value calculation procedure of calculating, based on the speech evaluation value of the target speech frequency band and the speech evaluation value of the reference speech frequency band, a correction value for correcting the quality index value to a value in the target speech frequency band; and
a correction procedure of correcting the quality index value by the correction value.