JP2016144035A

JP2016144035A - Voice quality estimation device, method and program

Info

Publication number: JP2016144035A
Application number: JP2015018499A
Authority: JP
Inventors: Takafumi Okuyama; Masataka Masuda; Atsuko Kurashima
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2015-02-02
Filing date: 2015-02-02
Publication date: 2016-08-08
Anticipated expiration: 2035-02-02
Also published as: JP6387308B2

Abstract

PROBLEM TO BE SOLVED: To estimate voice quality without a voice signal as input, while taking into account whether speech is present in the section in which packet loss occurs.
SOLUTION: An acquisition unit 22 of a voice quality estimation device 20 acquires a second packet group. A lost packet identification unit 23 identifies a lost third packet group on the basis of the second packet group. A voice presence/absence section determination unit 24 determines a first voiced section and a first silent section in the second packet group and, based on the second packet group, determines a second voiced section and a second silent section in the third packet group. A loss ratio calculation unit 25 calculates a first loss ratio from the first voiced section and the second voiced section, and a second loss ratio from the first silent section and the second silent section. A voice quality estimation unit 26 estimates voice quality from the first loss ratio, the second loss ratio, and a mapping function.
SELECTED DRAWING: Figure 2

Description

The present invention relates to a voice quality estimation device, method, and program.

To evaluate the quality of experience of users of a voice call service, it is desirable to take the evaluation of individual quality factors into account. One of the main quality factors to consider is the voice quality during a call.

One way to estimate voice quality is to evaluate the listening MOS (Mean Opinion Score). For example, a technique based on POLQA (Perceptual Objective Listening Quality Analysis) is known for evaluating the listening MOS of wideband voice services such as VoLTE (Voice over Long Term Evolution) (see Non-Patent Document 1).

POLQA evaluates voice quality by comparing a reference speech signal input on the talker side with a recorded speech signal output on the listener side and calculating a POLQA score. The POLQA score can be converted into an estimated listening MOS (MOS-LQO: Mean Opinion Score - Listening Quality Objective) by applying the mapping function defined in the Implementer's Guide (P. Imp 863) for ITU-T (Telecommunication Standardization Sector of the International Telecommunication Union) Recommendation P.863.

However, POLQA requires the reference speech signal as input. It therefore cannot be applied in real operation, where the reference speech signal cannot be obtained.

ITU-T Recommendation P.564, on the other hand, defines a framework that estimates voice quality without a reference speech signal as input, by evaluating the listening MOS from the packet loss characteristics of IP (Internet Protocol) telephony (see Non-Patent Document 2).

Non-Patent Document 1: ITU-T P.863, Perceptual Objective Listening Quality Assessment, 09/2014.
Non-Patent Document 2: ITU-T P.564, Conformance Testing for Voice over IP Transmission Quality Assessment Models, 11/2007.

However, according to the inventors' study, the framework defined in Non-Patent Document 2 does not consider, when evaluating the listening MOS, whether speech is present in the section where packet loss occurs.

Consequently, when packet loss is concentrated in sections where the speech signal level is extremely low or absent (hereinafter, silent sections), the evaluated listening MOS is lower than the actual listening MOS.

Conversely, when packet loss is concentrated in sections that contain a normal speech signal (hereinafter, voiced sections), the evaluated listening MOS is higher than the actual listening MOS.

In other words, a conventional voice quality estimation device can estimate voice quality without a speech signal as input, but cannot take into account whether speech is present in the section where packet loss occurred.

The present invention was made in view of the above circumstances. Its object is to provide a voice quality estimation device, method, and program that can estimate voice quality without a speech signal as input while taking into account whether speech is present in the section where packet loss occurred.

To achieve the above object, a first aspect of the present invention has the following elements. A voice quality estimation device estimates the voice quality of voice data at a second terminal. A first terminal transmits, within a measurement period, a first packet group containing voiced and silent voice data, and the second terminal receives a second packet group, which is the first packet group minus a lost third packet group. The device acquires the second packet group and stores it in a memory. Based on the stored second packet group, it identifies the lost third packet group. For the voice data in the second packet group, it determines a first voiced section containing voiced voice data and a first silent section containing silent voice data. For the voice data in the identified third packet group, it determines, based on the voice data in the second packet group, a second voiced section containing voiced voice data and a second silent section containing silent voice data. Based on the first and second voiced sections, it calculates a first loss ratio, which is the proportion of the total voiced section in the measurement period occupied by the second voiced section. Based on the first and second silent sections, it calculates a second loss ratio, which is the proportion of the total silent section in the measurement period occupied by the second silent section. It then estimates the voice quality of the voice data based on the calculated first and second loss ratios and a predetermined mapping function.

According to the first aspect of the present invention, the voice quality estimation device determines voiced sections and silent sections in both the received packet group and the packet group lost at reception. It then evaluates the loss ratio, that is, the share taken up by lost packets, separately for the voiced sections and the silent sections, and estimates voice quality from the two loss ratios. Voice quality can thus be estimated from packet loss characteristics without a speech signal as input, and the presence or absence of speech in the section where packet loss occurred is taken into account.

That is, the present invention can provide a voice quality estimation device, method, and program that estimate voice quality without a speech signal as input while taking into account whether speech is present in the section where packet loss occurred.

FIG. 1 is a schematic diagram showing an example of the functional configuration of a voice quality estimation system according to a first embodiment of the present invention.
FIG. 2 is a block diagram showing an example of the functional configuration of the voice quality estimation device in the first embodiment.
FIG. 3 is a schematic diagram showing an example of voice presence/absence section determination in the first embodiment.
FIG. 4 is a schematic diagram showing an example of voice presence/absence section determination in the first embodiment.
FIG. 5 is a flowchart showing an example of the operation of the voice quality estimation device in the first embodiment.
FIG. 6 is a flowchart showing an example of the operation of a voice quality estimation device according to a second embodiment of the present invention.

Embodiments of the present invention are described below with reference to the drawings. The voice quality estimation device described below may be implemented with a program that causes a computer to function as a voice quality estimation device. That is, the functions of the device's functional blocks may be realized by a program working together with hardware resources such as a memory or a CPU (Central Processing Unit).

[First Embodiment]
FIG. 1 is a schematic diagram showing the functional configuration of a voice quality estimation system according to the first embodiment of the present invention, and FIG. 2 is a block diagram showing the functional configuration of the voice quality estimation device in that embodiment. The voice quality estimation system 1 includes a voice call system 10 and a voice quality estimation device 20.

The voice call system 10 includes a transmitting terminal 11 and a receiving terminal 12, which are connected via a packet switching network 13 such as an IP network or a mobile network. In the following description, "transmitting terminal 11" and "receiving terminal 12" may be read as "first terminal" and "second terminal", respectively.

The transmitting terminal 11 transmits a first packet group containing voiced and silent voice data to the receiving terminal 12 via the packet switching network 13.

The receiving terminal 12 receives a second packet group, which is the first packet group transmitted from the transmitting terminal 11 minus a lost third packet group.

In the following description, "first packet group", "second packet group", and "third packet group" may be read as "transmitted packet group", "received packet group", and "lost packet group", respectively.

The voice call system 10 may be, for example, a voice call system such as VoLTE. In that case, it may support a codec with a voice activity detection (VAD) mechanism, such as AMR-WB (Adaptive Multi-Rate Wideband). Because AMR-WB uses different packet formats for voiced sections and silent sections, the packet loss characteristics of each section can be evaluated by combining the packet format with the sequence numbers and timestamps in the RTP (Real-time Transport Protocol) headers.

The embodiments below describe the functional configuration on the assumption that the voice call system 10 uses a codec with a voice activity detection mechanism, so that the packet format differs between voiced sections and silent sections.

The voice quality estimation device 20 estimates the voice quality of the voice data in the received packet group at the receiving terminal 12. The transmitting terminal 11 transmits, within a measurement period, a transmitted packet group containing voiced and silent voice data, and the receiving terminal 12 receives the received packet group, which is the transmitted packet group minus the lost packet group.

As shown in FIG. 2, the voice quality estimation device 20 includes a storage unit 21, an acquisition unit 22, a lost packet identification unit 23, a voice presence/absence section determination unit 24, a loss ratio calculation unit 25, and a voice quality estimation unit 26.

The storage unit 21 is a memory that each unit can read from and write to. It stores the packet group acquired by the acquisition unit 22, specifically the received packet group.

Each packet in a packet group carries voiced or silent voice data in its data part. Each packet may also carry a sequence number and a transmission timestamp in its header part.

The acquisition unit 22 acquires, from the packet switching network 13, the packet group communicated in the voice call system 10. Specifically, it acquires the received packet group received by the receiving terminal 12 and stores it in the storage unit 21. The acquisition unit 22 may also send the received packet group to the lost packet identification unit 23 and the voice presence/absence section determination unit 24.

The lost packet identification unit 23 reads the received packet group from the storage unit 21 and identifies the lost packet group that is missing from it. It sends identification information that identifies the lost packet group to the voice presence/absence section determination unit 24.

The lost packet identification unit 23 may identify the sequence numbers of the lost packet group from the sequence numbers of the packets in the received packet group. In that case, the sequence numbers may serve as the identification information for the lost packet group.
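The sequence-number approach can be sketched as follows. This is a minimal illustration, not the implementation described in the source: the function name is hypothetical, sequence-number wraparound is assumed not to occur within the measurement period, and losses before the first or after the last received packet cannot be detected this way.

```python
def find_lost_sequence_numbers(received_seqs):
    """Return the sequence numbers missing between the lowest and highest received.

    Assumption: sequence numbers increase by one per packet and do not wrap
    around within the measurement period.
    """
    if not received_seqs:
        return []
    present = set(received_seqs)
    return [s for s in range(min(present), max(present) + 1) if s not in present]


if __name__ == "__main__":
    # Packets 102, 103 and 106 never arrived.
    print(find_lost_sequence_numbers([100, 101, 104, 105, 107]))
```

A set is used so that out-of-order arrival does not affect the result.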

The voice presence/absence section determination unit 24 reads the received packet group from the storage unit 21 and receives the identification information for the lost packet group from the lost packet identification unit 23. For the voice data in the received packet group, it determines a first voiced section containing voiced voice data and a first silent section containing silent voice data. It sends this first determination result, which divides the received packet group into the first voiced section and the first silent section, to the loss ratio calculation unit 25.

Specifically, the voice presence/absence section determination unit 24 checks the packet format of each packet in the received packet group and determines whether the voice data in that packet is voiced or silent. From the per-packet results, it determines the first voiced section containing voiced voice data and the first silent section containing silent voice data.
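One way to turn per-packet results into sections is sketched below. It assumes each received packet has already been reduced to a hypothetical `(seq, is_voiced)` pair, sorted by sequence number; how `is_voiced` is read from the packet format depends on the codec and is not shown. A new section is started whenever the label changes or a sequence gap (a loss) interrupts the run.

```python
def received_sections(packets):
    """Group received packets into runs of consecutive, same-label packets.

    packets: list of (seq, is_voiced) tuples sorted by seq (hypothetical format).
    Returns a list of (label, first_seq, last_seq) tuples.
    """
    sections = []
    for seq, is_voiced in packets:
        label = "voiced" if is_voiced else "silent"
        if sections and sections[-1][0] == label and sections[-1][2] == seq - 1:
            # Same label and no gap: extend the current section.
            sections[-1] = (label, sections[-1][1], seq)
        else:
            sections.append((label, seq, seq))
    return sections


if __name__ == "__main__":
    pkts = [(1, True), (2, True), (3, False), (5, False), (6, True)]
    for section in received_sections(pkts):
        print(section)
```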

For the voice data in the identified lost packet group, the voice presence/absence section determination unit 24 also determines, based on the voice data in the received packet group, a second voiced section containing voiced voice data and a second silent section containing silent voice data. It sends this second determination result, which divides the lost packet group into the second voiced section and the second silent section, to the loss ratio calculation unit 25. In the following description, the means for determining the first voiced section and the first silent section for the voice data in the received packet group may be read as "first determination means". Likewise, the means for determining, based on the voice data in the received packet group, the second voiced section and the second silent section for the voice data in the identified lost packet group may be read as "second determination means".

Based on the identification information for the lost packet group, the voice presence/absence section determination unit 24 may identify, within the received packet group, the packets immediately before and after the lost packet group. It may then determine the second voiced section and the second silent section according to whether the voice data in those neighboring packets is voiced or silent.

Specifically, as shown in FIG. 3(A), when the voice data in the packets before and after a lost packet group are both voiced, the voice presence/absence section determination unit 24 may determine that the voice data in the lost packet group is voiced and that the lost packet group is a second voiced section. Similarly, as shown in FIG. 3(B), when the voice data in the packets before and after a lost packet group are both silent, it may determine that the voice data in the lost packet group is silent and that the lost packet group is a second silent section.

As shown in FIG. 4, when a lost packet group lies between a packet whose voice data is voiced and a packet whose voice data is silent, the voice presence/absence section determination unit 24 may determine that the voice data in the lost packet group has a voiced part and a silent part. In that case, it may split the lost packet group half and half between the second voiced section and the second silent section, or split it according to a predetermined ratio. The predetermined ratio may be the ratio of the lengths of the first voiced section and the first silent section within the measurement period.
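The neighbor-based rules of FIG. 3 and FIG. 4 can be sketched as a single function. The function name, argument names, and the choice to measure the burst in packets are assumptions for illustration; `voiced_share` defaults to the half-and-half split and can instead be set to a predetermined ratio.

```python
def split_lost_burst(n_lost, prev_voiced, next_voiced, voiced_share=0.5):
    """Attribute a burst of n_lost lost packets to voiced and silent sections.

    prev_voiced / next_voiced: whether the received packets immediately before
    and after the burst carry voiced data.
    voiced_share: fraction assigned to the voiced section when the neighbors
    disagree (0.5 = half and half; hypothetical parameter).
    Returns (lost_voiced, lost_silent).
    """
    if prev_voiced and next_voiced:
        return float(n_lost), 0.0          # FIG. 3(A): whole burst is voiced
    if not prev_voiced and not next_voiced:
        return 0.0, float(n_lost)          # FIG. 3(B): whole burst is silent
    lost_voiced = n_lost * voiced_share    # FIG. 4: mixed neighbors, split the burst
    return lost_voiced, n_lost - lost_voiced


if __name__ == "__main__":
    print(split_lost_burst(4, True, True))
    print(split_lost_burst(4, True, False))
    print(split_lost_burst(4, False, True, voiced_share=0.75))
```

Returning fractional packet counts keeps the split exact when the burst length is odd.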

The voice presence/absence section determination unit 24 may calculate the length of each section from the time the section occupies, and may calculate that time from the transmission timestamps in the packets.

Alternatively, the voice presence/absence section determination unit 24 may calculate the length of each section from the number of packets the section occupies.
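Both ways of measuring section length can be sketched as below. The function names are hypothetical, and the sketch assumes timestamps are expressed in clock ticks, each packet carries one frame, and neither timestamps nor sequence numbers wrap around within a section. The default clock rate (16 kHz) and frame duration (20 ms) are illustrative assumptions, not values taken from the source.

```python
def section_length_seconds(first_ts, last_ts, clock_rate_hz=16000, frame_seconds=0.020):
    """Time covered by a section, from the timestamps of its first and last packets.

    The last packet's own frame duration is added so that a one-packet
    section has a non-zero length.
    """
    return (last_ts - first_ts) / clock_rate_hz + frame_seconds


def section_length_packets(first_seq, last_seq):
    """Length of a section measured as a packet count."""
    return last_seq - first_seq + 1


if __name__ == "__main__":
    print(section_length_seconds(0, 3200))
    print(section_length_packets(10, 14))
```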

The loss ratio calculation unit 25 receives from the voice presence/absence section determination unit 24 the first determination result, which divides the received packet group into the first voiced section and the first silent section, and the second determination result, which divides the lost packet group into the second voiced section and the second silent section. It calculates the loss ratios from these two results.

Specifically, based on the first and second voiced sections, the loss ratio calculation unit 25 calculates a first loss ratio, which is the proportion of the total voiced section in the measurement period occupied by the second voiced section. Based on the first and second silent sections, it calculates a second loss ratio, which is the proportion of the total silent section in the measurement period occupied by the second silent section. It sends the calculated first and second loss ratios to the voice quality estimation unit 26.

The total voiced section in the measurement period is the sum of the first voiced section and the second voiced section, and the total silent section in the measurement period is the sum of the first silent section and the second silent section.

The loss ratio calculation unit 25 may calculate the first and second loss ratios from the time each section occupies, and may calculate that time from the transmission timestamps in the packets.

Alternatively, the loss ratio calculation unit 25 may calculate the first and second loss ratios from the number of packets each section occupies.
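The two loss ratios follow directly from the definitions above: each is the lost portion divided by the sum of the received and lost portions of the same kind. A minimal sketch, with hypothetical names, in which lengths may be given either in seconds or in packets as long as all four use the same unit:

```python
def loss_ratios(recv_voiced, recv_silent, lost_voiced, lost_silent):
    """Return (first_loss_ratio, second_loss_ratio).

    recv_*: lengths of the first voiced / silent sections (received packets).
    lost_*: lengths of the second voiced / silent sections (lost packets).
    A ratio is reported as 0.0 when the corresponding total is zero; that
    convention is an assumption, not something the source specifies.
    """
    total_voiced = recv_voiced + lost_voiced
    total_silent = recv_silent + lost_silent
    first = lost_voiced / total_voiced if total_voiced > 0 else 0.0
    second = lost_silent / total_silent if total_silent > 0 else 0.0
    return first, second


if __name__ == "__main__":
    # 10 of 100 voiced packets and 5 of 50 silent packets were lost.
    print(loss_ratios(90, 45, 10, 5))
```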

The voice quality estimation unit 26 receives the first and second loss ratios from the loss ratio calculation unit 25. It estimates the voice quality of the voice data based on the received first and second loss ratios and a predetermined mapping function.

Specifically, the voice quality estimation unit 26 estimates the voice quality of the voice data using a predetermined mapping function that associates the first and second loss ratios with voice quality.

Here, the voice quality estimation unit 26 selects, as the mapping function, a function that yields a reasonable voice quality estimate when packet loss is concentrated in silent sections or in voiced sections. For example, when the second loss ratio is far larger than the first loss ratio, a mapping function that outputs a higher voice quality than the conventional estimate is desirable, so that voice quality is not underestimated. When the second loss ratio is far smaller than the first loss ratio, a mapping function that outputs a lower voice quality than the conventional estimate is desirable, so that voice quality is not overestimated.

The estimated voice quality is preferably a quantitative index of sound quality, such as the listening MOS.
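The source does not give a concrete mapping function, so the following is purely an illustrative sketch of a function with the stated property: loss in voiced sections is penalized more heavily than loss in silent sections. The exponential form, the score range, and both weights are hypothetical placeholders; a real mapping function would have to be fitted to measured listening quality data.

```python
import math

# All constants are hypothetical placeholders, not values from the source.
MOS_MIN = 1.0     # lowest score the sketch can return
MOS_MAX = 4.5     # score returned when nothing is lost
W_VOICED = 12.0   # weight of loss in voiced sections
W_SILENT = 1.0    # weight of loss in silent sections (much smaller)


def estimate_mos(voiced_loss_ratio, silent_loss_ratio):
    """Map the first (voiced) and second (silent) loss ratios to a MOS-like score."""
    impairment = W_VOICED * voiced_loss_ratio + W_SILENT * silent_loss_ratio
    return MOS_MIN + (MOS_MAX - MOS_MIN) * math.exp(-impairment)


if __name__ == "__main__":
    print(estimate_mos(0.0, 0.2))  # loss concentrated in silence
    print(estimate_mos(0.2, 0.0))  # same loss ratio concentrated in speech
```

With these placeholder weights, the same loss ratio costs far less when it falls in silence than when it falls in speech, which is the behavior the text asks of the mapping function.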

Next, the operation of the voice quality estimation device configured as described above is explained with reference to the flowchart in FIG. 5.

First, within the measurement period, the transmitting terminal 11 transmits a transmitted packet group containing voiced or silent voice data to the receiving terminal 12 via the packet switching network 13.

The receiving terminal 12 receives a received packet group corresponding to the transmitted packet group sent from the transmitting terminal 11. The received packet group is assumed here to be the transmitted packet group minus the lost packet group.

The acquisition unit 22 acquires the received packet group from the packet switching network 13 (ST110) and sends it to the units 21, 23, and 24.

The storage unit 21 stores the received packet group sent to it (ST120).

The lost packet identification unit 23 identifies the lost packet group based on the stored received packet group (ST130) and sends the identification information for the lost packet group to the voice presence/absence section determination unit 24.

Each packet in the received packet group may carry a sequence number. The lost packet identification unit 23 may identify the sequence numbers of the lost packet group from the sequence numbers of the acquired received packet group, and those sequence numbers may serve as the identification information for the lost packet group.

The voice presence/absence section determination unit 24 reads the received packet group and receives the identification information for the lost packet group.

For the voice data in the received packet group, the voice presence/absence section determination unit 24 determines a first voiced section containing voiced voice data and a first silent section containing silent voice data (ST140). It sends the first determination result, which divides the voice data in the received packet group into the first voiced section and the first silent section, to the loss ratio calculation unit 25.

In addition, for the voice data in the lost packet group, the voice presence/absence section discrimination unit 24 discriminates, based on the voice data in the received packet group, a second voiced section containing voiced voice data and a second silent section containing silent voice data (ST150). It sends this second discrimination result, which divides the voice data in the lost packet group into the second voiced section and the second silent section, to the loss ratio calculation unit 25.

The loss ratio calculation unit 25 receives the first discrimination result and the second discrimination result and calculates the loss ratios (ST160). Specifically, based on the first voiced section and the second voiced section, it calculates a first loss ratio indicating the proportion of the total voiced section within the measurement period that the second voiced section occupies. Likewise, based on the first silent section and the second silent section, it calculates a second loss ratio indicating the proportion of the total silent section within the measurement period that the second silent section occupies.
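The calculation in ST160 can be illustrated with a short Python sketch. It assumes, for illustration, that section lengths are measured in packets and that the total voiced (or silent) section is the sum of the received part (first section) and the lost part (second section). The function and parameter names are hypothetical.

```python
def loss_ratios(received_voiced, received_silent, lost_voiced, lost_silent):
    """Return (first_loss_ratio, second_loss_ratio).

    Assumptions (not from the source): lengths are packet counts, and
    total section = first (received) section + second (lost) section.
    A ratio is reported as 0.0 when its total section is empty.
    """
    total_voiced = received_voiced + lost_voiced
    total_silent = received_silent + lost_silent
    first = lost_voiced / total_voiced if total_voiced else 0.0
    second = lost_silent / total_silent if total_silent else 0.0
    return first, second


# 6 voiced and 3 silent packets received; 2 voiced and 1 silent lost.
print(loss_ratios(6, 3, 2, 1))  # (0.25, 0.25)
```

Keeping the two ratios separate is the point of the scheme: a single overall loss rate would not reveal whether the lost packets carried speech or silence.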

The loss ratio calculation unit 25 sends the calculated first loss ratio and second loss ratio to the voice quality estimation unit 26.

The voice quality estimation unit 26 receives the first loss ratio and the second loss ratio. It estimates the voice quality of the voice data based on the first loss ratio, the second loss ratio, and a predetermined mapping function (ST170).

As described in detail above, in the first embodiment the voice quality estimation device 20 estimates the voice quality of the voice data in a second packet group at a second terminal. The second terminal receives the second packet group, which is a first packet group with a third packet group lost, from a first terminal that transmits the first packet group containing voiced and silent voice data within a measurement period. The device acquires the second packet group and stores it in a memory. Based on the stored second packet group, it identifies the lost third packet group. For the voice data in the second packet group, it discriminates a first voiced section containing voiced voice data and a first silent section containing silent voice data. For the voice data in the identified third packet group, it discriminates, based on the voice data in the second packet group, a second voiced section containing voiced voice data and a second silent section containing silent voice data. Based on the discriminated first and second voiced sections, it calculates a first loss ratio indicating the proportion of the total voiced section within the measurement period that the second voiced section occupies; based on the discriminated first and second silent sections, it calculates a second loss ratio indicating the proportion of the total silent section within the measurement period that the second silent section occupies. It then estimates the voice quality of the voice data based on the calculated first and second loss ratios and a predetermined mapping function. As a result, when the sections in which packets are lost are concentrated in voiced sections or in silent sections, voice quality can be evaluated without being overestimated or underestimated, respectively.

To elaborate, the extent to which packets are lost in the voiced sections and in the silent sections is calculated as the first loss ratio and the second loss ratio, respectively. By using a mapping function that estimates voice quality with each loss ratio as an argument, it is possible to evaluate quantitatively how much of the voiced section is lost at the receiving side during the measurement period. Consequently, voice quality can be estimated without overestimation or underestimation even when packet loss is concentrated in sections where the voice signal level is extremely low or silent, or in sections that contain a normal voice signal.

Accordingly, voice quality can be estimated without inputting a voice signal while taking into account the presence or absence of voice in the sections where packet loss occurred.

In addition, for the voice data in the identified lost packet group, the second voiced section and the second silent section are discriminated according to whether the voice data in the packets before and after the lost packet group is voiced or silent. Thus, for a lost packet group that is never actually received, so that it is unknown whether its voice data is voiced or silent, the voiced or silent state can be estimated from the received packet group.
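One possible way to realise the neighbour-based discrimination is sketched below in Python. The text here states only that the decision depends on whether the packets before and after the lost packet group are voiced or silent; the concrete rule used in the sketch (a run of lost packets is treated as voiced if at least one adjacent received packet is voiced, and as silent otherwise) is an assumption made for illustration, as are all names.

```python
def classify_lost_runs(received, lost_seqs):
    """Label each lost packet as voiced (True) or silent (False).

    received:  dict mapping sequence number -> True (voiced) / False (silent)
    lost_seqs: iterable of lost sequence numbers

    Assumed rule (not from the source): a run of consecutive lost
    packets is voiced if the received packet just before or just after
    the run is voiced; with no voiced neighbour it is silent.
    """
    lost = sorted(lost_seqs)
    labels = {}
    i = 0
    while i < len(lost):
        # Extend j to the end of the current run of consecutive numbers.
        j = i
        while j + 1 < len(lost) and lost[j + 1] == lost[j] + 1:
            j += 1
        before = received.get(lost[i] - 1)
        after = received.get(lost[j] + 1)
        voiced = any(v for v in (before, after) if v is not None)
        for seq in lost[i:j + 1]:
            labels[seq] = voiced
        i = j + 1
    return labels


received = {1: True, 2: True, 5: False, 6: False, 8: False}
print(classify_lost_runs(received, [3, 4, 7]))
# {3: True, 4: True, 7: False}
```

Other tie-breaking rules for a run bounded by one voiced and one silent packet are equally conceivable, for example splitting the run between the two labels.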

Accordingly, voice quality can be estimated from the received packet group alone, without inputting a voice signal, while taking into account the presence or absence of voice in the sections where packet loss occurred.

In addition, each packet in each packet group carries a sequence number. The sequence numbers of the lost packet group are identified based on the sequence number of each packet in the acquired received packet group. The lost packet group can therefore be identified easily from the received packet group alone.

Accordingly, voice quality can be estimated from the packet loss characteristics without inputting a voice signal.

In addition, each packet in each packet group carries a transmission timestamp. The first loss ratio and the second loss ratio are calculated based on the transmission timestamps. The length of each section can therefore be evaluated in units of time.
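A time-based variant of the ratio calculation might look like the following Python sketch. It assumes, purely for illustration, that every packet in the measurement period (including the lost ones) has a known or interpolated transmission timestamp, that a packet covers the interval up to the next packet's timestamp, and that the duration of the final packet is supplied by the caller. All names are hypothetical.

```python
def loss_ratios_by_time(packets, last_duration):
    """Return (first_loss_ratio, second_loss_ratio) measured in time.

    packets: list of (send_timestamp, is_voiced, is_lost) tuples,
             sorted by send_timestamp.
    last_duration: assumed duration of the final packet.

    Assumption (not from the source): each packet spans the interval
    until the next packet's transmission timestamp.
    """
    total = {True: 0.0, False: 0.0}  # keyed by is_voiced
    lost = {True: 0.0, False: 0.0}
    for idx, (ts, is_voiced, is_lost) in enumerate(packets):
        if idx + 1 < len(packets):
            duration = packets[idx + 1][0] - ts
        else:
            duration = last_duration
        total[is_voiced] += duration
        if is_lost:
            lost[is_voiced] += duration
    first = lost[True] / total[True] if total[True] else 0.0
    second = lost[False] / total[False] if total[False] else 0.0
    return first, second


packets = [
    (0, True, False),    # voiced, received
    (20, True, True),    # voiced, lost
    (40, False, False),  # silent, received
    (60, False, False),  # silent, received
]
print(loss_ratios_by_time(packets, last_duration=20))  # (0.5, 0.0)
```

When packets are not sent at a fixed interval, the time-based ratios can differ from ratios computed from packet counts.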

Accordingly, the first loss ratio and the second loss ratio can be calculated more quantitatively.

In addition, the first loss ratio and the second loss ratio are calculated based on the number of packets in each acquired packet group. The length of each section can therefore be evaluated in terms of the number of packets.

[Second Embodiment]
The second embodiment is a modification of the first embodiment and is configured so that voice quality can be estimated more accurately by additionally acquiring the transmitted packet group. The functional configuration of the second embodiment is the same as that shown in FIG. 2. In the following, parts identical to those in FIG. 2 are given the same reference numerals and their detailed description is omitted; the description focuses on the parts that differ.

The storage unit 21 is a memory that each unit can read from and write to, and it stores the packet groups acquired by the acquisition unit 22. Specifically, the storage unit 21 stores a transmitted packet group in addition to the received packet group.

In addition to the received packet group, the acquisition unit 22 acquires from the packet-switched network 13 the transmitted packet group sent by the transmitting terminal 11, and stores it in the storage unit 21. The acquisition unit 22 may send the received packet group to the lost packet identification unit 23 and the transmitted packet group to the voice presence/absence section discrimination unit 24.

The voice presence/absence section discrimination unit 24 reads the transmitted packet group from the storage unit 21 and receives, from the lost packet identification unit 23, identification information that identifies the lost packet group.

In place of the first discrimination means, the voice presence/absence section discrimination unit 24 discriminates, for the voice data in the transmitted packet group, a total voiced section containing voiced voice data and a total silent section containing silent voice data within the measurement period. It sends the third discrimination result, which divides the transmitted packet group into the total voiced section and the total silent section, to the loss ratio calculation unit 25.

Specifically, the voice presence/absence section discrimination unit 24 checks the packet format of each packet in the transmitted packet group and determines whether the voice data in that packet is voiced or silent. Based on the packets thus classified as voiced or silent, it discriminates the total voiced section containing voiced voice data and the total silent section containing silent voice data.

In the following description, "means for discriminating, for the voice data in the transmitted packet group, a total voiced section containing voiced voice data and a total silent section containing silent voice data within the measurement period" may be read as "third discrimination means".

In addition, for the identified lost packet group, the voice presence/absence section discrimination unit 24 discriminates, based on the voice data in the transmitted packet group, a second voiced section containing voiced voice data and a second silent section containing silent voice data. It sends the second discrimination result, which divides the lost packet group into the second voiced section and the second silent section, to the loss ratio calculation unit 25.

Specifically, the voice presence/absence section discrimination unit 24 collates the identification information of the identified lost packet group against the transmitted packet group to determine which packets in the transmitted packet group correspond to the lost packet group. It checks the packet format of each of those packets and determines whether the voice data in the packet is voiced or silent. Based on the packets thus classified as voiced or silent, it discriminates the second voiced section containing voiced voice data and the second silent section containing silent voice data.

The loss ratio calculation unit 25 receives from the voice presence/absence section discrimination unit 24 the third discrimination result, which divides the transmitted packet group into the total voiced section and the total silent section. It also receives from the same unit the second discrimination result, which divides the lost packet group into the second voiced section and the second silent section. The loss ratio calculation unit 25 calculates the loss ratios based on the received third and second discrimination results.

Specifically, using the total voiced section in place of the first voiced section, the loss ratio calculation unit 25 calculates, based on the total voiced section and the second voiced section, the first loss ratio indicating the proportion of the total voiced section within the measurement period that the second voiced section occupies. Likewise, using the total silent section in place of the first silent section, it calculates, based on the total silent section and the second silent section, the second loss ratio indicating the proportion of the total silent section within the measurement period that the second silent section occupies. The loss ratio calculation unit 25 sends the calculated first loss ratio and second loss ratio to the voice quality estimation unit 26.
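The second-embodiment calculation can be illustrated with the following Python sketch. It assumes that the voiced or silent state of every transmitted packet has already been determined (for example from its packet format) and that section lengths are counted in packets. For brevity the sketch finds the lost packets as the transmitted sequence numbers that are absent from the received ones, rather than passing identification information between units. All names are hypothetical.

```python
def second_embodiment_ratios(sent, received_seqs):
    """Return (first_loss_ratio, second_loss_ratio).

    sent:          dict mapping sequence number -> True (voiced) /
                   False (silent) for the transmitted packet group
    received_seqs: iterable of sequence numbers that arrived

    Assumption (not from the source): lengths are packet counts.
    """
    received = set(received_seqs)
    total_voiced = sum(1 for voiced in sent.values() if voiced)
    total_silent = len(sent) - total_voiced
    lost_voiced = sum(
        1 for seq, voiced in sent.items() if voiced and seq not in received
    )
    lost_silent = sum(
        1 for seq, voiced in sent.items() if not voiced and seq not in received
    )
    first = lost_voiced / total_voiced if total_voiced else 0.0
    second = lost_silent / total_silent if total_silent else 0.0
    return first, second


sent = {1: True, 2: True, 3: False, 4: False, 5: True, 6: True}
print(second_embodiment_ratios(sent, [1, 3, 4, 5]))  # (0.5, 0.0)
```

Because the lost packets are looked up directly in the transmitted packet group, no estimate from neighbouring packets is needed in this variant.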

Next, the operation of the voice quality estimation device configured as described above is explained with reference to the flowchart shown in FIG. 6.

The acquisition unit 22 acquires the received packet group and the transmitted packet group from the packet-switched network 13 (ST110'). It sends the acquired received packet group to the units 21 and 23, and sends the acquired transmitted packet group to the units 21 and 24.

The storage unit 21 stores the received packet group and the transmitted packet group sent to it (ST120').

The lost packet identification unit 23 identifies the lost packet group based on the received packet group it has read (ST130). It then sends identification information for the identified lost packet group to the voice presence/absence section discrimination unit 24.

Each packet in the received packet group may carry a sequence number. The lost packet identification unit 23 may then identify the sequence numbers of the lost packet group based on the sequence numbers of the acquired received packet group. The identification information for the identified lost packet group may be these sequence numbers.

The voice presence/absence section discrimination unit 24 receives the transmitted packet group and the identification information for the identified lost packet group.

For the voice data in the transmitted packet group, the voice presence/absence section discrimination unit 24 discriminates a total voiced section containing voiced voice data and a total silent section containing silent voice data (ST140'). It sends this third discrimination result, which divides the voice data in the transmitted packet group into the total voiced section and the total silent section, to the loss ratio calculation unit 25.

In addition, for the voice data in the transmitted packet group, the voice presence/absence section discrimination unit 24 discriminates, based on the identified lost packet group, a second voiced section containing voiced voice data and a second silent section containing silent voice data (ST150'). It sends this second discrimination result, which divides the voice data in the lost packet group into the second voiced section and the second silent section, to the loss ratio calculation unit 25.

The loss ratio calculation unit 25 receives the third discrimination result and the second discrimination result and calculates the loss ratios (ST160'). Specifically, based on the total voiced section and the second voiced section, it calculates the first loss ratio indicating the proportion of the total voiced section within the measurement period that the second voiced section occupies. Likewise, based on the total silent section and the second silent section, it calculates the second loss ratio indicating the proportion of the total silent section within the measurement period that the second silent section occupies.

The loss ratio calculation unit 25 sends the calculated first loss ratio and second loss ratio to the voice quality estimation unit 26.

As described in detail above, in the second embodiment the voice quality estimation device 20 additionally acquires and stores the transmitted packet group. For the voice data in the stored transmitted packet group, it discriminates a total voiced section containing voiced voice data and a total silent section containing silent voice data. For the voice data in the transmitted packet group, it discriminates the second voiced section and the second silent section based on the identified lost packet group. It calculates the first loss ratio based on the discriminated total voiced section and second voiced section, and calculates the second loss ratio based on the discriminated total silent section and second silent section. Voice quality can therefore be estimated after taking into account more accurately whether the lost packet group was voiced or silent.

To elaborate, because the total voiced section and the total silent section are discriminated directly from the voice data in the transmitted packet group, the discrimination result can be used as is in the loss ratio calculation. In addition, the lost packet group identified from the received packet group is extracted from the transmitted packet group, and the second voiced section and the second silent section are discriminated for the extracted lost packet group. Because no estimation is needed to discriminate the second voiced section and the second silent section, the two can be discriminated more accurately.

In other words, in addition to the effects of the first embodiment, the second embodiment makes it possible to estimate voice quality more accurately, without inputting a voice signal, while taking into account the presence or absence of voice in the sections where packet loss occurred.

Like the first embodiment, the second embodiment can be modified as appropriate, for example by identifying the lost packet group based on sequence numbers, calculating the loss ratios based on transmission timestamps, or calculating the loss ratios based on the number of packets.

In short, the present invention is not limited to the embodiments exactly as described above; at the implementation stage, the constituent elements can be modified and embodied without departing from the gist of the invention. Various inventions can also be formed by appropriately combining the constituent elements disclosed in the embodiments. For example, some constituent elements may be deleted from the full set shown in an embodiment. Furthermore, constituent elements from different embodiments may be combined as appropriate.

DESCRIPTION OF SYMBOLS: 1 … voice quality estimation system, 10 … voice call system, 11 … transmitting terminal, 12 … receiving terminal, 13 … packet-switched network, 20 … voice quality estimation device, 21 … storage unit, 22 … acquisition unit, 23 … lost packet identification unit, 24 … voice presence/absence section discrimination unit, 25 … loss ratio calculation unit, 26 … voice quality estimation unit.

Claims

A voice quality estimation device that estimates the voice quality of voice data in a second packet group at a second terminal, the second terminal receiving, from a first terminal that transmits a first packet group containing voiced and silent voice data within a measurement period, the second packet group, which is the first packet group with a third packet group lost, the device comprising:
acquisition means for acquiring the second packet group;
storage means for storing the acquired second packet group in a memory;
lost packet identification means for identifying the lost third packet group based on the stored second packet group;
first discrimination means for discriminating, for the voice data in the second packet group, a first voiced section containing voiced voice data and a first silent section containing silent voice data;
second discrimination means for discriminating, for the voice data in the identified third packet group and based on the voice data in the second packet group, a second voiced section containing voiced voice data and a second silent section containing silent voice data;
loss ratio calculation means for calculating, based on the discriminated first voiced section and second voiced section, a first loss ratio indicating the proportion of the total voiced section within the measurement period that the second voiced section occupies, and for calculating, based on the discriminated first silent section and second silent section, a second loss ratio indicating the proportion of the total silent section within the measurement period that the second silent section occupies; and
voice quality estimation means for estimating the voice quality of the voice data based on the calculated first loss ratio and second loss ratio and a predetermined mapping function.

The voice quality estimation device according to claim 1, wherein the second discrimination means discriminates the second voiced section and the second silent section for the voice data in the identified third packet group according to whether the voice data in the packets before and after the third packet group is voiced or silent.

The voice quality estimation device according to claim 1, wherein:
the acquisition means further acquires the first packet group;
the storage means further stores the acquired first packet group in the memory;
the device comprises, in place of the first discrimination means, third discrimination means for discriminating, for the voice data in the first packet group, the total voiced section containing voiced voice data and the total silent section containing silent voice data;
the second discrimination means discriminates the second voiced section and the second silent section for the voice data in the identified third packet group based on the voice data in the first packet group; and
the loss ratio calculation means calculates the first loss ratio based on the total voiced section in place of the first voiced section, and calculates the second loss ratio based on the total silent section in place of the first silent section.

The voice quality estimation device according to any one of claims 1 to 3, wherein:
each packet in each packet group carries a sequence number; and
the lost packet identification means identifies the sequence numbers of the lost third packet group based on the sequence number of each packet in the acquired second packet group.

The voice quality estimation device according to any one of claims 1 to 4, wherein:
each packet in each packet group carries a transmission timestamp; and
the loss ratio calculation means calculates the first loss ratio and the second loss ratio based on the transmission timestamps.

The voice quality estimation device according to any one of claims 1 to 5, wherein the loss ratio calculation means calculates the first loss ratio and the second loss ratio based on the number of packets in each acquired packet group.

A voice quality estimation method performed in a voice quality estimation device that estimates the voice quality of voice data at a second terminal, the second terminal receiving, from a first terminal that transmits a first packet group containing voiced and silent voice data within a measurement period, a second packet group, which is the first packet group with a third packet group lost, the method comprising:
an acquisition step of acquiring the second packet group;
a storage step of storing the acquired second packet group in a memory;
a lost packet identification step of identifying the lost third packet group based on the stored second packet group;
a first discrimination step of discriminating, for the voice data in the second packet group, a first voiced section containing voiced voice data and a first silent section containing silent voice data;
a second discrimination step of discriminating, for the voice data in the identified third packet group and based on the voice data in the acquired second packet group, a second voiced section containing voiced voice data and a second silent section containing silent voice data;
a loss ratio calculation step of calculating, based on the discriminated first voiced section and second voiced section, a first loss ratio indicating the proportion of the total voiced section within the measurement period that the second voiced section occupies, and of calculating, based on the discriminated first silent section and second silent section, a second loss ratio indicating the proportion of the total silent section within the measurement period that the second silent section occupies; and
a voice quality estimation step of estimating the voice quality of the voice data based on the calculated first loss ratio and second loss ratio and a predetermined mapping function.

A voice quality estimation program used in a voice quality estimation device that estimates the voice quality of voice data at a second terminal, the second terminal receiving, from a first terminal that transmits a first packet group containing voiced and silent voice data within a measurement period, a second packet group, which is the first packet group with a third packet group lost, the program causing the voice quality estimation device to function as:
acquisition means for acquiring the second packet group;
storage means for storing the acquired second packet group in a memory;
lost packet identification means for identifying the lost third packet group based on the stored second packet group;
first discrimination means for discriminating, for the voice data in the second packet group, a first voiced section containing voiced voice data and a first silent section containing silent voice data;
second discrimination means for discriminating, for the voice data in the identified third packet group and based on the voice data in the acquired second packet group, a second voiced section containing voiced voice data and a second silent section containing silent voice data;
loss ratio calculation means for calculating, based on the discriminated first voiced section and second voiced section, a first loss ratio indicating the proportion of the total voiced section within the measurement period that the second voiced section occupies, and for calculating, based on the discriminated first silent section and second silent section, a second loss ratio indicating the proportion of the total silent section within the measurement period that the second silent section occupies; and
voice quality estimation means for estimating the voice quality of the voice data based on the calculated first loss ratio and second loss ratio and a predetermined mapping function.