JP2017204840A - Voice quality estimation device, voice quality estimation method and program

Info

Publication number: JP2017204840A
Application number: JP2016097399A
Authority: JP
Inventors: Takafumi Okuyama; Kazuhisa Yamagishi
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2016-05-13
Filing date: 2016-05-13
Publication date: 2017-11-16
Anticipated expiration: 2036-05-13
Also published as: JP6586044B2

Abstract

PROBLEM TO BE SOLVED: To provide a voice quality estimation technique that uses a more accurate voiced loss rate when estimating listening quality based on packet capture.

SOLUTION: A voice quality estimation device has a voiced loss rate estimation unit that estimates a voiced loss rate from a voice packet flow, and a listening quality estimation unit that estimates listening quality from the estimated voiced loss rate by using a predetermined mapping function. The voiced loss rate estimation unit estimates the number of lost voiced packets according to which combination of voiced packets and silent packets the packets before and after a lost packet in the voice packet flow form, and estimates the voiced loss rate from the estimated number of lost voiced packets.

SELECTED DRAWING: Figure 3

Description

The present invention relates to a voice quality estimation device, method, and program for a voice call service provided over a network.

For a service provider, understanding service quality is important for maintaining and improving a voice call service. One indicator for evaluating service quality is conversational quality, which represents the quality a user of the service experiences when holding a conversation. In operation, evaluating conversational quality by subjective assessment is costly and time-consuming, so conversational quality is estimated from listening quality, which is its main factor. Here, listening quality represents the quality a user perceives when listening to speech.

The listening MOS (Mean Opinion Score) is an evaluation index of listening quality. The listening MOS is determined by having evaluators actually listen to speech samples, but there are also algorithmic methods that estimate it without evaluators. For example, a method based on POLQA (Perceptual Objective Listening Quality Analysis) is known for estimating the listening MOS of wideband voice services such as VoLTE (Voice over Long Term Evolution) (see Non-Patent Document 1). POLQA is an objective evaluation method that evaluates listening quality by comparing a reference speech signal input on the talker side with a recorded speech signal output on the listener side and calculating a POLQA score. The POLQA score can be converted into an estimated listening MOS (MOS-LQO: Mean Opinion Score - Listening Quality Objective) by applying the mapping function specified in the Implementer's Guide (P. Imp 863) for Recommendation P.863 of the ITU-T (Telecommunication Standardization Sector of the International Telecommunication Union).

POLQA requires speech signals as input. However, terminals are not always equipped with a recording function, and it can be difficult to obtain speech signals for the purpose of assessing the quality of a voice call service. POLQA therefore cannot be applied when speech signals cannot be handled directly and quality must be estimated by capturing packets in the network.

In response, Non-Patent Documents 2 and 3 propose methods for highly accurate quality estimation using packet capture. These methods exploit the characteristics of codecs that have a voice activity detection mechanism (such as AMR-WB): they determine voiced and silent intervals from the packet format and perform estimation that takes into account the proportion of lost packets in voiced intervals. Hereinafter, a packet in a voiced interval is called a voiced packet, and this proportion is called the voiced loss rate.

However, the method of Non-Patent Document 2 is a full-reference model that compares packet information after a loss has occurred in the network or elsewhere with packet information from before the loss, so it cannot be applied when network constraints restrict measurement to a single point.

In contrast, the method of Non-Patent Document 3 is a no-reference model that estimates from receiver-side packet information alone, and performs estimation that takes the voiced loss rate into account based on packet capture at a single point. This method determines, from the header information of the packets before and after the lost packets, the number of lost packets and whether they are voiced packets. In particular, when the preceding and following packets are one voiced packet and one silent packet, all lost packets are regarded as voiced packets. However, at points where a voiced interval and a silent interval switch, the lost portion is in many cases silent or has little effect on the listening MOS, so the listening MOS may be underestimated even though it is actually high.

Non-Patent Document 1: ITU-T P.863, Perceptual Objective Listening Quality Assessment, 09/2014.
Non-Patent Document 2: Okuyama T., Kurashima A. and Masuda M., "A Study on Estimation Method of Listening Quality in VoLTE," IEICE General Conference, B-11-21, p. 438, March 2015.
Non-Patent Document 3: Yang F., Jiang L. and Li X. (2012). Real-time quality assessment for voice over IP. Concurrency and Computation: Practice and Experience, 24(11), 1192-1199.

As described above, in a system in which terminals converse by voice communication over a network, estimation that takes the voiced loss rate into account based on packet capture at a single point has a problem: if packet loss occurs where a voiced interval and a silent interval switch, the listening MOS may be underestimated even though it is actually high.

To solve this problem, an object of the present invention is to provide a voice quality estimation technique that uses a more accurate voiced loss rate when estimating listening quality based on packet capture.

To achieve this object, one aspect of the present invention relates to a voice quality estimation device having a voiced loss rate estimation unit that estimates a voiced loss rate from a voice packet flow, and a listening quality estimation unit that estimates listening quality from the estimated voiced loss rate by using a predetermined mapping function, wherein the voiced loss rate estimation unit estimates the number of lost voiced packets according to which combination of voiced packets and silent packets the packets before and after a lost packet in the voice packet flow form, and estimates the voiced loss rate from the estimated number of lost voiced packets.

Another aspect of the present invention relates to a voice quality estimation method having a step of acquiring a voice packet flow, a step of estimating a voiced loss rate from the voice packet flow, and a step of estimating listening quality from the estimated voiced loss rate by using a predetermined mapping function, wherein the step of estimating the voiced loss rate estimates the number of lost voiced packets according to which combination of voiced packets and silent packets the packets before and after a lost packet in the voice packet flow form, and estimates the voiced loss rate from the estimated number of lost voiced packets.

Yet another aspect of the present invention relates to a program for causing a processor to function as each unit of the voice quality estimation device described above.

According to the present invention, it is possible to provide a voice quality estimation technique that uses a more accurate voiced loss rate when estimating listening quality based on packet capture.

FIG. 1 is a schematic diagram showing the configuration of a voice communication service over a network according to one embodiment of the present invention.
FIG. 2 is a block diagram showing the configuration of a communication system according to one embodiment of the present invention.
FIG. 3 is a block diagram showing the functional configuration of a voice quality estimation device according to one embodiment of the present invention.
FIG. 4 is a schematic diagram showing a procedure for estimating the number of voiced packets according to one embodiment of the present invention.
FIG. 5 is a flowchart showing voice quality estimation processing according to one embodiment of the present invention.

Embodiments of the present invention are described below with reference to the drawings.

The voice quality estimation device of the present invention measures packets containing the speech exchanged between terminals, either at the receiving terminal or within the network, in a voice call system such as VoLTE composed of two voice call terminals connected via an IP network or a mobile network.

First, an overview of a voice communication service according to one embodiment of the present invention is described with reference to FIG. 1. FIG. 1 is a schematic diagram showing the configuration of a voice communication service over a network according to one embodiment of the present invention.

As shown in FIG. 1, in the voice communication service according to this embodiment, wireless terminals perform voice communication via a carrier network (NW). More specifically, each wireless terminal wirelessly accesses a nearby base station in the carrier NW, and the base stations exchange voice packet flows via relay nodes. In this embodiment, the voice packet flow uses RTP (Realtime Transport Protocol), which carries a sequence number and a transmission timestamp, and, as illustrated, the voice packet flow consists of silent packets and voiced packets. In the RTP flow, silent packets are relatively small and are spaced at relatively long intervals, whereas voiced packets are relatively large and are spaced at relatively short intervals. However, the voice communication service according to the present invention is not limited to RTP, and any other protocol that carries a sequence number and a transmission timestamp may be used. For example, a codec (such as AMR-WB) that has a voice activity detection mechanism and whose packet format and packet transmission interval differ between voiced and silent intervals may be used.
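As an illustration of the two header fields this embodiment relies on, the following is a minimal sketch that reads the sequence number and timestamp from the 12-byte fixed RTP header. The names `RtpHeader` and `parse_rtp_header` are hypothetical, and the sketch assumes no header extension and no padding, so the reported payload size is only approximate when those are present.

```python
import struct
from typing import NamedTuple


class RtpHeader(NamedTuple):
    sequence_number: int  # 16-bit counter, wraps around at 65536
    timestamp: int        # 32-bit value in codec clock ticks
    payload_size: int     # bytes after the fixed header and CSRC list


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Extract the fields used for loss detection from one RTP packet."""
    if len(packet) < 12:
        raise ValueError("RTP packet shorter than the 12-byte fixed header")
    # Network byte order: flags, marker/payload type, sequence, timestamp, SSRC.
    first, _marker_pt, seq, ts, _ssrc = struct.unpack("!BBHII", packet[:12])
    if first >> 6 != 2:
        raise ValueError("not an RTP version 2 packet")
    csrc_count = first & 0x0F
    header_len = 12 + 4 * csrc_count  # assumption: no extension header
    return RtpHeader(seq, ts, len(packet) - header_len)


# Usage with a hand-built packet (illustrative values only).
example = struct.pack("!BBHII", 0x80, 96, 120, 160, 0x1234) + b"\x00" * 33
header = parse_rtp_header(example)
```

The payload size is returned alongside the two counters because the description later distinguishes voiced from silent packets by packet size.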

Next, a communication system according to one embodiment of the present invention is described with reference to FIG. 2. FIG. 2 is a block diagram showing the configuration of a communication system according to one embodiment of the present invention.

As shown in FIG. 2, the communication system 10 includes terminals 20, a network 30, a packet capture device 40, and a voice quality estimation device 100.

The terminal 20 is a terminal with a packet-based voice communication function, such as a VoLTE-capable terminal. Users of the terminals 20 can converse by voice communication via the network 30.

The network 30 is a packet-switched network such as a carrier network or the Internet. A voice packet flow transmitted from a terminal 20 is routed through relay nodes in the network 30 and delivered to the destination terminal 20.

The packet capture device 40 acquires the voice packet flow transmitted from the sending terminal 20. In the illustrated embodiment, the packet capture device 40 acquires the voice packet flow between the receiving terminal 20 and the network 30. However, the present invention is not limited to this, and the packet capture device 40 may acquire the voice packet flow at any point between the terminals 20. The packet capture device 40 sends the acquired voice packet flow to the voice quality estimation device 100. The packet capture device 40 may be implemented as a standalone device as illustrated, or may be built into the voice quality estimation device 100.

The voice quality estimation device 100 performs the processing described below on the voice packet flow received from the packet capture device 40 and outputs an estimate of listening quality, which is one index of voice quality.

The voice quality estimation device 100 may typically be implemented by a server and is composed of, for example, a drive device, an auxiliary storage device, a memory device, a processor, an interface device, and a communication circuit interconnected via a bus. The computer programs, including the programs that implement the functions and processing of the voice quality estimation device 100 described below, may be provided on a recording medium such as a CD-ROM (Compact Disk-Read Only Memory), a DVD (Digital Versatile Disk), or flash memory. When a recording medium storing a program is set in the drive device, the program is installed from the recording medium into the auxiliary storage device via the drive device. However, the program does not have to be installed from a recording medium and may instead be downloaded from an external device via a network or the like. The auxiliary storage device stores the installed program together with necessary files and data. When instructed to start the program, the memory device reads the program and data from the auxiliary storage device and stores them. The processor executes the functions and processing of the voice quality estimation device 100 described below in accordance with the program stored in the memory device and various data such as the parameters needed to run the program. The interface device is used as a communication interface for connecting to a network or an external device. The communication circuit performs the communication processing needed to communicate with a network such as the Internet. This hardware configuration is only an example, however; the voice quality estimation device 100 is not limited to it and may be implemented by any other suitable hardware configuration.

Next, a voice quality estimation device according to one embodiment of the present invention is described with reference to FIG. 3. FIG. 3 is a block diagram showing a voice quality estimation device according to one embodiment of the present invention.

As shown in FIG. 3, the voice quality estimation device 100 includes a voiced loss rate estimation unit 110 and a listening quality estimation unit 120.

The voiced loss rate estimation unit 110 estimates the voiced loss rate from the voice packet flow. Here, the voiced loss rate is the ratio of the number of lost voiced packets to the number of voiced packets in the voice packet flow. That is, the voiced loss rate L can be calculated as

$$L = \frac{p_{\mathrm{lost}}}{p_{\mathrm{lost}} + p_{\mathrm{rcv}}}$$

where p_lost is the number of lost voiced packets and p_rcv is the number of received voiced packets.
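The definition above can be sketched as a small function. This is a minimal illustration that assumes the number of voiced packets in the flow equals the lost plus the received voiced packets, which follows from the definition in the text; the function name and the zero-packet convention are assumptions.

```python
def voiced_loss_rate(p_lost: int, p_rcv: int) -> float:
    """Ratio of lost voiced packets to all voiced packets in the flow.

    Assumption: total voiced packets = p_lost + p_rcv.
    """
    total = p_lost + p_rcv
    if total == 0:
        # No voiced packets observed or inferred; treat the rate as zero
        # (a convention chosen for this sketch, not taken from the source).
        return 0.0
    return p_lost / total
```

Returning a fraction rather than a percentage is a design choice for this sketch; a mapping function fitted on percentages would need the value scaled accordingly.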

To estimate the number of lost voiced packets, the voiced loss rate estimation unit 110 first detects packet loss portions in the voice packet flow and then estimates the number of voiced packets contained in each detected loss portion. A packet loss portion may be detected, for example, from the sequence numbers in the packet headers: when the voiced loss rate estimation unit 110 detects a gap in the sequence numbers of the received packets, it may treat the packets corresponding to the missing sequence numbers as lost. Alternatively, the voiced loss rate estimation unit 110 may identify packet loss portions from the progression of the transmission timestamps in the voice packet flow. For example, if two successively received packets differ in transmission timestamp by more than the larger of the voiced packet interval and the silent packet interval, the voiced loss rate estimation unit 110 may determine that there is a packet loss portion between those two packets.
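The sequence-number check can be sketched as follows. This is a minimal illustration, assuming packets are supplied in arrival order without reordering or duplicates, and using the fact that RTP sequence numbers are 16 bits wide and wrap around; the function name and return shape are hypothetical.

```python
from typing import Iterable, List, Tuple

SEQ_MODULO = 1 << 16  # RTP sequence numbers are 16-bit and wrap around


def find_loss_gaps(seqs: Iterable[int]) -> List[Tuple[int, int, int]]:
    """Return (index_before, index_after, lost_count) for every gap.

    Assumption: no reordering or duplication in the captured flow; a
    reordered packet would be misread as a very large gap.
    """
    gaps = []
    prev = None
    for i, seq in enumerate(seqs):
        if prev is not None:
            # Modular difference handles the wrap from 65535 back to 0.
            lost = (seq - prev - 1) % SEQ_MODULO
            if lost > 0:
                gaps.append((i - 1, i, lost))
        prev = seq
    return gaps


# Usage: sequence numbers 121 and 122 are missing.
gaps = find_loss_gaps([118, 119, 120, 123, 124])
```

The indices of the packets on either side of each gap are returned because the later steps classify a loss by looking at exactly those two neighbors.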

Having detected a packet loss portion in this way, the voiced loss rate estimation unit 110 estimates the number of lost voiced packets according to which combination of voiced packets and silent packets the packets before and after the loss portion form. The voiced loss rate estimation unit 110 may decide whether each packet is a voiced packet or a silent packet based on its size. That is, the packet sizes of voiced and silent packets are defined according to the protocol type in use, so the voiced loss rate estimation unit 110 can tell whether a packet is voiced or silent by examining its packet size.
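A size-based classifier might look like the sketch below. The threshold of 10 bytes is purely illustrative and is not taken from the source or from any codec specification; in practice it would be set from the packet sizes defined for the protocol and codec mode in use.

```python
def is_voiced(payload_size: int, silent_max_bytes: int = 10) -> bool:
    """Classify a packet as voiced when its payload exceeds a size threshold.

    silent_max_bytes is a hypothetical default for this sketch; choose it
    from the silent-packet size of the codec actually deployed.
    """
    return payload_size > silent_max_bytes
```

A single threshold is enough here because the description states that silent packets are relatively small and voiced packets relatively large.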

Specifically, in Case 1, where the packets before and after the lost packets are both voiced packets, the voiced loss rate estimation unit 110 may estimate that the lost packets are voiced packets. That is, as shown in FIG. 4, when the packets before and after the loss are voiced packets, the voiced loss rate estimation unit 110 may estimate that all of the lost packets are voiced packets. The voiced loss rate estimation unit 110 may then calculate the number of lost packets from the sequence numbers of the packets before and after the loss and take that number as the number of lost voiced packets.

Next, in Case 2, where the packets before and after the lost packets are both silent packets, the voiced loss rate estimation unit 110 may estimate that the lost packets are silent packets. That is, as shown in FIG. 4, when the packets before and after the loss are silent packets, the voiced loss rate estimation unit 110 estimates that all of the lost packets are silent packets. In this case, the voiced loss rate estimation unit 110 sets the number of lost voiced packets to 0.

Next, in Case 3, where the packet before the lost packets is a voiced packet and the packet after them is a silent packet, the voiced loss rate estimation unit 110 may estimate that the lost packets are silent packets. That is, as shown in FIG. 4, when the packet before the loss is a voiced packet and the packet after it is a silent packet, the voiced loss rate estimation unit 110 may estimate that all of the lost packets are silent packets. In this case, the voiced loss rate estimation unit 110 sets the number of lost voiced packets to 0.
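The three cases described so far reduce to a small decision function. The sketch below is an illustration with hypothetical names; it returns `None` for the remaining silent-then-voiced combination, which these three cases do not cover.

```python
from typing import Optional


def lost_voiced_cases_1_to_3(
    prev_voiced: bool, next_voiced: bool, lost_count: int
) -> Optional[int]:
    """Number of lost voiced packets for Cases 1 to 3.

    prev_voiced / next_voiced describe the packets on either side of the
    loss; lost_count is the number of missing sequence numbers.
    """
    if prev_voiced and next_voiced:
        return lost_count  # Case 1: every lost packet counted as voiced
    if not next_voiced:
        return 0           # Cases 2 and 3: every lost packet counted as silent
    return None            # silent then voiced: needs separate handling
```

Cases 2 and 3 collapse into one branch because both are decided by the following packet being silent.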

Next, in Case 4, where the packet before the lost packets is a silent packet and the packet after them is a voiced packet, the voiced loss rate estimation unit 110 may estimate the number of lost voiced packets based on sequence numbers and transmission timestamps. Specifically, the voiced loss rate estimation unit 110 calculates the number of lost packets from the sequence numbers of the packets before and after the loss, and also calculates the difference between the transmission timestamps of those two packets. In the specific example shown in FIG. 4, the sequence numbers of the packets before and after the loss are "120" and "123", so two packets were lost. The transmission timestamps are "0:00:00.0100" and "0:00:00.0400", so the difference between the transmission timestamps is found to be 300 ms. The voiced loss rate estimation unit 110 may then determine, as the number of lost voiced packets, an integer x that satisfies a relation among N, T, t_0, and t_1 (the expression is not reproduced in this text).

Here, N is the number of lost packets, T is the difference between the transmission timestamps, t_0 is the silent packet transmission interval, t_1 is the voiced packet transmission interval, and x (≤ N) is the number of voiced packets. The silent packet transmission interval and the voiced packet transmission interval are defined according to the protocol type in use. If more than one value of x satisfies the relation, the voiced loss rate estimation unit 110 may select any one of them.
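Because the relation between x, N, T, t_0, and t_1 is not available in this text, the following is only a sketch under an assumed timing model, not the patent's expression. The assumption is that the gap spans N + 1 sending intervals, that the interval following the preceding silent packet and each lost silent packet is t_0, and that the interval following each lost voiced packet is t_1. The function name is hypothetical, and it returns every x that fits the model best.

```python
from typing import List


def case4_candidates(
    n_lost: int, t_diff_ms: int, t_silent_ms: int, t_voiced_ms: int
) -> List[int]:
    """Candidate counts x of lost voiced packets under an ASSUMED model.

    Assumed model (not from the source):
        T ~ (N - x + 1) * t_0 + x * t_1
    All x in 0..N with the smallest absolute error are returned.
    """
    errors = []
    for x in range(n_lost + 1):
        model_ms = (n_lost - x + 1) * t_silent_ms + x * t_voiced_ms
        errors.append(abs(t_diff_ms - model_ms))
    best = min(errors)
    return [x for x, err in enumerate(errors) if err == best]


# Usage with illustrative intervals: 160 ms between silent packets,
# 20 ms between voiced packets, two packets lost.
candidates = case4_candidates(2, 340, 160, 20)
```

When several candidates tie, any one of them can be chosen, matching the statement that any x may be selected when more than one value qualifies.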

Once the number of lost voiced packets has been determined in this way, the voiced loss rate estimation unit 110 can determine the voiced loss rate L according to the equation given above.

The listening quality estimation unit 120 estimates listening quality from the estimated voiced loss rate by using a predetermined mapping function. The mapping function is any suitable relational expression that describes the correspondence between the voiced loss rate and the listening quality estimate. For example, it may be an expression relating a given voiced loss rate to the listening quality estimate MOS_LQO measured with POLQA in prior experiments in which voiced loss rates were applied on a variety of terminals.

In more detail, the mapping function may be obtained through prior experiments: in a quality-degraded environment (for example, a test environment with an emulator installed), packets are captured in the network or at the terminal and listening quality is measured with POLQA using the target terminal, and the function is determined by curve fitting a functional form to the resulting pairs of voiced loss rate and listening quality estimate MOS_LQO (POLQA), preferably over several variations. For example, the mapping function may be an expression in the voiced loss rate with parameters p1 and p2 (the expression is not reproduced in this text).

Here, MOS_LQO is the listening quality estimate, and p1 and p2 are values that give a function shape with good curve-fitting accuracy, each determined experimentally for the reference terminal and codec.

However, the mapping function according to the present invention is not limited to this. Any function that fits the experimentally obtained correspondence between the voiced loss rate and the listening quality estimate accurately may be used; for example, it may be a function that is linear in its parameters, such as a quadratic function, or it may be a non-linear function.
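The curve-fitting step can be illustrated with a two-parameter form. The form MOS_LQO = 1 + p1 · exp(−p2 · L) used below is an assumption chosen for this sketch, not the expression from the patent, and the data in the usage example are synthetic values generated from that same form rather than measurements. The fit linearizes the model and uses ordinary least squares from the standard library only.

```python
import math
from typing import Sequence, Tuple


def mos_from_loss(loss_rate: float, p1: float, p2: float) -> float:
    """Hypothetical mapping: MOS_LQO = 1 + p1 * exp(-p2 * L)."""
    return 1.0 + p1 * math.exp(-p2 * loss_rate)


def fit_exponential_mapping(
    loss_rates: Sequence[float], mos_values: Sequence[float]
) -> Tuple[float, float]:
    """Fit p1, p2 by least squares on ln(MOS - 1) = ln(p1) - p2 * L.

    Requires every MOS value to be greater than 1 and at least two
    distinct loss rates.
    """
    xs = list(loss_rates)
    ys = [math.log(m - 1.0) for m in mos_values]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


# Usage with synthetic points generated from the assumed form.
rates = [0.0, 0.05, 0.1, 0.2]
scores = [mos_from_loss(r, 3.5, 8.0) for r in rates]
p1, p2 = fit_exponential_mapping(rates, scores)
```

With real POLQA measurements, a general non-linear fitter would weight the errors in MOS units rather than in log units; the log-linear fit is used here only to keep the sketch dependency-free.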

Next, voice quality estimation processing according to one embodiment of the present invention is described with reference to FIG. 5. FIG. 5 is a flowchart showing voice quality estimation processing according to one embodiment of the present invention.

As shown in FIG. 5, in step S101, the voiced loss rate estimation unit 110 acquires a voice packet flow. Specifically, the voiced loss rate estimation unit 110 acquires the voice packet flow from the packet capture device 40.

In step S102, the voiced loss rate estimation unit 110 estimates the voiced loss rate from the voice packet flow. Specifically, the voiced loss rate estimation unit 110 estimates the number of lost voiced packets according to which combination of voiced packets and silent packets the packets before and after a lost packet in the voice packet flow are, and estimates the voiced loss rate from the estimated number of lost voiced packets. As described above, the voiced loss rate estimation unit 110 may proceed as follows: case 1) when the packets before and after the lost packet are both voiced packets, the lost packet is estimated to be a voiced packet; case 2) when the packets before and after the lost packet are both silent packets, the lost packet is estimated to be a silent packet; case 3) when the packet before the lost packet is a voiced packet and the packet after it is a silent packet, the lost packet is estimated to be a silent packet; and case 4) when the packet before the lost packet is a silent packet and the packet after it is a voiced packet, the number of lost voiced packets is estimated based on the sequence number and the transmission timestamp for the lost packet.

Here, in case 4, the voiced loss rate estimation unit 110 may determine, as the number of lost voiced packets, an integer x that satisfies the following expression:

[equation not reproduced in the source text]

where N is the number of lost packets, T is the difference between the transmission timestamps, t_0 is the silent packet transmission interval, t_1 is the voiced packet transmission interval, and x (≤ N) is the number of lost voiced packets.
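The four cases of step S102 can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the `Gap` record, the function names, and the millisecond units are hypothetical, and because the case 4 expression is not available in this text, the sketch substitutes an assumed timing model (N − x silent intervals of t_0, x voiced intervals of t_1, plus one speech-onset delay of at most t_0) and picks the smallest x consistent with it.

```python
from dataclasses import dataclass

VOICED = "voiced"
SILENT = "silent"


@dataclass
class Gap:
    """One run of consecutive lost packets (all names here are hypothetical)."""
    before: str    # type of the last packet received before the gap
    after: str     # type of the first packet received after the gap
    n_lost: int    # N, derived from the jump in sequence numbers
    t_diff: float  # T, timestamp difference across the gap (ms)


def case4_lost_voiced(n_lost: int, t_diff: float, t0: float, t1: float) -> int:
    """Smallest integer x in [0, N] consistent with the ASSUMED timing model.

    Assumption (not the patent's expression): the gap holds N - x silent
    intervals of t0, x voiced intervals of t1, and one speech-onset delay d
    with 0 < d <= t0, so (N - x)*t0 + x*t1 < T <= (N - x + 1)*t0 + x*t1.
    """
    for x in range(n_lost + 1):
        low = (n_lost - x) * t0 + x * t1
        if low < t_diff <= low + t0:
            return x
    # T is outside the modelled range: clamp to all-voiced or all-silent.
    return n_lost if t_diff <= n_lost * t1 else 0


def lost_voiced_in_gap(gap: Gap, t0: float, t1: float) -> int:
    """Number of lost packets in the gap that are estimated to be voiced."""
    if gap.before == VOICED and gap.after == VOICED:
        return gap.n_lost                      # case 1: all voiced
    if gap.after == SILENT:
        return 0                               # cases 2 and 3: all silent
    return case4_lost_voiced(gap.n_lost, gap.t_diff, t0, t1)  # case 4
```

For example, with the hypothetical intervals t_0 = 160 ms and t_1 = 20 ms, a silent-to-voiced gap of three lost packets spanning 250 ms is attributed two lost voiced packets under this assumed model.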

In step S103, the listening quality estimation unit 120 estimates the listening quality from the estimated voiced loss rate using a predetermined mapping function. The mapping function may be any relational expression that gives the correspondence between the voiced loss rate and the estimated listening quality, and may be defined through prior experiments or the like.
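As a rough sketch of step S103, the mapping can be treated as a pluggable function applied to the voiced loss rate. Everything below is an assumption made for illustration: the loss rate is taken to be the ratio of lost voiced packets to all voiced packets sent, and `example_mapping` is a made-up decreasing curve with arbitrary constants, not the mapping function defined in this document.

```python
import math
from typing import Callable


def voiced_loss_rate(received_voiced: int, lost_voiced: int) -> float:
    """Assumed definition: lost voiced packets / all voiced packets sent."""
    total = received_voiced + lost_voiced
    return lost_voiced / total if total else 0.0


def estimate_listening_quality(
    loss_rate: float, mapping: Callable[[float], float]
) -> float:
    """Apply a predetermined mapping function to the voiced loss rate."""
    return mapping(loss_rate)


def example_mapping(loss_rate: float, scale: float = 3.5, decay: float = 10.0) -> float:
    """Purely illustrative curve (hypothetical form and constants).

    It only demonstrates the shape one would expect: quality falls
    monotonically as the voiced loss rate rises.
    """
    return 1.0 + scale * math.exp(-decay * loss_rate)


rate = voiced_loss_rate(received_voiced=95, lost_voiced=5)
quality = estimate_listening_quality(rate, example_mapping)
```

In practice the mapping would be fitted to subjective or objective listening quality data from prior experiments, as the text describes, and passed in place of `example_mapping`.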

Each unit of the voice quality estimation device 100 and steps S101 to S103 described above may be implemented by a processor executing a program stored in the memory of a computer.

Although embodiments of the present invention have been described in detail above, the present invention is not limited to the specific embodiments described above, and various modifications and changes are possible within the scope of the gist of the present invention as set forth in the claims.

Description of symbols
10 Communication system
20 Terminal
30 Network
40 Packet capture device
100 Voice quality estimation device
110 Voiced loss rate estimation unit
120 Listening quality estimation unit

Claims

A voice quality estimation device comprising:
a voiced loss rate estimation unit that estimates a voiced loss rate from a voice packet flow; and
a listening quality estimation unit that estimates listening quality from the estimated voiced loss rate using a predetermined mapping function,
wherein the voiced loss rate estimation unit estimates the number of lost voiced packets according to which combination of voiced packets and silent packets the packets before and after a lost packet in the voice packet flow are, and estimates the voiced loss rate from the estimated number of lost voiced packets.

The voice quality estimation device according to claim 1, wherein the voiced loss rate estimation unit:
i) estimates the lost packet to be a voiced packet when the packets before and after the lost packet are voiced packets;
ii) estimates the lost packet to be a silent packet when the packets before and after the lost packet are silent packets;
iii) estimates the lost packet to be a silent packet when the packet before the lost packet is a voiced packet and the packet after it is a silent packet; and
iv) estimates the number of lost voiced packets based on the sequence number and the transmission timestamp for the lost packet when the packet before the lost packet is a silent packet and the packet after it is a voiced packet.

The voice quality estimation device according to claim 2, wherein, in iv) when the packet before the lost packet is a silent packet and the packet after it is a voiced packet, the voiced loss rate estimation unit estimates the number of lost voiced packets by obtaining an integer x that satisfies the following expression:

[equation not reproduced in the source text]

(where N is the number of lost packets, T is the difference between the transmission timestamps of the preceding and succeeding packets, t_0 is the silent packet transmission interval, t_1 is the voiced packet transmission interval, and x (≤ N) is the number of lost voiced packets).

The voice quality estimation device according to any one of claims 1 to 3, wherein the voiced loss rate estimation unit determines whether each packet is a voiced packet or a silent packet based on its packet size.

The voice quality estimation device according to any one of claims 1 to 4, wherein the predetermined mapping function is:

[equation not reproduced in the source text]

(where MOS_LQO is the estimated listening quality, and p1 and p2 are predetermined constants).

A voice quality estimation method comprising the steps of:
acquiring a voice packet flow;
estimating a voiced loss rate from the voice packet flow; and
estimating listening quality from the estimated voiced loss rate using a predetermined mapping function,
wherein the step of estimating the voiced loss rate estimates the number of lost voiced packets according to which combination of voiced packets and silent packets the packets before and after a lost packet in the voice packet flow are, and estimates the voiced loss rate from the estimated number of lost voiced packets.

A program for causing a processor to function as each unit of the voice quality estimation device according to any one of claims 1 to 5.