JP2005184220A

JP2005184220A - Voice communication apparatus and voice quality estimation method

Info

Publication number: JP2005184220A
Application number: JP2003419551A
Authority: JP
Inventors: Satohiko Watabe; 聡彦渡部; Izuru Yagakinai; 出野垣内
Original assignee: Toyota Motor Corp; Toyota InfoTechnology Center Co Ltd
Current assignee: Toyota Motor Corp; Toyota InfoTechnology Center Co Ltd
Priority date: 2003-12-17
Filing date: 2003-12-17
Publication date: 2005-07-07

Abstract

PROBLEM TO BE SOLVED: To estimate voice quality inexpensively, without requiring a large-scale system configuration, when a call is made by transmitting and receiving voice packets to and from a call partner apparatus via a network.
SOLUTION: A voice communication apparatus for making a call by transmitting and receiving voice packets to and from a call partner apparatus via a network includes: detection means for detecting the state of the network between the voice communication apparatus and the call partner apparatus; estimation means for estimating the voice quality on the basis of the network state detected by the detection means; and output means for outputting information denoting the voice quality estimated by the estimation means. The apparatus also includes recording means for recording a transmission timing every time an echo request packet is transmitted and a reception timing every time an echo response packet is received, and the detection means detects the state of the network on the basis of the recorded contents.
COPYRIGHT: (C)2005, JPO&NCIPI

Description

The present invention relates to a technique for estimating sound quality in a voice communication apparatus that conducts a call by transmitting and receiving voice packets to and from a call partner apparatus via a network.

Conventionally, a network evaluation system that evaluates network performance based on an arbitrary evaluation criterion has been proposed (see, for example, Patent Document 1).
Patent Document 1: JP 2002-164891 A

However, this conventional network evaluation system evaluates, based on an arbitrary evaluation criterion, a required value entered from an input device against a measured value obtained by measuring traffic in a measurement device. The system configuration is therefore relatively large, and sound quality cannot be evaluated inexpensively.

An object of the present invention is to provide a technique for estimating sound quality inexpensively, and without requiring a large-scale system configuration, when a call is conducted by transmitting and receiving voice packets to and from a call partner apparatus via a network.

The present invention has been made to solve the above problem. It is a voice communication apparatus for conducting a call by transmitting and receiving voice packets to and from a call partner apparatus via a network, and comprises: detection means for detecting the state of the network between the apparatus and the call partner apparatus; estimation means for estimating sound quality based on the network state detected by the detection means; and output means for outputting information representing the sound quality estimated by the estimation means.

According to the present invention, when a call is conducted by transmitting and receiving voice packets to and from a call partner apparatus via a network, the state of the network is detected and the sound quality is estimated based on the detected network state. The sound quality can therefore be estimated inexpensively and without requiring a large-scale system configuration.

The above voice communication apparatus further comprises, for example: transmission/reception means for transmitting an echo request packet (for example, to the call partner apparatus) and receiving the echo response packet returned from the partner apparatus (for example, the call partner apparatus) that received the echo request packet; and recording means for recording the transmission timing each time an echo request packet is transmitted and the reception timing each time an echo response packet is received. The detection means detects the state of the network based on the content recorded by the recording means.

This is one example of detecting the state of the network, and the detection means of the present invention is not limited to it. Because the network state is detected using echo request packets and echo response packets, and the sound quality is estimated based on the detected network state, the sound quality can be estimated from one side only, inexpensively and without requiring a large-scale system configuration.

In the above voice communication apparatus, for example, the state of the network detected by the detection means is represented by a value corresponding to the round trip time, that is, the time from when a packet transmitted from the own apparatus arrives at the partner apparatus (for example, the call partner apparatus) via the network until the packet returned from the partner apparatus in response arrives at the own apparatus.

This is one example of the network state detected by the detection means, and the network state detected in the present invention is not limited to it. For example, the network state detected by the detection means may be represented by a value corresponding to the packet loss rate of the packets transmitted and received by the transmission/reception means.

The present invention can also be specified as a method invention, as follows.
A sound quality estimation method for estimating sound quality in a voice communication apparatus that conducts a call by transmitting and receiving voice packets to and from a call partner apparatus via a network, the method comprising: a detection step of detecting the state of the network between the apparatus and the call partner apparatus; an estimation step of estimating sound quality based on the network state detected in the detection step; and an output step of outputting information representing the sound quality estimated in the estimation step.

The above sound quality estimation method further comprises, for example: a transmission/reception step of transmitting an echo request packet (for example, to the call partner apparatus) and receiving the echo response packet returned from the partner apparatus (for example, the call partner apparatus) that received the echo request packet; and a recording step of recording the transmission timing each time an echo request packet is transmitted and the reception timing each time an echo response packet is received. The detection step detects the state of the network based on the content recorded by the recording means.

In the above sound quality estimation method, for example, the state of the network detected in the detection step is represented by a value corresponding to the round trip time, that is, the time from when a packet transmitted from the own apparatus arrives at the partner apparatus (for example, the call partner apparatus) via the network until the packet returned from the partner apparatus in response arrives at the own apparatus.

Alternatively, the state of the network detected in the detection step is represented by a value corresponding to the packet loss rate of the packets transmitted and received in the transmission/reception step.

According to the present invention, when a call is conducted by transmitting and receiving voice packets to and from a call partner apparatus via a network, the sound quality can be estimated inexpensively and without requiring a large-scale system configuration.

A communication system including a voice communication apparatus according to an embodiment of the present invention will now be described with reference to the drawings. FIG. 1 is a functional block diagram of the voice communication apparatus according to the embodiment.
(Configuration of voice communication apparatus)
The voice communication apparatus 100 is a communication apparatus that uses VoIP (Voice over IP) to realize a call by transmitting and receiving voice packets to and from a call partner apparatus via an IP network. For example, in a phone-to-phone type IP telephone, in which an existing ordinary telephone is connected to the IP network via a modem or similar device with a VoIP function, that modem or similar device corresponds to the voice communication apparatus 100. In a PC-to-phone type IP telephone, in which the connection to the IP network is made via an information processing apparatus such as an ordinary personal computer, that information processing apparatus corresponds to the voice communication apparatus 100. In an IP telephone with a built-in VoIP function, the telephone itself corresponds to the voice communication apparatus 100.

As shown in FIG. 1, the voice communication apparatus 100 includes an input function 101, an echo packet transmission function 102, an echo packet reception function 103, a transmission/reception status recording function 104, a transmission/reception record evaluation function 105, an evaluation result output function 106, a voice codec information storage function 107, and the like. These functions are realized by predetermined software, circuits, or the like.

The voice communication apparatus 100 also has a communication function (not shown), through which it is connected to the IP network. The voice communication apparatus 100 further includes codecs (usually more than one, not shown), which provide the decoding and encoding functions. A codec here means an apparatus or device that performs the encoding and decoding involved in digitizing voice, which is an analog signal. The voice codec information storage function 107 stores the correspondence between each codec and its codec information, such as bit rate and transmission interval. The other functions are clarified in the description of the operation below.
(Operation of voice communication apparatus)
Next, the operation of the voice communication apparatus 100 configured as described above will be described with reference to the drawings. FIG. 2 is a flowchart for explaining the operation of the voice communication apparatus 100.
(Overview of sound quality estimation processing)
In this embodiment, so-called ping echo request packets are used to determine the state of the network between the apparatus and the call partner apparatus (represented, for example, by delay), and the sound quality is estimated by applying a well-known sound quality estimation method (the EIF method) to this delay and other values.

As shown in FIG. 2, suppose that the call partner apparatus to which echo request packets are to be sent (for example, the voice communication apparatus 100 of the party to be called) has been designated (S100). The user designates it, for example, via the input function 101. Once the call partner apparatus has been designated, the voice communication apparatus 100 starts transmitting echo request packets (also referred to as echo packets) to the designated call partner apparatus under preset conditions (S101). This is performed, for example, by the echo packet transmission function 102.

Because ping/ICMP is used, the partner designated in S100 may simply be a PC (an information processing apparatus such as a personal computer), a router, or the like. A simple evaluation before implementation is therefore also possible.

The preset conditions are the packet length, the transmission interval, and the like. The user sets these conditions, for example, via the input function 101. Alternatively, they can be set as follows. First, the correspondence between each codec and its codec information (packet length, transmission interval, and so on) is held, for example in the form of a table, in the storage device of the voice communication apparatus 100 (the voice codec information storage function 107). Then, once the codec to be used by the voice communication apparatus 100 has been determined, the codec information corresponding to that codec is read from the stored correspondence and set.
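The codec-to-condition table described above can be sketched as a simple lookup. This is a minimal illustration, not the patent's implementation: the names `CodecInfo`, `CODEC_TABLE`, and `probe_conditions` are hypothetical, and the two entries are commonly quoted figures for a 20 ms packetization interval, assumed here purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodecInfo:
    """Probe conditions associated with one codec (hypothetical structure)."""
    bit_rate_bps: int          # codec bit rate in bits per second
    packet_length_bytes: int   # payload bytes carried per packet
    send_interval_s: float     # time between consecutive packets, in seconds


# Illustrative entries only; the patent does not list concrete values.
CODEC_TABLE = {
    "G.711": CodecInfo(bit_rate_bps=64_000, packet_length_bytes=160, send_interval_s=0.020),
    "G.729": CodecInfo(bit_rate_bps=8_000, packet_length_bytes=20, send_interval_s=0.020),
}


def probe_conditions(codec_name: str) -> CodecInfo:
    """Return the preset conditions for the codec that is going to be used."""
    return CODEC_TABLE[codec_name]
```

Keeping the conditions keyed by codec lets the echo probes mimic the packet size and pacing of the voice stream that will actually be sent.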

Alternatively, the user may enter the packet length and specify the bit rate via the input function 101; the transmission interval is then calculated from these values, and the resulting conditions are set.
In this way, conditions such as the packet length, transmission interval, and bit rate can be set in advance.
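The text does not give the formula for deriving the transmission interval. One natural reading, assumed here, is that the interval is the time the codec needs to produce one packet's worth of payload, that is, the packet length in bits divided by the bit rate. The function name `send_interval_seconds` is hypothetical.

```python
def send_interval_seconds(packet_length_bytes: int, bit_rate_bps: int) -> float:
    """Interval between packets so that payload is emitted at the given bit rate.

    Assumption: interval = (packet length in bits) / (bit rate in bits per second).
    """
    if packet_length_bytes <= 0 or bit_rate_bps <= 0:
        raise ValueError("packet length and bit rate must be positive")
    return packet_length_bytes * 8 / bit_rate_bps
```

For example, a 160-byte payload at 64 kbit/s gives 1280 / 64000 = 0.02 s, that is, one packet every 20 ms.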

The voice communication apparatus 100 transmits echo request packets continuously to the call partner apparatus designated in S100 in accordance with the preset conditions, that is, the packet length, transmission interval, and other conditions set as described above (S101). Each transmitted echo request packet carries its own ordinal number. FIG. 3 shows the format of the echo request packet (the echo response packet described later has the same format), which is the format of an ICMP (Internet Control Message Protocol) message. The sequence number is set to a value that increases by one each time an echo request packet is transmitted; in other words, the sequence number indicates which packet in the series this is. The type and code fields hold data identifying the packet as an echo request packet. The optional data field holds the time at which the echo request packet was sent. Because the call partner apparatus that receives the echo request packet returns an echo response packet containing that send time, the voice communication apparatus 100 can compare the send time with the time at which it received the echo response packet and thereby obtain the time from transmission to response (the round trip time).
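The packet layout described above can be sketched as follows. This is a minimal illustration of an ICMP echo message (type 8 for a request, code 0, checksum, identifier, sequence number, followed by data), not the patent's implementation. Encoding the send time as an 8-byte big-endian double, the zero padding, and the names `build_echo_request` and `parse_echo` are assumptions. Actually sending such a packet requires a raw socket and usually elevated privileges, which is outside this sketch.

```python
import struct

ICMP_ECHO_REQUEST = 8  # ICMP type for an echo request; the code field is 0


def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum used by ICMP."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_echo_request(identifier: int, sequence: int, send_time: float,
                       payload_length: int = 32) -> bytes:
    """Build an echo request whose data starts with the send time.

    Assumption: the send time is stored as an 8-byte big-endian double and
    the rest of the payload is zero padding up to payload_length bytes.
    """
    body = struct.pack("!d", send_time)
    body += bytes(max(0, payload_length - len(body)))
    seq = sequence & 0xFFFF
    header = struct.pack("!BBHHH", ICMP_ECHO_REQUEST, 0, 0, identifier, seq)
    checksum = internet_checksum(header + body)
    header = struct.pack("!BBHHH", ICMP_ECHO_REQUEST, 0, checksum, identifier, seq)
    return header + body


def parse_echo(packet: bytes):
    """Return (type, sequence number, embedded send time) from an echo message."""
    msg_type, _code, _checksum, _ident, sequence = struct.unpack("!BBHHH", packet[:8])
    (send_time,) = struct.unpack("!d", packet[8:16])
    return msg_type, sequence, send_time
```

Because the reply echoes the data unchanged, subtracting the embedded send time from the local receive time yields the round trip time without any clock synchronization with the partner.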

When it starts transmitting echo request packets, the voice communication apparatus 100 also starts receiving the echo response packets (also referred to as echo packets) returned from the call partner apparatus that received the echo request packets (S102). This is performed, for example, by the echo packet reception function 103. The voice communication apparatus 100 also records (saves) the transmission time and the reception time for each packet number in a predetermined file or the like (S103). That is, the voice communication apparatus 100 records the transmission timing (for example, the transmission time) each time it transmits an echo request packet, and records the reception timing (for example, the reception time) each time it receives an echo response packet. This recording is performed, for example, by the transmission/reception status recording function 104, which corresponds to the recording means of the present invention.
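The per-packet record kept in S103 can be sketched as a small in-memory log keyed by sequence number. The class name `ProbeLog` and its methods are hypothetical, and the text describes a file rather than in-memory dictionaries. Treating every request without a recorded reply as lost is also an assumption, since the text does not define how loss is counted.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ProbeLog:
    """Send and receive times keyed by packet sequence number (hypothetical)."""
    sent: Dict[int, float] = field(default_factory=dict)
    received: Dict[int, float] = field(default_factory=dict)

    def record_send(self, sequence: int, time_s: float) -> None:
        self.sent[sequence] = time_s

    def record_receive(self, sequence: int, time_s: float) -> None:
        # Ignore replies to unknown requests and duplicate replies.
        if sequence in self.sent and sequence not in self.received:
            self.received[sequence] = time_s

    def round_trip_times(self) -> List[float]:
        """Round trip time of every answered request, in sequence order."""
        return [self.received[s] - self.sent[s] for s in sorted(self.received)]

    def loss_rate_percent(self) -> float:
        """Share of requests with no recorded reply (assumed loss definition)."""
        if not self.sent:
            return 0.0
        return 100.0 * (len(self.sent) - len(self.received)) / len(self.sent)
```

Both network-state values used later, the round trip time and the packet loss rate, can be derived from this one record.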

When a predetermined condition is satisfied, the voice communication apparatus 100 ends the transmission and reception of echo request packets and echo response packets (S104). Examples of the predetermined condition are that an end instruction has been input, that the number of transmissions has reached a preset count, or that a preset time has elapsed. When the transmission and reception of echo request packets and the like have ended, the voice communication apparatus 100 estimates the sound quality from the content recorded in the predetermined file in S103 (the transmission/reception record) and outputs it (S105). This is performed, for example, by the transmission/reception record evaluation function 105.
(Sound quality estimation method)
Next, the sound quality estimation method (S105) will be described. The estimation is performed by the transmission/reception record evaluation function 105, which corresponds to the estimation means of the present invention.

In this embodiment, the sound quality is estimated using the known EIF method (ITU Recommendation G.113). As shown in FIG. 4, the EIF method expresses sound quality as Icpif. The voice communication apparatus 100 either outputs Icpif as it is or converts it into a representation that is easier for the user to understand and outputs that. The sound quality can be estimated in the same way using, for example, the known E-model.

Icpif is calculated by Formula 1 below.
(Formula 1) Icpif = Itot - A
Itot in this formula is calculated by Formula 2 below.
(Formula 2) Itot = Inc + Ilr + Iq + Idte + Idd + Ie
The meaning of each element in these formulas is as follows.

Inc denotes receive-side circuit noise, Ilr the overall loudness rating, Iq the quantization distortion due to A/D and D/A conversion, Idte talker echo, Idd delay, Ie the effects of the codec, packet loss, and the like, Itot the total of all impairments, A the user's expectation factor, and Icpif the expected level of user satisfaction.
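Formulas 1 and 2 translate directly into code. The sketch below uses hypothetical function names; the default of A = 0 is an assumption that simply leaves the expectation factor out of the result.

```python
def total_impairment(inc: float, ilr: float, iq: float,
                     idte: float, idd: float, ie: float) -> float:
    """Formula 2: Itot = Inc + Ilr + Iq + Idte + Idd + Ie."""
    return inc + ilr + iq + idte + idd + ie


def icpif(itot: float, a: float = 0.0) -> float:
    """Formula 1: Icpif = Itot - A (A defaults to 0, an assumption)."""
    return itot - a
```

For instance, with Inc = Ilr = Iq = Idte = 0, Idd = 3, and Ie = 10, Itot is 13 and, with A = 0, Icpif is also 13.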

Next, specific examples of the values of the elements in these formulas are described. For Inc, Ilr, Iq, and Idte, an optimum value or a typical average value is specified. For example, these values are stored in advance in the storage device of the voice communication apparatus 100 and read out as needed. For Idd, the average delay time is calculated from the content recorded in the predetermined file in S103, half of that average delay time is taken as the estimated one-way delay, and the Idd corresponding to the estimated one-way delay is read from the conversion table shown in FIG. 5. This processing corresponds to the detection means of the present invention. This Idd is the value corresponding to the round trip time in the present invention, and it represents the state of the network. The conversion table gives the correspondence between half of the average delay time (Ta) and Idd, and is stored, for example, in the storage device of the voice communication apparatus 100.
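The Idd derivation described above (average round trip time, halved, then a table lookup) can be sketched as follows. The contents of FIG. 5 are not reproduced in this text, so the rows of `IDD_TABLE` are placeholders for illustration only. The lookup rule (use the first row whose delay bound is not below the estimate, and clamp to the last row) and the function name `idd_from_rtts` are likewise assumptions.

```python
from bisect import bisect_left
from statistics import mean
from typing import Sequence

# Placeholder rows: (one-way delay upper bound in ms, Idd). Not taken from FIG. 5.
IDD_TABLE = [(150, 0), (200, 3), (250, 10), (300, 15),
             (400, 25), (500, 30), (600, 35), (800, 40)]


def idd_from_rtts(rtts_ms: Sequence[float]) -> int:
    """Estimate Idd from measured round trip times given in milliseconds."""
    one_way_ms = mean(rtts_ms) / 2          # half the average RTT
    bounds = [bound for bound, _ in IDD_TABLE]
    index = min(bisect_left(bounds, one_way_ms), len(IDD_TABLE) - 1)
    return IDD_TABLE[index][1]
```

Halving the round trip time assumes the forward and return paths contribute equally to the delay, which is what makes a measurement from one side sufficient.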

For Ie, the Ie corresponding to the packet loss (calculated, for example, in the voice communication apparatus 100 from the echo request packets and echo response packets) is read from the conversion table shown in FIG. 6. This processing corresponds to the detection means of the present invention. This Ie is the value corresponding to the packet loss rate in the present invention, and it represents the state of the network. The conversion table gives the correspondence between packet loss (%) and Ie, and is stored, for example, in the storage device of the voice communication apparatus 100. A conversion table is provided for each codec held by the voice communication apparatus 100, and the table corresponding to the codec to be used is applied. The psychological expectation factor A is not calculated.
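The per-codec Ie lookup can be sketched in the same style. The codec names `codec_a` and `codec_b` and all table values are placeholders, since FIG. 6 is not reproduced in this text. Choosing the last row whose loss threshold does not exceed the measured loss is an assumed lookup rule.

```python
from typing import Dict, List, Tuple

# Placeholder tables: codec -> [(packet loss threshold in %, Ie)]. Not from FIG. 6.
IE_TABLES: Dict[str, List[Tuple[float, int]]] = {
    "codec_a": [(0.0, 0), (1.0, 5), (2.0, 10), (5.0, 20)],
    "codec_b": [(0.0, 10), (1.0, 15), (2.0, 20), (5.0, 30)],
}


def ie_from_loss(codec: str, loss_percent: float) -> int:
    """Return Ie for the given codec and measured packet loss percentage."""
    table = IE_TABLES[codec]
    value = table[0][1]
    for threshold, ie in table:
        if loss_percent >= threshold:
            value = ie   # keep the last row whose threshold has been reached
    return value
```

Keeping one table per codec reflects the point made above that the same loss rate degrades different codecs by different amounts.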

By applying each element to Formulas 1 and 2 as described above, the sound quality (Icpif) can be estimated simply.
(Output of sound quality)
The voice communication apparatus 100 either outputs the Icpif representing the sound quality estimated as above as it is, or converts it into a representation that is easier for the user to understand and outputs that. This is performed by the evaluation result output function 106, which corresponds to the output means of the present invention.

As an example of the latter form of output, the correspondence between upper limit values of Icpif and the communication quality of a voice conversation is stored in the storage device of the voice communication apparatus 100 (for example, in the form of a table), as shown in FIG. 7. The communication quality corresponding to the Icpif (the matching or closest upper limit value) is then read from the table and output.
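The conversion from Icpif to a user-facing quality label can be sketched as another table lookup. The rows below are illustrative placeholders, because FIG. 7 is not reproduced in this text. Selecting the first row whose upper limit is not below the Icpif value is one possible reading of "matching or closest", and the name `quality_label` is hypothetical.

```python
from bisect import bisect_left

# Placeholder rows: (Icpif upper limit, quality label). Not taken from FIG. 7.
QUALITY_TABLE = [
    (5, "Very good"),
    (10, "Good"),
    (20, "Adequate"),
    (30, "Limiting case"),
    (45, "Exceptional limiting case"),
    (55, "Customers likely to react strongly"),
]


def quality_label(icpif_value: float) -> str:
    """Map an Icpif value to the label of the first row whose limit covers it."""
    limits = [limit for limit, _ in QUALITY_TABLE]
    index = min(bisect_left(limits, icpif_value), len(QUALITY_TABLE) - 1)
    return QUALITY_TABLE[index][1]
```

Under these placeholder rows, an Icpif of 13 falls in the row with upper limit 20 and would be reported as "Adequate".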

As for the form in which the estimated sound quality is output, when the voice communication apparatus 100 has an image display device such as a liquid crystal display, for example, the information representing the estimated sound quality can be displayed on that image display device. When the voice communication apparatus 100 has an audio output device such as a speaker, the information representing the estimated sound quality can be output from that audio output device.

As described above, according to the voice communication apparatus 100 of this embodiment, when a call is conducted by transmitting and receiving voice packets to and from a call partner apparatus via a network, the sound quality can be estimated inexpensively and without requiring a large-scale system configuration.

The above embodiment is merely an example in every respect, and the present invention is not to be construed as limited to it. The present invention can be implemented in various other forms without departing from its spirit or main features.

According to the present invention, when a call is conducted by transmitting and receiving voice packets to and from a call partner apparatus via a network, the sound quality can be estimated from one side only, inexpensively and without requiring a large-scale system configuration.

FIG. 1 is a functional block diagram of a voice communication apparatus according to an embodiment of the present invention.
FIG. 2 is a flowchart for explaining the operation of the voice communication apparatus according to the embodiment of the present invention.
FIG. 3 is an example of the format of an echo request packet (the echo response packet has the same format).
FIG. 4 is a diagram for explaining the sound quality estimation method (EIF method).
FIG. 5 is an example of a conversion table for converting one-way delay into Idd.
FIG. 6 is an example of a conversion table for converting packet loss into Ie.
FIG. 7 is an example of a conversion table for converting Icpif (its upper limit value) into the communication quality of a voice conversation.

Explanation of symbols

100 Voice communication apparatus
101 Input function
102 Echo packet transmission function
103 Echo packet reception function
104 Transmission/reception status recording function
105 Transmission/reception record evaluation function
106 Evaluation result output function
107 Voice codec information storage function

Claims

1. A voice communication apparatus for conducting a call by transmitting and receiving voice packets to and from a call partner apparatus via a network, comprising:
detection means for detecting the state of the network between the voice communication apparatus and the call partner apparatus;
estimation means for estimating sound quality based on the state of the network detected by the detection means; and
output means for outputting information representing the sound quality estimated by the estimation means.

2. The voice communication apparatus according to claim 1, further comprising:
transmission/reception means for transmitting an echo request packet and receiving an echo response packet returned from the partner apparatus that received the echo request packet; and
recording means for recording the transmission timing each time the echo request packet is transmitted and recording the reception timing each time the echo response packet is received,
wherein the detection means detects the state of the network based on the content recorded by the recording means.

3. The voice communication apparatus according to claim 1 or 2, wherein the state of the network detected by the detection means is represented by a value corresponding to the round trip time from when a packet transmitted from the own apparatus arrives at the partner apparatus via the network until a packet returned from the partner apparatus in response to that packet arrives at the own apparatus.

4. The voice communication apparatus according to claim 3, wherein the state of the network detected by the detection means is represented by a value corresponding to the packet loss rate of the packets transmitted and received by the transmission/reception means.

5. A sound quality estimation method for estimating sound quality in a voice communication apparatus that conducts a call by transmitting and receiving voice packets to and from a call partner apparatus via a network, comprising:
a detection step of detecting the state of the network between the voice communication apparatus and the call partner apparatus;
an estimation step of estimating sound quality based on the state of the network detected in the detection step; and
an output step of outputting information representing the sound quality estimated in the estimation step.

6. The sound quality estimation method according to claim 5, further comprising:
a transmission/reception step of transmitting an echo request packet and receiving an echo response packet returned from the partner apparatus that received the echo request packet; and
a recording step of recording the transmission timing each time the echo request packet is transmitted and recording the reception timing each time the echo response packet is received,
wherein the detection step detects the state of the network based on the content recorded by the recording means.

7. The sound quality estimation method according to claim 5 or 6, wherein the state of the network detected in the detection step is represented by a value corresponding to the round trip time from when a packet transmitted from the own apparatus arrives at the partner apparatus via the network until a packet returned from the partner apparatus in response to that packet arrives at the own apparatus.

8. The sound quality estimation method according to claim 7, wherein the state of the network detected in the detection step is represented by a value corresponding to the packet loss rate of the packets transmitted and received in the transmission/reception step.