WO2016208168A1

WO2016208168A1 - Call quality evaluation method, call quality evaluation device and call quality evaluation program

Info

Publication number: WO2016208168A1
Application number: PCT/JP2016/002926
Authority: WO
Inventors: 浩一二瓶
Original assignee: 日本電気株式会社
Priority date: 2015-06-24
Filing date: 2016-06-17
Publication date: 2016-12-29

Abstract

The present invention addresses the problem of obtaining the subjective quality of experience of a voice call with higher accuracy by using, for each of call quality degradation due to delay and call quality degradation due to sound quality degradation, an estimation method with few drawbacks. To solve this problem, a first degradation value representing call quality degradation due to delay caused in a voice signal by transmission/reception processing via a network is derived, and a second degradation value representing call quality degradation due to sound quality degradation caused in the voice signal by the transmission/reception processing is derived. An element of an ordered set whose order relation corresponds to call quality is then output, the element being associated with the combination of the first and second degradation values by a surjection.

Description

Call quality evaluation method, call quality evaluation apparatus, and call quality evaluation program

The present invention relates to a method for evaluating call quality of a voice call.

There are a subjective quality evaluation method and an objective quality evaluation method as methods for evaluating call quality.

The subjective quality evaluation method has subjects listen to speech that simulates an actual call and rate it subjectively. This method requires many evaluators and dedicated evaluation facilities, takes time and cost, and suffers from variation in the environment and in the evaluators' ratings.

Therefore, an objective quality evaluation method has been developed to estimate the subjective evaluation value from physical features. As objective quality evaluation methods, methods disclosed in Non-Patent Documents 1 to 3 and Patent Document 1 are known.

The method disclosed in Non-Patent Document 1 estimates subjective quality according to a predetermined additive rule, using a database of degradation amounts for factors such as codec type, delay, and packet loss rate.

The method disclosed in Non-Patent Document 2 is a method for estimating subjective quality by comparing an original voice and an evaluation target voice.

The method disclosed in Non-Patent Document 3 is a method for estimating subjective quality only from received speech.

The method disclosed in Patent Document 1 is a quality evaluation method for the case where the assumption made in the method of Non-Patent Document 1, namely that the effects of delay and sound quality degradation can simply be added, does not hold. In this method, a MOS value calculated by a technique such as PESQ described in Non-Patent Document 2 is converted into Ie,eff, the sound quality degradation component of the R value, which is a call quality evaluation value. Here, PESQ stands for Perceptual Evaluation of Speech Quality and MOS stands for Mean Opinion Score; the same applies hereinafter. The R value is then calculated by adding the delay degradation amount and the interaction amount, and is converted to a MOS value using a relational expression between the R value and the MOS value. Here, “R value” refers to the transmission rating factor defined in Non-Patent Document 1; the same applies hereinafter.

On the other hand, the speech evaluation apparatus disclosed in Patent Document 2 calculates, at a constant period, a speech quality evaluation value over a first evaluation period, and also calculates, at a constant period, a speech quality evaluation value over a second evaluation period longer than the first evaluation period. The apparatus then selects whichever is higher of the evaluation value calculated by the first calculation unit and the evaluation value calculated by the second calculation unit, and outputs the selected evaluation value.

Patent Documents: JP 2004-222257 A; JP 2009-033683 A

However, the method disclosed in Non-Patent Document 1 can estimate only the average subjective quality, and does not take into account the sound quality degradation caused by the occurrence of packet loss. For this reason, even if there is no problem in estimating the degree of call quality degradation due to delay, the degree of call quality degradation due to sound quality degradation cannot be estimated with high accuracy.

Further, the methods disclosed in Non-Patent Document 2 and Non-Patent Document 3 are incomplete as a call quality evaluation because delay is not considered.

Further, in the method disclosed in Patent Document 1, the calculated MOS value is converted into Ie,eff, the sound quality degradation component of the R value, and is converted back to a MOS value after being added to other factors. Each conversion introduces an error, and because two conversions are involved, the error is large and the estimation accuracy decreases. Furthermore, when converting the MOS value to Ie,eff, the average R value of the ITU-T G.711 codec, 87.8, is used as a reference. Here, “ITU-T” stands for International Telecommunication Union Telecommunication Standardization Sector. The R value is known to vary depending on the input sound source, so using an average R value adds a further error.

An object of the present invention is to provide a call quality evaluation method and the like that can obtain the subjective quality of experience of a voice call with higher accuracy, by using methods with few drawbacks to estimate the degree of call quality degradation due to delay and the degree of call quality degradation due to sound quality degradation.

The call quality evaluation method of the present invention includes a step of deriving a first degradation value representing call quality degradation due to delay caused in a voice signal by transmission/reception processing via a network. The method further includes a step of deriving a second degradation value representing call quality degradation due to sound quality degradation caused in the voice signal by the transmission/reception processing. The method further includes a step of outputting an element of an ordered set having an order relation corresponding to call quality, the element being associated with the combination of the first and second degradation values by a surjection.

The call quality evaluation method and the like of the present invention use methods with few drawbacks to estimate the value representing call quality degradation due to delay and the value representing call quality degradation due to sound quality degradation, so that the subjective quality of experience of a voice call can be obtained with higher accuracy.

FIG. 1 is a conceptual diagram showing the processing flow of the call quality evaluation method of this embodiment.
FIG. 2 is a conceptual diagram showing the configuration of the call system to be processed by the call quality evaluation method of this embodiment.
FIG. 3 is a conceptual diagram showing an example of the processing flow of the method for deriving the value representing call quality degradation due to delay.
FIG. 4 is a conceptual diagram showing an example of the processing flow of the method for deriving the value representing call quality degradation due to sound quality degradation.
FIG. 5 is a conceptual diagram showing an example of the processing flow of the method for deriving the value representing call quality degradation due to sound quality degradation.
FIG. 6 is a conceptual diagram showing an example of the processing flow of the method for deriving the value representing call quality degradation due to sound quality degradation.
FIG. 7 is a conceptual diagram showing an example of the processing flow for adding together the values representing call quality degradation.
FIG. 8 is a conceptual diagram showing a configuration example of the call quality evaluation apparatus of this embodiment.
FIG. 9 is a conceptual diagram showing a configuration example of the buffering unit.
FIG. 10 is a conceptual diagram showing an example of the connection between the buffering unit and the buffer delay measurement unit.
FIG. 11 is a conceptual diagram showing another configuration example of the call quality evaluation apparatus of this embodiment.
FIG. 12 is a conceptual diagram showing the processing flow of the minimum call quality evaluation method of the present invention.

<Call quality evaluation method>
First, an embodiment relating to the call quality evaluation method of the present invention will be described.
[1. Overall view of processing flow]
FIG. 1 is a conceptual diagram showing a processing flow of the call quality evaluation method of this embodiment.

First, a value representing call quality degradation due to delay is derived in a call system described later including a transmitter, a network (hereinafter, referred to as “NW”), and a receiver. A specific example of a method for deriving a value representing call quality degradation due to delay will be described later (S001).

Next, a value representing the call quality deterioration due to the sound quality deterioration is derived. A specific example of a method for deriving a value representing speech quality degradation due to sound quality degradation will be described later (S002).

Then, the value representing the speech quality degradation due to the delay and the value representing the speech quality degradation due to the sound quality degradation are added together. A specific example of the summing method will be described later (S003).
[2. Configuration of processing call system]
The process shown in FIG. 1 evaluates call quality in a call system such as that shown in FIG. 2.

The call system 100 includes a transmitter 001, an NW002, and a receiver 003.

The audio signal Vin is input to the transmitter 001. The audio signal Vin is generated, for example, by a microphone connected to the transmitter 001 converting sound into a signal. In the transmitter 001, the audio signal Vin may be compressed by a compression means such as ITU-T G.711, and may be divided into packets. The processing in the transmitter 001 causes sound quality degradation and delay in the audio signal Vin, which becomes the audio signal Vin′. The audio signal Vin′ is input to the NW002.

When the audio signal Vin′ passes through the NW002, delay and sound quality degradation such as data loss occur, and the audio signal Vin′ becomes the audio signal Vin″. The audio signal Vin″ is input to the receiver 003.

The audio signal Vin″ is received by the receiver 003 and undergoes predetermined processing there. This processing includes, for example, decompression processing in the case where the audio signal was compressed by the transmitter 001, and buffering processing for absorbing variation in the delay that has occurred. This processing in the receiver 003 causes further sound quality degradation and delay, and the result is output as the output signal Vout. The output signal Vout is typically output to a speaker connected to the receiver 003.
[3. Estimating values representing call quality degradation due to delay]
FIG. 3 is a conceptual diagram showing an example of a more detailed processing flow of the method for deriving a value representing deterioration in call quality due to delay, expressed in S001 of FIG.

First, a delay time D1 generated in the transmitter 001 is obtained (S101).

The delay time D1 can be obtained, for example, as the difference between the time at which a signal is input to the transmitter 001 and the time at which that signal is output from the transmitter 001.

Next, a delay time D2 generated in NW002 is obtained (S102).

The delay time D2 can be obtained, for example, as the difference between the time at which a signal is input to the NW002 and the time at which that signal is output from the NW002. For instance, the receiver 003 sends a signal through the NW002 to the transmission/reception unit of the transmitter 001, and the transmission/reception unit of the transmitter 001 immediately returns the signal to the receiver 003. The difference between the time at which the signal was sent and the time at which the returned signal was received is the round-trip delay through the NW002, and the one-way delay time D2 through the NW002 may be taken as half of the round-trip delay. Alternatively, the one-way delay time D2 in the NW002 can be determined by having the transmitter 001 send the receiver 003 a signal that carries the time at which the transmitter 001 transmitted it.

When the call system 100 is an IP telephone system using VoIP technology, the delay time D2 can be obtained by RTCP-based communication between the receiver and the transmitter. RTCP is defined in Section 6 of Non-Patent Document 4. Here, “VoIP” stands for Voice over Internet Protocol, “IP” for Internet Protocol, and “RTCP” for Real-time Transport Control Protocol; the same applies hereinafter. RTCP-based communication is performed from the receiver 003 to the transmitter 001 via the NW002, or from the transmitter 001 to the receiver 003 via the NW002. As is well known, RTCP-based communication yields the round-trip time (RTT), so half of the RTT can be taken as an estimate of the one-way delay incurred in passing through the NW.
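The RTT-based estimate of D2 can be sketched as follows. This is a minimal illustration that assumes Non-Patent Document 4 corresponds to RFC 3550, whose RTCP report blocks carry the LSR (last SR timestamp) and DLSR (delay since last SR) fields in units of 1/65536 second; the function names are hypothetical and not taken from the source.

```python
def rtcp_round_trip_seconds(arrival_ntp32: int, lsr: int, dlsr: int) -> float:
    """Round-trip time from RTCP report-block fields (assumed RFC 3550 layout).

    arrival_ntp32: middle 32 bits of the NTP time at which the report arrived.
    lsr:           "last SR" timestamp echoed back by the peer.
    dlsr:          delay the peer held the SR before reporting.
    All three are in units of 1/65536 second.
    """
    rtt_units = (arrival_ntp32 - lsr - dlsr) & 0xFFFFFFFF  # handle 32-bit wraparound
    return rtt_units / 65536.0


def one_way_delay_d2(rtt_seconds: float) -> float:
    """Estimate the one-way NW delay D2 as half of the round-trip time."""
    return rtt_seconds / 2.0


# Values from the worked example in RFC 3550, Section 6.4.1.
rtt = rtcp_round_trip_seconds(0xB7108000, 0xB7052000, 0x00054000)
d2 = one_way_delay_d2(rtt)
```

Halving the RTT assumes the forward and return paths are symmetric, which is the same simplification the text makes.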

Next, the delay time D3 in the receiver 003 is obtained (S103).

When the call system 100 is an IP telephone system using VoIP technology, the delay time D3 can be obtained as the sum of the delay in the buffering unit and a fixed value. Since the delay in the buffering unit varies, an accurate delay time D3 can be obtained by updating it as necessary.

In an IP telephone system using VoIP technology, the transmitter 001 divides voice information into packets and sends it packet by packet to the receiver 003 through the NW002. The receiver 003 receives the voice information sent in packet units and converts it into a continuous audio signal. The receiver 003 receives the voice information at discontinuous, per-packet timings. For this reason, if the receiver 003 simply joined the received audio in real time as it arrived, the result would contain many silent gaps (sound breaks) caused by late-arriving packets and would be hard to listen to.

The buffering unit is a component used in such call technology: it adjusts the delay time given to the received packet voice information, thereby suppressing the occurrence of silent gaps and making the joined voice information easier to hear. Because the buffering unit works by giving a delay time to the received packet voice information, it always causes a delay.
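The trade-off the buffering unit makes can be illustrated with a small simulation. This is only a sketch under stated assumptions: a fixed playout delay, 20 ms frames, and made-up arrival times; the source does not specify how the buffering unit chooses its delay, and the function name is hypothetical.

```python
def count_late_packets(arrival_ms, frame_ms=20.0, playout_delay_ms=0.0):
    """Count packets that arrive after their scheduled playout time.

    Packet i (0-based) is scheduled for playout at
    arrival_ms[0] + playout_delay_ms + i * frame_ms.
    A packet arriving later than that leaves a silent gap.
    """
    start = arrival_ms[0] + playout_delay_ms
    late = 0
    for i, arrived in enumerate(arrival_ms):
        deadline = start + i * frame_ms
        if arrived > deadline:
            late += 1
    return late


# Hypothetical arrival times (ms) of six packets sent 20 ms apart.
arrivals = [0.0, 21.0, 55.0, 62.0, 80.0, 130.0]

gaps_without_buffer = count_late_packets(arrivals, playout_delay_ms=0.0)
gaps_with_buffer = count_late_packets(arrivals, playout_delay_ms=30.0)
```

With these illustrative numbers, adding 30 ms of buffering delay removes every gap, at the cost of 30 ms added to D3.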

The fixed value is the signal delay generated in the receiver 003 other than the delay due to the buffering unit. The fixed value can be obtained by inputting a short signal such as a pulse signal to the receiver 003 and measuring the difference between the time at which that signal is input and the time at which the corresponding output signal is output from the receiver 003. This is because, when a short signal such as a pulse signal is input, the delay in the buffering unit can be regarded as approximately zero.

Next, the delay time D1 + D2 + D3 of the entire call system 100 is obtained (S104).
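Steps S101 to S104 amount to simple bookkeeping, which might be sketched as below. The class and field names are hypothetical, and the numeric values are placeholders rather than measurements from the source.

```python
from dataclasses import dataclass


@dataclass
class DelayComponents:
    """Delay contributions of the call system, all in milliseconds."""
    d1_transmitter_ms: float   # delay in the transmitter (S101)
    d2_network_ms: float       # one-way delay in the NW (S102)
    buffer_ms: float           # variable delay in the buffering unit
    receiver_fixed_ms: float   # fixed receiver delay other than buffering

    @property
    def d3_receiver_ms(self) -> float:
        # D3 is the buffering delay plus the fixed value (S103).
        return self.buffer_ms + self.receiver_fixed_ms

    @property
    def total_ms(self) -> float:
        # Delay of the entire call system, D1 + D2 + D3 (S104).
        return self.d1_transmitter_ms + self.d2_network_ms + self.d3_receiver_ms


# Placeholder values for illustration only.
delays = DelayComponents(d1_transmitter_ms=20.0, d2_network_ms=45.0,
                         buffer_ms=60.0, receiver_fixed_ms=15.0)
total = delays.total_ms
```

Keeping the buffering delay as a separate field reflects that it is the one component expected to be updated during a call.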

Then, from the delay time D1 + D2 + D3 of the entire call system 100, a value representing call quality deterioration due to the delay of the entire call system 100 is calculated (S105).

A value representing call quality deterioration due to delay can be obtained by the procedure described in Non-Patent Document 1, using delay time D1 + D2 + D3, for example.

First, Is is calculated according to the procedure described in Section 7.3 “Simultaneous impairment factor, Is” of Non-Patent Document 1. Here, Is is a factor defined in Non-Patent Document 1.

Next, Id is calculated according to the procedure described in Section 7.4 “Delay impairment factor, Id” of the same document. Here, Id is a factor defined in Non-Patent Document 1.

A value Is + Id obtained by adding Is and Id is a value representing a decrease amount of the R value, that is, a speech quality deterioration due to a delay.
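As a partial illustration of how a delay maps to a decrease in the R value, the sketch below computes only the absolute-delay component of Id. It assumes that Non-Patent Document 1 corresponds to the ITU-T G.107 E-model and uses the classic form of its Idd formula; the echo-related components of Id and the whole of Is are deliberately omitted, so this is not the full Is + Id of the text. The function name is hypothetical.

```python
import math


def absolute_delay_impairment(ta_ms: float) -> float:
    """Idd component of the delay impairment factor (assumed G.107 form).

    ta_ms is the one-way mouth-to-ear delay in milliseconds,
    i.e. D1 + D2 + D3 in the notation of the text.
    """
    if ta_ms <= 100.0:
        return 0.0  # no impairment attributed to delays up to 100 ms
    x = math.log10(ta_ms / 100.0) / math.log10(2.0)
    return 25.0 * ((1.0 + x ** 6) ** (1.0 / 6.0)
                   - 3.0 * (1.0 + (x / 3.0) ** 6) ** (1.0 / 6.0)
                   + 2.0)


idd_at_140_ms = absolute_delay_impairment(140.0)
idd_at_400_ms = absolute_delay_impairment(400.0)
```

The impairment is zero up to 100 ms and grows steadily beyond that, which is why the variable buffering delay matters for the final score.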
[4. Derivation of a value that represents speech quality degradation due to sound quality degradation]
FIG. 4 is a conceptual diagram illustrating an example of a processing flow of a method for deriving a value representing speech quality degradation due to sound quality degradation.

First, a voice signal sample used to obtain a value representing call quality deterioration due to sound quality deterioration is acquired (S201).

Then, using the voice signal sample created in S201, a value representing call quality degradation due to sound quality degradation is calculated (S202).

When the call system 100 is an IP telephone system using VoIP technology, the value representing call quality degradation due to sound quality degradation can be calculated by the procedure disclosed in Non-Patent Document 3.

FIG. 5 is a conceptual diagram showing an example of a processing flow of a method for deriving a value representing call quality deterioration due to sound quality deterioration to which the procedure disclosed in Non-Patent Document 3 is applied.

First, an audio signal sample that conforms to the conditions disclosed in Section 6 “Requirements on speech signals to be assessed” of Non-Patent Document 3 is created (S301). When the receiver 003 is a smartphone using Android as an OS, an audio signal input to AudioTrack, which is an API for reproducing an audio signal, is acquired and used as an audio signal sample.

Then, using the audio signal sample created in S301, the MOS value is calculated according to the procedure disclosed in Section 7 and thereafter of the same document (S302).

This MOS value is a value that represents call quality deterioration due to sound quality deterioration.

When the call system 100 is an IP telephone system using VoIP technology, the value representing call quality degradation due to sound quality degradation can also be calculated by the procedure disclosed in Non-Patent Document 2.

FIG. 6 is a conceptual diagram showing an example of a processing flow of a method for deriving a value representing call quality deterioration due to sound quality deterioration to which the procedure disclosed in Non-Patent Document 2 is applied.

First, the audio signal Vin input to the transmitter 001 is recorded as an audio file (S401). Here, the audio signal Vin is a signal immediately after the sound is converted into a signal by the microphone.

The transmitter or the like converts the audio signal Vin into an audio file and sends it to the processing unit as an audio file (S402). Here, “the transmitter or the like” may be the transmitter or an operator. The processing unit may be included in the transmitter 001 or in the receiver 003, or may be separate from both. If the transmitter 001 does not include the processing unit, the file of the audio signal Vin is sent to the processing unit via the Internet, for example. The audio signal Vin is also sent to the receiver 003 through the NW002, as described in the explanation of FIG. 2.

The receiver 003 records the output signal Vout immediately before being output (S403).

The receiver 003 outputs the output signal Vout (S404).

The transmitter or the like converts the recorded output signal Vout into an audio file and sends it to the processing unit in the audio file state (S405). In this case, the transmitter or the like may be a transmitter or an operator.

For example, when the processing unit is connected to the receiver 003 via the Internet, the file of the output signal Vout is sent to the processing unit via the Internet.

Next, the processing unit calculates a MOS value by the method disclosed in Non-Patent Document 2, using the audio signal Vin reproduced from the file in which the audio signal Vin is recorded and the output signal Vout reproduced from the file in which the output signal Vout is recorded (S406).

This MOS value is a value representing speech quality degradation due to sound quality degradation.
[5. The sum of the value representing the speech quality degradation due to delay and the value representing the speech quality degradation due to sound quality degradation]
Since the calculation method differs between the value representing the speech quality degradation due to delay and the value representing the speech quality degradation due to the sound quality degradation obtained by the above-described method, they are usually not values that can be evaluated on the same scale. Therefore, if these values are added together as they are, the combined value of these values is not a value that can be grasped by a certain scale, and thus the call quality is not estimated with high accuracy.

For this reason, it is preferable to convert at least one of the two values representing call quality degradation before adding them, so that the two values, which are calculated by different methods, can be evaluated on the same scale.

FIG. 7 is a conceptual diagram showing an example of a processing flow for adding values representing call quality deterioration.

First, at least one of the two values is converted so that the value representing call quality degradation due to delay and the value representing call quality degradation due to sound quality degradation can be regarded as evaluated on the same scale (S501).

Here, the relationship between the R value and the MOS value is disclosed in Annex B of Non-Patent Document 1. The R value is a value representing the call quality obtained by the method disclosed in Non-Patent Document 1. The MOS value is a value representing the call quality obtained by the method disclosed in Non-Patent Document 2 or Non-Patent Document 3. Accordingly, when the value representing call quality degradation due to delay is obtained by the procedure of Non-Patent Document 1 and the value representing call quality degradation due to sound quality degradation is obtained by the procedure of Non-Patent Document 3, the above conversion can be performed using this relationship. In this case, the conversion may express the decrease in the MOS value as a decrease in the R value, or express the decrease in the R value as a decrease in the MOS value. Furthermore, as long as the relationship between the R value and the MOS value is satisfied, both the decrease in the MOS value and the decrease in the R value may be converted to a third quantity that is neither a MOS value nor an R value.
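The conversion in S501 can be sketched as follows, assuming that Annex B of Non-Patent Document 1 is the R-to-MOS mapping of ITU-T G.107. The inverse has no simple closed form here, so it is obtained numerically by bisection; that choice, the search interval, and the function names are assumptions of this sketch rather than part of the source.

```python
def r_to_mos(r: float) -> float:
    """Map an R value to a MOS value (assumed G.107 Annex B relation)."""
    if r <= 0.0:
        return 1.0
    if r >= 100.0:
        return 4.5
    return 1.0 + 0.035 * r + r * (r - 60.0) * (100.0 - r) * 7e-6


def mos_to_r(mos: float, lo: float = 6.5, hi: float = 100.0) -> float:
    """Invert r_to_mos by bisection.

    r_to_mos is strictly increasing on [6.5, 100], so bisection on that
    interval is well defined; MOS values outside its range are clamped.
    """
    if mos <= r_to_mos(lo):
        return lo
    if mos >= r_to_mos(hi):
        return hi
    for _ in range(60):  # far more halvings than needed for double precision
        mid = (lo + hi) / 2.0
        if r_to_mos(mid) < mos:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Express a sound-quality MOS value on the R scale used for delay degradation.
r_equivalent = mos_to_r(3.6)
```

Either direction of conversion satisfies the requirement in the text; converting the MOS side to the R scale is shown only because the delay degradation is already expressed as a decrease in the R value.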

Next, after the conversion in S501, the value representing call quality degradation due to delay and the value representing call quality degradation due to sound quality degradation are added together to obtain a value representing call quality degradation in the call system 100 (S502).
<Call quality evaluation device>
Next, an embodiment of a call quality evaluation apparatus to which the above call quality evaluation method can be applied will be described.

FIG. 8 is a conceptual diagram showing a configuration example of the call quality evaluation apparatus 200 of the present embodiment.

The configuration example shown in the figure uses the method disclosed in Non-Patent Document 1 to calculate the value representing call quality degradation due to delay, and the method disclosed in Non-Patent Document 3 to calculate the value representing call quality degradation due to sound quality degradation. In the configuration shown in the figure, it is assumed that the transmitter, the NW, and the receiver constitute an IP telephone system using VoIP technology.

The call quality evaluation apparatus 200 includes a delay deterioration deriving unit 220, a sound quality deterioration deriving unit 23a, a conversion adjusting unit 240, and a quality expression deriving unit 250. These are connected as shown in the figure. Here, a line connecting different components in a figure means that the components are connected, and “connected” means that signals can be exchanged between them. The same applies hereinafter.

The receiver 203, which is not included in the call quality evaluation apparatus 200, includes a receiving unit 211, a decoding unit 212, a buffering unit 213, and an output unit 214. These are connected as shown in FIG. 8.

The receiving unit 211 is connected to the NW 202. The NW 202 is connected to the transmitter 201. As a result, the receiving unit 211 can receive the audio signal Vin″ sent from the transmitter 201. It is assumed that the audio signal Vin″ is a signal obtained by the transmitter 201 applying coding processing to the audio signal Vin and dividing the result into packets.

The audio signal Vin″ received by the receiving unit 211 is sent to the decoding unit 212.

The decoding unit 212 performs a decoding process on the transmitted audio signal. The decoding process is a process for converting a compressed signal into an uncompressed signal, for example. Since this process is a process normally used in a mobile phone or the like, detailed description thereof is omitted. The decoded signal is sent to the buffering unit 213.

The buffering unit 213 joins the packet-unit signals sent from the decoding unit 212, applying a delay to each packet unit. Since this process is a process normally used in a mobile phone or the like, detailed description thereof is omitted. The joined signal is sent to the output unit 214.

The buffering unit 213 may be divided into a plurality of parts. For example, the operating system may have a buffer immediately before the output unit 214. In this case, together with the jitter buffer provided by the application program, the buffering unit 213 is configured as shown in the conceptual diagram of FIG. 9. That is, the buffering unit 213 includes a part 213a including a jitter buffer 213aa and a part 213b including an output unit buffer 213ba. The part 213a including the jitter buffer 213aa is connected to the decoding unit 212, and the part 213b including the output unit buffer 213ba is connected to the output unit 214.

The output unit 214 outputs the signal sent from the buffering unit 213, as the output signal Vout, to a speaker (not shown) connected to the call quality evaluation apparatus 200, and to the sound quality deterioration deriving unit 23a.

Next, the delay deterioration deriving unit 220 will be described.

The delay degradation derivation unit 220 includes an NW delay measurement unit 221, a buffer delay measurement unit 222, a fixed value input unit 223, a transmitter delay input unit 224, and a delay degradation calculation unit 225. These components are connected as shown in FIG. 8.

The NW delay measurement unit 221 is connected to the NW 202 and can communicate with the transmitter 201. The NW delay measurement unit 221 obtains the delay time D2 in the NW 202 by the method described for S102 of FIG. 3. The NW delay measurement unit 221 sends the delay time D2 in the NW 202 to the delay deterioration calculation unit 225.

The buffer delay measuring unit 222 is connected to the buffering unit 213.

Suppose, for example, that the buffer delay measurement unit 222 can communicate with the buffering unit 213 through an API by means of a program provided in the middleware. In that case, the delay time generated in the buffering unit 213 can be obtained through this communication.

Alternatively, the buffer delay measuring unit 222 can obtain the delay time generated in the buffering unit 213 by sending a signal to the input of the buffering unit 213 and receiving the signal after it has passed through the buffering unit 213. This is because the delay time is the difference between the time at which the buffer delay measurement unit 222 sends the signal and the time at which the buffer delay measurement unit 222 receives the signal.

As described above, when the buffering unit 213 is divided into a plurality of parts, the buffer delay measurement unit 222 may be connected to each part and communicate with each part independently.

For example, when the buffering unit 213 has the configuration shown in FIG. 9, the buffering unit 213 and the buffer delay measurement unit 222 can be connected as shown in FIG. 10. In the example shown in the figure, the buffer delay measuring unit 222 is connected to the jitter buffer 213aa and the output unit buffer 213ba, and can communicate independently with each of them.

The buffer delay measuring unit 222 communicates with the jitter buffer 213aa based on the first API by the first program provided in the middleware, for example. Thereby, the buffer delay measuring unit 222 can obtain the delay time generated in the jitter buffer 213aa.

The buffer delay measuring unit 222 communicates with the output unit buffer 213ba based on the second API, for example, by a second program provided in the middleware. Thereby, the buffer delay measuring unit 222 can obtain the delay time generated in the output unit buffer 213ba.

The buffer delay measuring unit 222 can add up the delay time generated in the jitter buffer 213aa and the delay time generated in the output unit buffer 213ba to obtain the delay time of the buffering unit 213 as a whole.
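One way this summation might look is sketched below. The source states only that each delay is obtained through an API; the assumption here is that each API reports the number of buffered audio frames (samples per channel), which is converted to time using the sample rate. The function names and the numbers are hypothetical.

```python
def buffered_delay_ms(buffered_frames: int, sample_rate_hz: int) -> float:
    """Convert a count of buffered audio frames into a delay in milliseconds."""
    return 1000.0 * buffered_frames / sample_rate_hz


def buffering_unit_delay_ms(jitter_frames: int, output_frames: int,
                            sample_rate_hz: int) -> float:
    """Total delay of the buffering unit: jitter buffer plus output unit buffer."""
    return (buffered_delay_ms(jitter_frames, sample_rate_hz)
            + buffered_delay_ms(output_frames, sample_rate_hz))


# Illustrative values: 960 frames in the jitter buffer and 480 frames in the
# output unit buffer, at a 16 kHz sample rate.
total_buffer_delay = buffering_unit_delay_ms(960, 480, 16000)
```

Because both fill levels change during a call, this value would be re-read periodically so that D3 stays current.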

The buffer delay measuring unit 222 sends the delay time in the buffering unit 213 to the delay deterioration calculating unit 225.

The fixed value input unit 223 sends the fixed value of the delay time generated in the receiver 203, which has been input to it, to the delay deterioration calculation unit 225. The fixed value of the delay time generated in the receiver 203 is as described for S103 of FIG. 3.

Note that a value obtained by adding the fixed value of the delay time generated in the receiver 203 and the delay time in the buffering unit 213 is the delay time D3 in the receiver 203.

The transmitter delay input unit 224 sends the delay time D1 in the transmitter 201, which has been input to it, to the delay degradation calculation unit 225. The delay time D1 in the transmitter 201 is as described for S101 of FIG. 3.

The delay deterioration calculating unit 225 adds the delay time D2 in the NW 202, the delay time D3 in the receiver 203, and the delay time D1 in the transmitter 201 to obtain the delay time D1 + D2 + D3. Here, the delay time D2 in the NW 202 is sent from the NW delay measurement unit 221 and the delay time D1 in the transmitter 201 is sent from the transmitter delay input unit 224, respectively. The delay time D3 in the receiver 203 is a total value of the delay times sent from the buffer delay measurement unit 222 and the fixed value input unit 223, respectively.

Then, the delay deterioration calculation unit 225 obtains a decrease amount of the R value, which is a value representing call quality deterioration due to delay, from the delay time D1 + D2 + D3 by the method described in the description of S105 in FIG. The calculated decrease amount of the R value is sent to the conversion adjustment unit 240.
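As a rough illustration of the two steps above, the following sketch sums the three delay times and maps the total to a decrease in the R value. It is a simplified sketch under stated assumptions: it uses only the pure-delay impairment term Idd of ITU-T G.107 (echo-related terms are assumed to be zero), and it is not confirmed here that S105 uses exactly this term. The function names are illustrative.

```python
import math


def total_delay_ms(d1_ms: float, d2_ms: float, d3_ms: float) -> float:
    """Total delay D1 + D2 + D3 (transmitter, NW, receiver)."""
    return d1_ms + d2_ms + d3_ms


def r_decrease_from_delay(ta_ms: float) -> float:
    """Decrease of the R value caused by absolute delay ta_ms.

    Assumption: only the G.107 pure-delay term Idd is modeled.
    Idd is 0 up to 100 ms and grows with delay beyond that.
    """
    if ta_ms <= 100.0:
        return 0.0
    x = math.log10(ta_ms / 100.0) / math.log10(2.0)
    return 25.0 * (
        (1.0 + x ** 6) ** (1.0 / 6.0)
        - 3.0 * (1.0 + (x / 3.0) ** 6) ** (1.0 / 6.0)
        + 2.0
    )
```

Under this sketch a total delay of 100 ms or less produces no decrease, and the decrease grows monotonically as the delay increases beyond 100 ms.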

Next, the sound quality deterioration deriving unit 23a will be described.

The sound quality degradation deriving unit 23a uses the input output signal Vout to obtain a reduction amount of the MOS value, which is a value representing the speech quality degradation due to the sound quality degradation, by the method described in FIG. The obtained reduction amount of the MOS value is sent to the conversion adjustment unit 240.

The conversion adjustment unit 240 converts the amount of decrease in the R value sent from the delay degradation calculation unit 225 and the amount of decrease in the MOS value sent from the sound quality degradation deriving unit 23a into values that are assumed to be comparable on the same basis. This process is performed by the method described in S301 of FIG. The value representing the call quality degradation due to delay and the value representing the call quality degradation due to sound quality degradation after processing are sent to the quality expression deriving unit 250.
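One possible way to put a decrease in the R value on the same basis as a decrease in the MOS value is to map R to MOS with the relationship given in Annex B of ITU-T G.107 and take the difference from a baseline. The sketch below assumes this approach; the actual method of S301 is not reproduced in this section, and the baseline R value of 93.2 is an illustrative assumption supplied as a parameter.

```python
def r_to_mos(r: float) -> float:
    """Map an R value to an estimated MOS using the G.107 Annex B relationship."""
    if r <= 0.0:
        return 1.0
    if r >= 100.0:
        return 4.5
    return 1.0 + 0.035 * r + r * (r - 60.0) * (100.0 - r) * 7.0e-6


def r_decrease_to_mos_decrease(r_decrease: float, baseline_r: float = 93.2) -> float:
    """Convert a decrease in R into a decrease in MOS.

    Assumption: the decrease is measured from baseline_r, an illustrative
    reference value that is not taken from this description.
    """
    return r_to_mos(baseline_r) - r_to_mos(baseline_r - r_decrease)
```

After such a conversion both degradation values are expressed as decreases in MOS, so they can be added directly.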

The quality expression deriving unit 250 adds the received value representing the call quality degradation due to delay and the received value representing the call quality degradation due to sound quality degradation, and creates information including the summed value representing the call quality degradation. The information may be the summed value itself, or may be information obtained by processing the summed value. The signal Ve including the information is output to a display, a printer, or another device connected to the call quality evaluation apparatus 200.

FIG. 11 is a conceptual diagram showing another configuration example of the call quality evaluation apparatus 200 of the present embodiment.

The configuration example shown in the figure uses the method disclosed in Non-Patent Document 1 to calculate the value representing call quality degradation due to delay, and the method disclosed in Non-Patent Document 2 to calculate the value representing call quality degradation due to sound quality degradation. In the configuration shown in the figure, it is assumed that the transmitter, the NW, and the receiver constitute an IP telephone system using VoIP technology.

For the configuration other than the sound quality deterioration deriving unit 23b, the sound quality deterioration deriving unit 23a is to be read as the sound quality deterioration deriving unit 23b in the description of FIG. In the following, the sound quality deterioration deriving unit 23b will be described.

The sound quality degradation deriving unit 23b includes an output signal file creation unit 231, an audio file input unit 232, and a sound quality degradation calculation unit 233.

The output signal file creation unit 231 converts the output signal Vout sent from the output unit 214 into an audio file and records it. The converted audio file is sent to the sound quality deterioration calculation unit 233.
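The conversion of the output signal into an audio file can be sketched as below. This is a minimal sketch, assuming that Vout is available as 16-bit mono PCM samples and that a WAV container is acceptable; the function name, the sample format, and the 8000 Hz sampling rate are illustrative assumptions, not details given in this description.

```python
import struct
import wave
from typing import Sequence


def write_output_signal(path: str, samples: Sequence[int],
                        sample_rate_hz: int = 8000) -> None:
    """Record PCM samples of the output signal as a mono 16-bit WAV file."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)          # mono
        wav.setsampwidth(2)          # 16-bit samples
        wav.setframerate(sample_rate_hz)
        # Pack the samples as little-endian signed 16-bit integers.
        wav.writeframes(struct.pack(f"<{len(samples)}h", *samples))
```

The resulting file can then be handed to the sound quality deterioration calculation unit 233 together with the reference audio file.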

The audio file input unit 232 sends the audio file of the input audio signal Vin to the sound quality deterioration calculation unit 233. The description of the audio file of the audio signal Vin is as described for S401 and S402 in FIG.

The sound quality degradation calculation unit 233 reproduces the audio file of the output signal Vout sent from the output signal file creation unit 231 and the audio file of the audio signal Vin sent from the audio file input unit 232. Then, the amount of decrease in the MOS value, which is a value representing the speech quality degradation due to the sound quality degradation, is calculated for the output signal Vout by the method described in S406 of FIG.
[effect]
Several methods, including those of Non-Patent Documents 1 to 4, are disclosed as currently known call quality estimation methods. They include various methods for obtaining a value representing call quality degradation due to delay and various methods for obtaining a value representing call quality degradation due to sound quality degradation. None of the disclosed methods can accurately determine both values. However, for each of the two values taken separately, there can be considered to be a method that obtains it with a certain degree of accuracy.

The call quality evaluation method according to the present embodiment obtains a value representing call quality degradation in a call system by separately obtaining a value representing call quality degradation due to delay and a value representing call quality degradation due to sound quality degradation, and adding them together. Therefore, a method with few drawbacks can be selected from various options for each of the method of evaluating the value representing call quality degradation due to delay and the method of evaluating the value representing call quality degradation due to sound quality degradation. By selecting a method with few drawbacks for each of the two values, the subjective quality of experience of a voice call can be obtained with higher accuracy.

In the description so far, for ease of understanding, the case where the value representing call quality degradation in the call system is obtained and output as the sum of the value representing call quality degradation due to delay and the value representing call quality degradation due to sound quality degradation has been explained as an example. However, the output does not necessarily need to be the sum of the two values. It suffices that the output is an element of an ordered set having an order relation corresponding to call quality, the element being associated with the combination of the two values. In addition, the elements of the ordered set in this case do not necessarily have to be values. They may be anything that a person (including something operated by a person) can recognize, such as color, shape, pattern, brightness, sound, vibration, smell, or temperature.

Next, the minimum call quality evaluation method of the present invention will be described. FIG. 12 is a conceptual diagram showing the processing flow of the minimum call quality evaluation method of the present invention.

The method derives a first degradation value representing call quality degradation due to a delay caused in a voice signal by transmission / reception processing via a network (S601). The method further derives a second degradation value representing call quality degradation due to sound quality degradation caused in the voice signal by the transmission / reception processing (S602). The method further outputs an element of an ordered set having an order relation corresponding to call quality, the element being associated with the combination of the first and second degradation values by a surjection (S603).
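Step S603 can be illustrated with a small sketch in which the ordered set is a five-level grade rather than a number. Everything specific here is an assumption made for illustration: the `QualityGrade` names, the use of the sum as an intermediate quantity, and the threshold values are hypothetical and are not taken from this description. The mapping is surjective because every grade is reached by some combination of the two degradation values.

```python
from enum import IntEnum


class QualityGrade(IntEnum):
    """Illustrative ordered set; a larger value means better call quality."""
    BAD = 1
    POOR = 2
    FAIR = 3
    GOOD = 4
    EXCELLENT = 5


def evaluate_call_quality(first_degradation: float,
                          second_degradation: float) -> QualityGrade:
    """Map a pair of degradation values onto an element of the ordered set.

    Assumption: both values are already on a common scale, and the
    thresholds below are hypothetical.
    """
    total = first_degradation + second_degradation
    if total < 0.5:
        return QualityGrade.EXCELLENT
    if total < 1.0:
        return QualityGrade.GOOD
    if total < 2.0:
        return QualityGrade.FAIR
    if total < 3.0:
        return QualityGrade.POOR
    return QualityGrade.BAD
```

Because `QualityGrade` is an `IntEnum`, its elements can be compared with `<` and `>`, which gives the order relation corresponding to call quality. The grades could equally be presented as colors or sounds without changing the mapping.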

The minimum call quality evaluation method of the present invention has the effects described in [Effects of the Invention] with the above configuration.

Further, a part or all of the above embodiment can be described as in the following supplementary notes, but is not limited thereto.

(Appendix A1)
A call quality evaluation method including:
deriving a first degradation value representing call quality degradation due to a delay caused in a voice signal by transmission / reception processing via a network;
deriving a second degradation value representing call quality degradation due to sound quality degradation caused in the voice signal by the transmission / reception processing; and
outputting an element of an ordered set having an order relation corresponding to call quality, the element being associated with the combination of the first and second degradation values by a surjection.

(Appendix A2)
The call quality evaluation method according to appendix A1, wherein the element is a total value of the first deterioration value and the second deterioration value.

(Appendix A3)
The call quality evaluation method according to appendix A1 or A2, wherein the transmission / reception processing is performed in a transmitter that transmits, to a receiver through the network, a second audio signal obtained by performing a first process on an input first audio signal, in the network, and in the receiver, which receives a third audio signal, which is the second audio signal as it reaches the receiver through the network, and outputs an output signal obtained by performing a second process on the third audio signal.

(Appendix A4)
The call quality evaluation method according to appendix A2 or A3, wherein a first delay time, which is a delay time of the output signal with respect to the first audio signal, is obtained, and the first degradation value is calculated from the first delay time.

(Appendix A5)
The call quality evaluation method according to appendix A4, wherein a second delay time, which is a delay time of the second audio signal with respect to the first audio signal, a third delay time, which is a delay time of the third audio signal with respect to the second audio signal, and a fourth delay time, which is a delay time of the output signal with respect to the third audio signal, are obtained, and the first delay time is obtained by adding the second delay time, the third delay time, and the fourth delay time.

(Appendix A6)
The call quality evaluation method according to appendix A3 or A5, wherein the first deterioration value is obtained by summing Is obtained by equation (7-8) and Id obtained by equation (7-18) described in ITU-T G.107 “The E-model: a computational model for use in transmission planning”.

(Appendix A7)
The call quality evaluation method according to appendix A6, wherein the obtained first deterioration value is converted into a mean opinion score using the relationship described in equation (B-4) of Annex B, or the relationship described in Figure B.2, of ITU-T G.107 “The E-model: a computational model for use in transmission planning”.

(Appendix A8)
The call quality evaluation method according to any one of appendices A4 to A7, wherein the step of calculating the first degradation value includes a step in which the receiver obtains the second delay time by communicating with the transmitter through the network.

(Appendix A9)
The call quality evaluation method according to any one of supplementary notes A2 to A8, wherein the second deterioration value is obtained from the output signal.

(Appendix A10)
The call quality evaluation method according to appendix A9, wherein the second deterioration value is obtained according to the procedure described in ITU-T P.563 “Single-ended method for objective speech quality assessment in narrow-band telephony applications”.

(Appendix A11)
The call quality evaluation method according to any one of supplementary notes A2 to A10, wherein the second deterioration value is obtained from the output signal and the first voice signal.

(Appendix A12)
The call quality evaluation method according to appendix A11, wherein the first audio signal is a signal obtained by reproducing an audio file in which the first audio signal is recorded.

(Appendix A13)
The call quality evaluation method according to appendix A12, wherein the voice file is a voice file transmitted over the Internet.

(Appendix A14)
The call quality evaluation method according to any one of appendices A11 to A13, wherein the second deterioration value is obtained according to the procedure described in ITU-T P.862 “Perceptual evaluation of speech quality (PESQ): An objective method for end-to-end speech quality assessment of narrow-band telephone networks and speech codecs”.

(Appendix A15)
The call quality evaluation method according to any one of appendices A3 to A14, wherein, before the summation, the first deterioration value and the second deterioration value are converted into values assumed to be comparable on the same basis by correcting the first deterioration value, the second deterioration value, or both, and the total value is the sum of the first deterioration value and the second deterioration value after the conversion.

(Appendix B1)
A call quality evaluation apparatus comprising:
a delay degradation deriving unit that derives a first degradation value representing call quality degradation due to a delay caused in a voice signal by transmission / reception processing via a network;
a sound quality deterioration deriving unit that derives a second deterioration value representing call quality deterioration due to sound quality deterioration caused in the voice signal by the transmission / reception processing; and
a call quality deriving unit that outputs an element of an ordered set having an order relation corresponding to call quality, the element being associated with the combination of the first and second deterioration values by a surjection.

(Appendix B2)
The call quality evaluation apparatus according to appendix B1, wherein the element is a total value of the first deterioration value and the second deterioration value.

(Appendix B3)
The call quality evaluation apparatus according to appendix B1 or B2, wherein the transmission / reception processing is performed in a transmitter that transmits, to a receiver through the network, a second audio signal obtained by performing a first process on an input first audio signal, in the network, and in the receiver, which receives a third audio signal, which is the second audio signal as it reaches the receiver through the network, and outputs an output signal obtained by performing a second process on the third audio signal.

(Appendix B4)
The call quality evaluation apparatus according to appendix B2, further comprising: a first delay time deriving unit that obtains a first delay time, which is a delay time of the output signal with respect to the first audio signal; and a delay deterioration deriving unit that derives, from the first delay time, a value representing call quality deterioration due to the delay.

(Appendix B5)
The call quality evaluation apparatus according to appendix B4, wherein the first delay time deriving unit includes a second delay time deriving unit that obtains a second delay time, which is a delay time of the third audio signal with respect to the second audio signal, and derives the first delay time using the second delay time.

(Appendix B6)
The call quality evaluation apparatus according to appendix B5, wherein the second delay time deriving unit obtains the second delay time by performing communication with the transmitter through the network.

(Appendix B7)
The call quality evaluation apparatus according to any one of appendices B4 to B6, wherein the first delay time deriving unit includes a third delay time deriving unit that obtains a third delay time, which is a delay time of the output signal with respect to the third audio signal, and the first delay time is calculated from the third delay time.

(Appendix B8)
The call quality evaluation apparatus according to appendix B7, wherein, in the case where the third audio signal is a packet signal and the receiver includes a buffer that suppresses delay variation between packets in the third audio signal, the third delay time deriving unit includes a fourth delay time deriving unit that derives a buffer delay time, which is a delay time generated in the buffer, and obtains the third delay time from the buffer delay time.

(Appendix B9)
The call quality evaluation apparatus according to appendix B7, wherein the third delay time deriving unit derives the buffer delay time by communicating with the receiving unit.

(Appendix B10)
The call quality evaluation apparatus according to appendix B9, wherein the fourth delay time deriving unit derives the buffer delay time by communicating with the receiving unit.

(Appendix B11)
The call quality evaluation apparatus according to appendix B10, wherein the third delay time deriving unit derives the buffer delay time by communicating with a part including the buffer in the receiving unit.

(Appendix B12)
The call quality evaluation apparatus according to appendix B11, wherein the fourth delay time deriving unit derives the buffer delay time by communicating with a part including the buffer in the receiving unit.

(Appendix B13)
The call quality evaluation apparatus according to any one of appendices B4 to B12, wherein the value representing call quality degradation due to the delay is derived from the first delay time by the procedure described in ITU-T G.107 “The E-model: a computational model for use in transmission planning”.

(Appendix B14)
The call quality evaluation apparatus according to any one of appendices B2 to B13, wherein the sound quality degradation deriving unit derives the value representing call quality degradation due to the sound quality degradation from the output signal input from the receiver to the sound quality degradation deriving unit.

(Appendix B15)
The call quality evaluation apparatus according to appendix B14, wherein the sound quality degradation deriving unit derives the value representing call quality degradation due to the sound quality degradation by the procedure described in ITU-T P.563 “Single-ended method for objective speech quality assessment in narrow-band telephony applications”.

(Appendix B16)
The call quality evaluation apparatus according to any one of appendices B2 to B13, wherein the sound quality deterioration deriving unit derives the value representing call quality deterioration due to the sound quality deterioration from the output signal input from the receiver to the sound quality deterioration deriving unit and the first audio signal input to the sound quality deterioration deriving unit.

(Appendix B17)
The call quality evaluation apparatus according to appendix B16, wherein the first audio signal is input to the sound quality deterioration deriving unit as an audio file including the first audio signal.

(Appendix B18)
The call quality evaluation apparatus according to appendix B17, wherein an audio file including the first audio signal is input to the sound quality degradation deriving unit via the Internet.

(Appendix B19)
The call quality evaluation apparatus according to any one of appendices B16 to B18, wherein the sound quality degradation deriving unit derives the value representing call quality degradation due to the sound quality degradation by the procedure described in ITU-T P.862 “Perceptual evaluation of speech quality (PESQ): An objective method for end-to-end speech quality assessment of narrow-band telephone networks and speech codecs”.

(Appendix B20)
The call quality evaluation apparatus according to any one of appendices B3 to B19, further comprising a conversion adjustment unit that, before the summation, corrects at least one of the value representing call quality degradation due to the delay and the value representing call quality degradation due to the sound quality degradation so that the two values are converted into values assumed to be comparable on the same basis, wherein the total value is the sum of the value representing call quality degradation due to the delay and the value representing call quality degradation due to the sound quality degradation after the conversion.

(Appendix C1)
A call quality evaluation program for causing a computer to execute:
a process of deriving a first degradation value representing call quality degradation due to a delay caused in a voice signal by transmission / reception processing via a network;
a process of deriving a second degradation value representing call quality degradation due to sound quality degradation caused in the voice signal by the transmission / reception processing; and
a process of outputting an element of an ordered set having an order relation corresponding to call quality, the element being associated with the combination of the first and second degradation values by a surjection.

The present invention has been described above using the above-described embodiment as an example. However, the present invention is not limited to the above-described embodiment. That is, various modes that can be understood by those skilled in the art can be applied within the scope of the present invention.

This application claims priority based on Japanese Patent Application No. 2015-126085 filed on June 24, 2015, the entire disclosure of which is incorporated herein.

DESCRIPTION OF SYMBOLS

001, 201 Transmitter
002, 202 NW
003, 203 Receiver
100 Call system
200 Call quality evaluation apparatus
211 Reception unit
212 Decoding unit
213 Buffering unit
213a Part including a jitter buffer
213aa Jitter buffer
213b Part including an output unit buffer
213ba Output unit buffer
214 Output unit
220 Delay deterioration deriving unit
221 NW delay measurement unit
222 Buffer delay measurement unit
223 Fixed value input unit
224 Transmitter delay input unit
225 Delay degradation calculation unit
23a, 23b Sound quality degradation deriving unit
231 Output signal file creation unit
232 Audio file input unit
233 Sound quality degradation calculation unit
240 Conversion adjustment unit
250 Quality expression deriving unit

Claims

A call quality evaluation method comprising:
deriving a first degradation value representing call quality degradation due to a delay caused in a voice signal by transmission / reception processing via a network;
deriving a second degradation value representing call quality degradation due to sound quality degradation caused in the voice signal by the transmission / reception processing; and
outputting an element of an ordered set having an order relation corresponding to call quality, the element being associated with the combination of the first and second degradation values by a surjection.
The call quality evaluation method according to claim 1, wherein the element is a total value of the first deterioration value and the second deterioration value.
The call quality evaluation method according to claim 1 or 2, wherein the transmission / reception processing is performed in a transmitter that transmits, to a receiver through the network, a second audio signal obtained by performing a first process on an input first audio signal, in the network, and in the receiver, which receives a third audio signal, which is the second audio signal as it reaches the receiver through the network, and outputs an output signal obtained by performing a second process on the third audio signal.
The call quality evaluation method according to claim 3, wherein the first deterioration value is obtained by summing Is obtained by equation (7-8) and Id obtained by equation (7-18) described in ITU-T G.107 “The E-model: a computational model for use in transmission planning”.
The call quality evaluation method according to claim 4, wherein the obtained first deterioration value is converted into a mean opinion score using the relationship described in equation (B-4) of Annex B, or the relationship described in Figure B.2, of ITU-T G.107 “The E-model: a computational model for use in transmission planning”.
The call quality evaluation method according to any one of claims 2 to 5, wherein a value representing call quality deterioration due to the sound quality deterioration is obtained from the output signal.
The call quality evaluation method according to claim 6, wherein the second deterioration value is obtained according to the procedure described in ITU-T P.563 “Single-ended method for objective speech quality assessment in narrow-band telephony applications”.
The call quality evaluation method according to any one of claims 2 to 7, wherein a value representing call quality deterioration due to the sound quality deterioration is obtained from the output signal and the first voice signal.
The call quality evaluation method according to claim 8, wherein the first audio signal is a signal obtained by reproducing an audio file in which the first audio signal is recorded.
The call quality evaluation method according to claim 2, wherein, prior to the summation, the value representing call quality degradation due to the delay and the value representing call quality degradation due to the sound quality degradation are converted into values assumed to be comparable on the same basis by correcting the value representing call quality degradation due to the delay, the value representing call quality degradation due to the sound quality degradation, or both, and the total value is the sum of the value representing call quality degradation due to the delay and the value representing call quality degradation due to the sound quality degradation after the conversion.