JP4354962B2 - Video quality estimation apparatus and video quality estimation method

Info

Publication number: JP4354962B2
Application number: JP2006078897A
Authority: JP
Inventors: 聡子富永; 孝典林; 征貴増田
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2006-03-22
Filing date: 2006-03-22
Publication date: 2009-10-28
Anticipated expiration: 2026-03-22
Also published as: JP2007258919A

Description

The present invention relates to a video quality estimation apparatus and a video quality estimation method for estimating video quality in a video communication service between communication terminals connected via a network.

In recent years, as communication networks such as the Internet have become broadband, video services (video communication services) such as video distribution and video conferencing have become widespread. In these services, a video signal is compression-encoded with high efficiency, converted into a packet sequence, and sent to a communication terminal such as a video distribution client or a bidirectional video communication service client.

For example, the video signal is compression-encoded with high efficiency using three types of frames: the I (Intra Picture) frame, which is encoded within the frame without inter-frame prediction; the P (Predictive Picture) frame, which is encoded by predicting the present from the past using motion-compensated prediction; and the B (Bidirectionally Predictive Picture) frame, which is encoded using both forward and backward prediction. The encoded video signal is converted into a packet sequence and sent over the network to a video distribution client or a bidirectional video communication service client.

In this case, when packet loss occurs in the network, for example, the degradation in video quality caused by the loss of a single packet may not be confined to one video frame but may extend over multiple video frames, because a communication terminal such as a video distribution client or a bidirectional video communication service client uses information from preceding and following frames when decoding. Quality degradation then appears in the decoded video even though no packet loss has occurred in the next video frame, and the video quality actually experienced by the user (user experience quality) drops significantly. The extent of this effect depends on the GOP (Group of Pictures) configuration of the I, P, and B frames, that is, the configuration of the video frames within one structural unit time corresponding to a group of encoded video frames.

The GOP configuration is expressed by the combination of M, the period at which an I or P frame appears, and N, the period at which an I frame appears. FIG. 11 shows an example of a GOP configuration for M = 3 and N = 15. For video at 30 frames per second, one GOP contains 15 frames, so one GOP time (one structural unit time) is 0.5 seconds.
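The relationship between M, N, and the GOP time can be sketched as follows. This is a minimal illustration and is not taken from the patent: the function names `gop_pattern` and `gop_time_seconds` are hypothetical, and the frame order assumes a GOP that begins with its I frame and places an I or P frame every M frames, which may differ from the arrangement drawn in FIG. 11.

```python
def gop_pattern(m: int, n: int) -> str:
    """Return the frame types of one GOP as a string such as 'IBBP...'.

    Assumption: the GOP starts with the I frame and every M-th frame
    after it is a P frame; all other frames are B frames.
    """
    if m < 1 or n < 1 or n % m != 0:
        raise ValueError("this sketch assumes N is a positive multiple of M")
    frames = []
    for i in range(n):
        if i == 0:
            frames.append("I")
        elif i % m == 0:
            frames.append("P")
        else:
            frames.append("B")
    return "".join(frames)


def gop_time_seconds(n: int, frames_per_second: float) -> float:
    """One GOP time is the number of frames in a GOP divided by the frame rate."""
    return n / frames_per_second


print(gop_pattern(3, 15))        # IBBPBBPBBPBBPBB
print(gop_time_seconds(15, 30))  # 0.5
```

With M = 3 and N = 15, the sketch yields one I frame, four P frames, and ten B frames per GOP.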

Thus, even for the loss of a single packet, the user experience quality differs depending on the type of video frame that is degraded. As a method for estimating the user experience quality accurately, the following has been proposed (see Non-Patent Documents 1, 2, and 3): invalid packets (lost packets, out-of-order packets, and late-arriving packets) in the network and the communication terminal are detected; the number of video frames to which the invalid packets belong, together with the video frames degraded under their influence, is calculated as the number of invalid frames; the ratio of this number to the total number of frames transmitted in the measurement target section (for example, 10 seconds) is obtained as the invalid frame rate; and the user experience quality is estimated from the invalid frame rate. As reported in Non-Patent Document 3, this method improves estimation accuracy compared with estimating the user experience quality from the invalid packet rate.

In this method, information on the video frame to which an invalid packet belongs is obtained from the header information or payload information of the invalid packet. Of the lost, out-of-order, and late-arriving packets that make up the invalid packets, the out-of-order and late-arriving packets still exist as packets, so the video frame to which they belong can be identified from their header or payload information.

Non-Patent Document 1: Masuda, Tominaga: "Video quality estimation method based on invalid packet rate", IEICE General Conference, B-11-15, March 2004.
Non-Patent Document 2: Tominaga, Masuda, Hayashi: "Video quality estimation based on network quality scale considering video frame information", IEICE General Conference, BS-6-2, March 2005.
Non-Patent Document 3: Masuda, Tominaga, Hayashi: "In-service video quality management method using invalid frame rate", IEICE Technical Report, CQ2005-59, September 2005.

However, in the method described above, a lost packet no longer exists, so information on the video frame to which it belongs cannot be obtained from its header or payload information. That information must instead be inferred from the packets before and after the lost packet. When those neighboring packets are also lost, for example, the inference becomes difficult and the estimation accuracy of the video quality (user experience quality) decreases.

The present invention has been made to solve this problem. Its object is to provide a video quality estimation apparatus and a video quality estimation method that can estimate video quality accurately without obtaining information on the video frame to which an invalid packet belongs from header or payload information, even when the packets before and after a lost packet are also lost.

To achieve this object, the present invention provides a video quality estimation apparatus for estimating video quality in a video communication service between communication terminals via a network, comprising: video encoding information collecting means for collecting, from a communication terminal, video encoding information that includes configuration information of the video frames in one structural unit time corresponding to a group of encoded video frames; structural unit time start position determining means for taking, as a head region, a section of one structural unit time width from the start of the video quality measurement target section, and dividing this head region into n parts (n ≥ 2) to tentatively determine n start positions of one structural unit time; tentative setting means for tentatively determining, for each of the n start positions and on the basis of the video encoding information, the position and type of each video frame after that start position; invalid packet occurrence information collecting means for collecting information on the occurrence of invalid packets in the network and the communication terminal; invalid frame number calculating means for determining, for each of the n start positions and on the basis of the positions and types of the video frames tentatively determined by the tentative setting means, which type of video frame each invalid packet occurring after that start position belongs to, and for calculating, as the number of invalid frames, the number of video frames to which the invalid packets belong together with the video frames degraded under their influence; and video quality estimating means for estimating the video quality in the measurement target section on the basis of the numbers of invalid frames calculated for the n start positions.

In this invention, when the communication terminal is a bidirectional video communication service client and the structural unit used for encoding the video signal is a GOP, video encoding information that includes the configuration information of the video frames in one GOP time (encoding method, bandwidth, the I-or-P period M, the I period N, the number of packets constituting each type of video frame, and so on) is collected from the client. A section of one GOP time width from the start of the video quality measurement target section (for example, 10 seconds) is taken as the head region, and this head region is divided into n parts (for example, 10) to tentatively determine n GOP time start positions (start positions of one structural unit time). One GOP time is obtained from the video encoding information; for example, for a GOP configuration of M = 3 and N = 15 and video at 30 frames per second, it is 0.5 seconds.

Then, for each of the n GOP time start positions, the position and type of each video frame after that start position are tentatively determined on the basis of the video encoding information, which includes the configuration information of the video frames in one GOP time. For example, the ratio of the total number of packets of each video frame type to the total number of packets in one GOP time is obtained as the constituent packet ratio by video frame type. The position and type of each subsequent video frame are then tentatively determined for each of the n GOP time start positions from the total number of packets in one GOP time, the constituent packet ratio by video frame type, and the arrangement of the video frames (I, P, and B frames) in one GOP time.

The constituent packet ratio by video frame type may be obtained in any of the following ways. First, the total number of packets of each video frame type may be computed from the GOP configuration and the number of packets constituting each type of video frame (both obtained from the video encoding information), these totals summed to give the total number of packets in the GOP, and each per-type total divided by the GOP total. Second, a table defining the relationship between the total number of packets in the GOP (or the bandwidth) and the constituent packet ratio by video frame type may be prepared and consulted. Third, an estimation formula expressing that relationship may be created in advance and evaluated.

Then, for each of the n GOP time start positions, the type of video frame to which each invalid packet belongs is determined from the tentatively determined positions and types of the video frames, and the number of video frames to which the invalid packets belong, together with the video frames degraded under their influence, is calculated as the number of invalid frames. By shifting the GOP time start position one step at a time in this way, a total of n invalid frame counts is obtained.
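The mapping from invalid packets to invalid frames within a single tentatively placed GOP can be sketched as below. This is only an illustration under stated assumptions, not the patent's procedure: the function name `count_invalid_frames`, the fixed number of packets per frame type, and the propagation rule (an invalid packet in an I or P frame degrades that frame and every later frame in the same GOP, while one in a B frame degrades only that frame) are all hypothetical simplifications.

```python
from bisect import bisect_right


def count_invalid_frames(pattern, packets_per_frame, gop_start, invalid_packets):
    """Count degraded frames in one GOP whose first packet index is gop_start.

    pattern           -- frame types in order, e.g. "IBBPBB..."
    packets_per_frame -- assumed packets per frame type, e.g. {"I": 10, "P": 4, "B": 2}
    invalid_packets   -- packet indices reported as lost, out of order, or late
    """
    # Lay out the first packet index of every frame in the GOP.
    starts, pos = [], gop_start
    for frame_type in pattern:
        starts.append(pos)
        pos += packets_per_frame[frame_type]
    gop_end = pos  # first packet index after this GOP

    degraded = set()
    for p in invalid_packets:
        if not (gop_start <= p < gop_end):
            continue  # packet belongs to another GOP
        idx = bisect_right(starts, p) - 1  # frame containing packet p
        if pattern[idx] == "B":
            degraded.add(idx)  # assumed: no other frame references a B frame
        else:
            # Assumed: damage to I or P propagates to the end of the GOP.
            degraded.update(range(idx, len(pattern)))
    return len(degraded)


pattern = "IBBPBBPBBPBBPBB"
sizes = {"I": 10, "P": 4, "B": 2}
print(count_invalid_frames(pattern, sizes, 1, [12]))  # 1  (packet 12 is in a B frame)
print(count_invalid_frames(pattern, sizes, 1, [16]))  # 12 (packet 16 is in the first P frame)
```

Repeating this count for each of the n tentative start positions gives the n invalid frame counts described above.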

Two methods are conceivable for calculating the number of invalid frames. In Method 1, for each GOP time start position, the calculation target section runs from that start position to the end of the measurement target section. In Method 2, for each GOP time start position, the calculation target section runs from that start position for the full time width of the measurement target section.

In Method 1, each time the GOP time start position is shifted by one step, the calculation target section for the number of invalid frames becomes slightly shorter. However, when the measurement target section is much longer than one GOP time, for example a GOP time of 0.5 seconds and a measurement target section of 10 seconds, the resulting error in the number of invalid frames calculated for each start position is small and has little effect on the estimation accuracy of the video quality.

In Method 2, the calculation target section does not shorten as the GOP time start position is shifted; a constant time width (that of the measurement target section) is always secured. Method 2 can therefore be expected to give better estimation accuracy than Method 1, and it is preferable when the measurement target section is short.
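The difference between the two methods comes down to where the calculation target section ends. The following sketch expresses that in seconds; the function name `calculation_window` and the representation of a section as a (start, end) pair are assumptions made for illustration.

```python
def calculation_window(gop_start, section_start, section_length, method):
    """Return (start, end) of the calculation target section in seconds.

    Method 1: from the GOP start position to the end of the measurement section.
    Method 2: from the GOP start position for the full measurement section width.
    """
    if method == 1:
        return (gop_start, section_start + section_length)
    if method == 2:
        return (gop_start, gop_start + section_length)
    raise ValueError("method must be 1 or 2")


# A 10-second measurement section starting at t = 0, GOP start shifted by 0.25 s.
print(calculation_window(0.25, 0.0, 10.0, method=1))  # (0.25, 10.0)  -> 9.75 s long
print(calculation_window(0.25, 0.0, 10.0, method=2))  # (0.25, 10.25) -> 10 s long
```

Under Method 2 the window extends past the end of the original measurement section, so invalid packet information for that extra stretch must also be available.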

In the present invention, the measurement target section need in principle be only one GOP time or longer. When one GOP time is short, for example 0.5 seconds, it is better to make the measurement target section considerably longer than one GOP time. When one GOP time is long, for example 8 seconds, the measurement target section is preferably two GOP times or longer.

In the present invention, the video quality in the measurement target section is estimated from the numbers of invalid frames calculated for the n GOP time start positions. For example, the average of these n invalid frame counts is taken as the expected number of invalid frames, and its ratio to the total number of frames transmitted in the measurement target section is obtained as the expected invalid frame rate. Substituting the expected invalid frame rate into a predetermined video quality estimation model, which expresses the relationship between invalid frame rate and video quality, gives the estimated video quality for the measurement target section.

The expected number of invalid frames may instead be the maximum of the invalid frame counts calculated for the n GOP time start positions, or the 75% value (75% of the maximum). Using the maximum allows quality management that assumes the worst case; using the 75% value allows quality management that errs on the safe side. The video quality estimation model expressing the relationship between invalid frame rate and video quality may be determined separately for each service bandwidth and type of content, and may be prepared either as a model formula or as a table.
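The three ways of forming the expected number of invalid frames, and the conversion to an expected invalid frame rate, can be sketched as follows. The function names and the `mode` labels are hypothetical, and the rate is expressed here as a fraction between 0 and 1, which is an assumption; the text does not say whether a fraction or a percentage is intended.

```python
def expected_invalid_frames(counts, mode="mean"):
    """Combine the n per-start-position invalid frame counts into one value."""
    if not counts:
        raise ValueError("counts must not be empty")
    if mode == "mean":
        return sum(counts) / len(counts)
    if mode == "max":
        return float(max(counts))
    if mode == "max75":
        # "75% value" as described in the text: 75% of the maximum.
        return 0.75 * max(counts)
    raise ValueError("unknown mode")


def expected_invalid_frame_rate(counts, total_frames, mode="mean"):
    """Expected invalid frames divided by frames sent in the measurement section."""
    return expected_invalid_frames(counts, mode) / total_frames


counts = [10, 20, 30, 20]          # illustrative values only
total = 30 * 10                    # 30 frames/s over a 10 s section
print(expected_invalid_frames(counts, "mean"))   # 20.0
print(expected_invalid_frames(counts, "max"))    # 30.0
print(expected_invalid_frames(counts, "max75"))  # 22.5
```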

In the present invention, the structural unit used for encoding is not limited to a GOP. A GOP is sometimes also called a GOV (Group of video). The present invention can also be realized as a video quality estimation method rather than as a video quality estimation apparatus.

According to the present invention, video encoding information that includes the configuration information of the video frames in one structural unit time corresponding to a group of encoded video frames is collected from a communication terminal. A section of one structural unit time width from the start of the video quality measurement target section is taken as the head region, and this head region is divided into n parts to tentatively determine n start positions of one structural unit time. For each of these n start positions, the position and type of each video frame after that start position are tentatively determined on the basis of the video encoding information, the type of video frame to which each invalid packet belongs is determined, and the number of video frames to which the invalid packets belong, together with the video frames degraded under their influence, is calculated as the number of invalid frames. The video quality in the measurement target section is then estimated from the numbers of invalid frames calculated for the n start positions. As a result, video quality can be estimated accurately without obtaining information on the video frame to which an invalid packet belongs from header or payload information, even when the packets before and after a lost packet are also lost.

The present invention is described in detail below with reference to the drawings. FIG. 1 is a system configuration diagram showing an example of a video communication service system that uses the video quality estimation apparatus according to the present invention. In this embodiment, video quality means the user experience quality and refers to a subjective evaluation value obtained by a subjective evaluation experiment (for example, MOS: Mean Opinion Score).

In FIG. 1, 10 is the video quality estimation apparatus according to the present invention, 20 (20-1, 20-2) is a bidirectional video communication service client (hereinafter, bidirectional user terminal), 30 is a video distribution server, and 40 is a video distribution client (hereinafter, video distribution terminal) that receives video distributed from the video distribution server 30. These devices are connected to one another via a network 50.

FIG. 2 is a block diagram showing the main part of the internal configuration of the video quality estimation apparatus 10. The video quality estimation apparatus 10 comprises: a video encoding information collection unit 11 that collects video encoding information (encoding method, bandwidth, video frame configuration information, number of packets constituting each type of video frame, and so on) from the bidirectional user terminal 20 or the video distribution server 30; an invalid packet occurrence information collection unit 12 that collects information on the occurrence of invalid packets (lost, out-of-order, and late-arriving packets) from the bidirectional user terminal 20 or the video distribution terminal 40; a constituent packet ratio calculation unit 13 that calculates the constituent packet ratio by video frame type (described later) from the video encoding information collected by the video encoding information collection unit 11; an invalid frame rate expected value calculation unit 14 that calculates the expected invalid frame rate (described later) from the video encoding information collected by unit 11, the constituent packet ratio by video frame type calculated by unit 13, and the invalid packet occurrence information collected by unit 12; a video quality estimation model DB 15 that stores the relationship between invalid frame rate and video quality as a video quality estimation model; and a video quality estimation unit 16 that obtains an estimated video quality value from the expected invalid frame rate calculated by unit 14, in accordance with the video quality estimation model stored in the video quality estimation model DB 15.

The video quality estimation apparatus 10 is realized by hardware comprising a processor and a storage device, together with a program that cooperates with this hardware to implement the functions of the units shown in FIG. 2. In this example, a model formula expressing the relationship between invalid frame rate and video quality is created for each set of conditions, such as encoding method, content, and bandwidth, and stored in the video quality estimation model DB 15 as the video quality estimation model.

FIG. 3 shows an example of a video quality estimation model: once the invalid frame rate is known, the video quality (MOS) can be calculated. It is known that the video quality estimation model with respect to the invalid frame rate can be expressed, for example, by a sum of exponential functions as in the following equation.
Y = y0 + A1 * exp(-x / t1) + A2 * exp(-x / t2)   (1)
Y: user experience quality (MOS); x: invalid frame rate; y0, A1, A2, t1, t2: constants.
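Equation (1) can be evaluated directly once the constants are known. The sketch below uses placeholder constants chosen only so that the curve falls from a MOS of 4.5 toward 1.0; they are not values from the patent, and the function name `estimate_mos` is hypothetical. In practice the constants would be fitted per encoding method, content, and bandwidth.

```python
import math


def estimate_mos(x, y0, a1, t1, a2, t2):
    """Evaluate equation (1): Y = y0 + A1*exp(-x/t1) + A2*exp(-x/t2)."""
    return y0 + a1 * math.exp(-x / t1) + a2 * math.exp(-x / t2)


# Placeholder constants for illustration only (not from the source).
params = dict(y0=1.0, a1=2.0, t1=0.02, a2=1.5, t2=0.2)

for rate in (0.0, 0.01, 0.05, 0.2):
    print(rate, round(estimate_mos(rate, **params), 3))
```

With positive A1, A2, t1, and t2, the estimated quality decreases monotonically as the invalid frame rate grows and approaches y0.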

[Embodiment 1 (Method 1)]
With reference to the flowchart shown in FIG. 5, the functions of the video quality estimation apparatus 10 that are specific to this embodiment are described below, taking as an example the estimation of video quality at the bidirectional user terminal 20.

[Video encoding information collection unit]
In the video quality estimation apparatus 10, the video encoding information collection unit 11 collects, from the bidirectional user terminal 20, video encoding information such as the encoding method used to encode the video signal, the bandwidth, the video frame configuration information, and the number of packets constituting each type of video frame (step 101). In this example, the encoding method at the bidirectional user terminal 20 is high-efficiency compression encoding, and the GOP configuration (M, N) is collected as the video frame configuration information.

For a video distribution service, the number of packets constituting each type of video frame may be collected in advance for video content of the expected genres, or from the content actually used in the service. For a bidirectional video communication service, the per-frame-type packet counts may be collected in advance using sample video of the expected genres (for example, remote lectures or free conversation).

[Constituent packet ratio by video frame type calculation unit]
The constituent packet ratio calculation unit 13 takes as input the video encoding information collected by the video encoding information collection unit 11 and calculates the constituent packet ratio by video frame type from it (step 102).

In this example, the total number of packets of each video frame type is computed from the GOP configuration and the number of packets constituting each type of video frame (the number of packets making up each I, P, and B frame), both obtained from the video encoding information. These per-type totals are summed to give the total number of packets in the GOP, and each per-type total is divided by the GOP total to give the constituent packet ratio for I frames, for P frames, and for B frames.
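The calculation in this step can be sketched as follows. The function names are hypothetical, the per-frame packet counts are illustrative values rather than data from the patent, and the frame counts assume one I frame per GOP with N a multiple of M.

```python
def frame_counts(m, n):
    """Number of I, P, and B frames in one GOP, assuming N is a multiple of M."""
    if m < 1 or n < 1 or n % m != 0:
        raise ValueError("this sketch assumes N is a positive multiple of M")
    i_or_p = n // m                      # frames that are either I or P
    return {"I": 1, "P": i_or_p - 1, "B": n - i_or_p}


def packet_ratios(m, n, packets_per_frame):
    """Return (ratio by frame type, total packets in one GOP)."""
    counts = frame_counts(m, n)
    totals = {t: counts[t] * packets_per_frame[t] for t in counts}
    gop_total = sum(totals.values())
    return {t: totals[t] / gop_total for t in totals}, gop_total


# Illustrative packet counts per frame (assumed, not from the source).
ratios, gop_total = packet_ratios(3, 15, {"I": 20, "P": 5, "B": 2})
print(frame_counts(3, 15))  # {'I': 1, 'P': 4, 'B': 10}
print(gop_total)            # 60
print(ratios)
```

In this illustrative case each frame type accounts for 20 of the 60 packets in the GOP, so the three ratios are equal even though the frame counts differ.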

Alternatively, a table defining the relationship between the total number of packets in the GOP (or the bandwidth) and the constituent packet ratio by video frame type (see FIG. 4) may be prepared, and the ratios for I, P, and B frames read from this table. An estimation formula expressing that relationship may also be created in advance and used to obtain the ratios for I, P, and B frames.

[Invalid packet occurrence information collection unit]
The invalid packet occurrence information collection unit 12 takes 10 seconds as the measurement section (measurement target section) and collects information on the occurrence of invalid packets within this section from the bidirectional user terminal 20 (step 103). Invalid packets are the total of lost packets and out-of-order packets in the network 50 and late-arriving packets caused by buffer overflow at the bidirectional user terminal 20. In this embodiment, the bidirectional user terminal 20 collects the invalid packet occurrence information and transfers it to the video quality estimation apparatus 10 in response to a request from the apparatus. For this transfer, RTCP XR, described for example in IETF RFC 3611, is used as the control packet.

[Invalid frame rate expected value calculation unit]
The invalid frame rate expected value calculation unit 14 calculates the expected invalid frame rate from the video encoding information, the constituent packet ratio for each video frame type, and the invalid packet occurrence information, as follows (step 104). FIG. 6 shows the details of this calculation in step 104.

First, the section of one GOP time (TS in FIG. 7(b)) from the start of the measurement section (T (T1) in FIG. 7(a)) is set as the head region (step 201). The one GOP time, also called the intra refresh time, is obtained from the video encoding information. For MPEG-2, a GOP time of 0.5 seconds or less is considered desirable. With a GOP structure of M = 3, N = 15, the period M at which an I or P frame appears is 3 frames and the period N at which an I frame appears is 15 frames, so for video at 30 frames per second the one GOP time is 0.5 seconds. Delay-sensitive applications such as real-time interactive video communication services often use a short GOP time, whereas applications that are less sensitive to delay may use a GOP time of 3 seconds or more.
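The GOP-time arithmetic above can be checked with a short Python sketch. The function name `gop_duration_seconds` is hypothetical and is not part of the described apparatus; it simply divides the I-frame period N by the frame rate.

```python
from fractions import Fraction


def gop_duration_seconds(n_frames_per_gop: int, frame_rate: int) -> Fraction:
    """Return the duration of one GOP: I-frame period N divided by frames per second."""
    if n_frames_per_gop <= 0 or frame_rate <= 0:
        raise ValueError("N and the frame rate must be positive")
    return Fraction(n_frames_per_gop, frame_rate)


# N = 15 at 30 frames per second, as in the text.
print(float(gop_duration_seconds(15, 30)))  # 0.5
```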

Next, the head region TS determined in step 201 is divided into n parts by packet count (n = 10 in this example), and n GOP start packet positions (start positions of the GOP time) P1 to Pn are provisionally determined (FIG. 7(c): step 202). That is, ten GOP start packet positions are provisionally determined from the total number of packets in one GOP time. For example, if one GOP time contains 1000 packets, the ten GOP start packet positions are the 1st, 101st, 201st, 301st, 401st, 501st, 601st, 701st, 801st, and 901st packets.
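A minimal Python sketch of step 202 follows. The function name `gop_start_positions` is hypothetical, and using integer division for packet counts that are not a multiple of n is an assumption, since the text only gives the evenly divisible case.

```python
def gop_start_positions(packets_per_gop: int, n: int) -> list[int]:
    """Divide the head region (one GOP time) into n parts by packet count.

    Returns the 1-based packet numbers of the n candidate GOP start positions.
    Assumption: any remainder from the integer division is ignored.
    """
    if n < 2:
        raise ValueError("n must be at least 2")
    if packets_per_gop < n:
        raise ValueError("need at least n packets in one GOP time")
    step = packets_per_gop // n
    return [1 + m * step for m in range(n)]


print(gop_start_positions(1000, 10))
# [1, 101, 201, 301, 401, 501, 601, 701, 801, 901]
```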

Then the parameter m indicating the ordinal of the GOP start packet position is set to m = 1 (step 203). Starting from the first GOP start packet position P1, the position and type of each video frame up to the end of the measurement section T are provisionally determined from the total number of packets in one GOP time, the packet ratio for each video frame type, and the GOP structure (the arrangement of video frames within one GOP time) (step 204).

For example, when the GOP structure is M = 3, N = 15, the arrangement of video frames within one GOP time is "IBBPBBPBBPBBPBB". If one GOP time contains 1000 packets and the packet ratio for the I frame has been obtained as, say, 10%, the I frame occupies packet numbers 1 to 100. Packet numbers for the P frames and B frames are obtained in the same way.
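The packet-number assignment can be sketched in Python as follows. Only the 10% ratio for the I frame comes from the text. The 40% and 50% ratios for P and B frames are hypothetical placeholders, and splitting each type's share equally among its frames is an assumption. The function name `layout_one_gop` is also hypothetical.

```python
from collections import Counter


def layout_one_gop(pattern, packets_per_gop, ratio_percent, first_packet=1):
    """Assign a packet range to every frame of one GOP.

    pattern        -- frame arrangement, e.g. "IBBPBBPBBPBBPBB"
    ratio_percent  -- packet share per frame type in percent (I from the text,
                      P and B are illustrative placeholders)
    Assumption: frames of the same type receive equal numbers of packets.
    Returns a list of (frame_type, first_packet, last_packet) tuples.
    """
    frames_per_type = Counter(pattern)
    layout = []
    start = first_packet
    for frame_type in pattern:
        share = packets_per_gop * ratio_percent[frame_type] // 100
        size = share // frames_per_type[frame_type]
        layout.append((frame_type, start, start + size - 1))
        start += size
    return layout


layout = layout_one_gop("IBBPBBPBBPBBPBB", 1000, {"I": 10, "P": 40, "B": 50})
for frame in layout[:4]:
    print(frame)
# ('I', 1, 100)
# ('B', 101, 150)
# ('B', 151, 200)
# ('P', 201, 300)
```

With these placeholder ratios the I frame covers packets 1 to 100, which matches the figure given in the text.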

Based on the video frame positions and types provisionally determined for GOP start packet position P1, it is determined which type of video frame each invalid packet that occurred between P1 and the end of the measurement section T belongs to (step 205). The number of video frames to which the invalid packets belong, plus the number of video frames degraded under the influence of those frames, is calculated as the number of invalid frames F_NG1 for GOP start packet position P1 (step 206).

For example, an invalid packet that is the 90th packet is judged to belong to the I frame, and one that is the 101st packet is judged to belong to a B frame. The number of invalid frames is calculated from the frame type and the GOP structure, and the numbers of invalid frames for all invalid packets up to the end of the measurement section T are accumulated.
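Steps 205 and 206 can be sketched in Python as below. The text does not spell out the propagation rule, so the one used here is an assumption: a damaged I or P frame also degrades every later frame in the same GOP, while a damaged B frame affects only itself. Counting each damaged frame once is likewise an assumption. The per-frame packet counts and both function names are hypothetical.

```python
from bisect import bisect_right


def build_frames(pattern, packets_per_frame, first_packet, last_packet):
    """Repeat the GOP pattern from first_packet until last_packet is covered.

    Returns a list of (frame_type, gop_index, first_packet, last_packet).
    """
    frames = []
    start, gop = first_packet, 0
    while start <= last_packet:
        for frame_type in pattern:
            if start > last_packet:
                break
            size = packets_per_frame[frame_type]
            frames.append((frame_type, gop, start, min(start + size - 1, last_packet)))
            start += size
        gop += 1
    return frames


def count_invalid_frames(frames, invalid_packets):
    """Count frames hit by an invalid packet plus frames degraded by them."""
    starts = [f[2] for f in frames]
    damaged = set()
    for packet in invalid_packets:
        i = bisect_right(starts, packet) - 1
        if i < 0 or packet > frames[i][3]:
            continue  # packet lies outside the calculation target section
        frame_type, gop = frames[i][0], frames[i][1]
        damaged.add(i)
        if frame_type in ("I", "P"):
            # Assumed rule: the error propagates to all later frames of the GOP.
            j = i + 1
            while j < len(frames) and frames[j][1] == gop:
                damaged.add(j)
                j += 1
    return len(damaged)


# Two GOPs of 1000 packets each; per-frame sizes are illustrative.
frames = build_frames("IBBPBBPBBPBBPBB", {"I": 100, "P": 100, "B": 50}, 1, 2000)
print(count_invalid_frames(frames, [90]))   # 15: packet 90 is in the I frame
print(count_invalid_frames(frames, [101]))  # 1: packet 101 is in a B frame
```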

Then m is incremented (m = m + 1 = 2) and the process returns to step 204 (step 208); steps 204 to 206 are repeated until m = n is reached in step 207. In this way, the GOP start packet position is shifted one position at a time, giving the number of invalid frames F_NG2 for the second GOP start packet position P2, F_NG3 for the third position P3, and so on up to the n-th position Pn, for a total of n invalid frame counts (F_NG1 to F_NGn).

When the number of invalid frames has been calculated for every GOP start packet position (YES in step 207), that is, when F_NG1 to F_NGn have been calculated for positions P1 to Pn, the expected number of invalid frames is obtained from F_NG1 to F_NGn (step 209). In this example, the average of F_NG1 to F_NGn is used as the expected number of invalid frames. The total number of frames transmitted in the measurement section T is then obtained, and the ratio of the expected number of invalid frames to this total is calculated as the expected invalid frame rate (step 210).
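Steps 209 and 210 amount to a mean followed by a ratio, as in this Python sketch. The function name `expected_invalid_frame_rate` and the ten per-position counts are hypothetical. The 300 transmitted frames correspond to the 10-second measurement section at 30 frames per second used in the text.

```python
def expected_invalid_frame_rate(invalid_counts, total_frames_sent):
    """Average F_NG1..F_NGn and divide by the frames sent in the section."""
    if not invalid_counts:
        raise ValueError("at least one invalid frame count is required")
    if total_frames_sent <= 0:
        raise ValueError("total_frames_sent must be positive")
    expected_invalid_frames = sum(invalid_counts) / len(invalid_counts)
    return expected_invalid_frames / total_frames_sent


# Illustrative counts for n = 10 start positions (not measured data).
counts = [15, 15, 12, 12, 12, 12, 9, 9, 9, 15]
rate = expected_invalid_frame_rate(counts, total_frames_sent=300)
print(rate)  # 0.04
```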

Although the average of F_NG1 to F_NGn is used as the expected number of invalid frames in this example, the maximum of F_NG1 to F_NGn, or the 75% value (75% of the maximum), may be used instead. Using the maximum allows quality management that assumes the worst case. Using the 75% value allows quality management that errs on the safe side.
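The three choices of statistic can be compared side by side in a small Python sketch. The function name and the mode labels `"mean"`, `"max"`, and `"max75"` are hypothetical, and the 75% value is implemented literally as 75% of the maximum, as stated in the text.

```python
def invalid_frame_statistic(counts, mode="mean"):
    """Reduce the per-position invalid frame counts to a single expected value."""
    if not counts:
        raise ValueError("at least one invalid frame count is required")
    if mode == "mean":
        return sum(counts) / len(counts)
    if mode == "max":
        return float(max(counts))   # worst-case assumption
    if mode == "max75":
        return 0.75 * max(counts)   # 75% of the maximum
    raise ValueError(f"unknown mode: {mode}")


sample = [8, 12, 16]  # illustrative counts, not measured data
print(invalid_frame_statistic(sample, "mean"))   # 12.0
print(invalid_frame_statistic(sample, "max"))    # 16.0
print(invalid_frame_statistic(sample, "max75"))  # 12.0
```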

[Video quality estimation unit]
The video quality estimation unit 16 takes as input the expected invalid frame rate calculated by the invalid frame rate expected value calculation unit 14 and obtains the estimated video quality for the measurement section T according to a video quality estimation model stored in the video quality estimation model DB 15.

In doing so, the video quality estimation unit 16 refers to the video encoding information from the video encoding information collection unit 11 and selects, from the video quality estimation model DB 15, the video quality estimation model that matches conditions such as the encoding method, content, and bandwidth contained in that information (FIG. 5: step 105).

The expected invalid frame rate from the invalid frame rate expected value calculation unit 14 is then substituted into the selected model to obtain the estimated video quality for the measurement section T (T1) (step 106).

Video quality is estimated in the same way for the next measurement section T (T2). In this case, in response to YES in step 107, the process returns to step 103 and repeats the following: collection of invalid packet occurrence information in the measurement section T (step 103), calculation of the expected invalid frame rate for the measurement section T (step 104), selection of the video quality estimation model (step 105), and calculation of the estimated video quality by substituting the expected invalid frame rate into the model (step 106). Although the video quality estimation model is a model formula in this example, the relationship between the invalid frame rate and the video quality may instead be prepared as a table.
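The table-based variant of steps 105 and 106 might look like the Python sketch below. Everything specific in it is hypothetical: the key layout of `MODEL_DB`, the table points, the quality scale, and the use of linear interpolation between points. The text only states that a model matching the encoding method, content, and bandwidth is selected and that a table may replace the model formula.

```python
from bisect import bisect_right

# Hypothetical model DB. Key: (encoding method, content class, bandwidth in kbit/s).
# Value: (invalid frame rate, quality score) points sorted by rate.
# All numbers are placeholders, not values from the patent.
MODEL_DB = {
    ("MPEG-2", "conference", 2000): [(0.0, 4.5), (0.05, 3.0), (0.2, 1.5), (1.0, 1.0)],
}


def estimate_quality(db, codec, content, bandwidth_kbps, invalid_frame_rate):
    """Select the matching table and interpolate linearly (an assumption)."""
    table = db[(codec, content, bandwidth_kbps)]
    rates = [rate for rate, _ in table]
    r = min(max(invalid_frame_rate, rates[0]), rates[-1])  # clamp to table range
    i = min(bisect_right(rates, r), len(rates) - 1)
    (x0, y0), (x1, y1) = table[i - 1], table[i]
    return y0 + (y1 - y0) * (r - x0) / (x1 - x0)


quality = estimate_quality(MODEL_DB, "MPEG-2", "conference", 2000, 0.025)
print(quality)
```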

In this way, in this embodiment, the position and type of each video frame up to the end of the measurement section T are provisionally determined for each of the n GOP start packet positions, the type of video frame to which each invalid packet belongs is determined from these provisional positions and types, and the video quality within the measurement section T is estimated. Video quality can therefore be estimated accurately without obtaining, from header or payload information, information on the video frame to which an invalid packet belongs, and even when the packets before and after a lost packet are also lost.

[Embodiment 2 (Method 2)]
In Embodiment 1 (Method 1), as can be seen from FIG. 7, shifting the GOP start packet position one position at a time makes the section Tx over which the number of invalid frames is calculated slightly shorter each time. This is because Tx runs from the GOP start packet position to the end of the measurement section T. When the measurement section T is much longer than one GOP time, for example a GOP time of 0.5 seconds and a measurement section T of 10 seconds, the resulting error in the number of invalid frames calculated for each GOP start packet position is small and has little effect on the accuracy of the video quality estimate. When the measurement section T is short, however, the effect on estimation accuracy becomes a concern.

In contrast, in Embodiment 2 (Method 2), as shown in FIG. 10, for every GOP start packet position the calculation target section Tx is the section that starts at that position and has the same time width as the measurement section T, and the number of invalid frames within this Tx is calculated. Part of the next measurement section T is then included in Tx, but Tx never becomes shorter and always has a constant time width (that of the measurement section T). Method 2 can therefore be expected to give better estimation accuracy than Method 1, and is preferable when the measurement section T is short.
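The difference between the two calculation target sections can be made concrete with a short Python sketch. It expresses Tx in packet numbers rather than time, which assumes a constant packet rate. The function name `calculation_windows` and the figure of 20000 packets per measurement section (1000 packets per 0.5-second GOP over 10 seconds) are illustrative.

```python
def calculation_windows(start_positions, section_packets, method):
    """Return (first_packet, last_packet) of Tx for each GOP start position.

    method 1: Tx ends at the end of the measurement section T.
    method 2: Tx always spans the full width of T, running into the next section.
    """
    windows = []
    for start in start_positions:
        if method == 1:
            end = section_packets
        elif method == 2:
            end = start + section_packets - 1
        else:
            raise ValueError("method must be 1 or 2")
        windows.append((start, end))
    return windows


starts = [1, 101, 201, 301, 401, 501, 601, 701, 801, 901]
for method in (1, 2):
    lengths = [end - begin + 1 for begin, end in calculation_windows(starts, 20000, method)]
    print(method, lengths[0], lengths[-1])
# 1 20000 19100
# 2 20000 20000
```

Under Method 1 the last window is 900 packets shorter than the first. Under Method 2 every window has the same length.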

FIG. 8 shows the flowchart of Embodiment 2 corresponding to FIG. 5, and FIG. 9 shows the flowchart of Embodiment 2 corresponding to FIG. 6. In Embodiment 2, invalid packet occurrence information for "measurement section + one GOP time" is collected in step 103'. In step 204', the position and type of each video frame in a section with the time width of the measurement section T are provisionally determined for each GOP start packet position. In step 205', it is determined for each GOP start packet position which type of video frame each invalid packet in that section belongs to. In step 206', the number of invalid frames within that section is calculated for each GOP start packet position.

In the embodiments described above the measurement section T is 10 seconds, but in principle it need only be at least one GOP time. When the GOP time is short, for example 0.5 seconds, the measurement section should be considerably longer than one GOP time. When the GOP time is long, for example 8 seconds, the measurement section is desirably at least two GOP times.
In the embodiments described above the head region TS is divided into ten parts, but the division is not limited to ten; any division into two or more parts suffices.
The embodiments described above take estimation of video quality at the interactive user terminal 20 as an example, but video quality at the video distribution terminal 40 is estimated in the same way.
The flowcharts shown in FIGS. 5, 6, 8, and 9 are examples, and it goes without saying that various modifications are possible.

FIG. 1 is a system configuration diagram showing an example of a video communication service system including a video quality estimation apparatus according to the present invention.
FIG. 2 is a block diagram showing the main part of the internal configuration of the video quality estimation apparatus in this video communication service system.
FIG. 3 is a diagram showing an example of a video quality estimation model (the relationship between invalid frame rate and video quality).
FIG. 4 is a diagram showing the relationship between the total number of packets in a GOP (or the bandwidth) and the constituent packet ratio for each video frame type.
FIG. 5 is a flowchart showing the video quality estimation process (Embodiment 1 (Method 1)) in the video quality estimation apparatus.
FIG. 6 is a flowchart showing the details of the expected invalid frame rate calculation in the video quality estimation process (Embodiment 1 (Method 1)).
FIG. 7 is a time chart for explaining the expected invalid frame rate calculation in the video quality estimation process (Embodiment 1 (Method 1)).
FIG. 8 is a flowchart showing the video quality estimation process (Embodiment 2 (Method 2)) in the video quality estimation apparatus.
FIG. 9 is a flowchart showing the details of the expected invalid frame rate calculation in the video quality estimation process (Embodiment 2 (Method 2)).
FIG. 10 is a time chart for explaining the expected invalid frame rate calculation in the video quality estimation process (Embodiment 2 (Method 2)).
FIG. 11 is a diagram showing an example of a GOP structure (M = 3, N = 15).

Explanation of symbols

10: video quality estimation apparatus; 11: video encoding information collection unit; 12: invalid packet occurrence information collection unit; 13: constituent packet ratio calculation unit for each video frame type; 14: invalid frame rate expected value calculation unit; 15: video quality estimation model DB; 16: video quality estimation unit; T: measurement section (measurement target section); TS: head region; Tx: calculation target section; P1 to Pn: GOP start packet positions.

Claims

A video quality estimation apparatus for estimating video quality in a video communication service between communication terminals via a network, comprising:
video encoding information collecting means for collecting, from the communication terminal, video encoding information including configuration information of the video frames in one structural unit time corresponding to a group of encoded video frames;
structural unit time start position determining means for setting, as a head region, a section having the width of one structural unit time from the beginning of the video quality measurement target section, and dividing the head region into n parts (n ≧ 2) to provisionally determine n start positions of the one structural unit time;
temporary setting means for provisionally determining, for each of the n start positions, the position and type of each video frame after that start position based on the video encoding information;
invalid packet occurrence information collecting means for collecting occurrence information on invalid packets in the network and the communication terminal;
invalid frame number calculating means for determining, for each of the n start positions, which type of video frame each invalid packet occurring after that start position belongs to, based on the positions and types of the video frames provisionally determined by the temporary setting means, and calculating, as the number of invalid frames, the number of video frames to which the invalid packets belong and the number of video frames degraded under the influence of those video frames; and
video quality estimation means for estimating the video quality in the measurement target section based on the number of invalid frames calculated for each of the n start positions.

The video quality estimation apparatus according to claim 1,
wherein the invalid frame number calculating means,
for each of the n start positions, takes the section from that start position to the end of the measurement target section as a calculation target section and calculates the number of invalid frames in that calculation target section.

The video quality estimation apparatus according to claim 1,
wherein the invalid frame number calculating means,
for each of the n start positions, takes a section having the time width of the measurement target section from that start position as a calculation target section and calculates the number of invalid frames in that calculation target section.

The video quality estimation apparatus according to any one of claims 1 to 3,
wherein the temporary setting means
obtains, as the constituent packet ratio for each video frame type, the ratio of the total number of packets of each video frame type to the total number of packets in one structural unit time, and provisionally determines, for each of the n start positions, the position and type of each subsequent video frame based on the total number of packets in one structural unit time, the constituent packet ratio for each video frame type, and the arrangement of video frames within the one structural unit time.

The video quality estimation apparatus according to any one of claims 1 to 3,
wherein the video quality estimation means
takes the average of the numbers of invalid frames calculated for the n start positions as the expected number of invalid frames, takes the ratio of the expected number of invalid frames to the total number of frames transmitted in the measurement target section as the expected invalid frame rate, and obtains the estimated video quality for the measurement target section by substituting the expected invalid frame rate into a predetermined video quality estimation model indicating the relationship between invalid frame rate and video quality.

The video quality estimation apparatus according to any one of claims 1 to 3,
wherein the video quality estimation means
takes the maximum of the numbers of invalid frames calculated for the n start positions as the expected number of invalid frames, takes the ratio of the expected number of invalid frames to the total number of frames transmitted in the measurement target section as the expected invalid frame rate, and obtains the estimated video quality for the measurement target section by substituting the expected invalid frame rate into a predetermined video quality estimation model indicating the relationship between invalid frame rate and video quality.

A video quality estimation method for estimating video quality in a video communication service between communication terminals via a network, comprising:
a first step of collecting, from the communication terminal, video encoding information including configuration information of the video frames in one structural unit time corresponding to a group of encoded video frames;
a second step of setting, as a head region, a section having the width of one structural unit time from the beginning of the video quality measurement target section, and dividing the head region into n parts (n ≧ 2) to provisionally determine n start positions of the one structural unit time;
a third step of provisionally determining, for each of the n start positions, the position and type of each video frame after that start position based on the video encoding information;
a fourth step of collecting occurrence information on invalid packets in the network and the communication terminal;
a fifth step of determining, for each of the n start positions, which type of video frame each invalid packet occurring after that start position belongs to, based on the positions and types of the video frames provisionally determined by the temporary setting means, and calculating, as the number of invalid frames, the number of video frames to which the invalid packets belong and the number of video frames degraded under the influence of those video frames; and
a sixth step of estimating the video quality in the measurement target section based on the number of invalid frames calculated for each of the n start positions.