JP5350300B2 - Transcode video quality objective evaluation apparatus, method and program

Info

Publication number: JP5350300B2
Application number: JP2010068798A
Authority: JP
Inventors: 和久山岸; 淳岡本
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2010-03-24
Filing date: 2010-03-24
Publication date: 2013-11-27
Anticipated expiration: 2030-03-24
Also published as: JP2011205253A

Description

The present invention relates to a transcoded video quality objective evaluation apparatus, method, and program, and in particular to a transcoded video quality objective evaluation apparatus, method, and program for objectively evaluating the quality of transcoded video in IPTV services and video distribution services provided over an IP (Internet Protocol) network such as the Internet.

As Internet access lines become faster and gain bandwidth, video communication services that transfer video media, including video and audio, between terminals or between servers and terminals over the Internet are expected to become widespread.

The Internet is a network whose communication quality is not necessarily guaranteed. When audio and video media are carried over it, the bit rate drops when the line bandwidth between user terminals is narrow, and packet loss and packet transfer delay occur when the line is congested. As a result, the quality that the user perceives for the audio and video media (quality of experience, QoE) degrades.

Specifically, when an original video is encoded, or when an already encoded video is re-encoded (transcoded), block-based processing degrades the video signal within each frame and the high-frequency components of the video signal are lost, which reduces the perceived fineness of the video as a whole.

As a result, the user perceives blurring, smearing, and mosaic-like distortion in the received video.

To confirm that such a video communication service is being provided with good quality, it is important to measure the video quality experienced by the user before or during service provision and to monitor that the quality of the video provided to the user is high.

A video quality objective evaluation technique that can appropriately express the video quality experienced by the user is therefore needed.

Conventional methods for evaluating video quality include subjective quality evaluation methods (see, for example, Non-Patent Document 1) and objective quality evaluation methods (see, for example, Non-Patent Document 2).

In a subjective quality evaluation method, multiple users actually watch the video and rate the quality they experience on a five-grade scale (nine-grade and eleven-grade scales are also used), either a quality scale (very good, good, fair, bad, very bad) or an impairment scale (no degradation perceived; degradation perceived but not annoying; slightly annoying; annoying; very annoying). The video quality ratings for each condition (for example, a packet loss rate of 0% and a bit rate of 20 Mbps) are averaged over all users, and the result is defined as the MOS (Mean Opinion Score) or DMOS (Degradation Mean Opinion Score) value.
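The averaging that defines a MOS value can be sketched as follows. This is only an illustration: the condition labels, the ratings, and the function name `mean_opinion_score` are made up here and do not come from the patent.

```python
from statistics import mean

# Hypothetical five-grade ratings (5 = very good ... 1 = very bad),
# one list per test condition. All numbers are invented for illustration.
ratings_by_condition = {
    "loss=0%, 20 Mbps": [5, 4, 5, 4, 4],
    "loss=1%, 20 Mbps": [3, 2, 3, 3, 2],
}


def mean_opinion_score(ratings):
    """Average the ratings given by all viewers for one condition."""
    if not ratings:
        raise ValueError("at least one rating is required")
    return mean(ratings)


# One MOS value per condition.
mos_by_condition = {
    condition: mean_opinion_score(ratings)
    for condition, ratings in ratings_by_condition.items()
}
```

A DMOS value would be obtained the same way from ratings given on the impairment scale.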

However, subjective quality evaluation requires not only special dedicated equipment (such as monitors) and an evaluation facility in which the evaluation environment (such as room illuminance and room noise) can be controlled, but also many users who actually rate the video. It therefore takes time until the users complete the evaluation, which makes the method unsuitable for evaluating quality in real time.

Development of an objective quality evaluation method is therefore desired, one that outputs a video quality evaluation value from feature quantities that affect video quality (for example, the bit rate, the number of bits per frame, and packet loss information).

One conventional objective quality estimation method takes the original video before encoding and the degraded video after encoding as input, compares the two video signals (that is, pixel values), and derives a video quality evaluation value from feature quantities that affect video quality (see, for example, Non-Patent Document 2).

Another conventional objective quality evaluation method takes only the degraded video after encoding as input, without the original video before encoding, derives feature quantities that affect video quality from the degraded video signal, and derives a video quality evaluation value (see, for example, Non-Patent Document 3).

A further conventional objective quality evaluation method takes the transmitted packets as input, derives feature quantities that affect video quality from those packets, and derives a video quality evaluation value (see, for example, Non-Patent Documents 4 and 5).

As described above, most conventional objective quality evaluation methods estimate a video quality evaluation value from packets or from the video signal (pixel values).

Non-Patent Document 1: ITU-T Recommendation P.910
Non-Patent Document 2: ITU-T Recommendation J.247
Non-Patent Document 3: J. Yang, H. Choi, and T. Kim, "Noise estimation for blocking artifacts reduction in DCT coded images," IEEE Trans. on CSVT, vol. 10, no. 7, pp. 1116-1134, Oct. 2000.
Non-Patent Document 4: K. Yamagishi and T. Hayashi, "Non-intrusive Packet-layer Model for Monitoring Video Quality of IPTV Services," IEICE Trans. Fundamentals, vol. E92-A, no. 12, pp. 3297-3306, Dec. 2009.
Non-Patent Document 5: K. Watanabe, K. Yamagishi, J. Okamoto, and A. Takahashi, "Proposal of new QoE assessment approach for quality management of IPTV services," IEEE ICIP 2008, pp. 2060-2063, Oct. 2008.
Non-Patent Document 6: Stephane Pechard, Dominique Barba, and Patrick Le Callet, "Video Quality Model based on a spatio-temporal features extraction for H.264-coded HDTV sequences," in Proceedings of the IEEE Picture Coding Symposium, PCS2007, 2007.

However, the technique of Non-Patent Document 2 presupposes that the original video signal is available. When the quality of the transcoded video must be evaluated using only the once-encoded video and the re-encoded (transcoded) video, its quality evaluation accuracy drops significantly.

Specifically, the video before transcoding is an encoding of the original video, so it already contains noise such as block noise, blur noise, mosquito noise, and flicker noise. Edges that could have been extracted if the original video were available can then no longer be extracted, and the quality evaluation accuracy drops significantly.

The techniques of Non-Patent Documents 3, 4, and 5 evaluate video quality from the received video or the received packets. Because they use only the received video or packets and cannot use the encoded video before re-encoding, they cannot capture the quality degradation already present in the encoded video before re-encoding.

The present invention has been made in view of the above points. Its object is to solve the above problems by providing a transcoded video quality objective evaluation apparatus, method, and program that can estimate the video quality value of a transcoded video with high accuracy by deriving a video quality evaluation value from feature quantities derived from the encoded videos before and after transcoding.

FIG. 1 is a diagram showing the principle configuration of the present invention.

The present invention (claim 1) is a transcoded video quality objective evaluation apparatus that objectively evaluates the quality of transcoded video, comprising:
edge/texture amount extraction means 100 for deriving an edge amount and a texture amount from an input encoded video before transcoding;
first feature quantity extraction means 200 for deriving at least one first feature quantity from among (a) statistics derived from the edge amount and the texture amount derived by the edge/texture amount extraction means, namely, for the per-frame ratio of the edge amount to the texture amount, the average over all video frames and the maximum and the standard deviation of the set of averages taken over fixed sections, and (b) a motion amount, which is a feature quantity derived from the encoded video before transcoding;
second feature quantity extraction means 400 for deriving a second feature quantity from difference information, computed using the numbers of pixels in the horizontal and vertical directions, between the input encoded video before transcoding and the transcoded video after transcoding;
first video quality estimation means 300 for deriving, from the first feature quantity output by the first feature quantity extraction means, a video quality evaluation value of the input encoded video before transcoding as a first video quality evaluation value; and
second video quality estimation means 500 for deriving, from the first video quality evaluation value and the second feature quantity derived by the second feature quantity extraction means 400, a second video quality evaluation value indicating the quality evaluation value of the transcoded video after transcoding.

The present invention (claim 2) is a transcoded video quality objective evaluation apparatus that objectively evaluates the quality of transcoded video, comprising:
edge/texture amount extraction means for deriving an edge amount and a texture amount from an input encoded video before transcoding;
first feature quantity extraction means for deriving at least one first feature quantity from among (a) statistics derived from the edge amount and the texture amount derived by the edge/texture amount extraction means, namely, for the per-frame ratio of the edge amount to the texture amount, the average over all video frames and the maximum and the standard deviation of the set of averages taken over fixed sections, and (b) a motion amount, which is a feature quantity derived from the encoded video before transcoding;
second feature quantity extraction means for deriving a second feature quantity from difference information, computed using the numbers of pixels in the horizontal and vertical directions, between the input encoded video before transcoding and the transcoded video after transcoding; and
second video quality estimation means for deriving, from the first feature quantity derived by the first feature quantity extraction means and the second feature quantity derived by the second feature quantity extraction means, a second video quality evaluation value indicating the quality evaluation value of the transcoded video after transcoding.

In the present invention (claim 4), the first feature quantity extraction means according to claim 1 or 2 includes video frame average feature quantity derivation means that, based on the edge amount and the texture amount extracted for each video frame by the edge/texture amount extraction means, calculates the total edge amount over all video frames of the video under estimation and likewise the total texture amount over all video frames of the video under estimation, and derives a video frame average feature quantity by dividing the total edge amount by the total texture amount.
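The video frame average feature quantity of claim 4 is a ratio of two totals. A minimal sketch, assuming the per-frame edge and texture amounts are already available as plain lists; the function name and the error handling for a zero texture total are assumptions, since the text does not say how that case is treated.

```python
def video_frame_average_feature(edge_amounts, texture_amounts):
    """Total edge amount over all frames divided by total texture amount.

    edge_amounts and texture_amounts hold one value per video frame.
    """
    if len(edge_amounts) != len(texture_amounts):
        raise ValueError("need one edge and one texture amount per frame")
    total_texture = sum(texture_amounts)
    if total_texture == 0:
        # Assumption: the source does not define this case, so fail loudly.
        raise ValueError("total texture amount is zero")
    return sum(edge_amounts) / total_texture
```

For example, edge amounts of 120, 90, and 150 with texture amounts of 400, 300, and 500 give 360 / 1200 = 0.3.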

In the present invention (claim 5), the first feature quantity extraction means according to claim 1, 2, or 4 further includes:
video frame feature quantity derivation means for deriving, based on the edge amount and the texture amount extracted for each video frame by the edge/texture amount extraction means, a video frame feature quantity for each video frame by dividing the total edge amount by the total texture amount;
average feature quantity derivation means for deriving, for the plurality of video frames in a fixed section of the video under estimation, an average feature quantity by summing the video frame feature quantities derived by the video frame feature quantity derivation means and dividing the sum by the number of video frames;
maximum feature quantity derivation means for deriving a maximum feature quantity, which is the largest of the average feature quantities derived by the average feature quantity derivation means for the fixed sections across the whole video under estimation; and
standard deviation feature quantity derivation means for deriving a standard deviation feature quantity, which is the standard deviation of the average feature quantities derived by the average feature quantity derivation means for the fixed sections across the whole video under estimation.
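The chain of claim 5 (per-frame ratio, per-section average, then maximum and standard deviation over the sections) can be sketched as follows. The function names, the `section_length` parameter, the handling of a shorter final section, and the use of the population standard deviation are all assumptions; the text specifies none of them.

```python
from statistics import mean, pstdev


def frame_features(edge_amounts, texture_amounts):
    """Per-frame feature: edge amount divided by texture amount."""
    if len(edge_amounts) != len(texture_amounts):
        raise ValueError("need one edge and one texture amount per frame")
    # A frame with zero texture amount raises ZeroDivisionError here.
    return [e / t for e, t in zip(edge_amounts, texture_amounts)]


def section_averages(features, section_length):
    """Average the per-frame features over consecutive fixed-length sections.

    Assumption: a shorter final section is kept and averaged as-is.
    """
    if section_length <= 0:
        raise ValueError("section_length must be positive")
    return [
        mean(features[i:i + section_length])
        for i in range(0, len(features), section_length)
    ]


def max_and_std(averages):
    """Maximum and (population) standard deviation of the section averages."""
    return max(averages), pstdev(averages)
```

With four frames whose ratios are 0.1, 0.2, 0.3, and 0.4 and sections of two frames, the section averages are 0.15 and 0.35, so the maximum feature quantity is 0.35 and the standard deviation feature quantity is 0.1.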

In the present invention (claim 6), the first feature quantity extraction means according to claim 1, 2, 4, or 5 further includes motion amount derivation means for deriving, from the input encoded video before transcoding, a motion amount indicating the motion of the video.
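The text here does not define how the motion amount is computed. Purely as an illustration, the sketch below uses one common stand-in, the mean absolute luma difference between consecutive frames; this choice, the frame representation (lists of rows of luma samples), and the function name are assumptions and should not be read as the patented method.

```python
def motion_amount(frames):
    """Mean absolute luma difference between consecutive frames.

    frames is a list of frames, each a list of rows of luma samples.
    This metric is an illustrative assumption, not taken from the source.
    """
    if len(frames) < 2:
        raise ValueError("at least two frames are required")
    per_pair = []
    for previous, current in zip(frames, frames[1:]):
        total = sum(
            abs(a - b)
            for row_prev, row_cur in zip(previous, current)
            for a, b in zip(row_prev, row_cur)
        )
        pixel_count = sum(len(row) for row in current)
        per_pair.append(total / pixel_count)
    # Average the per-pair differences over the whole sequence.
    return sum(per_pair) / len(per_pair)
```

A static sequence yields 0, and larger values indicate more change from frame to frame.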

In the present invention (claim 7), the second feature quantity extraction means according to claim 1 or 2 includes PSNR derivation means for deriving the peak signal-to-noise ratio (PSNR) from the input encoded video before transcoding and the transcoded video after transcoding.
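As a rough illustration of the PSNR feature, the sketch below applies the standard definition PSNR = 10 log10(peak^2 / MSE) to one frame pair, with the encoded frame before transcoding as the reference and the mean squared error normalised by the horizontal and vertical pixel counts. The frame representation (rows of 8-bit luma samples), the peak value of 255, the function name, and the requirement that both frames have the same resolution are assumptions.

```python
import math


def psnr(reference, test, peak=255):
    """PSNR in dB between two equally sized frames (lists of rows)."""
    height = len(reference)
    width = len(reference[0])
    if len(test) != height or any(len(row) != width for row in test):
        # Assumption: frames of different resolution must be rescaled first.
        raise ValueError("frames must have the same resolution")
    squared_error = sum(
        (r - t) ** 2
        for row_ref, row_test in zip(reference, test)
        for r, t in zip(row_ref, row_test)
    )
    mse = squared_error / (width * height)
    if mse == 0:
        return math.inf  # identical frames
    return 10 * math.log10(peak ** 2 / mse)
```

A per-video value could then be obtained by averaging the per-frame results, although the text does not state which aggregation is used.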

FIG. 2 is a diagram for explaining the principle of the present invention.

The present invention (claim 8) is a transcoded video quality objective evaluation method for objectively evaluating the quality of transcoded video, in which a transcoded video quality objective evaluation apparatus performs:
an edge/texture amount extraction step (step 1) of deriving an edge amount and a texture amount from an input encoded video before transcoding;
a first feature quantity extraction step (step 2) of deriving at least one first feature quantity from among (a) statistics derived from the edge amount and the texture amount derived in the edge/texture amount extraction step, namely, for the per-frame ratio of the edge amount to the texture amount, the average over all video frames and the maximum and the standard deviation of the set of averages taken over fixed sections, and (b) a motion amount, which is a feature quantity derived from the encoded video before transcoding;
a second feature quantity extraction step (step 3) of deriving a second feature quantity from difference information, computed using the numbers of pixels in the horizontal and vertical directions, between the input encoded video before transcoding and the transcoded video after transcoding;
a first video quality estimation step (step 4) of deriving, from the first feature quantity output by the first feature quantity extraction step, a video quality evaluation value of the input encoded video before transcoding as a first video quality evaluation value; and
a second video quality estimation step (step 5) of deriving, from the first video quality evaluation value and the second feature quantity derived in the second feature quantity extraction step, a second video quality evaluation value indicating the quality evaluation value of the transcoded video after transcoding.
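The data flow of steps 1 to 5 can be sketched as a small pipeline in which every stage is supplied by the caller. The class name, the type aliases, and in particular the toy stand-in functions and their numbers are hypothetical; the text does not give the estimation formulas, so none are implied here.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

Frame = Sequence[Sequence[int]]  # assumed: rows of luma samples
Video = Sequence[Frame]


@dataclass
class TranscodeQualityPipeline:
    """Wires the five steps together; each stage is injected by the caller."""
    extract_edge_texture: Callable[[Video], Tuple[Sequence[float], Sequence[float]]]
    extract_first_features: Callable[..., Sequence[float]]
    extract_second_feature: Callable[[Video, Video], float]
    estimate_first_quality: Callable[[Sequence[float]], float]
    estimate_second_quality: Callable[[float, float], float]

    def evaluate(self, encoded: Video, transcoded: Video) -> float:
        # Step 1: edge and texture amounts from the pre-transcoding video.
        edges, textures = self.extract_edge_texture(encoded)
        # Step 2: first feature quantities (statistics and/or motion amount).
        first_features = self.extract_first_features(edges, textures, encoded)
        # Step 3: second feature quantity from both videos.
        second_feature = self.extract_second_feature(encoded, transcoded)
        # Step 4: quality of the encoded video before transcoding.
        first_quality = self.estimate_first_quality(first_features)
        # Step 5: quality of the transcoded video.
        return self.estimate_second_quality(first_quality, second_feature)


# Toy stand-ins with made-up numbers, only to show how data moves.
pipeline = TranscodeQualityPipeline(
    extract_edge_texture=lambda video: ([1.0] * len(video), [2.0] * len(video)),
    extract_first_features=lambda edges, textures, video: [sum(edges) / sum(textures)],
    extract_second_feature=lambda encoded, transcoded: 40.0,
    estimate_first_quality=lambda features: 5.0 - features[0],
    estimate_second_quality=lambda q1, f2: min(q1, f2 / 10.0),
)
two_frames = [[[0]], [[0]]]
score = pipeline.evaluate(encoded=two_frames, transcoded=two_frames)
```

Injecting the stages keeps the sketch neutral about how each feature and each quality model is actually computed.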

The present invention (claim 9) is a transcoded video quality objective evaluation method for objectively evaluating the quality of transcoded video, in which a transcoded video quality objective evaluation apparatus performs:
an edge/texture amount extraction step of deriving an edge amount and a texture amount from an input encoded video before transcoding;
a first feature quantity extraction step of deriving at least one first feature quantity from among (a) statistics derived from the edge amount and the texture amount derived in the edge/texture amount extraction step, namely, for the per-frame ratio of the edge amount to the texture amount, the average over all video frames and the maximum and the standard deviation of the set of averages taken over fixed sections, and (b) a motion amount, which is a feature quantity derived from the encoded video before transcoding;
a second feature quantity extraction step of deriving a second feature quantity from difference information, computed using the numbers of pixels in the horizontal and vertical directions, between the input encoded video before transcoding and the transcoded video after transcoding; and
a second video quality estimation step of deriving, from the first feature quantity derived in the first feature quantity extraction step and the second feature quantity derived in the second feature quantity extraction step, a second video quality evaluation value indicating the quality evaluation value of the transcoded video after transcoding.

In the present invention (claim 10), the edge/texture amount extraction step of claim 8 or 9 performs:
a clustering step of capturing the input encoded video before transcoding, creating an edge image, performing clustering based on the value of each pixel of the edge image, and deriving a clustering image;
a filtering step of filtering the clustering image derived in the clustering step to create a filtered image;
a difference clustering image derivation step of creating a difference clustering image from the difference between the clustering image derived in the clustering step and the filtered image derived in the filtering step; and
a counting step of counting, for each video frame, the edge amount and the texture amount in the difference clustering image derived in the difference clustering image derivation step.

In the present invention (claim 11), the first feature quantity extraction step of claim 8 or 9 performs a video frame average feature quantity derivation step that, based on the edge amount and the texture amount extracted for each video frame in the edge/texture amount extraction step, calculates the total edge amount over all video frames of the video under estimation and likewise the total texture amount over all video frames of the video under estimation, and derives a video frame average feature quantity by dividing the total edge amount by the total texture amount.

In the present invention (claim 12), the first feature quantity extraction step of claim 8, 9, or 11 further performs:
a video frame feature quantity derivation step of deriving, based on the edge amount and the texture amount extracted for each video frame in the edge/texture amount extraction step, a video frame feature quantity for each video frame by dividing the total edge amount by the total texture amount;
an average feature quantity derivation step of deriving, for the plurality of video frames in a fixed section of the video under estimation, an average feature quantity by summing the video frame feature quantities derived in the video frame feature quantity derivation step and dividing the sum by the number of video frames;
a maximum feature quantity derivation step of deriving a maximum feature quantity, which is the largest of the average feature quantities derived in the average feature quantity derivation step for the fixed sections across the whole video under estimation; and
a standard deviation feature quantity derivation step of deriving a standard deviation feature quantity, which is the standard deviation of the average feature quantities derived in the average feature quantity derivation step for the fixed sections across the whole video under estimation.

In the present invention (claim 13), the first feature quantity extraction step of claim 8, 9, 11, or 12 further performs a motion amount derivation step of deriving, from the input encoded video before transcoding, a motion amount indicating the motion of the video.

In the present invention (claim 14), the second feature quantity extraction step of claim 8 or 9 performs a PSNR derivation step of deriving the peak signal-to-noise ratio (PSNR) from the input encoded video before transcoding and the transcoded video after transcoding.

The present invention (claim 15) is a transcoded video quality objective evaluation program for causing a computer to function as each of the means constituting the transcoded video quality objective evaluation apparatus according to any one of claims 1 to 7.

Conventional techniques that extract block noise, blur noise, mosquito noise, flicker noise, and the like from an original video and a transcoded video receive, in the transcoding case, an already encoded video in place of the original video. They therefore cannot extract these noise types properly, and their video quality estimation accuracy drops significantly. Techniques that estimate video quality from only the received video or received packets cannot use the encoded video before transcoding, so they cannot capture the quality degradation of that encoded video, and their estimation accuracy likewise drops significantly.

In contrast, according to the present invention, the quality of the encoded video before transcoding is derived as a first video quality evaluation value, a feature quantity is derived from both the encoded video and the transcoded video, and a second video quality evaluation value is derived from the first video quality evaluation value and that feature quantity. This makes it possible to estimate an appropriate quality evaluation value for the transcoded video.

With the present invention, a provider of a video communication service can therefore monitor the video quality value of the video that users actually view, and can easily determine whether the service being provided maintains at least a certain level of quality for the user.

The provider of a video communication service can thus grasp and manage the actual quality of the service being provided in more detail than before.

FIG. 1 is a diagram showing the principle configuration of the present invention.
FIG. 2 is a diagram for explaining the principle of the present invention.
FIG. 3 is a configuration diagram of the video quality objective evaluation apparatus according to the first embodiment of the present invention.
FIG. 4 is a diagram showing the region division method in the first embodiment of the present invention.
FIG. 5 is a diagram for conceptually explaining the process of deriving the difference clustering image in the first embodiment of the present invention.
FIG. 6 is a flowchart of the operation of the video quality objective evaluation apparatus according to the first embodiment of the present invention.
FIG. 7 is a configuration diagram of the video quality objective evaluation apparatus according to the second embodiment of the present invention.
FIG. 8 is a flowchart of the operation of the video quality objective evaluation apparatus according to the second embodiment of the present invention.

Embodiments of the present invention are described below with reference to the drawings.

[First Embodiment]
The transcoded video quality objective evaluation apparatus according to this embodiment applies clustering to the encoded video to create a clustering image, derives feature amounts from that clustering image, and derives a first video quality evaluation value that quantitatively represents the video quality of the encoded video before transcoding. It then derives the PSNR (Peak Signal to Noise Ratio) from the encoded video and the transcoded video, and from the first video quality evaluation value and the PSNR it objectively derives a second video quality evaluation value (transcoded video quality evaluation value) indicating the quality of the transcoded video.

For example, in this embodiment, to realize objective video quality evaluation in video communication such as IPTV services and video distribution services provided over an IP network such as the Internet, the encoded video before transcoding and the transcoded video after transcoding are analyzed, and a transcoded video quality value is derived that quantitatively represents the feature amounts affecting the video quality of such video communication.

FIG. 3 shows the configuration of the video quality objective evaluation apparatus according to one embodiment of the present invention.

As shown in the figure, the video quality objective evaluation apparatus 1 comprises an edge/texture amount extraction unit 100, a first feature amount extraction unit 200, a first video quality estimation unit 300, a second feature amount extraction unit 400, and a second video quality estimation unit 500.

The edge/texture amount extraction unit 100 comprises a clustering unit 10, a filtering unit 11, a difference clustering image deriving unit 12, and a counting unit 13.

The first feature amount extraction unit 200 comprises a video frame average feature amount deriving unit 14, a video frame feature amount deriving unit 15, an average feature amount deriving unit 16, a maximum feature amount deriving unit 17, a standard deviation feature amount deriving unit 18, and a motion amount deriving unit 19.

The clustering unit 10 applies Sobel filters Fx and Fy to the input encoded video I(x,y,f) before transcoding, following the method of Non-Patent Document 6 mentioned above, and derives the edge images Sx(x,y,f) and Sy(x,y,f), one for the horizontal direction (X direction) and one for the vertical direction (Y direction). Specifically, an edge image is created according to equation (1) below, using the information of the eight pixels adjacent to the target pixel.
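The edge-image step can be sketched as follows. Equation (1) is not reproduced in this text, so the sketch assumes the standard 3×3 Sobel kernels; the names `FX`, `FY` and `sobel` are hypothetical, and border pixels are simply left at zero because they lack a full set of eight neighbours.

```python
# Standard 3x3 Sobel kernels (an assumption; equation (1) is not available here).
FX = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]    # responds to change along x
FY = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]    # responds to change along y


def sobel(frame):
    """Return (Sx, Sy) for one frame given as a list of rows of luminance values.

    Border pixels are left at 0 because they have no complete 8-pixel neighbourhood.
    """
    h, w = len(frame), len(frame[0])
    sx = [[0] * w for _ in range(h)]
    sy = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = gy = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    p = frame[y + dy][x + dx]
                    gx += FX[dy + 1][dx + 1] * p
                    gy += FY[dy + 1][dx + 1] * p
            sx[y][x] = gx
            sy[y][x] = gy
    return sx, sy
```

The subsequent classification into classes C1 to C4 depends on the region division of FIG. 4, which is not available here, so it is deliberately left out of the sketch.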

From the vertical and horizontal edge values derived for each pixel, a clustering image is created in which each pixel is classified into one of four classes, 1 to 4, by the region division method shown in FIG. 4 (the clustering image takes clustering values 1 to 4, corresponding to C1 to C4).

The filtering unit 11 creates a filtered image by applying an Erosion filter and a Dilatation filter to the clustering image output from the clustering unit 10, as shown in FIG. 5. FIG. 5 shows examples without padding (A) and with padding (B) for an image (8×8 pixels) clustered into four classes.

Specifically, the Erosion filter is applied to the clustering image C(x,y) (x is the horizontal pixel position, y is the vertical pixel position) according to equation (2) below to create an Erosion image E(x,y). The Dilatation filter is then applied to the Erosion image according to equation (3) below to create a Dilatation image D(x,y), that is, the filtered image. Applying the two filters yields a filtered image D(x,y) that is a smoothed version of the clustering image C(x,y), and C3 and C4 can be extracted appropriately from the difference between the two.

For pixels at the edge of the picture, there are no pixels outside the picture, so either the edge pixel values are padded or the edge pixels are left unprocessed.
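The two-stage filtering can be sketched as follows. Equations (2) and (3) are not reproduced in this text, so the sketch assumes a 3×3 minimum filter for Erosion and a 3×3 maximum filter for Dilatation, and it handles the picture border by replicating edge pixels (the padding option). All function names are hypothetical.

```python
def _filter3x3(img, reduce_fn):
    """Apply reduce_fn (min or max) over each pixel's 3x3 window.

    Coordinates outside the picture are clamped, which replicates edge pixels.
    """
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = []
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    window.append(img[yy][xx])
            out[y][x] = reduce_fn(window)
    return out


def erode(img):
    """Assumed Erosion: minimum over the 3x3 window."""
    return _filter3x3(img, min)


def dilate(img):
    """Assumed Dilatation: maximum over the 3x3 window."""
    return _filter3x3(img, max)


def filtered_image(c):
    """Erosion followed by Dilatation, giving the smoothed image D from C."""
    return dilate(erode(c))
```

Under these assumptions an isolated high-class pixel is removed by the filtering, while a region at least 3×3 pixels in size survives, so the difference between C and D highlights small or thin structures.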

The difference clustering image deriving unit 12 derives the difference clustering image from the clustering image C(x,y) output from the clustering unit 10 and the filtered image D(x,y) output from the filtering unit 11 as C(x,y)-D(x,y)+1; if C(x,y)-D(x,y)+1 is less than 1, the value is set to 1.

The counting unit 13 counts, in the difference clustering image, the number of pixels NC3 whose clustering value is 3 (edge amount) and the number of pixels NC4 whose clustering value is 4 (texture amount).
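As a minimal sketch of these two steps, the following computes the difference clustering image with the stated lower bound of 1 and then counts the pixels with values 3 and 4. The function names are hypothetical; images are plain lists of rows.

```python
def difference_clustering(c, d):
    """Per pixel: C - D + 1, with any result below 1 replaced by 1."""
    return [
        [max(cv - dv + 1, 1) for cv, dv in zip(c_row, d_row)]
        for c_row, d_row in zip(c, d)
    ]


def count_edge_texture(diff):
    """Return (NC3, NC4): pixel counts for value 3 (edge) and value 4 (texture)."""
    flat = [v for row in diff for v in row]
    return flat.count(3), flat.count(4)
```

For example, with C = [[1, 3], [4, 2]] and D = [[1, 1], [1, 3]], the difference image is [[1, 3], [4, 1]], giving NC3 = 1 and NC4 = 1.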

As shown in equation (4) below, the video frame average feature amount deriving unit 14 derives the video frame average feature amount P by taking the per-frame NC3 output from the counting unit 13, summed over all video frames and divided by the number of video frames, and dividing it by the per-frame NC4, likewise summed over all video frames and divided by the number of video frames.

Here, f is the video frame number and F is the total number of video frames (for example, 300 frames when the video to be estimated is 10 seconds at 30 fps).
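Equation (4) itself is not reproduced in this text. Based on the verbal description, it can plausibly be written as below, where N_{C3}(f) and N_{C4}(f) denote the per-frame counts and each sum runs over all F frames; the notation is an assumption.

```latex
P = \frac{\dfrac{1}{F}\displaystyle\sum_{f} N_{C3}(f)}
         {\dfrac{1}{F}\displaystyle\sum_{f} N_{C4}(f)}
```

Because the factor 1/F appears in both numerator and denominator, P equals the total edge count divided by the total texture count.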

As shown in equation (5) below, the video frame feature amount deriving unit 15 derives, for each video frame, the video frame feature amount P(f) by dividing that frame's NC3 output from the counting unit 13 by that frame's NC4.

Here, f is the video frame number.

The average feature amount deriving unit 16 treats a fixed span of Z frames (for example, 10 frames) as one interval and, according to equation (6) below, takes the average of the video frame feature amounts P(f) within that interval as the average feature amount Pt(k).

Here, f is the video frame number and k is the index of the averaging interval (for example, when the video to be estimated consists of 300 frames and Z is 10 frames, k takes values from 0 to 29).

The maximum feature amount deriving unit 17 derives the largest of the average feature amounts Pt(k) output from the average feature amount deriving unit 16 as the maximum feature amount MaxPt (for example, if Pt(10) is the largest among Pt(0) to Pt(29), then Pt(10) is MaxPt).

The standard deviation feature amount deriving unit 18 derives the standard deviation of the average feature amounts Pt(k) output from the average feature amount deriving unit 16 as the standard deviation feature amount StdPt.

Here, Stdev denotes the operator that computes a standard deviation.
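The per-frame ratio, the interval averages, and their maximum and standard deviation can be sketched as follows. Equations (5) and (6) are not reproduced in this text, so the sketch follows the verbal description; the function names are hypothetical. Two points are assumptions: the population standard deviation is used (the text does not say whether Stdev is the population or sample form), and a trailing interval shorter than Z frames is dropped.

```python
from statistics import pstdev


def frame_features(nc3, nc4):
    """P(f) = NC3(f) / NC4(f) per frame (NC4(f) is assumed non-zero)."""
    return [a / b for a, b in zip(nc3, nc4)]


def interval_averages(p_f, z):
    """Pt(k): mean of P(f) over consecutive intervals of z frames.

    A trailing partial interval is dropped (an assumption).
    """
    return [sum(p_f[i:i + z]) / z for i in range(0, len(p_f) - z + 1, z)]


def max_and_std(pt):
    """Return (MaxPt, StdPt), using the population standard deviation."""
    return max(pt), pstdev(pt)
```

With a 300-frame video and Z = 10, `interval_averages` returns 30 values, matching k = 0 to 29 in the description.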

The motion amount deriving unit 19 derives a motion amount TI indicating the motion between video frames. P, MaxPt, and StdPt obtained above are statistics of feature amounts derived per video frame and are unrelated to the motion of the video, so taking the motion amount into account enables accurate quality estimation.

Here, I(x,y,f) in equation (7) denotes the pixel at horizontal position x and vertical position y in video frame f. M(f) in equations (7) and (8) is the difference image between I(x,y,f) and I(x,y,f-1), TI(f) in equations (8) and (9) is the standard deviation of the inter-frame difference values for each video frame, and F in equation (9) is the total number of video frames (for example, 300 frames for 10 seconds of 30 fps video).
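The motion amount can be sketched as follows. Equations (7) to (9) are not reproduced in this text; the sketch follows the definitions of M(f) and TI(f) given above and assumes that equation (9) averages TI(f) over the available frame pairs, which the text does not confirm. The function name is hypothetical and the population standard deviation is an assumption.

```python
from statistics import pstdev


def motion_amount(frames):
    """Return TI for a sequence of frames (each a list of rows of pixel values).

    M(f) is the pixel-wise difference between frame f and frame f-1, TI(f) is the
    standard deviation of M(f) over the picture, and TI is assumed to be the mean
    of TI(f). Requires at least two frames.
    """
    ti_f = []
    for prev, cur in zip(frames, frames[1:]):
        m = [c - p for p_row, c_row in zip(prev, cur) for p, c in zip(p_row, c_row)]
        ti_f.append(pstdev(m))
    return sum(ti_f) / len(ti_f)
```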

The first video quality estimation unit 300 derives the first video quality evaluation value (Vq) according to equation (10) below from the video frame average feature amount P output from the video frame average feature amount deriving unit 14, the maximum feature amount MaxPt output from the maximum feature amount deriving unit 17, the standard deviation feature amount StdPt output from the standard deviation feature amount deriving unit 18, and the motion amount TI output from the motion amount deriving unit 19.

Here, a to i are coefficients specific to the video format (for example, QCIF, VGA, or HD).

Although Vq is expressed here as a cubic function, it may instead be expressed by another formula such as the one shown below.

Here, j to l are coefficients specific to the video format (for example, QCIF, VGA, or HD).

The second feature amount extraction unit 400 comprises a PSNR deriving unit 20.

The PSNR is derived using equations (12) to (14) below.

Here, y_transcode(x,y,f) denotes the pixel of the transcoded video at horizontal position x, vertical position y, and frame number f; y_code(x,y,f) denotes the pixel of the encoded video at horizontal position x, vertical position y, and frame number f; W is the total number of pixels in the horizontal direction; H is the total number of pixels in the vertical direction; and N is the total number of video frames.
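A PSNR computation between the encoded and transcoded video can be sketched as follows. Equations (12) to (14) are not reproduced in this text, so the sketch uses the conventional definition: the mean squared error over all W×H×N pixels, then 10·log10(peak² / MSE). The peak value of 255 assumes 8-bit samples, and the function name is hypothetical.

```python
import math


def psnr(code, transcode, peak=255.0):
    """PSNR in dB between two videos given as lists of frames (lists of rows).

    The mean squared error is taken over every pixel of every frame.
    Returns math.inf when the two videos are identical.
    """
    squared_error = 0.0
    count = 0
    for frame_c, frame_t in zip(code, transcode):
        for row_c, row_t in zip(frame_c, frame_t):
            for a, b in zip(row_c, row_t):
                squared_error += (a - b) ** 2
                count += 1
    mse = squared_error / count
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(peak * peak / mse)
```

Note that the reference signal here is the encoded video before transcoding, not the original source video, so the PSNR reflects only the degradation added by transcoding.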

The second video quality estimation unit 500 derives the second video quality evaluation value according to equation (15) below, using the first video quality evaluation value Vq and the feature amount PSNR.

Here, m, n, and o are coefficients specific to the video format (for example, QCIF, VGA, or HD).

In this embodiment, the quality evaluation value of the transcoded video is derived using only the PSNR as the feature amount extracted from both the encoded video and the transcoded video. However, feature amounts such as Min_HV described in ITU-T Recommendation J.247 (Non-Patent Document 2) may also be added with weights to evaluate the quality of the transcoded video.

Next, the operation of the video quality objective evaluation apparatus 1 according to this embodiment is described.

FIG. 6 is a flowchart of the operation of the video quality objective evaluation apparatus according to the first embodiment of the present invention.

When the degraded video is input to the clustering unit 10 of the video quality objective evaluation apparatus 1 (S101), the clustering unit 10 applies the Sobel filters Fx and Fy to the input encoded video y_code(x,y,f) (x is the horizontal position, y is the vertical position, f is the frame number) to derive the edge images Sx(x,y,f) and Sy(x,y,f), uses these edge images to cluster each pixel into classes 1 to 4 and derive the clustering image C(x,y,f) (S102), and outputs it to the filtering unit 11 and the difference clustering image deriving unit 12.

The filtering unit 11 takes the clustering image C(x,y,f) derived by the clustering unit 10 as input, applies the Erosion filter and the Dilatation filter, derives the filtered image D(x,y,f) according to equations (2) and (3) above (S103), and outputs it to the difference clustering image deriving unit 12.

The difference clustering image deriving unit 12 subtracts the filtered image D(x,y,f) output from the filtering unit 11 from the clustering image C(x,y,f) output from the clustering unit 10, adds 1, derives a difference clustering image with values 1 to 4 (S104), and outputs it to the counting unit 13.

The counting unit 13 counts, for each video frame, the numbers of C3 (edge amount) and C4 (texture amount) pixels in the difference clustering image output from the difference clustering image deriving unit 12 (S105), and outputs them to the video frame average feature amount deriving unit 14 and the video frame feature amount deriving unit 15.

The video frame average feature amount deriving unit 14 takes the C3 and C4 pixel counts output from the counting unit 13 and, over all video frames, derives the video frame average feature amount P by dividing the total C3 count by the total C4 count according to equation (4) above (S106), and outputs it to the first video quality estimation unit 300.

The video frame feature amount deriving unit 15 takes the C3 and C4 pixel counts output from the counting unit 13 and, for each video frame, divides the C3 pixel count by the C4 pixel count according to equation (5) above to derive the per-frame video frame feature amount P(f) (S107), and outputs it to the average feature amount deriving unit 16.

The average feature amount deriving unit 16 averages the video frame feature amounts P(f) output from the video frame feature amount deriving unit 15 over each fixed number of video frames according to equation (6) above to derive the average feature amounts Pt(k) (S108), and outputs them to the maximum feature amount deriving unit 17 and the standard deviation feature amount deriving unit 18.

The maximum feature amount deriving unit 17 derives the maximum feature amount MaxPt, the largest of the average feature amounts Pt(k) (S109), and outputs it to the first video quality estimation unit 300.

The standard deviation feature amount deriving unit 18 derives the standard deviation feature amount StdPt, the standard deviation of the average feature amounts Pt(k) (S110), and outputs it to the first video quality estimation unit 300.

The motion amount deriving unit 19 derives the motion amount TI indicating the motion of the video (S111) and outputs it to the first video quality estimation unit 300.

The first video quality estimation unit 300 derives the video quality evaluation value Vq from the video frame average feature amount P, the maximum feature amount MaxPt, the standard deviation feature amount StdPt, and the motion amount TI according to equation (10) or equation (11) above (S112), and outputs the first video quality evaluation value Vq.

The second feature amount extraction unit 400 derives the feature amount PSNR from the encoded video y_code(x,y,f) and the transcoded video y_transcode(x,y,f) according to equations (12) to (14) above (S113), and outputs it to the second video quality estimation unit 500.

The second video quality estimation unit 500 derives the second video quality evaluation value Vq_transcode from the first video quality evaluation value Vq and the feature amount PSNR according to equation (15) above (S114), and the processing ends.

Thus, according to this embodiment, the first video quality evaluation value Vq before transcoding is derived from the encoded video, the feature amount PSNR is derived from both the encoded video y_code(x,y,f) and the transcoded video y_transcode(x,y,f), and the first video quality evaluation value Vq and the second feature amount PSNR are used together. The quality evaluation value Vq_transcode of the transcoded video can therefore be calculated even when the original video is unavailable, which enables more accurate objective estimation of transcoded video quality than before.

Accordingly, a provider of a video communication service can easily determine whether the service being provided maintains at least a certain level of quality for the user, and can grasp and manage the actual quality of the service being provided in real time.

[Second Embodiment]
In the first embodiment described above, the first video quality evaluation value Vq is obtained first, and the feature amount PSNR is then added with a weight to derive the second video quality evaluation value (the video quality evaluation value of the transcoded video). This embodiment shows an example in which the feature amounts that make up the first video quality evaluation value and the feature amount PSNR are directly added with weights.

FIG. 7 shows the configuration of the video quality objective evaluation apparatus according to the second embodiment of the present invention.

In the figure, components identical to those in FIG. 3 are given the same reference numerals and their description is omitted. In this embodiment, the first video quality estimation unit 300 is not used, and a video quality estimation unit 600 is provided in place of the second video quality estimation unit 500.

In the video quality objective evaluation apparatus 2 of this embodiment, the video frame average feature amount deriving unit 14, the maximum feature amount deriving unit 17, the standard deviation feature amount deriving unit 18, and the motion amount deriving unit 19 output their feature amounts to the video quality estimation unit 600 instead of the first video quality estimation unit 300.

FIG. 8 is a flowchart of the operation of the video quality objective evaluation apparatus according to the second embodiment of the present invention. S201 to S211 are the same as S101 to S111 in FIG. 6 of the first embodiment. In S212, the PSNR deriving unit 20 derives the PSNR from the input encoded video and transcoded video. In S213, as shown in equation (16), the video quality estimation unit 600 directly derives the video quality evaluation value Vq_transcode of the transcoded video by directly adding, with weights, the video frame average feature amount P obtained from the video frame average feature amount deriving unit 14, the maximum feature amount MaxPt obtained from the maximum feature amount deriving unit 17, the standard deviation feature amount StdPt obtained from the standard deviation feature amount deriving unit 18, the motion amount TI obtained from the motion amount deriving unit 19, and the feature amount PSNR obtained from the PSNR deriving unit 20.

Here, u, v, w, x, y, and z are coefficients specific to the video format (for example, QCIF, VGA, or HD).
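The direct weighted addition can be sketched as follows. Equation (16) is not reproduced in this text, so the sketch assumes a linear combination in which u is a constant term and v, w, x, y, z weight P, MaxPt, StdPt, TI, and PSNR respectively; this assignment of coefficients to terms is an assumption, and the function name and the coefficient values in the usage note are purely illustrative.

```python
def transcoded_quality(p, max_pt, std_pt, ti, psnr_value, coef):
    """Weighted sum of the five feature amounts plus a constant term.

    coef is (u, v, w, x, y, z); the mapping of coefficients to terms is assumed,
    since equation (16) is not available in this text.
    """
    u, v, w, x, y, z = coef
    return u + v * p + w * max_pt + x * std_pt + y * ti + z * psnr_value
```

For instance, with illustrative coefficients (1, 0, 0, 0, 0, 0.1) and a PSNR of 30 dB, the function returns approximately 4.0. In practice the coefficients would be fitted per video format, as the description states.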

In this embodiment, the quality evaluation value of the transcoded video is derived using only the PSNR as the feature amount extracted from both the encoded video and the transcoded video. However, feature amounts such as Min_HV described in ITU-T Recommendation J.247 (Non-Patent Document 2) may also be added with weights to evaluate the quality of the transcoded video.

The video quality objective evaluation apparatuses 1 and 2 in the first and second embodiments above are realized by installing a computer program on a computer comprising a CPU (central processing unit), memory, and interfaces. The various functions of the video quality objective evaluation apparatus 1 described above are realized through the cooperation of the hardware resources of that computer and the computer program (software).

The operations of the components of the video quality objective evaluation apparatuses 1 and 2 shown in FIGS. 3 and 7 can be implemented as a program, which can be installed and executed on a computer used as the video quality objective evaluation apparatus or distributed over a network.

Furthermore, the program can be stored on a hard disk or on a portable storage medium such as a flexible disk or CD-ROM, and installed on a computer or distributed.

The present invention is not limited to the embodiments described above, and various modifications and applications are possible within the scope of the claims.

The present invention can be used in a transcoded video quality objective evaluation apparatus that estimates the transcoded video quality evaluation value of video communication such as IPTV services and video distribution services provided over an IP network.

1, 2 Video quality objective evaluation apparatus
10 Clustering unit
11 Filtering unit
12 Difference clustering image deriving unit
13 Counting unit
14 Video frame average feature amount deriving unit
15 Video frame feature amount deriving unit
16 Average feature amount deriving unit
17 Maximum feature amount deriving unit
18 Standard deviation feature amount deriving unit
19 Motion amount deriving unit
20 PSNR deriving unit
100 Edge/texture amount extraction means, edge/texture amount extraction unit
200 First feature amount extraction means, first feature amount extraction unit
300 First video quality estimation means, first video quality estimation unit
400 Second feature amount extraction means, second feature amount extraction unit
500 Second video quality estimation means
600 Video quality estimation unit

Claims

A transcoded video quality objective evaluation device that objectively evaluates the quality of transcoded video,
Edge / texture amount extraction means for deriving an edge amount and a texture amount from the input encoded video before transcoding;
A statistic derived from the edge amount and the texture amount derived from the edge / texture amount extraction means , and an average value of all video frames with respect to a ratio of the edge amount and the texture amount per video frame; A feature amount that is a maximum value and a standard deviation value of a set of average values for every certain interval, or at least one first feature amount of a motion amount that is a feature amount derived from an encoded video before the transcoding. First feature amount extraction means for deriving;
Second feature quantity extraction means for deriving a second feature quantity from difference information using the number of pixels in the horizontal and vertical directions of the input encoded video before transcoding and transcoded video after transcoding;
First video quality estimation means for deriving, as a first video quality evaluation value, a video quality evaluation value of the input encoded video before transcoding from the first feature quantity output from the first feature quantity extraction means;
Second video quality estimation means for deriving a second video quality evaluation value, indicating a quality evaluation value of the transcoded video after transcoding, from the first video quality evaluation value and the second feature quantity derived by the second feature quantity extraction means;
A transcoded video quality objective evaluation apparatus comprising:

A transcoded video quality objective evaluation device that objectively evaluates the quality of transcoded video,
Edge / texture amount extraction means for deriving an edge amount and a texture amount from the input encoded video before transcoding;
A statistic derived from the edge amount and the texture amount derived from the edge / texture amount extraction means , and an average value of all video frames with respect to a ratio of the edge amount and the texture amount per video frame; A feature amount that is a maximum value and a standard deviation value of a set of average values for every certain interval, or at least one first feature amount of a motion amount that is a feature amount derived from an encoded video before the transcoding. First feature amount extraction means for deriving;
Second feature quantity extraction means for deriving a second feature quantity from difference information using the number of pixels in the horizontal and vertical directions of the input encoded video before transcoding and transcoded video after transcoding;
Second video quality estimation means for deriving a second video quality evaluation value, indicating a quality evaluation value of the transcoded video after transcoding, from the first feature quantity derived by the first feature quantity extraction means and the second feature quantity derived by the second feature quantity extraction means;
A transcoded video quality objective evaluation apparatus comprising:

The edge / texture amount extraction means includes:
Clustering means for capturing the input encoded video before transcoding and creating an edge image, performing clustering based on the value of each pixel of the edge image, and deriving the clustered image;
Filtering means for filtering the clustering image derived by the clustering means to create a filtered image;
Difference clustering image deriving means for creating a difference clustering image from a difference value between the clustering image derived by the clustering means and the filtered image derived by the filtering means;
Counting means for counting the edge amount and the texture amount of the difference clustering image derived by the difference clustering image deriving means in units of video frames;
The transcoded video quality objective evaluation apparatus according to claim 1 or 2, characterized by comprising:

The first feature amount extraction means includes:
Video frame average feature amount deriving means for calculating, based on the edge amount and the texture amount extracted in units of video frames by the edge / texture amount extraction means, a total value of the edge amounts over all video frames of the estimation target video and a total value of the texture amounts over all video frames of the estimation target video, and for deriving a video frame average feature amount obtained by dividing the total value of the edge amounts by the total value of the texture amounts. The transcoded video quality objective evaluation apparatus according to claim 1 or 2.

The first feature amount extraction means includes:
Video frame feature amount deriving means for deriving, based on the edge amount and the texture amount extracted in units of video frames by the edge / texture amount extraction means, a video frame feature amount obtained by dividing the total value of the edge amounts by the total value of the texture amounts in units of video frames;
Average feature amount deriving means for deriving an average feature amount by summing the video frame feature amounts derived by the video frame feature amount deriving means over the plurality of video frames in a certain section of the estimation target video and dividing the sum by the number of video frames;
Maximum feature amount deriving means for deriving a maximum feature amount indicating the maximum value among the average feature amounts of all sections of the estimation target video, each average feature amount being derived by the average feature amount deriving means for the plurality of video frames in one section;
Standard deviation feature amount deriving means for deriving a standard deviation feature amount by taking the standard deviation of the average feature amounts of all sections of the estimation target video, each average feature amount being derived by the average feature amount deriving means for the plurality of video frames in one section;
The transcoded video quality objective evaluation apparatus according to claim 1, 2, or 4, characterized by further comprising the above means.
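
The per-frame ratio, the per-section average, and the maximum and standard deviation over the sections can be sketched as below. The fixed section length in frames, the use of the population standard deviation, and the function name are assumptions; the claim says only "a certain section" and "standard deviation".

```python
import statistics


def section_features(edge_amounts, texture_amounts, section_length):
    """Return (maximum feature amount, standard deviation feature amount)."""
    # Video frame feature amount: edge amount / texture amount per frame.
    ratios = [e / t for e, t in zip(edge_amounts, texture_amounts)]

    # Average feature amount: mean of the frame ratios within each section.
    averages = []
    for start in range(0, len(ratios), section_length):
        chunk = ratios[start:start + section_length]
        averages.append(sum(chunk) / len(chunk))

    # Statistics over the set of section averages.
    # Population standard deviation is an assumed choice.
    return max(averages), statistics.pstdev(averages)
```

With frame ratios 2, 4, 6, 8 and a section length of 2 frames, the section averages are 3 and 7, so the maximum feature amount is 7 and the standard deviation feature amount is 2.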

The first feature amount extraction means includes:
Motion amount deriving means for deriving a motion amount indicating the motion of the video from the input encoded video before transcoding;
The transcoded video quality objective evaluation apparatus according to claim 1, 2, 4 or 5, characterized by further comprising the above means.

The second feature amount extraction unit includes:
PSNR deriving means for deriving a peak signal-to-noise ratio (PSNR) from the input encoded video before transcoding and the transcoded video after transcoding;
The transcoded video quality objective evaluation apparatus, characterized by comprising the above means.
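
A minimal sketch of a PSNR computation between a decoded frame before transcoding and the corresponding frame after transcoding, using the common definition in which the squared pixel differences are normalised by the number of horizontal and vertical pixels. It assumes 8-bit grayscale frames of identical size (peak value 255); how frames of different resolution are aligned is not addressed here, and the function name is hypothetical.

```python
import math

import numpy as np


def psnr(reference, transcoded, peak=255.0):
    """PSNR in dB between two equally sized grayscale frames."""
    ref = np.asarray(reference, dtype=np.float64)
    out = np.asarray(transcoded, dtype=np.float64)
    if ref.shape != out.shape:
        raise ValueError("frames must have the same dimensions")
    height, width = ref.shape
    # Mean squared error over width x height pixels.
    mse = float(np.sum((ref - out) ** 2)) / (width * height)
    if mse == 0.0:
        return math.inf  # identical frames
    return 10.0 * math.log10(peak ** 2 / mse)
```

A per-video value could then be obtained by averaging the per-frame values, which is again an assumption rather than something stated in the claim.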

A transcoded video quality objective evaluation method for objectively evaluating the quality of transcoded video,
wherein a transcoded video quality objective evaluation apparatus performs:
An edge / texture amount extraction step of deriving an edge amount and a texture amount from the input encoded video before transcoding;
A first feature amount extraction step of deriving at least one first feature amount from among: the statistics derived from the edge amount and the texture amount derived in the edge / texture amount extraction step, namely the average value over all video frames of the ratio of the edge amount to the texture amount per video frame, and the maximum value and the standard deviation of the set of average values taken for each fixed section; and the motion amount, which is a feature amount derived from the encoded video before transcoding;
A second feature amount extraction step of deriving a second feature amount from difference information using the number of pixels in the horizontal direction and the vertical direction of the input encoded video before transcoding and the transcoded video after transcoding;
A first video quality estimation step of deriving, from the first feature amount output by the first feature amount extraction step, a video quality evaluation value of the input encoded video before transcoding as a first video quality evaluation value;
A second video quality estimation step of deriving, from the first video quality evaluation value and the second feature amount derived in the second feature amount extraction step, a second video quality evaluation value indicating the quality evaluation value of the transcoded video after transcoding;
A transcoded video quality objective evaluation method characterized by performing the above steps.
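
The data flow of the two estimation steps above can be sketched as follows. The claim does not give the form of either estimator, so both are passed in as callables; the function name and the two toy models in the usage lines are hypothetical placeholders for illustration only.

```python
from typing import Callable, Sequence, Tuple


def estimate_transcoded_quality(
    first_features: Sequence[float],
    second_feature: float,
    first_model: Callable[[Sequence[float]], float],
    second_model: Callable[[float, float], float],
) -> Tuple[float, float]:
    """Return (first, second) video quality evaluation values."""
    # Step 1: quality of the encoded video before transcoding,
    # estimated from the first feature amounts only.
    first_quality = first_model(first_features)
    # Step 2: quality after transcoding, estimated from the step-1 value
    # and the second feature amount (for example a PSNR).
    second_quality = second_model(first_quality, second_feature)
    return first_quality, second_quality


# Toy placeholder models (assumptions, not taken from the patent).
def toy_first_model(features):
    return 5.0 - 0.5 * features[0]


def toy_second_model(first_quality, psnr_db):
    return min(first_quality, 1.0 + psnr_db / 10.0)


q1, q2 = estimate_transcoded_quality([2.0], 25.0, toy_first_model, toy_second_model)
```

Keeping the estimators as parameters separates the structure of the method (two stages, the second depending on the first) from the fitted model forms, which this section does not specify.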

A transcoded video quality objective evaluation method for objectively evaluating the quality of transcoded video,
wherein a transcoded video quality objective evaluation apparatus performs:
An edge / texture amount extraction step of deriving an edge amount and a texture amount from the input encoded video before transcoding;
A first feature amount extraction step of deriving at least one first feature amount from among: the statistics derived from the edge amount and the texture amount derived in the edge / texture amount extraction step, namely the average value over all video frames of the ratio of the edge amount to the texture amount per video frame, and the maximum value and the standard deviation of the set of average values taken for each fixed section; and the motion amount, which is a feature amount derived from the encoded video before transcoding;
A second feature amount extraction step of deriving a second feature amount from difference information using the number of pixels in the horizontal direction and the vertical direction of the input encoded video before transcoding and the transcoded video after transcoding;
A second video quality estimation step of deriving, from the first feature amount derived in the first feature amount extraction step and the second feature amount derived in the second feature amount extraction step, a second video quality evaluation value indicating the quality evaluation value of the transcoded video after transcoding;
A transcoded video quality objective evaluation method characterized by performing the above steps.

In the edge / texture amount extraction step,
A clustering step of capturing the input encoded video before transcoding, creating an edge image, performing clustering based on the value of each pixel of the edge image, and deriving a clustering image;
A filtering step of filtering the clustering image derived in the clustering step to create a filtering image;
A difference clustering image deriving step of creating a difference clustering image from the difference values between the clustering image derived in the clustering step and the filtering image derived in the filtering step;
A counting step of counting, in units of video frames, the edge amount and the texture amount of the difference clustering image derived in the difference clustering image deriving step;
The transcoded video quality objective evaluation method according to claim 8 or 9, characterized in that the above steps are performed.

In the first feature amount extraction step,
A video frame average feature amount deriving step of calculating, based on the edge amount and the texture amount extracted in units of video frames in the edge / texture amount extraction step, the total value of the edge amounts over all video frames of the estimation target video and the total value of the texture amounts over all video frames of the estimation target video, and deriving a video frame average feature amount by dividing the total value of the edge amounts by the total value of the texture amounts;
The transcoded video quality objective evaluation method according to claim 8 or 9, characterized in that the above step is performed.

In the first feature amount extraction step,
A video frame feature amount deriving step of deriving, based on the edge amount and the texture amount extracted in units of video frames in the edge / texture amount extraction step, a video frame feature amount obtained by dividing the total value of the edge amounts by the total value of the texture amounts in units of video frames;
An average feature amount deriving step of deriving an average feature amount by summing the video frame feature amounts derived in the video frame feature amount deriving step over the plurality of video frames in a certain section of the estimation target video and dividing the sum by the number of video frames;
A maximum feature amount deriving step of deriving a maximum feature amount indicating the maximum value among the average feature amounts of all sections of the estimation target video, each average feature amount being derived in the average feature amount deriving step for the plurality of video frames in one section;
A standard deviation feature amount deriving step of deriving a standard deviation feature amount by taking the standard deviation of the average feature amounts of all sections of the estimation target video, each average feature amount being derived in the average feature amount deriving step for the plurality of video frames in one section;
The transcoded video quality objective evaluation method according to claim 8, 9 or 11, characterized in that the above steps are further performed.

In the first feature amount extraction step,
A motion amount deriving step of deriving a motion amount indicating the motion of the video from the input encoded video before transcoding;
The transcoded video quality objective evaluation method according to claim 8, 9, 11, or 12, characterized in that the above step is further performed.

In the second feature amount extraction step,
A PSNR deriving step of deriving a peak signal-to-noise ratio (PSNR) from the input encoded video before transcoding and the transcoded video after transcoding;
The transcoded video quality objective evaluation method, characterized in that the above step is performed.

A transcoded video quality objective evaluation program for causing a computer to function as each means constituting the transcoded video quality objective evaluation device according to any one of claims 1 to 7.