JP6431449B2

JP6431449B2 - Video matching apparatus, video matching method, and program

Info

Publication number: JP6431449B2
Application number: JP2015132330A
Authority: JP
Inventors: 喜美子川嶋; 仁志青木
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2015-07-01
Filing date: 2015-07-01
Publication date: 2018-11-28
Anticipated expiration: 2035-07-01
Also published as: JP2017017524A

Description

The present invention relates to a video matching apparatus, a video matching method, and a program.

To confirm that a video service is being provided with good quality, it is important to measure the video quality experienced by users before or during service provision, and to confirm and monitor that the quality of the video provided to users is good. A video quality evaluation technique that can appropriately evaluate the video quality experienced by users is therefore needed.

Methods for evaluating video quality include subjective evaluation methods (see, for example, Non-Patent Document 1) and objective evaluation methods (see, for example, Non-Patent Document 2). A subjective evaluation method requires a large number of users to evaluate video in a facility where the evaluation environment (room illuminance, room noise, and so on) can be reproduced. It is therefore costly and time-consuming, and is not suited to evaluating quality in real time. For this reason, objective video quality evaluation methods, which derive physical feature quantities from a video signal and output a video quality evaluation value, are being developed (see, for example, Non-Patent Document 2).

Non-Patent Document 2 describes an objective evaluation method that estimates video quality by comparing the physical feature quantities of a reference video signal with those of a degraded video signal, that is, the reference video signal after degradation caused by encoding or network transmission. Such a method requires the reference video signal and the degraded video signal to be aligned spatially and temporally. In other words, any shift in the time direction or in spatial position between the reference video signal and the degraded video signal must be corrected. Patent Document 1, with video distribution over the Internet in mind, proposes a method for spatial and temporal alignment when the reference video signal and the degraded video signal fall out of synchronization because of jitter in packet arrival intervals or packet loss.

Patent Document 1: Japanese Patent No. 5347012

Non-Patent Document 1: ITU-T Recommendation P.910, "Subjective video quality assessment methods for multimedia applications"
Non-Patent Document 2: ITU-T Recommendation J.247, "Objective perceptual multimedia video quality measurement in the presence of a full reference"
Non-Patent Document 3: Business Overview of Next Generation Broadcast Promotion Forum, http://www.nextv-f.jp/pdf/press20130617-2.pdf
Non-Patent Document 4: ARIB STD-B56 version 1.1, [online], [retrieved June 9, 2015], Internet (URL: http://www.arib.or.jp/english/html/overview/doc/2-STD-B56v1_1.pdf)
Non-Patent Document 5: THE World Heritage 4K, [online], [retrieved June 9, 2015], Internet (URL: http://www.bs-tbs.co.jp/sekaiisan4ksetsumei/)
Non-Patent Document 6: Hikari TV 4K, [online], [retrieved June 9, 2015], Internet (URL: http://www.hikaritv.net/4k/)

Meanwhile, 4K/8K video services, which offer higher resolution than conventional full HD video, have been attracting attention in recent years (see, for example, Non-Patent Document 3). In 4K/8K video services, the frame rate rises from the conventional 30 fps to 60 fps or 120 fps (see, for example, Non-Patent Document 4), so the change in luminance values between consecutive frames becomes smaller. In addition, to emphasize the high resolution, 4K/8K video services are increasingly producing programs with little motion, close to still images (see, for example, Non-Patent Documents 5 and 6).

Consequently, unlike the packet jitter and packet loss addressed by conventional techniques such as Patent Document 1, a new problem arises: temporal alignment is difficult because the change in luminance values between consecutive frames, and hence the temporal change, is small.

The present invention has been made in view of the above, and its object is to improve the accuracy of temporal alignment with a degraded video signal even for a video signal with small temporal change.

To solve the above problem, a video matching apparatus includes: a deriving unit that derives feature quantities of each of a first video signal and a second video signal, the second video signal being the first video signal with changed video quality; a classification unit that, based on the feature quantities of the first video signal, classifies the first video signal into one of a plurality of types distinguished at least by the magnitude of temporal change in the video; and a matching unit that calculates the amount of temporal shift between the first video signal and the second video signal based on those feature quantities of the first video signal and the second video signal that correspond to the type into which the first video signal has been classified.

The accuracy of temporal alignment with a degraded video signal can be improved even for a video signal with small temporal change. An objective evaluation method that accurately estimates video quality can therefore be carried out for 4K/8K video services as well.

FIG. 1 is a diagram showing an example hardware configuration of the video matching apparatus according to an embodiment of the present invention.
FIG. 2 is a diagram showing an example functional configuration of the video matching apparatus according to the embodiment of the present invention.
FIG. 3 is a diagram for explaining the first method of classifying into group A or otherwise.
FIG. 4 is a diagram for explaining the second method of classifying into group A or otherwise.
FIG. 5 is a diagram for explaining the first method of classifying into group B or group C.
FIG. 6 is a diagram for explaining the second method of classifying into group B or group C.
FIG. 7 is a diagram for explaining the third method of classifying into group B or group C.
FIG. 8 is a diagram for explaining the method of calculating the frame shift amount τ for group A.
FIG. 9 is a diagram showing the matched reference video signal R2 and the matched degraded video signal P2.
FIG. 10 is a diagram for explaining the method of calculating the frame shift amount τ for group B and group C.

Embodiments of the present invention are described below with reference to the drawings. FIG. 1 is a diagram showing an example hardware configuration of the video matching apparatus according to an embodiment of the present invention. The video matching apparatus 10 in FIG. 1 includes a drive device 100, an auxiliary storage device 102, a memory device 103, a CPU 104, an interface device 105, and so on, which are connected to one another by a bus.

The program that implements the processing in the video matching apparatus 10 is provided on a recording medium 101 such as a CD-ROM. When the recording medium 101 storing the program is set in the drive device 100, the program is installed from the recording medium 101 into the auxiliary storage device 102 via the drive device 100. The program need not, however, be installed from the recording medium 101; it may instead be downloaded from another computer over a network. The auxiliary storage device 102 stores the installed program together with the necessary files, data, and so on.

When an instruction to start the program is given, the memory device 103 reads the program from the auxiliary storage device 102 and stores it. The CPU 104 executes the functions of the video matching apparatus 10 in accordance with the program stored in the memory device 103. The interface device 105 is used as an interface for connecting to a network.

FIG. 2 is a diagram showing an example functional configuration of the video matching apparatus according to the embodiment of the present invention. In FIG. 2, the video matching apparatus 10 includes a video feature quantity deriving unit 11, a video classification unit 12, a video matching unit 13, and so on. Each of these units is implemented by processing that one or more programs installed in the video matching apparatus 10 cause the CPU 104 to execute. The video matching apparatus 10 also uses a video classification information storage unit 14. The video classification information storage unit 14 can be implemented using, for example, the auxiliary storage device 102, or a storage device that can be connected to the video matching apparatus 10 via a network.

The input data to the video matching apparatus 10 are a reference video signal R1 and a degraded video signal P1. The degraded video signal P1 is a video signal whose video quality has degraded as a result of the reference video signal R1 undergoing encoding, network transmission, or the like. The video matching apparatus 10 is used, for example, when evaluating the quality of a video signal degraded by encoding, network transmission, or the like, in order to temporally align the reference video signal R1 with the degraded video signal P1, which is the video signal under evaluation.

The video feature quantity deriving unit 11 derives feature quantities for each of the input reference video signal R1 and degraded video signal P1. Specifically, it derives the spatial information (SI: spatial perceptual information) and the temporal information (TI: temporal perceptual information) defined in ITU-T P.910 (Non-Patent Document 1), and the PSNR (peak signal-to-noise ratio), which represents encoding difficulty. The spatial information and the temporal information are derived for each frame of each input video signal. Below, the spatial information derived from the reference video signal R1 is written SI_R1(k) (k = 1, 2, …, r), and the temporal information derived from the reference video signal R1 is written TI_R1(k) (k = 1, 2, …, r). Likewise, the spatial information derived from the degraded video signal P1 is written SI_P1(k) (k = 1, 2, …, p), and the temporal information derived from the degraded video signal P1 is written TI_P1(k) (k = 1, 2, …, p).
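As an illustrative sketch (not part of the patent text), per-frame SI and TI values can be computed along the lines of ITU-T P.910: SI as the spatial standard deviation of the Sobel-filtered luminance frame, and TI as the spatial standard deviation of the difference between consecutive luminance frames. The function names, the list-of-rows frame representation, the skipping of border pixels, and the choice of 0.0 as the TI of the first frame are all assumptions made for this example.

```python
import math


def _std(values):
    """Population standard deviation of a flat list of numbers."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))


def spatial_information(frame):
    """SI of one luminance frame (list of rows); border pixels are skipped."""
    height, width = len(frame), len(frame[0])
    magnitudes = []
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            # Horizontal and vertical Sobel responses at (y, x).
            gx = (frame[y - 1][x + 1] + 2 * frame[y][x + 1] + frame[y + 1][x + 1]
                  - frame[y - 1][x - 1] - 2 * frame[y][x - 1] - frame[y + 1][x - 1])
            gy = (frame[y + 1][x - 1] + 2 * frame[y + 1][x] + frame[y + 1][x + 1]
                  - frame[y - 1][x - 1] - 2 * frame[y - 1][x] - frame[y - 1][x + 1])
            magnitudes.append(math.hypot(gx, gy))
    return _std(magnitudes)


def temporal_information(prev_frame, frame):
    """TI of one frame: spatial std of the pixel-wise difference to the previous frame."""
    diffs = [cur - prev
             for row_prev, row_cur in zip(prev_frame, frame)
             for prev, cur in zip(row_prev, row_cur)]
    return _std(diffs)


def derive_si_ti(frames):
    """Return per-frame SI and TI lists for a sequence of luminance frames."""
    si = [spatial_information(f) for f in frames]
    # Assumption: the first frame has no predecessor, so its TI is set to 0.0.
    ti = [0.0] + [temporal_information(a, b) for a, b in zip(frames, frames[1:])]
    return si, ti
```

Calling `derive_si_ti` once on the frames of R1 and once on the frames of P1 would yield sequences playing the roles of SI_R1(k), TI_R1(k) and SI_P1(k), TI_P1(k).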

Here, r is the number of frames in the reference video signal R1 and p is the number of frames in the degraded video signal P1. The two frame counts are not necessarily equal, because encoding, network transmission, and the like may cause frames to be dropped, and noise and the like may cause frames to be added.

The PSNR is derived by comparing the reference video signal R1 with the video obtained by encoding R1 at each of n encoding bit rates BR(l) (1 ≤ l ≤ n). The results are written PSNR_R1(l) (1 ≤ l ≤ n). PSNR_R1(l) is the average PSNR over the r frames when the reference video signal R1 is encoded at the encoding bit rate BR(l). So that PSNR_R1(l) can be compared with the encoding difficulty classification chart (FIG. 7) stored in the video classification information storage unit 14, the codec used for encoding is preferably the same as the one used to create that chart. For example, the standard software studied in international standardization may be used as the codec.
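A minimal sketch of the averaging step follows. It assumes 8-bit luminance (peak value 255), assumes the encoded video has already been decoded back into frames of the same size, and uses the conventional definition PSNR = 10·log10(peak² / MSE); the function names are hypothetical, and the encoder itself is outside the scope of the example.

```python
import math


def frame_psnr(ref_frame, enc_frame, peak=255.0):
    """PSNR in dB between one reference frame and its encoded/decoded counterpart."""
    squared_error = 0.0
    count = 0
    for row_ref, row_enc in zip(ref_frame, enc_frame):
        for ref, enc in zip(row_ref, row_enc):
            squared_error += (ref - enc) ** 2
            count += 1
    mse = squared_error / count
    if mse == 0:
        return math.inf  # identical frames
    return 10.0 * math.log10(peak * peak / mse)


def average_psnr(ref_frames, enc_frames, peak=255.0):
    """Average per-frame PSNR over the paired frames (the role of PSNR_R1(l))."""
    values = [frame_psnr(r, e, peak) for r, e in zip(ref_frames, enc_frames)]
    return sum(values) / len(values)
```

Repeating `average_psnr` for the video encoded at each bit rate BR(l) would give the n values PSNR_R1(1), …, PSNR_R1(n).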

The video classification unit 12 compares the spatial information (SI_R1), temporal information (TI_R1), or encoding difficulty (PSNR_R1) of the reference video signal R1, output from the video feature quantity deriving unit 11, with the various thresholds and other information stored in the video classification information storage unit 14, classifies the reference video signal R1 into one of three types, and outputs a video classification result X. Specifically, the video classification unit 12 first classifies the reference video signal R1, based on the TI values, into either the type of videos that contain a large temporal change (group A) or the type of videos that do not. If the reference video signal R1 does not fall into group A, the video classification unit 12 classifies it, based on the SI values and the PSNR indicating encoding difficulty, into either the type of complex videos (group B) or the type of videos that are not complex (group C).

Group A is the group of videos that contain a large temporal change. Specifically, videos containing a scene change belong to group A. Group B is the group of complex videos that contain no large temporal change and have a wide distribution of pixel values within a frame. Specifically, group B consists of videos that contain no scene change but are complex, with objects of various luminance present in the frame; an example is a video in which moving people or objects appear against a scenic background with little motion. Group C is the group of videos that contain no large temporal change and have a narrow distribution of pixel values within a frame. Specifically, group C consists of videos in which only objects with almost no motion are present, such as footage of a forest or a blue sky without a scene change. Videos are classified so that the video matching unit 13 can derive the frame shift amount τ between the reference video signal R1 and the degraded video signal P1 with less computation.

The video classification information storage unit 14 stores the information used to classify the reference video signal R1. For example, it stores the threshold Th_TI, the threshold Th_SI, the threshold Th_PSNR(m), the encoding difficulty classification chart, and the threshold Th_std_PSNR. The threshold Th_TI is a threshold on the TI values of the reference video signal R1 and is used to separate group A from the rest. The threshold Th_SI, the threshold Th_PSNR(m), the encoding difficulty classification chart, and the threshold Th_std_PSNR are thresholds and related information on the SI values or PSNR of the reference video signal R1 and are used to separate group B from group C.

The video matching unit 13 calculates the frame shift amount τ from the feature quantities of the reference video signal R1 and those of the degraded video signal P1, using a method that depends on the video classification result X output from the video classification unit 12.

The processing procedures executed by the video classification unit 12 and the video matching unit 13 are described in more detail below. FIG. 3 is a diagram for explaining the first method of classifying into group A or otherwise.

In step S101, the video classification unit 12 determines whether the maximum value (max(TI_R1)) of the TI values of the reference video signal R1 (TI_R1(k) (k = 1, 2, …, r)) exceeds the threshold Th_TI stored in the video classification information storage unit 14.

If max(TI_R1) exceeds the threshold Th_TI (Yes in step S101), the video classification unit 12 classifies the reference video signal R1 into group A (step S102). A max(TI_R1) greater than Th_TI means that the video contains a scene change, so the reference video signal R1 and the degraded video signal P1 can easily be aligned temporally based on the time at which the TI value is largest (t_TI_max_R1).

If, on the other hand, max(TI_R1) is less than or equal to the threshold Th_TI (No in step S101), the video classification unit 12 executes the processing for classifying the reference video signal R1 into group B or group C (step S103).

Instead of using max(TI_R1), the reference video signal R1 may be classified into group A or otherwise as shown in FIG. 4: the derivative (TI_R1') of the TI values of the reference video signal R1 (TI_R1(k) (k = 1, 2, …, r)) is derived, and the classification depends on whether its maximum value max(TI_R1') exceeds the threshold (Th_TI') stored in the video classification information storage unit 14. A large derivative of the TI value indicates the timing of a scene change.
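The two group A tests can be sketched as follows. The function names are hypothetical, the threshold values in the usage lines are made up for illustration, and approximating the derivative TI_R1' by the difference between consecutive TI values is an assumption, since the text does not state how the derivative is computed.

```python
def is_group_a(ti_r1, th_ti):
    """First method (step S101): group A if max(TI_R1) exceeds Th_TI."""
    return max(ti_r1) > th_ti


def is_group_a_by_derivative(ti_r1, th_ti_diff):
    """Second method: group A if the largest TI increase exceeds Th_TI'.

    Assumption: the derivative is approximated by TI_R1(k+1) - TI_R1(k).
    """
    diffs = [b - a for a, b in zip(ti_r1, ti_r1[1:])]
    return bool(diffs) and max(diffs) > th_ti_diff


# Illustrative values only: a TI spike at the fourth frame suggests a scene change.
ti_with_cut = [1.0, 1.2, 1.1, 30.0, 1.3]
ti_static = [1.0, 1.2, 1.1, 1.0, 1.3]
print(is_group_a(ti_with_cut, th_ti=20.0), is_group_a(ti_static, th_ti=20.0))
```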

Next, step S103 (classification into group B or group C) is described in detail. In step S103, the reference video signal R1 is classified into group B or group C based on its SI values or PSNR. SI values and PSNR are used for this classification because a large SI value means a complex video with a wide distribution of pixel values within the frame and a variety of objects, and a small PSNR means a complex video that is difficult to encode.

Three classification methods, the first through the third, are described here. The first method classifies the reference video signal R1 into group B or group C based on the maximum value (max(SI_R1)) of its SI values (SI_R1(k) (k = 1, 2, …, r)). FIG. 5 is a diagram for explaining the first method of classifying into group B or group C.

In step S201, the video classification unit 12 determines whether the maximum value (max(SI_R1)) of the SI values of the reference video signal R1 (SI_R1(k) (k = 1, 2, …, r)) exceeds the threshold Th_SI stored in the video classification information storage unit 14. If max(SI_R1) exceeds the threshold Th_SI (Yes in step S201), the video classification unit 12 classifies the reference video signal R1 into group B (step S202). A large SI value means a complex video with a wide distribution of pixel values within the frame and a variety of objects.

If, on the other hand, max(SI_R1) is less than or equal to the threshold Th_SI (No in step S201), the video classification unit 12 classifies the reference video signal R1 into group C (step S203).
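The first method reduces to a single threshold comparison on the largest SI value. The sketch below uses a hypothetical function name and string labels "B" and "C" for the two groups; it assumes the caller has already established that the video is not in group A.

```python
def classify_b_or_c_by_si(si_r1, th_si):
    """First method (steps S201 to S203): group B if max(SI_R1) exceeds Th_SI, else group C."""
    return "B" if max(si_r1) > th_si else "C"
```

Because the comparison is strict, a video whose largest SI value equals Th_SI exactly falls into group C, matching the "less than or equal to" branch of the text.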

Next, the second method is described. The second method classifies the reference video signal R1 into group B or group C based on the PSNR (PSNR_R1(m)) obtained when the reference video signal R1 is encoded at a single encoding bit rate (BR(m)). FIG. 6 is a diagram for explaining the second method of classifying into group B or group C.

In step S301, the video classification unit 12 determines whether the PSNR (PSNR_R1(m)) obtained when the reference video signal R1 is encoded at a single encoding bit rate BR(m) is less than the threshold Th_PSNR(m) stored in the video classification information storage unit 14. If PSNR_R1(m) is less than the threshold Th_PSNR(m) (Yes in step S301), the video classification unit 12 classifies the reference video signal R1 into group B (step S302). A PSNR_R1(m) below the threshold Th_PSNR(m) means a complex video that is difficult to encode.

If, on the other hand, PSNR_R1(m) is greater than or equal to the threshold Th_PSNR(m) (No in step S301), the video classification unit 12 classifies the reference video signal R1 into group C (step S303).

Next, the third method is described. The third method classifies the reference video signal R1 into group B or group C based on the characteristics of the PSNR values (PSNR_R1(l)) obtained when the reference video signal R1 is encoded at n encoding bit rates. FIG. 7 is a diagram for explaining the third method of classifying into group B or group C.

Using the PSNR values (PSNR_R1(l)) obtained when the reference video signal R1 is encoded at n encoding bit rates, the video classification unit 12 classifies the signal into whichever group has the closer characteristics, based on the encoding difficulty classification chart (FIG. 7) stored in the video classification information storage unit 14. Group B is the group of complex videos that are difficult to encode and whose PSNR changes greatly with encoding. Group C is the group of videos that are easy to encode and whose PSNR changes little with encoding. The encoding difficulty classification chart (FIG. 7) shows that the PSNR of videos in group B varies greatly with the encoding bit rate, whereas the PSNR of videos in group C varies little. Accordingly, the standard deviation of the PSNR (std_PSNR_R1) may be calculated using equation (1), and the reference video signal R1 may be classified into group B if std_PSNR_R1 exceeds the threshold Th_std_PSNR and into group C if std_PSNR_R1 is less than or equal to the threshold Th_std_PSNR.
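The two PSNR-based tests (the single-bit-rate threshold of the second method and the standard-deviation variant of the third method) can be sketched as below. Equation (1) is not reproduced in this text, so the use of the population standard deviation (`statistics.pstdev`) is an assumption; the sample standard deviation would be an equally plausible reading. The function names and group labels are hypothetical.

```python
import statistics


def classify_by_single_psnr(psnr_r1_m, th_psnr_m):
    """Second method (steps S301 to S303): group B if PSNR_R1(m) is below Th_PSNR(m)."""
    return "B" if psnr_r1_m < th_psnr_m else "C"


def classify_by_psnr_spread(psnr_r1, th_std_psnr):
    """Third method, std variant: group B if std_PSNR_R1 exceeds Th_std_PSNR.

    Assumption: std_PSNR_R1 is the population standard deviation of the n
    values PSNR_R1(1), ..., PSNR_R1(n).
    """
    std_psnr_r1 = statistics.pstdev(psnr_r1)
    return "B" if std_psnr_r1 > th_std_psnr else "C"
```

With made-up values, PSNRs of 30, 36, and 42 dB across three bit rates have a spread of about 4.9 dB, while 40, 41, and 42 dB have a spread of about 0.8 dB, so a threshold of 3 dB would separate the two cases.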

Next, the processing executed by the video matching unit 13 is described. First, the method of calculating the frame shift amount τ for group A is described. FIG. 8 is a diagram for explaining the method of calculating the frame shift amount τ for group A.

Because videos belonging to group A contain a scene change, the video matching unit 13 temporally aligns the reference video signal R1 and the degraded video signal P1 at the timing of the scene change. Specifically, the video matching unit 13 derives the frames (t_TImax_R1, t_TImax_P1) at which the TI values of the reference video signal R1 and the degraded video signal P1 (TI_R1(k) (k = 1, 2, …, r), TI_P1(k) (k = 1, 2, …, p)) are largest. From t_TImax_R1 and t_TImax_P1, the video matching unit 13 calculates the frame shift amount τ between the reference video signal R1 and the degraded video signal P1 using equation (2) below.

τ = t_TImax_R1 − t_TImax_P1 (2)

This shows that the degraded video signal P1 and the reference video signal R1 are shifted by τ frames relative to each other. The frame shift amount τ may instead be calculated from the timings at which the derivatives of TI_R1 and TI_P1 are largest.
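Equation (2) amounts to subtracting two arg-max positions. The sketch below uses hypothetical function names, numbers frames from 1 to match the k = 1, 2, … convention, and picks the first frame when several share the maximum TI value; the tie-breaking rule is an assumption.

```python
def frame_of_max(values):
    """1-based index of the first frame holding the maximum value."""
    return max(range(len(values)), key=values.__getitem__) + 1


def frame_shift_from_ti(ti_r1, ti_p1):
    """Equation (2): tau = t_TImax_R1 - t_TImax_P1."""
    return frame_of_max(ti_r1) - frame_of_max(ti_p1)


# Illustrative values only: the TI peak is at frame 4 in R1 and frame 2 in P1.
print(frame_shift_from_ti([1, 1, 1, 40, 1], [1, 40, 1, 1]))  # prints 2
```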

When τ > 0, the degraded video signal P1 is ahead of the reference video signal R1 by τ frames. As shown in FIG. 9, the video matching unit 13 therefore outputs a matched degraded video signal P2, obtained by removing the first τ frames of the degraded video signal P1, and a matched reference video signal R2, obtained by removing the last τ frames of the reference video signal R1.

When τ < 0, the reference video signal R1 is ahead of the degraded video signal P1 by |τ| frames. The video matching unit 13 therefore outputs a matched reference video signal R2, obtained by removing the first |τ| frames of the reference video signal R1, and a matched degraded video signal P2, obtained by removing the last |τ| frames of the degraded video signal P1.

Next, the method for calculating the frame shift amount τ for groups B and C is described. Along with higher resolution and higher definition, as in 4K/8K video, frame rates are also being raised. As the frame rate increases, the change between frames becomes smaller, so the change in the TI value also becomes smaller. However, even when the change between frames is small, the video is not a still image, so the luminance values within a frame do change. The video matching unit 13 therefore derives the frame shift amount τ between the reference video signal R1 and the degraded video signal P1 based on the SI values.

FIG. 10 is a diagram for explaining the method of calculating the frame shift amount τ for groups B and C. As shown in FIG. 10, the video matching unit 13 derives the cross-correlation coefficient between the SI values of the reference video signal R1 (SI_R1(k) (k = 1, 2, …, r)) and the SI values of the degraded video signal P1 (SI_P1(k) (k = 1, 2, …, p)), and derives the frame shift amount τ at which the cross-correlation coefficient is at its maximum.
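
One way to picture this search is the Python sketch below, which tries each candidate shift and keeps the one with the largest correlation between the overlapping SI values. The use of the Pearson coefficient as the cross-correlation coefficient, the search range max_shift, the minimum overlap min_overlap, the function names, and the sample SI values are all assumptions made for illustration. The sign convention, pairing SI_R1(k + τ) with SI_P1(k), is likewise an assumption, chosen so that τ has the same sign as in equation (2).

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # flat window: coefficient undefined, treated as no match
    return cov / sqrt(vx * vy)


def best_shift(si_r1, si_p1, max_shift, min_overlap=3):
    """Return (tau, coefficient) maximizing corr(SI_R1[k + tau], SI_P1[k])."""
    best_tau, best_c = 0, float("-inf")
    for tau in range(-max_shift, max_shift + 1):
        # Range of k for which both k (in P1) and k + tau (in R1) exist.
        start = max(0, -tau)
        stop = min(len(si_p1), len(si_r1) - tau)
        if stop - start < min_overlap:
            continue
        c = pearson(si_r1[start + tau:stop + tau], si_p1[start:stop])
        if c > best_c:
            best_tau, best_c = tau, c
    return best_tau, best_c


# Hypothetical SI sequences: P1 repeats R1 starting from R1's third frame.
si_r1 = [10, 10, 11, 13, 17, 12, 10, 11, 14, 10, 9, 12]
si_p1 = si_r1[2:] + [10, 11]

tau, coeff = best_shift(si_r1, si_p1, max_shift=4)
print(tau)  # 2
```

In this example the largest SI value sits at frame 5 of R1 and frame 3 of P1, so the shift found by the correlation search agrees with the difference of those two frame numbers.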

Because the complexity of the video differs between group B and group C, the number of frames used to derive the cross-correlation coefficient differs between the two groups. Specifically, letting Frame_B be the number of frames analyzed for group B and Frame_C be the number of frames analyzed for group C, Frame_B < Frame_C. Frame_C is set equal to the smaller of the number of frames of the reference video signal R1 and the number of frames of the degraded video signal P1.

The number of frames used to derive the cross-correlation coefficient differs between groups B and C because the SI value varies more in group B than in group C, so the frame shift amount τ can be calculated from the cross-correlation coefficient even with a small number of frames. Because the SI value of group C varies little, it is difficult to capture the variation over a short time. However, by obtaining the cross-correlation coefficient over a sufficiently long time span, even slight variations can be captured, which makes it possible to derive the frame shift amount τ between the reference video signal R1 and the degraded video signal P1. For group B as well, the number of frames analyzed could be set to Frame_C, but using the smaller number of frames Frame_B reduces the amount of computation.
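
The choice of analysis length described above can be sketched as follows. The function name analysis_frame_count, the string labels "B" and "C", and the idea of passing Frame_B in as a parameter are assumptions for illustration; the text only requires Frame_B < Frame_C and sets Frame_C to the smaller of the two frame counts r and p.

```python
def analysis_frame_count(group, r, p, frame_b):
    """Number of frames used for the cross-correlation, by classification type.

    group   -- classification result, "B" or "C" (hypothetical labels)
    r, p    -- frame counts of the reference and degraded video signals
    frame_b -- Frame_B, an assumed caller-supplied value for group B
    """
    frame_c = min(r, p)  # Frame_C: the smaller of the two frame counts
    if group == "C":
        return frame_c
    if group == "B":
        if not 0 < frame_b < frame_c:
            raise ValueError("Frame_B must satisfy 0 < Frame_B < Frame_C")
        return frame_b
    raise ValueError("cross-correlation alignment applies to groups B and C")


print(analysis_frame_count("B", r=600, p=590, frame_b=120))  # 120
print(analysis_frame_count("C", r=600, p=590, frame_b=120))  # 590
```

Only the first Frame_B or Frame_C SI values of each signal would then be passed to the correlation search, which is where the reduction in computation for group B comes from.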

Regarding the frame shift amount τ between the reference video signal R1 and the degraded video signal P1 derived from the cross-correlation coefficient, when τ > 0, the degraded video signal P1 is ahead of the reference video signal R1 by τ frames. Therefore, the video matching unit 13 outputs a matched degraded video signal P2, obtained by excluding the first τ frames of the degraded video signal P1, and a matched reference video signal R2, obtained by excluding the last τ frames of the reference video signal R1. When τ < 0, the reference video signal R1 is ahead of the degraded video signal P1 by |τ| frames. Therefore, the video matching unit 13 outputs a matched reference video signal R2, obtained by excluding the first |τ| frames of the reference video signal R1, and a matched degraded video signal P2, obtained by excluding the last |τ| frames of the degraded video signal P1.

Alternatively, instead of the matched degraded video signal P2 and the matched reference video signal R2 themselves, the frame position of the reference video signal R1 and the frame position of the degraded video signal P1 to be used for alignment may be output.

Through the above processing, the matched reference video signal R2 and the matched degraded video signal P2 are output. Alternatively, the frame position of the reference video signal and the frame position of the degraded video signal to be used for alignment are output, and the matched reference video signal R2 and the matched degraded video signal P2 are obtained from them. Using the matched reference video signal R2 and the matched degraded video signal P2 makes appropriate quality estimation possible with the reference video signal R1 and the degraded video signal P1 temporally aligned.

As described above, according to the present embodiment, when the physical feature quantities of the reference video signal and the degraded video signal are compared in order to estimate subjective quality, the accuracy of aligning their temporal positions can be improved even for video with small temporal change.

In the present embodiment, the video feature quantity derivation unit 11 is an example of a derivation unit. The video classification unit 12 is an example of a classification unit. The video matching unit 13 is an example of a matching unit. The threshold Th_TI or the threshold Th_TI' is an example of a first threshold. The threshold Th_SI is an example of a second threshold. The threshold Th_R1(m) is an example of a third threshold.

Although embodiments of the present invention have been described in detail above, the present invention is not limited to these specific embodiments, and various modifications and changes are possible within the scope of the gist of the present invention as set forth in the claims.

DESCRIPTION OF SYMBOLS
10 Video matching apparatus
11 Video feature quantity derivation unit
12 Video classification unit
13 Video matching unit
14 Video classification information storage unit
100 Drive device
101 Recording medium
102 Auxiliary storage device
103 Memory device
104 CPU
105 Interface device
R1 Reference video signal
P1 Degraded video signal
r Number of frames of the reference video signal R1
p Number of frames of the degraded video signal P1
SI_R1(k) Spatial information amount of the k-th frame of the reference video signal R1
TI_R1(k) Temporal information amount of the k-th frame of the reference video signal R1
PSNR_R1(k) Encoding difficulty of the k-th frame of the reference video signal R1
SI_P1(k) Spatial information amount of the k-th frame of the degraded video signal P1
TI_P1(k) Temporal information amount of the k-th frame of the degraded video signal P1
X Video classification result
A, B, C Video classification types
Th_TI Threshold for the TI value
max(TI_R1) Maximum value of TI_R1
TI_R1' Differential value of TI_R1
Th_TI' Threshold for the differential value of the TI value
max(SI_R1) Maximum value of SI_R1
Th_SI Threshold for the SI value
BR Encoding bit rate
n Types of encoding bit rates
PSNR_R1(l) PSNR when the reference video signal R1 is encoded at the encoding bit rate BR(l)
Th_PSNR(m) Threshold for the PSNR
std_PSNR_R1 Standard deviation of PSNR_R1(l)
Th_std_PSNR Threshold for the standard deviation of the PSNR
t_TImax_R1 Frame at which the TI value of the reference video signal R1 is at its maximum
t_TImax_P1 Frame at which the TI value of the degraded video signal P1 is at its maximum
τ Frame shift amount between the reference video signal R1 and the degraded video signal P1
Frame_B Time length used when calculating the cross-correlation coefficient for video belonging to group B
Frame_C Time length used when calculating the cross-correlation coefficient for video belonging to group C
R2 Matched reference video signal
P2 Matched degraded video signal

Claims

A video matching apparatus comprising:
a derivation unit that derives respective feature quantities of a first video signal and of a second video signal in which the video quality of the first video signal has changed;
a classification unit that classifies the first video signal, based on the feature quantity of the first video signal, into one of a plurality of types distinguished at least by the magnitude of temporal change of the video; and
a matching unit that calculates the amount of positional deviation between the first video signal and the second video signal based on those feature quantities of the first video signal and the second video signal that correspond to the type into which the first video signal has been classified.

The video matching apparatus according to claim 1, wherein
the derivation unit derives a temporal information amount for each frame of the first video signal and the second video signal,
the classification unit classifies the first video signal by comparing the temporal information amount for each frame of the first video signal with a first threshold, and
when the temporal information amount for each frame of the first video signal is greater than the first threshold, the matching unit calculates the amount of temporal positional deviation between the frames of the first video signal and the frames of the second video signal.

The video matching apparatus according to claim 2, wherein
the derivation unit further derives a spatial information amount for each frame of the first video signal and the second video signal, and
when the temporal information amount for each frame of the first video signal is equal to or less than the first threshold, the matching unit derives a cross-correlation coefficient between the spatial information amount for each frame of the first video signal and the spatial information amount for each frame of the second video signal, and derives the amount of temporal positional deviation of the frames in accordance with the cross-correlation coefficient.

The video matching apparatus according to claim 3, wherein
when the temporal information amount for each frame of the first video signal is equal to or less than the first threshold, the classification unit classifies the first video signal by comparing the maximum value of the spatial information amount for each frame of the first video signal with a second threshold, and
when the temporal information amount for each frame of the first video signal is equal to or less than the first threshold, the matching unit changes the number of frames used to derive the cross-correlation coefficient according to the result of the classification of the first video signal by the classification unit.

The video matching apparatus according to claim 3, wherein
the derivation unit derives an encoding difficulty level for the case where the first video signal is encoded at one encoding bit rate,
when the temporal information amount for each frame of the first video signal is equal to or less than the first threshold, the classification unit classifies the first video signal by comparing the encoding difficulty level of the first video signal with a third threshold, and
when the temporal information amount for each frame of the first video signal is equal to or less than the first threshold, the matching unit changes the number of frames used to derive the cross-correlation coefficient according to the result of the classification of the first video signal by the classification unit.

The video matching apparatus according to claim 3, wherein
the derivation unit derives respective encoding difficulty levels for the cases where the first video signal is encoded at a plurality of encoding bit rates,
when the temporal information amount for each frame of the first video signal is equal to or less than the first threshold, the classification unit classifies the first video signal based on the characteristics of the respective encoding difficulty levels obtained when encoding at the plurality of encoding bit rates, and
when the temporal information amount for each frame of the first video signal is equal to or less than the first threshold, the matching unit changes the number of frames used to derive the cross-correlation coefficient according to the result of the classification of the first video signal by the classification unit.

A video matching method in which a computer executes:
a derivation procedure of deriving respective feature quantities of a first video signal and of a second video signal in which the video quality of the first video signal has changed;
a classification procedure of classifying the first video signal, based on the feature quantity of the first video signal, into one of a plurality of types distinguished at least by the magnitude of temporal change of the video; and
a matching procedure of calculating the amount of positional deviation between the first video signal and the second video signal based on those feature quantities of the first video signal and the second video signal that correspond to the type into which the first video signal has been classified.

A program for causing a computer to function as each unit according to any one of claims 1 to 6.