JP2017532858A5

Info

Publication number: JP2017532858A5
Application number: JP2017513750A
Authority: JP
Filing date: 2015-09-03
Publication date: 2018-10-11
Anticipated expiration: 2035-09-03

Description

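Although the present invention has been particularly shown and described with reference to exemplary embodiments, those skilled in the art will understand that various changes in form and detail may be made without departing from the scope of the invention encompassed by the appended claims.
The present invention includes the following aspects as embodiments.
[Aspect 1]
A method of encoding a plurality of video frames,
the video frames having non-overlapping target blocks,
the method comprising:
encoding the plurality of video frames using an importance map, such that the importance map affects the encoding quality of each target block to be encoded in each video frame by adjusting quantization,
wherein the importance map is configured by:
setting up the importance map using temporal information and spatial information; and
causing the importance map to indicate, by computation, which parts of a video frame of the plurality of video frames are most noticeable to human perception, such that (i) in blocks where the importance map takes on high values, the block quantization parameter (QP) is reduced relative to the frame quantization parameter QP_frame, resulting in higher quality for those blocks, and (ii) in target blocks where the importance map takes on low values, the block quantization parameter is increased relative to the frame quantization parameter QP_frame, resulting in lower quality for those blocks.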
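The relationship between importance and block QP stated above can be illustrated with a minimal sketch. The text only fixes the direction of the adjustment (high importance lowers QP below QP_frame, low importance raises it); the normalization of importance to [0, 1], the neutral point 0.5, the `max_offset` parameter, and the 0 to 51 QP range are assumptions made for illustration, and the function name is hypothetical.

```python
def block_qp_from_importance(importance, qp_frame, max_offset=6,
                             qp_floor=0, qp_ceiling=51):
    """Map one block's importance value to a block QP.

    Assumptions (not from the source): importance is normalized to [0, 1],
    0.5 is neutral, the largest QP shift is max_offset, and the legal QP
    range is 0..51.
    """
    if not 0.0 <= importance <= 1.0:
        raise ValueError("importance must lie in [0, 1]")
    # High importance gives a negative offset (lower QP, higher quality);
    # low importance gives a positive offset (higher QP, lower quality).
    offset = round((0.5 - importance) * 2 * max_offset)
    return max(qp_floor, min(qp_ceiling, qp_frame + offset))
```

Any monotonically decreasing mapping from importance to QP would satisfy the same description; the linear form is chosen here only for brevity.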
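[Aspect 2]
The method of aspect 1, wherein the spatial information is provided by a rule-based spatial complexity map (SCM) whose initial step is to determine which target blocks in the frame have a variance greater than the average block variance var_frame of the frame, and
wherein blocks having a variance greater than the average block variance var_frame are assigned a quantization parameter (QP) value higher than the frame quantization parameter QP_frame, the assigned block QP value QP_block being scaled linearly between QP_frame and a quantization parameter upper limit QP_max according to how much the block variance var_block exceeds the average block variance var_frame.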
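A sketch of the linear scaling step may help. The text says QP_block moves linearly between QP_frame and QP_max as var_block grows beyond var_frame, but it does not say at what variance QP_max is reached; the `ratio_cap` parameter below (the var_block / var_frame ratio at which QP_max is reached) is therefore an assumption, as is the function name.

```python
def scm_block_qp(var_block, var_frame, qp_frame, qp_max, ratio_cap=4.0):
    """Assign a QP to a block based on how far its variance exceeds the frame mean.

    ratio_cap is a hypothetical parameter: the variance ratio at which the
    assigned QP saturates at qp_max.
    """
    if var_block <= var_frame:
        # Not a high-variance block; this rule leaves it at the frame QP.
        return float(qp_frame)
    if var_frame <= 0:
        # Degenerate frame mean; treat any positive variance as maximal.
        return float(qp_max)
    ratio = var_block / var_frame
    # t runs from 0 (ratio == 1) to 1 (ratio >= ratio_cap).
    t = min(1.0, (ratio - 1.0) / (ratio_cap - 1.0))
    return qp_frame + t * (qp_max - qp_frame)
```

For example, with QP_frame = 30, QP_max = 36 and the default cap, a block whose variance is 2.5 times the frame average lands halfway, at 33.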
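[Aspect 3]
The method of aspect 1, wherein the temporal information is provided by:
a temporal contrast sensitivity function (TCSF) that indicates which target blocks are most temporally noticeable to a human observer; and
a true motion vector map (TMVM) that indicates which target blocks correspond to foreground data,
wherein the TCSF is considered valid only for target blocks identified as foreground data.
[Aspect 4]
The method of aspect 2, wherein the assigned block QP value QP_block of a high-variance block is further refined by the TCSF and the TMVM, such that QP_block is increased by 2 if the TMVM identifies the target block as foreground data and the TCSF has a log contrast sensitivity value of less than 0.5 for that block.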
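The refinement rule for high-variance blocks can be written as a small sketch. The threshold 0.5 and the step of 2 come from the text; the function and parameter names are hypothetical, and the TMVM and TCSF values are assumed to be supplied per block by other parts of the encoder.

```python
def refine_high_variance_qp(qp_block, is_foreground, log_contrast_sensitivity,
                            tcsf_threshold=0.5, qp_step=2):
    """Raise QP for high-variance foreground blocks that are temporally hard to notice.

    is_foreground stands for the TMVM entry of the block;
    log_contrast_sensitivity stands for its TCSF value.
    """
    if is_foreground and log_contrast_sensitivity < tcsf_threshold:
        return qp_block + qp_step
    return qp_block
```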
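[Aspect 5]
The method of aspect 2, wherein the SCM further includes luminance masking, in which the assigned block QP value QP_block of target blocks that are either very bright (luminance above 170) or very dark (luminance below 60) is adjusted back to QP_max.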
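Luminance masking as described reduces to a range check. The thresholds 170 and 60 are taken from the text; treating "luminance" as the mean luma of the block on an 8-bit scale is an assumption, and the function name is hypothetical.

```python
def apply_luminance_masking(qp_block, mean_luma, qp_max,
                            bright_threshold=170, dark_threshold=60):
    """Reset QP to qp_max for very bright or very dark blocks.

    mean_luma is assumed to be the block's average luma on a 0..255 scale.
    """
    if mean_luma > bright_threshold or mean_luma < dark_threshold:
        return qp_max
    return qp_block
```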
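[Aspect 6]
The method of aspect 2, wherein the SCM further includes dynamic determination of the quantization parameter upper limit QP_max based on the quality level of the encoded video,
wherein, in the dynamic determination, quality is measured using an average structural similarity (SSIM) calculation of target blocks in intra (I) frames together with the average block variance var_frame of those frames, and
wherein, when the measured quality is low, the value of QP_max is lowered toward the frame quantization parameter QP_frame.
[Aspect 7]
The method of aspect 2, wherein blocks with very low variance are assigned a predetermined low QP value QP_block to ensure high-quality encoding in those regions, such that the lower the block variance, the lower the value of QP_block (and the higher the quality).
[Aspect 8]
The method of aspect 7, wherein the low QP value QP_block for very-low-variance blocks is determined first for I-frames and then determined for P-frames and B-frames using the ipratio and pbratio parameters.
[Aspect 9]
The method of aspect 7, wherein blocks that have low variance but are not considered to have very low variance are examined to determine whether quality enhancement is needed for the block, such that:
an initial estimate of the block QP, QP_block, is calculated by averaging the QP values of the already-encoded neighboring blocks to the left, top-left, right, and top-right of the current block;
an estimate SSIM_est of the SSIM of the current block is calculated from the SSIM values of the already-encoded neighboring blocks to the left, top-left, right, and top-right of the current block; and
the value of QP_block is lowered by 2 if SSIM_est is less than 0.9.
[Aspect 10]
The method of aspect 9, wherein the quality enhancement is applied only to blocks that are identified as foreground data by the TMVM and for which the TCSF has a log contrast sensitivity value greater than 0.8.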
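The neighbor-based quality check, together with the foreground and TCSF gate, can be sketched as follows. The thresholds 0.9 and 0.8 and the step of 2 come from the text. The text says only that SSIM_est is calculated "from" the neighbors' SSIM values, so using a plain mean is an assumption; the function name and the list-based interface are hypothetical.

```python
def estimate_and_enhance(neighbor_qps, neighbor_ssims, is_foreground,
                         log_contrast_sensitivity,
                         ssim_threshold=0.9, tcsf_threshold=0.8, qp_step=2):
    """Estimate QP and SSIM from already-encoded neighbors, then lower QP if needed.

    neighbor_qps / neighbor_ssims hold the values of the already-encoded
    neighboring blocks. Returns (qp_block, ssim_est).
    """
    if not neighbor_qps or len(neighbor_qps) != len(neighbor_ssims):
        raise ValueError("need matching, non-empty neighbor lists")
    qp_block = sum(neighbor_qps) / len(neighbor_qps)
    # Assumption: SSIM_est is the plain mean of the neighbors' SSIM values.
    ssim_est = sum(neighbor_ssims) / len(neighbor_ssims)
    # Enhancement is gated on foreground status and temporal sensitivity.
    eligible = is_foreground and log_contrast_sensitivity > tcsf_threshold
    if eligible and ssim_est < ssim_threshold:
        qp_block -= qp_step
    return qp_block, ssim_est
```

With neighbor QPs of 30, 32, 28, 30 and neighbor SSIM values averaging 0.88, an eligible block would start from an estimate of 30 and be lowered to 28.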
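[Aspect 11]
The method of aspect 3, wherein the temporal frequency of the TCSF is computed by using SSIM in the color space domain between the target block and its reference block to approximate wavelength, and using motion vector magnitudes and the frame rate to approximate velocity.
[Aspect 12]
The method of aspect 3, wherein the TCSF is calculated over multiple frames, such that the TCSF for the current frame is a weighted average of the TCSF maps of recent frames, with more recent frames receiving greater weight.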
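The multi-frame TCSF averaging can be sketched as below. The text requires only a weighted average in which more recent frames weigh more; the geometric weighting and the `decay` parameter are assumptions, as is representing each TCSF map as a flat list of per-block values.

```python
def temporal_average_tcsf(tcsf_maps, decay=0.5):
    """Weighted average of per-block TCSF maps, ordered oldest first.

    Assumption: weights fall off geometrically with age, so the newest map
    has weight 1, the one before it has weight decay, and so on.
    """
    if not tcsf_maps:
        raise ValueError("need at least one TCSF map")
    n = len(tcsf_maps)
    weights = [decay ** (n - 1 - i) for i in range(n)]
    total = sum(weights)
    num_blocks = len(tcsf_maps[0])
    return [
        sum(w * m[b] for w, m in zip(weights, tcsf_maps)) / total
        for b in range(num_blocks)
    ]
```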
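[Aspect 13]
The method of aspect 3, wherein the TMVM is set to 1 only for foreground data.
[Aspect 14]
The method of aspect 13, wherein foreground data is identified by computing the difference between the encoder motion vector for a given target block and the global motion vector for that block, blocks with sufficiently large differences being determined to be foreground data.
[Aspect 15]
The method of aspect 14, wherein, for data blocks identified as foreground data, the encoder motion vector is subtracted from the global motion vector to obtain a differential motion vector, and the magnitude of the differential motion vector is used to calculate the temporal frequency of the TCSF.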
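The foreground test and the differential motion vector can be sketched together. The text does not say how large a difference is "sufficiently large", so the `threshold` value below is a placeholder assumption, as are the function name and the use of Euclidean magnitude.

```python
import math


def classify_block(encoder_mv, global_mv, threshold=1.0):
    """Return (tmvm, differential_mv, magnitude) for one target block.

    encoder_mv and global_mv are (x, y) tuples. threshold is a hypothetical
    cut-off in the same units as the motion vectors.
    """
    # Differential motion vector: global motion vector minus encoder motion vector.
    dx = global_mv[0] - encoder_mv[0]
    dy = global_mv[1] - encoder_mv[1]
    magnitude = math.hypot(dx, dy)
    # TMVM is 1 only for foreground blocks (sufficiently large difference).
    tmvm = 1 if magnitude > threshold else 0
    return tmvm, (dx, dy), magnitude
```

For blocks with `tmvm == 1`, the returned magnitude is the quantity that would feed the temporal frequency computation of the TCSF.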
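[Aspect 16]
The method of aspect 3, wherein the TCSF is computed from motion vectors from an encoder.
[Aspect 17]
The method of aspect 1, wherein the importance map, when set up using the temporal information and the spatial information, is a unified importance map.
[Aspect 18]
A system for encoding video data, comprising:
a codec that encodes a plurality of video frames using an importance map, the video frames having non-overlapping target blocks,
wherein the importance map is configured to affect the encoding quality of each target block to be encoded in each video frame by adjusting quantization, and
wherein the importance map is configured by:
setting up the importance map using temporal information and spatial information, the importance map set up from the temporal information and the spatial information being a unified importance map; and
causing the importance map to indicate, by computation, the parts of a video frame of the plurality of video frames that are most noticeable to human perception, such that (i) in blocks where the importance map takes on high values, the block quantization parameter (QP) is reduced relative to the frame quantization parameter QP_frame, resulting in higher quality for those blocks, and (ii) in target blocks where the importance map takes on low values, the block quantization parameter is increased relative to the frame quantization parameter QP_frame, resulting in lower quality for those blocks.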
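[Aspect 19]
The encoder of aspect 18, wherein the spatial information is provided by a rule-based spatial complexity map (SCM) whose initial step is to determine which target blocks in the frame have a variance greater than the average block variance var_frame of the frame, and
wherein blocks having a variance greater than the average block variance var_frame are assigned a quantization parameter (QP) value higher than the frame quantization parameter QP_frame, the assigned block QP value QP_block being scaled linearly between QP_frame and a quantization parameter upper limit QP_max according to how much the block variance var_block exceeds the average block variance var_frame.
[Aspect 20]
The encoder of aspect 18, wherein the temporal information is provided by:
a temporal contrast sensitivity function (TCSF) that indicates which target blocks are most temporally noticeable to a human observer; and
a true motion vector map (TMVM) that indicates which target blocks correspond to foreground data,
wherein the TCSF is considered valid only for target blocks identified as foreground data.
[Aspect 21]
The encoder of aspect 19, wherein the assigned block QP value QP_block of a high-variance block is further refined by the TCSF and the TMVM, such that QP_block is increased by 2 if the TMVM identifies the target block as foreground data and the TCSF has a log contrast sensitivity value of less than 0.5 for that block.
[Aspect 22]
The encoder of aspect 19, wherein the SCM further includes luminance masking, in which the assigned block QP value QP_block of target blocks that are either very bright (luminance above 170) or very dark (luminance below 60) is adjusted back to QP_max.
[Aspect 23]
The encoder of aspect 19, wherein the SCM further includes dynamic determination of the quantization parameter upper limit QP_max based on the quality level of the encoded video,
wherein, in the dynamic determination, quality is measured using an average structural similarity (SSIM) calculation of target blocks in intra (I) frames together with the average block variance var_frame of those frames, and
wherein, when the measured quality is low, the value of QP_max is lowered toward the frame quantization parameter QP_frame.
[Aspect 24]
The encoder of aspect 19, wherein blocks with very low variance are assigned a predetermined low QP value QP_block to ensure high-quality encoding in those regions, such that the lower the block variance, the lower the value of QP_block (and the higher the quality).
[Aspect 25]
The encoder of aspect 24, wherein the low QP value QP_block for very-low-variance blocks is determined first for I-frames and then determined for P-frames and B-frames using the ipratio and pbratio parameters.
[Aspect 26]
The system of aspect 19, wherein blocks that have low variance but are not considered to have very low variance are examined to determine whether quality enhancement is needed for the block, such that:
an initial estimate of the block QP, QP_block, is calculated by averaging the QP values of the already-encoded neighboring blocks to the left, top-left, right, and top-right of the current block;
an estimate SSIM_est of the SSIM of the current block is calculated from the SSIM values of the already-encoded neighboring blocks to the left, top-left, right, and top-right of the current block; and
the value of QP_block is lowered by 2 if SSIM_est is less than 0.9.
[Aspect 27]
The system of aspect 26, wherein the quality enhancement is applied only to blocks that are identified as foreground data by the TMVM and for which the TCSF has a log contrast sensitivity value greater than 0.8.
[Aspect 28]
The system of aspect 20, wherein the temporal frequency of the TCSF is computed by using SSIM in the color space domain between the target block and its reference block to approximate wavelength, and using motion vector magnitudes and the frame rate to approximate velocity.
[Aspect 29]
The system of aspect 20, wherein the TCSF is calculated over multiple frames, such that the TCSF for the current frame is a weighted average of the TCSF maps of recent frames, with more recent frames receiving greater weight.
[Aspect 30]
The system of aspect 20, wherein the TMVM is set to 1 only for foreground data.
[Aspect 31]
The system of aspect 30, wherein foreground data is identified by computing the difference between the encoder motion vector for a given target block and the global motion vector for that block, blocks with sufficiently large differences being determined to be foreground data.
[Aspect 32]
The system of aspect 20, wherein, for data blocks identified as foreground data, the encoder motion vector is subtracted from the global motion vector to obtain a differential motion vector, and the magnitude of the differential motion vector is used to calculate the temporal frequency of the TCSF.
[Aspect 33]
The system of aspect 20, wherein the TCSF is computed from motion vectors from the encoder.
[Aspect 34]
The system of aspect 18, wherein the importance map, when set up using the temporal information and the spatial information, is a unified importance map.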

Claims

A method of encoding a plurality of video frames,
the video frames having non-overlapping target blocks,
the method comprising:
encoding the plurality of video frames using an importance map, such that the importance map affects the encoding quality of each target block to be encoded in each video frame by adjusting quantization,
wherein the importance map is configured by:
setting up the importance map using temporal information and spatial information; and
causing the importance map to indicate, by computation, which parts of a video frame of the plurality of video frames are most noticeable to human perception, such that (i) in blocks where the importance map takes on high values, the block quantization parameter (QP) is reduced relative to the frame quantization parameter QP_frame, resulting in higher quality for those blocks, and (ii) in target blocks where the importance map takes on low values, the block quantization parameter is increased relative to the frame quantization parameter QP_frame, resulting in lower quality for those blocks.

The method of claim 1, wherein the spatial information is provided by a rule-based spatial complexity map (SCM) whose initial step is to determine which target blocks in the frame have a variance greater than the average block variance var_frame of the frame, and
wherein blocks having a variance greater than the average block variance var_frame are assigned a quantization parameter (QP) value higher than the frame quantization parameter QP_frame, the assigned block QP value QP_block being scaled linearly between QP_frame and a quantization parameter upper limit QP_max according to how much the block variance var_block exceeds the average block variance var_frame.

The method of claim 1, wherein the temporal information is provided by:
a temporal contrast sensitivity function (TCSF) that indicates which target blocks are most temporally noticeable to a human observer; and
a true motion vector map (TMVM) that indicates which target blocks correspond to foreground data,
wherein the TCSF is considered valid only for target blocks identified as foreground data.

The method of claim 2, wherein the assigned block QP value QP_block of a high-variance block is further refined by the TCSF and the TMVM, such that QP_block is increased by 2 if the TMVM identifies the target block as foreground data and the TCSF has a log contrast sensitivity value of less than 0.5 for that block.

The method of claim 2, wherein the SCM further includes luminance masking, in which the assigned block QP value QP_block of target blocks that are either very bright (luminance above 170) or very dark (luminance below 60) is adjusted back to QP_max.

The method of claim 2, wherein the SCM further includes dynamic determination of the quantization parameter upper limit QP_max based on the quality level of the encoded video,
wherein, in the dynamic determination, quality is measured using an average structural similarity (SSIM) calculation of target blocks in intra (I) frames together with the average block variance var_frame of those frames, and
wherein, when the measured quality is low, the value of QP_max is lowered toward the frame quantization parameter QP_frame.

The method of claim 2, wherein blocks with very low variance are assigned a predetermined low QP value QP_block to ensure high-quality encoding in those regions, such that the lower the block variance, the lower the value of QP_block.

The method of claim 7, wherein the low QP value QP_block for very-low-variance blocks is determined first for I-frames and then determined for P-frames and B-frames using the ipratio and pbratio parameters.

The method of claim 7, wherein blocks that have low variance but are not considered to have very low variance are examined to determine whether quality enhancement is needed for the block, such that:
an initial estimate of the block QP, QP_block, is calculated by averaging the QP values of the already-encoded neighboring blocks to the left, top-left, right, and top-right of the current block;
an estimate SSIM_est of the SSIM of the current block is calculated from the SSIM values of the already-encoded neighboring blocks to the left, top-left, right, and top-right of the current block; and
the value of QP_block is lowered by 2 if SSIM_est is less than 0.9.

The method of claim 9, wherein the quality enhancement is applied only to blocks that are identified as foreground data by the TMVM and for which the TCSF has a log contrast sensitivity value greater than 0.8.

The method of claim 3, wherein the temporal frequency of the TCSF is computed by using SSIM in the color space domain between the target block and its reference block to approximate wavelength, and using motion vector magnitudes and the frame rate to approximate velocity.

The method of claim 3, wherein the TCSF is calculated over multiple frames, such that the TCSF for the current frame is a weighted average of the TCSF maps of recent frames, with more recent frames receiving greater weight.

The method of claim 3, wherein the TMVM is set to 1 only for foreground data.

The method of claim 13, wherein foreground data is identified by computing the difference between the encoder motion vector for a given target block and the global motion vector for that block, blocks with sufficiently large differences being determined to be foreground data.

The method of claim 14, wherein, for data blocks identified as foreground data, the encoder motion vector is subtracted from the global motion vector to obtain a differential motion vector, and the magnitude of the differential motion vector is used to calculate the temporal frequency of the TCSF.

The method of claim 3, wherein the TCSF is computed from motion vectors from an encoder.

The method of claim 1, wherein the importance map, when set up using the temporal information and the spatial information, is a unified importance map.

A system for encoding video data, comprising:
a codec that encodes a plurality of video frames using an importance map, the video frames having non-overlapping target blocks,
wherein the importance map is configured to affect the encoding quality of each target block to be encoded in each video frame by adjusting quantization, and
wherein the importance map is configured by:
setting up the importance map using temporal information and spatial information, the importance map set up from the temporal information and the spatial information being a unified importance map; and
causing the importance map to indicate, by computation, the parts of a video frame of the plurality of video frames that are most noticeable to human perception, such that (i) in blocks where the importance map takes on high values, the block quantization parameter (QP) is reduced relative to the frame quantization parameter QP_frame, resulting in higher quality for those blocks, and (ii) in target blocks where the importance map takes on low values, the block quantization parameter is increased relative to the frame quantization parameter QP_frame, resulting in lower quality for those blocks.

The system of claim 18, wherein the spatial information is provided by a rule-based spatial complexity map (SCM) whose initial step is to determine which target blocks in the frame have a variance greater than the average block variance var_frame of the frame, and
wherein blocks having a variance greater than the average block variance var_frame are assigned a quantization parameter (QP) value higher than the frame quantization parameter QP_frame, the assigned block QP value QP_block being scaled linearly between QP_frame and a quantization parameter upper limit QP_max according to how much the block variance var_block exceeds the average block variance var_frame.

The system of claim 18, wherein the temporal information is provided by:
a temporal contrast sensitivity function (TCSF) that indicates which target blocks are most temporally noticeable to a human observer; and
a true motion vector map (TMVM) that indicates which target blocks correspond to foreground data,
wherein the TCSF is considered valid only for target blocks identified as foreground data.

The system of claim 19, wherein the assigned block QP value QP_block of a high-variance block is further refined by the TCSF and the TMVM, such that QP_block is increased by 2 if the TMVM identifies the target block as foreground data and the TCSF has a log contrast sensitivity value of less than 0.5 for that block.

The system of claim 19, wherein the SCM further includes luminance masking, in which the assigned block QP value QP_block of target blocks that are either very bright (luminance above 170) or very dark (luminance below 60) is adjusted back to QP_max.

The system of claim 19, wherein the SCM further includes dynamic determination of the quantization parameter upper limit QP_max based on the quality level of the encoded video,
wherein, in the dynamic determination, quality is measured using an average structural similarity (SSIM) calculation of target blocks in intra (I) frames together with the average block variance var_frame of those frames, and
wherein, when the measured quality is low, the value of QP_max is lowered toward the frame quantization parameter QP_frame.

The system of claim 19, wherein blocks with very low variance are assigned a predetermined low QP value QP_block to ensure high-quality encoding in those regions, such that the lower the block variance, the lower the value of QP_block.

The system of claim 24, wherein the low QP value QP_block for very-low-variance blocks is determined first for I-frames and then determined for P-frames and B-frames using the ipratio and pbratio parameters.

The system of claim 19, wherein blocks that have low variance but are not considered to have very low variance are examined to determine whether quality enhancement is needed for the block, such that:
an initial estimate of the block QP, QP_block, is calculated by averaging the QP values of the already-encoded neighboring blocks to the left, top-left, right, and top-right of the current block;
an estimate SSIM_est of the SSIM of the current block is calculated from the SSIM values of the already-encoded neighboring blocks to the left, top-left, right, and top-right of the current block; and
the value of QP_block is lowered by 2 if SSIM_est is less than 0.9.

The system of claim 26, wherein the quality enhancement is applied only to blocks that are identified as foreground data by the TMVM and for which the TCSF has a log contrast sensitivity value greater than 0.8.

The system of claim 20, wherein the temporal frequency of the TCSF is computed by using SSIM in the color space domain between the target block and its reference block to approximate wavelength, and using motion vector magnitudes and the frame rate to approximate velocity.

The system of claim 20, wherein the TCSF is calculated over multiple frames, such that the TCSF for the current frame is a weighted average of the TCSF maps of recent frames, with more recent frames receiving greater weight.

The system of claim 20, wherein the TMVM is set to 1 only for foreground data.

The system of claim 30, wherein foreground data is identified by computing the difference between the encoder motion vector for a given target block and the global motion vector for that block, blocks with sufficiently large differences being determined to be foreground data.

The system of claim 20, wherein, for data blocks identified as foreground data, the encoder motion vector is subtracted from the global motion vector to obtain a differential motion vector, and the magnitude of the differential motion vector is used to calculate the temporal frequency of the TCSF.

The system of claim 20, wherein the TCSF is computed from motion vectors from the encoder.

The system of claim 18, wherein the importance map, when set up using the temporal information and the spatial information, is a unified importance map.