JP2005341093A

JP2005341093A - Contents adaptating apparatus, contents adaptation system, and contents adaptation method

Info

Publication number: JP2005341093A
Application number: JP2004155742A
Authority: JP
Inventors: Shunichi Sekiguchi; 俊一関口; Hirobumi Nishikawa; 博文西川; Yoshiaki Kato; 嘉明加藤; Junichi Yokosato; 純一横里; Yuichi Izuhara; 優一出原; Fuminobu Ogawa; 文伸小川
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 2004-05-26
Filing date: 2004-05-26
Publication date: 2005-12-08

Abstract

PROBLEM TO BE SOLVED: Conventional video transcoding converts the entire input image frame, so when the display screen of a playback terminal (such as a mobile terminal) is small, every object in the video is reduced uniformly. This makes spatially adaptive playback difficult, for example restricting the video to a target region and displaying only that region in high definition on the full screen.

SOLUTION: The content adaptation apparatus extracts a target region in digital video content, calculates a resolution conversion ratio from the resolution information of the input video content and the resolution designation information of the output video content, converts the input video content on the basis of the calculated ratio and the extracted target region, and thereby converts the original video content into a form desirable for the playback terminal.

COPYRIGHT: (C)2006,JPO&NCIPI

Description

The present invention relates to a content adaptation apparatus and a content adaptation method used in technologies for transmitting, storing, and reproducing digital multimedia data such as compressed moving image data.

With the recent spread of the Internet and PCs and the digitization of information appliances such as DVDs and mobile phones, digital multimedia data formats (video coding schemes and the like) have been diversifying. The terminals that play back such multimedia data have also become highly varied, including digital-broadcast-capable televisions, portable video players, PCs, PDAs, and mobile phones. Many of these multimedia-capable devices can now connect to networks such as the Internet, mobile communication networks, and home networks, and the ability to receive and play back multimedia data over a network is becoming a reality.

However, while terminals have gained this freedom, constraints such as differences in the processing performance of individual terminals and in the bandwidth of the networks they are connected to make it extremely difficult to realize an environment in which a given piece of video content is transmitted over various networks to various terminals and played back online. To improve this situation, video transcoding techniques have been developed that convert compressed video content conforming to an international standard video coding scheme such as MPEG to a different bit rate or resolution (for example, JP 2001-268571 A). With such transcoding, video content can be converted to match the bandwidth of the network to which the terminal is connected, or provided at a video resolution (number of pixels and frames per second) that the terminal can receive.

JP 2001-268571 A

However, conventional video transcoding converts the entire input image frame. When the display screen of the playback terminal is small (a mobile terminal, for example), every subject in the video is therefore reduced uniformly, and spatially adaptive playback, such as restricting the view to the region of the video that deserves attention and displaying only that part in high definition on the full screen, has been difficult.
To achieve such a display, a workaround is needed: either 1) the terminal receives video content in which the whole frame has been reduced and enlarges part of it, or 2) the terminal receives the content at its original resolution and reduces the whole frame or crops part of it as needed. Neither is a desirable solution, in terms of either the efficiency of transmitting information to the playback terminal or the processing load on the terminal.
To solve these problems, it is desirable to determine adaptively the image region to be converted when video transcoding is executed.

An object of the present invention is thus to enable the original video content to be converted into the format most desirable for the playback terminal, based on the constraints of the display of the video content playback terminal and the requirements for video playback.

In the present invention, a gaze area in digital video content is extracted, a resolution conversion ratio is calculated from the resolution information of the input video content and the output video content resolution designation information, and the input video content is converted on the basis of the calculated resolution conversion ratio and the extracted gaze area.

According to the present invention, only the gaze area of the video content requested by the content playback terminal is adapted to a resolution the terminal can play back, so flexible playback is achieved without the terminal performing more image processing than necessary.

Embodiment 1.
As a suitable example for describing a concrete implementation of the present invention, this embodiment assumes a content adaptation system that includes the content adaptation apparatus according to the invention. As shown in FIG. 1, the content adaptation system of this embodiment comprises a network 3 and, connected to it, a content server 1, a content playback terminal 4, and a content adaptation apparatus 5.
The content server 1 stores a plurality of high-resolution, high-quality video contents and distributes video content 2 over the network 3 in response to external requests. The content playback terminal 4 requests from the content server 1 the video content it wants to play, and receives and plays back the content distributed over the network 3. It is assumed here that the content playback terminal 4 cannot receive the video content sent from the content server 1 in its original video format. In particular, the horizontal and vertical frame size (number of pixels and number of scan lines) that the content playback terminal 4 can receive is assumed to be smaller than that of the video content sent from the content server 1.

In this situation, the content adaptation apparatus 5 adapts the video content sent from the content server 1 to a format suited to playback on the content playback terminal 4 and sends it out as converted video content 6. By way of the content adaptation apparatus 5, the content playback terminal 4 can receive the video content it requested from the content server 1 as the converted video content 6 and play it back.
With this system as the premise, the configuration and operation of the content adaptation apparatus 5 of this embodiment are described in detail below. The content adaptation apparatus 5 of this embodiment is characterized in that it generates the converted video content 6 while changing, according to a designation from the terminal side, the image region of the input video content 2 that is subject to conversion.
Using such a content adaptation apparatus 5, the content playback terminal 4 can be configured to receive and play back the video information it truly needs at the minimum necessary cost. In the description that follows, the converted video content 6 is referred to as the "output video content," from the standpoint of the operation of the content adaptation apparatus 5.

The content adaptation apparatus 5 of this embodiment has a function that automatically detects regions of the input video content in which the activity of a predetermined video feature is high and extracts a rectangular image region containing them. It also has a function that, according to a designation from the user, selects either reducing the entire input video content or cropping the high-activity region while keeping the resolution of the input video content, and outputs output video content that matches the display size of the content playback terminal 4.
Various kinds of information can be defined as the video feature; this embodiment focuses on motion information in the input video content.
An example of the concrete functions realized by the content adaptation apparatus 5 of this embodiment is described with reference to FIG. 2. In FIG. 2, suppose that the region of the input video content indicated by the broken-line rectangle has been extracted as a high-activity region (although not shown explicitly in the figure, assume for example that a moving object such as a person is present inside the broken-line region).

The resolution of the original input video content must be converted by the content adaptation apparatus 5 to the resolution of the output video content. The content adaptation apparatus 5 of this embodiment makes it possible to choose between two methods: generating the output video content by reducing the entire frame of the input video content, or generating it by cropping the high-activity region indicated by the broken line at the frame size of the output video content, without down-converting the resolution.
This improves the flexibility of content playback on the content playback terminal 4. Moreover, whichever method is chosen, the number of pixels transmitted from the content adaptation apparatus 5 to the content playback terminal 4 is the same, so no superfluous information needs to be transmitted to the playback terminal 4.

The content adaptation apparatus 5 also assumes that the input and output video contents are in the MPEG format, which is widely used as an international standard. In the MPEG format, each frame of video content is divided into rectangular blocks of 16 pixels × 16 scan lines called macroblocks, and digital compression coding is performed macroblock by macroblock. In principle, therefore, the image resolution of the input video content is an integer multiple of 16 in both the horizontal and vertical directions. Taking advantage of the fact that both input and output are in the MPEG format, the content adaptation apparatus 5 makes the conversion more efficient by reusing the compression-coded data of each macroblock of the input video content in generating the output video content.
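As a small illustration of the macroblock structure described above, the following sketch computes how many 16 × 16 macroblocks make up a frame. The function name `macroblock_grid` is hypothetical and does not come from the patent; only the 16 × 16 block size and the multiple-of-16 assumption are taken from the text.

```python
MB = 16  # macroblock edge: 16 pixels x 16 scan lines, as stated in the description


def macroblock_grid(width, height):
    """Return (columns, rows) of macroblocks for a frame of the given size.

    Hypothetical helper: it assumes, as the description does, that the frame
    size is an integer multiple of 16 in both directions.
    """
    if width % MB or height % MB:
        raise ValueError("frame size must be a multiple of 16 in both directions")
    return width // MB, height // MB


cols, rows = macroblock_grid(352, 288)  # CIF
print(cols, rows, cols * rows)
```

For CIF this gives a grid of 22 × 18 macroblocks, and for QCIF (176 × 144) a grid of 11 × 9, which is why four CIF macroblocks correspond to one QCIF macroblock.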

FIG. 3 shows the internal configuration of the content adaptation apparatus 5 of this embodiment, and FIG. 4 shows the processing flow of the content adaptation apparatus 5 of FIG. 3.
First, the content adaptation apparatus 5 receives as inputs the input video content 2 and input video content resolution information 7 from the content server 1, and output video content resolution designation information 8 and gaze area playback instruction information 9 from the content playback terminal 4.
The output video content resolution designation information 8 and the gaze area playback instruction information 9 may instead be relayed to the content adaptation apparatus 5 by the content server 1. The content adaptation apparatus 5 checks the gaze area playback instruction information 9 and switches the subsequent processing according to its state (step S1).

When the gaze area playback instruction information 9 indicates "play back the gaze area," the video data of the input video content 2 is analyzed and the image region to be used as the gaze area is extracted (step S2). This processing is performed by the gaze area extraction unit 10.
In this embodiment, the result of gaze area extraction is the coordinate information of the macroblock position at which the gaze area starts (conversion start point 11). When the gaze area playback instruction information 9 indicates "do not play back the gaze area," this processing is not performed; equivalently, the conversion start point 11 is set to the position of the first macroblock of the input video content. Details of the processing are given later.

When it is determined in step S1 that the gaze area is not to be played back, processing moves to resolution conversion ratio determination (step S3), which is performed by the resolution conversion ratio determination unit 12. Specifically, the resolution conversion ratio 13 between the input and output video contents is obtained from the input video content resolution information 7 and the output video content resolution designation information 8; this corresponds to obtaining the ratio at which the macroblock coding information of the input video content is mapped to the output macroblock information. When the gaze area playback instruction information 9 indicates "play back the gaze area," this processing is not performed; equivalently, the resolution conversion ratio 13 is "1," that is, the resolution is not reduced. Details are given later.

Next, the input video content is converted using the two parameters, the conversion start point 11 and the resolution conversion ratio 13 (step S4). This processing is performed by the video content conversion unit 14. As the basic processing, each frame of the input video content is first decoded; macroblock information for the output video content is then generated from the macroblock information of the input video content extracted from the compression-coded data during decoding; and the output video content is generated by re-encoding the decoded input video content using that result.
In this process, the conversion start point 11 specifies the point at which mapping of the macroblock information of the input video content begins, and the resolution conversion ratio 13 specifies how many pieces of input macroblock information are mapped together. These parameters are indispensable to the region-adaptive video content adaptation realized by the content adaptation apparatus 5 of this embodiment.
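The branching of steps S1 to S3 can be sketched roughly as follows. This is only an illustration under stated assumptions: `plan_conversion` and its callback parameters are hypothetical names, the start point is expressed as a (column, row) macroblock position, and the second return value is the per-axis reduction factor (1 meaning no reduction), which is one possible reading of the resolution conversion ratio 13.

```python
def plan_conversion(reproduce_gaze, extract_start, decide_scale):
    """Return (start_macroblock, scale) to hand to the conversion step (S4).

    reproduce_gaze: state of the gaze area playback instruction (step S1).
    extract_start:  hypothetical callable standing in for step S2.
    decide_scale:   hypothetical callable standing in for step S3.
    """
    if reproduce_gaze:
        start = extract_start()  # step S2: start of the extracted gaze area
        scale = 1                # S3 skipped: equivalent to a ratio of 1
    else:
        start = (0, 0)           # S2 skipped: first macroblock of the frame
        scale = decide_scale()   # step S3: reduction applied to the whole frame
    return start, scale


print(plan_conversion(True, lambda: (11, 9), lambda: 2))
print(plan_conversion(False, lambda: (11, 9), lambda: 2))
```

Either branch yields the same pair of parameters, so the conversion step itself does not need to know which mode was requested.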

Each process is described in detail below.
Gaze area extraction (step S2)
The gaze area extraction in the gaze area extraction unit 10, that is, the determination of the conversion start point 11, is described first. In this embodiment, a plurality of regions that are candidates for the gaze area are defined in advance as a region group. The gaze area is defined as an image region of the input video content in which the motion activity exceeds a predetermined threshold. Motion activity is defined as the sum over the region, Σ E_k, of the magnitudes E_k of the motion vectors, which are part of the macroblock coded data of the input video content (k is a counter incremented over the macroblocks in the frame).
Evaluating Σ E_k for regions starting at every macroblock in the frame would be computationally expensive, so the candidate start points are limited in advance to a few. FIG. 5 shows a concrete example. The largest rectangle is the frame resolution of the input video content. FIG. 5(a) is an example in which the resolution of the output video content is exactly one half of the input resolution both horizontally and vertically, and FIG. 5(b) is an example in which it is exactly one quarter. Each × mark indicates the start point of a gaze area candidate. Which of (a) and (b) is selected depends on the output video content resolution designation information 8 input to the content adaptation apparatus 5. Each gaze area candidate is assumed to be an integer multiple of the macroblock size both horizontally and vertically. These are of course only example settings, and various others are conceivable. The gaze area extraction unit 10 obtains the activity Σ E_k for each of the gaze area candidates set as in FIG. 5 and extracts the region with the largest activity as the gaze area.
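The candidate-based selection described above can be sketched as follows. The data layout is an assumption made for illustration: `mv_field[row][col]` holds the (dx, dy) motion vector of each macroblock, candidates are (row, col) start positions in macroblock units, and E_k is taken to be the Euclidean length of the vector, which the text does not specify. The function names are hypothetical.

```python
import math


def region_activity(mv_field, start, size):
    """Sum of motion-vector magnitudes E_k over one candidate region.

    start and size are (row, col) and (rows, cols) in macroblock units.
    """
    r0, c0 = start
    rows, cols = size
    return sum(
        math.hypot(*mv_field[r][c])  # E_k, assumed to be the Euclidean norm
        for r in range(r0, r0 + rows)
        for c in range(c0, c0 + cols)
    )


def extract_gaze_start(mv_field, candidates, size):
    """Return the candidate start point whose region has the largest activity."""
    return max(candidates, key=lambda s: region_activity(mv_field, s, size))


# 4 x 4 macroblock frame with motion only in the bottom-right quadrant.
field = [[(0, 0)] * 4 for _ in range(4)]
field[2][2] = (3, 4)
field[3][3] = (6, 8)
candidates = [(0, 0), (0, 2), (2, 0), (2, 2)]
print(extract_gaze_start(field, candidates, size=(2, 2)))
```

Restricting the search to a handful of fixed candidates keeps the cost proportional to the number of candidates times the region size, rather than to every possible start position in the frame.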

Because the gaze area candidates defined as in FIG. 5 are spatially scattered, switching the gaze area frame by frame would make the played-back video unpleasant to watch. To avoid such frequent changes, switching of the gaze area is limited to occasions marked by some trigger (a gaze area change trigger): for example, an explicit gaze area change request from the content server 1 or the content playback terminal 4, the elapse of a predetermined time interval, or the activity of a different gaze area candidate becoming extremely high. Although not illustrated, the gaze area change trigger may be information detected inside the gaze area extraction unit 10 while it analyzes the input video content, or information supplied to the gaze area extraction unit 10 as an external signal.
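One way to picture trigger-limited switching is the sketch below. Everything in it is an illustrative assumption rather than the patent's method: the class name `GazeSelector`, the frame-count interval, and the `surge_factor` used to decide that another candidate's activity has become "extremely high" are all hypothetical, and an explicit request is modeled simply as permission to re-select the most active candidate.

```python
class GazeSelector:
    """Keeps the current gaze area unless a change trigger fires (hypothetical)."""

    def __init__(self, initial, interval_frames, surge_factor):
        self.current = initial
        self.interval = interval_frames   # assumed "predetermined time interval"
        self.surge_factor = surge_factor  # assumed meaning of "extremely high"
        self.last_switch = 0

    def update(self, frame_idx, activities, explicit_request=False):
        """activities maps candidate id -> activity for this frame."""
        best = max(activities, key=activities.get)
        if best == self.current:
            return self.current
        periodic = frame_idx - self.last_switch >= self.interval
        surge = activities[best] > self.surge_factor * activities[self.current]
        if explicit_request or periodic or surge:
            self.current = best
            self.last_switch = frame_idx
        return self.current


sel = GazeSelector("A", interval_frames=30, surge_factor=3.0)
print(sel.update(1, {"A": 10, "B": 12}))  # small lead for B, no trigger
print(sel.update(2, {"A": 10, "B": 50}))  # surge in B
```

Without such gating, the selected region would follow the per-frame maximum and could jump between distant candidates on consecutive frames.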

Resolution conversion ratio determination (step S3)
The processing by which the resolution conversion ratio determination unit 12 determines the resolution conversion ratio 13 is described next. In a system such as that of FIG. 1, the input video content resolution 7 is generally greater than or equal to the output video content resolution designation information 8. When the input video content resolution information 7 and the output video content resolution designation information 8 do not match, it is desirable to convert the input video content resolution to 1/2^R of its value (R being a positive integer), equally in the horizontal and vertical directions, so that the macroblock information of the input video content maps easily onto that of the output video content.
For example, when the input video content resolution is CIF (352 pixels × 288 scan lines) and the output video content resolution is QCIF (176 pixels × 144 scan lines), R is exactly 1. In this case, four macroblocks of the input video content correspond to exactly one macroblock of the output video content, and macroblock information is easy to map between input and output. This R is taken as the resolution conversion ratio 13 determined from the input video content resolution information 7 and the output video content resolution designation information 8.
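The text does not spell out how R is chosen when the resolutions are not an exact power-of-two apart. A minimal sketch, assuming R is the smallest exponent for which the input scaled by 1/2^R fits within the requested output in both directions, could look like this. The function name `decide_r` and the upper bound `max_r` are hypothetical.

```python
def decide_r(in_res, out_res, max_r=3):
    """Smallest R such that in_res / 2**R fits inside out_res on both axes.

    in_res and out_res are (width, height). The selection rule and the
    max_r bound are assumptions made for this sketch.
    """
    in_w, in_h = in_res
    out_w, out_h = out_res
    for r in range(max_r + 1):
        if in_w // 2**r <= out_w and in_h // 2**r <= out_h:
            return r
    raise ValueError("output resolution too small for the allowed range of R")


print(decide_r((352, 288), (176, 144)))  # CIF -> QCIF
```

For the CIF to QCIF case in the text this yields R = 1, so each 2 × 2 group of input macroblocks maps to one output macroblock.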

In practical applications, R takes values of only about 1 to 3. With R = 4, the image size becomes 1/16 of the input video content resolution in each of the horizontal and vertical directions, and even if the input video content is HDTV, the output is smaller than the resolution of the small displays used in mobile phones and similar devices. In addition, the larger R becomes, the lower the accuracy of mapping compression-coded data between input and output, which is undesirable for conversion performance as well.

In practice, however, the output video content resolution designation information 8 does not always coincide with 1/2^R of the input video content resolution.
For example, suppose the input video content is compression-coded video in the ITU-R BT.601 format with 704 pixels × 480 scan lines, and QCIF (176 pixels × 144 scan lines) is specified as the output video content resolution designation information 8. Both resolutions are integer multiples of the macroblock size horizontally and vertically. In the horizontal direction, 704/(2*2) = 176 (R = 2) gives the desired value, but the same calculation in the vertical direction gives 480/(2*2) = 120, which is not the QCIF height specified by the output video content resolution designation information 8. Furthermore, in this example the number of scan lines for R = 2 in the vertical direction is not a multiple of 16, which makes it unsuitable for re-encoding in units of macroblocks.
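The arithmetic in this example can be checked with a short sketch. The helper name `check_downscale` and the shape of its result are hypothetical; the figures are those given in the text.

```python
def check_downscale(in_res, r, out_res, mb=16):
    """Scale in_res by 1/2**r and report how the result relates to out_res."""
    scaled = (in_res[0] // 2**r, in_res[1] // 2**r)
    return {
        "scaled": scaled,
        "matches_target": scaled == tuple(out_res),
        # True only if both dimensions are whole numbers of macroblocks
        "mb_aligned": scaled[0] % mb == 0 and scaled[1] % mb == 0,
    }


# 704 x 480 input, R = 2, QCIF requested
print(check_downscale((704, 480), 2, (176, 144)))
```

Here the scaled size is 176 × 120: the width matches QCIF, but the height neither equals 144 nor divides evenly into 16-line macroblocks, which are the two problems the paragraph points out.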

Consequently, in addition to the resolution conversion ratio determination that obtains R as the resolution conversion ratio 13, processing is needed that verifies whether the input video content resolution information 7 scaled by 1/2^R is a multiple of 16 and performs the final conversion of the output video content in light of the result. In this embodiment, this latter processing is performed as part of the processing in the video content conversion unit 14 and is described in the explanation of the video content conversion below.

Video content conversion (step S4)
The video content conversion in the video content conversion unit 14 is described next.
As stated above, the basic processing is as follows: each frame of the input video content is first decoded; macroblock information for the output video content is then generated from the macroblock information of the input video content extracted from the compression-coded data during decoding; and the output video content is generated by re-encoding the decoded input video content using that result.
In this re-encoding, the conversion start point 11 from the gaze area extraction unit 10 specifies the point at which mapping of the macroblock information of the input video content begins, and the resolution conversion ratio 13 from the resolution conversion ratio determination unit 12 specifies how many pieces of input macroblock information are mapped together. Re-encoding is performed using these parameters to generate the output video content.

Special handling required in this video content conversion is described below.
When, as a result of the gaze area extraction, only a specific image region of each frame of the input video content is converted, the macroblocks located at the edge of that region need special treatment. A motion vector contained in such a macroblock may point inside the frame in the input video content yet point outside the frame in the output video content, and using such a vector as is may lower the coding efficiency at the screen edges of the output video content.

Various ways of solving this problem are conceivable. For example, the motion vector search may be repeated only when the result of mapping a motion vector of the input video content to a motion vector of the output video content points outside the screen of the output video content. Alternatively, when resolution conversion is involved (for example, R = 1), four macroblocks of the input video content are mapped to one macroblock of the output video content, so a vector that points inside the screen may be selected from among the motion vectors of those four macroblocks as the motion vector of the output video content.
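The two remedies mentioned above can be combined in a rough sketch: try each of the four input vectors and keep the first one that stays inside the output frame, and signal that a re-search is needed if none does. All names here are hypothetical, vectors are assumed to be in whole pixels, scaling by the reduction factor with truncation toward zero is an assumption, and "inside" is taken to mean that the whole 16 × 16 reference block lies within the output frame.

```python
MB = 16  # macroblock edge in pixels / scan lines


def points_inside(mb_col, mb_row, mv, frame_w, frame_h):
    """True if the reference block addressed by mv lies fully inside the frame."""
    x = mb_col * MB + mv[0]
    y = mb_row * MB + mv[1]
    return 0 <= x and x + MB <= frame_w and 0 <= y and y + MB <= frame_h


def pick_vector(out_mb, in_mvs, out_w, out_h, scale=2):
    """Pick an output motion vector from the input macroblocks' vectors.

    Returns the first scaled vector that stays inside the output frame, or
    None to tell the caller that a motion vector re-search is needed.
    """
    col, row = out_mb
    for dx, dy in in_mvs:
        cand = (int(dx / scale), int(dy / scale))  # truncate toward zero
        if points_inside(col, row, cand, out_w, out_h):
            return cand
    return None


# Top-left output macroblock of a QCIF frame; the first vector points left of the frame.
print(pick_vector((0, 0), [(-8, 0), (4, 2), (6, 6), (0, 0)], 176, 144))
```

Returning `None` rather than a default vector leaves the choice of fallback, such as the re-search described in the text, to the caller.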

As explained for the resolution conversion ratio determination (step S3), the output video content resolution designation information 8 does not always coincide with 1/2^R of the input video content resolution. The video content conversion for such cases is described below. Whether the input video content resolution information 7 scaled by 1/2^R is a multiple of 16 is verified, and the final conversion of the output video content is performed in light of the result.

First, a provisional resolution is obtained: the largest resolution that is an integer multiple of the macroblock size and exceeds neither the result of downscaling the resolution 7 of the input video content by the resolution conversion ratio 13 determined by the resolution conversion ratio determination unit 12 nor the output video content resolution designation information 8. Re-encoding is then performed at the provisional resolution, and the encoded data corresponding to the difference between the provisional resolution and the resolution given by the output video content resolution designation information 8 is filled with dummy encoded data.
For example, if the resolution 7 of the input video content is QVGA (320 pixels × 240 scanning lines) and the output video content resolution designation information 8 is QCIF (176 pixels × 144 scanning lines), the provisional value is 160 in the horizontal direction and 112 in the vertical direction. Dummy data for one macroblock (16 pixels) in the horizontal direction and two macroblocks (32 scanning lines) in the vertical direction is then added so that the resolution of the final output video content matches the output video content resolution designation information 8 (see FIG. 6).
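The arithmetic of this example can be reproduced with a small sketch. The function names are hypothetical, and the code assumes that the provisional value may equal (not only fall below) the downscaled resolution, which is what the 160-pixel horizontal value in the example implies.

```python
MB = 16  # macroblock size in pixels / scanning lines


def provisional_resolution(in_w, in_h, r, out_w, out_h):
    """Largest macroblock-aligned size not exceeding the downscaled input or the target."""
    scaled_w, scaled_h = in_w // 2 ** r, in_h // 2 ** r
    prov_w = (min(scaled_w, out_w) // MB) * MB
    prov_h = (min(scaled_h, out_h) // MB) * MB
    return prov_w, prov_h


def dummy_macroblocks(provisional, target):
    """Number of dummy macroblock columns and rows needed to reach the target size."""
    return ((target[0] - provisional[0]) // MB, (target[1] - provisional[1]) // MB)


qvga, qcif = (320, 240), (176, 144)
prov = provisional_resolution(*qvga, 1, *qcif)
print(prov, dummy_macroblocks(prov, qcif))
```

With R = 1, QVGA downscales to 160 × 120; aligning to the macroblock grid gives 160 × 112, leaving one macroblock column and two macroblock rows to be filled with dummy data.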

Dummy encoded data can be realized, for example, as follows. In an intra (intra-frame) coded frame, data obtained by encoding single-color data such as black or gray is added. In an inter (inter-frame prediction) coded frame, encoded data is added that forces the coding mode that copies the dummy data of the intra coded frame as is. A predetermined background image or the like may be used in place of the single-color data, but single-color data is preferable in terms of encoding efficiency because it leaves no AC components to encode.

In addition, some encoding schemes encode and decode a macroblock using information from neighboring macroblocks, for example for prediction. When dummy encoded data is inserted under such a scheme, the encoded data must be generated while anticipating the behavior of a decoder that cannot recognize that the dummy data was inserted intentionally. Doing this rigorously for every macroblock increases the processing load of the content adaptation device 5 itself. In general, video coding schemes such as MPEG define structures such as slices and video packets that break the neighborhood dependency of compression coding, in order to keep transmission errors in compressed video data that has undergone intra-frame or inter-frame prediction or variable-length coding from propagating in the temporal and spatial directions. Therefore, by constructing the syntax so that the dummy encoded data forms a slice or video packet separate from the original video data, the content adaptation device 5 can be configured without having to generate dummy encoded data that accounts for such neighborhood-dependent decoder behavior.

With the content adaptation device 5 described above, the content playback terminal 4 can not only play back the requested video content on the entire screen at a resolution it can reproduce, but also play back only the gaze area at the resolution of the original video content. This playback flexibility is achieved without any image processing on the content playback terminal 4 side. Moreover, because the content playback terminal 4 always receives video content of the same resolution, it does not need to receive or play back information that is wasted for adaptive playback, which also benefits information transmission efficiency.

Embodiment 2.
As a specific application in which the content adaptation device 5 is effective, consider a video surveillance system in which the content server 1 collects video from a plurality of surveillance cameras, generates a video that combines them into one screen, compresses and encodes it, and sends it out as input video content.
In this case, the input video content supplied to the content adaptation device 5 semantically contains as many gaze areas as there are combined screens. For simplicity, assume input video content that combines the video of four surveillance cameras, each with exactly the playback resolution of the content playback terminal 4. The abnormality detection alarm of a surveillance camera serves as the gaze area change trigger described above. The alarm may be information detected while the gaze area extraction unit 10 analyzes the input video content internally, or information supplied to the gaze area extraction unit 10 as an external signal (for example, by an observer's instruction). In this case, the gaze area change trigger both instructs that the area be changed and includes information designating the area to change to.

In such a system, the content adaptation device 5 normally converts the video that combines the video of the plurality of surveillance cameras to one half of its resolution both horizontally and vertically, so that it matches the resolution of the content playback terminal 4, and transmits it. When an abnormality is detected by one of the cameras, the device receives the alarm and identifies the gaze area into which that camera's video is combined, and then transmits only that gaze area to the content playback terminal 4 in high definition, at the resolution of the original camera. The playback terminal 4 thus obtains a higher-definition video of the area of interest than when the video of four cameras is combined, so the target can be monitored in detail.
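The switching behavior of this surveillance example can be sketched as follows. The 2 × 2 row-major tile layout, the zero-based camera index, and the function and key names are assumptions made for illustration; the specification states only that four camera images are combined into one screen.

```python
def camera_region(camera_index, cam_w, cam_h, columns=2):
    """Region (x, y, width, height) of one camera inside the combined frame.

    Assumes cameras are tiled row by row and numbered from 0.
    """
    row, col = divmod(camera_index, columns)
    return (col * cam_w, row * cam_h, cam_w, cam_h)


def adaptation_mode(alarm_camera, cam_w, cam_h):
    """Select the source region and the exponent R of the 1/2**R conversion ratio.

    alarm_camera is None in normal operation, or the index of the camera
    whose abnormality detection alarm acted as the gaze area change trigger.
    """
    if alarm_camera is None:
        # Normal operation: whole combined frame, halved horizontally and vertically.
        return {"region": (0, 0, 2 * cam_w, 2 * cam_h), "r": 1}
    # Alarm: only the gaze area of that camera, at its original resolution.
    return {"region": camera_region(alarm_camera, cam_w, cam_h), "r": 0}
```

In both modes the output has the size of a single camera image, which is why the playback terminal always receives video of the same resolution.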

Embodiment 3.
As another example of a content adaptation system using the content adaptation device 5 of the present invention, the system configuration shown in FIG. 7 may be adopted. In this configuration, the content server 1 contains the content adaptation device 5, so the content server 1 incorporates the functions of the content adaptation device 5.

Embodiment 4.
In the embodiments above, when the gaze area reproduction instruction information 9 indicates "reproduce the gaze area" (Y in step S1), the resolution conversion ratio determination process (step S3) is not performed, and the resolution conversion ratio 13 is equivalently "1". That is, the gaze area is cut out at the frame size of the output video content and the output video content is generated without down-converting the resolution. In the present invention, however, even when the gaze area is reproduced, the output video content may be generated by changing the resolution of the gaze area in accordance with the output video content resolution designation information 8.

FIG. 8 shows the flowchart for this case. After the gaze area extraction process (step S2), the process proceeds to the resolution conversion ratio determination process (step S3), which determines the resolution conversion ratio of the gaze area in accordance with the playback terminal display resolution 8. As in Embodiment 1, the resolution obtained by applying the conversion ratio to the gaze area may not match the output video content resolution designation information 8, so the video content conversion process (step S4) performs the same re-encoding as in Embodiment 1.
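The control flow of FIG. 8 might be sketched as below. This section does not state how step S3 chooses the ratio, so the rule used here (the smallest R for which the source scaled by 1/2^R fits within the designated output resolution) is an assumption, as are the function names and the returned dictionary.

```python
def choose_exponent(source_size, target_size, max_r=4):
    """Smallest R such that source_size / 2**R fits within target_size (assumed rule)."""
    for r in range(max_r + 1):
        if (source_size[0] >> r) <= target_size[0] and (source_size[1] >> r) <= target_size[1]:
            return r
    return max_r


def plan_conversion(reproduce_gaze_area, frame_size, gaze_area_size, target_size):
    """Follow steps S1 to S3 and return the parameters that step S4 would consume."""
    # Step S1: branch on the gaze area reproduction instruction information.
    if reproduce_gaze_area:
        source_size = gaze_area_size  # Step S2: only the extracted gaze area is converted.
    else:
        source_size = frame_size      # Whole frame is converted.
    # Step S3: in this embodiment the ratio is also determined for the gaze area.
    r = choose_exponent(source_size, target_size)
    scaled = (source_size[0] >> r, source_size[1] >> r)
    return {"source_size": source_size, "r": r, "scaled_size": scaled}
```

Step S4 would then re-encode at a macroblock-aligned provisional resolution and pad with dummy data when `scaled_size` differs from the designated output resolution, as in Embodiment 1.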

In the description of the embodiments above, the video feature used as the key for gaze area extraction is the motion activity (the magnitude of motion in the input video content), but this feature may be any information that can trigger gaze area extraction. For example, color in the input video content may be used as the video feature in order to detect a specific color of interest. In that case, a gaze area can be extracted only when the input video content contains a region with the specific color, and the extracted gaze area contains that region. Furthermore, if a function for extracting human face regions from the input video content is available, a region containing a person, or a video region containing a specific person identified from facial features, can be used as the gaze area.
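For the motion-activity case, the selection among candidate regions can be sketched as follows. The claims describe the activity as the sum of the magnitudes of the motion vectors of the macroblocks in a candidate region; the use of the Euclidean norm as the magnitude, the dictionary layout, and the function names are assumptions for this example.

```python
import math


def region_activity(mvs, region):
    """Sum of motion vector magnitudes over the macroblocks inside a candidate region.

    mvs maps (mb_col, mb_row) to a motion vector (dx, dy).
    region is (col0, row0, cols, rows) in macroblock units.
    """
    col0, row0, cols, rows = region
    return sum(
        math.hypot(dx, dy)
        for (c, r), (dx, dy) in mvs.items()
        if col0 <= c < col0 + cols and row0 <= r < row0 + rows
    )


def extract_gaze_area(mvs, candidates):
    """Return the candidate region with the largest activity."""
    return max(candidates, key=lambda region: region_activity(mvs, region))
```

A different feature, such as the count of pixels matching a color of interest, could be substituted for `region_activity` without changing the selection step.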

The video content conversion process has been described in terms of resolution conversion, but the content adaptation device may of course also convert the data format, data bit rate, and other properties of the content.

The description of the embodiments above also describes a method for adapting content. Each device can also be realized as software running on a computer.

FIG. 1 is a configuration diagram showing the content adaptation system in Embodiment 1.
FIG. 2 is an explanatory diagram illustrating an example of adaptation processing by the content adaptation device 5.
FIG. 3 is a configuration diagram showing the configuration of the content adaptation device 5.
FIG. 4 is a flowchart showing the processing flow of the content adaptation device 5.
FIG. 5 is an explanatory diagram showing a specific example of gaze area setting.
FIG. 6 is an explanatory diagram showing a specific example of video content conversion processing including resolution conversion.
FIG. 7 is a configuration diagram showing another example of the content adaptation system.
FIG. 8 is a flowchart showing the processing flow of the content adaptation device 5.

Explanation of symbols

1 Content server
2 Input video content
3 Network
4 Content playback terminal
5 Content adaptation device
6 Output video content
7 Input video content resolution information
8 Output video content resolution designation information
9 Gaze area reproduction instruction information
10 Gaze area extraction unit
11 Conversion processing start point
12 Resolution conversion ratio determination unit
13 Resolution conversion ratio
14 Video content conversion unit

Claims

A content adaptation apparatus comprising:
a gaze area extraction unit that extracts a gaze area in digital video content;
a resolution conversion ratio calculation unit that calculates a resolution conversion ratio based on resolution information of input video content and output video content resolution designation information; and
a video content conversion unit that performs conversion processing of the input video content based on the resolution conversion ratio calculated by the resolution conversion ratio calculation unit and the gaze area extracted by the gaze area extraction unit.

The content adaptation apparatus according to claim 1, wherein the gaze area extraction unit stores a group of areas that are predetermined gaze area candidates for a frame image of the input video content data, quantifies the image feature amount contained in each area of the group as an activity, and extracts the area with the highest activity as the gaze area.

The content adaptation apparatus according to claim 2, wherein both the input video content and the output video content are MPEG-encoded video data, each candidate gaze area is an area whose horizontal and vertical sizes are integer multiples of a macroblock, and the activity is calculated as the sum of the magnitudes of the motion vectors of the macroblocks contained in the candidate gaze area.

The content adaptation apparatus described, wherein both the input video content and the output video content are MPEG-encoded video data, and, when the resolution obtained by converting the resolution of the input video content according to the input resolution conversion ratio is not an integer multiple of a macroblock in either the horizontal or the vertical direction, the video content conversion unit obtains, as a provisional value, the largest resolution that is an integer multiple of a macroblock and is below both the result of converting the resolution of the input video content based on the resolution conversion ratio and the resolution of the designated output video content, and fills the image data corresponding to the difference between the resolution of the designated output video content and the provisional value with dummy encoded data.

The content adaptation apparatus according to claim 1, wherein a control signal for controlling the driving of the gaze area extraction unit is a signal notified from the content playback terminal as a command requesting execution of extraction.

The content adaptation apparatus according to claim 1, wherein a control signal for controlling the driving of the gaze area extraction unit is a signal indicating whether the activity calculated in the gaze area extraction unit exceeds a predetermined threshold value.

A content adaptation system that adapts digital video content according to the playback capability of a content playback terminal, comprising:
a content server that delivers digital video content in response to a request from the content playback terminal;
a content adaptation device that receives the digital video content delivered by the content server, converts its image resolution, and outputs it as output video content; and
the content playback terminal, which requests digital video content from the content server and receives the output video content from the content adaptation device,
wherein the content adaptation device includes a gaze area extraction unit that extracts a gaze area in the digital video content, a resolution conversion ratio calculation unit that calculates a resolution conversion ratio based on resolution information of the input video content and output video content resolution designation information, and a video content conversion unit that performs conversion processing of the input video content based on the resolution conversion ratio calculated by the resolution conversion ratio calculation unit and the gaze area extracted by the gaze area extraction unit.

A content adaptation method comprising:
a gaze area extraction step of extracting a gaze area in digital video content;
a resolution conversion ratio calculation step of calculating a resolution conversion ratio based on resolution information of input video content and output video content resolution designation information; and
a video content conversion step of performing conversion processing of the input video content based on the resolution conversion ratio calculated in the resolution conversion ratio calculation step and the gaze area extracted in the gaze area extraction step.

The content adaptation method according to claim 8, wherein, in the gaze area extraction step, the image feature amount contained in each area of a group of areas that are predetermined gaze area candidates for a frame image of the input video content data is quantified as an activity, and the area with the largest activity is extracted as the gaze area.

The content adaptation method according to claim 9, wherein both the input video content and the output video content are MPEG-encoded video data, each candidate gaze area is an area whose horizontal and vertical sizes are integer multiples of a macroblock, and the activity is calculated as the sum of the magnitudes of the motion vectors of the macroblocks contained in the candidate gaze area.

The content adaptation method described, wherein both the input video content and the output video content are MPEG-encoded video data, and, when the resolution obtained by converting the resolution of the input video content according to the input resolution conversion ratio is not an integer multiple of a macroblock, the video content conversion step obtains, as a provisional value, the largest resolution that is an integer multiple of a macroblock and is below both the result of converting the resolution of the input video content based on the resolution conversion ratio and the resolution of the designated output video content, and fills the image data corresponding to the difference between the resolution of the designated output video content and the provisional value with dummy encoded data.

The content adaptation method according to claim 8, wherein the gaze area extraction step is controlled based on a command from the content playback terminal requesting execution of extraction.

The content adaptation method according to claim 8, wherein the gaze area extraction step is controlled based on whether the activity calculated in the gaze area extraction step exceeds a predetermined threshold value.