JP2006332882A

JP2006332882A - Moving picture coding apparatus

Info

Publication number: JP2006332882A
Application number: JP2005151219A
Authority: JP
Inventors: Fumitoshi Karube; 文利輕部; Norimichi Hiwasa; 憲道日和佐; Hideki Inomata; 英樹猪股
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 2005-05-24
Filing date: 2005-05-24
Publication date: 2006-12-07

Abstract

PROBLEM TO BE SOLVED: To provide a moving picture coding apparatus that achieves video with high image quality.
SOLUTION: A viewer 100 views a video image displayed on an image display section 101 of the moving picture coding apparatus. A sight line detection section 106 detects a sight line 202 of the viewer 100 viewing the video image and outputs sight line information 203, which is a set of sight line position data. A sight line information analysis section 107 analyzes the sight line information 203 output from the sight line detection section 106 on the basis of coded data 207 output from a video coding section 105, determines a target region, and outputs sight line analysis data 204. A coding parameter setting section 103 receives the sight line analysis data 204 and the coded data 207, and sets and outputs a coding parameter 205. The video coding section 105 codes a division signal 206 on the basis of the coding parameter 205 and outputs the coded data 207.
COPYRIGHT: (C)2007,JPO&INPIT

Description

The present invention relates to a moving picture coding apparatus that compresses and encodes moving pictures.

As a conventional moving picture coding apparatus, a video communication system has been proposed that includes a line-of-sight input device, an attention region detection device, and a video encoding control device, so that received video emphasizing the quality of the attention region can be viewed in real time (see, for example, Patent Document 1).

Patent Document 1: JP 7-135651 A

Because the conventional moving picture coding apparatus is configured as described above, it detects the attention region from the line-of-sight position data at a single point in time. Consequently, when the same video continues and the line of sight moves frequently, the video appears distorted.
In addition, the conventional apparatus targets a single person and cannot use the line-of-sight information of a plurality of people.

The present invention has been made to solve the problems described above. Its object is to obtain a moving picture coding apparatus that achieves high-quality video by analyzing the line-of-sight position data within a certain period and detecting the attention region with the line-of-sight information in the time direction taken into account.
Another object is to improve the image quality of images with distinctive features by using image feature information (color detection, person detection, character detection, and the like).
A further object is to achieve high-quality video by detecting the attention region using the line-of-sight information of a plurality of people.

The moving picture coding apparatus according to the present invention includes: a blocking unit that outputs a divided signal obtained by dividing a video input signal into blocks each composed of a plurality of pixels; a line-of-sight detection unit that, in synchronization with the input signal, detects the line of sight of at least one observer observing the video and outputs line-of-sight information; a line-of-sight information analysis unit that analyzes the line-of-sight information and outputs line-of-sight analysis information; an encoding parameter setting unit that sets an encoding parameter based on the line-of-sight analysis information; and a video encoding unit that encodes the divided signal based on the encoding parameter.

According to the present invention, analyzing the line-of-sight information within a certain period makes it possible to exclude from the attention region the line-of-sight information obtained while the line of sight is temporarily disturbed, and to allocate a larger target information amount to regions where the line of sight is sufficiently concentrated. This improves the image quality of the regions at which the line of sight is directed.

Embodiment 1.
Embodiment 1 of the present invention is described below. FIG. 1 is a block diagram showing the configuration of a moving picture coding apparatus according to Embodiment 1 of the present invention. As shown in FIG. 1, the apparatus includes an image display unit 101, a line-of-sight detection unit 106, a line-of-sight information analysis unit 107, an encoding parameter setting unit 103, a blocking unit 104, and a video encoding unit 105.

Next, the operation is described. The image display unit 101 displays video based on an input signal 201. The input signal 201 is an input image signal or a prediction error signal and is assumed to have been decoded. An observer 100 observes the video displayed on the image display unit 101. The line-of-sight detection unit 106 detects the line of sight 202 of the observer 100 and outputs line-of-sight information 203, which is a set of line-of-sight position data. The input signal 201 is also input to the line-of-sight detection unit 106 so that detection of the observer's line of sight is synchronized with the input signal 201.

In parallel, the blocking unit 104 divides the input signal 201 into blocks each composed of a plurality of pixels and outputs a divided signal 206.

The line-of-sight information analysis unit 107 analyzes the line-of-sight information 203 output from the line-of-sight detection unit 106, based on the encoded data 207 output from the video encoding unit 105, determines the attention region, and outputs line-of-sight analysis data 204. The encoded data 207 used for this determination is the encoded data of a frame earlier in time than the current one (hereinafter, "previous encoded data").

The operation of the line-of-sight information analysis unit 107 is described in detail below. FIG. 2 shows the line-of-sight position data between times T1 and T3 in Embodiment 1. FIG. 2(a) shows the data between times T1 and T2, and FIG. 2(b) shows the data between times T2 and T3. In FIG. 2, the line-of-sight information analysis unit 107 divides the screen into 64 blocks, as an example, in order to analyze the distribution of the line-of-sight position data. The black circles indicate the positions of the line of sight 202 of the observer 100, that is, the line-of-sight position data, and show the line of sight 202 spreading out as time passes.

FIG. 3 is a flowchart showing the process by which the line-of-sight information analysis unit 107 in FIG. 1 determines the attention region. The line-of-sight information analysis unit 107 accumulates the line-of-sight information 203 from the line-of-sight detection unit 106 for a fixed period and analyzes the line-of-sight information 203 for that period.

First, in step ST1, the line-of-sight position data inside the target block, one of the blocks into which the screen was divided in FIG. 2, is counted, giving the accumulated value of line-of-sight position data in that block (hereinafter, "in-block accumulated value").
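Step ST1 amounts to binning the line-of-sight samples of one analysis period into the block grid. The following is a minimal sketch, assuming the samples are (x, y) pixel coordinates and the screen is split into 8 × 8 blocks (the 64-way division used as an example in FIG. 2). The function name, the coordinate convention, and the choice to ignore off-screen samples are illustrative assumptions, not details given in the patent.

```python
def count_gaze_per_block(points, width, height, rows=8, cols=8):
    """Return a rows x cols grid of in-block accumulated values (step ST1).

    points : iterable of (x, y) gaze positions in pixels (assumed format)
    width, height : screen size in pixels
    """
    counts = [[0] * cols for _ in range(rows)]
    for x, y in points:
        # Assumption: samples that fall outside the screen are discarded.
        if 0 <= x < width and 0 <= y < height:
            row = int(y * rows // height)
            col = int(x * cols // width)
            counts[row][col] += 1
    return counts


if __name__ == "__main__":
    samples = [(0, 0), (639, 479), (80, 60), (-5, 10)]
    grid = count_gaze_per_block(samples, width=640, height=480)
    print(sum(map(sum, grid)), "samples counted")
```

The total of all cells is the "total number of line-of-sight position data" used later in step ST3.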

Next, in step ST2, it is determined whether the in-block accumulated value is greater than 1. If the in-block accumulated value of the target block is greater than 1, the process moves to step ST3; if it is 1 or less, the process moves to step ST5.

When the process moves to step ST3, "ratio = in-block accumulated value / total number of line-of-sight position data" is calculated. This gives the share of the total number of line-of-sight position data per picture that falls inside the block (hereinafter, "in-block ratio").

Next, in step ST4, the in-block ratio is compared with a freely chosen threshold X. If the in-block ratio is greater than X, the target block is determined to be a first attention region. If the in-block ratio is X or less, the target block is determined to be a second attention region and the process moves to step ST7, which is described later.

When the process moves from step ST2 to step ST5, it is determined whether the in-block accumulated value equals 1. If it equals 1, the process moves to step ST6. If it does not equal 1 (that is, if the in-block accumulated value is 0), the target block is determined to be a non-attention region.

When the process moves to step ST6, it is determined whether the accumulated value of line-of-sight position data in the blocks adjacent to the target block (hereinafter, "accumulated value of the adjacent blocks") is 0. If it is 0, the target block is determined to be a non-attention region. If it is not 0, the target block is determined to be a second attention region and the process moves to step ST7.

Next, in step ST7, it is determined whether the adjacent block is a non-attention region. If the adjacent block is a non-attention region, the target block remains a second attention region. If the adjacent block is not a non-attention region, that is, if it is an attention region, the target block is changed to a first attention region.

In FIG. 3, the attention region is classified into two types, the first attention region and the second attention region, but it may be classified into more types. Also, although the attention region is determined from the accumulated value of the line-of-sight position data, it may instead be determined using, as the accumulated value, the line-of-sight position data multiplied by the gaze time.
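The decision flow of steps ST2 to ST7 can be sketched as follows. This is an illustration under stated assumptions rather than the patented implementation: the text does not say which blocks count as "adjacent" or how step ST7 combines several neighbours, so the sketch assumes the 8-neighbourhood, treats the "accumulated value of the adjacent blocks" in ST6 as the sum over those neighbours, and in ST7 promotes a block only when none of its neighbours is a non-attention region, judged against the labels produced by ST2 to ST6. The names `Region`, `neighbours`, and `classify_blocks` are hypothetical.

```python
from enum import Enum


class Region(Enum):
    NON_ATTENTION = 0
    SECOND = 1
    FIRST = 2


def neighbours(r, c, rows, cols):
    """Yield the in-bounds 8-neighbourhood of block (r, c) (assumed adjacency)."""
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if (dr or dc) and 0 <= r + dr < rows and 0 <= c + dc < cols:
                yield r + dr, c + dc


def classify_blocks(counts, x):
    """Label every block from its in-block accumulated value (ST2 to ST7)."""
    rows, cols = len(counts), len(counts[0])
    total = sum(map(sum, counts))
    labels = [[Region.NON_ATTENTION] * cols for _ in range(rows)]
    pending = []  # blocks that reach step ST7

    for r in range(rows):
        for c in range(cols):
            acc = counts[r][c]
            if acc > 1:                               # ST2
                ratio = acc / total                   # ST3
                if ratio > x:                         # ST4
                    labels[r][c] = Region.FIRST
                else:
                    labels[r][c] = Region.SECOND
                    pending.append((r, c))
            elif acc == 1:                            # ST5
                adjacent = sum(counts[i][j]
                               for i, j in neighbours(r, c, rows, cols))
                if adjacent != 0:                     # ST6
                    labels[r][c] = Region.SECOND
                    pending.append((r, c))
            # acc == 0: the block stays a non-attention region

    provisional = [row[:] for row in labels]
    for r, c in pending:                              # ST7 (assumed rule)
        if all(provisional[i][j] is not Region.NON_ATTENTION
               for i, j in neighbours(r, c, rows, cols)):
            labels[r][c] = Region.FIRST
    return labels


if __name__ == "__main__":
    demo = [[0, 0, 0, 0],
            [0, 5, 4, 0],
            [0, 1, 0, 0],
            [0, 0, 0, 1]]
    for row in classify_blocks(demo, x=0.3):
        print([label.name for label in row])
```

Replacing the integer counts with gaze-time-weighted values, as the text permits, would only change how `counts` is built; the thresholds on 1 and 0 would then need to be reinterpreted.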

Among the blocks produced by the blocking unit 104, the target information amount of the blocks containing the attention region determined by the line-of-sight information analysis unit 107 is increased, which improves the image quality of the attention region. In FIG. 2 the attention region expands, so the number of blocks whose target information amount is increased grows as time passes. The target information amount of the regions other than the attention region (non-attention regions) is reduced so that the total target information amount within one picture does not change.

FIG. 4 shows an example of changing the target information amount settings in Embodiment 1. In FIG. 4, one picture is divided into 64 blocks as an example. FIG. 4(a) shows the initial values of the target information amount: the target information amount for one picture (6400 bits) is divided by the total number of blocks (64), so every block has the same target information amount (100 bits).

FIG. 4(b) shows the attention regions designated from the line-of-sight position data of FIG. 2(a). The hatched blocks in FIG. 4(b) are the attention region. Blocks B1, B2, B3, and B4 in FIG. 2(a) are determined to be the first attention region because the line-of-sight position data is concentrated there. Blocks B5, B6, and B7 each contain only one item of line-of-sight position data and their adjacent blocks contain none, so they are determined to be non-attention regions. Because the periphery of the attention region is easily noticed visually, it is treated as a second attention region whose target information amount is not reduced. In FIG. 4(b), as an example, the target information amount is multiplied by 2.20 for the first attention region (hatched blocks), by 1.00 (left at the initial setting) for the second attention region (horizontal-line blocks), and by 0.90 for the non-attention regions (blank blocks). The total target information amount for one picture is the same in FIG. 4(a) and FIG. 4(b).
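The constraint that the picture total stays unchanged fixes the multiplier for the non-attention blocks once the other multipliers and the block counts are known. The sketch below works this through with exact fractions. The block counts are an assumption: the text gives four first-attention blocks (B1 to B4) but does not state how many blocks fall in the other classes, so the example supposes a ring of 12 second-attention blocks around a 2 × 2 group, leaving 48 non-attention blocks. The function name is hypothetical.

```python
from fractions import Fraction


def non_attention_multiplier(n_first, n_second, n_non,
                             m_first, m_second=Fraction(1)):
    """Multiplier for non-attention blocks that keeps the picture total fixed.

    With equal initial targets, the total is preserved when
    n_first*m_first + n_second*m_second + n_non*m_non == total block count.
    """
    if n_non <= 0:
        raise ValueError("at least one non-attention block is required")
    total_blocks = n_first + n_second + n_non
    return (total_blocks - n_first * m_first - n_second * m_second) / n_non


if __name__ == "__main__":
    base = Fraction(6400, 64)          # 100 bits per block, as in FIG. 4(a)
    m_first = Fraction(220, 100)       # x2.20 from the text
    # Assumed counts: 4 first, 12 second, 48 non-attention blocks.
    m_non = non_attention_multiplier(4, 12, 48, m_first)
    total = 4 * base * m_first + 12 * base + 48 * base * m_non
    print("non-attention multiplier:", float(m_non))
    print("picture total in bits:", int(total))
```

Under these assumed counts the result matches the 0.90 multiplier quoted for FIG. 4(b), and the picture total remains 6400 bits.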

Returning to FIG. 1, the encoding parameter setting unit 103 receives the line-of-sight analysis data 204 and the encoded data 207, and sets and outputs the encoding parameter 205. As with the line-of-sight information analysis unit 107, the encoded data 207 input here is the previous encoded data.

The video encoding unit 105 encodes the divided signal 206 based on the encoding parameter 205 and outputs the encoded data 207. This encoded data 207 is used by the line-of-sight information analysis unit 107 to determine the attention region when the next and subsequent frames are encoded.

As described above, according to Embodiment 1, analyzing the line-of-sight information 203 within a fixed period makes it possible to exclude from the attention region the line-of-sight information 203 obtained while the line of sight is temporarily disturbed, and to allocate a larger target information amount to regions where the line of sight is sufficiently concentrated. As a result, the image quality of the regions at which the line of sight is directed can be improved.

Embodiment 2.
Embodiment 2 of the present invention is described below. FIG. 5 is a block diagram showing the configuration of a moving picture coding apparatus according to Embodiment 2 of the present invention. The apparatus according to Embodiment 2 adds an image feature extraction unit 102 to the apparatus according to Embodiment 1 (FIG. 1). The rest of the configuration is the same as in Embodiment 1, so its description is omitted.

Next, the operation is described. The image feature extraction unit 102 receives the input signal 201, extracts features contained in the image, and outputs an image feature signal 208. The encoding parameter setting unit 103 sets the encoding parameter from the image feature signal 208, the line-of-sight analysis data 204, and the previous encoded data 207. The blocking unit 104 divides the input signal 201 into blocks each composed of a plurality of pixels and outputs the divided signal 206. The video encoding unit 105 encodes the divided signal 206 based on the encoding parameter 205 and outputs the encoded data 207. The other operations are the same as in Embodiment 1, so their description is omitted.

The operation of the image feature extraction unit 102 is described in detail below. FIG. 6 shows the line-of-sight position data within a certain period in Embodiment 2. As an example, the screen is divided into 64 blocks. The line-of-sight position data is concentrated inside an ellipse near the center of the screen, but some data is also scattered outside the ellipse. When the image feature extraction unit 102 detects that a person and characters appear on the screen, it becomes clear that the line-of-sight position data scattered outside the ellipse lies within the regions where the person and the characters appear (hereinafter, "feature regions").

Among the blocks produced by the blocking unit 104, the target information amount of the blocks containing the attention region determined by the line-of-sight information analysis unit 107 is increased, which improves the image quality of the attention region. When the line of sight is directed at a feature region detected by the image feature extraction unit 102, the target information amount of the corresponding blocks is also increased to improve image quality. The target information amount of the regions other than the attention region and the feature regions is reduced so that the total target information amount within one picture does not change.

FIG. 7 shows an example of changing the target information amount settings in Embodiment 2. As an example, the screen is divided into 64 blocks. FIG. 7(a) shows the initial values of the target information amount, obtained by dividing the target information amount for one picture by the total number of blocks, so every block has the same value. FIG. 7(b) shows the attention region and the feature regions designated from the line-of-sight position data of FIG. 6. In FIG. 7(b), the target information amount is multiplied by 2.00 for the attention region (hatched blocks), by 1.60 for the feature regions (wavy-line blocks), by 1.00 (left at the initial setting) for the periphery of the attention region and the feature regions (horizontal-line blocks), and by 0.80 for the non-attention regions (blank blocks). The total target information amount for one picture is the same in FIG. 7(a) and FIG. 7(b).

As described above, according to Embodiment 2, analyzing the line-of-sight information 203 within a certain period and additionally taking the image features into account makes it possible to allocate a larger target information amount both to regions where the line of sight is sufficiently concentrated and to regions that have distinctive image features even though the line of sight is not concentrated there. This improves the image quality of the regions at which the line of sight is directed and of the regions with distinctive features.

Embodiment 3.
Embodiment 3 of the present invention is described below. FIG. 8 is a block diagram showing the configuration of a moving picture coding apparatus according to Embodiment 3 of the present invention. The apparatus according to Embodiment 3 adds an attention region calculation unit 109 and an attention region storage unit 110 to the apparatus according to Embodiment 2 (FIG. 5). The upper and lower halves of FIG. 8 are connected through the attention region storage unit 110, but each half operates separately. The rest of the configuration is the same as in Embodiment 2, so its description is omitted.

Next, the operation is described. The input image data 108 is a medium on which video is stored. The image display unit 101 displays video based on the input signal 201 from the input image data 108. The observer 100 observes the video displayed on the image display unit 101. The line-of-sight detection unit 106 detects the line of sight 202 of the observer 100 and outputs the line-of-sight information 203. The line-of-sight information analysis unit 107 analyzes the line-of-sight information 203 and outputs the line-of-sight analysis data 204. The image feature extraction unit 102 extracts features contained in the image and outputs the image feature signal 208.

The attention region calculation unit 109 calculates the attention region of the image from the input line-of-sight analysis data 204 and image feature signal 208, and outputs an attention region specifying signal 209. The attention region storage unit 110 receives the attention region specifying signal 209 and stores, for the entire video, information on what kind of images the line of sight is directed at.

The encoding parameter setting unit 103 sets and outputs the encoding parameter 205 based on the attention region accumulated data 210 (which holds information for the entire video) output from the attention region storage unit 110. Because the attention region storage unit 110 holds information for the entire video, not only past information that has already been encoded but also future information that has not yet been encoded can be used. For example, when it is known that the region on which the line of sight concentrates will expand, image quality can be improved by setting in advance a wider region to which a large target information amount is allocated.

Similarly, when it is known that the line of sight will dwell on a certain block for a long time, image quality can be improved by allocating a large information amount to that block from the moment the line of sight begins to dwell on it.
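One way to picture the use of "future" information is to widen the region for the current frame by merging in the attention blocks stored for the next few frames. The patent does not specify how the stored data is combined, so the sketch below is purely an assumption: it takes a union over a fixed look-ahead window, with the stored data represented as one set of (row, column) block indices per frame. The function name and the `horizon` parameter are hypothetical.

```python
def lookahead_attention(stored, t, horizon):
    """Return the union of attention blocks for frames t .. t + horizon.

    stored  : list with one set of (row, col) attention blocks per frame,
              standing in for the data held for the entire video
    t       : index of the frame about to be encoded
    horizon : number of future frames to take into account (assumed knob)
    """
    blocks = set()
    for frame_blocks in stored[t:t + horizon + 1]:  # slicing clamps at the end
        blocks |= frame_blocks
    return blocks


if __name__ == "__main__":
    # The attention region grows over three frames.
    stored = [{(3, 3)}, {(3, 3), (3, 4)}, {(3, 3), (3, 4), (4, 4)}]
    print(sorted(lookahead_attention(stored, t=0, horizon=2)))
```

With `horizon=0` the function reduces to using only the current frame's region, which corresponds to encoding without look-ahead.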

The blocking unit 104 divides the input signal 201 into blocks each composed of a plurality of pixels and outputs the divided signal 206. Here, the input signal 211 is the same signal as the input signal 201. The video encoding unit 105 encodes the divided signal 206 based on the encoding parameter 205 and outputs the encoded data 207.

As described above, according to Embodiment 3, the line-of-sight analysis data 204 and the image feature signal 208 for the entire video are accumulated in the attention region storage unit 110, so future information as well as past information can be used during encoding. This makes it possible to allocate information in advance to regions at which the line of sight will later be directed, improving image quality.

Embodiment 4.
Embodiment 4 of the present invention is described below. FIG. 9 is a block diagram showing the configuration of a moving picture coding apparatus according to Embodiment 4 of the present invention. The apparatus according to Embodiment 4 adds a learning data calculation unit 111 and a learning data storage unit 112 to the apparatus according to Embodiment 2 (FIG. 5). The rest of the configuration is the same as in Embodiment 2, so its description is omitted.

Next, the operation is described. The image display unit 101 displays video based on the input signal 201. The observer 100 observes the video displayed on the image display unit 101. The line-of-sight detection unit 106 detects the line of sight 202 of the observer 100 and outputs the line-of-sight information 203. The line-of-sight information analysis unit 107 analyzes the line-of-sight information 203 based on the previous encoded data 207 and outputs the line-of-sight analysis data 204. The image feature extraction unit 102 extracts features contained in the image and outputs the image feature signal 208.

The learning data calculation unit 111 calculates the attention region from the line-of-sight analysis data 204 and the image feature signal 208, and outputs a learning data signal 212. The learning data storage unit 112 accumulates, as learning data (the learning data signal 212), the image features to which the observer 100 often pays attention. It then outputs accumulated learning data 213, in which the learning data signal 212 has been accumulated.

The encoding parameter setting unit 103 sets and outputs the encoding parameter 205 based on the line-of-sight analysis data 204, the image feature signal 208, the accumulated learning data 213, and the previous encoded data 207. Because the learning data storage unit 112 holds the image features to which the observer 100 often pays attention (the accumulated learning data 213), the features of the current input image are compared with the accumulated learning data 213, and for regions to which the observer 100 is likely to pay attention the encoding parameter setting unit 103 sets the encoding parameter 205 so as to increase the target information amount.

Examples of how the encoding parameter setting unit 103 sets the encoding parameter 205 are given below. For example, when the learning data shows that the observer 100 tends to pay attention to red areas of the video, image quality is improved by setting a larger target information amount for the corresponding blocks whenever red is detected in the input signal 201.

For example, when the learning data shows that the observer 100 tends to pay attention to moving video rather than still video (background), image quality is improved by setting a larger target information amount for the corresponding blocks whenever a moving region is detected in the input signal 201.

For example, when the learning data shows that the line of sight of the observer 100 tends to move readily, image quality is improved by limiting the increase in the target information amount allocated to the attention region, which avoids concentrating information in a specific region and distributes it over the whole screen.

As another example, if the learning data shows that the line of sight of the observer 100 tends to concentrate near the center of the screen, a larger target information amount is set near the center of the screen to improve image quality.
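The four setting examples for the encoding parameter setting unit 103 can be illustrated with a minimal sketch. It is not taken from the specification: the class names `LearnedTendency` and `BlockFeatures`, the function `target_multiplier`, and the factors 1.5 and 1.2 are all hypothetical, chosen only to show how learned tendencies might raise or limit a block's target information amount.

```python
from dataclasses import dataclass


@dataclass
class LearnedTendency:
    """Hypothetical summary of the accumulated learning data 213."""
    prefers_red: bool = False
    prefers_motion: bool = False
    prefers_center: bool = False
    gaze_is_mobile: bool = False


@dataclass
class BlockFeatures:
    """Hypothetical per-block view of the image feature signal 208."""
    is_red: bool = False
    is_moving: bool = False
    is_central: bool = False
    in_region_of_interest: bool = False


def target_multiplier(tendency, block, boost=1.5, damped_boost=1.2):
    """Return a factor applied to the block's initial target information amount.

    boost and damped_boost are illustrative values, not from the specification.
    """
    multiplier = 1.0
    if block.in_region_of_interest:
        # A viewer whose gaze moves a lot gets a smaller increase, so the
        # information amount stays spread over the screen.
        multiplier *= damped_boost if tendency.gaze_is_mobile else boost
    if tendency.prefers_red and block.is_red:
        multiplier *= boost
    if tendency.prefers_motion and block.is_moving:
        multiplier *= boost
    if tendency.prefers_center and block.is_central:
        multiplier *= boost
    return multiplier
```

In this sketch a block that matches no learned tendency keeps a factor of 1.0, so the encoder falls back to the initial target information amount.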

Returning to FIG. 9, the blocking unit 104 divides the input signal 201 into blocks each consisting of a plurality of pixels and outputs the divided signal 206. The video encoding unit 105 encodes the divided signal 206 based on the encoding parameter 205 and outputs the encoded data 207.
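The division performed by the blocking unit 104 can be sketched as follows. This is an illustration under assumptions: the frame is modeled as a list of pixel rows, the function name `split_into_blocks` is hypothetical, and the frame size is assumed to be an exact multiple of the block size (the specification does not state how edges are handled).

```python
def split_into_blocks(frame, block_h, block_w):
    """Split a frame (list of equal-length pixel rows) into blocks in raster order."""
    height, width = len(frame), len(frame[0])
    if height % block_h or width % block_w:
        # Edge handling is not described in the source, so reject partial blocks.
        raise ValueError("frame size must be a multiple of the block size")
    blocks = []
    for top in range(0, height, block_h):
        for left in range(0, width, block_w):
            block = [row[left:left + block_w] for row in frame[top:top + block_h]]
            blocks.append(block)
    return blocks
```

Each returned block corresponds to one unit of the divided signal 206 in this sketch; the encoder would then apply a per-block encoding parameter.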

As described above, according to Embodiment 4, the features of images that the observer 100 tends to attend to are accumulated as learning data in the learning data storage unit 112. Encoding can therefore be specialized for the observer 100, which improves image quality.

Embodiment 5.
Embodiment 5 of the present invention is described below. FIG. 10 is a block diagram showing the configuration of the moving image encoding apparatus according to Embodiment 5 of the present invention. The apparatus has the same configuration as the moving image encoding apparatus according to Embodiment 3 (FIG. 8), so its description is omitted. The difference is that there are a plurality of observers 100.

Next, the operation is described. Only the differences from Embodiment 3 are described. FIG. 11 shows the line-of-sight position data of three observers 100 between times T1 and T2 in Embodiment 5. As an example, the screen is divided into 64 blocks. In FIG. 11, the filled circles, diamonds, and crosses represent the line-of-sight position data of the three observers 100, respectively. When there are a plurality of observers 100, the line-of-sight position data of a specific observer 100 can be weighted when it is counted. That is, to give priority to the line-of-sight information 203 of a specific observer 100, each item of that observer's line-of-sight position data is counted with a weight of α (> 1.0).
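The weighted counting described above can be sketched as follows. The sketch assumes that the 64 blocks form an 8 x 8 grid and that each sample is a tuple of observer identifier and pixel coordinates. Neither assumption, nor the names `block_index` and `weighted_gaze_counts`, comes from the specification.

```python
GRID = 8  # assumption: the 64 blocks form an 8 x 8 grid


def block_index(x, y, width, height, grid=GRID):
    """Map a pixel position to a block index in raster order."""
    col = min(int(x * grid / width), grid - 1)
    row = min(int(y * grid / height), grid - 1)
    return row * grid + col


def weighted_gaze_counts(samples, weights, width, height, grid=GRID):
    """Count gaze samples per block, scaling each sample by its observer's weight.

    samples: iterable of (observer_id, x, y) tuples.
    weights: maps observer_id to alpha; observers not listed count as 1.0.
    """
    counts = [0.0] * (grid * grid)
    for observer_id, x, y in samples:
        index = block_index(x, y, width, height, grid)
        counts[index] += weights.get(observer_id, 1.0)
    return counts
```

For example, with `weights = {"a": 2.0}`, one sample from observer "a" contributes as much to a block's count as two samples from any other observer.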

FIG. 12 shows an example of changing the target information amount settings in Embodiment 5. As an example, the screen is divided into 64 blocks. FIG. 12(a) shows the initial values of the target information amount: the target information amount for one picture divided by the total number of blocks, so every block has the same value. FIG. 12(b) shows the regions of interest and feature regions designated from the line-of-sight position data of FIG. 11. In FIG. 12(b), the target information amount is multiplied by 2.05 for region of interest 1 (hatched blocks), by 1.55 for region of interest 2 (wavy-line blocks), by 1.00 for the periphery of the regions of interest (horizontal-line blocks, left at the initial value), and by 0.80 for the non-attention region (blank blocks). The total target information amount for one picture is the same in FIG. 12(a) and FIG. 12(b).
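The reallocation in FIG. 12 can be sketched as follows, using the four multipliers given in the text. The label names, the function `allocate_targets`, and the final rescaling step are assumptions. The specification states only that the picture total is unchanged, not how that is enforced.

```python
# Multipliers from the text; the label names are hypothetical.
MULTIPLIERS = {
    "roi1": 2.05,           # region of interest 1
    "roi2": 1.55,           # region of interest 2
    "periphery": 1.00,      # periphery of the regions of interest
    "non_attention": 0.80,  # non-attention region
}


def allocate_targets(picture_target, labels, multipliers=MULTIPLIERS):
    """Return per-block target information amounts that sum to picture_target.

    labels: one region label per block.
    """
    initial = picture_target / len(labels)  # uniform initial value, as in FIG. 12(a)
    raw = [initial * multipliers[label] for label in labels]
    # Assumption: rescale so the picture total is preserved for any block counts.
    scale = picture_target / sum(raw)
    return [value * scale for value in raw]
```

With hypothetical counts of 4, 8, 9, and 43 blocks for the four regions (not taken from FIG. 12), the multipliers alone already sum to 64, so the rescaling factor is effectively 1 and each block receives its multiplier times the initial value.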

As described above, according to Embodiment 5, line-of-sight information from a plurality of people is accumulated over the entire video, so the influence of the individual characteristics of any one observer 100 is reduced and more general line-of-sight information can be used. In addition, because the line-of-sight information and the image feature information are stored, encoding can use not only past information but also future information. The information amount can thus be allocated in advance to regions where the line of sight will subsequently fall, which improves image quality.
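One way to picture the use of future information is a lookahead over stored regions of interest. The following is a sketch under assumptions only: the stored data is modeled as one set of block indices per frame, and the function name `blocks_to_boost` and the `lookahead` parameter are hypothetical. The specification does not describe this procedure.

```python
def blocks_to_boost(stored_regions, frame_index, lookahead):
    """Return the blocks attended to in the current frame or the next `lookahead` frames.

    stored_regions: list with one set of block indices per frame of the video.
    """
    end = min(frame_index + lookahead + 1, len(stored_regions))
    boosted = set()
    for blocks in stored_regions[frame_index:end]:
        boosted |= blocks
    return boosted
```

A block returned here for the current frame may not yet be watched. Raising its target information amount early is one possible reading of allocating the information amount in advance.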

Embodiment 5 is Embodiment 3 with a plurality of observers 100, but the other embodiments may also have a plurality of observers 100.

In Embodiments 1, 2, and 5, the screen is divided into 64 blocks in FIGS. 2, 4, 6, 7, 11, and 12, but the number of divisions is not limited to 64.

Needless to say, the present invention also applies when it is implemented by supplying a program to a system or an apparatus.

FIG. 1 is a block diagram showing the configuration of the moving image encoding apparatus according to Embodiment 1 of the present invention.
FIG. 2 is a diagram showing line-of-sight position data between times T1 and T3 in Embodiment 1.
FIG. 3 is a flowchart showing the processing by which the line-of-sight information analysis unit 107 in FIG. 1 determines a region of interest.
FIG. 4 is a diagram showing an example of changing the target information amount settings in Embodiment 1.
FIG. 5 is a block diagram showing the configuration of the moving image encoding apparatus according to Embodiment 2 of the present invention.
FIG. 6 is a diagram showing line-of-sight position data within a certain period in Embodiment 2.
FIG. 7 is a diagram showing an example of changing the target information amount settings in Embodiment 2.
FIG. 8 is a block diagram showing the configuration of the moving image encoding apparatus according to Embodiment 3 of the present invention.
FIG. 9 is a block diagram showing the configuration of the moving image encoding apparatus according to Embodiment 4 of the present invention.
FIG. 10 is a block diagram showing the configuration of the moving image encoding apparatus according to Embodiment 5 of the present invention.
FIG. 11 is a diagram showing the line-of-sight position data of three observers 100 between times T1 and T2 in Embodiment 5.
FIG. 12 is a diagram showing an example of changing the target information amount settings in Embodiment 5.

Explanation of symbols

100 observer, 101 image display unit, 102 image feature extraction unit, 103 encoding parameter setting unit, 104 blocking unit, 105 video encoding unit, 106 line-of-sight detection unit, 107 line-of-sight information analysis unit, 108 input image data, 109 region-of-interest calculation unit, 110 region-of-interest storage unit, 111 learning data calculation unit, 112 learning data storage unit, 201 input signal, 202 line of sight, 203 line-of-sight information, 204 line-of-sight analysis data, 205 encoding parameter, 206 divided signal, 207 encoded data, 208 image feature signal, 209 region-of-interest specifying signal, 210 accumulated region-of-interest data, 211 input signal, 212 learning data signal, 213 accumulated learning data.

Claims

A moving image encoding apparatus comprising:
a blocking unit that outputs a divided signal obtained by dividing a video input signal into blocks each consisting of a plurality of pixels;
a line-of-sight detection unit that outputs line-of-sight information obtained by detecting the line of sight of at least one observer who observes the video in synchronization with the input signal;
a line-of-sight information analysis unit that outputs line-of-sight analysis information obtained by analyzing the line-of-sight information;
an encoding parameter setting unit that sets an encoding parameter based on the line-of-sight analysis information; and
a video encoding unit that encodes the divided signal based on the encoding parameter.

The moving image encoding apparatus according to claim 1, wherein the line-of-sight information analysis unit analyzes a distribution of the line-of-sight information in a time axis direction or a space axis direction.

The moving image encoding apparatus according to claim 1, wherein the line-of-sight information analysis unit performs weighting according to the observer when analyzing the line-of-sight information of a plurality of the observers.

The moving image encoding apparatus according to claim 1, further comprising an image feature extraction unit that outputs an image feature signal obtained by extracting image features from the input signal,
wherein the encoding parameter setting unit sets the encoding parameter based on the line-of-sight analysis information and the image feature signal.

The moving image encoding apparatus according to claim 4, further comprising:
a region-of-interest calculation unit that outputs a region-of-interest specifying signal obtained by calculating a region of interest of the image from the line-of-sight analysis data and the image feature signal; and
a region-of-interest storage unit that outputs accumulated region-of-interest data in which the region-of-interest specifying signal is stored for the entire video,
wherein the encoding parameter setting unit sets the encoding parameter based on the accumulated region-of-interest data.

The moving image encoding apparatus according to claim 4, further comprising:
a learning data calculation unit that outputs a learning data signal obtained by learning the line-of-sight information based on the line-of-sight analysis data and the image feature signal; and
a learning data storage unit that outputs accumulated learning data in which the learning data signal is accumulated,
wherein the encoding parameter setting unit sets the encoding parameter based on the line-of-sight analysis information, the image feature signal, and the accumulated learning data.

The moving image encoding apparatus according to claim 6, wherein the learning data calculation unit learns a tendency of the observer's line of sight and outputs the learning data signal.