JP2010161740A

JP2010161740A - Image coding device and image coding method

Info

Publication number: JP2010161740A
Application number: JP2009003971A
Authority: JP
Inventors: Satoru Kobayashi; 悟小林
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2009-01-09
Filing date: 2009-01-09
Publication date: 2010-07-22
Anticipated expiration: 2029-01-09
Also published as: JP5100667B2

Abstract

PROBLEM TO BE SOLVED: To enable quick playback and easy editing from scenes showing an expression such as a smiling face or a tear-stained face by setting such frames as reference frames, while suppressing any reduction in coding efficiency.
SOLUTION: An image coding device includes: a face information creating means for creating face information for identifying a face by analyzing an input image signal consisting of a plurality of frames; a coding means for performing compression coding on the input image signal using an inter-frame prediction scheme; a prohibition decision means for determining, on the basis of the face information created by the face information creating means, whether to prohibit references that jump over a frame when the coding means performs inter-frame prediction; and a setting means for setting the frame as a reference frame, over which jumping references are prohibited, when the prohibition decision means determines that the face information meets the prohibition condition. The reference frame is coded with a picture type that enables random access (IDR picture).
COPYRIGHT: (C)2010, JPO&INPIT

Description

The present invention relates to an image encoding device and an image encoding method, and more particularly to a technique suitable for compression-encoding images using inter-frame prediction.

As techniques for high-efficiency image encoding, schemes such as JPEG compression and MPEG-1 and MPEG-2, which use motion prediction and motion compensation, have been established. Manufacturers have developed and commercialized imaging devices such as digital cameras and digital video cameras, and recording devices such as DVD recorders, that use these schemes to record images on a recording medium.

Users, in turn, can easily view images using these imaging devices and recording devices, or using personal computers, DVD players, and the like.

Digitized moving images, however, involve an enormous amount of data. Research has therefore continued on moving-picture encoding schemes that promise higher compression than MPEG-1, MPEG-2, and the like. In recent years, ITU-T (International Telecommunication Union Telecommunication Standardization Sector) and ISO (International Organization for Standardization) standardized an encoding scheme called H.264/MPEG-4 Part 10 (hereinafter referred to as "H.264").

The picture types in H.264 and the selection of reference images used for inter-frame prediction will now be described with reference to FIGS. 12 and 13. FIGS. 12(a) to 12(c) and FIGS. 13(a) and 13(b) show input image sequences and their picture types; the upper row shows the display order (displayed from left to right) and the lower row shows the encoding order (encoded from left to right).

For example, in FIG. 12(a), the P8 picture is a P-picture frame that is displayed ninth. The arrows in FIG. 12 indicate reference relationships: in the example of FIG. 12(a), the P8 picture refers to the B0 picture, and in the example of FIG. 12(b), the B0 picture refers to the P2 picture and the B7 picture.

H.264 has three picture types for image frames: the I picture, which is encoded using only information within the same frame; the P picture, which is encoded using the difference from a temporally preceding frame; and the B picture, which can use the difference from a temporally following frame in addition to the difference from a temporally preceding frame.

In H.264, any frame and any picture type in the image sequence can be used as a reference image for inter-frame prediction. For example, as shown in FIG. 12(a), a P picture (P8) can refer not only to an I picture but also to frames beyond it, jumping over the I picture. Likewise, as shown in FIG. 12(b), a B picture (B0) can refer not only to an I picture but also to frames beyond it.

H.264 thus permits flexible references. Compared with a scheme such as MPEG-2, in which a P picture can refer only to the I picture or P picture immediately preceding it, H.264 achieves more accurate inter-frame prediction and therefore higher coding efficiency.

On the other hand, because such flexible references are permitted, random access in H.264 cannot always be performed quickly. As an example, consider FIG. 12(c), in which playback is started by random access from the I5 picture, a frame in the middle of the image sequence.

When playback starts from the I5 picture and the P8 picture is to be decoded, the B0 picture must be decoded beforehand because the P8 picture refers to it. Furthermore, because the B0 picture refers to the P2 picture and the B7 picture, those two pictures must be decoded before the B0 picture can be decoded.

Similarly, although not shown, the P2 and B7 pictures each refer to other pictures, which must in turn be decoded beforehand. Thus, even when playback is to start from the I5 picture, references that jump over the I5 picture are permitted, so decoding must begin from data that precedes the I5 picture, and it becomes difficult to start playback from the I5 picture quickly.
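The dependency chain described above can be sketched as a small graph traversal. This is only an illustration: the function name `frames_needed` and the dictionary layout are assumptions, and the reference table contains only the references named in the text (the further references of P2 and B7 are not shown in the figures and are left out).

```python
def frames_needed(target, refs):
    """Return every picture that must be decoded before `target`.

    `refs` maps a picture name to the pictures it references directly;
    the result is the transitive closure of those references.
    """
    needed = set()
    stack = [target]
    while stack:
        pic = stack.pop()
        for ref in refs.get(pic, ()):
            if ref not in needed:
                needed.add(ref)
                stack.append(ref)
    return needed


# References named in the description of FIG. 12 (hypothetical encoding as a dict).
refs = {"P8": ["B0"], "B0": ["P2", "B7"]}
print(sorted(frames_needed("P8", refs)))  # ['B0', 'B7', 'P2']
```

Even with this partial table, decoding P8 already pulls in three pictures that come before I5 in the sequence, which is why starting playback at I5 is slow.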

To solve this problem and enable quick random access, a method has been proposed in which a restriction is periodically imposed on I pictures (see Patent Document 1). In H.264, such a restricted I picture is called an IDR picture.

The IDR picture will now be described with reference to FIGS. 13(a) and 13(b).
The image sequences shown in FIGS. 13(a) and 13(b) are the same as those in FIGS. 12(a) and 12(b), except that the I5 picture is set as an IDR picture.

When the I5 picture is set as an IDR picture, the frame memory holding the reference images is cleared when that picture is encoded. Consequently, pictures encoded after the IDR picture cannot refer to pictures encoded before it, and pictures encoded before the IDR picture cannot refer to pictures encoded after it.

In the example of FIG. 13(a), P pictures (such as P8) and B pictures (such as B6) encoded after the IDR picture (IDR5) cannot refer to P pictures (such as P2) or B pictures (such as B0) encoded before that IDR picture.

Conversely, in the example of FIG. 13(b), P pictures (such as P2) and B pictures (such as B0) encoded before the IDR picture (IDR5) cannot refer to P pictures (such as P8) or B pictures (such as B6) encoded after that IDR picture.

As a result, when playback starts from an IDR picture, there is no need to go back and decode image data preceding the IDR picture, so quick random-access playback is possible.
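The IDR restriction can be expressed as a check on the encoding order: a reference is invalid when the referring picture and the referenced picture lie on opposite sides of the IDR picture. The following is a minimal sketch; the function name, the encoding order, and the reference table are hypothetical and are not taken from FIG. 13.

```python
def crossing_references(coding_order, refs, idr):
    """List (picture, reference) pairs that jump over the IDR picture.

    A pair jumps over the IDR picture when one picture is encoded before
    it and the other after it. References to the IDR picture itself are
    allowed.
    """
    pos = {pic: i for i, pic in enumerate(coding_order)}
    bad = []
    for pic, targets in refs.items():
        for ref in targets:
            # Opposite signs mean the two pictures sit on opposite sides.
            if (pos[pic] - pos[idr]) * (pos[ref] - pos[idr]) < 0:
                bad.append((pic, ref))
    return bad


# Hypothetical encoding order and references, for illustration only.
coding_order = ["B0", "P2", "IDR5", "P8", "B6", "B7"]
refs = {"P8": ["B0"], "B6": ["IDR5", "P8"]}
print(crossing_references(coding_order, refs, "IDR5"))  # [('P8', 'B0')]
```

An encoder that honors the restriction would simply never produce a pair that this check reports.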

Moreover, because references that jump over an IDR picture are prohibited, cut editing that uses an IDR picture as the cut frame becomes possible without re-encoding, for example. To support editing with IDR pictures in this way, a method has been proposed that identifies scenes likely to be important to the photographer from changes in image motion and sets IDR pictures there (see Patent Document 2).

Patent Document 1: JP 2003-199112 A
Patent Document 2: JP 2006-157893 A

As described above, in the H.264 encoding scheme, random access can be performed quickly by using IDR pictures, which restrict the reference relationships of inter-frame prediction. Quick playback and easy editing from an arbitrary point in the image sequence therefore require that many IDR pictures be set.

Setting an IDR picture, however, restricts the reference relationships as described above, so setting many IDR pictures may lower the coding efficiency. From the standpoint of coding efficiency, it is desirable to keep the number of IDR pictures to the necessary minimum.

When IDR pictures are set periodically as in Patent Document 1, frames that are not needed for random access are also set as IDR pictures, which may lower the coding efficiency. With the method of Patent Document 2, which identifies scenes likely to be important to the photographer from changes in image motion and sets IDR pictures there, it is difficult to identify every such scene.

For example, a scene in which a person laughs or cries is considered important, but a method that judges importance from changes in image motion, as in Patent Document 2, has difficulty detecting scenes with smiling or crying faces.

The present invention has been made in view of the above problems, and its object is to enable quick playback and easy editing from scenes showing an expression such as a smiling or crying face, while suppressing any decrease in coding efficiency.

An image encoding device according to the present invention compression-encodes an input image signal composed of a plurality of frames, and comprises: face information creation means for analyzing the input image signal and creating face information for identifying a face; encoding means for compression-encoding the input image signal using an inter-frame prediction scheme; prohibition determination means for determining, based on the face information created by the face information creation means for a frame to be encoded by the encoding means, whether to prohibit references in inter-frame prediction that jump over the frame to be encoded; and setting means for setting the frame to be encoded as a reference frame, over which jumping references are prohibited, when the prohibition determination means determines that such references are to be prohibited.

An image encoding method according to the present invention compression-encodes an input image signal composed of a plurality of frames, and comprises: a face information creation step of analyzing the input image signal and creating face information for identifying a face; an encoding step of compression-encoding the input image signal using an inter-frame prediction scheme; a prohibition determination step of determining, based on the face information created in the face information creation step for a frame to be encoded in the encoding step, whether to prohibit references in inter-frame prediction that jump over the frame to be encoded; and a setting step of setting the frame to be encoded as a reference frame, over which jumping references are prohibited, when it is determined in the prohibition determination step that such references are to be prohibited.

A computer program according to the present invention causes a computer to execute processing for compression-encoding an input image signal composed of a plurality of frames, the processing comprising: a face information creation step of analyzing the input image signal and creating face information for identifying a face; an encoding step of compression-encoding the input image signal using an inter-frame prediction scheme; a prohibition determination step of determining, based on the face information created in the face information creation step for a frame to be encoded in the encoding step, whether to prohibit references in inter-frame prediction that jump over the frame to be encoded; and a setting step of setting the frame to be encoded as a reference frame, over which jumping references are prohibited, when it is determined in the prohibition determination step that such references are to be prohibited.

According to the present invention, encoding is performed with only the minimum necessary image frames set as reference frames according to the degree of facial expression. This makes it possible to play back quickly and edit easily from scenes with a high degree of expression while suppressing any decrease in coding efficiency. For example, playback can easily be cued to a smile frame, or editing can easily be started from a smile frame.

(First embodiment)
FIG. 1 is a block diagram showing a configuration example of an image encoding device according to an embodiment of the present invention. The image encoding device of this embodiment sets reference frames according to the degree of facial expression and performs encoding. A configuration example of the device is described below with reference to FIG. 1.

The image encoding device of this embodiment includes an encoding unit 101, a face determination unit 102, and a reference frame setting determination unit 103. The encoding unit 101 compression-encodes an input video signal (input image signal) and generates and outputs an encoded stream. The encoding scheme in this embodiment is one that uses inter-frame prediction, such as the H.264 or MPEG-2 encoding scheme; the following description uses the H.264 encoding scheme as an example.

The face determination unit 102 analyzes the input video signal, analyzes the faces of subjects to create face information for identifying each face, and outputs the created face information. The face information is described in detail later. When the reference frame setting determination unit 103 determines, based on the face information output from the face determination unit 102, that the frame to be encoded by the encoding unit 101 should be encoded as a reference frame, it outputs reference frame setting information to the encoding unit 101.

Here, a reference frame is a frame assigned a picture type that allows quick random access, because references in inter-frame prediction that jump over the frame are prohibited. In the H.264 encoding scheme, a reference frame is an IDR-picture frame. In the MPEG encoding scheme, a reference frame is an I-picture frame.

Next, the operations of the encoding unit 101, the face determination unit 102, and the reference frame setting determination unit 103 will be described in detail.
First, a configuration example of the encoding unit 101 will be described in detail with reference to FIG. 2.
FIG. 2 is a block diagram showing a configuration example of the encoding unit 101. As shown in FIG. 2, the encoding unit 101 includes a frame rearrangement unit 201, a subtracter 202, an integer transform unit 203, a quantization unit 204, an entropy encoding unit 205, an inverse quantization unit 206, and an inverse integer transform unit 207. It also includes an adder 208, a first frame memory 209, a second frame memory 213, an intra prediction unit 210, a first switch 211, and a second switch 217, as well as a deblocking filter 212, an inter prediction unit 214, a motion detection unit 215, and a picture type determination unit 216.

The encoding unit 101 configured in this way divides the input video signal into blocks, performs encoding on a block-by-block basis, and outputs an encoded stream.

Next, the encoding process performed by the encoding unit 101 will be described.
First, the frame rearrangement unit 201 rearranges the video signal, which is input in display order, into encoding order. The subtracter 202 subtracts predicted image data from the input image data and outputs the resulting image residual data to the integer transform unit 203. Generation of the predicted image data is described later.

The integer transform unit 203 applies an orthogonal transform to the image residual data output from the subtracter 202 and outputs the transform coefficients to the quantization unit 204. The quantization unit 204 quantizes the transform coefficients output from the integer transform unit 203 using a predetermined quantization parameter. The entropy encoding unit 205 receives the transform coefficients quantized by the quantization unit 204, entropy-encodes them, and outputs the result as an encoded stream.

The transform coefficients quantized by the quantization unit 204 are also used to generate the predicted image data mentioned above. The inverse quantization unit 206 inversely quantizes the transform coefficients quantized by the quantization unit 204. The inverse integer transform unit 207 applies an inverse integer transform to the transform coefficients inversely quantized by the inverse quantization unit 206 and outputs the result as decoded image residual data.

The adder 208 adds the decoded image residual data output from the inverse integer transform unit 207 to the predicted image data and outputs the sum as reconstructed image data. The reconstructed image data output from the adder 208 is recorded in the first frame memory 209. In addition, when deblocking filter processing is applied to the reconstructed image data, the data is recorded in the second frame memory 213 via the deblocking filter 212; when deblocking filter processing is not applied, the data is recorded in the second frame memory 213 without passing through the deblocking filter 212.
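The path from the subtracter 202 through the quantization unit 204 and back to the adder 208 can be sketched as follows. This is a deliberately simplified illustration under stated assumptions: the integer transform and entropy coding are omitted, quantization is a plain division by a hypothetical step size `qstep`, and a block is a flat list of samples. It is not the H.264 arithmetic.

```python
def encode_block(block, prediction, qstep):
    """Return (quantized levels, reconstructed samples) for one block.

    Simplified sketch: residual -> quantize -> dequantize -> reconstruct.
    The transform stage (203/207) is left out for brevity.
    """
    # Subtracter 202: input minus prediction.
    residual = [x - p for x, p in zip(block, prediction)]
    # Quantization unit 204 (scalar quantizer with step `qstep`).
    levels = [round(r / qstep) for r in residual]
    # Inverse quantization unit 206.
    decoded_residual = [level * qstep for level in levels]
    # Adder 208: prediction plus decoded residual.
    reconstructed = [p + d for p, d in zip(prediction, decoded_residual)]
    return levels, reconstructed


levels, recon = encode_block([101, 108, 92, 100], [100, 100, 100, 100], 4)
print(levels)  # [0, 2, -2, 0]
print(recon)   # [100, 108, 92, 100]
```

The first sample comes back as 100 rather than 101. Storing this reconstructed data, not the original input, in the frame memories keeps the encoder's prediction source identical to what a decoder will have.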

The first switch 211 is a selection unit that selects whether deblocking filter processing is applied to the reconstructed image data output from the adder 208. Reconstructed image data that may be referenced in subsequent prediction is held in the first frame memory 209 or the second frame memory 213 for some time.

The intra prediction unit 210 performs intra-frame prediction using the reconstructed image data recorded in the first frame memory 209 and generates predicted image data. The inter prediction unit 214 performs inter-frame prediction using the reconstructed image data recorded in the second frame memory 213, based on the motion vector information detected by the motion detection unit 215, and generates predicted image data. The motion detection unit 215 detects motion vectors in the input image data and outputs the detected motion vector information to the entropy encoding unit 205 and the inter prediction unit 214.

The picture type determination unit 216 outputs the picture type to be used for encoding to the intra prediction unit 210, the inter prediction unit 214, and the second switch 217. The picture type is determined as follows: when the reference frame setting determination unit 103 has determined that the frame to be encoded should be a reference frame, the frame is made a reference frame; otherwise, the picture type of the frame is determined in accordance with the encoding scheme.

Alternatively, when the frame to be encoded is determined to be a reference frame, its picture type may be set to I picture and a jump-reference prohibition flag may be attached to the frame. The prohibition determination may then be made based on the presence or absence of this flag, and the inter prediction unit 214 may determine reference relationships that do not jump over the I picture.
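The two ways of marking a reference frame described above can be sketched as one small decision function. The function name, the `use_flag` switch, and the tuple return value are assumptions made for illustration; the text only states that the frame becomes either an IDR picture (H.264) or an I picture carrying a prohibition flag.

```python
def decide_picture_type(make_reference_frame, default_type, use_flag=False):
    """Return (picture_type, jump_reference_prohibited) for one frame.

    make_reference_frame: decision from the reference frame setting
        determination unit 103.
    default_type: the type the encoding scheme would otherwise assign
        ("I", "P", or "B").
    use_flag: choose the variant that uses an I picture plus a flag
        instead of an IDR picture.
    """
    if not make_reference_frame:
        return default_type, False
    if use_flag:
        # Variant: plain I picture, with the restriction carried by a flag.
        return "I", True
    # H.264 case: the IDR picture type itself implies the restriction.
    return "IDR", True


print(decide_picture_type(True, "P"))                 # ('IDR', True)
print(decide_picture_type(True, "P", use_flag=True))  # ('I', True)
print(decide_picture_type(False, "B"))                # ('B', False)
```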

The second switch 217 is a selection unit that selects whether the predicted image data generated by the intra prediction unit 210 or the predicted image data generated by the inter prediction unit 214 is used as the predicted image data. In other words, it selects between intra prediction and inter prediction.

The second switch 217 is controlled according to the picture type determined by the picture type determination unit 216. It thereby selects either the output of the intra prediction unit 210 or the output of the inter prediction unit 214 and outputs the selected predicted image data to the subtracter 202 and the adder 208. This concludes the description of the encoding unit 101.

Next, the face determination unit 102 will be described in detail with reference to FIGS. 3, 4, and 5.
FIG. 3 is a block diagram showing a configuration example of the face determination unit 102. As shown in FIG. 3, the face determination unit 102 includes a face detection unit 301, a face recognition history data recording unit 302, a face recognition unit 303, a facial expression determination unit 304, and a switch 305.

Next, the face determination process performed by the face determination unit 102 will be described.
First, the face detection unit 301 detects at least one face of a subject in a frame of the input video signal, that is, in the frame to be encoded. For each face, it detects and calculates the center coordinates of the face within the frame, which serve as the reference coordinates of the face, together with information representing the size and direction of the face, and outputs them.

The face recognition history data recording unit 302 records the image data of each face detected by the face detection unit 301 and the "face ID" set by the face recognition unit 303, described below. Based on the information on the center coordinates, size, and direction of the face output from the face detection unit 301, the face recognition unit 303 determines whether the face of the subject in the input video signal matches a face recorded in the face recognition history data recording unit 302. It then outputs the "face ID", which is information for identifying the face, as face information for each face, and outputs the face image needed for face recognition processing, together with the corresponding "face ID", to the face recognition history data recording unit 302.

If the face of the subject in the input video signal is determined not to match any face recorded in the face recognition history data recording unit 302, a new "face ID" is set for that face. If it is determined to match a recorded face, the same "face ID" as that of the recorded face is set for that face.
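The face ID assignment rule above can be sketched as a lookup against a history list. Everything named here is hypothetical: the class `FaceHistory`, the `matcher` callback standing in for the unspecified face recognition method, and the use of sequential integers as face IDs. As a simplification, only newly seen faces are added to the history.

```python
class FaceHistory:
    """Minimal stand-in for the face recognition history data (unit 302)."""

    def __init__(self, matcher):
        # matcher(known_face, new_face) -> bool; the real matching method
        # is left unspecified in the text, so it is injected here.
        self._matcher = matcher
        self._entries = []  # list of (face_id, face_image)
        self._next_id = 0

    def identify(self, face_image):
        """Return the face ID of a matching recorded face, or a new ID."""
        for face_id, known in self._entries:
            if self._matcher(known, face_image):
                return face_id
        # No match: assign a new face ID and record the face.
        face_id = self._next_id
        self._next_id += 1
        self._entries.append((face_id, face_image))
        return face_id


# Toy matcher: strings stand in for face images and match when equal.
history = FaceHistory(lambda known, new: known == new)
print(history.identify("face-A"))  # 0
print(history.identify("face-B"))  # 1
print(history.identify("face-A"))  # 0
```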

The face recognition history data recorded in the face recognition history data recording unit 302 may be cleared for each stream. Based on the information on the center coordinates, size, and direction of the face output from the face detection unit 301, the facial expression determination unit 304 determines the facial expression of the subject in the video signal and outputs the type of expression and an expression index.

Types of expression include, for example, a smiling face, an angry face, and a crying face. The expression index is a measure of the degree of the expression. In this embodiment, an expression index representing the degree of expression in multiple levels is calculated, taking a value in the range 0 to 10, for example. A smile with an expression index of 0, for instance, is a face that is not smiling at all, that is, a straight face. Conversely, a smile with an expression index of 10 is a face laughing heartily. The switch 305 is a selection unit that selects whether the "face ID" information output from the face recognition unit 303 is included in the face information.

The face detection performed by the face detection unit 301 can use a known technique such as object detection, so a detailed description is omitted in the present embodiment. Likewise, the face recognition performed by the face recognition unit 303 can use a known technique such as object recognition, so a detailed description is omitted. The expression determination by the facial expression determination unit 304 is assumed to use a known method that judges the expression from, for example, the relative positions and shapes of the facial parts (eyes, nose, mouth, and so on) within the face region, and its details are also omitted.

The face information output in this way from the face detection unit 301, the face recognition unit 303, and the facial expression determination unit 304 will now be described with reference to FIGS. 4 and 5.
FIG. 4 shows the video signal of frame number 0, and FIG. 5 shows the face information output from the face determination unit 102 for each frame, here the face information obtained from frame numbers 0 and 1. For simplicity, the example of FIGS. 4 and 5 assumes that the frame contains one face, but a frame may contain a plurality of faces.

For the video signal of frame number 0 shown in FIG. 4, the face detection unit 301 detects a face within the dotted line and outputs, as face information, the center coordinates (x, y) = (960, 540), the size (x_size, y_size) = (370, 370), and the direction “right”.

The facial expression determination unit 304 outputs, for example, the expression type “smile” and the expression index “5” as face information, because the corners of the mouth of the face within the dotted line in FIG. 4 are raised. At frame number 0, no face information has yet been recorded in the face recognition history data, so the face recognition unit 303 assigns a new “face ID” to the face detected by the face detection unit 301 and outputs “face ID 0” as face information. In the example of FIG. 5, frame number 1 contains the same subject's face as frame number 0, so the same “face ID 0” is output as the face information of frame number 1.
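The per-frame face information can be pictured as a simple record. The sketch below uses the frame number 0 values quoted in the description; the type name `FaceInfo`, its field names, and the helper `apply_switch_305` are hypothetical and are introduced only for illustration.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class FaceInfo:
    """One face-information record for one face in one frame (illustrative layout)."""
    frame_number: int
    center: Tuple[int, int]        # (x, y)
    size: Tuple[int, int]          # (x_size, y_size)
    direction: str                 # e.g. "right", "front", "left"
    expression_type: str           # e.g. "smile"
    expression_index: int          # 0 to 10 in the embodiment
    face_id: Optional[int] = None  # None when the face ID is not included


def apply_switch_305(info: FaceInfo, switch_on: bool) -> FaceInfo:
    """Model of switch 305: keep the face ID only when the switch is ON."""
    return info if switch_on else replace(info, face_id=None)


# Values given in the description for frame number 0.
frame0 = FaceInfo(frame_number=0, center=(960, 540), size=(370, 370),
                  direction="right", expression_type="smile",
                  expression_index=5, face_id=0)
frame0_without_id = apply_switch_305(frame0, switch_on=False)
```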

As described above, the face information output from the face determination unit 102 makes it possible to know, frame by frame, information about the faces contained in the video signal. In addition, the “face ID” output from the face recognition unit 303 makes it possible to determine whether a face matches one detected in the past. That is, face recognition, which determines whether a first face in the current frame matches a second face in a past frame, is used when determining whether the prohibition condition for prohibiting references that jump over a frame is satisfied. Specifically, within a predetermined period from the frame of the second face, a first face determined to match the second face is not judged to satisfy the prohibition condition.

Next, the reference frame setting determination unit 103 will be described in detail with reference to FIGS. 6 to 10. Based on the face information output from the face determination unit 102, the reference frame setting determination unit 103 determines whether encoding should be performed with a reference frame set, and outputs reference frame setting information.

First, reference frame setting when one frame contains a single face will be described with reference to FIG. 6. In the case of FIG. 6, the switch 305 is OFF, and the reference frame setting determination unit 103 does not use the “face ID” output from the face recognition unit 303.

FIG. 6 shows the change of the expression index over time together with the reference frame setting. In the example of FIG. 6, the user sets an expression index threshold, and the reference frame setting determination unit 103 outputs reference frame setting information when the expression index exceeds that threshold.

In the example of FIG. 6, the expression index threshold is assumed to be set to “8”. At time t0, a video signal is input to the encoding unit 101 and the face determination unit 102, and the encoding process and the output of face information begin.

At time t1, the expression index exceeds the expression index threshold, and it remains above the threshold for a predetermined period (the period t2 - t1) until time t2. The reference frame setting determination unit 103 therefore determines that a reference frame is to be set and outputs reference frame setting information to the encoding unit 101. During the period from time t3 to time t4, the expression index does not exceed the threshold, so the reference frame setting determination unit 103 does not output reference frame setting information.

At time t4, the expression index exceeds the threshold, but it falls below the threshold again at time t5 without having stayed above it for the predetermined period. In this case, the reference frame setting determination unit 103 does not output reference frame setting information.

In this way, no reference frame is set when the expression index exceeds the threshold only briefly, so that only the minimum necessary frames are set as reference frames. Needless to say, a reference frame may instead be set even when the index exceeds the threshold only briefly. In that case, the reference frame setting determination unit 103 determines that a reference frame is to be set as soon as the expression index exceeds the threshold, and outputs reference frame setting information to the encoding unit 101.
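The single-face rule of FIG. 6 can be sketched as a run-length check over per-frame expression indices. This is only a sketch under stated assumptions: the predetermined period is expressed as a number of consecutive frames (`hold_frames`), the setting information is emitted once per run, and the function name is hypothetical. Setting `hold_frames=1` gives the immediate variant mentioned above.

```python
from typing import List, Sequence


def reference_frame_triggers(indices: Sequence[int],
                             threshold: int = 8,
                             hold_frames: int = 3) -> List[int]:
    """Return the frame numbers at which reference frame setting info is output."""
    triggers: List[int] = []
    run = 0  # consecutive frames with the index above the threshold
    for frame, value in enumerate(indices):
        run = run + 1 if value > threshold else 0
        if run == hold_frames:  # fire once, when the hold period is completed
            triggers.append(frame)
    return triggers


indices = [5, 9, 9, 9, 9, 6, 9, 9, 4]
sustained = reference_frame_triggers(indices, threshold=8, hold_frames=3)
immediate = reference_frame_triggers(indices, threshold=8, hold_frames=1)
```

With these sample indices, the long run triggers once at frame 3, while the short run at frames 6 and 7 triggers only in the immediate variant.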

Next, reference frame setting when one frame contains a plurality of faces will be described with reference to FIG. 7. In the case of FIG. 7, the switch 305 is OFF, and the reference frame setting determination unit 103 does not use the “face ID” output from the face recognition unit 303.

FIG. 7 shows the change over time of the expression indices of three faces together with the reference frame setting. In the example of FIG. 7, the user sets an expression index threshold and a face count threshold, and the reference frame setting determination unit 103 outputs reference frame setting information when the number of faces whose expression index exceeds the expression index threshold reaches the face count threshold.

In the example of FIG. 7, the expression index threshold is set to “8” as a first threshold (first threshold setting), and the face count threshold is set to “3” as a second threshold (second threshold setting). At time t0, a video signal is input to the encoding unit 101 and the face determination unit 102, and the encoding process and the output of face information begin.

During the period from time t1 to time t2, the expression index of one face exceeds the expression index threshold, but the number of such faces does not reach the face count threshold of “3”, so the reference frame setting determination unit 103 does not output reference frame setting information.

At time t3, the expression indices of all three faces simultaneously exceed the expression index threshold, and they remain above it for a predetermined period (the period t4 - t3) until time t4. The reference frame setting determination unit 103 therefore determines at time t4 that a reference frame is to be set, and outputs reference frame setting information to the encoding unit 101.
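The multi-face rule of FIG. 7 adds a count over faces to the same hold-period check. The sketch below is illustrative: the function name and the frame-based hold period are assumptions, and the count is compared with `>=` on the assumption, drawn from the example, that three faces satisfy a face count threshold of 3.

```python
from typing import List, Sequence


def multi_face_triggers(frames: Sequence[Sequence[int]],
                        expr_threshold: int = 8,
                        face_count_threshold: int = 3,
                        hold_frames: int = 2) -> List[int]:
    """frames[i] holds the expression indices of all faces detected in frame i."""
    triggers: List[int] = []
    run = 0
    for frame, indices in enumerate(frames):
        # Number of faces whose expression index exceeds the first threshold.
        count = sum(1 for value in indices if value > expr_threshold)
        run = run + 1 if count >= face_count_threshold else 0
        if run == hold_frames:
            triggers.append(frame)
    return triggers


frames = [(9, 5, 4), (9, 6, 5), (9, 9, 9), (10, 9, 9), (7, 9, 9)]
triggers = multi_face_triggers(frames)
```

In this sample, a single smiling face in frames 0 and 1 does not trigger; the setting information is output at frame 3, once all three faces have stayed above the threshold for two frames.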

Next, reference frame setting according to face direction when one frame contains a plurality of faces will be described with reference to FIG. 8. In the case of FIG. 8, the switch 305 is OFF, and the reference frame setting determination unit 103 does not use the “face ID” output from the face recognition unit 303.

FIG. 8 shows the change over time of the expression indices and directions of three faces together with the reference frame setting. In the example of FIG. 8, the user sets an expression index threshold and a face count threshold. The reference frame setting determination unit 103 outputs reference frame setting information when, among the faces whose expression index exceeds the expression index threshold, the total number of faces pointing in the same direction reaches the face count threshold, that is, the second threshold.

In the example of FIG. 8, the expression index threshold is assumed to be set to “8” and the face count threshold to “3”. At time t0, a video signal is input to the encoding unit 101 and the face determination unit 102, and the encoding process and the output of face information begin. During the period from time t1 to time t2, the expression indices of all three faces exceed the expression index threshold. However, the reference frame setting determination unit 103 of the present embodiment performs face direction detection, counting the faces separately for each face direction. Because the three faces point in different directions (“right”, “front”, and “left”), no single direction reaches the face count threshold; the unit therefore does not determine that the prohibition condition for prohibiting references that jump over a frame is satisfied, and does not set a reference frame that prohibits jumping references.

At time t3, two of the faces change direction, so that all three faces point “right”. At time t4, the expression indices of the three faces, all pointing in the same “right” direction, simultaneously exceed the expression index threshold, and they remain above it for a predetermined period (the period t5 - t4) until time t5. The reference frame setting determination unit 103 therefore determines that a reference frame is to be set, and outputs reference frame setting information to the encoding unit 101.
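The direction-based rule of FIG. 8 can be sketched as a per-direction count of the faces that exceed the expression index threshold. The function names are hypothetical, the hold period is omitted for brevity, and the comparison with `>=` again assumes that three faces satisfy a threshold of 3.

```python
from collections import Counter
from typing import Iterable, Tuple


def same_direction_count(faces: Iterable[Tuple[int, str]],
                         expr_threshold: int = 8) -> int:
    """faces: (expression_index, direction) pairs for one frame.

    Returns the size of the largest group of faces that exceed the
    expression index threshold and point in the same direction.
    """
    counts = Counter(direction for index, direction in faces
                     if index > expr_threshold)
    return max(counts.values(), default=0)


def direction_condition_met(faces: Iterable[Tuple[int, str]],
                            expr_threshold: int = 8,
                            face_count_threshold: int = 3) -> bool:
    return same_direction_count(faces, expr_threshold) >= face_count_threshold


# Period t1 to t2: three smiling faces pointing in different directions.
mixed = [(9, "right"), (9, "front"), (9, "left")]
# From t4: three smiling faces all pointing right.
aligned = [(9, "right"), (10, "right"), (9, "right")]
```

`direction_condition_met(mixed)` is false and `direction_condition_met(aligned)` is true, matching the two periods described above.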

Next, reference frame setting according to main face information determined from the face information will be described with reference to FIG. 9. In the case of FIG. 9, the switch 305 is OFF, and the reference frame setting determination unit 103 does not use the “face ID” output from the face recognition unit 303.

FIG. 9 shows the change over time of the expression indices and main face information of three faces together with the reference frame setting. In the example of FIG. 9, the user sets an expression index threshold, and the reference frame setting determination unit 103 outputs reference frame setting information when the expression index of the main face exceeds that threshold.

In the example of FIG. 9, the expression index threshold is assumed to be set to “8”. The main face is the face to which the viewer (user) pays attention. The present embodiment describes an example in which the reference frame setting determination unit 103 determines the main face from the center coordinates, size, and direction of the faces included in the face information, but main face determination is not limited to this. In the present embodiment, for example, the reference frame setting determination unit 103 determines as the main face a face whose center coordinates are close to the center of the frame, whose size is large, and which faces the front.

At time t0, a video signal is input to the encoding unit 101 and the face determination unit 102, and the encoding process and the output of face information begin. During the period from time t1 to time t2, the expression index of one face exceeds the expression index threshold, but that face is not determined to be the main face, so the reference frame setting determination unit 103 does not output reference frame setting information.

During the period from time t3 to time t4, one face is determined to be the main face, but its expression index does not exceed the expression index threshold, so the reference frame setting determination unit 103 does not output reference frame setting information.

At time t5, one face is determined to be the main face, and the expression index of that face remains above the expression index threshold for a predetermined period (the period t6 - t5). The reference frame setting determination unit 103 therefore determines that a reference frame is to be set, and outputs reference frame setting information to the encoding unit 101.
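The main face rule of FIG. 9 can be sketched as follows. The description names the criteria (closeness to the frame center, size, frontal direction) but not how they are combined, so the equal-weight score below, the 1920x1080 frame size, the dictionary layout, and all function names are assumptions made purely for illustration. The sketch also always picks one main face whenever any face is present, which is a simplification, and it omits the hold period.

```python
import math
from typing import Dict, Optional, Sequence, Tuple

Face = Dict[str, object]


def main_face_score(face: Face, frame_size: Tuple[int, int]) -> float:
    """Hypothetical score: higher for central, large, front-facing faces."""
    cx, cy = frame_size[0] / 2, frame_size[1] / 2
    x, y = face["center"]
    distance = math.hypot(x - cx, y - cy) / math.hypot(cx, cy)  # 0 center, 1 corner
    width, height = face["size"]
    area = (width * height) / (frame_size[0] * frame_size[1])
    frontal = 1.0 if face["direction"] == "front" else 0.0
    return (1.0 - distance) + area + frontal


def main_face(faces: Sequence[Face],
              frame_size: Tuple[int, int] = (1920, 1080)) -> Optional[Face]:
    return max(faces, key=lambda f: main_face_score(f, frame_size), default=None)


def main_face_condition_met(faces: Sequence[Face], expr_threshold: int = 8,
                            frame_size: Tuple[int, int] = (1920, 1080)) -> bool:
    face = main_face(faces, frame_size)
    return face is not None and face["expression_index"] > expr_threshold


faces = [
    {"center": (960, 540), "size": (370, 370), "direction": "front", "expression_index": 5},
    {"center": (200, 200), "size": (100, 100), "direction": "left", "expression_index": 9},
]
```

With these sample faces the central, front-facing face is the main face; its index of 5 does not exceed the threshold, so the condition is not met even though the other face has an index of 9.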

Next, reference frame setting according to the “face ID” included in the face information will be described with reference to FIG. 10. In the case of FIG. 10, the switch 305 is ON, and the reference frame setting determination unit 103 uses the “face ID” output from the face recognition unit 303.

FIG. 10 shows the change over time of the expression indices of three faces that can be distinguished by “face ID”, together with the reference frame setting. In the example of FIG. 10, the user sets an expression index threshold, and the reference frame setting determination unit 103 outputs reference frame setting information when an expression index exceeds that threshold. The expression index threshold is assumed to be set to “8”.

At time t0, a video signal is input to the encoding unit 101 and the face determination unit 102, and the encoding process and the output of face information begin. At time t1, the expression index of “face ID 0” exceeds the expression index threshold, and it remains above the threshold for a predetermined period (the period t2 - t1) until time t2. The reference frame setting determination unit 103 therefore determines that a reference frame is to be set, and outputs reference frame setting information to the encoding unit 101.

Similarly, at time t3, the expression index of “face ID 1” exceeds the expression index threshold, and it remains above the threshold for a predetermined period (the period t4 - t3) until time t4. The reference frame setting determination unit 103 therefore determines that a reference frame is to be set, and outputs reference frame setting information to the encoding unit 101.

At time t5, the expression index of “face ID 0” exceeds the expression index threshold, and it remains above the threshold for a predetermined period (the period t6 - t5) until time t6. However, the predetermined period has not yet elapsed since the previous reference frame setting at time t2, so the reference frame setting determination unit 103 makes a prohibition determination, deciding not to set a reference frame, and does not output reference frame setting information to the encoding unit 101.

In this way, for the same face, the reference frame setting determination unit 103 does not output reference frame setting information while within the predetermined period since the previous reference frame setting, so that only the minimum necessary frames are set as reference frames.
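The per-face suppression of FIG. 10 can be sketched as a cooldown keyed by face ID. The class name, the frame-based interval, and the choice to treat the interval boundary as already elapsed are assumptions for illustration; the hold-period check that precedes this step is taken as given.

```python
from typing import Dict


class FaceIdCooldown:
    """Suppress a new reference frame for a face ID set too recently (sketch)."""

    def __init__(self, min_interval: int) -> None:
        self.min_interval = min_interval      # predetermined period, in frames
        self._last_set: Dict[int, int] = {}   # face_id -> frame of last setting

    def allow(self, face_id: int, frame: int) -> bool:
        last = self._last_set.get(face_id)
        if last is not None and frame - last < self.min_interval:
            return False                      # same face, still within the period
        self._last_set[face_id] = frame       # record this reference frame setting
        return True


cooldown = FaceIdCooldown(min_interval=100)
results = [
    cooldown.allow(face_id=0, frame=20),   # like t2: first setting for face ID 0
    cooldown.allow(face_id=1, frame=50),   # like t4: a different face is unaffected
    cooldown.allow(face_id=0, frame=80),   # like t6: face ID 0 again, too soon
    cooldown.allow(face_id=0, frame=130),  # later: the period has elapsed
]
```

A suppressed candidate does not restart the period, so the interval is always measured from the last reference frame that was actually set for that face.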

The reference frame setting determination unit 103 may also decide whether to set a reference frame according to the “expression type” included in the face information. For example, it may determine that a reference frame is to be set only when the expression index of a smiling face exceeds the expression index threshold, and not when the expression index of a crying face exceeds it.

Next, the process of setting a reference frame according to the face information will be described with reference to FIG. 11.
First, in step S1101, the reference frame setting method selected by the user is determined. The reference frame setting method is the method that the reference frame setting determination unit 103 uses to decide whether to set a reference frame.

In the present embodiment, the available reference frame setting methods are those described with reference to FIGS. 6 to 10. Next, in step S1102, the video signal is input to the encoding unit 101 and the face determination unit 102. Then, in step S1103, the face determination unit 102 performs face determination processing by analyzing the video signal and outputs the result as face information.

Next, in step S1104, the reference frame setting determination unit 103 determines whether to set a reference frame, based on the face information and the reference frame setting method selected by the user. If it determines that a reference frame is to be set, the process proceeds to step S1105, where the encoding unit 101 sets the reference frame.

If, on the other hand, the reference frame setting determination unit 103 determines in step S1104 that no reference frame is to be set, based on the face information and the reference frame setting method selected by the user, the process proceeds to step S1106. In step S1106, the encoding unit 101 sets a picture type in conformity with the encoding scheme.
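The branch in steps S1104 to S1106 can be sketched as a per-frame choice of picture type. The function name and both callbacks are hypothetical: `should_set_reference` stands for whichever setting method was selected in step S1101, `default_type` stands for the encoder's ordinary picture type decision, and the label "IDR" follows the H.264 case of this embodiment.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def assign_picture_types(face_infos: Sequence[T],
                         should_set_reference: Callable[[int, T], bool],
                         default_type: Callable[[int], str]) -> List[str]:
    """Choose a picture type per frame from its face information (sketch)."""
    types: List[str] = []
    for frame, info in enumerate(face_infos):   # S1102/S1103: one record per frame
        if should_set_reference(frame, info):   # S1104: decide on a reference frame
            types.append("IDR")                 # S1105: set the reference frame
        else:
            types.append(default_type(frame))   # S1106: type per the coding scheme
    return types


# Illustrative inputs: expression indices as face info, and a trivial
# stand-in picture type pattern (first frame "I", the rest "P").
picture_types = assign_picture_types(
    face_infos=[5, 9, 9, 4],
    should_set_reference=lambda frame, info: frame == 2,
    default_type=lambda frame: "I" if frame == 0 else "P",
)
```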

The encoding unit 101 preferably has a code amount control unit that controls the amount of code generated, and performs code amount control for the reference frame set according to the reference frame setting information. In doing so, it is preferable to allocate a larger code amount to a reference frame set based on face information than to a reference frame set on other grounds, such as one set in conformity with the encoding scheme. This improves, for example, the image quality of a reference frame set in a smiling scene compared with an ordinary reference frame.
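As a rough illustration of this code amount control, the sketch below gives a reference frame set from face information a larger bit budget than an ordinary reference frame. The function name, the multiplicative form, and the gain of 1.25 are all assumptions; the description states only that the face-based reference frame should receive more code.

```python
def reference_frame_bit_budget(base_bits: int, face_based: bool,
                               face_gain: float = 1.25) -> int:
    """Target bits for a reference frame (hypothetical rate control rule).

    base_bits: budget the rate control would give an ordinary reference frame.
    face_based: True when the frame was set from face information.
    face_gain: assumed multiplier; it must exceed 1 so that the
               face-based frame always receives more bits.
    """
    if face_gain <= 1.0:
        raise ValueError("face_gain must be greater than 1.0")
    return int(base_bits * face_gain) if face_based else base_bits


ordinary = reference_frame_bit_budget(400_000, face_based=False)
smile_scene = reference_frame_bit_budget(400_000, face_based=True)
```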

In the embodiment described above, the input image signal is compression-encoded in conformity with the H.264 encoding scheme, and the picture type of the reference frame is set to an IDR picture. When the face information is determined to satisfy the prohibition condition for prohibiting references that jump over a frame, a reference frame that prohibits jumping references is set. Alternatively, the input image signal may be compression-encoded in conformity with an MPEG encoding scheme, with the picture type of the reference frame set to an I picture.

According to the present embodiment, encoding is performed with only the minimum necessary image frames set as reference frames according to the degree of facial expression. Compared with conventional techniques, this yields an encoded stream that allows quick playback and easy editing starting from scenes with a strong facial expression, while suppressing the loss of encoding efficiency.

(Other embodiments according to the present invention)
Each means constituting the image encoding device in the embodiment of the present invention described above can be realized by running a program stored in the RAM, ROM, or the like of a computer. This program, and a computer-readable recording medium on which the program is recorded, are included in the present invention.

The present invention can also be embodied as, for example, a system, a device, a method, a program, or a storage medium. Specifically, it may be applied to a system composed of a plurality of devices, or to an apparatus consisting of a single device.

The present invention also covers the case where a software program that executes each step of the image encoding method described above (in the embodiment, a program corresponding to the flowchart shown in FIG. 11) is supplied to a system or device directly or remotely, and the invention is achieved by a computer of that system or device reading and executing the supplied program code.

Accordingly, the program code itself that is installed on a computer in order to realize the functional processing of the present invention on that computer also realizes the present invention. In other words, the present invention also includes the computer program itself for realizing its functional processing.

In that case, the program may take any form, such as object code, a program executed by an interpreter, or script data supplied to an OS, as long as it has the functions of the program.

Various recording media can be used to supply the program, for example a floppy (registered trademark) disk, hard disk, optical disk, magneto-optical disk, MO, CD-ROM, CD-R, CD-RW, magnetic tape, nonvolatile memory card, ROM, and DVD (DVD-ROM, DVD-R).

As another way of supplying the program, a browser on a client computer can be used to connect to a website on the Internet, and the computer program of the present invention itself, or a compressed file including an automatic installation function, can be downloaded from that website to a recording medium such as a hard disk.

The program can also be supplied by dividing the program code constituting the program of the present invention into a plurality of files and downloading each file from a different website. In other words, a WWW server that allows a plurality of users to download the program files for realizing the functional processing of the present invention on a computer is also included in the present invention.

It is also possible to encrypt the program of the present invention, store it on a storage medium such as a CD-ROM, and distribute it to users, allowing users who satisfy predetermined conditions to download key information for decryption from a website via the Internet. Using that key information, the encrypted program can then be executed and installed on a computer.

The functions of the embodiment described above are realized when a computer executes the program it has read. They can also be realized when an OS or the like running on the computer performs part or all of the actual processing.

Furthermore, the program read from the recording medium may be written to a memory provided on a function expansion board inserted into the computer or in a function expansion unit connected to the computer. A CPU or the like provided on that board or unit then performs part or all of the actual processing based on the instructions of the program, and the functions of the embodiment described above are realized by that processing as well.

FIG. 1 is a block diagram showing a configuration example of an image encoding device according to an embodiment of the present invention.
FIG. 2 is a block diagram showing a configuration example of the encoding unit according to the embodiment.
FIG. 3 is a block diagram showing a configuration example of the face determination unit according to the embodiment.
FIG. 4 is a diagram explaining how face information is output from a video signal in the embodiment.
FIG. 5 is a diagram explaining an example of the contents of the face information in the embodiment.
FIG. 6 is a diagram explaining a first example of the reference frame setting method in the embodiment.
FIG. 7 is a diagram explaining a second example of the reference frame setting method in the embodiment.
FIG. 8 is a diagram explaining a third example of the reference frame setting method in the embodiment.
FIG. 9 is a diagram explaining a fourth example of the reference frame setting method in the embodiment.
FIG. 10 is a diagram explaining a fifth example of the reference frame setting method in the embodiment.
FIG. 11 is a flowchart explaining the control procedure of the embodiment.
FIG. 12 is a diagram showing a conventional example and explaining the picture types in H.264 and the selection of reference images used for inter-frame prediction.
FIG. 13 is a diagram showing a conventional example and explaining an IDR picture.

Description of symbols
101 Encoding unit
102 Face determination unit
103 Reference frame setting determination unit
201 Frame rearrangement unit
202 Subtractor
203 Integer transform unit
204 Quantization unit
205 Entropy encoding unit
206 Inverse quantization unit
207 Inverse integer transform unit
208 Adder
209 First frame memory
213 Second frame memory
210 Intra prediction unit
211 First switch
217 Second switch
212 Deblocking filter
214 Inter prediction unit
215 Motion detection unit
216 Picture type determination unit
301 Face detection unit
302 Face recognition history data recording unit
303 Face recognition unit
304 Expression determination unit
305 Switch

Claims

An image encoding device that compresses and encodes an input image signal composed of a plurality of frames, comprising:
face information creating means for analyzing the input image signal and creating face information for identifying a face;
encoding means for compressing and encoding the input image signal using an inter-frame prediction method;
prohibition determination means for determining, based on the face information created by the face information creating means for an encoding target frame in the encoding means, whether to prohibit reference in inter-frame prediction that skips over the encoding target frame; and
setting means for setting the encoding target frame as a reference frame across which skipping reference is prohibited when the prohibition determination means determines that reference skipping over the encoding target frame is prohibited.

The image encoding device according to claim 1, comprising:
face detection means for detecting a face from a frame included in the input image signal; and
facial expression determination means for determining the facial expression of the face detected by the face detection means, wherein the prohibition determination means determines whether to prohibit reference skipping over the encoding target frame according to the degree of the facial expression determined by the facial expression determination means.

The image encoding device according to claim 2, wherein the face detection means calculates at least reference coordinates of the face and the size of the face.

The image encoding device according to claim 2, wherein the facial expression determination means calculates a facial expression index representing the degree of the facial expression in a plurality of stages, and determines the facial expression based on the facial expression index.

The image encoding device according to claim 4, wherein the facial expression determination means includes first threshold setting means for setting a first threshold for the facial expression index, and
the prohibition determination means determines that reference skipping over the encoding target frame is prohibited when the facial expression index exceeds the first threshold.

The image encoding device according to claim 5, wherein the facial expression determination means includes second threshold setting means for setting a second threshold for the number of faces in a frame included in the input image signal, and
the prohibition determination means determines that reference skipping over the encoding target frame is prohibited when the total number of faces whose facial expression index exceeds the first threshold exceeds the second threshold.
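The two threshold conditions above can be read as a simple counting rule. The following is a minimal sketch under stated assumptions, not the claimed implementation: the names `Face`, `expression_index`, and `prohibit_skipping_reference` are hypothetical, and the integer scale of the index is an assumption, since the claims say only that the index has a plurality of stages.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Face:
    # Hypothetical field: degree of the facial expression on an assumed integer scale.
    expression_index: int


def prohibit_skipping_reference(
    faces: Sequence[Face],
    first_threshold: int,
    second_threshold: int = 0,
) -> bool:
    """Return True when skipping reference should be prohibited for this frame.

    Counts the faces whose expression index exceeds the first threshold and
    compares that count with the second threshold. With second_threshold=0
    (an assumed default) a single strongly expressive face is enough.
    """
    strong_faces = sum(1 for f in faces if f.expression_index > first_threshold)
    return strong_faces > second_threshold
```

For example, with faces of index 8, 9, and 2, a first threshold of 5, and a second threshold of 1, two faces exceed the first threshold, so the frame would be set as a reference frame.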

The image encoding device according to claim 4, wherein the face detection means includes face direction detection means for detecting the number of faces for each face direction.

The image encoding device according to claim 7, wherein the prohibition determination means determines that reference skipping over the encoding target frame is prohibited when, among the faces whose facial expression index exceeds the first threshold, the number of faces facing the same direction as detected by the face direction detection means exceeds a preset threshold.

The image encoding device according to claim 5, comprising main face determination means for determining a main face, which is the face that a viewer watches,
wherein the prohibition determination means determines that reference skipping over the encoding target frame is prohibited when the facial expression index of the main face determined by the main face determination means exceeds the first threshold.

The image encoding device described, wherein the prohibition determination means determines that reference skipping over the encoding target frame is prohibited when the state in which the facial expression index exceeds the first threshold has been maintained for a predetermined period.
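The "maintained for a predetermined period" condition can be sketched as a run-length counter over consecutive frames. This is an illustrative sketch only: the class name `SustainedExpressionDetector`, measuring the period in frames, and firing exactly once per sustained run are all assumptions that the claim does not state.

```python
class SustainedExpressionDetector:
    """Fires once when the index has exceeded the threshold for hold_frames frames in a row."""

    def __init__(self, first_threshold: int, hold_frames: int) -> None:
        if hold_frames < 1:
            raise ValueError("hold_frames must be at least 1")
        self.first_threshold = first_threshold
        self.hold_frames = hold_frames  # assumed: the period is counted in frames
        self._run = 0  # consecutive frames above the threshold so far

    def update(self, expression_index: int) -> bool:
        """Feed one frame's expression index; return True on the frame that completes the period."""
        if expression_index > self.first_threshold:
            self._run += 1
        else:
            self._run = 0  # the expression lapsed, so the period starts over
        return self._run == self.hold_frames
```

Returning True only on the frame that completes the period avoids setting a reference frame on every subsequent frame of a long smile; a real encoder might choose a different policy.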

The image encoding device according to claim 2, comprising face recognition means for identifying the face detected by the face detection means and determining whether a first face in the current frame matches a second face in a past frame,
wherein, within a predetermined period from the frame containing the second face for which the prohibition determination means determined that reference skipping over the encoding target frame is prohibited, the prohibition determination means does not determine that skipping reference is prohibited for the first face that the face recognition means has determined to match the second face.
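The suppression rule above keeps the same person's expression from producing reference frames in quick succession. A minimal sketch, assuming that the face recognition step yields a stable identifier per person and that the period is counted in frames; `RepeatSuppressor`, `face_id`, and `suppress_frames` are hypothetical names.

```python
from typing import Dict, Hashable


class RepeatSuppressor:
    """Blocks a repeated prohibition decision for the same recognized face within a window."""

    def __init__(self, suppress_frames: int) -> None:
        self.suppress_frames = suppress_frames  # assumed: window length in frames
        self._last_hit: Dict[Hashable, int] = {}  # face_id -> frame number of last accepted decision

    def allow(self, face_id: Hashable, frame_no: int) -> bool:
        """Call when a face meets the expression condition; True means the decision stands."""
        last = self._last_hit.get(face_id)
        if last is not None and frame_no - last < self.suppress_frames:
            return False  # same face triggered too recently
        self._last_hit[face_id] = frame_no
        return True
```

With a window of 30 frames, a face accepted at frame 0 is rejected at frame 10 and accepted again at frame 30, while a different face is unaffected.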

The image encoding device according to claim 2, wherein the facial expression determination means determines at least one facial expression among a smiling face, a crying face, and an angry face.

The image encoding device according to claim 1, wherein the encoding means includes code amount control means for controlling the amount of code generated, and
when the prohibition determination means determines that reference skipping over the encoding target frame is prohibited, the code amount control means makes the code amount for that reference frame larger than the code amount for a reference frame set based on other conditions.

The image encoding device according to claim 1, wherein the encoding means compresses and encodes the input image signal in accordance with the H.264 encoding method, and
the picture type of the reference frame is set to an IDR picture.

The image encoding device according to claim 1, wherein the encoding means compresses and encodes the input image signal in accordance with an MPEG encoding method, and
the picture type of the reference frame is set to an I picture.
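The choice of picture type for the reference frame under the two encoding methods can be summarized as a small lookup. This sketch follows only what the claims state (IDR picture for H.264, I picture for MPEG); the names `Codec` and `reference_picture_type`, and the idea of passing in an otherwise scheduled picture type, are assumptions made for illustration.

```python
from enum import Enum


class Codec(Enum):
    H264 = "h264"
    MPEG = "mpeg"


def reference_picture_type(codec: Codec, prohibit: bool, scheduled_type: str) -> str:
    """Pick the picture type for the encoding target frame.

    scheduled_type is whatever the encoder would have used anyway
    (for example "P" or "B"); it is kept when no prohibition applies.
    """
    if not prohibit:
        return scheduled_type
    # Prohibition determined: force the picture type named in the claims.
    return "IDR" if codec is Codec.H264 else "I"
```

For instance, `reference_picture_type(Codec.H264, True, "B")` yields `"IDR"`, while the same call with `prohibit=False` leaves the frame as a B picture.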

An image encoding method for compressing and encoding an input image signal composed of a plurality of frames, comprising:
a face information creation step of analyzing the input image signal and creating face information for identifying a face;
an encoding step of compressing and encoding the input image signal using an inter-frame prediction method;
a prohibition determination step of determining, based on the face information created in the face information creation step for an encoding target frame in the encoding step, whether to prohibit reference in inter-frame prediction that skips over the encoding target frame; and
a setting step of setting the encoding target frame as a reference frame across which skipping reference is prohibited when it is determined in the prohibition determination step that reference skipping over the encoding target frame is prohibited.

A computer program for causing a computer to execute a process of compressing and encoding an input image signal composed of a plurality of frames, the program causing the computer to execute:
a face information creation step of analyzing the input image signal and creating face information for identifying a face;
an encoding step of compressing and encoding the input image signal using an inter-frame prediction method;
a prohibition determination step of determining, based on the face information created in the face information creation step for an encoding target frame in the encoding step, whether to prohibit reference in inter-frame prediction that skips over the encoding target frame; and
a setting step of setting the encoding target frame as a reference frame across which skipping reference is prohibited when it is determined in the prohibition determination step that reference skipping over the encoding target frame is prohibited.