JP2013098750A

JP2013098750A - Image processing apparatus, image processing method, image processing program

Info

Publication number: JP2013098750A
Application number: JP2011239765A
Authority: JP
Inventors: Takayuki Sato; 貴之佐藤
Original assignee: JVCKenwood Corp
Current assignee: JVCKenwood Corp
Priority date: 2011-10-31
Filing date: 2011-10-31
Publication date: 2013-05-20
Anticipated expiration: 2031-10-31
Also published as: JP5712898B2

Abstract

PROBLEM TO BE SOLVED: To provide an image processing apparatus that can mask a region in which a mask target appears even when it takes time to determine whether the region is a mask target, for example when a subject requiring mask processing gradually enters the frame. SOLUTION: A mask candidate cutout unit 222 cuts out, from a decoded image frame, a region that may need masking as a mask candidate region. A similarity determination unit 234 confirms the mask candidate as a mask target when the latest mask candidate region is similar to a registered mask target. A mask replenishment unit 260 tracks the movement of the confirmed region over subsequent frames to obtain its movement trajectory, then extrapolates that trajectory backward in time to estimate the movement before the mask target was detected. A replenishment mask is generated for the region obtained by the extrapolation.

Description

The present invention relates to an image processing apparatus, an image processing method, and an image processing program. For example, it relates to an image processing apparatus that automatically applies mask processing to image regions showing privacy-sensitive content, such as the face of a specific person.

In recent years, individuals have become able to easily publish videos they have shot over the Internet from their personal computers and mobile terminals. However, a video may contain information that can identify an individual, such as a person, a vehicle, or a place name or address indicating the area where it was shot. Publishing a video containing such information as it is is undesirable from the viewpoint of privacy protection.

For this reason, methods have been proposed that automatically apply a mosaic to a specific person or license plate in a video (for example, Patent Document 1 and Patent Document 2).
In the methods disclosed in Patent Documents 1 and 2, face information of a person whose face must be mosaicked is first prepared in advance as reference information.
Then, when a face in the image matches the reference information, a mosaic is applied to that face portion.

This method can indeed apply a mosaic to the face of a specific person, but it cannot apply one to images captured before the face in the image matches the reference information. For example, if the person to be mosaicked slides slowly into the screen, mosaicking starts only once the face is completely within the screen; until then, the face, although only partly visible during the slide-in, is exposed on screen.

To address this problem, Patent Document 3 (JP 2010-233133 A) proposes the following method. A video buffer is provided, and the video is temporarily stored in it so that several frames can be traced back.
When a target to be mosaicked, that is, a mask target, is detected, the frames in the video buffer are traced back in order and searched again for portions related to the mask target. In this way, even before the mask target has completely entered the screen, a mask (mosaic) can be applied to the region in which the mask target partially appears.

Patent Document 1: JP 2001-086407 A
Patent Document 2: JP 2004-062560 A
Patent Document 3: JP 2010-233133 A

However, the capacity of the video buffer is limited. For a 30 frames/second video, at most about 10 seconds can be stored (paragraph 0056 of Patent Document 3). Therefore, when shooting while panning the camera slowly sideways, or when a person walks slowly past a fixed camera, the time a person requiring mask processing takes to slide into the screen can exceed what the video buffer can hold. Frames that have been pushed out of the video buffer after exceeding its retention time are then published with insufficient mask processing.

Simply enlarging the video buffer is conceivable, but however large it is made, there is a limit to the amount (duration) of video that can be buffered temporarily. Another conceivable approach is to search again from the first frame once a mask target has been detected and pick up the portions related to it, but this takes too much time and effort.

Accordingly, an object of the present invention is to provide an image processing apparatus, an image processing method, and an image processing program that can reliably mask the region in which a mask target appears even when it takes time to determine whether it is a mask target, for example when a subject requiring mask processing gradually enters the frame.

To this end, the present invention provides an image processing apparatus (100) that searches moving image data for image regions to be masked and adds masks to them, the apparatus comprising:
a mask candidate cutout unit (222) that, based on a pre-registered cutout reference list, cuts out a region that may be a mask target from an image frame of the moving image data as a mask candidate region;
a similarity calculation unit (233) that compares the latest mask candidate region with a pre-registered mask target reference list and calculates the similarity between the two;
a similarity determination unit (234) that compares the calculated similarity with a predetermined threshold and, when the similarity is equal to or greater than the threshold, confirms the mask candidate region as a mask target;
a mask replenishment unit (260) that takes the movement trajectory obtained by tracking, in subsequent frames, the movement of the region confirmed as a mask target by the similarity determination unit (234), extrapolates the trajectory backward in time, and creates a replenishment mask covering the region obtained by the extrapolation; and
a mask applying unit (240) that adds a mask to the region confirmed as a mask target by the similarity determination unit (234) and also to the region of the replenishment mask created by the mask replenishment unit (260).

In the image processing apparatus (100), the mask replenishment unit (260) may comprise:
a coordinate information acquisition unit (261) that sequentially acquires coordinate information of the region confirmed as a mask target by the similarity determination unit (234);
an extrapolation data holding unit (262) that holds the coordinate information acquired by the coordinate information acquisition unit (261);
a motion extrapolation unit (263) that extrapolates the data held in the extrapolation data holding unit (262) backward in time to estimate the movement before the mask target was detected; and
a moving mask creation unit (264) that creates a mask that moves so as to cover the movement obtained through the extrapolation by the motion extrapolation unit (263).
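As a rough illustration of the coordinate acquisition, data holding, backward extrapolation, and moving mask creation described above, the following Python sketch fits a straight line to the center coordinates of the confirmed region and evaluates it at earlier frame numbers. The linear least-squares model, the function names, the fixed mask size, and the sample coordinates are all assumptions made for this example; the text does not prescribe a particular extrapolation model.

```python
from typing import List, Sequence, Tuple

Point = Tuple[int, float, float]  # (frame number, center x, center y)


def fit_line(ts: Sequence[float], vs: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of v = a * t + b; returns (a, b)."""
    n = len(ts)
    if n < 2:
        raise ValueError("need at least two samples to fit a line")
    mean_t = sum(ts) / n
    mean_v = sum(vs) / n
    var_t = sum((t - mean_t) ** 2 for t in ts)
    if var_t == 0:
        raise ValueError("samples must span more than one frame number")
    a = sum((t - mean_t) * (v - mean_v) for t, v in zip(ts, vs)) / var_t
    return a, mean_v - a * mean_t


def extrapolate_backward(track: List[Point],
                         earlier_frames: Sequence[int]) -> List[Point]:
    """Estimate region centers at frames before the first confirmed detection."""
    ts = [t for t, _, _ in track]
    ax, bx = fit_line(ts, [x for _, x, _ in track])
    ay, by = fit_line(ts, [y for _, _, y in track])
    return [(t, ax * t + bx, ay * t + by) for t in earlier_frames]


def moving_mask(centers: List[Point], width: float, height: float):
    """Fixed-size rectangles (hypothetical mask shape) centered on each estimate.

    Returns (frame, left, top, right, bottom); coordinates may fall partly
    outside the frame while the subject is still entering.
    """
    return [(f, x - width / 2, y - height / 2, x + width / 2, y + height / 2)
            for f, x, y in centers]


# Centers observed after confirmation (illustrative values only).
track = [(40, 100.0, 200.0), (50, 150.0, 200.0), (60, 200.0, 200.0)]
estimates = extrapolate_backward(track, [10, 20, 30])
masks = moving_mask(estimates, width=100, height=100)
```

A straight line is the simplest choice here; a higher-order fit could be substituted if the subject accelerates, at the cost of less stable estimates far from the observed samples.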

The image processing apparatus may further comprise a position determination unit (250) that determines whether the distance between the region confirmed as a mask target by the similarity determination unit (234) and the border of the image frame is equal to or less than a predetermined threshold,
and the position determination unit (250) may activate the mask replenishment unit (260) when the distance is equal to or less than the predetermined threshold.
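The distance test could be sketched as follows. The rectangle convention (left, top, right, bottom in pixels, with the origin at the top-left corner of the frame), the function names, and the threshold value are assumptions for illustration only.

```python
from typing import Tuple

Rect = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def distance_to_frame_border(region: Rect, frame_width: int, frame_height: int) -> int:
    """Smallest gap between the region and any of the four frame borders."""
    left, top, right, bottom = region
    return min(left, top, frame_width - right, frame_height - bottom)


def should_activate_replenishment(region: Rect, frame_width: int,
                                  frame_height: int, threshold: int) -> bool:
    """True when the confirmed region lies within `threshold` pixels of a border."""
    return distance_to_frame_border(region, frame_width, frame_height) <= threshold


# A region hugging the left border of a 1920x1080 frame (illustrative values).
near_border = should_activate_replenishment((5, 300, 205, 500), 1920, 1080, threshold=20)
```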

Alternatively, the image processing apparatus may further comprise a position determination unit (250) that determines whether the region confirmed as a mask target by the similarity determination unit (234) is moving toward the center of the image frame,
and the position determination unit (250) may activate the mask replenishment unit (260) when the region confirmed as a mask target is moving toward the center of the image frame.
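One simple way to test for movement toward the frame center is to compare the distance from the center at two successive positions, as in the sketch below. The text does not say how the direction of movement is evaluated, so this criterion and the names used are assumptions.

```python
import math
from typing import Tuple

Center = Tuple[float, float]  # (x, y) of the region center in pixels


def is_moving_toward_center(prev_center: Center, curr_center: Center,
                            frame_width: int, frame_height: int) -> bool:
    """True when the current position is closer to the frame center than the previous one."""
    frame_center = (frame_width / 2, frame_height / 2)
    return math.dist(curr_center, frame_center) < math.dist(prev_center, frame_center)


# Region center moving right from the left border of a 1920x1080 frame.
entering = is_moving_toward_center((100.0, 540.0), (150.0, 540.0), 1920, 1080)
```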

The present invention further provides an image processing method for searching moving image data for image regions to be masked and adding masks to them, the method comprising:
a mask candidate cutout step (ST103, ST104) of cutting out, based on a pre-registered cutout reference list, a region that may be a mask target from an image frame of the moving image data as a mask candidate region;
a similarity calculation step (ST106) of comparing the latest mask candidate region with a pre-registered mask target reference list and calculating the similarity between the two;
a similarity determination step (ST107) of comparing the calculated similarity with a predetermined threshold and, when the similarity is equal to or greater than the threshold, confirming the mask candidate region as a mask target;
a mask replenishment step (ST124) of extrapolating backward in time (ST123) the movement trajectory obtained by tracking, in subsequent frames, the movement of the region confirmed as a mask target in the similarity determination step (ST107), and creating a replenishment mask covering the region obtained by the extrapolation; and
a mask applying step (ST125) of adding a mask to the region confirmed as a mask target in the similarity determination step (ST107) and also to the region of the replenishment mask created in the mask replenishment step (ST124).

The present invention further provides an image processing program that causes a computer incorporated in an image processing apparatus, which searches moving image data for image regions to be masked and adds masks to them, to function as:
a mask candidate cutout unit (222) that, based on a pre-registered cutout reference list, cuts out a region that may be a mask target from an image frame of the moving image data as a mask candidate region;
a similarity calculation unit (233) that compares the latest mask candidate region with a pre-registered mask target reference list and calculates the similarity between the two;
a similarity determination unit (234) that compares the calculated similarity with a predetermined threshold and, when the similarity is equal to or greater than the threshold, confirms the mask candidate region as a mask target;
a mask replenishment unit (260) that takes the movement trajectory obtained by tracking, in subsequent frames, the movement of the region confirmed as a mask target by the similarity determination unit (234), extrapolates the trajectory backward in time, and creates a replenishment mask covering the region obtained by the extrapolation; and
a mask applying unit (240) that adds a mask to the region confirmed as a mask target by the similarity determination unit (234) and also to the region of the replenishment mask created by the mask replenishment unit (260).

FIG. 1 is a diagram showing an example of a moving image data recording/reproducing system assumed by the present invention.
FIG. 2 is a diagram showing a moving image being shot.
FIG. 3 is a diagram showing an example of the shot moving image.
FIG. 4 is a functional block diagram of the mask processing unit in the first embodiment.
FIG. 5 is a diagram schematically showing a state in which a mask target region with a similarity equal to or greater than the threshold has been detected in the first embodiment.
FIG. 6 is a diagram showing an example of the coordinate information held in the extrapolation data holding unit in the first embodiment.
FIG. 7 is a diagram showing the movement trajectory of the detected mask target region in the first embodiment.
FIG. 8 is a diagram showing an example graph in which the movement trajectory of the mask target is extrapolated backward in time in the first embodiment.
FIG. 9 is a diagram showing an example of a moving mask in the first embodiment.
FIG. 10 is a flowchart showing the operation procedure from moving image shooting (ST10) to output of the masked data (ST60) in the first embodiment.
FIG. 11 is a flowchart showing the process of creating a metafile for mask processing in the first embodiment.
FIG. 12 is a diagram showing how mask candidates are cut out in the first embodiment.
FIG. 13 is a flowchart showing the process of creating a moving mask in the first embodiment.
FIG. 14 is a diagram showing an example of a moving image with masks added in the first embodiment.

Embodiments of the present invention are described below with reference to the drawings.
(First embodiment)
FIG. 1 shows an example of a moving image data recording/reproducing system 900 assumed by the present invention.
Each user shoots whatever subjects they like with a video camera 100, for example a street scene outdoors or family members playing in a park. The moving image data shot in this way accumulates in a moving image memory 120 built into the video camera 100.

On returning home, the photographer connects the video camera 100 to a personal computer 910 and sends the shot moving image data via the personal computer 910 to a server 930 on the Internet 920. The moving image data is then stored on the server 930, and third parties can also view the video over the Internet 920.

However, videos shot in this way may contain images that should not be published. For example, they may show the faces of family members, the faces of passers-by captured unintentionally, or information that identifies an individual, such as a name, a telephone number, or a car license plate.
Publishing such images related to personal information on the Internet as they are may cause problems from the viewpoint of privacy protection. Mask processing therefore needs to be applied to the privacy-related parts of the images.

As an example video for the following description, suppose that a specific person CA is filmed as shown in FIG. 2. The photographer P slowly pans the video camera 100 from right to left so that the person CA slowly enters the frame. The result is, for example, a video in which the person CA gradually enters from the left edge of the frame, as shown in FIG. 3. With the conventional techniques described above, once about 80% of the face appears in the frame, as in frame F40, it can be determined to be a mask target. The face of the person CA can therefore be masked in the frames from F40 onward, such as frames F50 and F60. Before frame F40, however, the face of the person CA is visible but only partly, so it cannot be confirmed as a mask target; left as is, no mask is applied and the face is published. In this embodiment, therefore, a mask is also applied to the frames preceding the one in which the mask target is confirmed.

(Configuration of the first embodiment)
The configuration of the first embodiment is described next.
FIG. 4 is a functional block diagram of the mask processing unit 200 and also shows the main elements of the video camera 100. This embodiment assumes that the mask processing unit 200 is built into the video camera 100, but the mask processing unit may instead be provided as a function of the personal computer 910. Moving image data captured by the imaging unit 110 of the video camera 100 is first stored in the moving image memory 120. The imaging unit 110 consists of a lens unit, a CCD (photoelectric conversion element) circuit, and predetermined logic circuits, and generates moving image data (frames) from the moving image (video) signal obtained by shooting the subject. The moving image data includes luminance data and color data. A flash memory, for example, can be used as the moving image memory 120.

The mask processing unit 200 includes a data input unit 210, a mask candidate acquisition unit 220, a mask target determination unit 230, a mask flag assigning unit 240, a position determination unit 250, a mask replenishment unit 260, a metafile creation unit 270, a mask adding unit 280, and a data output unit 290.

The data input unit 210 reads the moving image data stored in the moving image memory 120 and outputs it to the subsequent stages. The data input unit 210 has a decoding unit 211 and outputs image frames obtained by decoding the moving image data.

When the purpose is only mask processing rather than viewing the video, not all image frames are needed. Accordingly, of the I pictures, P pictures, B pictures, and so on, only the frames suited to the purpose of the processing may be supplied from the data input unit 210 to the subsequent stages. For example, only I pictures and P pictures may be used for mask processing. Of course, mask processing may also be performed using all frames.
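Restricting mask processing to certain picture types amounts to a simple filter over the decoded frames. The sketch below assumes a hypothetical `DecodedFrame` record that carries the picture type reported by the decoder; neither the record nor the function name comes from the text.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Tuple


@dataclass(frozen=True)
class DecodedFrame:
    number: int
    picture_type: str  # "I", "P" or "B" (hypothetical field set by the decoder)


def frames_for_masking(frames: Iterable[DecodedFrame],
                       allowed_types: Tuple[str, ...] = ("I", "P")) -> Iterator[DecodedFrame]:
    """Yield only the frames whose picture type is used for mask processing."""
    return (f for f in frames if f.picture_type in allowed_types)


# A short group of pictures: I B B P B B I (illustrative).
stream = [DecodedFrame(i, t) for i, t in enumerate("IBBPBBI")]
selected = [f.number for f in frames_for_masking(stream)]
```

Passing `allowed_types=("I", "P", "B")` corresponds to the case where all frames are used.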

The mask candidate acquisition unit 220 cuts out only the image regions that may require a privacy mask. For example, when the face of a specific person A must be masked, a region in which skin color occupies at least a predetermined area may be a human face, so such a region is cut out as a mask candidate.

The mask candidate acquisition unit 220 includes a cutout reference list storage unit 221 and a mask candidate cutout unit 222.

The cutout reference list storage unit 221 stores a cutout reference list used to detect regions to be cut out as mask candidates. The cutout reference list may be preset, may be generated automatically from the features of the mask targets, or may be entered by the user. Examples of entries in the cutout reference list include skin-colored regions of at least a predetermined area and list data of characters and digits.

A note on why list data of characters and digits is included in the cutout reference list.
Besides faces, privacy-related items include names, organization names (company or school names), address signs, telephone numbers, and car license numbers. A telephone number such as 090-XXXX-XXXX therefore needs a privacy mask. For this reason, if a digit such as "0" or, as described below, some character appears in an image frame, it must be cut out as a mask candidate. Ultimately, if digits line up as in "090-", mask processing is applied to that digit string, whereas if only a single "0", or a pattern that merely looks like "0", appears, no mask processing is needed.

To give another example, if characters ultimately line up as in "横浜" (Yokohama), "横浜市ＸＸＸ区" (XXX Ward, Yokohama City), or "横Ｘ学園ＸＸ学校" (a school name), mask processing is applied to them, whereas if only the single character "横" appears, it has nothing to do with privacy and no mask processing is needed.
To handle foreign languages as well, it is advisable to add the alphabet, Hangul, and simplified Chinese characters to the cutout reference list.
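The distinction between an isolated digit and a digit string that forms a telephone number can be illustrated with a regular expression applied to text that has already been recognized in the candidate region. The pattern below is an assumed approximation of numbers shaped like 090-XXXX-XXXX, and the function name is hypothetical; the text does not define how the digit strings are matched.

```python
import re

# Assumed pattern: a leading 0, then hyphen-separated digit groups,
# e.g. 090-1234-5678. Real numbering plans vary; this is illustrative only.
PHONE_LIKE = re.compile(r"0\d{1,3}-\d{2,4}-\d{4}")


def needs_mask(recognized_text: str) -> bool:
    """True when the recognized characters contain a telephone-number-like string."""
    return PHONE_LIKE.search(recognized_text) is not None


masked = needs_mask("090-1234-5678")  # digit string arranged like a phone number
ignored = needs_mask("0")             # a lone "0" is only a candidate, not a target
```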

The mask candidate cutout unit 222 searches the image frames sequentially input from the data input unit 210 for anything that matches the cutout reference list stored in the cutout reference list storage unit 221. If something matching the cutout reference list is present in an image frame, that region is cut out as a mask candidate region and output to the subsequent stage.
The cut-out mask candidate region retains information that associates it with the original frame, for example the number of the source frame and its coordinates within that frame.
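A candidate cutout for skin-colored regions of at least a predetermined area could look like the following sketch. It assumes that a per-pixel skin-color decision has already been made (the boolean grid `skin`) and uses 4-connected flood fill to group pixels; the `MaskCandidate` record, the function name, and the connectivity rule are assumptions, while the retained frame number and coordinates follow the description above.

```python
from collections import deque
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class MaskCandidate:
    frame_number: int  # frame the region was cut out from
    left: int
    top: int
    right: int         # exclusive
    bottom: int        # exclusive


def cut_out_candidates(skin: Sequence[Sequence[bool]], frame_number: int,
                       min_area: int) -> List[MaskCandidate]:
    """Return bounding boxes of 4-connected skin regions with area >= min_area."""
    h = len(skin)
    w = len(skin[0]) if h else 0
    seen = [[False] * w for _ in range(h)]
    candidates = []
    for y0 in range(h):
        for x0 in range(w):
            if not skin[y0][x0] or seen[y0][x0]:
                continue
            queue = deque([(x0, y0)])
            seen[y0][x0] = True
            area, l, t, r, b = 0, x0, y0, x0, y0
            while queue:
                x, y = queue.popleft()
                area += 1
                l, r, t, b = min(l, x), max(r, x), min(t, y), max(b, y)
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    if 0 <= nx < w and 0 <= ny < h and skin[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            if area >= min_area:
                candidates.append(MaskCandidate(frame_number, l, t, r + 1, b + 1))
    return candidates


# Tiny illustrative grid: a 2x2 skin block and one isolated skin pixel.
grid = [[True, True, False, False],
        [True, True, False, True],
        [False, False, False, False]]
found = cut_out_candidates(grid, frame_number=40, min_area=4)
```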

Next, the mask target determination unit 230 verifies whether each mask candidate region acquired by the mask candidate acquisition unit 220 is truly a mask target.
The mask target determination unit 230 includes a feature amount calculation unit 231, a mask target reference list storage unit 232, a similarity calculation unit 233, and a similarity determination unit 234.

The feature amount calculation unit 231 calculates a feature amount for the latest mask candidate cut out by the mask candidate cutout unit 222. The feature amount is a combination of index values that characterize the target image: index values representing its feature points, the distances between feature points, the size of feature portions, its contour, its luminance, its color, and so on.
The feature amount calculation unit 231 sends the calculated feature amount to the similarity calculation unit 233.

The mask target reference list storage unit 232 stores reference information used to detect regions to be masked. Examples of the stored reference information include the faces of specific persons who need a privacy mask (the face of person A, the face of person B, and so on), digit strings for identifying telephone numbers and vehicle numbers, and character strings for identifying names, organization names (company or school names), and address signs. Each piece of reference information is stored together with its feature amount.

The similarity calculation unit 233 compares the feature amount calculated by the feature amount calculation unit 231 with the feature amounts stored in the mask target reference list storage unit 232 and calculates a similarity. To do so, it pairs each index value of the mask candidate with the corresponding index value in the mask target reference list and evaluates their overall degree of similarity. Note that when a mask candidate is a skin-colored region and may be a human face, there is no point in comparing it with the feature amount of a telephone number; a skin-colored mask candidate is matched against the faces of specific persons.
Various methods are conceivable for selecting the appropriate reference information for the similarity calculation. For example, the selection may be based on color or luminance, or the candidate may be matched against all the reference information and the entry with the highest similarity chosen.

The similarity determination unit 234 compares the similarity calculated by the similarity calculation unit 233 with a predetermined threshold. If the similarity exceeds the threshold, the mask candidate region is confirmed as a mask target region, and the similarity determination unit 234 notifies the mask flag assigning unit 240 and the position determination unit 250 accordingly.
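The similarity calculation and threshold test can be sketched as below. Cosine similarity between feature vectors stands in for the unspecified "overall evaluation" of the index values, and the candidate is matched against every reference entry with the best score kept, which is one of the selection methods mentioned above. The comparison uses "equal to or greater than", following the wording of the summary of the invention. All names, vectors, and the threshold value are assumptions for illustration.

```python
import math
from typing import Dict, Optional, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_match(candidate: Sequence[float],
               references: Dict[str, Sequence[float]]) -> Tuple[Optional[str], float]:
    """Compare against every reference entry and keep the highest similarity."""
    best_name, best_score = None, -1.0
    for name, features in references.items():
        score = cosine_similarity(candidate, features)
        if score > best_score:
            best_name, best_score = name, score
    return best_name, best_score


def is_mask_target(score: float, threshold: float) -> bool:
    """Confirm the candidate when its similarity reaches the threshold."""
    return score >= threshold


# Hypothetical reference list with made-up feature vectors.
references = {"person_A": [1.0, 0.0, 0.0], "person_B": [0.0, 1.0, 0.0]}
name, score = best_match([0.9, 0.1, 0.0], references)
confirmed = is_mask_target(score, threshold=0.8)
```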

The mask flag assigning unit 240 sets a mask flag on mask target regions. Naturally, it assigns the mask flag to regions confirmed as mask targets by the similarity determination unit 234.
As shown in FIG. 5, in frames F40, F50, F60, and so on, enough of the face is visible for the similarity to reach the threshold, so the mask flag assigning unit 240 assigns the mask flag to the regions showing the face in these frames. The footage between the moment the mask target (subject) begins to enter the frame and the moment it is confirmed as a mask target (frames F10 to F30) also needs to be masked; this is achieved by the position determination unit 250 and the mask replenishment unit 260 described below.

The position determination unit 250 examines the positional relationship between the confirmed mask target area and the boundary of the image frame. Specifically, it determines whether the distance between the mask target area and the frame boundary is equal to or less than a predetermined threshold. If so, the position determination unit 250 activates the mask supplement unit 260 so that a supplementary mask is created for the frames (frames F10 to F30) preceding the frame in which the mask target was first detected (frame F40). After activating the mask supplement unit 260, the position determination unit 250 passes to it the coordinate information of the mask target area obtained via the similarity determination unit 234.

When the distance between the mask target area and the frame boundary is equal to or less than the predetermined threshold, the mask target has presumably entered the frame gradually from its edge. In FIG. 5, for example, the mask target area in frame F40 is very close to the frame boundary. In that case, the mask target (subject) may not have been recognized as such in the preceding frames (frames F10 to F30) because it was not yet sufficiently inside the frame.
The position of the mask target area may be represented by its center coordinates Cc or, if the area is rectangular, by any of its four corners. The threshold value and the determination method should be adjusted to suit whichever representation is used.
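A minimal sketch of the position determination, assuming the mask target area is an axis-aligned rectangle given by its top-left corner and size in pixels. The function names and the threshold value are hypothetical; the text leaves both the position representation and the threshold open.

```python
def distance_to_frame_edge(x, y, w, h, frame_w, frame_h):
    """Smallest gap between the rectangle and any of the four frame edges."""
    return min(x, y, frame_w - (x + w), frame_h - (y + h))


def needs_supplementary_mask(region, frame_size, edge_threshold):
    """True when the region is within edge_threshold pixels of the frame boundary."""
    x, y, w, h = region
    frame_w, frame_h = frame_size
    return distance_to_frame_edge(x, y, w, h, frame_w, frame_h) <= edge_threshold


if __name__ == "__main__":
    # A region hugging the left edge triggers the supplement step.
    print(needs_supplementary_mask((5, 200, 100, 120), (1920, 1080), 20))
    # A region in the middle of the frame does not.
    print(needs_supplementary_mask((900, 400, 100, 120), (1920, 1080), 20))
```

If the center coordinates Cc were used instead of the rectangle edges, the threshold would have to grow by roughly half the region size, which is the kind of adjustment the paragraph above refers to.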

Next, the mask supplement unit 260 will be described.
The mask supplement unit 260 creates, based on the movement trajectory of the mask target area, a mask to be applied to the frames preceding the detection of the mask target.
The mask supplement unit 260 includes a coordinate information acquisition unit 261, an extrapolation data holding unit 262, a motion extrapolation unit 263, and a moving mask creation unit 264.

The coordinate information acquisition unit 261 extracts the coordinate information of the mask target area obtained via the similarity determination unit 234 and the position determination unit 250 and stores it in the extrapolation data holding unit 262. Here, the coordinate information consists of the coordinate values of the mask target area, the size of the mask target area, and the time stamp of the frame to which the mask target area belongs; these are stored as a set in the extrapolation data holding unit 262. The extrapolation data holding unit 262 is a temporary memory that, as shown in FIG. 6, holds the coordinate values and size of the mask target area together with the corresponding time and frame number.
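One way to picture the contents of the extrapolation data holding unit is as a small list of numeric records. The sketch below is an assumption about layout only: the class names, field names, and units are hypothetical, and FIG. 6 is not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoordinateRecord:
    """One set of coordinate information for a confirmed mask target area."""
    frame_no: int
    timestamp: float  # assumed to be seconds from the start of the clip
    x: float
    y: float
    width: float
    height: float


class ExtrapolationDataStore:
    """Temporary in-memory holder for coordinate records (numbers only, no pixels)."""

    def __init__(self):
        self._records = []

    def add(self, record):
        self._records.append(record)

    def records(self):
        """Return the records ordered by time stamp."""
        return sorted(self._records, key=lambda r: r.timestamp)


if __name__ == "__main__":
    store = ExtrapolationDataStore()
    store.add(CoordinateRecord(50, 1.67, 50.0, 200.0, 100.0, 120.0))
    store.add(CoordinateRecord(40, 1.33, 0.0, 200.0, 100.0, 120.0))
    print([r.frame_no for r in store.records()])
```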

The motion extrapolation unit 263 uses the coordinate information held in the extrapolation data holding unit 262 to estimate, by extrapolation, how the mask target moved before it was detected.
For example, plotting the data held in the extrapolation data holding unit 262 (FIG. 6) on a graph yields the movement trajectory of the mask target area from frame F40 onward, as shown in FIG. 7. For clarity only the change in the x coordinate is shown, but the y coordinate and the area size can be plotted as functions of time in the same way. To estimate the trajectory of the mask target in the frames preceding frame F40, the graph of FIG. 7 is extended backward in time. This yields the estimated trajectory of the mask target in the frames preceding frame F40, as shown in FIG. 8.
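The backward extension of the trajectory can be illustrated with an ordinary least-squares line fitted to the x coordinate and evaluated at earlier frame numbers. This is a sketch under the assumption of roughly linear motion; the function names and sample values are hypothetical, and the same routine would be applied to the y coordinate and the area size.

```python
def fit_line(ts, xs):
    """Ordinary least-squares fit of x = slope * t + intercept."""
    n = len(ts)
    mean_t = sum(ts) / n
    mean_x = sum(xs) / n
    var_t = sum((t - mean_t) ** 2 for t in ts)
    if var_t == 0:
        raise ValueError("need at least two distinct time values")
    slope = sum((t - mean_t) * (x - mean_x) for t, x in zip(ts, xs)) / var_t
    return slope, mean_x - slope * mean_t


def extrapolate_backward(ts, xs, earlier_ts):
    """Evaluate the fitted line at time values that precede the observed data."""
    slope, intercept = fit_line(ts, xs)
    return [slope * t + intercept for t in earlier_ts]


if __name__ == "__main__":
    # Observed x positions from frame 40 onward (made-up values).
    frames, xs = [40, 50, 60], [100.0, 150.0, 200.0]
    print(extrapolate_backward(frames, xs, [10, 20, 30]))
```

A negative estimate simply means the region is predicted to lie partly or wholly outside the left edge of the frame at that time.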

The moving mask creation unit 264 creates a mask that moves along the estimated trajectory obtained by the motion extrapolation unit 263 (FIG. 8) so that it covers the mask target moving along that trajectory. For example, as shown in FIG. 9, it creates a mask for frames F10 to F39 that moves so as to enter the frame gradually from the left. The moving mask creation unit 264 outputs the information on the moving mask thus created to the mask flag assignment unit 240. The moving mask information consists of the coordinate values of the moving mask, its size, and the frame numbers to which it is to be added.
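As a rough sketch of moving mask creation, the estimated trajectory can be turned into one rectangle per earlier frame and clipped to the frame so that only the visible part is masked. The linear `model` dictionary, the function name, and the output keys are assumptions for illustration; the text only states that the mask information contains coordinates, size, and frame number.

```python
def make_moving_masks(frame_nos, model, frame_size):
    """Build clipped mask rectangles for the given frame numbers.

    model maps 'x', 'y', 'w', 'h' to (slope, intercept) pairs expressed
    in frame-number units (a hypothetical linear trajectory model).
    """
    frame_w, frame_h = frame_size
    masks = []
    for n in frame_nos:
        x, y, w, h = (model[k][0] * n + model[k][1] for k in ("x", "y", "w", "h"))
        left, top = max(0.0, x), max(0.0, y)
        right, bottom = min(frame_w, x + w), min(frame_h, y + h)
        # Skip frames where the estimated region is entirely outside the frame.
        if right > left and bottom > top:
            masks.append({"frame": n, "x": left, "y": top,
                          "w": right - left, "h": bottom - top})
    return masks


if __name__ == "__main__":
    model = {"x": (5.0, -100.0), "y": (0.0, 200.0),
             "w": (0.0, 100.0), "h": (0.0, 120.0)}
    for m in make_moving_masks([0, 10, 30], model, (640, 480)):
        print(m)
```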

The metafile creation unit 270 obtains information on the areas to be masked from the mask flag assignment unit 240. The areas to be masked are the mask target areas, and the information on them includes, for example, the coordinates, size, and frame number of each mask target area. As noted above, the mask target areas include not only the areas confirmed as mask targets by the threshold determination of the similarity determination unit 234 but also the areas to which the moving mask created by the moving mask creation unit 264 is added. The metafile creation unit 270 creates a metafile containing commands to mask the mask target areas and outputs the mask metafile to the mask addition unit 280.
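The text does not specify a metafile format. Purely as an illustration, the sketch below merges confirmed regions and moving mask regions into one frame-ordered list and serializes it as JSON; the JSON layout, the key names, and the function name are all assumptions.

```python
import json


def build_metafile(confirmed_regions, moving_masks):
    """Merge both kinds of mask regions into one frame-ordered command list."""
    entries = sorted(confirmed_regions + moving_masks, key=lambda e: e["frame"])
    return json.dumps({"mask_commands": entries}, indent=2)


if __name__ == "__main__":
    confirmed = [{"frame": 40, "x": 0, "y": 200, "w": 100, "h": 120}]
    moving = [{"frame": 10, "x": 0, "y": 200, "w": 50, "h": 120}]
    print(build_metafile(confirmed, moving))
```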

The mask addition unit 280 masks the moving image data in accordance with the mask processing commands in the metafile. Mask processing in the mask addition unit 280 may begin only after the metafile covering all of the moving image data stored in the moving image memory 120 has been created. In that case, after the decoding performed for metafile creation, the moving image data is read from the moving image memory 120 and decoded a second time. On this second pass the mask candidate acquisition unit 220 and the mask target determination unit 230 need not be activated; the decoded image data is sent only to the mask addition unit 280, which adds masks in accordance with the metafile.

Alternatively, mask processing in the mask addition unit 280 may proceed with a predetermined delay after metafile creation. For example, if mask processing lags by several minutes behind the time stamp of the frame that the mask candidate cutout unit 222 is currently processing, the need for masking will already have been settled for each frame by the time it is masked.

The moving image data masked by the mask addition unit 280 is output from the data output unit 290. It is then, for example, temporarily saved to the hard disk of a personal computer and subsequently sent to a server on the Internet.

(Operation of the first embodiment)
Next, the operation of the first embodiment will be described.
As shown in FIG. 10, the flow from moving image shooting by the user (ST10) to output of the masked data (ST60) proceeds in the following order: shooting (ST10), storage of the moving image data (ST20), advance preparation for mask processing (ST30), creation of the mask processing metafile (ST40), mask addition (ST50), and data output (ST60).

As described above, a moving image is shot as shown in FIG. 2 (ST10), and the moving image data is saved in the moving image memory 120 (ST20). Advance preparation for mask processing is then performed (ST30): the user stores information on the mask target in the mask target reference list storage unit 232. In this example, the feature amount of the face of the child CA is stored in the mask target reference list storage unit 232.
To store the feature amount of the face of the child CA in the mask target reference list storage unit 232, separately captured data of the face of the child CA may, for example, be set in the video camera 100. Alternatively, an area showing the face of the child CA may be designated in the moving image just shot and set in the video camera 100 as the mask target. The video camera 100 automatically calculates the feature amount from the area set as the mask target and stores it in the mask target reference list storage unit 232.

A cutout reference list must be registered in the cutout reference list storage unit 221 to match. The video camera 100 may generate the cutout reference list automatically from the information in the mask target reference list storage unit 232. Alternatively, when a person's face is set as the mask target, skin color areas of at least a predetermined size may be added to the cutout reference list automatically.

Once this advance preparation (ST30) is complete, the mask processing unit 200 is activated and the mask processing metafile is created (ST40). The metafile creation step (ST40) will be described with reference to the flowchart of FIG. 11. The captured moving image data, having been stored in the moving image memory 120, is input to the mask processing unit 200 through the data input unit 210 (ST101). The decoding unit 211 of the data input unit 210 decodes the moving image data (ST102), and the image frames are sent one by one to the mask candidate cutout unit 222. The mask candidate cutout unit 222 searches each image frame for a mask candidate area that matches the cutout reference list. At frame F00 there is no skin color area and hence no mask candidate area (NO in ST103). When there is no mask candidate area, the search moves on to the next image frame.

As the video advances from frame F00 to frame F10, the child CA gradually enters the frame. Once the face of the child CA occupies more than the predetermined area within the frame, it matches the cutout reference list (YES in ST103).
Suppose, for example, that the skin color area exceeds the predetermined value in frame F10. The mask candidate cutout unit 222 then cuts out the skin color area R10, which is at least the predetermined size, as a mask candidate (ST104) (see FIG. 12). As described earlier, the frame number of the source frame F10 and the coordinates within frame F10 are attached to the data of the cut-out area (R10).

When the mask candidate area (R10) has been cut out, the feature amount calculation unit 231 calculates its feature amount (ST105). The calculated feature amount is sent to the similarity calculation unit 233, which obtains a similarity by comparison with the mask target reference list (ST106).
The similarity determination unit 234 compares the calculated similarity with the threshold. Area R10 has nothing in common with the feature amount of the face of the child CA, so its similarity does not exceed the threshold Th. Area R10 is therefore not confirmed as a mask target (NO in ST107), and processing moves on to the next image frame.

From frame F10 to frame F30, the area showing the child gradually grows, so every skin color area is cut out as a mask candidate (YES in ST103). However, the child's face is not yet sufficiently inside the frame, so the similarity does not exceed the threshold Th in the similarity determination (NO in ST107).

Now consider the point at which processing advances beyond frame F30 and frame F40 becomes the frame being processed. In frame F40, about 80% of the face lies within the frame, in area R40. Because area R40 is a skin color area of at least the predetermined size, the mask candidate cutout unit 222 cuts it out as a mask candidate (ST104). Moreover, because about 80% of the face of the child CA is visible, the area correlates strongly at multiple points with the index in the mask target reference list (the feature amount of the child's face), and a large similarity value is calculated. The similarity S40 obtained for R40 is therefore equal to or greater than the threshold Th (YES in ST107).

When the similarity S40 is equal to or greater than the threshold Th in this way, the similarity determination unit 234 notifies the mask flag assignment unit 240 that an area with similarity S equal to or greater than the threshold Th has appeared. The mask flag assignment unit 240 then assigns a mask flag to area R40 (ST108).

With the mask target having appeared, the position determination unit 250 performs position determination (ST109): the distance between the mask target area R40 and the frame boundary is compared with a predetermined threshold (ST110).

If the distance between the mask target (R40) and the frame boundary exceeds the predetermined threshold (NO in ST110), it is judged that no supplementary moving mask is needed. The metafile creation unit 270 therefore creates a metafile that masks the mask target area R40 in accordance with the mask flag attached to area R40 (ST111).

If, on the other hand, the distance between the mask target area (R40) and the frame boundary is equal to or less than the predetermined threshold in ST110 (YES in ST110), the subject may have entered the frame without the necessary mask being applied. In this case the mask supplement unit 260 is activated and creation of the moving mask begins (ST120).
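The branching described in ST107 to ST120 can be condensed into a small decision function. This is only a sketch of the control flow: the function name, the returned labels, and the numeric scales are hypothetical, and the feature calculation and cutout steps are assumed to have run already.

```python
def decide_next_step(similarity, edge_distance, sim_threshold, edge_threshold):
    """Return a label for the branch taken after a mask candidate is evaluated."""
    if similarity < sim_threshold:
        # NO in ST107: not confirmed as a mask target, go on to the next frame.
        return "next_frame"
    # YES in ST107: the mask flag is assigned (ST108), then position is checked.
    if edge_distance > edge_threshold:
        # NO in ST110: no supplementary mask needed, only the metafile entry (ST111).
        return "metafile_only"
    # YES in ST110: the target appeared near the boundary, start ST120.
    return "create_moving_mask"


if __name__ == "__main__":
    print(decide_next_step(0.3, 5, 0.8, 20))
    print(decide_next_step(0.9, 300, 0.8, 20))
    print(decide_next_step(0.9, 5, 0.8, 20))
```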

The process of creating the moving mask will be described with reference to the flowchart of FIG. 13.
To create the moving mask, the coordinate information acquisition unit 261 acquires the coordinate information of the mask target area (ST121).
Specifically, as ST101 to ST107 are executed in turn for the frames following confirmation of the mask target (frame F40 onward), the coordinate information acquisition unit 261 acquires, via the similarity determination unit 234 and the position determination unit 250, the coordinate information of the area to be masked in each of those frames. Coordinate information such as the coordinates, size, and frame time stamp of the mask target area from frame F40 onward thus accumulates in the extrapolation data holding unit 262 (see FIG. 6).
In ST122, it is determined whether the necessary amount of data has been collected. That is, the motion extrapolation unit 263 determines whether there is enough data to estimate, by extrapolation as shown in FIG. 8, the movement of the mask target in the temporally preceding frames.

The criterion for deciding whether there is enough data may be set as appropriate.
For example, a fitting curve such as a regression line, a polynomial of suitable degree, or a suitable spline is fitted to the coordinate information from frame F40 onward (see FIG. 7). If the residual over several tens of frames counted from the frame in which the mask target was first confirmed (roughly frames F40 to F60) is equal to or less than a predetermined value, it may be judged that enough data has been collected to apply extrapolation.
The fit may also be weighted so that frames closer to the frame in which the mask target was first confirmed (frame F40) contribute more, so that the motion in the few frames immediately after frame F40 is reflected more strongly.
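One possible form of this sufficiency check is a weighted line fit whose weighted RMS residual is compared with a tolerance. The exponential weighting, the `decay` and `min_points` parameters, and the function names are assumptions chosen for illustration; the text allows any suitable curve and criterion.

```python
def weighted_line_fit(ts, xs, weights):
    """Weighted least-squares fit of x = slope * t + intercept."""
    sw = sum(weights)
    mean_t = sum(w * t for w, t in zip(weights, ts)) / sw
    mean_x = sum(w * x for w, x in zip(weights, xs)) / sw
    var_t = sum(w * (t - mean_t) ** 2 for w, t in zip(weights, ts))
    slope = sum(w * (t - mean_t) * (x - mean_x)
                for w, t, x in zip(weights, ts, xs)) / var_t
    return slope, mean_x - slope * mean_t


def enough_data(ts, xs, max_rms, min_points=3, decay=0.9):
    """True when a weighted line fits the samples to within max_rms.

    Samples nearer the first confirmed frame (ts[0]) get larger weights.
    """
    if len(ts) < min_points:
        return False
    weights = [decay ** (t - ts[0]) for t in ts]
    slope, intercept = weighted_line_fit(ts, xs, weights)
    sq = sum(w * (x - (slope * t + intercept)) ** 2
             for w, t, x in zip(weights, ts, xs))
    return (sq / sum(weights)) ** 0.5 <= max_rms


if __name__ == "__main__":
    print(enough_data([40, 41, 42, 43], [100.0, 105.0, 110.0, 115.0], 1.0))
    print(enough_data([40, 41, 42, 43], [100.0, 160.0, 90.0, 170.0], 1.0))
```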

Once enough data has been collected to apply extrapolation (YES in ST122), the movement trajectory of the mask target area is extrapolated (ST123) to estimate the movement of the mask target before it was detected (see FIG. 8). The moving mask creation unit 264 then creates a moving mask that covers the mask target along the extrapolated movement. As shown in FIG. 9, this produces a mask for the frames preceding detection of the mask target (frames F10 to F30 and so on).

Information on the created moving mask is output to the mask flag assignment unit 240, which assigns a mask flag to the area of the moving mask (ST125).

The mask target data carrying mask flags is sent to the metafile creation unit 270, which creates a metafile commanding the mask processing (ST111). When every frame of the moving image data stored in the moving image memory 120 has been examined (YES in ST112), the mask processing metafile is complete.

The moving image data is then masked in accordance with the metafile thus created (ST50): the mask addition unit 280 masks the areas specified in the metafile in the decoded moving image frame data. As a result, as shown in FIG. 14, masking begins from the point at which the child starts to appear in the frame, and the child's privacy is protected.

The first embodiment configured as described above provides the following effects.
(1) This embodiment includes the mask supplement unit 260, which estimates by extrapolation the movement of the mask target in the frames preceding its detection and applies a moving mask to those frames.
As a result, the video between the moment the mask target (subject) begins to enter the frame and the moment it is confirmed as a mask target can also be masked, and privacy can be reliably protected.

(2) In this embodiment, coordinate information is held in the extrapolation data holding unit 262 in order to supplement the moving mask. Because this information consists of mere numerical values rather than image information, the capacity of the extrapolation data holding unit 262 can be small. In the prior art, by contrast, one approach is to hold a predetermined amount of moving image data temporarily in a buffer so that, after a mask target is detected, the relevant portion can be traced back in time; this requires a very large memory. Moreover, image data that has overflowed the buffer cannot be searched retroactively, so that portion may be published without a privacy mask.
In this embodiment, the only data needed to create the moving mask is coordinate information, and the image information itself need not be retained, so a very small memory suffices. Even if a large number of frames must be tracked in order to extrapolate the movement trajectory of the mask target, only coordinate information has to be held temporarily, so memory capacity does not become a problem.
Therefore, if necessary, a highly accurate moving mask can be created after a sufficient amount of coordinate information has been collected.
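A back-of-the-envelope comparison gives a feel for the size difference. Every number below is an assumption for illustration (a raw 1920x1080 RGB frame, six 8-byte values per coordinate record); buffered video in a real system would normally be compressed, so the actual ratio would be smaller.

```python
# Assumed raw frame: 1920 x 1080 pixels, 3 bytes per pixel.
FRAME_WIDTH, FRAME_HEIGHT, BYTES_PER_PIXEL = 1920, 1080, 3
# Assumed record: x, y, width, height, time stamp, frame number at 8 bytes each.
VALUES_PER_RECORD, BYTES_PER_VALUE = 6, 8

frame_bytes = FRAME_WIDTH * FRAME_HEIGHT * BYTES_PER_PIXEL
record_bytes = VALUES_PER_RECORD * BYTES_PER_VALUE
ratio = frame_bytes // record_bytes

print(frame_bytes, record_bytes, ratio)  # prints: 6220800 48 129600
```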

(3) The embodiment includes the position determination unit 250, which decides whether a moving mask is needed by determining whether a newly detected mask target area is close to the frame boundary. If a moving mask were simply added by extrapolation every time a new mask target was detected, unnecessary masks could clutter the video. In this embodiment, the need for a moving mask is judged appropriately from the position at which the mask target area appears, so a moving mask is added only when it is needed.

(Modification 1)
In the first embodiment, the position determination unit 250 decides whether a moving mask is needed solely from the distance between the newly appearing mask target area and the frame boundary. Other criteria may be used for this decision.
For example, the criterion may be whether the mask target area is moving toward the center of the frame. Alternatively, both criteria may be used, with the mask supplement unit activated only when both are satisfied, or activated when either one is satisfied.
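The two criteria and their AND/OR combination could be sketched as follows. The function names, the `mode` parameter, and the use of region centers in two consecutive frames to judge direction are assumptions; the text does not say how the direction of movement is measured.

```python
def moving_toward_center(prev_center, curr_center, frame_size):
    """True when the region center moved closer to the frame center."""
    cx, cy = frame_size[0] / 2, frame_size[1] / 2

    def dist_sq(p):
        return (p[0] - cx) ** 2 + (p[1] - cy) ** 2

    return dist_sq(curr_center) < dist_sq(prev_center)


def should_supplement(near_edge, toward_center, mode="and"):
    """Combine the distance criterion and the direction criterion."""
    if mode == "and":
        return near_edge and toward_center
    return near_edge or toward_center


if __name__ == "__main__":
    inward = moving_toward_center((10, 240), (60, 240), (640, 480))
    print(inward, should_supplement(True, inward, mode="and"))
```

Requiring both criteria is the stricter choice and suppresses masks for subjects that are near the boundary but leaving the frame; accepting either one errs on the side of masking more.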

The present invention is not limited to the above embodiment and modification and may be modified as appropriate without departing from the spirit of the invention.
The above embodiment describes applying a privacy mask to the face of a specific person. Conversely, the specific person may of course be left unmasked while a privacy mask is applied to the faces of other people who happen to appear in the video. In that case it suffices to determine by threshold comparison that the subject is "a human face but not the face of the specific person," and how to modify the mask target reference list and the similarity determination unit for this purpose will be apparent to those skilled in the art.

Because "skin color" differs by race, the term should be interpreted with the race of the mask target in mind. For example, if the video camera is used in an Asian country, skin color means, for example, the skin color of Mongoloid people, that is, pale yellow. If the subject is Caucasoid or Negroid, the skin color may naturally be white or yellowish brown.

The above embodiment describes activating the mask processing unit and performing mask processing after shooting has finished, but mask processing may instead be performed in parallel with shooting.

The apparatus may also be configured with a CPU and memory so as to function as a computer. A predetermined control program is installed in the memory via a communication means such as the Internet or a recording medium such as a CD-ROM or memory card, and the CPU and other components operate under the installed program to realize the functions of the functional units described in the above embodiment.

DESCRIPTION OF SYMBOLS: 100 ... video camera, 110 ... imaging unit of the video camera, 120 ... moving image memory, 200 ... mask processing unit, 210 ... data input unit, 211 ... decoding unit, 220 ... mask candidate acquisition unit, 221 ... cutout reference list storage unit, 222 ... mask candidate cutout unit, 230 ... mask target determination unit, 231 ... feature amount calculation unit, 232 ... mask target reference list storage unit, 233 ... similarity calculation unit, 234 ... similarity determination unit, 240 ... mask flag assignment unit, 250 ... position determination unit, 260 ... mask supplement unit, 261 ... coordinate information acquisition unit, 262 ... extrapolation data holding unit, 263 ... motion extrapolation unit, 264 ... moving mask creation unit, 270 ... metafile creation unit, 280 ... mask addition unit, 290 ... data output unit, 900 ... moving image data recording and playback system, 910 ... personal computer, 920 ... Internet, 930 ... server.

Claims

An image processing apparatus that searches moving image data for image areas to be masked and adds a mask to them, the apparatus comprising:
a mask candidate cut-out unit that, based on a pre-registered cut-out reference list, cuts out a region that can be a mask target from an image frame of the moving image data as a mask candidate region;
a similarity calculation unit that compares the latest mask candidate region with a pre-registered mask target reference list and calculates the similarity between the two;
a similarity determination unit that compares the calculated similarity with a predetermined threshold and determines the mask candidate region to be a mask target when the similarity is equal to or higher than the predetermined threshold;
a mask replenishment unit that estimates the temporally earlier movement trajectory of the region determined to be the mask target, using a movement trajectory obtained by tracking, temporally later, the movement of the region determined to be the mask target by the similarity determination unit, and that creates a replenishment mask to be applied to the region obtained from the estimated earlier movement trajectory; and
a mask applying unit that adds a mask to the region of the replenishment mask created by the mask replenishment unit, in addition to the region determined to be the mask target by the similarity determination unit.
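
The similarity determination recited above can be illustrated with a minimal sketch. It assumes that each mask candidate region and each entry of the mask target reference list is represented by a numeric feature vector and that cosine similarity is used as the similarity measure; the claim does not specify either. The function names and the default threshold of 0.8 are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length feature vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def is_mask_target(candidate_features, reference_list, threshold=0.8):
    """Return True when the best similarity against the reference list is
    equal to or higher than the threshold (hypothetical default value)."""
    best = max(
        (cosine_similarity(candidate_features, ref) for ref in reference_list),
        default=0.0,
    )
    return best >= threshold
```

The comparison uses `>=` so that a similarity exactly equal to the threshold still confirms the candidate, matching the "equal to or higher than" wording of the claim.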

The image processing apparatus according to claim 1, wherein the mask replenishment unit comprises:
a coordinate information acquisition unit that sequentially acquires coordinate information of regions determined to be mask targets by the similarity determination unit;
an extrapolation data holding unit that holds the coordinate information acquired by the coordinate information acquisition unit;
a motion extrapolation unit that extrapolates the data held in the extrapolation data holding unit backward in time to estimate the movement before the mask target was detected; and
a moving mask creation unit that creates a mask that moves so as to follow the movement obtained through the extrapolation by the motion extrapolation unit.
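
As an illustration of the backward extrapolation recited above, the following sketch fits a straight line to the held coordinates and evaluates it at frames preceding the first held frame. The constant-velocity (linear) model and the least-squares fit are assumptions made for this example; the claim only requires that the held data be extrapolated backward in time. The names `fit_line` and `extrapolate_backward` are hypothetical.

```python
def fit_line(ts, vs):
    """Least-squares line v = slope * t + intercept through the samples."""
    n = len(ts)
    mean_t = sum(ts) / n
    mean_v = sum(vs) / n
    var_t = sum((t - mean_t) ** 2 for t in ts)
    if var_t == 0:
        # A single sample (or identical frame numbers): assume no movement.
        return 0.0, mean_v
    slope = sum((t - mean_t) * (v - mean_v) for t, v in zip(ts, vs)) / var_t
    return slope, mean_v - slope * mean_t


def extrapolate_backward(track, n_frames):
    """Estimate positions for the n_frames frames before the first held frame.

    track: list of (frame, x, y) tuples held after the region was confirmed
    as a mask target, ordered by frame number.
    Returns a list of (frame, x, y) tuples in ascending frame order.
    """
    frames = [f for f, _, _ in track]
    slope_x, icpt_x = fit_line(frames, [x for _, x, _ in track])
    slope_y, icpt_y = fit_line(frames, [y for _, _, y in track])
    first = frames[0]
    return [
        (f, slope_x * f + icpt_x, slope_y * f + icpt_y)
        for f in range(first - n_frames, first)
    ]
```

A moving mask could then be placed at each returned position, for example by reusing the size of the first confirmed region; that choice is likewise an assumption of this sketch.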

The image processing apparatus according to claim 1, further comprising a position determination unit that determines whether or not the distance between the region determined to be the mask target by the similarity determination unit and the image frame is equal to or less than a predetermined threshold,
wherein the position determination unit activates the mask replenishment unit when the distance is equal to or less than the predetermined threshold.
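
The distance test recited above might be sketched as follows. The sketch assumes that the region is an axis-aligned rectangle given as `(x, y, width, height)` in pixels and that the distance is measured from the rectangle to the nearest border of the image frame; the claim does not define how the distance is measured. The function names are hypothetical.

```python
def distance_to_frame_border(box, frame_w, frame_h):
    """Smallest gap, in pixels, between the box and any border of the frame."""
    x, y, w, h = box
    return min(x, y, frame_w - (x + w), frame_h - (y + h))


def should_replenish_by_distance(box, frame_w, frame_h, threshold):
    """True when the region lies within `threshold` pixels of the frame border,
    which is the condition for activating the mask replenishment unit."""
    return distance_to_frame_border(box, frame_w, frame_h) <= threshold
```

For a 640 x 480 frame, a box at `(5, 100, 50, 50)` is 5 pixels from the left border and would trigger replenishment with a threshold of 20, whereas a box at `(300, 200, 50, 50)` would not.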

The image processing apparatus according to claim 1, further comprising a position determination unit that determines whether or not the region determined to be the mask target by the similarity determination unit is moving toward the center of the image frame,
wherein the position determination unit activates the mask replenishment unit when the region determined to be the mask target is moving toward the center of the image frame.
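
One possible way to test whether a region is moving toward the center of the image frame is to check the sign of the dot product between its displacement and the direction from its previous position to the frame center. This criterion, the use of region center points, and the function name are assumptions for illustration; the claim does not prescribe a particular test.

```python
def is_moving_toward_center(prev_pos, curr_pos, frame_w, frame_h):
    """True when the displacement from prev_pos to curr_pos has a positive
    component along the direction from prev_pos to the frame center.

    prev_pos, curr_pos: (x, y) center points of the region in two frames.
    """
    center_x, center_y = frame_w / 2, frame_h / 2
    move_x = curr_pos[0] - prev_pos[0]
    move_y = curr_pos[1] - prev_pos[1]
    to_center_x = center_x - prev_pos[0]
    to_center_y = center_y - prev_pos[1]
    return move_x * to_center_x + move_y * to_center_y > 0
```

A stationary region yields a dot product of zero and is therefore not treated as moving toward the center in this sketch.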

An image processing method for searching moving image data for image areas to be masked and adding a mask to them, the method comprising:
a mask candidate cut-out step of cutting out, based on a pre-registered cut-out reference list, a region that can be a mask target from an image frame of the moving image data as a mask candidate region;
a similarity calculation step of comparing the latest mask candidate region with a pre-registered mask target reference list and calculating the similarity between the two;
a similarity determination step of comparing the calculated similarity with a predetermined threshold and determining the mask candidate region to be a mask target when the similarity is equal to or higher than the predetermined threshold;
a mask replenishment step of estimating the temporally earlier movement trajectory of the region determined to be the mask target, using a movement trajectory obtained by tracking, temporally later, the movement of the region determined to be the mask target in the similarity determination step, and creating a replenishment mask to be applied to the region obtained from the estimated earlier movement trajectory; and
a mask applying step of adding a mask to the region of the replenishment mask created in the mask replenishment step, in addition to the region determined to be the mask target in the similarity determination step.

An image processing program for searching moving image data for image areas to be masked and adding a mask to them, the program causing a computer to function as:
a mask candidate cut-out unit that, based on a pre-registered cut-out reference list, cuts out a region that can be a mask target from an image frame of the moving image data as a mask candidate region;
a similarity calculation unit that compares the latest mask candidate region with a pre-registered mask target reference list and calculates the similarity between the two;
a similarity determination unit that compares the calculated similarity with a predetermined threshold and determines the mask candidate region to be a mask target when the similarity is equal to or higher than the predetermined threshold;
a mask replenishment unit that estimates the temporally earlier movement trajectory of the region determined to be the mask target, using a movement trajectory obtained by tracking, temporally later, the movement of the region determined to be the mask target by the similarity determination unit, and that creates a replenishment mask to be applied to the region obtained from the estimated earlier movement trajectory; and
a mask applying unit that adds a mask to the region of the replenishment mask created by the mask replenishment unit, in addition to the region determined to be the mask target by the similarity determination unit.