JP2020198476A - Image processing device, program, and image processing method

Info

Publication number: JP2020198476A
Application number: JP2019101839A
Authority: JP
Inventors: Yuichi Yomogida (蓬田 裕一)
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 2019-05-30
Filing date: 2019-05-30
Publication date: 2020-12-10
Anticipated expiration: 2039-05-30
Also published as: JP7287118B2

Abstract

PROBLEM TO BE SOLVED: To maintain image quality in the boundary region between a portion of an image containing a large amount of information and a portion containing a small amount of information.
SOLUTION: An image processing device according to one embodiment of the disclosed technique comprises a first division processing unit that divides a region of an input image other than the face region of a person included in the input image, on the basis of a first feature amount of the input image; and an information amount reduction processing unit that executes a process of reducing the amount of information for each region obtained by the division by the first division processing unit.
SELECTED DRAWING: Figure 8

Description

The present invention relates to an image processing device, a program, and an image processing method.

With the spread of networks and communication devices, highly convenient communication using images such as moving images has become possible. In such image-based communication, image and audio quality may deteriorate, for example when the network bandwidth is narrow. To address this, methods are used that detect network congestion and reduce the amount of information in the image to be transmitted.

In addition, among image processing devices that have an image-based communication function, such as smartphones, tablet terminals, and video conferencing terminals, a device has been disclosed that, in order to reduce the amount of information to be communicated, distinguishes highly important portions of an image, such as a face whose lips move while speaking, from less important portions, such as the background other than human faces, and varies the amount of image information between them (see, for example, Patent Document 1).

However, in conventional devices, image quality may deteriorate in the boundary region between the portion of the image with a large amount of information and the portion with a small amount of information.

An object of the disclosed technique is to maintain image quality in the boundary region between the portion of an image with a large amount of information and the portion with a small amount of information.

An image processing device according to one aspect of the disclosed technique includes a first division processing unit that divides a region of an input image other than the face region of a person included in the input image, based on a first feature amount of the input image, and an information amount reduction processing unit that executes a process of reducing the amount of information for each region obtained by the division by the first division processing unit.

According to the disclosed technique, image quality can be maintained in the boundary region between the portion of an image with a large amount of information and the portion with a small amount of information.

FIG. 1 is an external view showing an example of the configuration of the video conferencing terminal according to the embodiment.
FIG. 2 is a block diagram illustrating an example of the hardware configuration of the video conferencing terminal according to the embodiment.
FIG. 3 is a block diagram illustrating an example of the functional configuration of the video conferencing terminal according to the embodiment.
FIG. 4 is a diagram illustrating the information amount reduction processing performed by the video conferencing terminal according to the embodiment.
FIG. 5 is a timing chart illustrating the update timing of each region of an image in the video conferencing terminal according to the embodiment.
FIG. 6 is a diagram illustrating the effect of reducing the amount of image information by the video conferencing terminal according to the embodiment.
FIG. 7 is a flowchart showing an example of processing by the video conferencing terminal according to the embodiment.
FIG. 8 is a block diagram illustrating an example of the functional configuration of the video conferencing terminal according to the first embodiment.
FIG. 9 is a diagram illustrating an example of a frame-shaped region set so as to surround a non-face region.
FIG. 10 is a diagram illustrating an example of the division processing of a non-face region.
FIG. 11 is a diagram illustrating an example of the spatial filters used in the video conferencing terminal according to the first embodiment, where (a) illustrates a spatial filter of size 3 × 3 pixels and (b) illustrates a spatial filter of size 5 × 5 pixels.
FIG. 12 is a flowchart showing an example of processing by the video conferencing terminal according to the first embodiment.
FIG. 13 is a block diagram illustrating an example of the functional configuration of the video conferencing terminal according to the second embodiment.
FIG. 14 is a diagram illustrating an example of the division processing of a frame-shaped region.
FIG. 15 is a flowchart showing an example of processing by the video conferencing terminal according to the second embodiment.

Hereinafter, modes for carrying out the invention will be described with reference to the drawings. In the drawings, identical components are given the same reference numerals, and duplicate descriptions may be omitted.

In the embodiment, a video conferencing terminal is described as an example of the image processing device. However, the image processing device is not limited to a video conferencing terminal as long as it is a device having a communication function. The image processing device may be, for example, a PJ (projector), an IWB (Interactive White Board: a whiteboard with an electronic blackboard function capable of intercommunication), an output device such as digital signage, a HUD (Head Up Display) device, an industrial machine, an imaging device, a sound collecting device, a medical device, a networked home appliance, an automobile (connected car), a notebook PC (Personal Computer), a mobile phone, a smartphone, a tablet terminal, a game machine, a PDA (Personal Digital Assistant), a digital camera, a wearable PC, or a desktop PC.

<Configuration of the video conferencing terminal according to the embodiment>
First, an example of the configuration of the video conferencing terminal according to the embodiment will be described. FIG. 1 is an external view of the video conferencing terminal 10. As shown in FIG. 1, the video conferencing terminal 10 has a housing 1100, an arm 1200, and a camera housing 1300. The front side wall surface 1110 of the housing 1100 is provided with an intake surface formed by a plurality of intake holes, and the rear side wall surface 1120 of the housing 1100 is provided with an exhaust surface 1121 in which a plurality of exhaust holes are formed. As a result, by driving the cooling fan built into the housing 1100, outside air behind the video conferencing terminal 10 can be taken in through the intake surface and exhausted to the rear of the video conferencing terminal 10 through the exhaust surface 1121. A sound collecting hole 1131 is formed in the right side wall surface 1130 of the housing 1100, which allows the built-in microphone 114 to pick up sounds such as voices, ambient sounds, and noise.

An operation panel 1150 is formed on the right side wall surface 1130 side of the housing 1100. The operation panel 1150 is provided with a plurality of operation buttons (108a to 108e), a power switch 109, and an alarm lamp 119, and also has a sound output surface 1151 formed by a plurality of sound output holes that pass the output sound from the built-in speaker 115. On the left side wall surface 1140 side of the housing 1100, an accommodating portion 1160 is formed as a recess for accommodating the arm 1200 and the camera housing 1300. The right side wall surface 1130 of the housing 1100 is provided with a plurality of connection ports (1132a to 1132c) for electrically connecting cables to the connection I/F 118. The left side wall surface 1140 of the housing 1100, on the other hand, is provided with a connection port for electrically connecting the cable 120c for the display 120 to the connection I/F 118.

Next, the arm 1200 is attached to the housing 1100 via a torque hinge 1210 and can rotate vertically with respect to the housing 1100 within a tilt angle θ1 range of 135 degrees. FIG. 1 shows the state in which the tilt angle θ1 is 90 degrees. The camera housing 1300 contains a built-in camera 112, which can capture images of users, documents, rooms, and the like. A torque hinge 1310 is formed on the camera housing 1300, and the camera housing 1300 is attached to the arm 1200 via this torque hinge 1310. Taking the state shown in FIG. 1 as 0 degrees, the camera housing 1300 can rotate vertically and horizontally with respect to the arm 1200 within a pan angle θ2 range of ±180 degrees and a tilt angle θ3 range of ±45 degrees.

The external view of FIG. 1 is merely an example, and the terminal is not limited to this appearance. If the computer used as the video conferencing terminal 10 has no microphone or camera, an external microphone and camera can be connected to it. When the video conferencing terminal 10 is a general-purpose computer, a smartphone, or the like, it may be connected to the Internet by wireless communication over a wireless LAN, a mobile phone network, or the like. When a general-purpose computer is used as the video conferencing terminal 10, an application for executing the processing of the video conferencing terminal 10 can be installed on the computer in advance.

<Hardware configuration of the video conferencing terminal according to the embodiment>
Next, FIG. 2 is a diagram illustrating an example of the hardware configuration of the video conferencing terminal 10. As shown in FIG. 2, the video conferencing terminal 10 has a CPU (Central Processing Unit) 701, a ROM (Read Only Memory) 702, a RAM (Random Access Memory) 703, a flash memory 704, an SSD (Solid State Drive) 705, a media I/F (Interface) 707, operation buttons 708, and a power switch 709. The video conferencing terminal 10 also has a bus line 710, a network I/F 711, a camera 712, an image sensor I/F 713, a microphone 714, a speaker 715, a sound input/output I/F 716, a display I/F 717, an external device connection I/F 718, a short-range communication circuit 719, and an antenna 719a for the short-range communication circuit 719.

Of these, the CPU 701 controls the operation of the entire video conferencing terminal 10. The ROM 702 stores programs used to drive the CPU 701, such as an IPL (Initial Program Loader). The RAM 703 is used as a work area for the CPU 701. The flash memory 704 stores various data such as a communication program, image data, and sound data.

The SSD 705 controls the reading and writing of various data from and to the flash memory 704 under the control of the CPU 701. An HDD (Hard Disk Drive) may be used instead of the SSD. The media I/F 707 controls the reading and writing (storage) of data from and to a recording medium 706 such as a flash memory.

The operation buttons 708 include the plurality of operation buttons (108a to 108e) shown in FIG. 1 and are operated, for example, when selecting a destination for the video conferencing terminal 10. However, operation of the video conferencing terminal 10 is not limited to the operation buttons 708; the operating means may instead be a liquid crystal display (LCD) or an organic EL (Organic Electro-Luminescence) display device equipped with a touch panel function.

The power switch 709 is a switch for turning the power of the video conferencing terminal 10 ON and OFF. The network I/F 711 is an interface for data communication over a communication network such as the Internet.

The CMOS (Complementary Metal Oxide Semiconductor) sensor 712 is a type of built-in imaging means that captures an image of a subject and obtains image data under the control of the CPU 701. An imaging means such as a CCD (Charge Coupled Device) sensor may be used instead of a CMOS sensor. The image sensor I/F 713 is a circuit that controls the driving of the CMOS sensor 712. The microphone 714 is a built-in circuit that converts sound into an electrical signal. The speaker 715 is a built-in circuit that converts an electrical signal into physical vibration to produce sounds such as music and voice.

The sound input/output I/F 716 is a circuit that processes the input and output of sound signals to and from the microphone 714 and the speaker 715 under the control of the CPU 701. The display I/F 717 is a circuit that transmits image data to an external display under the control of the CPU 701.

The external device connection I/F 718 is an interface for connecting various external devices. External devices such as an external camera, an external microphone, and an external speaker can each be connected to the external device connection I/F 718 by a USB (Universal Serial Bus) cable or the like. When an external camera is connected, the external camera is driven in preference to the built-in CMOS sensor 712 under the control of the CPU 701. Likewise, when an external microphone or an external speaker is connected, it is driven in preference to the built-in microphone 714 or the built-in speaker 715, respectively, under the control of the CPU 701.

The short-range communication circuit 719 is a communication circuit for NFC (Near Field Communication), Bluetooth (registered trademark), or the like. The bus line 710 comprises an address bus, a data bus, and the like for electrically connecting the components shown in FIG. 2, such as the CPU 701.

The display 720 is a type of display means, composed of a liquid crystal panel, an organic EL (Electro Luminescence) panel, or the like, that displays images of a subject, operation icons, and so on. The display 720 is connected to the display I/F 717 by a cable. This cable may be a cable for analog RGB (VGA) signals, a cable for component video, or a cable for HDMI (High-Definition Multimedia Interface) (registered trademark) or DVI (Digital Visual Interface) signals.

The recording medium 706 is detachable from the video conferencing terminal 10. Any non-volatile memory that reads and writes data under the control of the CPU 701 may be used; it is not limited to the flash memory 704, and an EEPROM or the like may be used instead.

<Functional configuration of the video conferencing terminal according to the embodiment>
When a video conference using moving images is held with the video conferencing terminal 10, the face region of the moving image, which contains a person's face, conveys the facial expressions of the conference participants to the other party and is therefore highly important for smooth communication. The non-face region, which contains no face, is less important than the face region.

In the embodiment, the process of reducing the amount of information is applied only to the less important non-face region of an image such as a moving image. This reduces the amount of image information while maintaining image quality in the highly important face region, which in turn prevents the image and audio quality from deteriorating.

FIG. 3 is a block diagram illustrating an example of the functional configuration of the video conferencing terminal 10 according to the embodiment. As shown in FIG. 3, the video conferencing terminal 10 has a face detection processing unit 301, a non-face region setting unit 302, an information amount reduction processing unit 303, and an encoding processing unit 304. Each of these units is realized, for example, by the CPU 701 of FIG. 2 executing a predetermined program.

The face detection processing unit 301 detects the faces of persons included in the image of each frame (the input image) of the moving image that is input to the video conferencing terminal 10 at a predetermined frame rate. It then outputs coordinate information indicating the face region, that is, the region of the image corresponding to a person's face, to the non-face region setting unit 302. The face detection processing unit 301 also detects the amount of movement of a person's face between frames of the input moving image and outputs it to the frame rate reduction processing unit 307.

Based on the input coordinate information indicating the face region, the non-face region setting unit 302 sets the region other than the face region as the non-face region and outputs the image in which the non-face region has been set to the information amount reduction processing unit 303.
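The step of deriving a non-face region from face coordinates can be sketched as follows. This is a minimal illustration, not the patented implementation: the rectangular box format `(left, top, right, bottom)`, the function name `non_face_mask`, and the boolean-mask representation are all assumptions introduced here.

```python
from typing import List, Tuple

# Hypothetical box format: (left, top, right, bottom), right/bottom exclusive.
Box = Tuple[int, int, int, int]


def non_face_mask(width: int, height: int, face_boxes: List[Box]) -> List[List[bool]]:
    """Return mask[y][x] == True where the pixel lies outside every face box."""
    mask = [[True] * width for _ in range(height)]
    for left, top, right, bottom in face_boxes:
        # Clip each box to the image so out-of-range coordinates are harmless.
        for y in range(max(0, top), min(height, bottom)):
            for x in range(max(0, left), min(width, right)):
                mask[y][x] = False
    return mask


# A 6x4 image with one 2x2 face box.
mask = non_face_mask(6, 4, [(1, 1, 3, 3)])
```

A mask of this kind lets the later reduction steps touch only the pixels marked `True` and leave the face pixels as they were input.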

The information amount reduction processing unit 303 executes a process of reducing the amount of information in the non-face region set in the image input from the non-face region setting unit 302. The information amount reduction processing unit 303 has a low-pass filter processing unit 305, a contrast reduction processing unit 306, and a frame rate reduction processing unit 307.

The low-pass filter processing unit 305 selectively applies low-pass filtering to the non-face region of the input image. Low-pass filtering passes only the low spatial frequency band of the image and blocks the other bands.

The processing that the low-pass filter processing unit 305 applies to the non-face region will now be described with reference to FIG. 4. FIG. 4 shows an image 40 of three people holding a meeting around a desk 41 in a conference room. In the image 40, the three regions 42 indicated by broken lines are face regions, each containing a person's face. The part of the image 40 outside the regions 42 is the non-face region, which contains no faces. The low-pass filter processing unit 305 selectively applies low-pass filtering to this non-face region. The contrast reduction processing unit 306 and the frame rate reduction processing unit 307 described below likewise apply their processing selectively to this non-face region.
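Selective low-pass filtering of the non-face region can be sketched as below. The source does not specify the filter at this point, so the 3 × 3 mean (box) kernel, the grayscale list-of-lists image, and the name `box_blur_non_face` are assumptions chosen only for illustration.

```python
def box_blur_non_face(image, mask):
    """Apply a 3x3 mean filter where mask is True; copy other pixels unchanged."""
    height, width = len(image), len(image[0])
    out = [row[:] for row in image]
    for y in range(height):
        for x in range(width):
            if not mask[y][x]:
                continue  # face pixel: keep the input value
            total = count = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < height and 0 <= nx < width:
                        total += image[ny][nx]
                        count += 1
            out[y][x] = total // count  # integer mean of the in-bounds neighbours
    return out


# 4x4 checkerboard (high spatial frequency) with a 2x2 face region in the centre.
image = [[0, 255, 0, 255],
         [255, 0, 255, 0],
         [0, 255, 0, 255],
         [255, 0, 255, 0]]
mask = [[True, True, True, True],
        [True, False, False, True],
        [True, False, False, True],
        [True, True, True, True]]
blurred = box_blur_non_face(image, mask)
```

In this sketch the filter window still reads face pixels when it sits next to the face region, so values near the boundary mix content from both sides.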

Returning to FIG. 3, the contrast reduction processing unit 306 selectively reduces the contrast of the non-face region. Contrast is the difference between light and dark in an image, and the contrast reduction processing unit 306 reduces it, that is, makes the difference between light and dark smaller.
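One simple way to reduce the difference between light and dark is to pull each pixel value toward a fixed mid-level. The sketch below assumes 8-bit grayscale values, a pivot of 128, and a scale `factor`; none of these parameters come from the source.

```python
def reduce_contrast_non_face(image, mask, factor=0.5, pivot=128):
    """Scale each non-face pixel's distance from `pivot` by `factor`."""
    out = [row[:] for row in image]
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            if mask[y][x]:  # non-face pixel only
                out[y][x] = int(round(pivot + factor * (value - pivot)))
    return out


image = [[0, 228],
         [64, 200]]
mask = [[True, True],
        [False, True]]  # bottom-left pixel belongs to a face
reduced = reduce_contrast_non_face(image, mask)
```

With `factor=0.5`, the darkest and brightest non-face pixels end up half as far from the pivot as before, while the face pixel is left untouched.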

The frame rate reduction processing unit 307 selectively reduces the frame rate of the non-face region of the input image. The frame rate is the number of frames, that is, still images, processed per unit time in a moving image. As an example, when the frame rate of the moving image is 30 fps (frames per second), the frame rate reduction processing unit 307 selectively reduces the frame rate of the non-face region to, for example, 15 fps.

The processing by the frame rate reduction processing unit 307 will now be described with reference to FIG. 5. FIG. 5 is a timing chart showing an example of the update timing of each region of an image in a moving image.

In FIG. 5, the signal FR indicates the timing at which each frame of the moving image is input to the video conferencing terminal 10. In FIG. 5, four frames are input to the video conferencing terminal 10, each at a timing indicated by the signal FR.

The signal AR1 indicates the timing at which the face region of a frame of the moving image is updated. The signal AR2 likewise indicates the timing at which the non-face region is updated. With the signal AR1, the face region is updated once each time one frame is input. With the signal AR2, by contrast, the non-face region is updated once for every two frames that are input. In other words, every other update of the non-face region is skipped, so its frame rate is lower than that of the face region. In this way, the frame rate reduction processing unit 307 can selectively reduce the frame rate of the non-face region of the input image.
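The update timing of FIG. 5 can be sketched as a loop that refreshes face pixels on every input frame and non-face pixels only on every second frame. The function name `compose_stream`, the `non_face_interval` parameter, and the pixel-hold strategy are assumptions for illustration, not details taken from the source.

```python
def compose_stream(frames, mask, non_face_interval=2):
    """Update face pixels every frame; update non-face pixels every Nth frame."""
    outputs = []
    held = None
    for index, frame in enumerate(frames):
        if held is None or index % non_face_interval == 0:
            # Both regions are refreshed on this frame.
            held = [row[:] for row in frame]
        else:
            # Only face pixels (mask False) are refreshed; non-face pixels are held.
            for y, row in enumerate(frame):
                for x, value in enumerate(row):
                    if not mask[y][x]:
                        held[y][x] = value
        outputs.append([row[:] for row in held])
    return outputs


# Four 1x2 frames: the left pixel is a face pixel, the right pixel is non-face.
mask = [[False, True]]
frames = [[[10, 100]], [[11, 101]], [[12, 102]], [[13, 103]]]
outputs = compose_stream(frames, mask)
```

In the four-frame example, the face pixel takes a new value on every frame, while the non-face pixel changes only on the first and third frames.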

Returning to FIG. 3, the information amount reduction processing unit 303 outputs the image, in which the amount of information has been selectively reduced in the non-face region of the input image, to the encoding processing unit 304. Although the embodiment shows an example in which the processes of the low-pass filter processing unit 305, the contrast reduction processing unit 306, and the frame rate reduction processing unit 307 are all executed, the configuration is not limited to this. The information amount reduction processing unit 303 may have at least one of the low-pass filter processing unit 305, the contrast reduction processing unit 306, and the frame rate reduction processing unit 307, so that at least one of these processes is executed.

The encoding processing unit 304 encodes the image input from the information amount reduction processing unit 303. As an example, the encoding processing unit 304 executes encoding based on H.264, one of the video compression standards. The encoded images are output sequentially from the video conferencing terminal 10 to an external device, each as one frame of the moving image. The external device is, for example, the video conferencing terminal of the other party in the video conference, and the images encoded by the encoding processing unit 304 are output to that terminal via the network.

Next, the effect of reducing the amount of image information by the low-pass filter processing unit 305 and by the contrast reduction processing unit 306 will be described with reference to FIG. 6.

FIG. 6 is a diagram illustrating the effect of reducing the amount of image information by the video conferencing terminal 10. In FIG. 6, the vertical axis indicates the amount of image information, normalized so that the amount of information when no processing is applied to the non-face region equals 1. The horizontal axis indicates the classification of each process. FIG. 6 shows classifications 61 to 65, with eight bars for each classification. The eight bars show the processing results for eight different test images.

Classification 61 is the case where no processing is applied to the non-face region. Classification 62 is the case where the non-face region is processed by the low-pass filter processing unit 305. Classification 63 is a first comparative example, in which the number of colors in the non-face region is reduced. Classification 64 is the case where the non-face region is processed by the contrast reduction processing unit 306. Classification 65 is a second comparative example, in which noise removal is applied to the non-face region.

分類６２における「１／２」、「１／４」、及び「１／８」は、低域通過フィルタ処理により高い空間周波数帯域が遮断された分、画像の解像度が低下したレベルを示している。分類６３における「１／２」、「１／４」、及び「１／８」は、色数の減少処理により画像の色数が減少したレベルを示している。また分類６４における「１／２」、「１／４」、及び「１／８」は、コントラスト低減処理により画像のコントラストが低減したレベルを示している。 "1/2", "1/4", and "1/8" in classification 62 indicate the level to which the image resolution is lowered as a result of the high spatial frequency band being cut off by the low-pass filter processing. "1/2", "1/4", and "1/8" in classification 63 indicate the level to which the number of colors in the image is reduced by the color reduction processing. Likewise, "1/2", "1/4", and "1/8" in classification 64 indicate the level to which the image contrast is reduced by the contrast reduction processing.

上述したように、分類６１の処理を規格化の標準としているため、分類６１では、８つの棒グラフとも画像の情報量は全て１である。分類６２では、解像度の低下に応じて画像の情報量が大きく減少している。分類６３では、色数の減少に伴う画像の情報量の減少効果はみられず、色数の減少に応じて、むしろ情報量が増大している。これは、色数の減少によりトーンジャンプが増え、圧縮効果が阻害されたためと考えられる。分類６４では、解像度のコントラストの低減に応じて画像の情報量が減少している。分類６５では、ノイズ除去により、画像の情報量がやや減少している。 As described above, classification 61 is the reference for normalization, so in classification 61 the amount of image information is 1 for all eight bars. In classification 62, the amount of image information decreases greatly as the resolution is lowered. In classification 63, reducing the number of colors does not reduce the amount of image information; rather, the amount of information increases as the number of colors decreases. This is considered to be because the reduced number of colors increases tone jumps (banding), which hinders compression. In classification 64, the amount of image information decreases as the contrast is reduced. In classification 65, the amount of image information decreases slightly due to noise removal.

図６から分かるように、低域通過フィルタ処理部３０５による低域通過フィルタ処理、及びコントラスト低減処理部３０６によるコントラスト低減処理は、第１及び第２比較例と比較して、より好適に画像の情報量を減少させることができる。 As can be seen from FIG. 6, the low-pass filter processing by the low-pass filter processing unit 305 and the contrast reduction processing by the contrast reduction processing unit 306 can reduce the amount of image information more effectively than the first and second comparative examples.

次に、実施形態に係るビデオ会議端末１０による処理の一例を、図７のフローチャートを参照して説明する。 Next, an example of processing by the video conferencing terminal 10 according to the embodiment will be described with reference to the flowchart of FIG. 7.

まず、ステップＳ７１において、顔検出処理部３０１は、動画像のうちの１フレームの画像を入力する。 First, in step S71, the face detection processing unit 301 inputs an image of one frame of the moving images.

続いて、ステップＳ７２において、顔検出処理部３０１は、入力画像に対して顔検出処理を実行し、検出した顔領域の座標情報を非顔領域設定部３０２に出力する。 Subsequently, in step S72, the face detection processing unit 301 executes face detection processing on the input image and outputs the coordinate information of the detected face area to the non-face area setting unit 302.

続いて、ステップＳ７３において、非顔領域設定部３０２は、入力画像において非顔領域を設定し、設定後の画像を情報量減少処理部３０３に出力する。 Subsequently, in step S73, the non-face area setting unit 302 sets the non-face area in the input image, and outputs the set image to the information amount reduction processing unit 303.

続いて、ステップＳ７４において、情報量減少処理部３０３に含まれる低域通過フィルタ処理部３０５は、非顔領域に対し、選択的に低域通過フィルタ処理を実行し、処理後の画像をコントラスト低減処理部３０６に出力する。 Subsequently, in step S74, the low-pass filter processing unit 305 included in the information amount reduction processing unit 303 selectively executes low-pass filter processing on the non-face region and outputs the processed image to the contrast reduction processing unit 306.

続いて、ステップＳ７５において、コントラスト低減処理部３０６は、非顔領域に対し、選択的にコントラスト低減処理を実行し、処理後の画像をフレームレート低下処理部３０７に出力する。 Subsequently, in step S75, the contrast reduction processing unit 306 selectively executes the contrast reduction processing on the non-face region, and outputs the processed image to the frame rate reduction processing unit 307.

続いて、ステップＳ７６において、フレームレート低下処理部３０７は、顔検出処理部３０１が検出した顔の動き量が予め規定した動き閾値以下であるか否かを判定する。一例として、顔検出処理部３０１は、ＲＡＭ７０３等に記憶された、前の１フレームの画像における顔領域の画素輝度の総和と、現在処理を実行している１フレームの画像における顔領域の画素輝度の総和を比較することで、顔の動き量を検出することができる。 Subsequently, in step S76, the frame rate reduction processing unit 307 determines whether or not the amount of face movement detected by the face detection processing unit 301 is equal to or less than a predetermined movement threshold value. As an example, the face detection processing unit 301 can detect the amount of face movement by comparing the sum of the pixel luminance of the face region in the previous frame, stored in the RAM 703 or the like, with the sum of the pixel luminance of the face region in the frame currently being processed.

ステップＳ７６で、顔の動き量が動き閾値以下であると判定された場合（ステップＳ７６、Ｙｅｓ）、ステップＳ７７において、フレームレート低下処理部３０７は、非顔領域に対し、選択的にフレームレート低下処理を実行する。一方、ステップＳ７６で、顔の動き量が動き閾値以下でないと判定された場合（ステップＳ７６、Ｎｏ）、ステップＳ７８に移行する。なお、上記の動き閾値は予め求められ、ＳＳＤ７０４等に記憶されている。 When it is determined in step S76 that the amount of face movement is equal to or less than the movement threshold value (step S76, Yes), the frame rate reduction processing unit 307 selectively executes frame rate reduction processing on the non-face region in step S77. On the other hand, when it is determined in step S76 that the amount of face movement is not equal to or less than the movement threshold value (step S76, No), the process proceeds to step S78. The movement threshold value is obtained in advance and stored in the SSD 704 or the like.
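
The decision in steps S76 and S77 can be illustrated with a minimal Python sketch. It rests on assumptions not stated in the text: the "comparison" of the two luminance sums is taken to be their absolute difference, the same face rectangle is used for both frames, and the names `luminance_sum`, `should_reduce_frame_rate`, and `face_box` are hypothetical.

```python
def luminance_sum(frame, face_box):
    """Sum the pixel luminance inside face_box = (x0, y0, x1, y1), end-exclusive."""
    x0, y0, x1, y1 = face_box
    return sum(sum(row[x0:x1]) for row in frame[y0:y1])


def should_reduce_frame_rate(prev_frame, curr_frame, face_box, motion_threshold):
    """Return True when the estimated face motion is at or below the threshold (S76: Yes)."""
    # Assumption: the motion amount is the absolute difference of the two sums.
    motion = abs(luminance_sum(curr_frame, face_box) - luminance_sum(prev_frame, face_box))
    return motion <= motion_threshold


# Tiny 3x3 luminance frames; the face region is the single center pixel.
prev = [[10, 10, 10], [10, 50, 10], [10, 10, 10]]
curr = [[10, 10, 10], [10, 58, 10], [10, 10, 10]]
face = (1, 1, 2, 2)

still = should_reduce_frame_rate(prev, curr, face, motion_threshold=10)  # motion 8 <= 10
moving = should_reduce_frame_rate(prev, curr, face, motion_threshold=5)  # motion 8 > 5
```

In this sketch `still` corresponds to the Yes branch (frame rate reduction in step S77) and `moving` to the No branch (proceed directly to step S78).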

一方で、次の１フレームの画像における顔の動き量の検出に用いるために、顔検出処理部３０１は、現在の１フレームの画像をＲＡＭ７０３等に記憶する。 On the other hand, the face detection processing unit 301 stores the current one-frame image in the RAM 703 or the like in order to use it for detecting the amount of movement of the face in the next one-frame image.

ここで、顔の動き量の検出について補足する。動画像におけるフレ−ム間での変化が小さい場合は、非顔領域に対して選択的にフレームレート低下処理を実行しても、処理後の動画像が不自然になることはない。しかし、フレ−ム間での変化が大きい場合に、非顔領域に対して選択的にフレームレート低下処理を実行すると、フレーム間での変化がぎこちなく視認され、処理後の動画像が不自然になる場合がある。 Here, a supplementary note on detecting the amount of face movement. When the change between frames of the moving image is small, selectively executing frame rate reduction processing on the non-face region does not make the processed moving image unnatural. However, when the change between frames is large, selectively executing frame rate reduction processing on the non-face region makes the change between frames look jerky, and the processed moving image may become unnatural.

実施形態では、検出した顔の動き量が動き閾値以下の場合にのみ、入力画像の非顔領域に対して選択的にフレームレート低下処理を実行し、顔の動き量（フレーム間での変化）が大きい画像ではフレームレートを低下させないことで、動画像が不自然になることを防止することができる。 In the embodiment, the frame rate reduction processing is selectively executed on the non-face region of the input image only when the detected amount of face movement is equal to or less than the movement threshold value. Because the frame rate is not lowered for images with a large amount of face movement (change between frames), the moving image can be prevented from becoming unnatural.

なお、ステップＳ７４の低域通過フィルタ処理と、ステップＳ７５のコントラスト低減処理と、ステップＳ７６〜Ｓ７７のフレームレート低下処理は、適宜順番を入れ替えても良い。 The order of the low-pass filter processing in step S74, the contrast reduction processing in step S75, and the frame rate reduction processing in steps S76 to S77 may be changed as appropriate.

続いて、ステップＳ７８において、符号化処理部３０４は、情報量減少処理部３０３から入力した画像に対し、符号化処理を実行する。 Subsequently, in step S78, the coding processing unit 304 executes the coding processing on the image input from the information amount reduction processing unit 303.

続いて、ステップＳ７９において、符号化処理部３０４は、動画像における１フレームの画像として、符号化された画像を外部装置に出力する。 Subsequently, in step S79, the coding processing unit 304 outputs the coded image to the external device as a one-frame image in the moving image.

このようにして、画像内で重要性の高い人物の顔に対する情報量を維持しつつ、重要でない背景等の情報量を減少させることができる。 In this way, it is possible to reduce the amount of information such as an unimportant background while maintaining the amount of information on the face of a person who is highly important in the image.

以上説明したように、実施形態によれば、主観的な画像品質を保持しつつ、画像の情報量を減少させ、画像の圧縮率を高めることができる。これにより、例えば、インターネット回線を利用して動画像を伝送する場合に、動画像や音声の品質が低下することを防ぐことができる。また、監視カメラのように大量の動画像をストレージする場合においても、記憶容量を減少させることができるため、好適である。 As described above, according to the embodiment, it is possible to reduce the amount of information in the image and increase the compression rate of the image while maintaining the subjective image quality. Thereby, for example, when the moving image is transmitted using the Internet line, it is possible to prevent the quality of the moving image and the sound from being deteriorated. Further, even when a large amount of moving images are stored like a surveillance camera, the storage capacity can be reduced, which is preferable.

一方、画像の圧縮率を高める手法として、画像のノイズ信号を低減することも考えられる。しかし、ノイズ信号の低減は、暗い部屋でプロジェクタ等を用いてビデオ会議する場合等には効果があるが、明るい部屋でビデオ会議する場合には、画像のノイズ信号自体がそもそも小さくなるため、大きな効果は得られない。近年は、液晶ディスプレイ等のバックライト型の表示装置が用いられ、明るい部屋でビデオ会議が行われることが多いため、ノイズ信号の低減による画像の圧縮率の向上は得にくい。実施形態に係るビデオ会議端末１０は、このようなノイズ信号を低減させる手法と比較して、より画像の圧縮率を高めることができる。 On the other hand, reducing the noise signal of the image is also conceivable as a method of increasing the image compression rate. Noise reduction is effective when, for example, a video conference is held in a dark room using a projector or the like, but it has little effect when a video conference is held in a bright room, because the noise signal of the image is small to begin with. In recent years, backlit display devices such as liquid crystal displays are used and video conferences are often held in bright rooms, so it is difficult to improve the image compression rate by reducing the noise signal. The video conferencing terminal 10 according to the embodiment can increase the image compression rate further than such a noise reduction method.

［第１の実施形態］ [First Embodiment]
次に、第１の実施形態に係るビデオ会議端末１０ａについて説明する。なお、既に説明した実施形態と同一の構成部分についての説明は省略する。 Next, the video conferencing terminal 10a according to the first embodiment will be described. The description of the same components as those of the above-described embodiment will be omitted.

上述したように、人物の顔を含まない非顔領域は、顔領域と比較して重要性が低いため、情報量をできるだけ減少させることが好ましい。しかし、顔領域と非顔領域とで明るさ等の画像特性（特徴量）が大きく異なる場合に、顔領域と非顔領域の情報量を大きく異ならせると、領域間の境界が目立ち、違和感のある低品質の画像になる場合がある。 As described above, the non-face region, which does not include a person's face, is less important than the face region, so it is preferable to reduce its amount of information as much as possible. However, when image characteristics (feature amounts) such as brightness differ greatly between the face region and the non-face region, making the amounts of information of the two regions greatly different makes the boundary between them conspicuous and may result in a low-quality image that looks unnatural.

そこで、本実施形態では、顔領域と非顔領域とで明るさ等の特徴量が大きく異なる場合は、非顔領域の情報量を顔領域に対して急激に減少させず、顔領域から遠ざかるにつれて徐々に情報量を低下させることで、処理後の画像の品質の低下を抑制している。 Therefore, in the present embodiment, when a feature amount such as brightness differs greatly between the face region and the non-face region, the amount of information in the non-face region is not reduced abruptly relative to the face region but is lowered gradually with increasing distance from the face region, thereby suppressing deterioration in the quality of the processed image.

図８は、第１の実施形態に係るビデオ会議端末１０ａの機能構成の一例を説明するブロック図である。図８に示すように、ビデオ会議端末１０ａは、第１分割処理部３０８と、情報量減少処理部３０３ａとを有する。 FIG. 8 is a block diagram illustrating an example of the functional configuration of the video conferencing terminal 10a according to the first embodiment. As shown in FIG. 8, the video conferencing terminal 10a has a first division processing unit 308 and an information amount reduction processing unit 303a.

第１分割処理部３０８は、非顔領域設定部３０２から入力した画像の非顔領域を、顔を囲む複数の枠状領域に分割する処理を実行する。この分割処理のために、分割数Ｎや顔領域からの距離Ｄ等の第１分割情報が予め定められ、図２のＳＳＤ７０４等に記憶されている。また、第１分割処理部３０８は、枠状領域設定部３０９と、第１特徴量抽出部３１０と、第１分割決定部３１１とを有する。 The first division processing unit 308 executes a process of dividing the non-face region of the image input from the non-face region setting unit 302 into a plurality of frame-shaped regions surrounding the face. For this division process, first division information such as the number of divisions N and the distance D from the face area is predetermined and stored in SSD 704 or the like in FIG. Further, the first division processing unit 308 includes a frame-shaped region setting unit 309, a first feature amount extraction unit 310, and a first division determination unit 311.

枠状領域設定部３０９は、ＳＳＤ７０４等に記憶された第１分割情報に従って、検出された顔を囲むように複数の矩形の枠状領域を設定する。ここで、図９は、枠状領域設定部３０９により設定された枠状領域の一例を説明する図である。なお、図９に白抜きの矢印で示した方向は、画像のＸ座標に該当するＸ方向と、Ｙ座標に該当するＹ方向をそれぞれ示している。 The frame-shaped area setting unit 309 sets a plurality of rectangular frame-shaped areas so as to surround the detected face according to the first division information stored in the SSD 704 or the like. Here, FIG. 9 is a diagram illustrating an example of a frame-shaped region set by the frame-shaped region setting unit 309. The directions indicated by the white arrows in FIG. 9 indicate the X direction corresponding to the X coordinate of the image and the Y direction corresponding to the Y coordinate, respectively.

図９において、顔領域４２ａ及び４２ｂは、顔検出処理部３０１により検出された２人の人物の顔に該当する領域である。顔領域４２ａの周囲には、矩形の枠状領域９１ａ、９２ａ及び９３ａがそれぞれ設定され、また、顔領域４２ｂの周囲には、矩形の枠状領域９１ｂ、９２ｂ及び９３ｂがそれぞれ設定されている。 In FIG. 9, the face areas 42a and 42b are areas corresponding to the faces of two persons detected by the face detection processing unit 301. Rectangular frame-shaped regions 91a, 92a and 93a are set around the face region 42a, respectively, and rectangular frame-shaped regions 91b, 92b and 93b are set around the face region 42b, respectively.

枠状領域９１ａは、顔領域４２ａの中心位置からＸ、及びＹ方向にそれぞれ距離Ｄ１以上で距離Ｄ２未満離れた領域である。なお、距離の単位は画素数である。枠状領域９２ａは、顔領域４２ａの中心位置からＸ、及びＹ方向にそれぞれ距離Ｄ２以上で距離Ｄ３未満離れた領域である。枠状領域９３ａは、顔領域４２ａの中心位置からＸ、及びＹ方向にそれぞれ距離Ｄ３以上離れた領域である。 The frame-shaped region 91a is a region separated from the center position of the face region 42a in the X and Y directions by a distance D1 or more and a distance less than D2, respectively. The unit of distance is the number of pixels. The frame-shaped region 92a is a region separated from the center position of the face region 42a in the X and Y directions by a distance D2 or more and a distance less than D3, respectively. The frame-shaped region 93a is a region separated from the center position of the face region 42a in the X and Y directions by a distance D3 or more, respectively.

また、枠状領域９１ｂは、顔領域４２ｂの中心位置からＸ、及びＹ方向にそれぞれ距離Ｄ１以上でＤ２未満離れた領域である。枠状領域９２ｂは、顔領域４２ｂの中心位置からＸ、及びＹ方向にそれぞれ距離Ｄ２以上でＤ３未満離れた領域である。枠状領域９３ｂは、顔領域４２ｂの中心位置からＸ、及びＹ方向にそれぞれ距離Ｄ３以上離れた領域である。 Further, the frame-shaped region 91b is a region separated from the center position of the face region 42b in the X and Y directions at a distance of D1 or more and less than D2, respectively. The frame-shaped region 92b is a region separated from the center position of the face region 42b in the X and Y directions at a distance of D2 or more and less than D3, respectively. The frame-shaped region 93b is a region separated from the center position of the face region 42b in the X and Y directions by a distance D3 or more, respectively.
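
As a rough illustration of how a pixel could be assigned to one of the frame-shaped regions, the following Python sketch reads "separated in the X and Y directions by a distance of D1 or more and less than D2" as a condition on the larger of the two axis distances from the face center (the Chebyshev distance), which produces rectangular frames. That reading, the function name `frame_region_index`, and the example distances are assumptions for illustration only.

```python
def frame_region_index(x, y, center, distances):
    """Return 0 for pixels closer than distances[0], otherwise the 1-based
    index of the frame-shaped region (1 -> 91x, 2 -> 92x, 3 -> 93x)."""
    cx, cy = center
    # Rectangular frames: use the larger of the X and Y distances, in pixels.
    d = max(abs(x - cx), abs(y - cy))
    index = 0
    for k, bound in enumerate(distances, start=1):
        if d >= bound:
            index = k
    return index


# Hypothetical distances (D1, D2, D3) in pixels and a face centered at (10, 10).
D = (2, 4, 6)
center = (10, 10)
inside = frame_region_index(10, 10, center, D)  # closer than D1
first = frame_region_index(12, 10, center, D)   # D1 <= d < D2
second = frame_region_index(10, 14, center, D)  # D2 <= d < D3
third = frame_region_index(16, 16, center, D)   # d >= D3
```

With two faces, the same function would be evaluated once per face center, so a pixel can receive a frame index relative to each face, which is where overlapping frame-shaped regions come from.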

ここで、顔領域４２ａと顔領域４２ｂの間に該当する画像領域では、枠状領域が重複する場合がある。例えば、図９において、枠状領域９３における顔領域４２ａの負のＸ方向側（顔領域４２ｂ側）の領域は、枠状領域９１ｂにおける顔領域４２ｂの正のＸ方向側（顔領域４２ａ側）の領域と重複している。このように枠状領域が重複する領域は、両方の枠状領域に属する領域として設定される。一方、枠状領域が顔領域と重複する場合は、重複する領域は枠状領域から除外される。 Here, in the image region between the face region 42a and the face region 42b, frame-shaped regions may overlap. For example, in FIG. 9, the part of the frame-shaped region 93 on the negative X-direction side of the face region 42a (the face region 42b side) overlaps the part of the frame-shaped region 91b on the positive X-direction side of the face region 42b (the face region 42a side). A region where frame-shaped regions overlap in this way is set as belonging to both frame-shaped regions. On the other hand, when a frame-shaped region overlaps a face region, the overlapping part is excluded from the frame-shaped region.

但し、重複する領域に対する設定は、上述したものに限定されるものではなく、重複する領域に対する設定の規則を予め定めておき、規則に従って設定することが可能である。 However, the setting for the overlapping area is not limited to the above-mentioned one, and it is possible to set the setting rule for the overlapping area in advance and set according to the rule.

図９のように矩形の枠状領域を設定した後、第１特徴量抽出部３１０は、顔領域及び各枠状領域のそれぞれで、領域内に含まれる全画素の輝度の平均値を算出し、隣接する枠状領域同士の間、及び隣接する顔領域と枠状領域との間での輝度の平均値の差を算出する。ここで、領域内に含まれる全画素の輝度の平均値の差は、「第１特徴量」の一例である。 After the rectangular frame-shaped regions are set as shown in FIG. 9, the first feature amount extraction unit 310 calculates, for the face region and for each frame-shaped region, the average luminance of all the pixels included in the region, and then calculates the difference in average luminance between adjacent frame-shaped regions and between each face region and its adjacent frame-shaped region. Here, the difference in the average luminance of all the pixels included in the regions is an example of the "first feature amount".

より具体的には、第１特徴量抽出部３１０は、顔領域４２ａに含まれる全画素の輝度の平均値Ａｖｅ４２ａ、枠状領域９１ａに含まれる全画素の輝度の平均値Ａｖｅ９１ａ、枠状領域９２ａに含まれる全画素の輝度の平均値Ａｖｅ９２ａ、枠状領域９３ａに含まれる全画素の輝度の平均値Ａｖｅ９３ａを、それぞれ算出する。また、第１特徴量抽出部３１０は、顔領域４２ｂに含まれる全画素の輝度の平均値Ａｖｅ４２ｂ、枠状領域９１ｂに含まれる全画素の輝度の平均値Ａｖｅ９１ｂ、枠状領域９２ｂに含まれる全画素の輝度の平均値Ａｖｅ９２ｂ、枠状領域９３ｂに含まれる全画素の輝度の平均値Ａｖｅ９３ｂを、それぞれ算出する。 More specifically, the first feature amount extraction unit 310 calculates the average luminance Ave42a of all the pixels included in the face region 42a, the average luminance Ave91a of all the pixels included in the frame-shaped region 91a, the average luminance Ave92a of all the pixels included in the frame-shaped region 92a, and the average luminance Ave93a of all the pixels included in the frame-shaped region 93a. The first feature amount extraction unit 310 also calculates the average luminance Ave42b of all the pixels included in the face region 42b, the average luminance Ave91b of all the pixels included in the frame-shaped region 91b, the average luminance Ave92b of all the pixels included in the frame-shaped region 92b, and the average luminance Ave93b of all the pixels included in the frame-shaped region 93b.

そして、第１特徴量抽出部３１０は、平均値の差として「Ａｖｅ４２ａ−Ａｖｅ９１ａ」、「Ａｖｅ９１ａ−Ａｖｅ９２ａ」、「Ａｖｅ９２ａ−Ａｖｅ９３ａ」、「Ａｖｅ４２ｂ−Ａｖｅ９１ｂ」、「Ａｖｅ９１ｂ−Ａｖｅ９２ｂ」、及び「Ａｖｅ９２ｂ−Ａｖｅ９３ｂ」をそれぞれ算出し、算出結果を第１分割決定部３１１に出力する。 The first feature amount extraction unit 310 then calculates "Ave42a-Ave91a", "Ave91a-Ave92a", "Ave92a-Ave93a", "Ave42b-Ave91b", "Ave91b-Ave92b", and "Ave92b-Ave93b" as the differences between the average values, and outputs the calculation results to the first division determination unit 311.

その後、第１分割決定部３１１は、算出された輝度の平均値の差のそれぞれを、予め定められた第１輝度閾値と比較し、平均値の差が第１輝度閾値より大きいか否かを判定する。輝度の平均値の差が第１輝度閾値より大きいと判定された場合、第１分割決定部３１１は、隣接する領域を２つの領域として分割する。一方、輝度の平均値の値が第１輝度閾値以下の場合、第１分割決定部３１１は、隣接する領域を分割せず、両者を合体させて１つの枠状領域とする。ここで、第１輝度閾値は、「第１閾値」の一例である。 After that, the first division determination unit 311 compares each of the calculated differences in average luminance with a predetermined first luminance threshold value and determines whether or not the difference is larger than the first luminance threshold value. When the difference in average luminance is determined to be larger than the first luminance threshold value, the first division determination unit 311 divides the adjacent regions into two separate regions. On the other hand, when the difference in average luminance is equal to or less than the first luminance threshold value, the first division determination unit 311 does not divide the adjacent regions and instead merges them into one frame-shaped region. Here, the first luminance threshold value is an example of the "first threshold value".

一例として、平均値の差「Ａｖｅ４２ａ−Ａｖｅ９１ａ」、「Ａｖｅ９１ａ−Ａｖｅ９２ａ」、「Ａｖｅ９２ａ−Ａｖｅ９３ａ」、及び「Ａｖｅ４２ｂ−Ａｖｅ９１ｂ」が第１輝度閾値より大きく、平均値の差「Ａｖｅ９１ｂ−Ａｖｅ９２ｂ」及び「Ａｖｅ９２ｂ−Ａｖｅ９３ｂ」が第１輝度閾値より小さい場合、図１０に示すように非顔領域が分割される。図１０は、第１分割処理部３０８により分割された非顔領域の一例を説明する図である。 As an example, when the differences "Ave42a-Ave91a", "Ave91a-Ave92a", "Ave92a-Ave93a", and "Ave42b-Ave91b" are larger than the first luminance threshold value and the differences "Ave91b-Ave92b" and "Ave92b-Ave93b" are smaller than the first luminance threshold value, the non-face region is divided as shown in FIG. 10. FIG. 10 is a diagram illustrating an example of the non-face region divided by the first division processing unit 308.

図１０において、平均値の差「Ａｖｅ４２ａ−Ａｖｅ９１ａ」、「Ａｖｅ９１ａ−Ａｖｅ９２ａ」、「Ａｖｅ９２ａ−Ａｖｅ９３ａ」及び「Ａｖｅ４２ｂ−Ａｖｅ９１ｂ」は第１輝度閾値より大きいため、隣接する顔領域４２ａと枠状領域９１ａ、隣接する枠状領域９１ａと枠状領域９２ａ、隣接する枠状領域９２ａと枠状領域９３ａ、及び隣接する顔領域４２ｂと枠状領域９１ｂは、それぞれ分割されている。分割された枠状領域はそれぞれ別々の領域として扱われ、情報量減少処理部３０３ａにより領域毎に異なる処理が実行される。 In FIG. 10, because the differences "Ave42a-Ave91a", "Ave91a-Ave92a", "Ave92a-Ave93a", and "Ave42b-Ave91b" are larger than the first luminance threshold value, the adjacent face region 42a and frame-shaped region 91a, the adjacent frame-shaped regions 91a and 92a, the adjacent frame-shaped regions 92a and 93a, and the adjacent face region 42b and frame-shaped region 91b are each divided from one another. The divided frame-shaped regions are treated as separate regions, and the information amount reduction processing unit 303a executes different processing for each region.

一方、差分「Ａｖｅ９１ｂ−Ａｖｅ９２ｂ」及び「Ａｖｅ９２ｂ−Ａｖｅ９３ｂ」は第１輝度閾値以下であるため、枠状領域９１ｂ、９２ｂ及び９３ｂは分割されず、合体されて１つの枠状領域９３ｂとなっている。枠状領域９３ｂには、情報量減少処理部３０３ａにより共通の処理が実行される。 On the other hand, because the differences "Ave91b-Ave92b" and "Ave92b-Ave93b" are equal to or less than the first luminance threshold value, the frame-shaped regions 91b, 92b and 93b are not divided and are merged into one frame-shaped region 93b. The information amount reduction processing unit 303a executes common processing on the frame-shaped region 93b.
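
The divide-or-merge decision described above can be sketched in Python as follows. This is an illustration under assumptions: the text writes signed differences such as "Ave42b-Ave91b", while the sketch compares their absolute value with the threshold; the pixel values, the threshold of 10, and the helper names `decide_divisions` and `group_frames` are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)


def decide_divisions(region_pixels, threshold):
    """region_pixels lists luminance values ordered face, frame 1, frame 2, ...
    Returns one flag per adjacent pair: True = divide, False = merge."""
    means = [mean(p) for p in region_pixels]
    # Assumption: the magnitude of the difference is what is compared.
    return [abs(a - b) > threshold for a, b in zip(means, means[1:])]


def group_frames(frame_names, frame_flags):
    """Merge consecutive frame-shaped regions whose flag is False."""
    groups = [[frame_names[0]]]
    for name, divide in zip(frame_names[1:], frame_flags):
        if divide:
            groups.append([name])
        else:
            groups[-1].append(name)
    return groups


# Hypothetical luminance samples for face 42b and frames 91b, 92b, 93b.
pixels_b = [[200, 200], [100, 104], [101, 103], [100, 100]]
flags_b = decide_divisions(pixels_b, threshold=10)
# The first flag is the face/frame boundary; the rest are frame/frame boundaries.
groups_b = group_frames(["91b", "92b", "93b"], flags_b[1:])

# Hypothetical samples for face 42a and frames 91a, 92a, 93a.
pixels_a = [[200], [150], [100], [50]]
flags_a = decide_divisions(pixels_a, threshold=10)
groups_a = group_frames(["91a", "92a", "93a"], flags_a[1:])
```

With these sample values the three frames around face 42b end up in one group, matching the merged region 93b in the example, while the frames around face 42a stay separate.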

ここで、上述したように、顔領域４２ａと顔領域４２ｂの間に該当する画像領域では、枠状領域が重複する場合がある。枠状領域が重複する領域で、一方が分割しないと判定され、他方が分割すると判定された場合、分割する処理が優先して行われる。但し、重複する領域に対する設定は、上述したものに限定されるものではなく、重複する領域に対する設定の規則を予め定めておき、規則に従って分割処理を行うことができる。 Here, as described above, the frame-shaped region may overlap in the corresponding image region between the face region 42a and the face region 42b. In the area where the frame-shaped areas overlap, when it is determined that one is not divided and the other is determined to be divided, the process of dividing is prioritized. However, the setting for the overlapping area is not limited to the above-mentioned one, and the setting rule for the overlapping area can be set in advance and the division process can be performed according to the rule.

図８の情報量減少処理部３０３ａに含まれる低域通過フィルタ処理部３０５ａは、第１分割処理部３０８により分割された非顔領域の枠状領域毎に、異なるタップサイズの空間フィルタを用いて、空間フィルタ処理を実行する。 The low-pass filter processing unit 305a included in the information amount reduction processing unit 303a of FIG. 8 uses a spatial filter having a different tap size for each frame-shaped region of the non-face region divided by the first division processing unit 308. , Perform spatial filtering.

図１１は、ビデオ会議端末１０ａで使用される空間フィルタの一例を説明する図であり、（ａ）はサイズ３×３画素の空間フィルタ１０１を説明する図、（ｂ）はサイズ５×５画素の空間フィルタ１０２を説明する図である。 FIG. 11 is a diagram illustrating an example of the spatial filters used in the video conferencing terminal 10a; (a) illustrates a spatial filter 101 with a size of 3 × 3 pixels, and (b) illustrates a spatial filter 102 with a size of 5 × 5 pixels.

低域通過フィルタ処理部３０５ａは、空間フィルタ１０１と枠状領域の各画素との間でコンボリューション（畳み込み積分）処理を実行する。空間フィルタ１０１の各升目に示された値が、枠状領域の各画素に積算及び加算されることで、枠状領域の隣接画素間での輝度の差が低減される。これにより、枠状領域の画像における空間周波数のうち、高周波帯域が遮断され、低域通過フィルタ（ローパスフィルタ）の効果が得られる。空間フィルタ１０２についても同様である。 The low-pass filter processing unit 305a executes convolution processing between the spatial filter 101 and each pixel in the frame-shaped region. The values shown in the cells of the spatial filter 101 are multiplied by the pixels of the frame-shaped region and the products are summed, which reduces the luminance difference between adjacent pixels in the frame-shaped region. As a result, the high-frequency band of the spatial frequencies in the image of the frame-shaped region is cut off, giving the effect of a low-pass filter. The same applies to the spatial filter 102.

タップサイズが大きいほど、枠状領域における隣接画素間での輝度の差が低減され、高周波側で遮断される周波数帯域が広くなって、低域通過フィルタによる情報量の減少効果が大きくなる。従って、３×３画素の空間フィルタ１０１より５×５画素の空間フィルタ１０２のほうが情報量の減少効果が大きい。 The larger the tap size, the smaller the difference in brightness between adjacent pixels in the frame-shaped region, the wider the frequency band blocked on the high frequency side, and the greater the effect of reducing the amount of information by the low-pass filter. Therefore, the effect of reducing the amount of information is greater in the space filter 102 with 5 × 5 pixels than in the space filter 101 with 3 × 3 pixels.
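
The relation between tap size and smoothing can be checked with a small Python sketch. The coefficients of the spatial filters 101 and 102 appear only in FIG. 11 and are not given in the text, so the sketch assumes a uniform averaging (box) kernel; the edge-replication border handling and the function names are likewise assumptions.

```python
def box_filter(image, tap):
    """Convolve a 2-D list of luminance values with a tap x tap averaging kernel.
    Borders are handled by clamping coordinates (edge replication)."""
    h, w = len(image), len(image[0])
    r = tap // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0.0
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    total += image[yy][xx]
            out[y][x] = total / (tap * tap)
    return out


def max_adjacent_difference(image):
    """Largest absolute luminance step between horizontally adjacent pixels."""
    return max(abs(row[i + 1] - row[i]) for row in image for i in range(len(row) - 1))


# An 8x8 test image containing a sharp vertical edge (0 -> 100).
edge = [[0, 0, 0, 0, 100, 100, 100, 100] for _ in range(8)]
step_original = max_adjacent_difference(edge)
step_3x3 = max_adjacent_difference(box_filter(edge, 3))
step_5x5 = max_adjacent_difference(box_filter(edge, 5))
```

On this test image the largest step between neighboring pixels shrinks as the tap size grows, which is the behavior the paragraph above attributes to the larger spatial filter.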

一方、隣接する枠状領域同士の間、又は隣接する顔領域と枠状領域との間でタップサイズの差が大きいほど、低域通過フィルタ処理後に、領域間の境界が目立ちやすくなる。 On the other hand, the larger the difference in tap size between the adjacent frame-shaped regions or between the adjacent face region and the frame-shaped region, the more conspicuous the boundary between the regions becomes after the low-pass filter processing.

低域通過フィルタ処理部３０５ａは、図１０の枠状領域９１ａに対しては、空間フィルタ１０１を用いて低域通過フィルタ処理を実行し、枠状領域９２ａに対しては、空間フィルタ１０２を用いて低域通過フィルタ処理を実行する。また、枠状領域９３ａに対しては、空間フィルタ１０２より更にタップサイズが大きい７×７画素の空間フィルタ等を用いて低域通過フィルタ処理を実行する。 The low-pass filter processing unit 305a executes low-pass filter processing on the frame-shaped region 91a in FIG. 10 using the spatial filter 101, and on the frame-shaped region 92a using the spatial filter 102. For the frame-shaped region 93a, it executes low-pass filter processing using, for example, a 7 × 7 pixel spatial filter, whose tap size is larger still than that of the spatial filter 102.

ここで、顔領域４２ａは、情報量を維持するために低域通過フィルタ処理が実行されない。そのため、顔領域４２ａと非顔領域との間で画素輝度の平均値の差が第１輝度閾値より大きい場合に、非顔領域全体に対して、タップサイズが大きい７×７画素の空間フィルタを用いて低域通過フィルタ処理が実行されると、顔領域４２ａと非顔領域との間の境界が目立つ場合がある。 Here, low-pass filter processing is not executed on the face region 42a, in order to maintain its amount of information. Therefore, when the difference in average pixel luminance between the face region 42a and the non-face region is larger than the first luminance threshold value, executing low-pass filter processing on the entire non-face region with a 7 × 7 pixel spatial filter having a large tap size may make the boundary between the face region 42a and the non-face region conspicuous.

これに対し、低域通過フィルタ処理部３０５ａは、顔領域４２ａに隣接する枠状領域９１ａには、タップサイズが小さい３×３画素の空間フィルタを用いて低域通過フィルタ処理を実行する。また、枠状領域９１ａに隣接する枠状領域９２ａには、３×３画素の空間フィルタに近い５×５画素の空間フィルタを用いて低域通過フィルタ処理を実行する。同様に、枠状領域９２ａに隣接する枠状領域９３ａには、５×５画素の空間フィルタに近い７×７画素の空間フィルタを用いて低域通過フィルタ処理を実行する。このようにすることで、隣接する領域間で空間フィルタのタップサイズを大きく異ならせないようにすることができる。その結果、低域通過フィルタ処理後に領域間の境界が目立たないようにすることができる。 On the other hand, the low-pass filter processing unit 305a executes the low-pass filter processing in the frame-shaped region 91a adjacent to the face region 42a by using a spatial filter of 3 × 3 pixels having a small tap size. Further, in the frame-shaped region 92a adjacent to the frame-shaped region 91a, a low-pass filter process is executed by using a space filter having 5 × 5 pixels, which is close to a space filter having 3 × 3 pixels. Similarly, in the frame-shaped region 93a adjacent to the frame-shaped region 92a, a low-pass filter process is executed using a 7 × 7 pixel spatial filter close to a 5 × 5 pixel spatial filter. By doing so, it is possible to prevent the tap size of the spatial filter from being significantly different between adjacent areas. As a result, the boundaries between the regions can be made inconspicuous after the low-pass filter processing.

一方、顔領域４２ｂの周囲では、枠状領域９１ｂ、９２ｂ及び９３ｂは合体されて１つの枠状領域９３ｂとして扱われる。枠状領域を合体させる際には、顔領域から遠い側の枠状領域に適用するタップサイズの空間フィルタが採用される。従って、合体後の枠状領域９３ｂでは、枠状領域９１ｂ、９２ｂ及び９３ｂに適用するタップサイズのうち、最も遠い側の枠状領域で使用される７×７画素のタップサイズの空間フィルタが、低域通過フィルタ処理に用いられる。 On the other hand, around the face region 42b, the frame-shaped regions 91b, 92b and 93b are combined and treated as one frame-shaped region 93b. When combining the frame-shaped areas, a tap-sized spatial filter applied to the frame-shaped area on the side far from the face area is adopted. Therefore, in the frame-shaped region 93b after coalescence, the 7 × 7 pixel tap size spatial filter used in the farthest frame-shaped region among the tap sizes applied to the frame-shaped regions 91b, 92b and 93b is used. Used for low-pass filtering.
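
The rule for choosing the tap size of a merged region can be written as a short Python sketch. The mapping from frame index to tap size follows the 3 × 3, 5 × 5, and 7 × 7 example in the text; the data layout (groups of 1-based frame indices, where a larger index is farther from the face) and the names `BASE_TAPS` and `taps_for_groups` are assumptions for illustration.

```python
# Tap size used for each frame-shaped region before any merging:
# 1 -> 91x (3x3), 2 -> 92x (5x5), 3 -> 93x (7x7).
BASE_TAPS = {1: 3, 2: 5, 3: 7}


def taps_for_groups(groups):
    """For each group of merged frame indices, use the tap size of the member
    farthest from the face region (the largest index in the group)."""
    return [BASE_TAPS[max(group)] for group in groups]


taps_a = taps_for_groups([[1], [2], [3]])  # nothing merged around face 42a
taps_b = taps_for_groups([[1, 2, 3]])      # 91b, 92b, 93b merged around face 42b
taps_mixed = taps_for_groups([[1, 2], [3]])
```

Around face 42a the tap sizes step up gradually, while the single merged region around face 42b is filtered with the 7 × 7 tap size throughout.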

顔領域４２ｂと合体後の枠状領域９３ｂとの間では、画素輝度の平均値の差が第１輝度閾値以下であって小さい。そのため、枠状領域９３ｂ全体に対して、タップサイズが大きい７×７画素の空間フィルタを用いて低域通過フィルタ処理が実行されても、顔領域４２ｂと非顔領域との間の境界が目立つことはない。従って、タップサイズの大きい空間フィルタを用いて、大きな情報量の減少効果を得ることができる。 Between the face region 42b and the merged frame-shaped region 93b, the difference in average pixel luminance is small, being equal to or less than the first luminance threshold value. Therefore, even if low-pass filter processing is executed on the entire frame-shaped region 93b with a 7 × 7 pixel spatial filter having a large tap size, the boundary between the face region 42b and the non-face region does not become conspicuous. A spatial filter with a large tap size can thus be used to obtain a large reduction in the amount of information.

このようにして、情報量を減少させる処理を実行した場合に、領域間の境界を目立たなくすることで、画像の品質を保持しつつ、画像の情報量を減少させることができる。 In this way, when the process of reducing the amount of information is executed, the amount of information in the image can be reduced while maintaining the quality of the image by making the boundaries between the regions inconspicuous.

＜第１の実施形態に係るビデオ会議端末による処理＞ <Processing by the video conferencing terminal according to the first embodiment>
図１２は、本実施形態に係るビデオ会議端末１０ａによる処理の一例を示すフローチャートである。 FIG. 12 is a flowchart showing an example of processing by the video conferencing terminal 10a according to the present embodiment.

ここで、ステップＳ１２１〜Ｓ１２３の処理は、図７のステップＳ７１〜Ｓ７３の処理と同じであるため、ここでは重複する説明を省略する。 Here, since the processing of steps S121 to S123 is the same as the processing of steps S71 to S73 of FIG. 7, duplicate description will be omitted here.

ステップＳ１２４において、枠状領域設定部３０９は、ＳＳＤ７０４等に記憶された第１分割情報に従って、検出された顔を囲むように複数の矩形の枠状領域を設定する。 In step S124, the frame-shaped area setting unit 309 sets a plurality of rectangular frame-shaped areas so as to surround the detected face according to the first division information stored in the SSD 704 or the like.

続いて、ステップＳ１２５において、第１特徴量抽出部３１０は、顔領域及び各枠状領域のそれぞれで、領域内に含まれる全画素の輝度の平均値を算出し、隣接する枠状領域同士の間、及び隣接する顔領域と枠状領域との間での輝度の平均値の差を算出する。その後、算出結果を第１分割決定部３１１に出力する。 Subsequently, in step S125, the first feature amount extraction unit 310 calculates, for the face region and for each frame-shaped region, the average luminance of all the pixels included in the region, and then calculates the difference in average luminance between adjacent frame-shaped regions and between each face region and its adjacent frame-shaped region. After that, it outputs the calculation results to the first division determination unit 311.

続いて、ステップＳ１２６において、第１分割決定部３１１は、第１特徴量抽出部３１０から入力した輝度の平均値の差のそれぞれを、第１輝度閾値と比較し、輝度の平均値の差が第１輝度閾値より大きいか否かを判定する。 Subsequently, in step S126, the first division determination unit 311 compares each of the differences in average luminance input from the first feature amount extraction unit 310 with the first luminance threshold value and determines whether or not the difference is larger than the first luminance threshold value.

ステップＳ１２６で、輝度の平均値の差が第１輝度閾値より大きいと判定された場合（ステップＳ１２６、Ｙｅｓ）、ステップＳ１２７において、第１分割決定部３１１は、隣接する領域を２つの領域として分割する。 When it is determined in step S126 that the difference in average brightness is larger than the first brightness threshold (step S126, Yes), in step S127 the first division determination unit 311 divides the adjacent regions into two separate regions.

一方、ステップＳ１２６で、輝度の平均値の差が第１輝度閾値以下であると判定された場合（ステップＳ１２６、Ｎｏ）、ステップＳ１２８において、第１分割決定部３１１は、隣接する領域を分割せず、両者を合体させて１つの枠状領域とする。 On the other hand, when it is determined in step S126 that the difference in average brightness is equal to or less than the first brightness threshold (step S126, No), in step S128 the first division determination unit 311 does not divide the adjacent regions and instead merges them into one frame-shaped region.

このようなステップＳ１２６〜Ｓ１２８の処理は、隣接する枠状領域同士、及び隣接する顔領域と枠状領域の全てに対して実行される。 Such processing of steps S126 to S128 is executed for all of the adjacent frame-shaped regions and the adjacent face region and frame-shaped region.
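
The decision logic of steps S125 to S128 can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes the regions are supplied as plain lists of brightness values ordered from the face region outward, and the names `decide_splits`, `region_pixels`, and `threshold` are hypothetical.

```python
from statistics import mean


def decide_splits(region_pixels, threshold):
    """Group adjacent regions whose average brightness differs by <= threshold.

    region_pixels: list of brightness lists, index 0 being the face region and
    the following entries the frame-shaped regions from inside to outside
    (an assumed ordering). Returns a list of groups of region indices.
    """
    means = [mean(p) for p in region_pixels]      # S125: average brightness per region
    groups = [[0]]
    for i in range(1, len(means)):
        diff = abs(means[i] - means[i - 1])       # S125: difference between neighbours
        if diff > threshold:                      # S126: compare with the threshold
            groups.append([i])                    # S127: keep as separate regions
        else:
            groups[-1].append(i)                  # S128: merge into one region
    return groups


regions = [
    [120, 124, 118],  # face region
    [122, 119, 121],  # innermost frame-shaped region
    [60, 64, 58],     # next frame-shaped region
    [61, 59, 63],     # outermost frame-shaped region
]
groups = decide_splits(regions, threshold=20)
print(groups)
```

Each neighbouring pair is compared using the averages of the original regions, as in the description, rather than the average of an already merged group.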

続いて、ステップＳ１２９において、低域通過フィルタ処理部３０５ａは、第１分割処理部３０８により分割された枠状領域毎に、異なるタップサイズの空間フィルタを用いて、空間フィルタ処理を実行する。 Subsequently, in step S129, the low-pass filter processing unit 305a executes the spatial filter processing by using a spatial filter having a different tap size for each frame-shaped region divided by the first division processing unit 308.
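
As a rough sketch of step S129, the following applies a mean (box) filter whose tap size depends on the region a pixel belongs to. The box kernel, the label map `labels`, and the function name `filter_by_region` are assumptions made for illustration; the text only states that spatial filters of different tap sizes are used per region.

```python
def filter_by_region(image, labels, tap_by_label):
    """Low-pass filter each pixel with the tap size assigned to its region.

    image: 2D list of brightness values.
    labels: 2D list of region labels with the same shape as image.
    tap_by_label: maps a region label to an odd tap size; pixels whose label
    is missing from the map (for example the face region) are left unchanged.
    """
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]
    for y in range(h):
        for x in range(w):
            tap = tap_by_label.get(labels[y][x])
            if tap is None:
                continue                      # region not filtered
            r = tap // 2
            # The window is clipped at the image border and reads the
            # unfiltered input, including pixels of neighbouring regions.
            window = [image[j][i]
                      for j in range(max(0, y - r), min(h, y + r + 1))
                      for i in range(max(0, x - r), min(w, x + r + 1))]
            out[y][x] = sum(window) / len(window)
    return out


image = [[0, 0, 0, 0],
         [0, 90, 0, 0],
         [0, 0, 0, 0]]
labels = [[1, 1, 2, 2],
          [1, 0, 2, 2],
          [1, 1, 2, 2]]
# Label 0 stands for the face region, 1 and 2 for two frame-shaped regions.
out = filter_by_region(image, labels, {1: 3, 2: 5})
```

A larger tap size averages over more pixels and therefore removes more high-frequency detail, which is why the tap size controls how much information each region loses.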

ステップＳ１３０〜Ｓ１３４の処理は、図７のステップＳ７５〜Ｓ７９の処理と同じであるため、ここでは重複する説明を省略する。 Since the processes of steps S130 to S134 are the same as the processes of steps S75 to S79 of FIG. 7, redundant description will be omitted here.

なお、ステップＳ１３０〜Ｓ１３２では、非顔領域全体に対して共通のコントラスト低減処理、及びフレームレート低下処理を実行する例を示したが、第１分割処理部３０８により分割された非顔領域の枠状領域毎に、異なるコントラスト低減処理、及びフレームレート低下処理をそれぞれ実行しても良い。 In steps S130 to S132, an example was shown in which common contrast reduction processing and frame rate reduction processing are executed for the entire non-face region; however, different contrast reduction processing and frame rate reduction processing may instead be executed for each frame-shaped region of the non-face region divided by the first division processing unit 308.

このようにして、ビデオ会議端末１０ａは、非顔領域を複数の枠状領域に分割し、枠状領域毎で情報量を減少させる処理を実行することができる。 In this way, the video conferencing terminal 10a can divide the non-face region into a plurality of frame-shaped regions and execute a process of reducing the amount of information for each frame-shaped region.

＜作用効果＞ <Effect>
以上説明してきたように、本実施形態では、ビデオ会議端末１０ａへの入力画像の非顔領域を、領域内に含まれる画素の輝度に基づき複数の枠状領域に分割する。具体的には、非顔領域において、隣接する枠状領域同士の間、及び隣接する顔領域と枠状領域との間で、各領域内に含まれる全ての画素の輝度の平均値の差を算出する。そして、輝度の平均値の差が第１輝度閾値より大きい場合に、隣接する枠状領域同士、又は隣接する顔領域と枠状領域を分割する。また、輝度の平均値の差が第１輝度閾値以下の場合に、隣接する枠状領域同士、又は隣接する顔領域と枠状領域を分割せずに合体させて１つの領域とする。その後、非顔領域の分割された領域毎で、異なるタップサイズの空間フィルタを用いて情報量を減少させる。 As described above, in the present embodiment, the non-face region of the input image to the video conferencing terminal 10a is divided into a plurality of frame-shaped regions based on the brightness of the pixels included in the region. Specifically, in the non-face region, the difference in the average brightness of all pixels included in each region is calculated between adjacent frame-shaped regions and between the face region and its adjacent frame-shaped region. Then, when the difference in average brightness is larger than the first brightness threshold, the adjacent frame-shaped regions, or the face region and its adjacent frame-shaped region, are divided. When the difference in average brightness is equal to or less than the first brightness threshold, the adjacent frame-shaped regions, or the face region and its adjacent frame-shaped region, are not divided but merged into one region. After that, the amount of information is reduced by using a spatial filter with a different tap size for each divided region of the non-face region.

このようにすることで、隣接する枠状領域同士の間、又は隣接する顔領域と枠状領域との間で輝度の平均値の差が大きい場合に、低域通過フィルタ処理で用いる空間フィルタのタップサイズを隣接する領域間で大きく異ならせないようにすることができる。これにより、低域通過フィルタ処理後に領域間の境界を目立たないようにし、画像の品質を保持することができる。 In this way, when the difference in average brightness between adjacent frame-shaped regions, or between the face region and its adjacent frame-shaped region, is large, the tap size of the spatial filter used in the low-pass filter processing can be kept from differing greatly between the adjacent regions. As a result, the boundaries between the regions are made inconspicuous after the low-pass filter processing, and the quality of the image can be maintained.

また、画像において重要な情報となる人物の情報量を維持したまま、重要でない背景等の情報量を減少させることができる。このようにして、主観的な画像品質を保持しつつ、画像の情報量を減少させることができ、画像の圧縮率を高めることができる。 In addition, the amount of information in unimportant parts such as the background can be reduced while the amount of information for the person, which is the important information in the image, is maintained. In this way, the amount of information in the image can be reduced and the compression rate of the image can be increased while the subjective image quality is maintained.

上述した例では、枠状領域の一例として矩形の枠状領域を説明したが、円形等の他の形状の枠状領域であっても良い。また、低域通過フィルタ処理の一例として空間フィルタを用いたコンボリューション処理を説明したが、フーリエ変換処理等の他の空間周波数を低下させる処理を適用することもできる。 In the above-mentioned example, a rectangular frame-shaped region has been described as an example of the frame-shaped region, but a frame-shaped region having another shape such as a circle may be used. Further, although the convolution processing using the spatial filter has been described as an example of the low-pass filter processing, other processing for lowering the spatial frequency such as the Fourier transform processing can also be applied.

また、第１特徴量の一例として、領域内に含まれる全画素の輝度平均値の隣接領域間での差を説明したが、領域内に含まれる全画素の輝度の最大値や最小値、総和、コントラスト（輝度の最大値と最小値の差）の隣接領域間での差等を用いても良い。 Further, the difference between adjacent regions in the average brightness of all pixels included in a region has been described as an example of the first feature amount, but the difference between adjacent regions in the maximum value, minimum value, sum, or contrast (the difference between the maximum and minimum brightness) of all pixels included in a region may be used instead.
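
The alternative feature amounts listed above can be sketched as interchangeable statistics over the brightness values of a region. The function names `region_feature` and `feature_difference` and the `kind` strings are hypothetical and chosen only for this example.

```python
def region_feature(pixels, kind="mean"):
    """Return one brightness statistic for a region given as a flat list."""
    if kind == "mean":
        return sum(pixels) / len(pixels)
    if kind == "max":
        return max(pixels)
    if kind == "min":
        return min(pixels)
    if kind == "sum":
        return sum(pixels)
    if kind == "contrast":
        # Contrast as defined in the text: maximum minus minimum brightness.
        return max(pixels) - min(pixels)
    raise ValueError(f"unknown feature kind: {kind}")


def feature_difference(region_a, region_b, kind="mean"):
    """Absolute difference of the chosen statistic between two adjacent regions."""
    return abs(region_feature(region_a, kind) - region_feature(region_b, kind))


a = [10, 20, 30]
b = [40, 80]
diffs = {k: feature_difference(a, b, k)
         for k in ("mean", "max", "min", "sum", "contrast")}
print(diffs)
```

Note that the sum depends on the number of pixels in each region, so a threshold chosen for the mean would not carry over to it unchanged.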

なお、分割した枠状領域の特徴量の差分判断を、ＳＳＤ７０４等のメモリに保持し、別途設定された差分判断のＮ数の閾値により、枠状領域を分割するかの判断を行っても良い。このようにすることで、枠内全体から求めた特徴量に基づき、枠状領域を分割するかの判断を行う場合と比較して、より細かい判断が可能になる。 Note that the results of the difference determinations for the feature amounts of the divided frame-shaped regions may be held in a memory such as the SSD 704, and whether to divide a frame-shaped region may be decided using a separately set threshold on the number N of difference determinations. This enables a finer-grained decision than deciding whether to divide the frame-shaped region based on a feature amount obtained from the entire frame.

［第２の実施形態］ [Second Embodiment]
次に、第２の実施形態に係るビデオ会議端末１０ｂについて説明する。 Next, the video conferencing terminal 10b according to the second embodiment will be described.

第１の実施形態で述べた第１分割処理部３０８により分割された枠状領域内において、部分的に明るさ等の画像特性（特徴量）が異なる場合がある。この場合に、１つの枠状領域の全体に対して、情報量を減少させるために同じ処理を実行すると、枠状領域内の部分毎で、明るさ等の特徴量の差が目立ち、違和感のある低品質の画像になる場合がある。そこで、本実施形態では、枠状領域内の部分毎で明るさ等の特徴量が大きく異なる場合は、枠状領域を複数の矩形領域に分割し、矩形領域毎で情報量を減少させる処理を異ならせることで、処理後の画像の品質の低下を抑制している。 Within a frame-shaped region divided by the first division processing unit 308 described in the first embodiment, image characteristics (feature amounts) such as brightness may differ from part to part. In this case, if the same processing is applied to the entire frame-shaped region to reduce the amount of information, differences in feature amounts such as brightness between parts of the frame-shaped region become conspicuous, which may result in an unnatural, low-quality image. Therefore, in the present embodiment, when feature amounts such as brightness differ greatly between parts of a frame-shaped region, the frame-shaped region is divided into a plurality of rectangular regions and the information reduction processing is varied for each rectangular region, which suppresses deterioration in the quality of the processed image.

＜第２の実施形態に係るビデオ会議端末の機能構成＞ <Functional configuration of the video conferencing terminal according to the second embodiment>
図１３は、本実施形態に係るビデオ会議端末１０ｂの機能構成の一例について説明する図である。図１３に示すように、ビデオ会議端末１０ｂは、第２分割処理部３１２と、情報量減少処理部３０３ｂとを有する。これらは、図２のＣＰＵ７０１が所定のプログラムを実行すること等により実現される。 FIG. 13 is a diagram illustrating an example of the functional configuration of the video conferencing terminal 10b according to the present embodiment. As shown in FIG. 13, the video conferencing terminal 10b has a second division processing unit 312 and an information amount reduction processing unit 303b. These are realized by the CPU 701 of FIG. 2 executing a predetermined program or the like.

第２分割処理部３１２は、入力画像の第２特徴量に基づいて、第１分割処理部３０８により分割された枠状領域を複数の矩形領域に分割する処理を実行する。この分割処理のために、分割数Ｐ、矩形領域のサイズＱ等の第２分割情報が予め定められ、ＳＳＤ７０４等に記憶されている。また、第２分割処理部３１２は、矩形領域設定部３１３と、第２特徴量抽出部３１４と、第２分割決定部３１５とを有する。 The second division processing unit 312 executes a process of dividing the frame-shaped region divided by the first division processing unit 308 into a plurality of rectangular regions based on the second feature amount of the input image. For this division process, second division information such as the number of divisions P and the size Q of the rectangular area is predetermined and stored in SSD 704 or the like. In addition, the second division processing unit 312 has a rectangular area setting unit 313, a second feature amount extraction unit 314, and a second division determination unit 315.

矩形領域設定部３１３は、ＳＳＤ７０４等に記憶された第２分割情報に従って、第１分割処理部３０８により分割された枠状領域のそれぞれに対して、枠状領域の境界に複数の矩形領域を設定する。ここで、図１４は、矩形領域設定部３１３により設定された矩形領域の一例を説明する図である。 The rectangular area setting unit 313 sets, according to the second division information stored in the SSD 704 or the like, a plurality of rectangular regions at the boundary of each of the frame-shaped regions divided by the first division processing unit 308. Here, FIG. 14 is a diagram illustrating an example of the rectangular regions set by the rectangular area setting unit 313.

図１４において、斜線ハッチングで示した１２個の矩形領域９４ａは、第１分割処理部３０８により分割された枠状領域９２ａに対して、枠状領域の境界に設定された矩形領域である。なお、図１４では、図を簡略化するため、矩形領域が枠状領域９２ａのみに対して設定された例を示したが、矩形領域設定部３１３は、第１分割処理部３０８により分割された全ての枠状領域に対して、枠状領域の境界に矩形領域を設定することができる。 In FIG. 14, the twelve rectangular regions 94a shown with diagonal hatching are rectangular regions set at the boundary of the frame-shaped region 92a, which was divided by the first division processing unit 308. Note that, to simplify the drawing, FIG. 14 shows an example in which rectangular regions are set only for the frame-shaped region 92a, but the rectangular area setting unit 313 can set rectangular regions at the boundaries of all frame-shaped regions divided by the first division processing unit 308.
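
One way to picture the setting of rectangular regions along a frame is the sketch below, which tiles the outermost ring of a bounding box with square cells. The exact layout of FIG. 14 is not reproduced here; the function name `ring_cells`, the square cell shape, and the requirement that the box size be a multiple of the cell size are all assumptions. The parameter `cell` loosely plays the role of the rectangular-region size Q in the second division information.

```python
def ring_cells(x0, y0, width, height, cell):
    """Return (x, y, w, h) squares covering the outermost ring of a box.

    The box starts at (x0, y0); width and height must be multiples of cell.
    Only cells in the first or last row or column of the grid are kept, so
    the interior of the box (the area enclosed by the frame) is excluded.
    """
    if width % cell or height % cell:
        raise ValueError("width and height must be multiples of cell")
    cols, rows = width // cell, height // cell
    cells = []
    for r in range(rows):
        for c in range(cols):
            if r in (0, rows - 1) or c in (0, cols - 1):
                cells.append((x0 + c * cell, y0 + r * cell, cell, cell))
    return cells


# A 64x64 box with 16-pixel cells gives a 4x4 grid whose ring has 12 cells.
cells = ring_cells(0, 0, 64, 64, 16)
print(len(cells))
```

With these example sizes the ring happens to contain twelve cells, the same count as the rectangular regions 94a in FIG. 14, but that match comes from the chosen numbers and is not derived from the figure.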

図１４のように枠状領域内に矩形領域９４ａを設定した後、第２特徴量抽出部３１４は、顔領域及び各矩形領域のそれぞれで、領域内に含まれる全画素の輝度の平均値を算出する。そして、隣接する矩形領域同士の間、及び隣接する顔領域と矩形領域との間で、領域内に含まれる全ての画素の輝度の平均値の差を算出し、算出結果を第２分割決定部３１５に出力する。ここで、領域内に含まれる全画素の輝度の平均値は、「第２特徴量」の一例である。 After the rectangular regions 94a are set in the frame-shaped region as shown in FIG. 14, the second feature amount extraction unit 314 calculates, for the face region and for each rectangular region, the average brightness of all pixels included in the region. It then calculates the difference in average brightness between adjacent rectangular regions and between the face region and its adjacent rectangular region, and outputs the calculation results to the second division determination unit 315. Here, the average brightness of all pixels included in a region is an example of the "second feature amount".

その後、第２分割決定部３１５は、算出した輝度の平均値の差のそれぞれを、予め定められた第２輝度閾値と比較する。そして、輝度の平均値の差が第２輝度閾値より大きい場合は、隣接する領域を２つの領域として分割する。一方、輝度の平均値の差が第２輝度閾値以下の場合は、隣接する領域を分割せずに両者を合体させて１つの領域とする。ここで、第２輝度閾値は、「第２閾値」の一例である。なお、第２閾値は、上述した第１閾値と同じであっても良いし、異なっていても良い。 After that, the second division determination unit 315 compares each of the calculated differences in average brightness with a predetermined second brightness threshold. When the difference in average brightness is larger than the second brightness threshold, the adjacent regions are divided into two separate regions. On the other hand, when the difference in average brightness is equal to or less than the second brightness threshold, the adjacent regions are not divided but merged into one region. Here, the second brightness threshold is an example of the "second threshold". The second threshold may be the same as or different from the first threshold described above.

分割された領域のそれぞれは別々の領域として扱われ、情報量減少処理部３０３ｂにより領域毎で異なる処理が実行される。一方、分割されずに合体された領域は、１つの領域として扱われ、情報量減少処理部３０３ｂにより、共通の処理が実行される。 Each of the divided regions is treated as a separate region, and the information amount reduction processing unit 303b executes different processing for each region. On the other hand, the combined regions that are not divided are treated as one region, and the information amount reduction processing unit 303b executes common processing.

情報量減少処理部３０３ｂに含まれる低域通過フィルタ処理部３０５ｂは、第２分割処理部３１２により分割された非顔領域の枠状領域における矩形領域９４ａ毎に、異なるタップサイズの空間フィルタを用いて空間フィルタ処理を実行する。ここで、本実施形態では、空間フィルタのタップサイズを異ならせる場合、境界を目立ちにくくするために、隣接する矩形領域間でタップサイズを大きく異ならせないようにする。例えば、隣接する一方の矩形領域に３×３画素のタップサイズの空間フィルタを適用する場合、他方の矩形領域には５×５画素のタップサイズの空間フィルタ等の３×３画素に対してタップサイズが近い空間フィルタを使用する。 The low-pass filter processing unit 305b included in the information amount reduction processing unit 303b executes spatial filter processing using a spatial filter with a different tap size for each rectangular region 94a in the frame-shaped regions of the non-face region divided by the second division processing unit 312. In the present embodiment, when the tap sizes of the spatial filters are varied, the tap sizes are kept from differing greatly between adjacent rectangular regions so that the boundaries are less noticeable. For example, when a spatial filter with a tap size of 3 × 3 pixels is applied to one of two adjacent rectangular regions, a spatial filter whose tap size is close to 3 × 3 pixels, such as one with a tap size of 5 × 5 pixels, is used for the other rectangular region.
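
The constraint that neighbouring regions use similar tap sizes can be sketched as a simple clamping pass. The sketch assumes the regions form a one-dimensional chain of neighbours and that "close" means differing by at most one odd step (for example 3 × 3 next to 5 × 5); the function name `limit_tap_steps` and the parameter `max_step` are hypothetical, and the text does not specify how tap sizes are actually chosen.

```python
def limit_tap_steps(desired, max_step=2):
    """Lower tap sizes so that neighbours in the chain differ by <= max_step.

    desired: odd tap sizes requested for consecutive neighbouring regions.
    Sizes are only ever reduced, so no region is blurred more than requested.
    """
    taps = list(desired)
    # Forward pass: each region may exceed its left neighbour by max_step at most.
    for i in range(1, len(taps)):
        taps[i] = min(taps[i], taps[i - 1] + max_step)
    # Backward pass: the same limit with respect to the right neighbour.
    for i in range(len(taps) - 2, -1, -1):
        taps[i] = min(taps[i], taps[i + 1] + max_step)
    return taps


smoothed = limit_tap_steps([3, 9, 3])
print(smoothed)
```

With an even `max_step` and odd inputs, every resulting tap size stays odd, so each filter keeps a well-defined centre pixel.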

＜第２の実施形態に係るビデオ会議端末による処理＞ <Processing by the video conferencing terminal according to the second embodiment>
図１５は、本実施形態に係るビデオ会議端末１０ｂによる処理の一例を示すフローチャートである。 FIG. 15 is a flowchart showing an example of processing by the video conferencing terminal 10b according to the present embodiment.

ここで、ステップＳ１５１〜Ｓ１５４の処理は、図１２のステップＳ１２１〜Ｓ１２４の処理と同じであるため、ここでは重複する説明を省略する。 Here, since the processing of steps S151 to S154 is the same as the processing of steps S121 to S124 of FIG. 12, duplicate description will be omitted here.

ステップＳ１５５において、矩形領域設定部３１３は、ＳＳＤ７０４等に記憶された第２分割情報に従って、第１分割処理部３０８により分割された枠状領域に対して、枠状領域の境界に複数の矩形領域を設定する。 In step S155, the rectangular area setting unit 313 sets, according to the second division information stored in the SSD 704 or the like, a plurality of rectangular regions at the boundary of each frame-shaped region divided by the first division processing unit 308.

続いて、ステップＳ１５６において、第２特徴量抽出部３１４は、顔領域及び各矩形領域のそれぞれで、領域内に含まれる全画素の輝度の平均値を算出する。そして、隣接する矩形領域同士の間、及び隣接する顔領域と矩形領域との間での輝度の平均値の差を算出し、算出結果を第２分割決定部３１５に出力する。 Subsequently, in step S156, the second feature amount extraction unit 314 calculates the average value of the brightness of all the pixels included in each of the face region and each rectangular region. Then, the difference in the average value of the brightness between the adjacent rectangular regions and between the adjacent face region and the rectangular region is calculated, and the calculation result is output to the second division determination unit 315.

続いて、ステップＳ１５７において、第２分割決定部３１５は、第２特徴量抽出部３１４から入力した輝度の平均値の差のそれぞれを第２輝度閾値と比較し、輝度の平均値の差が第２輝度閾値より大きいか否かを判定する。 Subsequently, in step S157, the second division determination unit 315 compares each difference in average brightness input from the second feature amount extraction unit 314 with the second brightness threshold, and determines whether the difference is larger than the second brightness threshold.

ステップＳ１５７で、輝度の平均値の差が第２輝度閾値より大きいと判定された場合（ステップＳ１５７、Ｙｅｓ）、ステップＳ１５８において、第２分割決定部３１５は、隣接する領域を２つの領域として分割する。 When it is determined in step S157 that the difference in average brightness is larger than the second brightness threshold (step S157, Yes), in step S158 the second division determination unit 315 divides the adjacent regions into two separate regions.

一方、ステップＳ１５７で、輝度の平均値の差が第２輝度閾値以下であると判定された場合（ステップＳ１５７、Ｎｏ）、ステップＳ１５９において、第２分割決定部３１５は、隣接する領域を分割せずに両者を合体させて１つの枠状領域とする。 On the other hand, when it is determined in step S157 that the difference in average brightness is equal to or less than the second brightness threshold (step S157, No), in step S159 the second division determination unit 315 does not divide the adjacent regions and instead merges them into one frame-shaped region.
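
Because rectangular regions can have several neighbours, the split-or-merge decision of steps S157 to S159 can be sketched with a union-find structure over an adjacency list. This is only one possible reading: the text does not say how merged regions are tracked, and the names `merge_regions`, `means`, and `adjacency` are hypothetical.

```python
def merge_regions(means, adjacency, threshold):
    """Assign a group label to each region after merging similar neighbours.

    means: average brightness per region (index = region id).
    adjacency: list of (a, b) pairs of neighbouring region ids.
    Neighbours whose averages differ by <= threshold are merged (S159);
    other pairs stay separate (S158). Merging is transitive in this sketch.
    """
    parent = list(range(len(means)))

    def find(i):
        # Follow parent links to the group representative, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b in adjacency:
        if abs(means[a] - means[b]) <= threshold:   # S157: No branch
            ra, rb = find(a), find(b)
            if ra != rb:
                parent[max(ra, rb)] = min(ra, rb)   # keep the smaller id as label
    return [find(i) for i in range(len(means))]


means = [100, 104, 40, 43]
adjacency = [(0, 1), (1, 2), (2, 3)]
group_ids = merge_regions(means, adjacency, threshold=10)
print(group_ids)
```

Regions that end up with the same label would receive common processing, while regions with different labels could be given different tap sizes.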

続いて、ステップＳ１６０において、低域通過フィルタ処理部３０５ｂは、第２分割処理部３１２により分割された非顔領域の矩形領域毎に、異なるタップサイズの空間フィルタを用いて、空間フィルタ処理を実行する。 Subsequently, in step S160, the low-pass filter processing unit 305b executes spatial filter processing using a spatial filter with a different tap size for each rectangular region of the non-face region divided by the second division processing unit 312.

ステップＳ１６１〜Ｓ１６５の処理は、図１２のステップＳ１３０〜Ｓ１３４の処理と同じであるため、ここでは重複する説明を省略する。 Since the processes of steps S161 to S165 are the same as the processes of steps S130 to S134 of FIG. 12, redundant description will be omitted here.

なお、ステップＳ１６１〜Ｓ１６５では、非顔領域全体に対して共通のコントラスト低減処理、及びフレームレート低下処理を実行する例を示したが、第２分割処理部３１２により分割された非顔領域の矩形領域毎に異なるコントラスト低減処理、及びフレームレート低下処理を、それぞれ実行しても良い。 In steps S161 to S165, an example of executing a common contrast reduction process and a frame rate reduction process for the entire non-face region has been shown, but the rectangle of the non-face region divided by the second division processing unit 312 has been shown. The contrast reduction process and the frame rate reduction process, which are different for each region, may be executed respectively.

このようにして、ビデオ会議端末１０ｂは、非顔領域の枠状領域を複数の矩形領域に分割し、矩形領域毎で情報量を減少させる処理を実行することができる。 In this way, the video conferencing terminal 10b can divide the frame-shaped region of the non-face region into a plurality of rectangular regions and execute a process of reducing the amount of information for each rectangular region.

＜作用効果＞ <Effect>
以上説明してきたように、本実施形態では、ビデオ会議端末１０ｂへの入力画像の非顔領域を、領域内に含まれる画素の輝度に基づき複数の枠状領域に分割し、さらに複数の枠状領域を、領域内に含まれる画素の輝度に基づき複数の矩形領域に分割する。 As described above, in the present embodiment, the non-face region of the input image to the video conferencing terminal 10b is divided into a plurality of frame-shaped regions based on the brightness of the pixels included in the region, and the frame-shaped regions are further divided into a plurality of rectangular regions based on the brightness of the pixels included in each region.

具体的には、枠状領域において、隣接する矩形領域同士の間、及び隣接する顔領域と矩形領域との間で、各領域内に含まれる全画素の輝度の平均値の差を算出する。そして、輝度の平均値の差が第２輝度閾値より大きい場合に、隣接する矩形領域同士、又は隣接する顔領域と矩形領域を分割する。また、輝度の平均値の差が第２輝度閾値以下の場合に、隣接する矩形領域同士、又は隣接する顔領域と矩形領域を分割せずに合体させて１つの領域とする。その後、分割された領域毎で、異なるタップサイズの空間フィルタを用いて情報量を減少させる。 Specifically, in the frame-shaped regions, the difference in the average brightness of all pixels included in each region is calculated between adjacent rectangular regions and between the face region and its adjacent rectangular region. Then, when the difference in average brightness is larger than the second brightness threshold, the adjacent rectangular regions, or the face region and its adjacent rectangular region, are divided. When the difference in average brightness is equal to or less than the second brightness threshold, the adjacent rectangular regions, or the face region and its adjacent rectangular region, are not divided but merged into one region. After that, the amount of information is reduced by using a spatial filter with a different tap size for each divided region.

このようにすることで、隣接する矩形領域同士の間、又は隣接する顔領域と矩形領域との間で、各領域内に含まれる全画素の輝度の平均値の差が大きい場合に、低域通過フィルタ処理で用いる空間フィルタのタップサイズを隣接する領域間で大きく異ならせないようにすることができる。これにより、低域通過フィルタ処理後に領域間の境界を目立たないようにし、画像の品質を保持することができる。 By doing so, when the difference in the average value of the brightness of all the pixels included in each area is large between the adjacent rectangular areas or between the adjacent face area and the rectangular area, the low range is used. It is possible to prevent the tap size of the spatial filter used in the pass filtering process from being significantly different between adjacent regions. As a result, the boundaries between the regions can be made inconspicuous after the low-pass filter processing, and the quality of the image can be maintained.

なお、上述した例では、第２特徴量の一例として、領域内に含まれる全画素の輝度平均値の隣接領域間での差を説明したが、領域内に含まれる全画素の輝度の最大値や最小値、総和、コントラスト（輝度の最大値と最小値の差）の隣接領域間での差等を用いても良い。 In the above example, the difference between adjacent regions in the average brightness of all pixels included in a region has been described as an example of the second feature amount, but the difference between adjacent regions in the maximum value, minimum value, sum, or contrast (the difference between the maximum and minimum brightness) of all pixels included in a region may be used instead.

以上、実施形態に係る画像処理装置について説明したが、本発明は上記実施形態に限定されるものではなく、本発明の範囲内で種々の変形及び改良が可能である。 Although the image processing apparatus according to the embodiment has been described above, the present invention is not limited to the above embodiment, and various modifications and improvements can be made within the scope of the present invention.

また、実施形態は、プログラムも含む。例えば、プログラムは、コンピュータを、入力画像の第１特徴量に基づいて、前記入力画像に含まれる人物の顔領域以外の領域を分割する第１分割処理部、及び前記第１分割処理部により分割された領域毎で、情報量を減少させる情報量減少処理部として機能させる。このようなプログラムにより、上述した画像処理装置と同様の効果を得ることができる。 The embodiment also includes a program. For example, the program causes a computer to function as a first division processing unit that divides the region other than the face region of a person included in the input image based on the first feature amount of the input image, and as an information amount reduction processing unit that reduces the amount of information for each region divided by the first division processing unit. With such a program, the same effect as that of the image processing apparatus described above can be obtained.

さらに、実施形態は、画像処理方法も含む。例えば、画像処理方法は、入力画像の第１特徴量に基づいて、前記入力画像に含まれる人物の顔領域以外の領域を分割する工程と、前記第１分割処理部により分割された領域毎で、情報量を減少させる処理を実行する工程と、を含む。このような画像処理方法により、上述した画像処理装置と同様の効果を得ることができる。 Further, the embodiment also includes an image processing method. For example, the image processing method includes a step of dividing a region other than the face region of a person included in the input image based on the first feature amount of the input image, and each region divided by the first division processing unit. , And a step of executing a process of reducing the amount of information. By such an image processing method, the same effect as that of the image processing apparatus described above can be obtained.

また、上記で説明した実施形態の各機能は、一又は複数の処理回路によって実現することが可能である。ここで、本明細書における「処理回路」とは、電子回路により実装されるプロセッサのようにソフトウェアによって各機能を実行するようプログラミングされたプロセッサや、上記で説明した各機能を実行するよう設計されたASIC(Application Specific Integrated Circuit)、DSP（digital signal processor）、FPGA（field programmable gate array）や従来の回路モジュール等のデバイスを含むものとする。 Further, each function of the embodiment described above can be realized by one or a plurality of processing circuits. Here, the "processing circuit" in the present specification is a processor programmed to execute each function by software such as a processor implemented by an electronic circuit, or a processor designed to execute each function described above. It shall include devices such as ASIC (Application Specific Integrated Circuit), DSP (digital signal processor), FPGA (field programmable gate array) and conventional circuit modules.

１０、１０ａビデオ会議端末（画像処理装置の一例） 10, 10a Video conferencing terminal (an example of an image processing device)
４０画像 40 Image
４２ａ、４２ｂ顔領域 42a, 42b Face areas
９１ａ〜９３ａ、９１ｂ〜９３ｂ枠状領域 91a to 93a, 91b to 93b Frame-shaped areas
９４ａ矩形領域 94a Rectangular areas
１０１、１０２空間フィルタ 101, 102 Spatial filters
３０１顔検出処理部 301 Face detection processing unit
３０２非顔領域設定部 302 Non-face area setting unit
３０３情報量減少処理部 303 Information amount reduction processing unit
３０４符号化処理部 304 Coding processing unit
３０５低域通過フィルタ処理部 305 Low-pass filter processing unit
３０６コントラスト低減処理部 306 Contrast reduction processing unit
３０７フレームレート低下処理部 307 Frame rate reduction processing unit
３０８第１分割処理部 308 First division processing unit
３０９枠状領域設定部 309 Frame-shaped area setting unit
３１０第１特徴量抽出部 310 First feature amount extraction unit
３１１第１分割決定部 311 First division determination unit
３１２第２分割処理部 312 Second division processing unit
３１３矩形領域設定部 313 Rectangular area setting unit
３１４第２特徴量抽出部 314 Second feature amount extraction unit
３１５第２分割決定部 315 Second division determination unit

特開２００２−１６５２２２号公報 JP-A-2002-165222

Claims

An image processing apparatus comprising:
a first division processing unit that divides a region other than the face region of a person included in an input image based on a first feature amount of the input image; and
an information amount reduction processing unit that executes a process of reducing the amount of information for each region divided by the first division processing unit.

The image processing apparatus according to claim 1, wherein
the first feature amount is extracted based on the difference in pixel brightness, among the face region and a plurality of frame-shaped regions surrounding the face region in the region other than the face region, between adjacent frame-shaped regions and between the face region and its adjacent frame-shaped region, and
the first division processing unit divides the adjacent frame-shaped regions, or the adjacent face region and frame-shaped region, when the difference in pixel brightness is larger than a predetermined first threshold value.

The image processing apparatus according to claim 2, further comprising
a second division processing unit that divides the frame-shaped regions based on a second feature amount of the input image, wherein
the information amount reduction processing unit executes a process of varying the amount of information for each region divided by the second division processing unit.

The image processing apparatus according to claim 3, wherein
the second feature amount is extracted based on the difference in pixel brightness, among the face region and a plurality of rectangular regions in each of the plurality of frame-shaped regions, between adjacent rectangular regions and between the face region and its adjacent rectangular region, and
the second division processing unit divides the adjacent rectangular regions, or the adjacent face region and rectangular region, when the difference in pixel brightness is larger than a predetermined second threshold value.

A program that causes a computer to function as:
a first division processing unit that divides a region other than the face region of a person included in an input image based on a first feature amount of the input image; and
an information amount reduction processing unit that executes a process of reducing the amount of information for each region divided by the first division processing unit.

An image processing method comprising:
a step of dividing a region other than the face region of a person included in an input image based on a first feature amount of the input image; and
a step of executing a process of reducing the amount of information for each region divided by the first division processing unit.