JP2023082816A - Image processing device, information processing device, video conference server, and video conference system

Info

Publication number: JP2023082816A
Application number: JP2021196770A
Authority: JP
Inventors: 泰史塚本; Yasushi Tsukamoto; 梓王; Zi Wang
Original assignee: Lenovo Singapore Pte Ltd
Current assignee: Lenovo Singapore Pte Ltd
Priority date: 2021-12-03
Filing date: 2021-12-03
Publication date: 2023-06-15
Anticipated expiration: 2041-12-03
Also published as: JP7250101B1

Abstract

To reduce variation in the face size of participants placed on a common background. SOLUTION: An image processing unit 20 comprises: an image acquisition unit 21 which acquires image data; a face area identifying unit 22 which identifies the face area of a person included in the image data; a dimension detection unit 23 which detects, as a width dimension, the horizontal width of the identified face area at a predetermined height position; a reference value identifying unit 27 which identifies a width reference value corresponding to the age of the person from reference information in which age is associated with a width reference value of the face area; an image adjustment unit 28 which adjusts the image data so that the width dimension approaches the identified width reference value; and an output unit 29 which outputs the adjusted image data. SELECTED DRAWING: Figure 4

Description

The present invention relates to an image processing device, an information processing device, a video conference server, and a video conference system.

In recent years, video conferences have become frequent, and various application tools for video conferencing have been proposed accordingly. One proposed technique places the participants of a video conference in front of a common background, giving them the feeling of having gathered in a shared space to attend a meeting or seminar. For example, in "Immersive View" provided by Zoom Video Communications, Inc., the host can select a preferred background from several prepared backgrounds and manually place the participants' images in the selected background.

However, the size of each participant's body including the face, the size of the face, and the height position of the face vary among the images transmitted by the participants of a video conference. Consequently, if the image data received from each participant is placed as is, the positions and heights of the faces may be uneven among participants, as illustrated in FIG. 14, and the image as a whole may look unnatural.

The present invention has been made in view of these circumstances, and its object is to provide an image processing device, an information processing device, a video conference server, and a video conference system that can reduce variation in the face size of participants placed on a common background.

A first aspect of the present invention is an image processing device comprising a processor and a memory storing a program configured to be executed by the processor, wherein the program includes instructions for: acquiring image data; identifying the face area of a person included in the image data; detecting, as a width dimension, the horizontal width of the identified face area at a predetermined height position; identifying a width reference value corresponding to the age of the person from reference information in which age is associated with a width reference value of the face area; adjusting the image data so that the width dimension approaches the identified width reference value; and outputting the adjusted image.

A second aspect of the present invention is an image processing device comprising: an image acquisition unit that acquires image data; a face area identifying unit that identifies the face area of a person included in the image data; a dimension detection unit that detects, as a width dimension, the horizontal width of the identified face area at a predetermined height position; a reference value identifying unit that identifies a width reference value corresponding to the age of the person from reference information in which age is associated with a width reference value of the face area; an image adjustment unit that adjusts the image data so that the width dimension approaches the identified width reference value; and an output unit that outputs the adjusted image data.

A third aspect of the present invention is a program for causing a computer to function as the above image processing device.

A fourth aspect of the present invention is an information processing device including the above image processing device.

A fifth aspect of the present invention is an image processing method in which a computer executes the steps of: acquiring image data; identifying the face area of a person included in the image data; detecting, as a width dimension, the horizontal width of the identified face area at a predetermined height position; identifying a width reference value corresponding to the age of the person from reference information in which age is associated with a width reference value of the face area; adjusting the image data so that the width dimension approaches the identified width reference value; and outputting the adjusted image data.

A sixth aspect of the present invention is a video conference system comprising a plurality of information processing devices and a video conference server that receives the adjusted image data from the plurality of information processing devices, wherein each information processing device includes the image processing device described above, and the video conference server includes an image synthesizing unit that places the image data received from each information processing device on a common background.

A seventh aspect of the present invention is a video conference server comprising: a receiving unit that receives image data of a plurality of participants in a video conference; a face area identifying unit that identifies the face area of the participant included in each piece of image data; a dimension detection unit that detects, as a width dimension, the horizontal width of each identified face area at a predetermined height position; an image adjustment unit that calculates a width reference value by statistically processing the plurality of width dimensions detected in the pieces of image data, and adjusts each piece of image data so that its width dimension approaches the width reference value; and an image synthesizing unit that places each piece of image data on a common virtual background.
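As a rough illustration of the seventh aspect, the sketch below derives a shared width reference value from the widths detected across participants and returns one scale factor per participant. The choice of the median as the statistic, the function name `scale_factors`, and the pixel values are assumptions made for illustration; the text only says the widths are "statistically processed".

```python
from statistics import median


def scale_factors(widths_px):
    """Return one scale factor per participant.

    The shared reference is the median of the detected face widths
    (an assumed statistic; a mean would work the same way).
    """
    reference = median(widths_px)
    return [reference / w for w in widths_px]


# Hypothetical face widths, in pixels, detected for three participants.
factors = scale_factors([120.0, 180.0, 150.0])
```

A factor above 1 enlarges that participant's image and a factor below 1 shrinks it, so all faces move toward the same width.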

According to the present invention, variation in the face size of participants placed on a common background can be reduced.

FIG. 1 is a system configuration diagram schematically showing the system configuration of a video conference system according to a first embodiment of the present invention.
FIG. 2 is a schematic external view of an example of an information processing device according to the first embodiment of the present invention.
FIG. 3 is a schematic configuration diagram showing an example of the hardware configuration of the information processing device according to the first embodiment of the present invention.
FIG. 4 is a functional block diagram showing an example of the image processing functions of the information processing device according to the first embodiment of the present invention.
FIG. 5 is a diagram for explaining the width dimension and the height dimension detected by the dimension detection unit according to the first embodiment of the present invention.
FIG. 6 is a diagram showing growth curves from age 1 to age 20 for height, iliac spine height, right foot length, and head circumference.
FIG. 7 is a diagram for explaining the adjustment processing performed by the image adjustment unit according to the first embodiment of the present invention.
FIG. 8 is a flowchart showing an example of the processing procedure of the image processing method according to the first embodiment of the present invention.
FIG. 9 is a schematic configuration diagram showing an example of the hardware configuration of the video conference server according to the first embodiment of the present invention.
FIG. 10 is a functional block diagram showing an example of the functions of the video conference server according to the first embodiment of the present invention.
FIG. 11 is a diagram showing an example of an image displayed on the display of each information processing device in the video conference system according to the first embodiment of the present invention.
FIG. 12 is a functional block diagram showing an example of the functions of a video conference server according to a second embodiment of the present invention.
FIG. 13 is a functional block diagram showing an example of the functions of a video conference server according to a third embodiment of the present invention.
FIG. 14 is a diagram for explaining the problem addressed by the present invention.

[First Embodiment]
A first embodiment of an image processing device, an information processing device, a video conference server, and a video conference system according to the present invention will be described below with reference to the drawings.

FIG. 1 is a system configuration diagram schematically showing the system configuration of a video conference system 1 according to the first embodiment of the present invention. As shown in FIG. 1, a plurality of information processing devices 10 and a video conference server 50 are connected to a network 8. Examples of the information processing device 10 include a notebook PC, a tablet terminal, and a smartphone. For convenience, the following description uses a notebook PC as the example of the information processing device 10.
Although three information processing devices 10 are illustrated in the example shown in FIG. 1, the number of connected information processing devices is not limited to this.

FIG. 2 is a schematic external view of an example of the information processing device 10 according to the first embodiment of the present invention. As shown in FIG. 2, the information processing device 10 includes a main-body-side housing 2 and a display-side housing 3, both of which are substantially rectangular parallelepipeds. The main-body-side housing 2 includes an input device 4. The input device 4 is a user interface with which the user performs input operations, and includes a keyboard composed of various keys for entering characters, commands, and the like, and a touch pad for moving the cursor on the screen and selecting various menus.

The display-side housing 3 includes a display 5 that displays images. In this embodiment, the display 5 is exemplified by an LCD (Liquid Crystal Display), but it is not limited to an LCD and may be another display device such as an organic EL (electroluminescence) display, or a touch panel. The LCD converts input display data into a video signal and displays various information corresponding to the converted video signal on the display screen.

The main-body-side housing 2 and the display-side housing 3 are connected at their respective ends by a pair of connecting portions 6. The connecting portions 6 are hinges and support the main-body-side housing 2 and the display-side housing 3 so that they can be opened and closed.

The display-side housing 3 also includes a camera 7 for acquiring images. The camera 7 is arranged, for example, at the center of the display-side housing 3 above the display screen (on the side opposite to the side connected by the connecting portions 6), and can capture an image of a person in front of it (for example, the user's face).

FIG. 3 is a schematic configuration diagram showing an example of the hardware configuration of the information processing device 10 according to this embodiment. As shown in FIG. 3, the information processing device 10 includes, in addition to the input device 4, the display 5, and the camera 7 described above, a CPU (processor) 11, a main memory 12, a storage unit 13, an external interface 14, a communication interface 15, a speaker 16, a microphone 17, and the like. These units are connected to one another directly or indirectly via a bus and cooperate to execute various processes.

The CPU 11 controls the entire information processing device 10 by means of, for example, an OS (Operating System) stored in the storage unit 13 connected via the bus, and executes various processes by running the various programs stored in the storage unit 13. A plurality of CPUs 11 may be provided and may cooperate with one another to carry out the processing.

The main memory 12 is composed of writable memory such as cache memory and RAM (Random Access Memory), and is used as a work area into which the programs executed by the CPU 11 are read and to which data processed by those programs is written.

The storage unit 13 is a non-transitory computer readable storage medium. Examples of the storage unit 13 include a ROM (Read Only Memory), an HDD (Hard Disk Drive), and a flash memory. The storage unit 13 stores, for example, an OS for controlling the entire information processing device 10, such as Windows (registered trademark), iOS (registered trademark), or Android (registered trademark); a BIOS (Basic Input/Output System); various device drivers for operating peripheral devices at the hardware level; various application software; and various data and files. The storage unit 13 also stores programs for implementing various processes and the data required to implement them. A plurality of storage units 13 may be provided, and the data described above may be divided among them.

The external interface 14 is an interface for connecting to external devices. Examples of external devices include an external monitor, a USB memory, an external HDD, and an external camera. Although only one external interface is shown in the example of FIG. 1, a plurality of external interfaces may be provided.

The communication interface 15 functions as an interface for connecting to a network, communicating with other devices, and transmitting and receiving information. For example, the communication interface 15 communicates with other devices by wire or wirelessly. Examples of wireless communication include communication over Bluetooth (registered trademark), Wi-Fi, 3G, 4G, 5G, LTE, and wireless LAN. An example of wired communication is communication over a wired LAN (Local Area Network).

The speaker 16 and the microphone 17 have well-known configurations, so a detailed description of them is omitted here.

The video conference server 50 is a computer and, like the information processing device 10 described above, includes a CPU, a main memory, a storage unit, a communication interface, and the like. Details of the video conference server 50 are described later.

Next, an example of the functions of the information processing device 10 according to this embodiment will be described with reference to the drawings. FIG. 4 is a functional block diagram showing an example of the image processing functions of the information processing device 10 according to this embodiment.

The series of processes for realizing the functions described below is stored, as one example, in the form of a program in the storage unit 13 or the like of each information processing device 10. The functions are realized when the CPU (processor) 11 reads this program into the main memory 12 and executes information processing and arithmetic operations. The program may be installed in the storage unit 13 in advance, provided on another computer-readable storage medium, or distributed via wired or wireless communication means. Computer-readable storage media include magnetic disks, magneto-optical disks, CD-ROMs, DVD-ROMs, semiconductor memories, and the like.

As shown in FIG. 4, the image processing unit (image processing device) 20 of the information processing device 10 includes an image acquisition unit 21, a face area identifying unit 22, a dimension detection unit 23, an attribute estimation unit 24, a parameter storage unit 25, a reference information storage unit 26, a reference value identifying unit 27, an image adjustment unit 28, and an output unit 29.

The image acquisition unit 21 acquires, for example, image data captured by the camera 7. Specifically, the image acquisition unit 21 successively acquires image data captured by the camera 7 at a predetermined frame rate.

The face area identifying unit 22 identifies the face area of a person (for example, a participant in a video conference) included in the input image data. The face area can be identified by adopting a known technique as appropriate, for example by extracting facial feature values from the image. The face area identifying unit 22 may also be configured to identify the eyebrows, eyes, nose, mouth, and so on within the face area.

As shown in FIG. 5, for example, the dimension detection unit 23 detects the horizontal width of the face area identified by the face area identifying unit 22, at a predetermined height position, as a width dimension L1. In this embodiment, as shown in FIG. 5, the position of a predetermined part of the ear (for example, the tragus) is used as the predetermined height position, but the height position is not limited to this example. For example, the position of the ears, the eyebrows, or the pupils may be used as the height position.
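The width measurement can be sketched as follows, assuming the face area is available as a binary mask and the predetermined height position has already been resolved to a row index. The mask representation and the helper name `width_at_row` are hypothetical; the text does not specify how the face area is represented.

```python
def width_at_row(mask, y):
    """Width in pixels of the face area on row y of a binary mask.

    `mask` is a list of rows, each a list of 0/1 values where 1 marks a
    face pixel. Returns 0 when the row contains no face pixels.
    """
    cols = [x for x, value in enumerate(mask[y]) if value]
    if not cols:
        return 0
    return cols[-1] - cols[0] + 1


# Toy 3x6 mask; row 1 stands in for the tragus height.
mask = [
    [0, 0, 1, 1, 0, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 1],
]
l1 = width_at_row(mask, 1)
```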

As shown in FIG. 5, the dimension detection unit 23 also detects the height dimension of the person as a height dimension L2, based on the information on the face area identified by the face area identifying unit 22. In this embodiment, the position of a predetermined part of the ear (for example, the tragus) is used as the predetermined height position, but the height position is not limited to this example. For example, the position of the ears, the eyebrows, or the pupils may be used as the height position.

It has been reported in academic papers and elsewhere that face width and head circumference vary less with age and sex than other parts of the body. For example, "Static adult human physical characteristics of the adult head, from pages 72-75 of Poston, Alan. (April 2000) Department of Defense Human Factors Engineering Technical Advisory Group (DOD HFE TAG)" (https://de.wikipedia.org/wiki/Datei:HeadAnthropometry.JPG) discloses that, according to one set of statistics, the average face width is 14.5 cm for men and 13.3 cm for women, a difference between men and women of only about 0.8 cm. In addition, the percentage growth curves (male, 1978-81) from Agency of Industrial Science and Technology (工技院) data, disclosed in "第２０回こどもの成長と衣服" (No. 20: Children's Growth and Clothing) by Keiko Takabe of the Department of Life Environment, Faculty of Life Science, Jissen Women's University (https://www.jissen.ac.jp/kankyo/lib-lec20.html), show growth curves from age 1 to age 20 for height, iliac spine height, right foot length, and head circumference (see, for example, FIG. 6).

These statistics show that face width and head circumference vary far less with age and sex than other parts of the body. According to the growth curves shown in FIG. 6, head circumference already reaches about 80% of the adult value at age 1, and from age 13 onward it hardly changes with age. From this new finding, the inventors conceived that adjusting the face width in the image data input from the camera 7 to a value appropriate for the person's age would bring the faces of the video conference participants to natural relative sizes.

The attribute estimation unit 24 estimates the age and sex of the person based on the image of the face area identified by the face area identifying unit 22. A known method may be adopted as appropriate to estimate age and sex. For example, feature values are extracted from the face image, their similarity to pre-registered feature values for distinguishing age (age group) and sex is calculated, and the age and sex are estimated from the similarity values. As one example of such a method, the technique described in Japanese Patent No. 5287333 can be used.

The parameter storage unit 25 stores the width dimension detected in a piece of image data together with the age and sex estimated from that same image data as one data set. For example, the parameter storage unit 25 is a buffer memory that holds a predetermined number (two or more) of the most recent data sets. As one example, the parameter storage unit 25 is implemented as a FIFO (First In First Out) memory. The parameter storage unit 25 therefore always holds the predetermined number of most recently detected data sets.
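A fixed-length FIFO of this kind can be sketched with `collections.deque`, which discards the oldest entry once `maxlen` is reached. The `DataSet` field names, the buffer size of 5, and the sample values are assumptions for illustration only.

```python
from collections import deque
from typing import NamedTuple


class DataSet(NamedTuple):
    """One per-frame record: detected width plus estimated attributes."""
    width_px: float
    age: int
    sex: str


BUFFER_SIZE = 5  # hypothetical "predetermined number" (must be >= 2)
buffer = deque(maxlen=BUFFER_SIZE)

# Simulate eight frames; only the five most recent records are retained.
for i in range(8):
    buffer.append(DataSet(width_px=150.0 + i, age=30 + i % 3, sex="F"))
```

After the loop the buffer holds the records for frames 3 through 7, oldest first.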

The reference information storage unit 26 stores reference information in which age is associated with a width reference value L1_ref of the face area and a height reference value L2_ref of the person. Separate reference information is provided for each sex. Ages may be grouped into bands such as teens, twenties, and thirties, with a width reference value and a height reference value associated with each band.

The reference value identifying unit 27 obtains the age information from the predetermined number of data sets stored in the parameter storage unit 25 and statistically processes it to identify a representative age. In other words, the reference value identifying unit 27 calculates a representative age by statistically processing the ages estimated from a time series of image data. Examples of the representative age include the mean age and the 50th-percentile (median) age of the age distribution.
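The two statistics named above can be sketched with the standard `statistics` module. The function name `representative_age` and the sample ages are hypothetical; the sample includes one outlier estimate to show how the two choices differ.

```python
from statistics import mean, median


def representative_age(ages, method="median"):
    """Collapse per-frame age estimates into one representative age."""
    if method == "mean":
        return mean(ages)
    if method == "median":
        return median(ages)
    raise ValueError(f"unknown method: {method}")


# Hypothetical per-frame estimates; 52 is a spurious estimate.
ages = [31, 29, 34, 30, 52]
mean_age = representative_age(ages, "mean")
median_age = representative_age(ages, "median")
```

With these values the mean is pulled up to 35.2 by the outlier while the median stays at 31, which is one reason a percentile-based statistic may be preferred for noisy per-frame estimates.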

The reference value identifying unit 27 also obtains the sex information from the predetermined number of data sets stored in the parameter storage unit 25 and statistically processes it to identify a representative sex. For example, of the predetermined number of sex estimates read from the parameter storage unit 25, the reference value identifying unit 27 takes the sex that occurs most often as the representative sex.
Using the reference information corresponding to the identified sex, the reference value identifying unit 27 identifies the width reference value L1_ref and the height reference value L2_ref corresponding to the representative age.
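The majority vote and the table lookup can be sketched as below. The table layout (sex, then decade, then a pair of reference values), every number in it, and the fallback to the nearest defined decade are assumptions for illustration; the source gives no concrete reference values or units.

```python
from collections import Counter

# Hypothetical reference information: sex -> decade -> (L1_ref, L2_ref).
# All values are placeholders, not data from the source.
REFERENCE = {
    "M": {10: (190, 300), 20: (200, 310), 30: (200, 310)},
    "F": {10: (180, 290), 20: (185, 295), 30: (185, 295)},
}


def representative_sex(sexes):
    """Return the most frequent sex among the buffered estimates."""
    return Counter(sexes).most_common(1)[0][0]


def lookup_reference(table, sex, age):
    """Return (L1_ref, L2_ref) for the decade containing `age`.

    Falls back to the nearest defined decade (an assumed policy).
    """
    decade = (int(age) // 10) * 10
    by_decade = table[sex]
    key = min(by_decade, key=lambda d: abs(d - decade))
    return by_decade[key]


sex = representative_sex(["F", "M", "F", "F", "M"])
l1_ref, l2_ref = lookup_reference(REFERENCE, sex, 31)
```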

The image adjustment unit 28 adjusts the image data so that the width dimension L1 in the image data approaches the width reference value L1_ref identified by the reference value identifying unit 27.
For example, the image adjustment unit 28 obtains the width dimensions from the predetermined number of data sets stored in the parameter storage unit 25 and statistically processes them to calculate a representative width dimension. In other words, the image adjustment unit 28 calculates a representative width dimension from the width dimensions detected in a time series of image data. Examples of the representative width dimension include the mean and the 50th-percentile (median) value of the width dimensions. The image adjustment unit 28 then calculates the adjustment ratio (scaling ratio) at which the representative width dimension becomes equal to the width reference value, and enlarges or reduces the image data by that ratio. Calculating the adjustment ratio from the predetermined number of most recently detected width dimensions in this way smooths changes in the adjustment ratio over time.
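The ratio calculation can be sketched as follows, assuming the median is used as the representative width and that the reference value is expressed in the same pixel units. The function names and the sample numbers are hypothetical.

```python
from statistics import median


def adjustment_ratio(recent_widths, width_ref):
    """Scale factor that maps the representative width onto width_ref."""
    representative = median(recent_widths)
    if representative <= 0:
        raise ValueError("representative width must be positive")
    return width_ref / representative


def scaled_size(size, ratio):
    """New (width, height) of the frame after uniform scaling."""
    width, height = size
    return (round(width * ratio), round(height * ratio))


# Hypothetical widths (pixels) from the five most recent frames.
ratio = adjustment_ratio([158, 160, 162, 161, 159], width_ref=200)
new_size = scaled_size((640, 480), ratio)
```

Here the median width is 160, so the ratio is 200 / 160 = 1.25 and a 640x480 frame is enlarged to 800x600. A single noisy width measurement barely moves the median, which is the smoothing effect described above.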

Further, the image adjustment unit 28 adjusts the height position of the person in the image data so that the height dimension L2 of the person approaches the height reference value L2_ref identified by the reference value identifying unit 27. For example, the image adjustment unit 28 clips the image data so that the height dimension L2 of the person matches the height reference value L2_ref.
As a result, for example, the image data shown in FIG. 5 is adjusted, as shown in FIG. 7, so that the width dimension of the person becomes the width reference value L1_ref and the height position is adjusted so that the height dimension of the person becomes the height reference value L2_ref.

Further, as shown in FIG. 7, when adjusting the height position of the person leaves pixels Px missing in the height direction of the adjusted image data, the image adjustment unit 28 fills in the missing pixels Px by estimating them from the surrounding pixel information.
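The effect of shifting the person vertically and filling the vacated rows can be sketched as follows. The description does not specify how the missing pixels Px are estimated, so this example substitutes the simplest possible method, repeating the nearest surviving row; the function name `shift_vertically`, the list-of-rows image representation, and the sign convention of `offset` are all assumptions.

```python
def shift_vertically(image, offset):
    """Shift an image down by `offset` rows (up if negative), keeping its size.

    image: list of rows, each row a list of pixel values.
    Rows vacated by the shift are filled by repeating the nearest surviving
    row, a simple stand-in for estimating pixels from their surroundings.
    """
    height = len(image)
    if height == 0 or offset == 0:
        return [row[:] for row in image]
    if abs(offset) >= height:
        raise ValueError("offset must be smaller than the image height")
    if offset > 0:
        kept = image[:height - offset]            # bottom rows fall off
        fill = [kept[0][:] for _ in range(offset)]
        return fill + [row[:] for row in kept]
    kept = image[-offset:]                        # top rows fall off
    fill = [kept[-1][:] for _ in range(-offset)]
    return [row[:] for row in kept] + fill


frame = [[1], [2], [3], [4]]
lowered = shift_vertically(frame, 1)
```

A production system would more likely use an inpainting or edge-aware method for the filled region; row replication is only meant to show where the missing pixels arise.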

The output unit 29 outputs the image data adjusted by the image adjustment unit 28. For example, the output unit 29 transmits the adjusted image data to the videoconference server 50 via the network 8.

Next, an image processing method executed by the information processing device 10 will be described with reference to FIG. 8. FIG. 8 is a flowchart showing an example of the processing procedure of the image processing method. The following series of processes is executed by the CPU (processor) 11 reading a program stored in the storage unit 13 into the main memory 12 and executing information processing and arithmetic operations.

For example, when the information processing device 10 connects to the videoconference server 50 via the network 8 and the camera function is turned on, the camera 7 acquires image data and outputs it continuously.
When the image processing unit (image processing device) 20 acquires the image data output from the camera 7 (SA1), it identifies the face area of the person included in the acquired image data (SA2). It then detects, as the width dimension L1, the width at a predetermined height position of the identified face area, for example, the position of the tragus, and also detects the height dimension L2 of the person included in the image data (SA3).

Next, the age (age group) and sex of the person are estimated based on feature values of the person's face area in the image data (SA4). The width dimension L1 and height dimension L2 detected in step SA3 and the age and sex estimated in step SA4 are then stored in the parameter storage unit 25 as one data set (SA5). At this point, the oldest data set held in the parameter storage unit 25 is deleted and the latest data set is stored in its place.
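The behavior of the parameter storage unit 25 in step SA5, where storing the latest data set evicts the oldest one, corresponds to a fixed-capacity buffer. The following is a minimal sketch under that reading; the class names `DataSet` and `ParameterStore`, the field names, and the capacity value are hypothetical and do not appear in the description.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class DataSet:
    """One data set: detected dimensions plus estimated attributes."""
    width: float   # width dimension L1
    height: float  # height dimension L2
    age: int       # estimated age
    sex: str       # estimated sex


class ParameterStore:
    """Keeps only the most recent `capacity` data sets."""

    def __init__(self, capacity):
        # A deque with maxlen drops the oldest entry automatically on append.
        self._items = deque(maxlen=capacity)

    def store(self, data_set):
        self._items.append(data_set)

    def snapshot(self):
        """Return the stored data sets, oldest first."""
        return list(self._items)


store = ParameterStore(capacity=3)
for i in range(4):
    store.store(DataSet(width=100 + i, height=400, age=30, sex="male"))
```

After the fourth call to `store`, the first data set has been discarded and three remain, which matches the "predetermined number of data sets" that the later steps read back.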

Next, age information is acquired from the predetermined number of data sets stored in the parameter storage unit 25 and statistically processed to identify a representative age. Similarly, sex information is acquired from the same data sets and statistically processed to identify a representative sex (SA6).

Next, the width reference value L1_ref and the height reference value L2_ref corresponding to the representative age are identified using the reference information corresponding to the representative sex (SA7).
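Steps SA6 and SA7 can be sketched as follows. This is an illustration only: the table `REFERENCE` and every number in it are placeholder values invented for the example, not data from the description; the use of the median for the representative age and of age brackets keyed by a lower bound are likewise assumptions.

```python
from bisect import bisect_right
from collections import Counter
from statistics import median

# Hypothetical reference information per sex. Each row is
# (minimum age of bracket, width reference L1_ref, height reference L2_ref).
# All values are placeholders for illustration.
REFERENCE = {
    "female": [(0, 100.0, 300.0), (13, 120.0, 380.0), (20, 130.0, 400.0)],
    "male": [(0, 100.0, 300.0), (13, 125.0, 390.0), (20, 140.0, 420.0)],
}


def representative_sex(sexes):
    """Return the sex that occurs most often in the recent estimates."""
    return Counter(sexes).most_common(1)[0][0]


def representative_age(ages):
    """Return the median of the recent age estimates (an assumption)."""
    return median(ages)


def reference_values(sex, age):
    """Look up (L1_ref, L2_ref) for the age bracket that contains `age`."""
    rows = REFERENCE[sex]
    lower_bounds = [row[0] for row in rows]
    index = bisect_right(lower_bounds, age) - 1
    if index < 0:
        raise ValueError("age is below the first bracket")
    _, width_ref, height_ref = rows[index]
    return width_ref, height_ref


sex = representative_sex(["male", "female", "male"])
age = representative_age([30, 32, 31])
width_ref, height_ref = reference_values(sex, age)
```

Taking the most frequent sex and a central age over the window means that a single misestimated frame does not switch the reference values, which keeps the adjustment stable over time.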

Next, the width dimensions L1 are acquired from the predetermined number of data sets stored in the parameter storage unit 25 and statistically processed to calculate a representative width dimension. Similarly, the height dimensions L2 are acquired from the same data sets and statistically processed to calculate a representative height dimension (SA8).

Next, an adjustment ratio (enlargement/reduction ratio) at which the representative width dimension becomes the width reference value L1_ref is calculated (SA9), and the image data input in step SA1 is adjusted based on the calculated adjustment ratio and the height reference value L2_ref (SA10). Specifically, the image data is enlarged or reduced based on the calculated adjustment ratio, and the height position of the person is adjusted so that the height of the person in the image data approaches the height reference value L2_ref. As a result, for example, the image data shown in FIG. 5 is enlarged or reduced, as shown in FIG. 7, so that the width dimension L1 of the person's face becomes the width reference value L1_ref, and the height position is adjusted so that the height dimension L2 of the person becomes the height reference value L2_ref. Further, as shown in FIG. 7, when pixels Px are missing in the height direction of the adjusted image data, the missing pixels Px are filled in by estimating them from the surrounding pixel information.

When the image adjustment is completed in this way, the adjusted image data is output (SA11). The adjusted image data is transmitted to the videoconference server 50 (see FIG. 1).
By performing the above processing each time image data is acquired, image data in which the size and height position of the person have been adjusted is continuously transmitted to the videoconference server 50.

When the videoconference server 50 (see FIG. 1) receives image data from each information processing device 10, it places the received image data on a common virtual background. Because the image data received from each information processing device 10 has already had the face width and height position of the person (participant) adjusted, the images look consistent when placed on the common background, and a natural-looking composite image can be created.

The videoconference server 50 will be described below with reference to the drawings.
FIG. 9 is a schematic configuration diagram showing an example of the hardware configuration of the videoconference server 50 according to this embodiment. As shown in FIG. 9, the videoconference server 50 is a computer and includes a CPU (processor) 51, a main memory 52, a storage unit 53, an external interface 54, a communication interface 55, and the like. The videoconference server 50 may also include an input device and a display.
These units are connected to one another directly or indirectly via a bus and cooperate to execute various processes. Since each of these components is the same as in the information processing device 10 described above, a detailed description is omitted here.

FIG. 10 is a functional block diagram showing an example of the functions of the videoconference server 50. The series of processes for realizing the functions described below is stored, as an example, in the form of a program in the storage unit 53 or the like of the videoconference server 50, and the functions are realized by the CPU (processor) 51 reading this program into the main memory 52 and executing information processing and arithmetic operations. The program may be preinstalled in the storage unit 53, provided stored on another computer-readable storage medium, or distributed via wired or wireless communication means. Examples of the computer-readable storage medium include magnetic disks, magneto-optical disks, CD-ROMs, DVD-ROMs, and semiconductor memories.

As shown in FIG. 10, the videoconference server 50 includes a receiving unit 61, a luminance calculation unit 62, a luminance adjustment unit 63, an image synthesizing unit 64, and a transmitting unit 65.

The receiving unit 61 receives the image data transmitted from each information processing device 10.
The luminance calculation unit 62 calculates a luminance histogram for each piece of image data received from the information processing devices 10.
The luminance adjustment unit 63 adjusts the luminance of each piece of image data so that the variation among the luminance histograms of the plurality of pieces of image data is reduced. For example, it adjusts the luminance of each piece of image data so that the 50th-percentile luminance of their luminance histograms matches. This reduces luminance variation between the pieces of image data.
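One way to make the 50th-percentile luminance of several images coincide is sketched below. The description states only the goal, not the method, so the additive offset, the choice of the mean of the medians as the common target, the 0 to 255 value range, and the function name `match_median_luminance` are all assumptions made for illustration.

```python
from statistics import median


def match_median_luminance(images, target=None):
    """Shift each image so that its median luminance equals a common target.

    images: list of images, each a non-empty flat list of luminance values (0-255).
    target: desired median; defaults to the mean of the per-image medians.
    Values are clamped to 0-255, so the match is approximate for images
    with many pixels near either end of the range.
    """
    if not images:
        return []
    medians = [median(pixels) for pixels in images]
    if target is None:
        target = sum(medians) / len(medians)
    adjusted = []
    for pixels, image_median in zip(images, medians):
        offset = target - image_median
        adjusted.append([min(255, max(0, round(p + offset))) for p in pixels])
    return adjusted


dark = [10, 20, 30]
bright = [110, 120, 130]
balanced = match_median_luminance([dark, bright])
```

An additive offset preserves contrast within each image; a multiplicative gain or full histogram matching would be alternative designs with different trade-offs.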

The image synthesizing unit 64 places each piece of luminance-adjusted image data on a common virtual background to create a composite image. The placement of the image data on the common virtual background may be performed automatically according to a predetermined algorithm, or may be based on an input command from the host of the video conference (any one of the information processing devices 10).
The transmitting unit 65 transmits the composite image to each information processing device 10.

Next, the operation of the videoconference system 1 according to this embodiment will be briefly described.
For example, when each user (participant) operates the input device 4 of their information processing device 10 to start a video conference and the camera function is turned on, the camera 7 acquires image data of the user and inputs it to the image processing unit 20. The image processing unit 20 executes the image processing described above on the image data input from the camera 7. As a result, the face width dimension and the height dimension in the image data are adjusted according to the age of the user, and the adjusted image data is transmitted to the videoconference server 50.

When the videoconference server 50 receives image data from each of the information processing devices 10, it adjusts the luminance of the received image data and creates a composite image by placing the luminance-adjusted image data on a common virtual background. It then transmits the composite image to each information processing device 10. As a result, the display 5 of each information processing device 10 shows an image in which the participants are placed on a common virtual background, for example, as shown in FIG. 11. At this time, the size of each participant's face has been adjusted to a width corresponding to age, and the luminance of each participant's image has been adjusted to reduce variation. This provides each participant with an immersive view that has an overall sense of unity.

As described above, the image processing unit (image processing device) 20, the information processing device 10, the videoconference server 50, and the videoconference system 1 according to this embodiment provide the following effects.

The image processing unit 20 identifies the face area of a person included in the image data captured by the camera 7, detects the width dimension L1 at a predetermined height position of the identified face area and the height dimension L2 of the person, acquires the width reference value L1_ref and the height reference value L2_ref corresponding to the age, and adjusts the image data so that the width dimension and the height dimension approach the acquired reference values. If the adjusted image data has missing pixels, they are filled in by estimating them from the surrounding pixel information. This makes it possible to output an image in which the face size and the height of the person are standardized according to age.

When adjusting the image, a representative width dimension is calculated by statistically processing not only the width dimension detected this time but also a predetermined number of the most recently detected width dimensions; an adjustment ratio at which the representative width dimension becomes the width reference value is calculated; and the image data is enlarged or reduced using that ratio. Taking the recently detected width dimensions into account in this way smooths changes in the adjustment ratio and suppresses changes in the image over time. Examples of the statistical processing include averaging and normalization.

Since the age of the person is estimated from the image data and the width reference value and height reference value are identified based on the estimated age, the user is spared the effort of entering an age.

A representative age is calculated by statistically processing the age estimated this time together with a predetermined number of the most recently estimated ages, and the width reference value and height reference value corresponding to the representative age are identified from the reference information. This smooths changes in the width reference value.

Since reference information is provided for each sex, the image data can be adjusted so that the person appears at a natural size that also takes sex into account.

In the embodiment described above, both the width of the face and the height of the person are adjusted, but the invention is not limited to this. For example, only the width of the face may be adjusted, without adjusting the height position of the person.

In this embodiment, a representative width dimension is calculated by statistically processing a predetermined number of width dimensions L1, and the adjustment ratio is calculated using the representative width dimension and the width reference value L1_ref, but the invention is not limited to this example. For example, the parameter storage unit 25 may be configured to store only one data set, and the adjustment ratio may be calculated using the latest width dimension L1 and the width reference value L1_ref.

Similarly, for the height dimension L2 of the person, the image data may be adjusted using the latest height dimension L2 and the height reference value L2_ref.
In this embodiment, a representative age is calculated by statistically processing a predetermined number of ages, and the width reference value and height reference value are identified from the representative age, but the invention is not limited to this example. For example, the width reference value and height reference value may be identified based on the latest age.

In this embodiment, the attribute estimation unit 24 estimates the age and sex of the person from the image data, but the invention is not limited to this example. For example, instead of estimating the age and sex from the image data, the age and sex information of the user registered as user information in the storage unit 13 of the information processing device 10 may be acquired.

In this embodiment, reference information is provided for each sex, but the invention is not limited to this example. For example, the width reference value and the like may be identified using common reference information regardless of sex.

In this embodiment, the image processing unit 20 may be integrated with the camera 7 and provided as a camera module in which the camera 7 and the image processing unit 20 are integrated. In this case, since the size of the person in the image data output from the camera module has already been standardized, the information processing device 10 transmits the image data output from the camera module to the videoconference server 50.

[Second embodiment]
Next, an image processing device, an information processing device, a videoconference server 50a, and a videoconference system according to a second embodiment of the present invention will be described.
In the embodiment described above, each information processing device 10 includes the image processing unit 20. This embodiment differs from the first embodiment in that the videoconference server, rather than the information processing device, includes the image processing unit 20. In the following, components common to the first embodiment are given the same reference numerals and their description is omitted; the differences are mainly described.

FIG. 12 is a functional block diagram showing an example of the functions of the videoconference server 50a according to this embodiment. As shown in FIG. 12, the videoconference server 50a includes the image processing unit 20 for adjusting the size and position of the person in the image data received by the receiving unit 61 from each information processing device 10. Examples of the detailed functions and processing procedure of the image processing unit 20 are as described in the embodiment above.

The image data adjusted by the image processing unit 20 is output to the luminance calculation unit 62, and processing for luminance adjustment is then performed. The luminance adjustment and the size adjustment may be performed in either order. For example, each piece of image data whose luminance has been adjusted by the luminance adjustment unit 63 may be input to the image processing unit 20.

The luminance adjustment may also be omitted. In this case, the luminance calculation unit 62 and the luminance adjustment unit 63 are omitted from the configuration in FIG. 12. The image data adjusted by the image processing unit 20 is then input to the image synthesizing unit 64 and placed on the common background.

[Third embodiment]
Next, an image processing device, an information processing device, a videoconference server 50b, and a videoconference system according to a third embodiment of the present invention will be described.
In the second embodiment described above, the videoconference server 50a uses the reference information to standardize the size and height position of the person in the image data received from each information processing device. The videoconference server 50b according to this embodiment differs from the second embodiment in how the image data is adjusted. Specifically, this embodiment takes advantage of the fact that the videoconference server 50b can acquire image data from every information processing device, and derives the width reference value from the plurality of pieces of image data received from the information processing devices.
In the following, components common to the second embodiment are given the same reference numerals and their description is omitted; the differences are mainly described.

FIG. 13 is a functional block diagram showing an example of the functions of the videoconference server 50b according to this embodiment. As shown in FIG. 13, the videoconference server 50b includes a receiving unit 61, a face area identifying unit 71, a dimension detection unit 72, an image adjustment unit 73, a luminance calculation unit 62, a luminance adjustment unit 63, an image synthesizing unit 64, and a transmitting unit 65.

The receiving unit 61 receives the image data transmitted from each information processing device 10.
The face area identifying unit 71 identifies the face area of the person (participant) included in each piece of image data. Since the face area identifying unit 71 has the same function as the face area identifying unit 22 described above, its details are omitted.
The dimension detection unit 72 detects, as the width dimension, the width at a predetermined height position of the identified face area. The width dimension of the face in each piece of image data is thereby detected.

The image adjustment unit 73 obtains a width reference value by statistically processing the plurality of width dimensions detected in the respective pieces of image data. For example, the image adjustment unit 73 takes the average of the width dimensions as the width reference value. Alternatively, the image adjustment unit 73 may take the value at a predetermined percentile (for example, the 50th) of the distribution of the width dimensions, or the maximum width dimension, as the width reference value.

Next, the image adjustment unit 73 adjusts each piece of image data so that its width dimension approaches the width reference value. That is, it calculates an adjustment ratio from the width dimension of the person (participant) in each piece of image data and the width reference value, and enlarges or reduces the image data based on that ratio. This reduces the variation in face size among the pieces of image data received from the information processing devices 10.
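The server-side derivation of a shared width reference value and the per-participant adjustment ratios can be sketched as follows. The function name `group_ratios` and the `method` parameter are hypothetical; the three selectable statistics correspond to the average, the 50th percentile, and the maximum mentioned in the description.

```python
from statistics import mean, median


def group_ratios(widths, method="mean"):
    """Derive a shared width reference and a scale factor per participant.

    widths: positive face width dimensions, one per participant image.
    method: "mean", "median" (50th percentile), or "max".
    Returns (width_ref, ratios), where ratios[i] scales image i so that
    its face width becomes width_ref.
    """
    selectors = {"mean": mean, "median": median, "max": max}
    if method not in selectors:
        raise ValueError(f"unknown method: {method}")
    if not widths:
        raise ValueError("at least one width is required")
    width_ref = selectors[method](widths)
    return width_ref, [width_ref / w for w in widths]


width_ref, ratios = group_ratios([100, 200, 300], method="max")
```

Choosing the maximum means every image is enlarged or left unchanged, while the mean or median enlarges some images and reduces others; which is preferable depends on the resolution available from each participant's camera.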

The adjusted image data undergoes luminance adjustment by the luminance calculation unit 62 and the luminance adjustment unit 63, and is then placed on a common virtual background by the image synthesizing unit 64. The composite image is transmitted to each information processing device 10 by the transmitting unit 65.

Although the above description covers adjusting the width dimension of the face, the height dimension of the person may also be adjusted by the same method. That is, the height dimension of the person in each piece of image data is detected, and a height reference value is obtained by statistically processing the detected height dimensions. For example, the average of the height dimensions is taken as the height reference value. Alternatively, the value at a predetermined percentile (for example, the 50th) of the distribution of the height dimensions, or the maximum height dimension, may be taken as the height reference value.

Each piece of image data is then adjusted so that its height dimension approaches the height reference value. Specifically, the height position of the person in each piece of image data is adjusted so that the height dimension of the person (participant) matches the height reference value. This reduces the variation in the height position of the person among the pieces of image data received from the information processing devices 10.

The videoconference server 50b according to the third embodiment can be used in combination with the information processing device 10 according to the first embodiment described above. In this case, already standardized image data is received from each information processing device 10. By further performing the image adjustment described above in the videoconference server 50b, the width and height position of the person can be unified across the pieces of image data.

The present invention has been described above using embodiments, but the technical scope of the present invention is not limited to the scope described in those embodiments. Various changes or improvements can be made to the embodiments without departing from the gist of the invention, and forms incorporating such changes or improvements are also included in the technical scope of the present invention.
The processing flows described in the embodiments are also examples; unnecessary steps may be deleted, new steps may be added, or the processing order may be changed without departing from the gist of the present invention.

For example, although a video conference server is provided in each of the embodiments described above, all or part of the functions of the video conference server may instead be provided in the information processing device 10 that serves as the host of the video conference.

1: video conference system
4: input device
5: display
7: camera
8: network
10: information processing device
11: CPU
12: main memory
13: storage unit
14: external interface
15: communication interface
16: speaker
17: microphone
20: image processing unit
21: image acquisition unit
22: face area identification unit
23: dimension detection unit
24: attribute estimation unit
25: parameter storage unit
26: reference information storage unit
27: reference value identification unit
28: image adjustment unit
29: output unit
50: video conference server
50a: video conference server
50b: video conference server
51: CPU
52: main memory
53: storage unit
54: external interface
55: communication interface
61: reception unit
62: luminance calculation unit
63: luminance adjustment unit
64: image composition unit
65: transmission unit
71: face area identification unit
72: dimension detection unit
73: image adjustment unit
L1: width dimension
L1_ref: width reference value
L2: height dimension
L2_ref: height reference value

Claims

1. An image processing device comprising:
a processor; and
a memory storing a program configured to be executed by the processor,
wherein the program includes instructions for:
acquiring image data;
identifying a face area of a person included in the image data;
detecting, as a width dimension, the horizontal width at a predetermined height position of the identified face area;
identifying a width reference value corresponding to the age of the person from reference information in which ages and width reference values of the face area are associated with each other;
adjusting the image data so that the width dimension approaches the identified width reference value; and
outputting the adjusted image data.

2. An image processing device comprising:
an image acquisition unit that acquires image data;
a face area identification unit that identifies a face area of a person included in the image data;
a dimension detection unit that detects, as a width dimension, the horizontal width at a predetermined height position of the identified face area;
a reference value identification unit that identifies a width reference value corresponding to the age of the person from reference information in which ages and width reference values of the face area are associated with each other;
an image adjustment unit that adjusts the image data so that the width dimension approaches the identified width reference value; and
an output unit that outputs the adjusted image data.

3. The image processing device according to claim 2, wherein the image adjustment unit calculates a representative width dimension by statistically processing the currently detected width dimension and a predetermined number of the most recently detected width dimensions, calculates an adjustment ratio that makes the representative width dimension equal to the width reference value, and enlarges or reduces the image data using the calculated adjustment ratio.
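The calculation recited in this claim can be sketched in Python as follows. The claim only says that the width dimensions are "statistically processed"; the median used here is an assumption chosen for illustration, and the class name `WidthSmoother` and its `history` parameter are hypothetical.

```python
from collections import deque
from statistics import median


class WidthSmoother:
    """Keeps the current width dimension plus the `history` most recent ones."""

    def __init__(self, history=5):
        # maxlen drops the oldest value automatically once the window is full
        self._recent = deque(maxlen=history + 1)

    def adjustment_ratio(self, width, width_ref):
        """Return the scale factor that maps the representative width to width_ref."""
        if width <= 0:
            raise ValueError("width must be positive")
        self._recent.append(width)
        # Assumption: the representative width dimension is the median of the window
        representative = median(self._recent)
        return width_ref / representative
```

A ratio above 1 enlarges the image data and a ratio below 1 reduces it. Because the ratio is derived from the representative value rather than from a single frame, one outlier detection changes the scale less abruptly.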

4. The image processing device according to claim 2, further comprising an attribute estimation unit that estimates the age of the person from the image data,
wherein the reference value identification unit identifies, from the reference information, the width reference value corresponding to the estimated age.

5. The image processing device, wherein the reference value identification unit calculates a representative age by statistically processing the currently estimated age and a predetermined number of the most recently estimated ages, and identifies, from the reference information, the width reference value corresponding to the representative age.

6. The image processing device according to claim 2, wherein the reference information is provided for each gender.

7. The image processing device according to claim 6, further comprising an attribute estimation unit that estimates the age and gender of the person from the image data,
wherein the reference value identification unit identifies, from the reference information, the width reference value corresponding to the estimated age and gender.
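One way to picture reference information keyed by age and gender is a per-gender table of age brackets, as in the Python sketch below. The table layout, the bracket boundaries, and every number in `REFERENCE_INFO` are placeholders invented for illustration and are not values from this publication; the function name `width_reference` is likewise hypothetical.

```python
from bisect import bisect_right

# Hypothetical reference information: for each gender, a list of
# (minimum age of bracket, width reference value in pixels), sorted by age.
# All numbers are placeholders, not values from the publication.
REFERENCE_INFO = {
    "female": [(0, 100), (13, 120), (20, 130)],
    "male": [(0, 100), (13, 125), (20, 140)],
}


def width_reference(age, gender, table=REFERENCE_INFO):
    """Return the width reference value of the age bracket containing `age`."""
    brackets = table[gender]
    lower_bounds = [lower for lower, _ in brackets]
    # Index of the last bracket whose minimum age is <= age
    index = bisect_right(lower_bounds, age) - 1
    if index < 0:
        raise ValueError("age is below the first bracket")
    return brackets[index][1]
```

With this layout, an estimated age selects a bracket by its lower bound, so an age equal to a boundary falls into the bracket that starts at that boundary.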

8. The image processing device according to any one of claims 2 to 7, wherein:
the dimension detection unit detects, as a height dimension, the height of the person in the image data;
the reference information associates ages, width reference values, and height reference values of a person with one another;
the reference value identification unit identifies, from the reference information, the width reference value and the height reference value corresponding to the age of the person; and
the image adjustment unit adjusts the image data so that the height dimension approaches the identified height reference value.

9. The image processing device according to claim 8, wherein, when the adjusted image data lacks pixels in the height direction, the image adjustment unit estimates the missing pixels from surrounding pixel information and fills them in.
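The claim does not say how the missing pixels are estimated. As a simple stand-in, the Python sketch below uses edge replication: each missing row is copied from the nearest row that has pixel data. The representation of a missing row as `None` and the function name `fill_missing_rows` are assumptions made for this example, and a real implementation might use a more elaborate inpainting method.

```python
def fill_missing_rows(image):
    """Replace missing rows (None) with a copy of the nearest valid row.

    This is plain edge replication, used here only as a stand-in for
    "estimating missing pixels from surrounding pixel information".
    """
    valid = [y for y, row in enumerate(image) if row is not None]
    if not valid:
        raise ValueError("image has no valid rows to copy from")
    filled = []
    for y, row in enumerate(image):
        if row is None:
            # Pick the valid row with the smallest vertical distance
            nearest = min(valid, key=lambda v: abs(v - y))
            row = image[nearest]
        filled.append(list(row))
    return filled
```

For instance, if shifting a person upward leaves the bottom two rows empty, both are filled with a copy of the lowest row that still has data.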

10. A program for causing a computer to function as the image processing device according to any one of claims 1 to 9.

11. An information processing device comprising the image processing device according to claim 1.

12. An image processing method comprising the steps of:
acquiring image data;
identifying a face area of a person included in the image data;
detecting, as a width dimension, the horizontal width at a predetermined height position of the identified face area;
identifying a width reference value corresponding to the age of the person from reference information in which ages and width reference values of the face area are associated with each other;
adjusting the image data so that the width dimension approaches the identified width reference value; and
outputting the adjusted image data.

13. A video conference system comprising:
a plurality of information processing devices; and
a video conference server that receives the adjusted image data from the plurality of information processing devices,
wherein each of the information processing devices comprises the image processing device according to any one of claims 1 to 9, and
the video conference server includes an image synthesizing unit that arranges the image data received from each of the information processing devices under a common background.

14. A video conference server comprising:
a receiving unit that receives image data of a plurality of participants in a video conference;
a face area identification unit that identifies the face area of the participant included in each piece of the image data;
a dimension detection unit that detects, as a width dimension, the horizontal width at a predetermined height position of the identified face area;
an image adjustment unit that calculates a width reference value serving as a reference by statistically processing the plurality of width dimensions detected in the respective pieces of image data, and adjusts each piece of image data so that its width dimension approaches the width reference value; and
an image synthesizing unit that arranges each piece of the image data on a common virtual background.
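In the server-side variant, the reference value comes from the participants themselves rather than from an age table. The Python sketch below assumes that the statistical processing is an arithmetic mean of the detected width dimensions, which the claim does not specify; the function name `scale_factors` is hypothetical.

```python
from statistics import mean


def scale_factors(widths):
    """Return (width reference value, per-image scale factors).

    Assumption: the width reference value is the arithmetic mean of the
    width dimensions detected across all participants' image data.
    """
    if not widths or min(widths) <= 0:
        raise ValueError("widths must be a non-empty list of positive values")
    reference = mean(widths)
    # Scaling image i by reference / widths[i] brings its width to the reference
    return reference, [reference / w for w in widths]
```

For example, face widths of 100 and 200 pixels give a reference of 150, so the first image is enlarged by 1.5 and the second is reduced by 0.75.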

15. The video conference server according to claim 14, further comprising:
a luminance calculation unit that calculates a luminance histogram of each piece of the image data; and
a luminance adjustment unit that adjusts the luminance of each piece of the image data so that variation in the luminance histograms among the plurality of pieces of image data is reduced,
wherein the image synthesizing unit arranges each piece of the luminance-adjusted image data under the common background.
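The luminance adjustment can be sketched in Python under simplifying assumptions. Here each image is a flat list of 8-bit luminance values, and the variation between histograms is reduced by applying one gain per image so that every histogram mean moves to the average of all means. This mean-matching approach is a substitute chosen for brevity, not the method described in the publication, and all function names are hypothetical.

```python
def luminance_histogram(pixels, bins=256):
    """Count how many pixels fall on each luminance level 0..bins-1."""
    hist = [0] * bins
    for p in pixels:
        hist[p] += 1
    return hist


def histogram_mean(hist):
    """Mean luminance computed from a histogram."""
    total = sum(hist)
    return sum(level * count for level, count in enumerate(hist)) / total


def equalize_brightness(images):
    """Scale each image so that its mean luminance matches the common target."""
    means = [histogram_mean(luminance_histogram(img)) for img in images]
    target = sum(means) / len(means)  # assumption: target is the average mean
    adjusted = []
    for img, m in zip(images, means):
        gain = target / m if m else 1.0  # leave an all-black image unchanged
        # Clamp to the 8-bit range after applying the gain
        adjusted.append([min(255, round(p * gain)) for p in img])
    return adjusted
```

A single gain per image preserves the relative contrast inside each image while pulling the overall brightness of the participants toward a common level before they are placed on the shared background.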