JP2023025990A

JP2023025990A - Information processing apparatus, information processing method, and program

Info

Publication number: JP2023025990A
Application number: JP2021131522A
Authority: JP
Inventors: 佑介河本; Yusuke Komoto
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2021-08-12
Filing date: 2021-08-12
Publication date: 2023-02-24

Abstract

To reduce the user's time and effort and the load of analysis processing. SOLUTION: An information processing apparatus determines the size of an area in which estimation by predetermined estimation processing is possible, based on the size and position of an area designated by a user on a setting screen that displays a captured image. The apparatus then displays an index representing the size of that area on the setting screen on which the user designates the size and position of the area. SELECTED DRAWING: Figure 3

Description

The present invention relates to information processing technology for analyzing captured images.

Patent Literature 1 discloses a technique for detecting specific objects such as people in an image and estimating their number. Patent Literature 1 further discloses that, for regions of the image where the size and position of human bodies cannot be obtained, the average human-body size at an arbitrary position on the image is estimated by interpolation from average human-body sizes taught by the user.
Patent Literature 2 discloses a technique that presents to the user the information needed to properly perform the setting work and parameter correction work required to obtain a desired processing result.

Patent Literature 1: Japanese Patent Application Laid-Open No. 2018-22340
Patent Literature 2: Japanese Patent Application Laid-Open No. 2016-143157

When image analysis is performed to obtain the result of estimation processing, such as detecting a specific object in an image, the user cannot tell whether the expected estimation result will be obtained until the result of actually running the image analysis has been checked. If the estimation result is not what the user expected, the user must repeatedly set or adjust the values used in the estimation processing and re-run the image analysis. This is not only very time-consuming but also increases the processing load of the image analysis.

Accordingly, an object of the present invention is to reduce both the user's effort and the load of analysis processing.

The information processing apparatus according to the present invention comprises: determining means for determining, based on the size and position of an area designated by a user on a setting screen that displays an image acquired by image capture, the size of an area in which estimation by predetermined estimation processing is possible; and display means for displaying, on the setting screen on which the user designates the size and position of the area, an index representing the size of the area, determined by the determining means, in which estimation by the estimation processing is possible.

According to the present invention, both the user's effort and the load of analysis processing can be reduced.

FIG. 1 is a diagram showing an example of the overall configuration of an information processing system.
FIG. 2 is a diagram showing an example of the hardware configuration of an information processing apparatus.
FIG. 3 is a diagram showing an example of the functional configuration of the information processing apparatus.
FIG. 4 is a flowchart showing the flow of information processing.
FIG. 5 is a diagram showing an example of a screen that accepts input of a face size and a face position.
FIG. 6 is a diagram showing an example of a screen on which an estimable face size is superimposed.
FIG. 7 is a diagram used to explain the positions at which estimable face sizes are superimposed.
FIG. 8 is a diagram showing examples of icons used to superimpose an estimable face size.

Embodiments of the present invention are described in detail below with reference to the drawings. The following embodiments do not limit the present invention, and not all combinations of the features described in the embodiments are necessarily essential to the solution of the present invention. The configuration of the embodiments may be modified or changed as appropriate according to the specifications of the apparatus to which the present invention is applied and to various conditions (conditions of use, usage environment, and so on).

Before the configuration and processing of the information processing apparatus of this embodiment are described in detail, an application example of this embodiment is briefly described.
There are image analysis techniques that detect predetermined objects in images acquired by imaging devices such as surveillance cameras and network cameras, and that can also count those objects. If the target object is a person (human body), such techniques can be used, for example, to detect congestion at an event venue, and further to help relieve congestion and guide evacuations. Known methods for detecting people in an image include human body detection by background subtraction and pattern extraction using a human body model; counting the people detected by these methods yields a head count.

There is also image analysis processing that uses a learning model obtained by machine learning to directly estimate the number of people in a predetermined area of an image. In such processing, however, the estimation accuracy can vary depending on how the learning model is applied to the image. Typically, the size and position of people in the image are obtained with a human body detection method such as those described above, and the learning model is applied in accordance with the obtained human body size. Even with this method, the user still has to specify how the learning model is applied to the image; that is, the user must set values (parameters) that determine how the learning model is applied.

In this case, the user has to check the result of the image analysis processing and repeatedly change the parameters until they are appropriately tuned. In other words, when the user sets the parameters for applying the learning model, there is no way to know whether the expected analysis result will be obtained until the analysis processing is actually run, and the parameters must be changed and adjusted repeatedly until the expected result is obtained. This places a very heavy burden on the user. One could instead present the user with the information needed to properly perform the setting and parameter correction work for obtaining the desired processing result, as in the method disclosed in Patent Literature 2. That method reduces, to some extent, the effort between checking the analysis result and correcting the parameters, but the effort of running the image analysis processing itself remains. If the image analysis processing takes a long time, the user must wait until it finishes, which impairs usability. Moreover, because the image analysis processing still has to be executed, the processing load is not reduced.

Therefore, the information processing apparatus of this embodiment has the configuration described below and performs the processing described later, which makes it possible to reduce both the user's effort and the load of the image analysis processing.
<General configuration of the system>
FIG. 1 is a diagram showing a schematic configuration example of an information processing system 101 as an application example of the information processing apparatus according to this embodiment. The information processing system 101 may be deployed at a wide variety of locations, for example venues such as theaters, stadiums, shopping malls, and parks, and indoor spaces such as meeting rooms and waiting rooms.

The information processing system 101 in FIG. 1 includes a client PC (personal computer) 102, an analysis server 103, a recording server 104, a plurality of network cameras 105, and so on. The client PC 102, the analysis server 103, the recording server 104, and the network cameras 105 are each connected to a network and can communicate with one another. Although the client PC 102, the analysis server 103, and the recording server 104 are shown as separate components in FIG. 1, their functions may be consolidated into a single PC or server.

The client PC 102 is a personal computer or mobile terminal that accepts input from the user, issues instructions to the analysis server 103, the recording server 104, and the network cameras 105, and can display the results.

The analysis server 103 is an information processing server that receives instructions from the client PC 102 and performs image analysis processing based on those instructions. In this embodiment, image analysis processing refers to processing that identifies (detects) objects such as people appearing in images acquired from the recording server 104 or the network cameras 105, analyzes the movement of those objects (the behavior of people), and estimates the number of objects (the number of people).

The recording server 104 performs recording processing based on instructions from the client PC 102 or the analysis server 103. Recording processing refers to sending image acquisition requests to the network cameras 105 connected to the same network as the recording server 104 and storing the images that the network cameras 105 send in response. Recording processing also includes storing meta information obtained from the network cameras 105. Meta information includes at least one of: time information (including the time of day), position information of the network camera 105, setting information of the network camera 105, and information from sensors attached to the network camera 105.
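The "at least one of" rule for meta information can be captured in a small record type. The sketch below is only an illustration: the class name `CameraMetaInfo`, its field names, and the encodings in the comments (for example latitude/longitude for the camera position) are hypothetical and not taken from the publication.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class CameraMetaInfo:
    """Hypothetical record of the meta information stored alongside a recorded image."""
    timestamp: Optional[datetime] = None                   # time information, including time of day
    camera_position: Optional[tuple[float, float]] = None  # e.g. (latitude, longitude); assumed encoding
    camera_settings: dict[str, object] = field(default_factory=dict)  # e.g. {"zoom": 2.0}
    sensor_readings: dict[str, float] = field(default_factory=dict)   # e.g. {"temperature_c": 21.5}

    def is_valid(self) -> bool:
        """Meta information must carry at least one of the four kinds of information."""
        return (
            self.timestamp is not None
            or self.camera_position is not None
            or bool(self.camera_settings)
            or bool(self.sensor_readings)
        )
```

Making every field optional and validating the "at least one" condition separately lets a camera without attached sensors still produce a valid record.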

The network camera 105 is an imaging device, such as a surveillance camera, that is installed on the ceiling or wall of a building, on a utility pole, or the like. It captures images of its surroundings and can transmit them to the recording server 104 over the network. Images from the network camera 105 may also be sent directly to the analysis server 103 or the client PC 102. The network camera 105 acquires images and audio in response to instructions from the client PC 102 and also controls camera rotation. Various sensors, such as a motion sensor, a temperature sensor, an audio sensor, and an infrared sensor, may be connected to the network camera 105 to acquire the meta information described above.

<Hardware configuration>
FIG. 2 is a block diagram showing an example of the hardware configuration of the information processing apparatus according to this embodiment. The client PC 102, the analysis server 103, and the recording server 104 of the information processing system 101 shown in FIG. 1 each have a hardware configuration like that in FIG. 2. The configuration in FIG. 2 also applies to a single PC or server that consolidates the functions of the client PC 102, the analysis server 103, and the recording server 104. In this embodiment, the information processing apparatus is implemented by a personal computer (PC), but it may also be implemented by an embedded system, a tablet terminal, or the like.

In FIG. 2, a CPU 201 is a central processing unit that, based on computer programs, cooperates with the other components to control the operation of the entire information processing apparatus and to execute various kinds of information processing.
A ROM 202 is a read-only memory that stores basic programs and data used for basic processing. A RAM 203 is a writable memory that functions as, among other things, a work area for the CPU 201. An external storage device 204 provides access to recording media and can load computer programs and data stored on media such as USB memory into the information processing apparatus. A storage unit 205 is a device that functions as a large-capacity memory, such as an SSD (solid state drive). The storage unit 205 stores various programs, including the computer programs for the information processing according to this embodiment, and various data, including image data.

An operation unit 206 is a device that accepts instructions and commands input by the user, such as a keyboard, a pointing device, or a touch panel.
A display 207 is a display device that displays commands input from the operation unit 206, the responses that the information processing apparatus outputs to those commands, and the like.
An interface (I/F) unit 208 is a device that relays data exchange with external devices.
A system bus 209 is a data bus that manages the flow of data within the information processing apparatus.
The functions of the units described above may also be implemented by software that provides equivalent functions instead of by hardware devices.

<Functional Configuration of Information Processing Apparatus>
FIG. 3 is a block diagram showing the functional configuration of an information processing apparatus 301 of this embodiment, realized by the client PC 102, the analysis server 103, and the recording server 104. In this embodiment, each functional unit shown in FIG. 3 is implemented by executing a program, but some of the units may be implemented as hardware such as circuits, or all of them may be implemented in hardware. The following description uses, as an example, image analysis processing that detects people as objects in an image and estimates the number of those people.

The image acquisition unit 302 is a functional unit implemented in the recording server 104. The image acquisition unit 302 sends an image acquisition request to the network camera 105 and acquires the image that the network camera 105 sends in response. In this embodiment, images captured by the network camera 105 are stored in the recording server 104, but they may instead be sent, without being stored, directly to the analysis server 103 that performs the subsequent processing; in that case, the function of the image acquisition unit 302 is included in the analysis server 103.

The display unit 306 is a functional unit implemented in the client PC 102. The display unit 306 causes the display 207 of the client PC 102 to display a setting screen that includes an image captured by the network camera 105 and acquired by the image acquisition unit 302. The setting screen and the processing performed by the display unit 306 are described in detail later.

The estimation unit 304 is a functional unit implemented in the analysis server 103. The estimation unit 304 performs predetermined estimation processing on the image received from the image acquisition unit 302 and outputs the estimation result. In this embodiment, the predetermined estimation processing uses a learning model to detect people, as the predetermined objects, in a captured image and to estimate the number of people. The estimation unit 304 of this embodiment estimates the number of people by applying a regressor trained with a machine learning method to each area of the image, received from the image acquisition unit 302, to which the learning model is to be applied, as determined by an area determining unit 303 described later. As an example, a method is described here that uses a regressor whose input is an image of fixed size and whose output is the number of people appearing in that image. In this case, the regressor is trained in advance with a known machine learning method such as deep learning, using as training data a large number of images in which the number of people is known. The estimation unit 304 then resizes each area determined by the area determining unit 303 to the fixed size, feeds it to the regressor, and estimates the number of people in each area.
Note that the estimated number of people is not necessarily an integer and may be a real number.
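The crop, resize, and regress step can be sketched as follows, assuming a grayscale image stored as nested lists, nearest-neighbour resizing, and a fixed regressor input of 224 x 224 pixels. The resize method, the input size, and the helper names `crop`, `resize_nearest`, and `estimate_patch_count` are all hypothetical. The regressor itself is passed in as a callable, because the trained model is not specified in the publication.

```python
from typing import Callable

Image = list[list[float]]  # grayscale image indexed as image[row][col]


def crop(image: Image, x: int, y: int, size: int) -> Image:
    """Cut out the square patch whose top-left corner is (x, y); assumes it lies inside the image."""
    return [row[x:x + size] for row in image[y:y + size]]


def resize_nearest(patch: Image, out_size: int) -> Image:
    """Nearest-neighbour resize of a square patch to out_size x out_size pixels."""
    in_size = len(patch)
    return [
        [patch[(r * in_size) // out_size][(c * in_size) // out_size] for c in range(out_size)]
        for r in range(out_size)
    ]


def estimate_patch_count(
    image: Image, x: int, y: int, size: int,
    regressor: Callable[[Image], float], input_size: int = 224,
) -> float:
    """Resize one learning-model application area to the regressor's fixed input and estimate its head count."""
    fixed = resize_nearest(crop(image, x, y, size), input_size)
    return float(regressor(fixed))  # real-valued: the count need not be an integer
```

In a real system the regressor would be the trained model, and a library resize with interpolation would replace `resize_nearest`. The float return type reflects the note that estimated counts may be non-integers.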

The integration unit 305 is a functional unit implemented in the analysis server 103. The integration unit 305 obtains the total number of people in the image by integrating the results that the estimation unit 304 estimates for each area, determined by the area determining unit 303 described later, to which the learning model is applied.
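The integration step can be sketched as a simple sum over per-patch estimates. This sketch assumes that the learning model application areas divide the image without overlapping, which is how the patch setting in S403 is described, and that each patch is encoded as a hypothetical `(x, y, side)` tuple. Because summing overlapping patches would count some people twice, the sketch checks for overlap before it adds anything up.

```python
from itertools import combinations

Patch = tuple[int, int, int]  # (x, y, side) of a square patch; hypothetical encoding


def overlaps(a: Patch, b: Patch) -> bool:
    """True if two square patches share any interior area (touching edges do not count)."""
    ax, ay, aside = a
    bx, by, bside = b
    return ax < bx + bside and bx < ax + aside and ay < by + bside and by < ay + aside


def integrate_counts(patch_counts: dict[Patch, float]) -> float:
    """Sum per-patch estimates into a total for the image, refusing overlapping patches."""
    for a, b in combinations(patch_counts, 2):
        if overlaps(a, b):
            raise ValueError(f"patches {a} and {b} overlap; summing would double-count people")
    return sum(patch_counts.values())
```

The pairwise check costs O(n²) comparisons. That is acceptable for the tens to hundreds of patches a single camera image would typically produce.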

The area determining unit 303 is a functional unit implemented in the analysis server 103. The area determining unit 303 determines the size of the area in which estimation by the predetermined estimation processing of the estimation unit 304 is possible. It does so based on the image from the image acquisition unit 302 and on the size and position of an area that the user designates for that image on the setting screen displayed by the display unit 306. In this embodiment, the area the user designates on the setting screen is an area containing at least a person's face, and the user designates the size of the face (the face size) and its position (the face position). Accordingly, based on the image from the image acquisition unit 302 and on the face size and face position designated by the user on the setting screen, the area determining unit 303 determines the size of each area to which the estimation unit 304 applies the learning model, that is, each area to which the regressor-based people-count estimation processing is applied. Setting these areas so that the ratio of area size to face size is approximately constant improves the accuracy of the regressor.
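Keeping the ratio of face size to area size approximately constant amounts to choosing the patch side in proportion to the expected face size. In the minimal sketch below, the target ratio of 0.1 (a face spans about a tenth of the patch side) and the 16-pixel minimum are illustrative values only. In practice they would presumably come from the face sizes seen during regressor training.

```python
def patch_side_for_face(face_size: float, face_to_patch: float = 0.1, min_side: int = 16) -> int:
    """Choose a square patch side so that face_size / side stays close to face_to_patch.

    face_to_patch and min_side are hypothetical values, not taken from the
    publication.
    """
    if face_size <= 0:
        raise ValueError("face size must be positive")
    return max(min_side, round(face_size / face_to_patch))
```

A face twice as large therefore gets a patch twice as wide. After the patch is resized to the regressor's fixed input, the face appears at roughly the same size in every patch.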

The area determining unit 303 also calculates an assumed face size and an assumed face position, both described later. Information on the assumed face size and assumed face position is sent to the display unit 306.
The display unit 306 is a functional unit implemented in the client PC 102. The display unit 306 receives the information on the assumed face size and assumed face position from the area determining unit 303 and superimposes indices representing them on the image in the setting screen. In addition, when the integration unit 305 has obtained a people-count result, the display unit 306 receives and displays that result.

<Flowchart of information processing>
FIG. 4 is a flowchart showing the flow of processing in the information processing system 101. This section describes the following flow. The image acquisition unit 302 acquires an image. The area determining unit 303 accepts instructions from the user and determines the areas to which the learning model is applied. The estimation unit 304 performs people-count estimation for each area. Finally, the integration unit 305 obtains the people-count estimation result for the image. FIGS. 5 to 8 are referred to as appropriate. In the following description of the flowchart, each processing step is denoted by the prefix S.

In S401, the image acquisition unit 302 receives the image on which people-count estimation is to be performed. In this embodiment, the image acquired by the image acquisition unit 302 is an image stored in the recording server 104, but it may also be an image sent directly from the network camera 105 or an image designated by the user.

Next, in S402, the area determining unit 303 accepts the face size and face position information that the user has entered via the operation unit 206 on the setting screen displayed by the display unit 306 on the display 207 of the client PC 102. The face size may be expressed on any scale that measures the size of a face. The face position is expressed in image coordinates.

FIG. 5 is a diagram showing an example of a setting screen that is displayed on the display 207 of the client PC 102 and accepts input of a face size and a face position from the user via the operation unit 206.
A setting screen 501 is an example of a setting screen that accepts the user's input of face sizes and face positions. The setting screen 501 can accept multiple face size and face position inputs, which are needed to set the areas to which the learning model is applied (learning model application areas). The screen example in FIG. 5 shows two face size and face position inputs, but three or more may be accepted. Moreover, the user need not specify a face size and face position, as long as the specified information allows the learning model application areas to be set. For example, the learning model application areas may be specified directly, or any input from which they can be calculated may be used.

Cursors 502 and 503 are cursors for designating position and size, displayed when the user designates a face position and face size. By operating the operation unit 206 of the client PC 102, the user can place a cursor over a person in the image on the setting screen 501 and designate the face position and face size. The user can also move a cursor via the operation unit 206, for example by drag-and-drop, and can enlarge or shrink it by dragging its edge. Cursor operations are not limited to these examples; any method that can be performed by mouse or touch operation may be used. FIG. 5 shows an example in which the cursor 502 is placed on a person in the foreground of the screen and the cursor 503 on a person in the background.
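The cursor behavior described above can be sketched as a small state object. The sketch assumes a square cursor whose center is the face position and whose side length is the face size. It also assumes that dragging an edge resizes the cursor symmetrically about its center, with a 4-pixel lower bound. The class `FaceCursor`, its methods, and these behaviors are all hypothetical design choices for illustration.

```python
from dataclasses import dataclass


@dataclass
class FaceCursor:
    """Hypothetical square cursor: (cx, cy) is the face position, size is the face size in pixels."""
    cx: float
    cy: float
    size: float
    min_size: float = 4.0  # assumed lower bound so the cursor never collapses

    def drag_to(self, x: float, y: float) -> None:
        """Drag-and-drop: move the cursor so that its center lands on (x, y)."""
        self.cx, self.cy = x, y

    def drag_right_edge(self, edge_x: float) -> None:
        """Drag the right edge to edge_x; the cursor resizes symmetrically about its center."""
        self.size = max(self.min_size, 2.0 * abs(edge_x - self.cx))

    def face(self) -> tuple[tuple[float, float], float]:
        """Return ((x, y), size), i.e. the face position and face size accepted in S402."""
        return (self.cx, self.cy), self.size
```

Resizing about the center keeps the designated face position fixed while the user adjusts the size. This seems a reasonable default, although anchoring the opposite edge would work equally well.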

Returning to the flowchart of FIG. 4, upon receiving the face size and face position obtained in S402, the area determining unit 303 determines (sets) the learning model application areas in S403 based on them. Hereinafter, a learning model application area is also called a patch. To set the patches, the plane on which people are standing is estimated from the face size and face position. The patch sizes and positions into which the image is divided are then determined so that the face size of a person standing on that plane is appropriate for the learning model. An area in which no plane is estimated to exist may be excluded from the people-count estimation processing that uses the learning model, or may be assigned the minimum patch size. Examples of such planes include the ground and floors; examples of areas in which no plane is estimated to exist include walls and places where people cannot be present. The minimum patch size is intended, for example, for areas far from the network camera, where a person or other target object, even if present, is likely to appear very small.
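The publication does not give the plane-estimation formula. A common simplification, assumed here, is that apparent face size grows linearly with the image row for people standing on a flat floor viewed by a camera without roll. Under that assumption, two user-designated faces at different rows define the line, and its zero crossing marks the horizon. The sketch fits that line, then divides the image into strips of square patches from the bottom up. Rows at or above the horizon are either excluded or given the minimum patch size, matching the two options in the text. All names, the face-to-patch ratio of 0.5, and the 16-pixel minimum are hypothetical.

```python
Patch = tuple[int, int, int]  # (x, top_y, side); hypothetical encoding


def fit_face_line(face_a: tuple[float, float], face_b: tuple[float, float]) -> tuple[float, float]:
    """Fit s(y) = slope * y + intercept through two (image_row, face_size) annotations."""
    (ya, sa), (yb, sb) = face_a, face_b
    if ya == yb:
        raise ValueError("the two faces must lie on different image rows")
    slope = (sb - sa) / (yb - ya)
    return slope, sa - slope * ya


def tile_patches(width: int, height: int,
                 face_a: tuple[float, float], face_b: tuple[float, float],
                 face_to_patch: float = 0.5, min_side: int = 16,
                 skip_no_plane: bool = True) -> list[Patch]:
    """Divide the image into horizontal strips of square patches, working bottom to top."""
    slope, intercept = fit_face_line(face_a, face_b)
    patches: list[Patch] = []
    y_bottom = height
    while y_bottom > 0:
        face = slope * y_bottom + intercept  # expected face size for a person at this row
        if face <= 0:                        # at or above the horizon: no floor plane here
            if skip_no_plane:
                break                        # exclude the rest of the image from estimation
            side = min_side                  # or fall back to the minimum patch size
        else:
            side = max(min_side, round(face / face_to_patch))
        top = y_bottom - side                # may be negative for the last strip (needs padding)
        patches.extend((x, top, side) for x in range(0, width, side))
        y_bottom = top
    return patches
```

Patches in the last column or top strip may extend past the image border and would need padding before resizing. Within the image, the strips never overlap, so per-patch counts can be summed directly.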

Next, in S404, the area determining unit 303 calculates an assumed face size and an assumed face position from the size of each learning model application area obtained in S403. Here, an assumed face is a face that is expected to be obtained as a result of the people-count estimation processing. Because the assumed face is the product of estimation, the assumed face size is a range of values rather than a single value. The assumed face size can be obtained by determining in advance the range of face sizes that the people-count regressor can handle and converting that range into a ratio of the patch image. The assumed face sizes and assumed face positions are used to generate and display indices representing the sizes of the multiple patches at multiple positions on the image in the setting screen, that is, the sizes of the learning model application areas.
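Converting the regressor's supported face-size range into a ratio of the patch image can be sketched as a simple rescaling. The sketch assumes the regressor takes a 224-pixel input and handles faces between 12 and 48 pixels in that input. It also takes each patch's center as its assumed face position. All three numbers, the choice of center, and the function names are hypothetical.

```python
Patch = tuple[int, int, int]  # (x, top_y, side); hypothetical encoding


def assumed_face_range(side: int, input_size: int = 224,
                       face_min_px: float = 12.0, face_max_px: float = 48.0) -> tuple[float, float]:
    """Map the regressor's supported face sizes (in its fixed-size input) back onto a patch of this side."""
    scale = side / input_size  # original-image pixels per regressor-input pixel
    return face_min_px * scale, face_max_px * scale


def assumed_faces(patches: list[Patch], **kwargs) -> list[tuple[tuple[float, float], tuple[float, float]]]:
    """Pair each patch center (assumed face position) with its assumed face-size range."""
    return [((x + side / 2, y + side / 2), assumed_face_range(side, **kwargs))
            for x, y, side in patches]
```

For example, a 448-pixel patch is shrunk by a factor of two before it reaches the regressor. Faces of 24 to 96 pixels in the original image therefore fall within the assumed 12-to-48-pixel range.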

Next, in S405, upon receiving an instruction from the area determination unit 303, the display unit 306 superimposes indices representing the assumed face size and assumed face position calculated in S404 on the setting screen. The present embodiment describes an example in which icons are superimposed as these indices.

FIG. 6 is a diagram showing an example of a setting screen on which indices (icons) corresponding to the assumed face sizes are superimposed.
In FIG. 6, a setting screen 601 is a screen similar to that of FIG. 5 for accepting the user's input of a face size and face position, and icons 604 to 606, serving as indices representing the assumed face sizes and assumed face positions, are superimposed on the setting screen 601. Cursors 602 and 603 are similar to the cursors 502 and 503 described with reference to FIG. 5, and are used by the user to designate a face size and a face position. Icons 604 to 606 are indices corresponding to the assumed face sizes and assumed face positions. The display unit 306 creates the icons 604 to 606 based on the assumed face sizes and assumed face positions calculated in S404 and superimposes them on the screen. In other words, FIG. 6 shows an example in which the area determination unit 303 has determined assumed face sizes and assumed face positions for a plurality of learning model application areas corresponding to positions on the image, and a plurality of icons 604 to 606 corresponding to those assumed face sizes and positions are displayed.

FIG. 7 is a diagram used to explain the display positions of superimposed icons.
Frames 701 in FIG. 7 represent the patches (learning model application areas) at a plurality of positions on the image of the setting screen 601 shown in FIG. 6. Cursors 702 and 703 are, like the cursors 502 and 503 described with reference to FIG. 5, cursors used by the user to designate a face size and face position. Icons 704 to 706 are created based on the assumed face size and assumed face position calculated for each patch (learning model application area) indicated by the frames 701. The icon 704 is created from the assumed face size and assumed face position calculated for the lower row of patches; likewise, the icon 705 is created from the middle row and the icon 706 from the upper row. This example shows icons created for the assumed face size at the median of the range of sizes that the assumed face size can take in the patch at each position.

The created icons are displayed at the centers of patches of different sizes that do not overlap the cursors 702 and 703. Determining the display positions with reference to the patches places each icon at an assumed face position closer to the result of the estimation process. The icons created from the assumed face sizes may also be repositioned into a vertical line for easier viewing, or placed at a fixed position on the screen (for example, the right edge).
In the superimposed display examples shown in FIGS. 6 and 7, one icon is created and displayed from the assumed face size and assumed face position of each patch, but a plurality of icons may be displayed for each patch.
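The single-icon placement rule for FIG. 7 can be sketched as follows: for each distinct patch size, take the first patch whose centered icon does not overlap either cursor, and size the icon at the middle of the assumed face-size range. Rectangles are `(x, y, w, h)` tuples; the function names and the default regressor range are hypothetical and reuse a simple patch-to-image scaling.

```python
def overlaps(a, b):
    """Axis-aligned rectangle intersection test for (x, y, w, h) tuples."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


def place_icons(patches, cursors, model_input=224, reg_min=10.0, reg_max=40.0):
    """Return one icon rectangle per distinct patch size.

    patches: (x, y, side) squares; cursors: (x, y, w, h) rectangles.
    The icon side is the middle of the assumed face-size range for that patch,
    and the icon is centered on the first patch of that size that keeps it
    clear of every cursor.
    """
    icons = {}
    for x, y, side in patches:
        if side in icons:
            continue  # this patch size already has an icon
        size = (reg_min + reg_max) / 2 * side / model_input
        cx, cy = x + side / 2, y + side / 2
        box = (cx - size / 2, cy - size / 2, size, size)
        if any(overlaps(box, c) for c in cursors):
            continue  # try another patch of the same size
        icons[side] = box
    return list(icons.values())


icons = place_icons(
    patches=[(0, 0, 100), (100, 0, 100), (0, 100, 200), (200, 100, 200)],
    cursors=[(40, 40, 20, 20)],
)
```

Here the first 100 px patch is skipped because its icon would sit under the cursor, so the icon for that size moves to the neighboring patch.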

FIGS. 8(a) and 8(b) are diagrams showing examples in which a plurality of icons are superimposed for each patch. In the case of FIGS. 8(a) and 8(b), the area determination unit 303 determines the range of sizes that the assumed face size can take for each learning model application area, and the display unit 306 displays a plurality of icons as indices representing that size range.

FIG. 8(a) is an example of a display that shows that the assumed face size spans a certain range. As an index representing the range of values that the assumed face size can take, FIG. 8(a) displays an icon image ranging from an index representing the smallest size in that range to an index representing the largest size. For example, on the near side of the screen, an icon image is displayed that ranges from an icon 805 representing the smallest assumed face size possible at that position to an icon 806 representing the largest. On the far side of the screen, an icon image is displayed that ranges from an icon 801 representing the smallest assumed face size possible at that position to an icon 802 representing the largest. Between the near side and the far side, an icon image is displayed that ranges from an icon 803 representing the smallest assumed face size possible at that position to an icon 804 representing the largest. By displaying FIG. 8(a) instead of the icons 704 to 706 described above, the user can be informed of the range of sizes that the assumed face size can take.
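One way to build a min-to-max icon row like those in FIG. 8(a) is sketched below, under the assumption (not stated in the source) that the icons are evenly spaced in size and laid out side by side, centered on a chosen point such as a patch center. `range_icons` and its `steps` and `gap` parameters are hypothetical.

```python
def range_icons(cx, cy, min_size, max_size, steps=3, gap=4.0):
    """Lay out `steps` square icons growing evenly from min_size to max_size.

    The row is centered on (cx, cy); returns (x, y, w, h) rectangles.
    """
    if steps < 2:
        raise ValueError("need at least two icons to show a range")
    sizes = [min_size + i * (max_size - min_size) / (steps - 1) for i in range(steps)]
    total = sum(sizes) + gap * (steps - 1)
    x = cx - total / 2
    boxes = []
    for s in sizes:
        boxes.append((x, cy - s / 2, s, s))  # vertically centered on cy
        x += s + gap
    return boxes


# A 20 to 80 px assumed range shown as four icons around (100, 100).
boxes = range_icons(100, 100, 20.0, 80.0, steps=4)
```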

The display is not limited to this example; for instance, the icon size may be changed by animation. Examples include an animation that sequentially changes the displayed size and position of an icon, and an animation in which small icons are displayed toward the back of the screen and progressively larger icons toward the front.
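The animation variant can be sketched as linear interpolation between a small icon at the back of the screen and a large icon at the front. The `(cx, cy, size)` representation, the frame count, and the linear easing are all assumptions for illustration.

```python
def animation_frames(far, near, n_frames=30):
    """Linearly interpolate an icon from `far` to `near`.

    far, near: (cx, cy, size) of the small icon at the back of the screen and
    the large icon at the front. Returns one (cx, cy, size) tuple per frame.
    """
    if n_frames < 2:
        raise ValueError("need at least two frames")
    frames = []
    for i in range(n_frames):
        t = i / (n_frames - 1)  # 0.0 at the back, 1.0 at the front
        frames.append(tuple(a + (b - a) * t for a, b in zip(far, near)))
    return frames


frames = animation_frames(far=(320, 100, 10.0), near=(320, 400, 60.0), n_frames=6)
```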

FIG. 8(b) is an example of a display that shows a situation in which people with the assumed face sizes are crowded together. As an index representing the range of values that the assumed face size can take, FIG. 8(b) displays a group image that includes indices ranging from one representing the smallest size in that range to one representing the largest. For example, a group image (crowd image) 810 is displayed in which a plurality of persons are gathered, from a person corresponding to the largest assumed face size on the near side of the screen to a person corresponding to the smallest assumed face size toward the far side. By creating and displaying, instead of the icons 704 to 706 described above, a group image 810 of a plurality of persons matching the range of assumed face sizes as shown in FIG. 8(b), the user can be shown a situation in which people with the assumed face sizes are crowded together. The group image 810 may be displayed using an image prepared in advance, or using an image composited from a plurality of person models generated by computer graphics (CG).

The display is not limited to the example of FIG. 8; any method may be used as long as it presents information from which the user can judge what size of learning model application area will be used to estimate people when the people-count estimation process is performed. For example, icons may be displayed in all patches, the icons may be made semi-transparent, text may be used instead of icons, or real video may be used instead of icons.

Returning to the flowchart of FIG. 4, in S406 the area determination unit 303 determines whether the setting of the face size and face position obtained in S402 has been completed. This determination may be made by accepting an input from the user. Alternatively, the area determination unit 303 may calculate the difference between the face position and face size obtained in S402 and the assumed face position and assumed face size obtained in S404, and determine whether that difference is equal to or less than a threshold. If the difference is equal to or less than the threshold, the values specified by the user (face position and face size) are close to the assumed values (assumed face position and assumed face size) determined by the area determination unit 303, so the setting may be regarded as complete. If it is determined in S406 that the setting has been completed, the process proceeds to S407; otherwise, the process returns to S402.
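The threshold-based completion check in S406 might look like the sketch below. The source does not say how user-specified faces are paired with assumed faces or what the thresholds are; this sketch hypothetically pairs each user face with the nearest assumed face center and uses separate position and size tolerances, `pos_tol` and `size_tol`.

```python
import math


def setting_complete(user_faces, assumed_faces, pos_tol=20.0, size_tol=5.0):
    """True when every user face is close to its nearest assumed face.

    Faces are (cx, cy, size) tuples in image pixels. The nearest-center pairing
    and both tolerances are hypothetical choices.
    """
    if not user_faces or not assumed_faces:
        return False
    for ux, uy, us in user_faces:
        ax, ay, asz = min(assumed_faces, key=lambda f: math.hypot(f[0] - ux, f[1] - uy))
        if math.hypot(ax - ux, ay - uy) > pos_tol or abs(asz - us) > size_tol:
            return False  # still too far apart: go back to S402
    return True
```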

In S407, the area determination unit 303 divides the image according to the learning model application areas obtained in S403. In this embodiment, the divided images are used as patches.
Next, in S408, the estimation unit 304 receives the learning model application areas obtained in S403 and executes the people-count estimation process. The estimation unit 304 resizes each patch image and inputs it to the people-count regressor, thereby obtaining a people-count estimate for each area.
Next, in S409, the estimation unit 304 determines whether all the divided patch images have been processed, and if not, repeats S408. When the people-count estimation process has been executed for all patches, the process proceeds to S410.

In S410, the integration unit 305 receives the people-count estimation results obtained through S408 and S409 and integrates them. That is, the integration unit 305 calculates the people-count estimate for the entire image by summing the per-patch results. The processing of the flowchart of FIG. 4 then ends.
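Steps S407 to S410 amount to a crop, resize, regress, and sum loop. The pure-Python sketch below assumes a grayscale image stored as a list of rows, nearest-neighbor resizing, and a caller-supplied `regressor` function standing in for the learned people-count model; all function names are hypothetical, and the regressor used in the example is a dummy.

```python
def crop(img, x, y, w, h):
    """Cut a w x h region out of an image stored as a list of rows."""
    return [row[x:x + w] for row in img[y:y + h]]


def resize_nearest(patch, size):
    """Nearest-neighbor resize of a 2-D list to size x size."""
    h, w = len(patch), len(patch[0])
    return [[patch[r * h // size][c * w // size] for c in range(size)] for r in range(size)]


def estimate_count(img, patches, regressor, model_input=8):
    """Crop each patch (S407), resize it and run the regressor (S408) until all
    patches are done (S409), then sum the per-patch counts (S410)."""
    total = 0.0
    for x, y, w, h in patches:
        inp = resize_nearest(crop(img, x, y, w, h), model_input)
        total += regressor(inp)
    return total


def mean_regressor(p):
    """Dummy stand-in for the learned model: returns the mean pixel value."""
    return sum(map(sum, p)) / (len(p) * len(p[0]))


image = [[1] * 4 for _ in range(4)]
count = estimate_count(image, [(0, 0, 2, 2), (2, 0, 2, 2), (0, 2, 4, 2)], mean_regressor)
```

Because every patch is resized to the same input size, patches of different sizes can share one regressor, which is why S403 chooses patch sizes that bring faces into the regressor's working range.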

As described above, the information processing system of the present embodiment displays, on the setting screen, the face sizes and face positions that the image analysis process can estimate, as information needed to set the parameters for the image analysis process. This makes it easier for the user to judge whether the analysis settings are appropriate.

In addition, the information processing system of the present embodiment reduces the number of executions of the image analysis process, thereby reducing its load. Patent Literature 2 uses analysis results to present the user with the information necessary to properly perform setting and parameter correction work for obtaining a desired processing result. In contrast, the information processing apparatus of the present embodiment determines, based on the size and position of an area specified by the user, the size of an area that can be estimated by a predetermined estimation process, and presents the user with an index representing that size. In other words, Patent Literature 2 proposes parameter corrections to the user based on analysis results, whereas the present embodiment proposes corrections based on analysis parameters calculated from the input parameters. Therefore, according to the present embodiment, the user can be given feedback on what kind of analysis result will be obtained before the image analysis process is executed, which reduces the processing load caused by executing unnecessary image analysis processes.

The present invention can also be realized by processing in which a program that implements one or more functions of the above-described embodiments is supplied to a system or apparatus via a network or storage medium, and one or more processors in a computer of the system or apparatus read and execute the program. It can also be realized by a circuit (for example, an ASIC) that implements one or more functions.
The above-described embodiments are merely specific examples of carrying out the present invention, and the technical scope of the present invention should not be construed as being limited by them. That is, the present invention can be embodied in various forms without departing from its technical concept or main features.

302: image acquisition unit, 303: area determination unit, 304: estimation unit, 305: integration unit, 306: display unit

Claims

1. An information processing apparatus comprising:
determination means for determining, based on the size and position of an area specified by a user on a setting screen displaying an image acquired by imaging, the size of an area that can be estimated by a predetermined estimation process; and
display means for displaying, on the setting screen on which the user specifies the size and position of the area, an index representing the size of the area that can be estimated by the estimation process, as determined by the determination means.

2. The information processing apparatus according to claim 1, wherein
the determination means determines, as sizes of areas that can be estimated by the estimation process, a plurality of sizes corresponding to a plurality of positions on the image displayed on the setting screen, and
the display means displays a plurality of indices respectively representing the plurality of sizes determined by the determination means for the plurality of positions on the image.

3. The information processing apparatus according to claim 1, wherein
the determination means determines a range of sizes that the area that can be estimated by the estimation process can take, and
the display means displays the index representing the size range.

4. The information processing apparatus according to claim 3, wherein the display means displays, as the index representing the size range, an image representing indices ranging from an index representing the minimum size to an index representing the maximum size within the size range.

5. The information processing apparatus according to claim 3, wherein the display means displays, as the index representing the size range, a group image including indices ranging from an index representing the minimum size to an index representing the maximum size within the size range.

6. The information processing apparatus according to any one of claims 1 to 5, wherein
the predetermined estimation process is a process of detecting at least a predetermined object from the captured image using a learning model,
the determination means determines the size of the area to which the learning model is applied, based on the size and position of the area designated by the user on the setting screen, and
the display means displays an index representing the size of the area to which the learning model is applied.

7. The information processing apparatus according to claim 6, wherein the predetermined estimation processing includes processing for estimating the number of the detected predetermined objects from the captured image using the learning model.

8. The information processing apparatus according to claim 6, wherein the predetermined object is a person.

9. The information processing apparatus according to claim 8, wherein the size and position of the region designated by the user are the face size and position of the person.

10. An information processing method executed by an information processing apparatus, comprising:
a determination step of determining, based on the size and position of an area specified by a user on a setting screen displaying an image acquired by imaging, the size of an area that can be estimated by a predetermined estimation process; and
a display step of displaying, on the setting screen on which the user specifies the size and position of the area, an index representing the size of the area that can be estimated by the estimation process, as determined in the determination step.

11. A program for causing a computer to function as the information processing apparatus according to any one of claims 1 to 9.