JP2020188449A

JP2020188449A - Image analyzing program, information processing terminal, and image analyzing system

Info

Publication number: JP2020188449A
Application number: JP2019178900A
Authority: JP
Inventors: 安紘土田; Yasuhiro Tsuchida
Original assignee: AWL Inc
Current assignee: AWL Inc
Priority date: 2019-05-10
Filing date: 2019-09-30
Publication date: 2020-11-19
Anticipated expiration: 2039-09-30
Also published as: JP6651086B1

Abstract

PROBLEM TO BE SOLVED: To provide an image analyzing program, an information processing terminal, and an image analyzing system that allow a user to easily install a camera without specialized technical staff adjusting its installation position and installation angle.

SOLUTION: The accuracy of inference processing by an NN model is measured (S4), and the user is notified of instructions for improving that accuracy based on the measured value (S5, S6, and S8). By adjusting the installation position and installation angle of the smartphone's camera according to these instructions, the user can obtain appropriate inference accuracy from the NN model without specialized technical staff visiting the installation site to adjust the camera.

SELECTED DRAWING: Figure 6

Description

The present invention relates to an image analysis program, an information processing terminal, and an image analysis system.

Conventionally, devices and systems are known in which cameras such as surveillance cameras or so-called AI (Artificial Intelligence) cameras are installed indoors, for example in stores. Objects such as people appearing in the frame images captured by these cameras are detected with a trained object detection neural network or the like, and the detected objects are recognized using a trained object recognition neural network (see, for example, Patent Document 1).

Patent Document 1: Japanese Unexamined Patent Publication No. 2018-093283

However, conventional devices and systems that detect and recognize objects such as people captured by a camera, such as the one described in Patent Document 1, have the problem that installing a camera such as an AI camera takes a great deal of time and effort. Specifically, when such a device or system is introduced, specialized technical staff must visit the camera installation site and adjust the camera's installation position (shooting position) and installation angle (shooting direction), taking the light sources into account to avoid backlighting, so that appropriate accuracy of the inference processing (inference accuracy) of the trained object detection neural network or the trained object recognition neural network can be ensured.

The present invention solves the above problem. Its object is to provide an image analysis program, an information processing terminal, and an image analysis system that make it possible to obtain appropriate inference accuracy from a trained neural network for object detection or object recognition without specialized technical staff visiting the camera installation site to adjust the camera's installation position and installation angle, thereby allowing a user to install the camera easily.

To solve the above problem, an image analysis program according to a first aspect of the present invention causes an information processing terminal equipped with a camera to function as: an image analysis unit that analyzes an input image from the camera using one or more trained neural network models, including a trained object detection neural network model for detecting an object appearing in the input image from the camera; an inference accuracy measurement unit that measures the accuracy of inference processing by any one of the one or more trained neural network models; and a notification unit that, based on the accuracy of the inference processing measured by the inference accuracy measurement unit, notifies a user of instruction information for improving the accuracy of that inference processing.

In this image analysis program, the object detected by the trained object detection neural network model may be a person or a face, and the instruction information may be instruction information that prompts the user to move the camera to an installation position or installation angle suitable for detecting the person or face with the trained object detection neural network model.

In this image analysis program, the notification unit may notify the user of the instruction information by displaying it on a display device.

An image analysis program according to a second aspect of the present invention causes an information processing terminal equipped with a camera to function as: an image superimposing unit that superimposes an image of a person or a face on an input image from the camera; an image analysis unit that analyzes the image of the person or face superimposed by the image superimposing unit using one or more trained neural network models; an inference accuracy measurement unit that measures the accuracy of inference processing by any one of the one or more trained neural network models; and a notification unit that, based on the accuracy of the inference processing measured by the inference accuracy measurement unit, notifies a user of instruction information for improving the accuracy of that inference processing.

In this image analysis program, the instruction information may be instruction information that prompts the user to move the camera to an installation position or installation angle suitable for performing inference processing with the relevant trained neural network model.

In this image analysis program, the image superimposing unit may change, over time, the attributes and number of the people in the person or face image superimposed on the input image to various attributes and numbers; the image analysis unit may analyze, one after another, the various images obtained by superimposing the person or face images with these time-varying attributes and numbers on the input image; and the notification unit may notify the user of instruction information for improving the accuracy of the inference processing based on the accuracy measured by the inference accuracy measurement unit for these various superimposed images.
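As a rough illustration of the sweep described above, the following Python sketch varies the attributes and number of superimposed people, measures the inference accuracy for each composite image, and reports the worst case. The callables `superimpose` and `measure`, the attribute values, and the 0.8 threshold are all assumptions made for illustration; the text above does not specify them.

```python
from itertools import product


def sweep_superimposed_scenes(frame, superimpose, measure,
                              genders=("male", "female"),
                              ages=(10, 30, 60),
                              counts=(1, 2, 3),
                              threshold=0.8):
    """Try every attribute/count combination and collect the measured accuracy.

    superimpose(frame, gender, age, count) -> composite image (hypothetical)
    measure(composite) -> accuracy in [0, 1] (hypothetical)
    """
    results = {}
    for gender, age, count in product(genders, ages, counts):
        composite = superimpose(frame, gender, age, count)
        results[(gender, age, count)] = measure(composite)

    # The combination with the lowest accuracy decides whether the user
    # should be told to adjust the camera.
    worst = min(results, key=results.get)
    needs_adjustment = results[worst] < threshold
    return results, worst, needs_adjustment


# Stub callables standing in for the real superimposing and measuring steps.
def fake_superimpose(frame, gender, age, count):
    return (frame, gender, age, count)


def fake_measure(composite):
    # Pretend that accuracy drops as more people are superimposed.
    return 1.0 - 0.1 * composite[3]


results, worst, needs_adjustment = sweep_superimposed_scenes(
    "frame", fake_superimpose, fake_measure)
```

With the stub callables, the worst case is a scene with three people, so the sketch would flag that the camera needs adjusting.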

An information processing terminal according to a third aspect of the present invention includes: a camera; an image analysis unit that analyzes an input image from the camera using one or more trained neural network models, including a trained object detection neural network model for detecting an object appearing in the input image from the camera; an inference accuracy measurement unit that measures the accuracy of inference processing by any one of the one or more trained neural network models; and a notification unit that, based on the accuracy of the inference processing measured by the inference accuracy measurement unit, notifies a user of instruction information for improving the accuracy of that inference processing.

In this information processing terminal, the object detected by the trained object detection neural network model may be a person or a face, and the instruction information may be instruction information that prompts the user to move the camera to an installation position or installation angle suitable for detecting the person or face with the trained object detection neural network model.

In this information processing terminal, the information processing terminal may further include a display device for displaying the input image from the camera, and the notification unit may notify the user of the instruction information by displaying it on the display device.

In this information processing terminal, the information processing terminal may further include a camera position direction estimation unit that estimates the installation position and installation angle of the camera based on the input image from the camera, and a pointing device for designating a detection position of the person or face on the input image from the camera displayed on the display device. The notification unit may then determine the instruction information based on, in addition to the accuracy of the inference processing measured by the inference accuracy measurement unit, the installation position and installation angle of the camera estimated by the camera position direction estimation unit and the detection position of the person or face designated by the pointing device.
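One conceivable way to turn a designated detection position into a camera adjustment is sketched below in Python. It assumes a simple pinhole camera model and a hypothetical horizontal field of view, neither of which is stated in the text above, and for brevity it ignores both the estimated installation position and the measured inference accuracy. The function name `suggest_rotation` is likewise an illustrative assumption.

```python
import math


def suggest_rotation(point_xy, image_size, horizontal_fov_deg=60.0):
    """Return (pan, tilt) in degrees that would bring the designated point
    to the image center under a pinhole camera model.

    Positive pan means "turn right", positive tilt means "turn up".
    The field of view is an assumed value, not taken from the source.
    """
    width, height = image_size
    x, y = point_xy
    # Focal length in pixels derived from the assumed horizontal field of view.
    focal_px = (width / 2) / math.tan(math.radians(horizontal_fov_deg) / 2)
    pan = math.degrees(math.atan2(x - width / 2, focal_px))
    # Image y grows downward, so a point above the center gives a positive tilt.
    tilt = math.degrees(math.atan2(height / 2 - y, focal_px))
    return pan, tilt


# A point designated at the right edge of a 640x480 image.
pan, tilt = suggest_rotation((640, 240), (640, 480))
```

In practice the notification unit would combine such an offset with the estimated camera pose and the measured accuracy before wording an instruction for the user; this sketch covers only the geometric step.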

An information processing terminal according to a fourth aspect of the present invention includes: a camera; an image superimposing unit that superimposes an image of a person or a face on an input image from the camera; an image analysis unit that analyzes the image of the person or face superimposed by the image superimposing unit using one or more trained neural network models; an inference accuracy measurement unit that measures the accuracy of inference processing by any one of the one or more trained neural network models; and a notification unit that, based on the accuracy of the inference processing measured by the inference accuracy measurement unit, notifies a user of instruction information for improving the accuracy of that inference processing.

In this information processing terminal, the instruction information may be instruction information that prompts the user to move the camera to an installation position or installation angle suitable for performing inference processing with the relevant trained neural network model.

In this information processing terminal, the information processing terminal may further include a display device that displays the input image from the camera, and the notification unit may notify the user of the instruction information by displaying it on the display device.

In this information processing terminal, the image superimposing unit may change, over time, the attributes and number of the people in the person or face image superimposed on the input image to various attributes and numbers; the image analysis unit may analyze, one after another, the various images obtained by superimposing the person or face images with these time-varying attributes and numbers on the input image; and the notification unit may notify the user of instruction information for improving the accuracy of the inference processing based on the accuracy measured by the inference accuracy measurement unit for these various superimposed images.

An image analysis system according to a fifth aspect of the present invention includes any one of the information processing terminals described above and a management server that manages the information processing terminal, including installing the trained neural network models on the information processing terminal.

According to the image analysis programs of the first and second aspects and the information processing terminals of the third and fourth aspects of the present invention, the accuracy of inference processing by any one of the one or more trained neural network models is measured, and the user is notified of instruction information for improving that accuracy based on the measured value. By adjusting the installation position (shooting position) and installation angle (shooting direction) of the camera of the information processing terminal according to the notified instruction information, the user can obtain appropriate inference accuracy from the trained neural network model (the trained object detection neural network or the trained object recognition neural network) without specialized technical staff visiting the camera installation site to adjust the camera's installation position and installation angle. The user can therefore install the camera easily.

Further, according to the image analysis system of the fifth aspect of the present invention, in addition to the above effects, the management server can be used to manage the information processing terminal, including installing the trained neural network models (the trained object detection neural network and the trained object recognition neural network) on the information processing terminal.

FIG. 1: Block diagram showing the schematic configuration of an image analysis system including a smartphone according to a first embodiment of the present invention.
FIG. 2: Block diagram showing the schematic hardware configuration of the smartphone.
FIG. 3: Configuration diagram of the main software on the smartphone.
FIG. 4: Functional block diagram of the SoC in the smartphone.
FIG. 5: Explanatory diagram showing an example of how the smartphone (its camera) is installed.
FIG. 6: Flowchart showing an example of the process of adjusting the camera installation position and installation angle using instructions on the smartphone.
FIG. 7: Explanatory diagram of an example of an instruction for improving the accuracy of inference processing by the face detection model on the smartphone, and of the process of adjusting the camera installation position and installation angle using this instruction.
FIG. 8: Flowchart showing another example of the process of adjusting the camera installation position and installation angle using instructions on the smartphone.
FIG. 9: Explanatory diagram of an example of an instruction for improving the accuracy of inference processing by the person detection model on the smartphone, and of the process of adjusting the camera installation position and installation angle using this instruction.
FIG. 10: Explanatory diagram of an example of an instruction for improving the accuracy of inference processing by the person detection model on the smartphone, and of the process of adjusting the camera installation position and installation angle using this instruction.
FIG. 11: Explanatory diagram of an example of an instruction for improving the accuracy of inference processing by the person detection model on the smartphone, and of the process of adjusting the camera installation position and installation angle using this instruction.
FIG. 12: Explanatory diagram of an example of an instruction for improving the accuracy of inference processing by the person detection model on the smartphone, and of the process of adjusting the camera installation position and installation angle using this instruction.
FIG. 13: Explanatory diagram of an example of an instruction for improving the accuracy of inference processing by the person detection model on the smartphone, and of the process of adjusting the camera installation position and installation angle using this instruction.
FIG. 14: Explanatory diagram of an example of an instruction for improving the accuracy of inference processing by the person detection model on the smartphone, and of the process of adjusting the camera installation position and installation angle using this instruction.
FIG. 15: Functional block diagram of the SoC in a smartphone according to a second embodiment of the present invention.
FIG. 16: Block diagram showing the functional blocks of the SoC in the smartphone and the files needed by the 3D model superimposing unit to create a three-dimensional computer graphics (3DCG) image of a person.
FIG. 17: Flowchart showing an example of the process of adjusting the camera installation position and installation angle using instructions on the smartphone.
FIG. 18: Explanatory diagram of an example of an instruction for improving the accuracy of inference processing (gender and age estimation) by the face recognition model on the smartphone, and of the process of adjusting the camera installation position and installation angle using this instruction.
FIG. 19: Diagram showing the situation selection menu screen displayed on the touch panel of the smartphone.
FIG. 20: Explanatory diagram of an example of an instruction for improving the accuracy of inference processing by the face recognition model on the smartphone, and of the process of adjusting the camera installation position and installation angle using this instruction.
FIG. 21: Explanatory diagram of an example of an instruction for improving the accuracy of inference processing by the face recognition model on the smartphone, and of the process of adjusting the camera installation position and installation angle using this instruction.
FIG. 22: Explanatory diagram of an example of an instruction for improving the accuracy of inference processing by the face recognition model on the smartphone, and of the process of adjusting the camera installation position and installation angle using this instruction.
FIG. 23: Explanatory diagram of an example of an instruction for improving the accuracy of inference processing by the face recognition model on the smartphone, and of the process of adjusting the camera installation position and installation angle using this instruction.

Hereinafter, an image analysis program, an information processing terminal, and an image analysis system according to embodiments of the present invention will be described with reference to the drawings. FIG. 1 is a schematic block diagram of an image analysis system 10 that includes a smartphone 1, which is an information processing terminal according to a first embodiment of the present invention (that is, an information processing terminal on which an image analysis application (see FIG. 2), the image analysis program according to the first embodiment, is installed). As shown in FIG. 1, the image analysis system 10 includes smartphones 1 installed in respective installation areas such as residences, and an AI analysis server 3 and a smartphone management server 4 (the "management server" in the claims) on a cloud C. Each smartphone 1 includes a camera 2, which is a digital camera capable of shooting video.

Based on the object recognition results produced by the image analysis application on the smartphone 1, the AI analysis server 3 analyzes, for example, the behavior of people in each area, converts the analysis results into data that is easy to use for applications serving various purposes such as marketing and crime prevention, and outputs that data. The object recognition results from the smartphone 1 (from its image analysis application) are converted into character string data that contains no personal information before being sent to the AI analysis server 3. The smartphone management server 4 manages the smartphones 1 installed in the installation areas. Specifically, the smartphone management server 4 includes an image analysis application storage 5, such as a hard disk, that stores a large number of image analysis applications for the various types of object recognition (including object detection), and it can install image analysis applications on each smartphone 1 and manage their execution on each smartphone 1.
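The conversion of recognition results into character string data without personal information could look something like the Python sketch below. The field names (`camera_id`, `timestamp`, `gender`, `age`), the choice of JSON as the string format, and the allow-list approach are all assumptions for illustration; the text above states only that the results are sent as string data containing no personal information.

```python
import json

# Hypothetical allow-list: only these fields ever leave the smartphone.
ALLOWED_FIELDS = ("camera_id", "timestamp", "gender", "age")


def to_anonymous_record(result: dict) -> str:
    """Serialize a recognition result, dropping every field not on the allow-list."""
    record = {key: result[key] for key in ALLOWED_FIELDS if key in result}
    return json.dumps(record, sort_keys=True)


raw_result = {
    "camera_id": "area-01",
    "timestamp": "2019-09-30T10:00:00",
    "gender": "female",
    "age": 34,
    "face_image": b"\x89PNG...",  # raw pixels must not be transmitted
}
payload = to_anonymous_record(raw_result)
```

An allow-list is used here rather than a block-list so that any field added to the result later is excluded by default.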

Next, the hardware configuration of the smartphone 1 and the image analysis application 21 installed on it will be described with reference to FIG. 2. In addition to the camera 2 described above, the smartphone 1 includes an SoC (System-on-a-Chip) 11, a touch panel 14 (the "display device" and the "pointing device" in the claims), a speaker 15, a memory 16 that stores various data and programs, a communication unit 17, a secondary battery 18, and a charging terminal 19. The SoC 11 includes a CPU 12 that controls the entire device and performs various computations, and a GPU 13 used for, among other things, the inference processing of the trained object detection neural network model (hereinafter, "object detection NN model") and the trained object recognition neural network model (hereinafter, "object recognition NN model"). The touch panel 14 also serves as a pointing device with which the user designates the detection position of a person or face on the input image from the camera 2 displayed on the touch panel 14.

The programs stored in the memory 16 include the image analysis application 21, which is downloaded from the smartphone management server 4 and installed on the smartphone 1, and a smartphone analysis OS program 20, described later. The communication unit 17 includes a communication IC and an antenna. The smartphone 1 is connected to the AI analysis server 3 and the smartphone management server 4 via the communication unit 17 and a network. The secondary battery 18 is a rechargeable battery, such as a lithium-ion battery, that can be used repeatedly; it stores power from a commercial power source after that power has been converted to DC by an AC/DC converter, and supplies it to each part of the smartphone 1.

The image analysis application 21 of the smartphone 1 shown in FIG. 2 is one example of the many kinds of image analysis applications stored in the image analysis application storage 5 of the smartphone management server 4. Each image analysis application stored in the image analysis application storage 5 is a package program containing one or more NN models, including an object detection NN model for detecting an object appearing in the input image from the camera 2, together with a control script for these NN models. The script describes how the one or more NN models are used (the order of processing), the process of measuring the accuracy of inference processing by any one of the NN models, the process of displaying instructions (the "instruction information" in the claims) for improving the accuracy of that inference processing, and so on. The image analysis application 21 in the example of FIG. 2 contains a face detection model 22, a face recognition model 23, a script 24, and an AR application 25.
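The package structure described above can be pictured with a small Python data structure. This is only a sketch under assumptions: the class name `AnalysisAppPackage`, its fields, and the validation rules are hypothetical and are not the format actually used by the image analysis application storage 5.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class AnalysisAppPackage:
    """Hypothetical description of one image analysis application package."""
    name: str
    models: Dict[str, str]       # model name -> role ("detection" or "recognition")
    processing_order: List[str]  # order in which the script runs the models
    accuracy_target: str         # model whose inference accuracy is measured

    def validate(self) -> None:
        # Every package is described as containing an object detection NN model.
        if "detection" not in self.models.values():
            raise ValueError("package must contain an object detection model")
        unknown = [m for m in self.processing_order if m not in self.models]
        if unknown:
            raise ValueError(f"unknown models in processing order: {unknown}")
        if self.accuracy_target not in self.models:
            raise ValueError("accuracy target must be one of the packaged models")


# The example of FIG. 2: face detection runs first, then face recognition.
face_app = AnalysisAppPackage(
    name="image_analysis_application_21",
    models={"face_detection_22": "detection", "face_recognition_23": "recognition"},
    processing_order=["face_detection_22", "face_recognition_23"],
    accuracy_target="face_detection_22",
)
face_app.validate()
```

Keeping the processing order and the accuracy target as data, separate from the models themselves, mirrors the described split between the NN models and their control script.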

The face detection model 22 is an object detection NN model that detects faces contained in images acquired from the camera 2. The face recognition model 23 is an object recognition NN model that estimates the gender and age of the person whose face was detected by the face detection model 22. The script 24 is a simple control program that describes the processing procedure of the face detection model 22 and the face recognition model 23, the process of measuring the accuracy of inference processing by the face detection model 22, the process of displaying instructions for improving that accuracy, and so on. The AR application 25 is a program for an AR engine that uses existing technology such as ARKit (a framework for AR-enabled applications for iPhone (registered trademark) and iPad (registered trademark)).

Next, the configuration of the main software on the smartphone 1 will be described with reference to FIG. 3. As shown in FIG. 3, the main software on the smartphone 1 consists of an object detection recognition model 31, an easy installation UI (User Interface) 32, an AR engine 33, and a smartphone analysis OS 34. FIG. 3 shows the process of each piece of software. The process of the object detection recognition model 31 in FIG. 3 corresponds to the programs of the face detection model 22 and the face recognition model 23 in the example of the image analysis application 21 shown in FIG. 2. The process of the easy installation UI 32 in FIG. 3 corresponds to part of the program of the script 24 in FIG. 2. The process of the AR engine 33 in FIG. 3 corresponds to the program of the AR application 25 in FIG. 2. The smartphone analysis OS 34 in FIG. 3 corresponds to the smartphone analysis OS program 20 in FIG. 2.

FIG. 4 shows the functional blocks of the SoC 11 in the smartphone 1 described above. The smartphone 1 (more precisely, its SoC 11) includes, as functional blocks, an image analysis unit 41, an inference accuracy measurement unit 42, a notification unit 43, and a camera position and direction estimation unit 44. Among these functional blocks, the functions of the inference accuracy measurement unit 42, the notification unit 43, and the camera position and direction estimation unit 44 are realized by the CPU 12, and the function of the image analysis unit 41 is realized by the CPU 12 and the GPU 13.

The image analysis unit 41 analyzes the input image from the camera 2 by using one or more (trained) NN models (in the example of FIG. 2, the face detection model 22 and the face recognition model 23), including an object detection NN model for detecting objects (people, faces, etc.) that appear in the input image from the camera 2. The inference accuracy measurement unit 42 measures the accuracy of inference by one of these NN models. Based on the accuracy measured by the inference accuracy measurement unit 42, the notification unit 43 notifies the user of instructions for improving that accuracy. More specifically, the notification unit 43 notifies the user by displaying the instructions on the touch panel 14. The instructions prompt the user to move the camera 2 to an installation position or installation angle suitable for detecting a person or a face with the object detection NN model.

The camera position and direction estimation unit 44 estimates the installation position and installation angle of the camera 2 based on the input image from the camera 2. The notification unit 43 determines the above instructions based on the accuracy of inference measured by the inference accuracy measurement unit 42, the installation position and installation angle of the camera 2 estimated by the camera position and direction estimation unit 44, and the person or face detection position specified on the touch panel 14. In addition to the video from the camera 2, the camera position and direction estimation unit 44 may use an acceleration sensor, an angular acceleration sensor, or any other sensor capable of sensing movement of the smartphone 1 to improve the accuracy of the camera position and direction calculation. Alternatively, the camera position and direction estimation unit 44 may estimate the camera position and direction without using the video from the camera 2, relying only on the output values of such sensors.

FIG. 5 shows an example of how the smartphone 1 (and its camera 2) is installed. In this example, the smartphone 1 is installed at the entrance of an event venue. The smartphone 1 is held by a smartphone holder 52 that has a flexible arm 51 whose shape can be changed easily, and it is attached to an event guide plate 54 by a clip 53 provided at one end of the flexible arm 51. By deforming the flexible arm 51, the installation position and installation angle of the camera 2 of the smartphone 1 can be changed. Although not shown in FIG. 5, a power cord connected to the charging terminal 19 (see FIG. 2) of the smartphone 1 is connected to a commercial power supply via an AC/DC converter. In the example of FIG. 5, the smartphone holder 52 has the attachment clip 53 at one end of the flexible arm 51, but the smartphone holder 52 may instead have another attachment fixture, such as a magnet, at one end of the flexible arm 51. A wide-angle lens may also be attached to the camera 2 of the smartphone 1 with a clip or the like.

Next, with reference to FIGS. 6 and 7, an example of the instructions for improving the accuracy of inference by the NN model, and the process of adjusting the installation position and installation angle of the camera 2 using these instructions, will be described. In this example, the instructions are simple instructions for improving the accuracy of inference by the face detection model 22 (see FIG. 2).

When the user selects the desired process (estimation of a person's gender and age) from the analysis (recognition) process candidates displayed on the touch panel 14 of the smartphone 1 (S1), the smartphone 1 (mainly the CPU 12) downloads the NN models corresponding to the selected process (the face detection model 22 and the face recognition model 23 (see FIG. 2)) and the script 24 for these NN models from the smartphone management server 4 (S2). Next, the user installs the camera 2 of the smartphone 1 at an approximate position (a position from which the intended face detection position, for example the entrance of a building, can be captured reasonably well) and starts the image analysis application 21 (see FIG. 2) (S3). The CPU 12 of the smartphone 1 then executes the script 24 of the image analysis application 21, displays a message such as "Please have a person stand at the desired detection position" on the touch panel 14, and starts measuring the accuracy of inference by the face detection model 22 on the input image (captured image) from the camera 2 at its current installation position and installation angle (S4).

If, as a result of the measurement in S4, the measured accuracy of inference is below a predetermined value (that is, no object whose probability of being classified as a human face is at or above the predetermined value could be detected), the CPU 12 of the smartphone 1 (its notification unit 43) determines that inference by the face detection model 22 has failed (NO in S5) and instructs the user to move the camera 2 to a position suitable for detecting a human face (more precisely, to an installation position and installation angle suitable for detecting a human face) (S6). Specifically, when the CPU 12 of the smartphone 1 determines that inference by the face detection model 22 has failed, it follows the script 24 of the image analysis application 21 and displays on the touch panel 14, as shown in the upper part of FIG. 7, a red frame 61 indicating face detection failure (drawn as a broken line in FIG. 7) and a message 64 reading "Please move the camera so that the face fits in the frame".

The CPU 12 of the smartphone 1 (its notification unit 43) keeps displaying the message 64 and the red frame 61 on the touch panel 14 until inference by the face detection model 22 succeeds. When the user adjusts the installation position and installation angle of the camera 2 so that the face 62 fits within the red frame 61 (S7), and the measured accuracy of inference stays at or above the predetermined value for a fixed time (for example, 3 seconds) (that is, an object whose probability of being classified as a human face is at or above the predetermined value can be detected), the CPU 12 of the smartphone 1 (its notification unit 43) determines that inference by the face detection model 22 has succeeded (YES in S5) and displays on the touch panel 14, as shown in the lower part of FIG. 7, a green frame 63 indicating successful face detection (drawn as a solid line in FIG. 7) and a message 65 reading "Adjustment completed" (S8). In this way, while watching the instructions displayed on the touch panel 14 for improving the accuracy of inference by the face detection model 22 (the red frame 61, the green frame 63, the message 64, and the message 65), the user can adjust the installation position and installation angle of the camera 2 by trial and error so that the camera 2 ends up where it can accurately detect the face of a person standing at the desired detection position. The CPU 12 of the smartphone 1 (its notification unit 43) may also show the current inference accuracy graphically on the touch panel 14 (for example, as a bar-shaped level meter whose filled area grows as the accuracy increases), so that the user can see intuitively what improves the inference accuracy.
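The success criterion in S5 (accuracy at or above a predetermined value for a fixed time) and the optional level meter can be sketched as follows. This is only a minimal illustration, not the script 24 itself: the class name `AdjustmentMonitor`, the threshold of 0.8, and the text-based meter are assumptions introduced here; the source gives only the 3-second hold time as an example.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AdjustmentMonitor:
    """Hypothetical helper: decides between the red-frame and green-frame states."""
    threshold: float = 0.8     # assumed "predetermined value" (not given in the source)
    hold_seconds: float = 3.0  # example hold time mentioned in the text
    _since: Optional[float] = field(default=None, init=False, repr=False)

    def update(self, confidence: Optional[float], now: float) -> str:
        """Return "success" once confidence has stayed >= threshold for
        hold_seconds; otherwise return "adjust" (keep showing the red frame).
        confidence is None when no face was detected in the frame."""
        if confidence is None or confidence < self.threshold:
            self._since = None  # any drop below the threshold resets the timer
            return "adjust"
        if self._since is None:
            self._since = now
        return "success" if now - self._since >= self.hold_seconds else "adjust"


def level_meter(confidence: float, width: int = 10) -> str:
    """Text stand-in for the bar-shaped level meter: more '#' means higher accuracy."""
    clamped = max(0.0, min(1.0, confidence))
    filled = round(clamped * width)
    return "#" * filled + "-" * (width - filled)


if __name__ == "__main__":
    monitor = AdjustmentMonitor()
    # (confidence, timestamp in seconds) pairs simulating successive frames
    for conf, t in [(0.9, 0.0), (0.5, 1.0), (0.9, 2.0), (0.92, 5.0)]:
        print(t, level_meter(conf), monitor.update(conf, t))
```

Resetting the timer on every sub-threshold frame is one possible reading of "for a fixed time"; a real implementation might tolerate brief dropouts.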

Next, with reference to the flowchart of FIG. 8 and to FIGS. 9 to 14, another example of the instructions for improving the accuracy of inference by the NN model, and the process of adjusting the installation position and installation angle of the camera 2 using these instructions, will be described. In this example, the instructions are for improving the accuracy of inference by a person detection NN model used to count visitors to a store, and they are more detailed than those in the example of FIGS. 6 and 7: they include not only the result of inference but also messages indicating how to adjust the installation position and installation angle of the camera 2. In this example, the NN models included in the image analysis application 21 are a person detection NN model (hereinafter, the "person detection model") and a vectorization model. The vectorization model is an object recognition NN model that performs vectorization on the image of a person detected by the person detection model. By applying the vectorization model to the images of people detected by the person detection model, it is possible to determine whether people captured by the camera 2 at different times are the same person. That is, if the camera 2 captures, for example, five images per second and person detection and vectorization are performed on each captured image, the people appearing in the captured images can be tracked (the movement of a person can be observed by assigning the same ID to the same person).
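One common way to use such vectors for same-person determination is to compare them by cosine similarity and reuse an ID when the similarity is high enough. The source does not state which comparison the vectorization model's outputs are subjected to, so the sketch below is an assumption throughout: the `Tracker` class, the 0.9 threshold, and the toy 3-dimensional vectors are all hypothetical.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class Tracker:
    """Hypothetical ID assigner: same ID for sufficiently similar vectors."""

    def __init__(self, threshold: float = 0.9) -> None:
        self.threshold = threshold                 # assumed similarity cut-off
        self.gallery: Dict[int, List[float]] = {}  # person ID -> latest vector
        self._next_id = 1

    def assign_id(self, vector: Sequence[float]) -> int:
        """Return the ID of the best match above the threshold, else a new ID."""
        best_id, best_sim = None, self.threshold
        for person_id, reference in self.gallery.items():
            sim = cosine_similarity(vector, reference)
            if sim >= best_sim:
                best_id, best_sim = person_id, sim
        if best_id is None:
            best_id = self._next_id
            self._next_id += 1
        self.gallery[best_id] = list(vector)  # remember the most recent appearance
        return best_id


if __name__ == "__main__":
    tracker = Tracker()
    print(tracker.assign_id([1.0, 0.0, 0.0]))    # first person seen
    print(tracker.assign_id([0.99, 0.05, 0.0]))  # nearly identical vector
    print(tracker.assign_id([0.0, 1.0, 0.0]))    # clearly different vector
```

Real embeddings would have far more dimensions, and the threshold would have to be tuned for the model actually used.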

When the user selects the desired process (visitor counting) from the analysis process candidates displayed on the touch panel 14 of the smartphone 1 (S11), the CPU 12 of the smartphone 1 downloads the NN models corresponding to the selected process (the person detection model and the vectorization model) and the script for these NN models from the smartphone management server 4 (S12). Next, the user fixes the camera 2 of the smartphone 1 at an approximate position (a position from which visitors, that is, the store entrance, can be captured reasonably well) and starts the image analysis application 21 (S13). The CPU 12 of the smartphone 1 then executes the script of the image analysis application 21 and displays on the touch panel 14 a message asking the user to specify the detection positions (the positions where the people to be detected appear) (S14). For example, the CPU 12 of the smartphone 1 acquires the current camera video as a (frame) image, estimates the camera installation position and installation angle as well as the three-dimensional coordinates of the floor surface in the image, and then displays on the touch panel 14, superimposed on the camera image, the message 70 "Please draw line (1)" shown in FIG. 9 and the message 73 "Please draw line (2)" shown in FIG. 10.

To estimate the camera installation position and installation angle, the camera position and direction estimation unit 44 uses the AR application 25 (see FIG. 2) to extract feature points from each frame image acquired by the camera 2 and estimates the three-dimensional installation position and installation angle of the camera 2 from the movement of the feature points between frame images. A feature point is a point corresponding to, for example, the edge of an object in the frame image, and it is a part whose characteristics change little even when the direction of the light source (illumination) changes. The three-dimensional coordinates of the floor surface in the image are likewise estimated with the AR application 25, by extracting feature points from each frame image acquired by the camera 2 and using the movement of the feature points between frame images.

As shown in FIG. 9, when the message 70 "Please draw line (1)" is displayed, the user follows the message and traces a finger F across the touch panel 14, thereby drawing a line 71 corresponding to line (1) on the input image from the camera 2 displayed on the touch panel 14.

Likewise, as shown in FIG. 10, when the message 73 "Please draw line (2)" is displayed, the user follows the message and traces a finger F across the touch panel 14, thereby drawing a line 72 corresponding to line (2) on the input image from the camera 2 displayed on the touch panel 14.

When the lines 71 and 72 have been drawn on the image, the CPU 12 of the smartphone 1 (its notification unit 43) displays a message such as "Please adjust the installation position and installation angle of the camera" to prompt the user to adjust the installation position and installation angle of the camera 2. After the user has adjusted them, the CPU 12 of the smartphone 1 acquires a (frame) image from the current camera video and re-estimates the installation position and installation angle of the camera 2 from that image (S15). The processing described below is assumed to be repeated while the installation position and installation angle of the camera 2 are fine-tuned, and the camera image is reacquired and the installation position and installation angle of the camera 2 are re-estimated at each repetition.

The CPU 12 of the smartphone 1 executes the script of the image analysis application 21, displays a message such as "Please stand on line (1)" on the touch panel 14, and then measures the accuracy of inference by the person detection model on the portion of the input image (captured image) located on line (1), with the camera 2 at its current installation position and installation angle (S16).

If, as a result of the measurement in S16, the measured accuracy of inference is below a predetermined value (that is, no object whose probability of being classified as a person is at or above the predetermined value could be detected), the CPU 12 of the smartphone 1 determines that inference by the person detection model has failed (NO in S17). Based on the installation position and installation angle of the camera 2 estimated in S15 and the person detection position specified on the touch panel 14 (the position of the line 71), it then instructs the user to move the camera 2 to an installation position and installation angle suitable for detecting a person (taking the influence of the light source into account) (S18). Specifically, suppose that, as shown in FIG. 11, the camera 2 at its current installation position and installation angle can capture only the feet of the person 74a on line (1), so that the person detection model cannot detect the person 74a. In that case, the notification unit 43 of the smartphone 1 displays on the touch panel 14 the message 76 "Please raise the camera and tilt it toward the horizontal", as shown in FIG. 12. The notification unit 43 of the smartphone 1 determines this instruction (the message 76) based on the accuracy of inference measured in S16, the installation position and installation angle of the camera 2 estimated in S15, and the person detection position specified on the touch panel 14 (the position of the line 71).
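The source does not spell out how the notification unit 43 turns the estimated camera pose and the specified line position into a concrete message. One plausible approach, shown below purely as an assumption, is a simple side-view geometry check: given the camera height, its downward pitch, its vertical field of view, and the horizontal distance to the line, test whether the head and feet of a standing person fall inside the frame. All names, the 1.7 m person height, and the message strings other than the one modeled on message 76 are hypothetical.

```python
import math
from typing import Optional

RAISE_AND_LEVEL = "Please raise the camera and tilt it toward the horizontal"
TILT_DOWN = "Please tilt the camera downward"
MOVE_BACK = "Please move the camera farther from the line"


def suggest_adjustment(cam_height_m: float, pitch_down_deg: float,
                       vfov_deg: float, distance_m: float,
                       person_height_m: float = 1.7) -> Optional[str]:
    """Return an adjustment message, or None if the whole person is in frame.

    Side-view model (an assumption, not the patented method): the optical axis
    points pitch_down_deg below the horizontal, and a point is visible when its
    angle from the optical axis is within half the vertical field of view.
    """
    half_fov = vfov_deg / 2.0

    def angle_from_axis(point_height_m: float) -> float:
        # Elevation of the point as seen from the camera (positive = above
        # horizontal), shifted by the pitch to measure it from the optical axis.
        elevation = math.degrees(
            math.atan2(point_height_m - cam_height_m, distance_m))
        return elevation + pitch_down_deg

    head = angle_from_axis(person_height_m)
    feet = angle_from_axis(0.0)
    head_cut = head > half_fov    # head lies above the top edge of the frame
    feet_cut = feet < -half_fov   # feet lie below the bottom edge of the frame

    if head_cut and feet_cut:
        return MOVE_BACK
    if head_cut:
        return RAISE_AND_LEVEL
    if feet_cut:
        return TILT_DOWN
    return None


if __name__ == "__main__":
    # Low camera pointing steeply down: only the feet are in view (cf. FIG. 11).
    print(suggest_adjustment(0.3, 30.0, 50.0, 3.0))
    # Higher, level camera: the whole person fits, so no message is needed.
    print(suggest_adjustment(1.2, 0.0, 50.0, 3.0))
```

In the flow described in the text, such a check would only be consulted after the measured accuracy in S16 falls below the predetermined value; it does not replace the accuracy measurement.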

When, following the message 76, the user raises the installation position of the camera 2 and tilts its installation angle toward the horizontal so that the camera 2 can capture the whole of the person 74a on line (1) (S19), and the measured accuracy of inference reaches or exceeds the predetermined value (that is, an object whose probability of being classified as a person is at or above the predetermined value can be detected), the CPU 12 of the smartphone 1 next displays the message "Please stand on line (2)" on the touch panel 14. The CPU 12 of the smartphone 1 then measures the accuracy of inference by the person detection model on the portion of the input image (captured image) located on line (2), with the camera 2 at its current installation position and installation angle (S16). If the measured accuracy of inference is below the predetermined value, the same processing as in S18 and S19 is performed.

When, as a result of the above measurements, the accuracy of inference by the person detection model reaches or exceeds the predetermined value for both the image on line (1) and the image on line (2) (that is, both the person 74a on line (1) and the person 74b on line (2) shown in FIG. 13 can be detected), the CPU 12 of the smartphone 1 (its notification unit 43) determines that inference by the person detection model has succeeded (YES in S17) and displays the message 77 "Adjustment completed" on the touch panel 14, as shown in FIG. 14 (S20). In this way, the notification unit 43 of the smartphone 1 displays on the touch panel 14 the instructions (the messages 70, 73, 76, 77, and so on) for improving the accuracy of inference by the person detection model, that is, for supporting the user in adjusting the installation position and installation angle of the camera 2. In FIG. 13, 75a and 75b denote the bounding box corresponding to the person 74a on line (1) and the bounding box corresponding to the person 74b on line (2), respectively.

After the adjustment completion message 77 is displayed in S20, when the user instructs execution of the visitor counting process from the touch panel 14 of the smartphone 1, the image analysis unit 41 of the smartphone 1 uses the person detection model to detect the person 74a on line (1) and the person 74b on line (2), and then uses the vectorization model to determine whether the person 74a detected at the position of line (1) and the person 74b detected at the position of line (2) are the same person. If they are the same person, the CPU 12 of the smartphone 1 transmits that (recognition) result to the AI analysis server 3. On receiving from the smartphone 1 the recognition result that the same person was detected at the position of line (1) and at the position of line (2), the AI analysis server 3 increments the number of visitors to the store in question.
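The counting rule described above (increment when the same person has been detected at both line (1) and line (2)) can be sketched as follows. The `VisitorCounter` class and its method names are hypothetical, and the sketch keeps the count locally for simplicity, whereas in the text the smartphone 1 sends the recognition result to the AI analysis server 3, which does the counting. Person IDs are assumed to come from a same-person determination such as the one the vectorization model enables.

```python
from collections import defaultdict
from typing import DefaultDict, Set


class VisitorCounter:
    """Hypothetical counter: one visit per person seen on both lines."""

    def __init__(self) -> None:
        # person ID -> set of line numbers (1 or 2) where the person was seen
        self._lines_seen: DefaultDict[int, Set[int]] = defaultdict(set)
        self.count = 0

    def observe(self, person_id: int, line: int) -> bool:
        """Record a detection; return True if it completed a visit."""
        if line not in (1, 2):
            raise ValueError("line must be 1 or 2")
        seen = self._lines_seen[person_id]
        seen.add(line)
        if seen == {1, 2}:
            self.count += 1
            # Forget the person so a later pass is counted as a new visit.
            del self._lines_seen[person_id]
            return True
        return False


if __name__ == "__main__":
    counter = VisitorCounter()
    counter.observe(person_id=7, line=1)  # seen at line (1) only: no count yet
    counter.observe(person_id=8, line=1)
    counter.observe(person_id=7, line=2)  # same person at line (2): count up
    print(counter.count)
```

This version ignores the order in which the two lines are reached; distinguishing people entering from people leaving would require checking that order, which the text does not specify.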

As described above, the image analysis application 21 and the smartphone 1 of the present embodiment measure the accuracy of inference by one of the NN models included in the image analysis application 21 (the face detection model 22 in the "face detection" example, and the person detection model in the "visitor counting" example) and, based on the measured accuracy, notify the user of instructions for improving that accuracy (by displaying them on the touch panel 14). Because the user adjusts the installation position (shooting position) and installation angle (shooting direction) of the camera 2 of the smartphone 1 based on these instructions, appropriate inference accuracy can be obtained for the NN models (the face detection model 22 and the face recognition model 23 in the "face detection" example, and the person detection model and the vectorization model in the "visitor counting" example) without specialized technical staff visiting the installation site to adjust the installation position and installation angle of the camera 2. The user can therefore install the camera 2 easily.

Furthermore, in the smartphone 1 of the present embodiment, the notification unit 43 of the SoC 11 determines the instructions for improving the accuracy of inference by the NN model based not only on the accuracy measured by the inference accuracy measurement unit 42 but also on the installation position and installation angle of the camera 2 estimated by the camera position and direction estimation unit 44 and on the detection position of a person or the like specified on the touch panel 14. As a result, the notification can indicate not only the result of inference but also how to adjust the installation position and installation angle of the camera 2. The user can therefore install the camera 2 even more easily.

In addition to the above effects, the image analysis system 10 of the present embodiment allows the smartphone management server 4 to be used to manage the smartphone 1, including installing the image analysis application 21 (with its NN models) on the smartphone 1.

Next, the smartphone 1 of a second embodiment of the present invention will be described with reference to FIGS. 15 to 23. FIG. 15 shows the functional blocks of the SoC 11 in the smartphone 1 of the second embodiment. The SoC 11 of the second embodiment is the same as the SoC 11 of the first embodiment except that it includes a 3D model superimposing unit 81 (the "image superimposing unit" in the claims). The 3D model superimposing unit 81 superimposes an image of a person created with three-dimensional computer graphics (3DCG) on the frame image acquired by the camera 2 (the input image from the camera 2). Here, three-dimensional computer graphics refers to a technique in which information such as the shape of an object, the orientation and position of the camera, and the intensity and position of the light source is input to a computer, and the computer calculates and generates an image by means of a program. In the first embodiment, the image analysis unit 41 directly analyzes the input image from the camera 2 (the people and faces appearing in it), whereas in the second embodiment the image analysis unit 41 analyzes the 3DCG image of a person that the 3D model superimposing unit 81 has superimposed on the input image (frame image) from the camera 2. The processing performed by the 3D model superimposing unit 81 is described in detail later. The functional blocks in FIG. 15 other than the 3D model superimposing unit 81 are given the same reference numerals as the corresponding blocks in FIG. 4, and their description is omitted.

FIG. 16 shows the functional blocks in the SoC 11 shown in FIG. 15, together with the files that the 3D model superimposing unit 81 needs in order to create the 3DCG image of a person.

Next, with reference to the flowchart of FIG. 17 as well as FIG. 16 and FIGS. 18 to 23, the process of adjusting the installation position and installation angle of the camera 2 in the smartphone 1 of the second embodiment, using instructions similar to those of the first embodiment, will be described. In this example, the instructions are for improving the accuracy of inference processing by the face recognition model 23 (see FIG. 2). As in the examples shown in FIGS. 6 and 7, the face recognition model 23 is an NN model for object recognition (an NN model for gender and age recognition) that estimates the gender and age of a person from the face detected by the face detection model 22.

Although omitted from FIG. 17 to keep the flowchart short, this adjustment process performs processing corresponding to S1 to S3 in FIG. 6, or S11 to S13 in FIG. 8, before the processing of S21. That is, when the user selects the desired processing (estimating a person's gender and age) from the analysis (recognition) processing candidates displayed on the touch panel 14 of the smartphone 1 (corresponding to S1 in FIG. 6), the smartphone 1 (mainly the CPU 12) downloads the NN models corresponding to the selected processing (the face detection model 22 and the face recognition model 23; see FIG. 2) and the script 24 for these NN models from the smartphone management server 4 (corresponding to S2 in FIG. 6). Next, the user installs the camera 2 of the smartphone 1 at an approximate position (a position from which the faces of visitors can likely be photographed, that is, a position from which the store entrance can likely be photographed) and starts the image analysis application 21 (see FIG. 2) (corresponding to S3 in FIG. 6). The CPU 12 of the smartphone 1 then executes the script of the image analysis application 21 and displays on the touch panel 14 a message prompting the user to specify the position of a person (the position of a person whose face is to be detected and recognized) (S21 in FIG. 17). For example, the CPU 12 acquires the current camera video as a (frame) image, estimates the camera installation position and installation angle as well as the three-dimensional coordinates of the floor surface in the image, and then displays on the touch panel 14 the message 88 shown in FIG. 18, "Please draw a line at the position of the person."

As in the first embodiment, the camera installation position and installation angle are estimated using the AR application 25 (see FIG. 2): the camera position and direction estimation unit 44 extracts feature points from each frame image acquired by the camera 2 and estimates the three-dimensional installation position and installation angle of the camera 2 from the movement of the feature points between frame images.

As shown in FIG. 18, when the message 88 "Please draw a line at the position of the person" is displayed, the user follows the message and traces on the touch panel 14 with a finger F, thereby drawing a line 89 indicating the position of the person on the input image from the camera 2 displayed on the touch panel 14.

When the line 89 has been drawn on the input image, the CPU 12 of the smartphone 1 displays on the touch panel 14 a situation selection menu screen 86 (see FIG. 19) for selecting the customer demographic (situation) that the user expects to pass through the area photographed by the camera 2. As shown in FIG. 19, the situation selection menu screen 86 provides selection buttons 87 (87a to 87f, etc.) for selecting among various customer demographics (situations), such as "many families," "many young people," "many children," "many elderly people," "includes mothers holding children," and "mixed." Here, "mixed" represents a situation in which many people with various attributes come (for example, 10 to 20 men and women ranging from their teens to their sixties appear in the image captured by the camera 2).

When the user selects, from the various selection buttons 87 displayed on the situation selection menu screen 86, a plurality of selection buttons 87 corresponding to the customer demographics (situations) at the location where the camera 2 of the smartphone 1 is installed (S22), the 3D model superimposing unit 81 of the SoC 11 reads from the 3D model DB 83 shown in FIG. 16 the 3D model images of the persons to be displayed (images of person models created by three-dimensional computer graphics (3DCG)) according to the selected demographics. The 3D model superimposing unit 81 then works from three inputs: the 3D model images read from the 3D model DB 83, a light environment ("light source" and "shadow" data) read at random from the environment data file 84, and the orientation and position of the camera 2 estimated by the camera position and direction estimation unit 44. It adjusts the 3D model images of one or more persons according to the orientation and position of the camera 2, renders the adjusted 3D model images under the randomly read light environment, superimposes them on the input image from the camera 2, and displays the resulting superimposed image on the touch panel 14 (S23).

Next, the details of the 3D model image superimposition processing described for S23 will be explained. Before that, the data stored in each of the 3D model DB 83, the environment data file 84, and the situation data file 85 shown in FIG. 16 is described.

First, the 3D model DB 83 stores, one person at a time, 3D model image data for many kinds of people with various attributes such as gender, age, and body type. That is, each item of 3D model image data stored in the 3D model DB 83 is, for example, the 3D model image data of one "slim, tall woman in her thirties." The situation data file 85, on the other hand, stores situation data for each customer demographic corresponding to each selection button 87 on the situation selection menu screen 86. For example, as (customer demographic) situation data representing "families," the situation data file 85 stores "one woman of average build in her thirties to fifties + one man of average build in his thirties to fifties + one boy of average build aged 5 to 15 + one girl of average build aged 5 to 15." The environment data file 84 stores light environment data ("light source" and "shadow"). Of these, the "light source" data describes a light-emitting source defined by the "position and orientation / light intensity / color" of the light source. For example, when the light source is a fluorescent lamp, the data defines it as "directed downward from overhead / light intensity of ~ cd (candela) / white." The "shadow" data is the shadow corresponding to that "light source." However, the "shadow" data need not be held if the CPU 12 and GPU 13 of the smartphone 1 have enough processing power to render the shadow corresponding to the "light source" data in real time. Here, "real-time rendering" of shadows means drawing the 3DCG image by computing the shadows at high speed in real time, rather than drawing it from shadow information prepared in advance.
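The three data stores described above can be pictured roughly as follows. This is only an illustrative sketch: the class names, field names, identifiers, the mapping of "thirties to fifties" to the range 30 to 59, and the intensity value are all assumptions made for the example, since the description does not specify any file format.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class PersonModel:
    """One entry in the 3D model DB 83: a single person with fixed attributes."""
    model_id: str               # hypothetical identifier
    gender: str                 # "female" or "male"
    age_range: Tuple[int, int]  # inclusive bounds in years (assumed encoding)
    build: str                  # e.g. "average", "slim_tall"


@dataclass(frozen=True)
class LightEnvironment:
    """One entry in the environment data file 84."""
    position_direction: str         # e.g. "downward from overhead"
    intensity_cd: float             # candela; the description leaves the value open
    color: str
    shadow: Optional[bytes] = None  # may be omitted if shadows are rendered in real time


# Situation data file 85: situation name -> required (gender, age_range, build) specs.
SITUATIONS: Dict[str, List[Tuple[str, Tuple[int, int], str]]] = {
    "family": [
        ("female", (30, 59), "average"),
        ("male", (30, 59), "average"),
        ("male", (5, 15), "average"),
        ("female", (5, 15), "average"),
    ],
}


def models_for_situation(db: List[PersonModel], situation: str) -> List[PersonModel]:
    """Pick one matching person model from the DB for each spec in the situation."""
    picked = []
    for gender, age_range, build in SITUATIONS[situation]:
        match = next(m for m in db
                     if (m.gender, m.age_range, m.build) == (gender, age_range, build))
        picked.append(match)
    return picked


DB = [
    PersonModel("f_30_59_avg", "female", (30, 59), "average"),
    PersonModel("m_30_59_avg", "male", (30, 59), "average"),
    PersonModel("m_05_15_avg", "male", (5, 15), "average"),
    PersonModel("f_05_15_avg", "female", (5, 15), "average"),
    PersonModel("f_30_39_slim_tall", "female", (30, 39), "slim_tall"),
]
# 1000.0 is a placeholder intensity, not a value taken from the description.
FLUORESCENT = LightEnvironment("downward from overhead", 1000.0, "white")
```

Calling `models_for_situation(DB, "family")` returns the four person models that would be combined into one "family" image; leaving `shadow` as `None` corresponds to the case where shadows are computed by real-time rendering.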

Accordingly, if, for example, the user selects the selection button 87a for the situation "many families" on the situation selection menu screen 86 in the selection processing of S22, the 3D model image superimposition processing of S23 proceeds as follows. The 3D model superimposing unit 81 reads from the situation data file 85 the situation data corresponding to "families": "one woman of average build in her thirties to fifties + one man of average build in his thirties to fifties + one boy of average build aged 5 to 15 + one girl of average build aged 5 to 15." Based on this situation data, the 3D model superimposing unit 81 reads from the 3D model DB 83 the 3D model image of one "woman of average build in her thirties to fifties," the 3D model image of one "man of average build in his thirties to fifties," the 3D model image of one "boy of average build aged 5 to 15," and the 3D model image of one "girl of average build aged 5 to 15," and combines them to create a 3D model image of a "family." Next, the 3D model superimposing unit 81 adjusts the "family" 3D model image according to the orientation and position of the camera 2 estimated by the camera position and direction estimation unit 44, renders the adjusted image under a light environment read at random from the environment data file 84 (a random light source "position and orientation / light intensity / color" and shadow), superimposes it on the input image from the camera 2, and, as shown in FIG. 20, displays the superimposed image at the position of the line 89 on the touch panel 14. The orientation of the persons in the superimposed image (the direction in which the persons in the 3D model image are drawn) changes according to the direction of the line 89. When the 3D model image contains multiple persons, they do not all face the same way; each person's orientation differs randomly by up to a few tens of degrees. For example, when the line 89 in FIGS. 18 and 20 is drawn in the vertical direction of these figures, the persons 91a to 91d in the superimposed image are drawn as seen from the side, but not all in the same orientation; there is some variation from person to person.
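The per-person variation in orientation can be illustrated with a short sketch. The function name, the uniform distribution, and the 30-degree default are assumptions; the description only says that the orientations differ randomly by a few tens of degrees, and it does not say how the base heading is derived from the line 89, so the base heading is simply taken as an input here.

```python
import random
from typing import List, Optional


def jittered_headings(base_heading_deg: float,
                      n_people: int,
                      max_jitter_deg: float = 30.0,
                      rng: Optional[random.Random] = None) -> List[float]:
    """Return one heading per person: the base heading plus a random offset.

    base_heading_deg is assumed to have been derived from the direction of
    line 89 by some means not specified in the description.
    """
    rng = rng or random.Random()
    return [
        (base_heading_deg + rng.uniform(-max_jitter_deg, max_jitter_deg)) % 360.0
        for _ in range(n_people)
    ]


# Four family members, all roughly side-on (90 degrees) but not identical.
headings = jittered_headings(90.0, 4, rng=random.Random(0))
```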

The first superimposed image of S23 is the image obtained by generating the 3D model image corresponding to the customer demographic of the selection button 87 that the user selected first on the situation selection menu screen 86 (for example, the "family" 3D model image) under the first light environment read at random from the environment data file 84, and superimposing it on the input image from the camera 2. When display of this image is complete, the CPU 12 (notification unit 43) of the smartphone 1 displays a message such as "Please adjust the installation position and installation angle of the camera" to prompt the user to adjust the installation position and installation angle of the camera 2. After the user has made this adjustment, the CPU 12 of the smartphone 1 acquires a (frame) image of the current camera video and re-estimates the installation position and installation angle of the camera 2 from that image (S24).

The image analysis unit 41 of the smartphone 1 then executes the script of the image analysis application 21 and, at the current installation position and installation angle of the camera 2, performs face detection processing (by the face detection model 22) and gender and age estimation processing (by the face recognition model 23) on the 3D model persons in the superimposed image (S25). After this, the CPU 12 (inference accuracy measuring unit 42) of the smartphone 1 measures the accuracy of the gender and age estimation inference performed by the face recognition model 23 (S26).

If, as a result of the measurement in S26, the measured inference accuracy is below a predetermined value (that is, the probability of classification as either male or female is below the predetermined value, or the probability of classification into any age (or age group) is below the predetermined value), the CPU 12 (notification unit 43) of the smartphone 1 determines that inference by the face recognition model 23 has failed (NO in S27) and instructs the user to move the camera 2 to an installation position and installation angle suitable for face recognition (gender and age estimation), taking the influence of the light source into account (S28). The case where the measured inference accuracy is below the predetermined value includes the case where the gender and age estimated by the face recognition model 23 for any of the 3D model persons in the superimposed image (for example, the 3D model persons 91a to 91d in FIG. 20) differ from the gender and age set for that 3D model person.
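The S27 success criterion described above can be sketched as a small predicate. The type names, the 0.8 threshold, and the representation of an undetected face as `None` are illustrative assumptions; the description speaks only of "a predetermined value" and does not define a data format.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Expected:
    """Gender and age group that were set for a 3D model person."""
    gender: str
    age_group: str


@dataclass
class Prediction:
    """Class probabilities output by the gender/age model for one face."""
    gender_probs: Dict[str, float]
    age_probs: Dict[str, float]


def person_ok(pred: Prediction, expected: Expected, threshold: float) -> bool:
    """True if both top-class probabilities reach the threshold and match the set attributes."""
    gender, g_prob = max(pred.gender_probs.items(), key=lambda kv: kv[1])
    age, a_prob = max(pred.age_probs.items(), key=lambda kv: kv[1])
    confident = g_prob >= threshold and a_prob >= threshold
    return confident and gender == expected.gender and age == expected.age_group


def inference_succeeded(preds: List[Optional[Prediction]],
                        expected: List[Expected],
                        threshold: float = 0.8) -> bool:
    """S27: succeed only if every 3D model person was detected and correctly classified.

    A person whose face was not detected is represented by None (an assumption).
    """
    if len(preds) != len(expected):
        return False
    return all(p is not None and person_ok(p, e, threshold)
               for p, e in zip(preds, expected))
```

With this shape, a single undetected or misclassified 3D model person is enough for the whole superimposed image to count as a failure, which is what triggers the instruction of S28.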

Specifically, suppose that, as shown in FIG. 20, at the current installation position and installation angle of the camera 2 the face of the person 91c on the line 89 can be detected and recognized (gender and age estimated), but the faces of the persons 91a, 91b, and 91d on the line 89 cannot. In that case, the notification unit 43 of the smartphone 1 displays on the touch panel 14 the message 93 shown in FIG. 21: "Please move the camera to a position and angle suitable for recognizing everyone's face, taking the influence of the light source into account." In FIGS. 20 to 23, 92a to 92d are the bounding boxes corresponding to the faces of the 3D model persons 91a to 91d, respectively. When inference by the face recognition model 23 fails (NO in S27), the instruction that the notification unit 43 gives the user may instead direct the user to move the camera 2 to a specific installation position and installation angle suitable for face detection (taking the influence of the light source into account), based on the installation position and installation angle of the camera 2 estimated in S24 and the position of the person specified on the touch panel 14 (the position of the line 89). In this case, the instruction is a message similar to the message 76 shown in FIG. 12 of the first embodiment (for example, "Taking the influence of the light source into account, raise the camera and tilt it toward the horizontal").

When the user, following the instruction of S28, moves the camera 2 (adjusts its position and angle) to an installation position and installation angle suitable for face recognition (gender and age estimation), taking the influence of the light source into account (S29), the CPU 12 of the smartphone 1 again executes the estimation of the installation position and installation angle of the camera 2 (S24), the face detection processing and the gender and age estimation processing by the face recognition model 23 (S25), and the measurement of the accuracy of the gender and age estimation inference by the face recognition model 23 (S26). The accuracy of this inference is regarded as reaching the predetermined value when two conditions hold: the probability of classification as one of male or female and the probability of classification into one age (or age group) are both at or above the predetermined value, and the gender and age estimated by the face recognition model 23 for each 3D model person (face) in the superimposed image match the gender and age set for that 3D model person. When this is the case, the CPU 12 of the smartphone 1 determines that inference by the face recognition model 23 has succeeded for the 3D model persons in the currently displayed superimposed image (YES in S27) and proceeds to the determination of S30.

In the determination of S30, it is determined whether inference by the face recognition model 23 has succeeded for the 3D model persons in the superimposed images of all patterns. Here, "the superimposed images of all patterns" means all of the images obtained by generating the 3D model images corresponding to the customer demographics of all selection buttons 87 that the user selected on the situation selection menu screen 86, under four to five light environments (data) read at random from the environment data file 84, and superimposing these 3D model images on the input image from the camera 2. In other words, it means all of the images obtained by generating each of the 3D model images corresponding to the multiple customer demographics expected at the installation location of the camera 2, under four to five random light environments (light source "position and orientation / light intensity / color" and shadow) taken from the light environment data stored in the environment data file 84, and superimposing them on the input image from the camera 2.
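The set of "all patterns" can be enumerated as the product of the selected situations and a random sample of light environments. The sketch below is an illustration under assumptions: the function name, the use of `random.sample` (sampling without replacement), and drawing a single sample of light environments shared by all situations are choices made for the example and are not stated in the description.

```python
import random
from typing import List, Optional, Sequence, Tuple


def all_patterns(selected_situations: Sequence[str],
                 light_environments: Sequence[str],
                 k: int = 4,
                 rng: Optional[random.Random] = None) -> List[Tuple[str, str]]:
    """Enumerate every (situation, light environment) pair to be verified in S30.

    k light environments are drawn at random; the light environment varies
    fastest, so each situation is checked under all k lights before moving on.
    """
    rng = rng or random.Random()
    lights = rng.sample(list(light_environments), k)
    return [(situation, light)
            for situation in selected_situations
            for light in lights]


patterns = all_patterns(
    ["family", "young people"],
    ["L1", "L2", "L3", "L4", "L5", "L6"],  # hypothetical light environment IDs
    k=4,
    rng=random.Random(0),
)
```

With two selected situations and four sampled light environments, this yields eight superimposed images that must all pass before adjustment is considered complete.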

If, in the determination of S30, inference by the face recognition model 23 has not yet been completed (has not succeeded) for the 3D model persons in the superimposed images of all patterns (NO in S30), the 3D model superimposing unit 81 of the SoC 11 changes at least one of the 3D model image of the persons in the superimposed image and the light environment (read at random from the environment data file 84) (S31). Specifically, if superimposed images based on the 3D model image corresponding to the customer demographic of a given selection button 87 selected by the user on the situation selection menu screen 86 have not yet been generated under all of the predetermined number (for example, four to five) of light environments read at random from the environment data file 84, the 3D model superimposing unit 81 leaves the 3D model image of the persons unchanged and changes only the light environment. If, on the other hand, such superimposed images have already been generated under all of the predetermined number of light environments, the 3D model superimposing unit 81 changes the 3D model image of the persons in the superimposed image.

In other words, the 3D model superimposing unit 81 changes over time the attributes (gender, age, and so on) and the number of persons in the 3D model image superimposed on the input image, cycling through the attributes and numbers of persons that correspond to the various customer demographics the user selected on the situation selection menu screen 86. It likewise changes over time the light environment used to generate these 3D model images, cycling through the predetermined number (for example, four to five) of light environments read at random from the environment data file 84. For each superimposed image based on a given 3D model image under a given light environment, the gender and age estimation inference by the face recognition model 23 is repeated.

The CPU 12 of the smartphone 1 repeats the processing of S24 to S31 until, in the determination of S30, inference by the face recognition model 23 has been completed (has succeeded) for the 3D model persons (faces) in the superimposed images of all patterns. When this is the case (YES in S30), the CPU 12 (notification unit 43) of the smartphone 1 displays the message 94 "Adjustment is complete" on the touch panel 14, as shown in FIG. 23 (S32).
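The overall S24 to S32 loop can be summarized in a few lines. This is a simplified sketch, not the flowchart of FIG. 17 itself: `evaluate` and `instruct` are hypothetical callbacks standing in for S24 to S27 and for the notification unit 43, the `max_rounds` safety cap is an addition, and the sketch does not re-check earlier patterns after the camera has been moved, a point the description leaves open.

```python
from typing import Callable, Sequence, TypeVar

P = TypeVar("P")

MOVE_CAMERA = ("Please move the camera to a position and angle suitable for "
               "recognizing everyone's face, taking the influence of the "
               "light source into account")
DONE = "Adjustment is complete"


def run_adjustment(patterns: Sequence[P],
                   evaluate: Callable[[P], bool],
                   instruct: Callable[[str], None],
                   max_rounds: int = 100) -> bool:
    """Walk through all patterns, prompting the user whenever inference fails.

    evaluate(pattern) stands for S24 to S27: re-estimate the camera pose, run
    detection and recognition on the superimposed image, and judge success.
    """
    index = 0
    rounds = 0
    while index < len(patterns):        # S30: are any patterns left?
        if rounds >= max_rounds:        # safety cap (not in the description)
            return False
        rounds += 1
        if evaluate(patterns[index]):
            index += 1                  # S31: switch light environment or 3D model
        else:
            instruct(MOVE_CAMERA)       # S28; the user then moves the camera (S29)
    instruct(DONE)                      # S32
    return True
```

A caller would pass the list of (situation, light environment) patterns and a function that renders and analyzes the corresponding superimposed image; the same loop body could be triggered once per check-button press instead of being run continuously.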

As described above, the notification unit 43 of the smartphone 1 displays on the touch panel 14 instructions (messages 88, 93, etc.) for improving the accuracy of inference by the face recognition model 23 (that is, for supporting the user in adjusting the installation position and installation angle of the camera 2).

The instructions may be displayed in either of two modes: real-time or command-driven. In the real-time mode, the sequence shown in the flowchart of FIG. 17 and elsewhere, "display on the touch panel the superimposed image in which the 3D model image is superimposed on the input image" -> "recognize (the faces in) the superimposed image" -> "display the instruction (to the user)," is carried out in real time (repeated automatically), so the user simply adjusts the installation position and installation angle of the camera 2 according to instructions that are updated in real time. In the command-driven mode, by contrast, the user sets the camera 2 of the smartphone 1 at an approximate position and angle and presses a check button on the smartphone 1, whereupon the same sequence of "display the superimposed image on the touch panel" -> "recognize (the faces in) the superimposed image" -> "display the instruction" is executed. After adjusting the installation position and installation angle of the camera 2 according to this instruction, the user presses the check button again, and the processing (superimposed display of the 3D model image, face recognition, and instruction display) is repeated. The user repeats this cycle of readjusting the camera 2 according to the instruction and pressing the check button until the message 94 "Adjustment is complete" is displayed.

The real-time mode is more user-friendly, but it consumes considerable computing resources of the smartphone 1. For a smartphone 1 without high processing power, the command-driven mode is therefore likely to make the system easier to set up.

As described above, the smartphone 1 and the image analysis application 21 of the second embodiment (hereinafter abbreviated as "the smartphone 1 etc. of the second embodiment") provide the following effects in addition to those of the smartphone 1 and the image analysis application 21 of the first embodiment. In the smartphone 1 etc. of the second embodiment, the image analysis unit 41 analyzes the 3DCG image of a person that the 3D model superimposing unit 81 has superimposed on the input image (frame image) from the camera 2, using one or more trained NN models (the face detection model 22 and the face recognition model 23); the inference accuracy measuring unit 42 measures the accuracy of inference by one of these trained NN models (the face recognition model 23); and the notification unit 43 notifies the user, based on this accuracy, of instruction information for improving the accuracy of the inference. Consequently, unlike the first embodiment, the smartphone 1 etc. of the second embodiment can notify the user of instruction information for improving the inference accuracy based on the accuracy of inference by the trained NN model on 3DCG persons, without a real person having to stand at the shooting position or walk along the line. The user can therefore easily adjust the installation position (shooting position) and installation angle (shooting direction) of the camera 2 of the smartphone 1 without a real person standing at the shooting position or walking along the line.

Further, in the smartphone 1 etc. of the second embodiment, the 3D model superimposition unit 81 changes over time the attributes (gender, age, and so on) and the number of people in the 3D model image superimposed on the input image, so that they match the various customer situations the user selected on the situation selection menu screen 86. The image analysis unit 41 then analyzes, one after another, the various images obtained by superimposing person images with these changing attributes and numbers on the input image. This allows the notification unit 43 to prompt the user to move the camera 2 to an installation position and installation angle that are robust for image analysis (face detection or recognition) across various patterns of attributes and numbers of people.

Further, in the smartphone 1 etc. of the second embodiment, the 3D model superimposition unit 81 not only changes over time the attributes (gender, age, and so on) and the number of people in the 3D model image superimposed on the input image, as described above, but also changes over time the lighting environment used to generate these 3D model images, cycling through a predetermined number (for example, 4 to 5) of lighting environments read at random from the environment data file 84. The image analysis unit 41 then analyzes, one after another, the various superimposed images obtained by superimposing person images with various attributes and numbers on the input image under each of those lighting environments. This allows the notification unit 43 to prompt the user to move the camera 2 to an installation position and installation angle that are robust for analyzing images with various patterns of attributes and numbers of people under various lighting environments.
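The sweep over attributes, numbers of people, and lighting environments described above can be sketched as follows. This is only an illustrative outline, not the application's actual code: the names `Scenario`, `build_scenarios`, and `evaluate_placement`, the string encoding of attributes and lighting, and the `measure_accuracy` callback (standing in for rendering the 3D models, superimposing them, and scoring the face recognition model 23) are all assumptions introduced here.

```python
import itertools
import random
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Scenario:
    """One synthetic test condition (hypothetical representation)."""
    attributes: str   # e.g. "adult female"
    headcount: int    # number of 3D model people to superimpose
    lighting: str     # name of a lighting environment


def build_scenarios(
    attribute_sets: Sequence[str],
    headcounts: Sequence[int],
    lighting_pool: Sequence[str],
    n_lighting: int = 4,
    rng: Optional[random.Random] = None,
) -> List[Scenario]:
    """Pick n_lighting environments at random, then combine each of them
    with every attribute set and headcount."""
    rng = rng or random.Random()
    pool = list(lighting_pool)
    chosen = rng.sample(pool, k=min(n_lighting, len(pool)))
    return [
        Scenario(attrs, count, light)
        for light, attrs, count in itertools.product(
            chosen, attribute_sets, headcounts
        )
    ]


def evaluate_placement(
    scenarios: Sequence[Scenario],
    measure_accuracy: Callable[[Scenario], float],
    threshold: float,
) -> Tuple[float, List[Scenario]]:
    """Score the current camera placement over all scenarios.

    Returns the mean accuracy and the scenarios that fell below threshold;
    a notifier could use the latter to decide whether to prompt the user.
    """
    if not scenarios:
        raise ValueError("at least one scenario is required")
    scores = [(s, measure_accuracy(s)) for s in scenarios]
    failing = [s for s, score in scores if score < threshold]
    mean = sum(score for _, score in scores) / len(scores)
    return mean, failing
```

A caller would supply a `measure_accuracy` function for the current camera placement and prompt the user to move the camera whenever the returned list of failing scenarios is non-empty.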

Modifications:
The present invention is not limited to the configurations of the above embodiments, and various modifications are possible without departing from the gist of the invention. Modifications of the present invention are described next.

Modification 1:
In the first embodiment, the inference accuracy measurement unit 42 measures the accuracy of inference processing by the object detection NN models (the face detection model 22 and the person detection model), and the notification unit 43 notifies the user of instructions for improving the accuracy of inference processing by the object detection NN models. Alternatively, the inference accuracy measurement unit may measure the accuracy of inference processing by object recognition NN models (such as a face recognition model and a vectorization model), and the notification unit may notify the user of instructions for improving the accuracy of inference processing by the object recognition NN models.

Modification 2:
In each of the above embodiments, the instructions for improving the accuracy of inference processing by the NN model are notified to the user by displaying them on the touch panel 14. Alternatively, these instructions may be notified to the user by voice, using the speaker of the smartphone.

Modification 3:
In each of the above embodiments, the instructions (instruction information) displayed on the touch panel 14 to improve the accuracy of inference processing by the NN model are frames in the displayed image (the red frame 61 and the green frame 63) or messages (messages 64, 65, 70, 73, 76, 77, 88, 93, and so on). The instructions displayed on a display device such as a touch panel are not limited to these. For example, a mark or the like on the display screen may be made to blink instead of showing the red frame, and may be lit steadily instead of showing the green frame. Further, instead of displaying a text message indicating the direction in which to move the camera, as shown in FIG. 12, an arrow indicating that direction may be displayed.

Modification 4:
In each of the above embodiments, the smartphone 1 uses the built-in GPU 13 to perform the inference processing of each NN model. Alternatively, the smartphone 1 may perform the inference processing of each NN model using an external DNN inference expansion chip in place of the built-in GPU 13, or using both the built-in GPU 13 and the external DNN inference expansion chip.

Modification 5:
In each of the above embodiments, the information processing terminal recited in the claims is the smartphone 1. The information processing terminal is not limited to this, however, and may be, for example, a tablet computer equipped with a camera.

Modification 6:
In the second embodiment, the "attributes of a person" recited in the claims are mainly gender, age, and body shape (physique, such as height). The "attributes of a person" are not limited to these, however, and may include, for example, clothing and belongings. Further, in the second embodiment, the attributes and number of people in the 3D model image superimposed on the input image, and the lighting environment used to generate that 3D model image, are changed one after another. In addition to these, the orientation of the people in the 3D model image superimposed on the input image may also be changed one after another. This allows the notification unit 43 to prompt the user to move the camera 2 to an installation position and installation angle that are robust against various patterns of human orientation.

Modification 7:
In the second embodiment, as shown in the flowchart of FIG. 17 and elsewhere, the following sequence is repeated: "display on the touch panel a superimposed image in which the 3D model image is superimposed on the input image" -> "recognize the faces in the superimposed image" -> "display instructions (guidance to the user)". Consequently, the displayed 3D model person image and the lighting environment change, for example, every 200 milliseconds, so the image displayed on the touch panel flickers considerably. To avoid this flicker, the superimposed image, in which a 3D model image rendered under a random lighting environment is superimposed on the input image from the camera, need not actually be displayed on the touch panel. Instead, it may be output to the off-screen memory (off-screen buffer) in the graphics memory 82 shown in FIG. 16, and the image analysis unit (its face recognition model) may recognize the superimposed image (the faces in it) stored in that off-screen memory.
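The off-screen approach of Modification 7 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the graphics memory 82: frames are modeled as plain nested lists of grayscale values, the class `OffscreenPipeline` and the function `composite` are hypothetical names, and showing the unmodified camera frame on screen is an assumption (the text only says the superimposed image is not displayed).

```python
from typing import Callable, List, Optional

Frame = List[List[int]]  # grayscale pixels, row-major (simplified stand-in)


def composite(camera_frame: Frame, overlay: Frame, mask: Frame) -> Frame:
    """Return a new frame in which overlay pixels replace camera pixels
    wherever the mask is non-zero. The camera frame is left untouched."""
    return [
        [o if m else c for c, o, m in zip(c_row, o_row, m_row)]
        for c_row, o_row, m_row in zip(camera_frame, overlay, mask)
    ]


class OffscreenPipeline:
    """Keeps the superimposed image away from the screen (hypothetical)."""

    def __init__(self, analyze: Callable[[Frame], float]) -> None:
        self.analyze = analyze
        self.offscreen: Optional[Frame] = None  # stand-in for the off-screen buffer
        self.displayed: Optional[Frame] = None  # what the touch panel would show

    def step(self, camera_frame: Frame, overlay: Frame, mask: Frame) -> float:
        # The rapidly changing superimposed image goes to the off-screen buffer only.
        self.offscreen = composite(camera_frame, overlay, mask)
        # Assumption: the screen keeps showing the live camera image, so it
        # does not flicker when the 3D models or lighting change.
        self.displayed = camera_frame
        # Recognition runs on the off-screen copy.
        return self.analyze(self.offscreen)
```

Because the analysis reads from the off-screen copy, the rate at which 3D models and lighting environments are swapped no longer affects what the user sees.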

1 Smartphone (information processing terminal)
2 Camera
4 Smartphone management server (management server)
10 Image analysis system
14 Touch panel (display device, pointing device)
21 Image analysis application (image analysis program)
22 Face detection model (trained object detection neural network model)
41 Image analysis unit
42 Inference accuracy measurement unit
43 Notification unit
44 Camera position direction estimation unit
61 Red frame (part of the "instruction information")
63 Green frame (part of the "instruction information")
64, 65, 70, 73, 76, 77, 88, 93 Messages (part of the "instruction information")
81 3D model superimposition unit (image superimposition unit)
91a-91d 3D model persons (images of people or faces)

To solve the above problem, an image analysis program according to a first aspect of the present invention causes an information processing terminal equipped with a camera to function as: an image analysis unit that analyzes an input image from the camera using one or more trained neural network models, including a trained object detection neural network model for detecting an object appearing in the input image from the camera; an inference accuracy measurement unit that measures the accuracy of inference processing by any one of the one or more trained neural network models; a camera position direction estimation unit that estimates the installation position and installation angle of the camera; a display device for displaying the input image from the camera; an operation unit for designating a detection position of the object on the input image from the camera displayed on the display device; and a notification unit that, when the accuracy of inference processing measured by the inference accuracy measurement unit is less than a predetermined value, notifies the user of instruction information for improving the accuracy of the inference processing, based on the installation position and installation angle of the camera estimated by the camera position direction estimation unit and the detection position of the object designated by the operation unit.

In this image analysis program, the notification unit may notify the user of the instruction information by displaying it on the display device.

An information processing terminal according to a third aspect of the present invention comprises: a camera; an image analysis unit that analyzes an input image from the camera using one or more trained neural network models, including a trained object detection neural network model for detecting an object appearing in the input image from the camera; an inference accuracy measurement unit that measures the accuracy of inference processing by any one of the one or more trained neural network models; a camera position direction estimation unit that estimates the installation position and installation angle of the camera; a display device for displaying the input image from the camera; an operation unit for designating a detection position of the object on the input image from the camera displayed on the display device; and a notification unit that, when the accuracy of inference processing measured by the inference accuracy measurement unit is less than a predetermined value, notifies the user of instruction information for improving the accuracy of the inference processing, based on the installation position and installation angle of the camera estimated by the camera position direction estimation unit and the detection position of the object designated by the operation unit.

In this information processing terminal, the notification unit may notify the user of the instruction information by displaying it on the display device.

In this information processing terminal, the operation unit may be a pointing device, and the notification unit may derive the instruction information based on, in addition to the accuracy of inference processing measured by the inference accuracy measurement unit, the installation position and installation angle of the camera estimated by the camera position direction estimation unit and the detection position of the person or face designated by the pointing device.

According to the image analysis program of the second aspect and the information processing terminal of the fourth aspect of the present invention, the accuracy of inference processing by any one of the one or more trained neural network models is measured, and the user is notified of instruction information for improving that accuracy based on the measured accuracy. By adjusting the installation position (shooting position) and installation angle (shooting direction) of the camera of the information processing terminal according to this instruction information, the user can obtain appropriate inference accuracy from the trained neural network model (a trained object detection neural network or a trained object recognition neural network) without specialized technical staff visiting the installation site to adjust the camera's installation position and installation angle. The user can therefore install the camera easily.
Further, according to the image analysis program of the first aspect and the information processing terminal of the third aspect of the present invention, the accuracy of inference processing by any one of the one or more trained neural network models is measured, and when the measured accuracy is less than a predetermined value, the user is notified of instruction information for improving the accuracy of the inference processing, based on the estimated installation position and installation angle of the camera and the detection position of the object designated through the operation unit. By adjusting the installation position (shooting position) and installation angle (shooting direction) of the camera of the information processing terminal according to this instruction information, the user can obtain appropriate inference accuracy from the trained neural network model (a trained object detection neural network or a trained object recognition neural network) without specialized technical staff visiting the installation site to adjust the camera's installation position and installation angle. The user can therefore install the camera easily.
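The threshold-gated notification of the first and third aspects can be illustrated with a short sketch. The text above does not specify how the instruction is derived from the camera pose and the designated detection position, so the rule below (pan toward the designated point, reduce a steep downward tilt) is purely a hypothetical example; `CameraPose`, `make_instruction`, and the `max_down_tilt_deg` parameter are likewise assumptions introduced for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class CameraPose:
    """Estimated installation of the camera (hypothetical representation)."""
    height_m: float    # installation height above the floor
    pitch_deg: float   # negative values mean the camera is tilted downward


def make_instruction(
    accuracy: float,
    threshold: float,
    pose: CameraPose,
    target_xy: Tuple[float, float],
    frame_size: Tuple[int, int],
    max_down_tilt_deg: float = 30.0,
) -> Optional[str]:
    """Return an instruction string, or None when accuracy is sufficient.

    target_xy is the detection position the user designated on the displayed
    image; frame_size is (width, height) in pixels. The rules are assumed.
    """
    # No notification when the measured accuracy reaches the predetermined value.
    if accuracy >= threshold:
        return None

    x, _y = target_xy
    width, _height = frame_size
    hints = []

    # Assumed rule: bring the designated position toward the frame center.
    if x < width / 3:
        hints.append("turn the camera to the left")
    elif x > 2 * width / 3:
        hints.append("turn the camera to the right")

    # Assumed rule: a steep downward angle makes faces harder to analyze.
    if pose.pitch_deg < -max_down_tilt_deg:
        hints.append("lower the camera or reduce its downward tilt")

    if not hints:
        hints.append("move the camera closer to the designated position")
    return "; ".join(hints)
```

The returned string could be shown on the display device or, as in Modification 2, read aloud through the speaker.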

Claims

An image analysis program for causing an information processing terminal equipped with a camera to function as:
an image analysis unit that analyzes an input image from the camera using one or more trained neural network models, including a trained object detection neural network model for detecting an object appearing in the input image from the camera;
an inference accuracy measurement unit that measures the accuracy of inference processing by any one of the one or more trained neural network models; and
a notification unit that notifies a user of instruction information for improving the accuracy of the inference processing, based on the accuracy of the inference processing measured by the inference accuracy measurement unit.

The image analysis program according to claim 1, wherein the object detected by the trained object detection neural network model is a person or a face, and
the instruction information prompts the user to move the camera to an installation position or installation angle suitable for detecting the person or face with the trained object detection neural network model.

The image analysis program according to claim 1 or 2, wherein the notification unit notifies the user of the instruction information by displaying the instruction information on a display device.

An image analysis program for causing an information processing terminal equipped with a camera to function as:
an image superimposition unit that superimposes an image of a person or face on an input image from the camera;
an image analysis unit that analyzes the image of the person or face superimposed by the image superimposition unit, using one or more trained neural network models;
an inference accuracy measurement unit that measures the accuracy of inference processing by any one of the one or more trained neural network models; and
a notification unit that notifies a user of instruction information for improving the accuracy of the inference processing, based on the accuracy of the inference processing measured by the inference accuracy measurement unit.

The image analysis program according to claim 4, wherein the instruction information prompts the user to move the camera to an installation position or installation angle suitable for performing the inference processing by said one of the trained neural network models.

The image analysis program according to claim 4 or 5, wherein the notification unit notifies the user of the instruction information by displaying the instruction information on a display device.

The image analysis program according to any one of claims 4 to 6, wherein the image superimposition unit changes, over time, the attributes and number of people in the image of people or faces superimposed on the input image to various attributes and numbers,
the image analysis unit analyzes, one after another, the various images obtained by superimposing on the input image the images of people or faces whose attributes and number have been changed over time, and
the notification unit notifies the user of instruction information for improving the accuracy of the inference processing, based on the accuracy of inference processing measured by the inference accuracy measurement unit for those various images.

An information processing terminal comprising:
a camera;
an image analysis unit that analyzes an input image from the camera using one or more trained neural network models, including a trained object detection neural network model for detecting an object appearing in the input image from the camera;
an inference accuracy measurement unit that measures the accuracy of inference processing by any one of the one or more trained neural network models; and
a notification unit that notifies a user of instruction information for improving the accuracy of the inference processing, based on the accuracy of the inference processing measured by the inference accuracy measurement unit.

The information processing terminal according to claim 8, wherein the object detected by the trained object detection neural network model is a person or a face, and
the instruction information prompts the user to move the camera to an installation position or installation angle suitable for detecting the person or face with the trained object detection neural network model.

The information processing terminal according to claim 9, further comprising a display device for displaying the input image from the camera,
wherein the notification unit notifies the user of the instruction information by displaying the instruction information on the display device.

The information processing terminal according to claim 10, further comprising:
a camera position direction estimation unit that estimates the installation position and installation angle of the camera based on the input image from the camera; and
a pointing device for designating a detection position of the person or face on the input image from the camera displayed on the display device,
wherein the notification unit derives the instruction information based on, in addition to the accuracy of the inference processing measured by the inference accuracy measurement unit, the installation position and installation angle of the camera estimated by the camera position direction estimation unit and the detection position of the person or face designated by the pointing device.

An information processing terminal comprising:
a camera;
an image superimposition unit that superimposes an image of a person or face on an input image from the camera;
an image analysis unit that analyzes the image of the person or face superimposed by the image superimposition unit, using one or more trained neural network models;
an inference accuracy measurement unit that measures the accuracy of inference processing by any one of the one or more trained neural network models; and
a notification unit that notifies a user of instruction information for improving the accuracy of the inference processing, based on the accuracy of the inference processing measured by the inference accuracy measurement unit.

The information processing terminal according to claim 12, wherein the instruction information prompts the user to move the camera to an installation position or installation angle suitable for performing the inference processing by said one of the trained neural network models.

The information processing terminal according to claim 13, further comprising a display device that displays the input image from the camera,
wherein the notification unit notifies the user of the instruction information by displaying the instruction information on the display device.

The information processing terminal according to any one of claims 12 to 14, wherein the image superimposition unit changes, over time, the attributes and number of people in the image of people or faces superimposed on the input image to various attributes and numbers,
the image analysis unit analyzes, one after another, the various images obtained by superimposing on the input image the images of people or faces whose attributes and number have been changed over time, and
the notification unit notifies the user of instruction information for improving the accuracy of the inference processing, based on the accuracy of inference processing measured by the inference accuracy measurement unit for those various images.

An image analysis system comprising:
the information processing terminal according to any one of claims 8 to 15; and
a management server that manages the information processing terminal, including installation of the trained neural network models on the information processing terminal.