JP6989873B2

JP6989873B2 - System, image recognition method, and computer

Info

Publication number: JP6989873B2
Application number: JP2018003548A
Authority: JP
Inventors: 意強戚; 海元呉; 謙陳
Original assignee: WAKAYAMA UNIVERSITY; Hitachi Solutions Ltd
Current assignee: WAKAYAMA UNIVERSITY; Hitachi Solutions Ltd
Priority date: 2018-01-12
Filing date: 2018-01-12
Publication date: 2022-01-12
Anticipated expiration: 2038-01-12
Also published as: JP2019125031A

Description

The present invention relates to a system, a method, and a computer for performing image recognition.

Techniques for tracking an object such as a person or a vehicle in image data such as a moving image are known (see, for example, Patent Document 1).

Patent Document 1 describes an apparatus comprising "an acquisition unit that sequentially acquires images; an extraction unit that extracts a partial region from an image and extracts a feature amount from that partial region; a recognition unit that recognizes whether the partial region is the target object, based on the extracted feature amount and on either a first recognition model, which includes feature amounts of positive examples representing the target object and feature amounts of negative examples representing the background of the target object, or a second recognition model, which includes feature amounts of positive examples; an update unit that, based on the recognition result, updates the first recognition model by adding the extracted feature amount to it; and an output unit that outputs the object region recognized as the target object, wherein the recognition unit performs recognition based on the first recognition model when an object region has been output for the previous image acquired by the acquisition unit, and performs recognition based on the second recognition model when no object region has been output for the previous image."

Patent Document 1: Japanese Unexamined Patent Publication No. 2012-238119

In a conventional image recognition system, a recognition model (classifier) for recognizing the tracking target must be prepared in advance in order to track that target. In general, the recognition model must be generated through pre-learning on a large amount of learning data, and the pre-learning takes time to complete.

The present invention provides a system, a method, and a computer that, when instructed to start tracking a tracking target, can start tracking promptly without performing pre-learning.

A representative example of the invention disclosed in this application is as follows. It is a system comprising a recognition server that executes recognition processing for tracking a tracking target included in image data, and a learning server that trains a classifier used in the recognition processing. The recognition server has a first arithmetic unit, a first storage device connected to the first arithmetic unit, and a first interface connected to the first arithmetic unit. The learning server has a second arithmetic unit, a second storage device connected to the second arithmetic unit, and a second interface connected to the second arithmetic unit. The recognition server receives initial image data in which the tracking target is designated, divides the initial image data into a plurality of blocks each smaller than the tracking target, calculates the feature amount of each block included in the initial image data, generates initial learning data by attaching to the feature amount of each block a teacher signal indicating whether the block contains a part of the tracking target, and transmits the initial learning data to the learning server. The learning server performs learning using the initial learning data, thereby generating a classifier that classifies each block included in received image data as either a block containing a part of the tracking target or a block not containing the tracking target, and transmits the classifier to the recognition server. The recognition server uses the classifier to execute the recognition processing on received image data and outputs the result of the recognition processing.

According to one aspect of the present invention, tracking of a tracking target can be started quickly without performing pre-learning on a large amount of learning data. Problems, configurations, and effects other than those described above will become apparent from the following description of the embodiments.

FIG. 1 is a diagram showing a configuration example of the computer system of the first embodiment.

FIG. 2 is a diagram showing an example of the hardware configuration and software configuration of the recognition server of the first embodiment.

FIG. 3A is a flowchart illustrating the initial learning process executed by the recognition module of the first embodiment.

FIG. 3B is a flowchart illustrating the process, executed by the recognition module of the first embodiment, of tracking the tracking target after initial learning is complete.

FIG. 4 is a diagram showing an example of image data, input to the recognition server of the first embodiment, in which a tracking target area is set.

FIG. 5 is a diagram showing an example of the learning data generated by the recognition server of the first embodiment.

FIG. 6 is a flowchart illustrating the process executed by the learning module of the first embodiment.

Hereinafter, embodiments of the present invention will be described with reference to the accompanying drawings. Components common to the figures are given the same reference numerals.

FIG. 1 is a diagram showing a configuration example of the computer system of the first embodiment.

The computer system is composed of a recognition server 100, a learning server 101, an image pickup device 105, and a database 106. The computer system may instead include only one of the image pickup device 105 and the database 106.

The recognition server 100, the learning server 101, the image pickup device 105, and the database 106 are connected to one another directly or via a network. The present invention is not limited to any particular type of network; examples include a LAN (Local Area Network) and a WAN (Wide Area Network). The network connection may be either wired or wireless.

The image pickup device 105 is a device such as a camera, and transmits captured images as image data 150 to the recognition server 100 or the database 106. The database 106 stores the image data 150.

The recognition server 100 has a recognition module 110 that executes recognition processing. The recognition module 110 acquires image data 150 from the image pickup device 105 or the database 106 and executes recognition processing on the acquired image data 150. The recognition module 110 also generates learning data from the image data 150 and transmits it to the learning server 101.

In this embodiment, the recognition processing is processing for tracking a tracking target included in the image data.

The learning server 101 has a learning module 120 that performs learning by deep learning. The learning module 120 uses the learning data to train the classifier used in the recognition processing, and transmits the trained classifier to the recognition server 100.

The classifier is represented as data such as a function or a matrix. In this embodiment, at the time the image data 150 in which the tracking target is designated is input, no classifier for recognizing that tracking target has yet been trained. As described later, the learning module 120 trains the classifier using initial learning data generated from the image data 150 in which the tracking target is designated.

In the following description, the image data 150 in which the tracking target is designated is referred to as the initial image data 150.

Although the recognition server 100 and the learning server 101 are shown as physically separate computers, they may be implemented as virtual machines running on a single computer.

FIG. 2 is a diagram showing an example of the hardware configuration and software configuration of the recognition server 100 of the first embodiment.

As hardware, the recognition server 100 has an arithmetic unit 201, a storage device 202, a network interface 203, and an I/O interface 204. These hardware components are connected to one another via an internal bus or the like.

The hardware configuration of the learning server 101 is assumed to be the same as that of the recognition server 100.

The arithmetic unit 201 executes programs stored in the storage device 202. By executing processing in accordance with a program, the arithmetic unit 201 operates as a module that realizes a specific function. In the following description, when processing is described with a module as the subject, it means that the arithmetic unit 201 is executing the program that realizes that module.

The storage device 202 stores the programs executed by the arithmetic unit 201 and the information used by those programs. The storage device 202 also includes a work area used temporarily by the programs. The storage device 202 may be, for example, a memory.

The storage device 202 of the first embodiment stores a program that realizes the recognition module 110. The image data 150 to be processed is also stored temporarily in the storage device 202.

The network interface 203 is an interface for connecting to other devices via a network.

The I/O interface 204 is an interface for connecting to an input device and an output device (neither shown). The input device is, for example, a keyboard, a mouse, or a touch panel. The output device is, for example, a touch panel or a display.

The recognition module 110 of the recognition server 100 may be divided by function into a plurality of modules.

FIGS. 3A and 3B are flowcharts illustrating the processing executed by the recognition module 110 of the first embodiment. FIG. 3A is a flowchart illustrating the initial learning process executed by the recognition module. FIG. 3B is a flowchart illustrating the process, executed by the recognition module, of tracking the tracking target after initial learning is complete. FIG. 4 is a diagram showing an example of image data 150, input to the recognition server 100 of the first embodiment, in which a tracking target area is set. FIG. 5 is a diagram showing an example of the process of generating learning data in the first embodiment.

First, the recognition module 110 accepts input of the initial image data 150 (step S101). In this embodiment, a single piece of initial image data 150 is assumed.

The method of designating the tracking target is described here with reference to FIG. 4. The user refers to the image data 150 displayed on the output device and designates the tracking target by setting an area that surrounds it. In the following description, the area in which the tracking target exists is referred to as the tracking target area.

The recognition module 110 accepts input of the initial image data 150 in which the tracking target is designated by the tracking target area. At this time, the recognition module 110 holds the center position, size, and so on of the tracking target area as the setting information of the tracking target area.

In the following description, the portion of the image data 150 that is not the tracking target is referred to as the background, and the area other than the tracking target area is referred to as the background area.

The tracking target may also be designated after the recognition module 110 has accepted the initial image data 150. For example, when the recognition module 110 accepts image data 150 for the first time, it displays a screen such as that shown in FIG. 4 on the output device and prompts the user to set the tracking target area.

In this embodiment the number of pieces of initial image data 150 is one, but there may be more than one. The number of pieces of initial image data 150 is, however, assumed to be sufficiently smaller than the number of pieces of learning data used in conventional machine learning.

Next, the recognition module 110 executes initialization processing (step S102).

Specifically, the recognition module 110 initializes a storage area for blocks classified as the tracking target and a storage area for blocks classified as the background. In the following description, the storage area for blocks classified as the tracking target is referred to as the first storage area, and the storage area for blocks classified as the background is referred to as the second storage area.

Next, the recognition module 110 generates initial learning data from the initial image data 150 and transmits it to the learning server 101 (step S103). Specifically, the recognition module 110 executes the following processing.

(Process 1) The recognition module 110 divides the initial image data 150 into areas of an arbitrary size, as shown in FIG. 5. In the following description, one such area is referred to as a block. The block size is set to be at least one pixel and smaller than the tracking target. For example, when the image data is 640 × 480 pixels, the recognition module 110 generates 300 blocks (20 × 15) by dividing the image vertically and horizontally in units of 32 pixels. The division setting is assumed to be configured in advance, but it can be changed to any value. The rectangles inside a block in FIG. 5 represent pixels.

(Process 2) The recognition module 110 calculates the feature amount of each block. For example, the combination of the RGB values of the pixels included in a block, as shown in equation (1), is calculated as the feature amount f_i of the block. In this case, the feature amount F of the image data 150 is expressed as in equation (2).

Here, i is the identification number of a block and takes values from 1 to n, where n is an integer greater than 1. x and y are coordinates indicating the relative position of a pixel within the block.

(Process 3) The recognition module 110 attaches a teacher signal indicating the tracking target to each block included in the tracking target area, and attaches a teacher signal indicating the background to each block included in the background area. In this embodiment, a block that is only partly inside the tracking target area is treated as a block included in the tracking target area. In the following description, the teacher signal indicating the tracking target is referred to as the first teacher signal, and the teacher signal indicating the background is referred to as the second teacher signal.

Here, when the teacher signal t_i attached to the block whose identification number is i is defined as in equation (3), the initial learning data S is given by equation (4).

(Process 4) The recognition module 110 transmits the initial learning data S of equation (4) to the learning server 101. This concludes the description of step S103.
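Processes 1 to 3 can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the implementation of the embodiment: the image is a nested list of RGB tuples, rectangles are `(left, top, width, height)`, the first and second teacher signals are encoded as 1 and 0, and the function names are hypothetical. Transmission to the learning server (Process 4) is left out.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]        # (R, G, B)
Image = List[List[Pixel]]           # image[y][x]
Rect = Tuple[int, int, int, int]    # (left, top, width, height); assumed layout


def split_into_blocks(image: Image, block: int) -> List[Tuple[Rect, List[int]]]:
    """Processes 1 and 2: tile the image and flatten each tile's RGB values into f_i."""
    height, width = len(image), len(image[0])
    blocks = []
    for top in range(0, height, block):
        for left in range(0, width, block):
            w = min(block, width - left)    # edge tiles may be narrower
            h = min(block, height - top)
            feature = [c
                       for y in range(top, top + h)
                       for x in range(left, left + w)
                       for c in image[y][x]]
            blocks.append(((left, top, w, h), feature))
    return blocks


def overlaps(a: Rect, b: Rect) -> bool:
    """True when the two rectangles share at least part of their area."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


def build_initial_learning_data(image: Image, target_area: Rect, block: int):
    """Process 3: pair each f_i with t_i (1 = tracking target, 0 = background; assumed values)."""
    return [(feature, 1 if overlaps(rect, target_area) else 0)
            for rect, feature in split_into_blocks(image, block)]
```

With a 640 × 480 image and 32-pixel blocks this produces 300 pairs, and a block that only partly overlaps the tracking target area still receives the first teacher signal, as the text specifies.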

Next, the recognition module 110 determines whether the initial learning has finished (step S104).

Specifically, the recognition module 110 determines whether it has received the classifier generated by the learning server 101. The classifier of this embodiment classifies each block as either the tracking target or the background.

If it is determined that the initial learning has not finished, the recognition module 110 enters a waiting state until the initial learning finishes.

If it is determined that the initial learning has finished, the recognition module 110 acquires new image data 150 (step S105) and starts loop processing over the blocks (step S106).

Specifically, the recognition module 110 divides the acquired image data 150 into blocks and calculates the feature amount of each block. The recognition module 110 then selects a target block from among the blocks. The recognition module 110 also sets a search area based on the setting information of the tracking target area. In this embodiment, the center position of the search area is set to the center position of the tracking target area, and the size of the search area is set to be larger than the size of the tracking target area. For example, the size of the search area is set to 1.5 times the size of the tracking target area. The recognition module 110 temporarily holds the setting information of the search area.

The recognition module 110 determines whether the target block is included in the search area (step S107).

Specifically, the recognition module 110 makes this determination based on the position of the target block in the image data 150 and the setting information of the search area.
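The search-area setup and the check in step S107 can be illustrated with a short sketch. The `Area` class and both function names are hypothetical, and the sketch assumes that a block counts as "included" when any part of it overlaps the search area; the text does not say whether partial overlap is sufficient.

```python
from dataclasses import dataclass


@dataclass
class Area:
    cx: float       # center x
    cy: float       # center y
    width: float
    height: float


def make_search_area(target: Area, scale: float = 1.5) -> Area:
    """Same center as the tracking target area, enlarged by `scale` (1.5 in the example)."""
    return Area(target.cx, target.cy, target.width * scale, target.height * scale)


def block_in_area(left: float, top: float, size: float, area: Area) -> bool:
    """Step S107 under the partial-overlap assumption, for a square block."""
    a_left = area.cx - area.width / 2
    a_top = area.cy - area.height / 2
    return (left < a_left + area.width and a_left < left + size
            and top < a_top + area.height and a_top < top + size)
```

For a tracking target area of 40 × 60 centered at (100, 100), the search area becomes 60 × 90 around the same center, so a 32-pixel block at (64, 32) is inside it and a block at (0, 0) is not.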

If it is determined that the target block is not included in the search area, the recognition module 110 classifies the target block as the background and stores it in the second storage area (step S111). The recognition module 110 then proceeds to step S112.

If it is determined that the target block is included in the search area, the recognition module 110 executes the recognition processing (step S108).

Specifically, the recognition module 110 inputs the feature amount of the target block to the classifier and, based on the value obtained from the classifier, determines whether the target block contains a part of the tracking target. If the target block contains a part of the tracking target, it is classified as the tracking target; otherwise, it is classified as the background. The results of this classification make it possible to track the tracking target in the image data.

If the target block is classified as the background, the recognition module 110 stores the target block in the second storage area (step S111). The recognition module 110 then proceeds to step S112.

If the target block is classified as the tracking target, the recognition module 110 stores the target block in the first storage area (step S110). The recognition module 110 then proceeds to step S112.

In step S112, the recognition module 110 determines whether processing has been completed for all blocks.

If it is determined that processing has not been completed for all blocks, the recognition module 110 returns to step S106, selects a new target block, and executes the same processing.
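The block loop of steps S106 to S112 can be sketched as below. The sketch makes several assumptions that are not stated in the text: the classifier is a callable returning a score, a block is assigned to the tracking target when the score is at least a threshold of 0.5, and the first and second storage areas are plain lists. All names are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Rect = Tuple[int, int, int, int]            # (left, top, width, height)
Block = Tuple[Rect, Sequence[float]]        # (position, feature amount)


def classify_blocks(blocks: List[Block],
                    in_search_area: Callable[[Rect], bool],
                    classifier: Callable[[Sequence[float]], float],
                    threshold: float = 0.5):
    first_area: List[Block] = []     # blocks classified as the tracking target
    second_area: List[Block] = []    # blocks classified as the background
    for block in blocks:                            # S106: pick the next target block
        rect, feature = block
        if not in_search_area(rect):                # S107: outside the search area
            second_area.append(block)               # S111
        elif classifier(feature) >= threshold:      # S108: recognition processing
            first_area.append(block)                # S110
        else:
            second_area.append(block)               # S111
    return first_area, second_area                  # S112: all blocks processed
```

Only blocks inside the search area reach the classifier, which is what keeps the per-frame cost low.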

If it is determined that processing has been completed for all blocks, the recognition module 110 updates the tracking target area (step S113).

Specifically, the recognition module 110 estimates the tracking target area based on the distribution, within the image data 150, of the blocks classified as the tracking target, and updates the setting information of the tracking target area based on the estimate.

For example, the area enclosing all blocks classified as the tracking target is estimated as the tracking target area. Alternatively, the recognition module 110 calculates the centroid of the blocks classified as the tracking target and estimates a rectangle of an arbitrary size positioned relative to that centroid as the tracking target area. These estimation methods are examples, and the method is not limited to them.
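Both example estimation methods for step S113 can be sketched in a few lines. The function names and tuple layouts are hypothetical, and returning `None` when no block was classified as the tracking target is an assumption; the text does not describe that case.

```python
from typing import List, Optional, Tuple

Rect = Tuple[float, float, float, float]    # (left, top, width, height)
Area = Tuple[float, float, float, float]    # (center x, center y, width, height)


def bounding_area(tracked: List[Rect]) -> Optional[Area]:
    """First example: the area enclosing every block classified as the tracking target."""
    if not tracked:
        return None
    left = min(r[0] for r in tracked)
    top = min(r[1] for r in tracked)
    right = max(r[0] + r[2] for r in tracked)
    bottom = max(r[1] + r[3] for r in tracked)
    return ((left + right) / 2, (top + bottom) / 2, right - left, bottom - top)


def centroid_area(tracked: List[Rect], width: float, height: float) -> Optional[Area]:
    """Second example: a rectangle of a chosen size centered on the centroid of the blocks."""
    if not tracked:
        return None
    cx = sum(r[0] + r[2] / 2 for r in tracked) / len(tracked)
    cy = sum(r[1] + r[3] / 2 for r in tracked) / len(tracked)
    return (cx, cy, width, height)
```

The bounding variant lets the area grow and shrink with the detected blocks, while the centroid variant keeps a fixed size and only moves the center.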

Next, the recognition module 110 outputs the recognition result for the tracking target (step S114).

For example, the recognition module 110 generates display information for displaying the recognition result and outputs it to the output device. The display information includes, for example, data for displaying the blocks determined to be the tracking target, the tracking target area, and so on.

Next, the recognition module 110 generates additional learning data from the image data 150 on which recognition was performed and transmits it to the learning server 101 (step S115).

Specifically, the recognition module 110 generates the additional learning data by attaching the first teacher signal to each block determined to be the tracking target and the second teacher signal to each block determined to be the background. The structure of the additional learning data is the same as that of the initial learning data S.
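As a minimal sketch of step S115, the additional learning data can be built directly from the contents of the two storage areas. The encoding of the first and second teacher signals as 1 and 0, the `(position, feature)` block layout, and the function name are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Rect = Tuple[int, int, int, int]
Block = Tuple[Rect, Sequence[float]]        # (position, feature amount)


def build_additional_learning_data(first_area: List[Block],
                                   second_area: List[Block]):
    """Pair each feature amount with a teacher signal derived from the recognition result."""
    data = [(feature, 1) for _, feature in first_area]      # tracking target
    data += [(feature, 0) for _, feature in second_area]    # background
    return data
```

The result has the same `(feature, teacher signal)` pair structure as the initial learning data, so the learning server can handle both in the same way.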

Next, the recognition module 110 determines whether tracking of the tracking target has ended (step S116). For example, when an instruction to end tracking is received, the recognition module 110 determines that tracking has ended.

If it is determined that tracking has ended, the recognition module 110 ends the processing.

If it is determined that tracking has not ended, the recognition module 110 returns to step S105 and executes the same processing, using the classifier that is successively updated by the learning server 101. When returning to step S105, the recognition module 110 may enter a waiting state until it receives the updated classifier.

FIG. 6 is a flowchart illustrating the process executed by the learning module 120 of the first embodiment.

When the learning server 101 receives initial learning data or additional learning data from the recognition server 100, it starts the process described below. In the following description, initial learning data and additional learning data are referred to simply as learning data when they need not be distinguished.

The learning module 120 performs learning by deep learning using the learning data (step S201). Through this learning, the tracking target and the background can be learned block by block.

The present invention is not limited to any particular deep learning technique. For example, an RBM (Restricted Boltzmann Machine) may be used.

Next, the learning module 120 generates or updates the classifier based on the learning result and transmits the classifier to the recognition server 100 (step S202). The learning module 120 then ends the process.

Specifically, when initial learning data is received, the learning module 120 generates a classifier and transmits the generated classifier to the recognition server 100. When additional learning data is received, the learning module 120 updates the classifier and transmits the updated classifier to the recognition server 100.
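The generate-or-update behavior of steps S201 and S202 can be sketched as follows. Note the substitution: to stay short and dependency-free, this sketch trains a single-layer logistic model by stochastic gradient descent, which is a stand-in and not the deep learning (for example, RBM-based) training the embodiment describes. The class name, hyperparameters, and return values are hypothetical, and features are assumed to be scaled to roughly [0, 1].

```python
import math
from typing import List, Optional, Sequence, Tuple

Sample = Tuple[Sequence[float], int]    # (feature amount, teacher signal 1 or 0)


class LearningModule:
    def __init__(self, lr: float = 0.1, epochs: int = 50) -> None:
        self.weights: Optional[List[float]] = None   # None until initial learning
        self.bias = 0.0
        self.lr = lr
        self.epochs = epochs

    def predict(self, feature: Sequence[float]) -> float:
        """Score in (0, 1); higher means 'contains a part of the tracking target'."""
        z = self.bias + sum(w * x for w, x in zip(self.weights, feature))
        z = max(-60.0, min(60.0, z))                 # keep math.exp in range
        return 1.0 / (1.0 + math.exp(-z))

    def handle(self, data: List[Sample]) -> str:
        """S201 and S202: generate the classifier on first call, update it afterwards."""
        created = self.weights is None
        if created:
            self.weights = [0.0] * len(data[0][0])
        for _ in range(self.epochs):
            for feature, t in data:
                err = self.predict(feature) - t      # gradient of the log loss
                self.weights = [w - self.lr * err * x
                                for w, x in zip(self.weights, feature)]
                self.bias -= self.lr * err
        return "generated" if created else "updated"
```

In this sketch the "classifier sent to the recognition server" would correspond to the weights and bias, which fits the statement that the classifier is represented as data such as a function or a matrix.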

In conventional learning, a classifier (recognition model) for recognizing the tracking target as a whole is generated. Generating such a classifier requires learning on a large amount of learning data. Preparing that data is costly, and the learning time is very long.

In contrast, the learning server 101 of this embodiment performs learning by deep learning using the feature amounts of the blocks into which a single piece of initial learning data is divided. That is, the learning server 101 learns the plurality of blocks included in the tracking target area as the tracking target. A large amount of learning data therefore does not need to be prepared, and the learning time is short. Learning with additional learning data can likewise be performed quickly.

In the conventional recognition process, the tracking target is recognized as a whole. Because the entire tracking target is treated as a single feature amount, a classifier generated by learning only from the first input image data has very low recognition accuracy. Tracking of the tracking target therefore cannot be started quickly with such a classifier.

In contrast, the recognition server 100 of this embodiment recognizes the tracking target on the blocks included in the search area. The feature amount of a block is considered to change little in response to movement of the tracking target or changes in its shape. Therefore, a certain level of recognition accuracy can be maintained even when recognition is performed immediately after the tracking target is designated, and tracking can be started quickly using the classifier generated from the initial learning data. Furthermore, because only the blocks included in the search area are subject to the recognition process, the tracking target can be tracked in real time.
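The block-wise recognition restricted to the search area might look like the sketch below. Several points are assumptions made for illustration only: the search area is the stored tracking target area grown by a fixed margin, a block counts as inside the search area only when it lies entirely within it, and the new tracking target area is taken as the bounding box of the positive blocks. The function names and the `margin` parameter are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height); hypothetical layout


def expand(area: Box, margin: int) -> Box:
    """Assumed search area: the tracking target area grown by `margin` per side."""
    x, y, w, h = area
    return (x - margin, y - margin, w + 2 * margin, h + 2 * margin)


def inside(block: Box, area: Box) -> bool:
    """True when the block lies entirely within the area."""
    bx, by, bw, bh = block
    ax, ay, aw, ah = area
    return ax <= bx and ay <= by and bx + bw <= ax + aw and by + bh <= ay + ah


def recognize(
    blocks: Sequence[Box],
    features: Sequence[float],
    classifier: Callable[[float], bool],
    target_area: Box,
    margin: int,
) -> Tuple[List[int], Box]:
    """Classify blocks and estimate the new tracking target area."""
    search_area = expand(target_area, margin)
    labels: List[int] = []
    for block, feature in zip(blocks, features):
        if not inside(block, search_area):
            labels.append(0)  # outside the search area: classifier is skipped
        else:
            labels.append(1 if classifier(feature) else 0)
    hits = [b for b, label in zip(blocks, labels) if label == 1]
    if not hits:
        return labels, target_area  # assumption: keep the previous area
    x0 = min(b[0] for b in hits)
    y0 = min(b[1] for b in hits)
    x1 = max(b[0] + b[2] for b in hits)
    y1 = max(b[1] + b[3] for b in hits)
    return labels, (x0, y0, x1 - x0, y1 - y0)
```

Because blocks outside the search area are labeled without invoking the classifier, the per-frame cost grows with the size of the search area rather than the size of the whole image.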

Because the classifier is also updated by the learning server 101 while tracking continues, it is clear that the recognition accuracy increases over time.

As described above, this embodiment makes it possible to realize a system, an image recognition method, and a computer that quickly start tracking a tracking target without learning from a large amount of learning data.

The present invention is not limited to the embodiment described above and includes various modifications. For example, the above embodiment has been described in detail to make the present invention easy to understand, and the invention is not necessarily limited to a configuration having all of the described elements. In addition, for part of the configuration of each embodiment, another configuration can be added, deleted, or substituted.

Each of the above configurations, functions, processing units, processing means, and the like may be realized partly or wholly in hardware, for example by designing them as an integrated circuit. The present invention can also be realized by program code of software that implements the functions of the embodiment. In this case, a storage medium on which the program code is recorded is provided to a computer, and a processor of the computer reads the program code stored on the storage medium. The program code read from the storage medium then itself implements the functions of the embodiment described above, and the program code itself and the storage medium storing it constitute the present invention. Storage media used to supply such program code include, for example, a flexible disk, CD-ROM, DVD-ROM, hard disk, SSD (Solid State Drive), optical disk, magneto-optical disk, CD-R, magnetic tape, nonvolatile memory card, and ROM.

The program code that implements the functions described in this embodiment can be written in a wide range of programming or scripting languages, such as assembler, C/C++, Perl, shell, PHP, and Java (registered trademark).

Furthermore, the program code of the software that implements the functions of the embodiment may be distributed via a network and stored in storage means such as a hard disk or memory of a computer, or on a storage medium such as a CD-RW or CD-R, and a processor of the computer may read and execute the program code stored in the storage means or on the storage medium.

In the above embodiment, the control lines and information lines shown are those considered necessary for the explanation, and not all control lines and information lines in a product are necessarily shown. All of the components may be interconnected.

100 Recognition server
101 Learning server
105 Imaging device
106 Database
110 Recognition module
120 Learning module
150 Image data
201 Arithmetic device
202 Storage device
203 Network interface
204 I/O interface

Claims

A system comprising a recognition server that executes a recognition process for tracking a tracking target included in image data, and a learning server that learns a classifier used for the recognition process, wherein
the recognition server has a first arithmetic unit, a first storage device connected to the first arithmetic unit, and a first interface connected to the first arithmetic unit,
the learning server has a second arithmetic unit, a second storage device connected to the second arithmetic unit, and a second interface connected to the second arithmetic unit,
the recognition server:
receives initial image data in which the tracking target is specified, and divides the initial image data into a plurality of blocks each smaller than the tracking target;
calculates a feature amount of each block included in the initial image data;
generates initial learning data by assigning, to the feature amount of each block included in the initial image data, a teacher signal indicating whether or not the block includes a part of the tracking target; and
transmits the initial learning data to the learning server,
the learning server:
generates, by performing learning using the initial learning data, a classifier that classifies each block included in received image data as either a block including a part of the tracking target or a block not including the tracking target; and
transmits the classifier to the recognition server, and
the recognition server:
executes the recognition process on received image data using the classifier; and
outputs a result of the recognition process.

The system according to claim 1, wherein
the first storage device stores setting information of a tracking target area, which is an area of the image data that includes the tracking target, and
the recognition server:
when new image data is received after the classifier has been received from the learning server, divides the new image data into the plurality of blocks;
calculates a feature amount of each block included in the new image data;
sets a search area, which is the area subject to the recognition process, based on the setting information of the tracking target area;
selects a target block from the plurality of blocks included in the new image data;
determines whether or not the target block is included in the search area;
when it is determined that the target block is not included in the search area, classifies the target block as a block not including a part of the tracking target;
when it is determined that the target block is included in the search area, inputs the feature amount of the target block to the classifier, and
classifies the target block, based on the output of the classifier, as either a block including a part of the tracking target or a block not including a part of the tracking target;
estimates a new tracking target area based on the distribution of the blocks including a part of the tracking target; and
updates the setting information of the tracking target area based on the result of estimating the new tracking target area.

The system according to claim 2, wherein
the learning server:
when additional learning data generated by the recognition server is received, updates the classifier by performing learning using the additional learning data; and
transmits the updated classifier to the recognition server, and
the recognition server:
generates the additional learning data by assigning, to the feature amounts of the plurality of blocks included in the new image data, the teacher signal corresponding to the result of the classification;
transmits the additional learning data to the learning server;
executes the recognition process on received image data using the updated classifier; and
outputs a result of the recognition process.

An image recognition method executed by a system including a plurality of computers, wherein
the system includes a recognition server that executes a recognition process for tracking a tracking target included in image data, and a learning server that learns a classifier used for the recognition process,
the recognition server has a first arithmetic unit, a first storage device connected to the first arithmetic unit, and a first interface connected to the first arithmetic unit,
the learning server has a second arithmetic unit, a second storage device connected to the second arithmetic unit, and a second interface connected to the second arithmetic unit, and
the image recognition method comprises:
a step in which the recognition server receives initial image data in which the tracking target is specified, and divides the initial image data into a plurality of blocks each smaller than the tracking target;
a step in which the recognition server calculates a feature amount of each block included in the initial image data;
a step in which the recognition server generates initial learning data by assigning, to the feature amount of each block included in the initial image data, a teacher signal indicating whether or not the block includes a part of the tracking target;
a step in which the recognition server transmits the initial learning data to the learning server;
a step in which the learning server generates, by performing learning using the initial learning data, a classifier that classifies each block included in received image data as either a block including a part of the tracking target or a block not including the tracking target;
a step in which the learning server transmits the classifier to the recognition server;
a step in which the recognition server executes the recognition process on received image data using the classifier; and
a step in which the recognition server outputs a result of the recognition process.

The image recognition method according to claim 4, wherein
the first storage device stores setting information of a tracking target area indicating an area of the image data that includes the tracking target, and
the image recognition method comprises:
a step in which the recognition server, when new image data is received after the classifier has been received, divides the new image data into the plurality of blocks;
a step in which the recognition server calculates a feature amount of each block included in the new image data;
a step in which the recognition server sets a search area, which is the area subject to the recognition process, based on the setting information of the tracking target area;
a step in which the recognition server selects a target block from the plurality of blocks included in the new image data;
a step in which the recognition server determines whether or not the target block is included in the search area;
a step in which the recognition server, when it determines that the target block is not included in the search area, classifies the target block as a block not including the tracking target;
a step in which the recognition server, when it determines that the target block is included in the search area, inputs the feature amount of the target block to the classifier and, based on the output of the classifier, classifies the target block as either a block including a part of the tracking target or a block not including the tracking target;
a step in which the recognition server estimates a new tracking target area based on the distribution of the blocks including a part of the tracking target; and
a step of updating the setting information of the tracking target area based on the result of estimating the new tracking target area.

The image recognition method according to claim 5, comprising:
a step in which the recognition server generates additional learning data by assigning, to the feature amounts of the plurality of blocks included in the new image data, the teacher signal corresponding to the result of the classification;
a step in which the recognition server transmits the additional learning data to the learning server;
a step in which the learning server, when the additional learning data is received, updates the classifier by performing learning using the additional learning data;
a step in which the learning server transmits the updated classifier to the recognition server;
a step in which the recognition server executes the recognition process on received image data using the updated classifier; and
a step in which the recognition server outputs a result of the recognition process.

A computer that executes a recognition process for recognizing a tracking target included in image data, the computer comprising
an arithmetic unit, a storage device connected to the arithmetic unit, and an interface connected to the arithmetic unit, wherein
the arithmetic unit:
receives, as information designating the tracking target, initial image data in which a tracking target area including the tracking target is set, and stores setting information of the tracking target area in the storage device;
divides the initial image data into a plurality of blocks each smaller than the tracking target;
calculates a feature amount of each block included in the initial image data;
generates initial learning data by assigning, to the feature amount of each block included in the initial image data, a teacher signal indicating whether or not the block includes a part of the tracking target;
outputs the initial learning data to a learning unit that learns a classifier used for the recognition process;
when new image data is received after receiving a classifier that is generated by the learning unit and that classifies each block included in image data as either a block including a part of the tracking target or a block not including the tracking target, divides the new image data into the plurality of blocks;
calculates a feature amount of each block included in the new image data;
sets a search area, which is the area subject to the recognition process, based on the setting information of the tracking target area;
selects a target block from the plurality of blocks included in the new image data;
determines whether or not the target block is included in the search area;
when it is determined that the target block is not included in the search area, classifies the target block as a block not including the tracking target;
when it is determined that the target block is included in the search area, inputs the feature amount of the target block to the classifier, and
classifies the target block, based on the output of the classifier, as either a block including a part of the tracking target or a block not including the tracking target;
estimates a new tracking target area based on the distribution of the blocks including a part of the tracking target;
updates the setting information of the tracking target area based on the result of estimating the new tracking target area; and
outputs a result of the recognition process including the result of the classification.

The computer according to claim 7, wherein
the arithmetic unit:
generates additional learning data by assigning, to the feature amounts of the plurality of blocks included in the new image data, the teacher signal corresponding to the result of the classification;
outputs the additional learning data to the learning unit; and
executes the recognition process on received image data using the classifier updated by the learning unit.