JP7417504B2

JP7417504B2 - Similar image difference extraction device, similar image difference extraction method, program and recording medium

Info

Publication number: JP7417504B2
Application number: JP2020177786A
Authority: JP
Inventors: 晃治中山; 康二西田
Original assignee: Hitachi Solutions Ltd
Current assignee: Hitachi Solutions Ltd
Priority date: 2020-10-23
Filing date: 2020-10-23
Publication date: 2024-01-18
Anticipated expiration: 2040-10-23
Also published as: JP2022068941A

Description

The present disclosure relates to a similar image difference extraction device.

In recent years, similar image search technology, which searches for images similar to an image specified by a user, has been attracting attention. For example, in fields such as manufacturing, searching for drawings similar to a design drawing when designing a product makes it possible to refer to drawings designed in the past, which improves the efficiency of product design and of the associated cost estimation.

In similar image search technology, a technique that lets the user intuitively grasp the differences between the specified image and a retrieved similar image is also desired in order to further improve efficiency.

In this regard, Patent Document 1 discloses an image registration method in which a predetermined binary descriptor is calculated for each of two images and, based on those binary descriptors, differences in the position, size, angle, distortion, and the like of the objects in the images are corrected so that the images are transformed to make them easier to compare.

Patent Document 1: Japanese Patent Application Publication No. 2018-28899

However, Patent Document 1 merely transforms the two images so that they are easier to compare. The user still has to compare them visually to identify the differences, so the differences cannot be grasped efficiently enough. Patent Document 1 also has the problem that the user has to specify the type of binary descriptor according to the image.

An object of the present disclosure is to provide a similar image difference extraction device, a similar image difference extraction method, a program, and a recording medium that make it possible to grasp differences between images efficiently.

A similar image difference extraction device according to one aspect of the present disclosure includes: a feature extraction unit that generates a plurality of target feature maps, which are feature maps indicating the features of each of a plurality of target images; a similarity calculation unit that generates a similarity map indicating the degree of similarity at each position of the target feature maps; and an image processing unit that generates a difference extraction image according to the similarity map.

According to the present invention, differences between images can be grasped efficiently.

FIG. 1 is a diagram showing a similar image difference extraction system according to an embodiment of the present disclosure.
FIG. 2 is a diagram showing an example of an image registration model.
FIG. 3 is a diagram showing an example of an image feature extraction model.
FIG. 4 is a diagram showing an example of a transformation parameter regression model.
FIG. 5 is a diagram for explaining an example of the operation of the image registration unit.
FIG. 6 is a flowchart for explaining an example of the operation of the image registration unit.
FIG. 7 is a diagram showing an example of an image difference extraction model.
FIG. 8 is a flowchart for explaining an example of the operation of the image difference extraction unit.
FIG. 9 is a diagram showing the flow of the image difference extraction unit.
FIG. 10 is a flowchart for explaining an example of the operation of the entire similar image difference extraction system.
FIG. 11 is a diagram showing an example of a similar image search result.
FIG. 12 is a diagram showing an example of the result of image registration processing.
FIG. 13 is a diagram showing an example of the result of image difference extraction processing.
FIG. 14 is a flowchart for explaining an example of divided image difference extraction processing.
FIG. 15 is a diagram showing an example of the result of divided image difference extraction processing.
FIG. 16 is a flowchart for explaining an example of learning processing.
FIG. 17 is a flowchart for explaining another example of learning processing.

Embodiments of the present disclosure will be described below with reference to the drawings.

<System configuration>
FIG. 1 is a diagram showing a similar image difference extraction system according to an embodiment of the present disclosure. As shown in FIG. 1, the similar image difference extraction system includes a user terminal 100 and an image processing server 104. The user terminal 100 and the image processing server 104 are communicably connected to each other via a network 111.

<User terminal>
The user terminal 100 includes an input/output unit 101, a terminal management unit 102, and a network unit 103. The user terminal 100 can be configured with, for example, a general-purpose PC (Personal Computer).

<Input/output unit>
The input/output unit 101 is composed of user interface devices such as a mouse, a keyboard, and a display. The input/output unit 101 has a function of receiving image processing commands from the user and a function of displaying response information corresponding to those commands. The response information is displayed using, for example, an application program such as a browser.

<Terminal management unit>
The terminal management unit 102 has a function of transmitting image processing commands received by the input/output unit 101 to the image processing server 104 via the network unit 103, and a function of receiving response information from the image processing server 104 via the network unit 103. Here, image processing commands are transmitted and response information is received by an application program such as a browser over a protocol such as HTTP, but other methods may be used. The network unit 103 is a communication unit with a communication function for communicably connecting to the image processing server 104.

In this embodiment, the image processing commands include an image search command, an image registration command, a difference extraction command, and a learning execution command.

<Image search command>
The image search command instructs execution of image search processing, which searches for similar images that resemble a query image specified by the user. The image search processing may retrieve a plurality of similar images.

<Image registration command>
The image registration command instructs execution of image registration processing on a plurality of designated images selected from an image group that includes the query image and the similar images. Image registration is described later. This embodiment uses two designated images as an example, but it is also applicable to three or more designated images.

<Difference extraction command>
The difference extraction command instructs execution of image difference extraction processing, which extracts and visualizes the differences between a plurality of target images. This embodiment uses two target images as an example, but it is also applicable to three or more target images. The two target images are the two designated images on which image registration processing has been performed. The similar image difference extraction system may have a continuous mode, in which the image registration command and the difference extraction command are executed automatically in succession, and an individual mode, in which those commands are executed independently according to user instructions, and may be configured so that the user can select either mode. In this embodiment, the individual mode is assumed to be set.

<Learning execution command>
The learning execution command instructs execution of learning processing that builds the learning models used in the image registration processing and the image difference extraction processing. The learning models are described later.

<Image processing server>
The image processing server 104 is a similar image difference extraction device that performs various kinds of processing in response to image processing commands from the user terminal 100. It includes a server management unit 105, an image search unit 106, an image registration unit 107, an image difference extraction unit 108, a network unit 109, and an image database 110. The processing performed by each unit of the image processing server 104 described below may be realized by a processor such as a CPU (Central Processing Unit) executing a program. The program can also be recorded on a recording medium 112 that stores data non-transitorily, such as a semiconductor memory, a magnetic disk, an optical disk, a magnetic tape, or a magneto-optical disk. Although the recording medium 112 is shown separately from the image processing server 104 in FIG. 1, it may be provided inside the image processing server 104.

<Server management unit>
The server management unit 105 has a function of receiving an image processing command from the user terminal 100 via the network unit 109 and causing the image search unit 106, the image registration unit 107, and the image difference extraction unit 108 to execute the corresponding processing. The server management unit 105 also has a function of transmitting the processing results of the image search unit 106, the image registration unit 107, and the image difference extraction unit 108 to the user terminal 100 via the network unit 109 as response information. In addition, the server management unit 105 functions as a learning management unit that executes learning processing to build the learning models.
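The command routing described above can be pictured with a minimal sketch. It assumes a simple handler table. The class name, method names, and dictionary payloads are hypothetical and chosen only for illustration, since the patent does not specify an API.

```python
from enum import Enum
from typing import Callable, Dict


class ImageProcessingCommand(Enum):
    """The four command types named in this embodiment."""
    IMAGE_SEARCH = "image_search"
    IMAGE_REGISTRATION = "image_registration"
    DIFFERENCE_EXTRACTION = "difference_extraction"
    LEARNING_EXECUTION = "learning_execution"


class ServerManagementUnit:
    """Hypothetical stand-in for the server management unit 105."""

    def __init__(self) -> None:
        # Maps each command to the processing unit (a callable) that handles it.
        self._handlers: Dict[ImageProcessingCommand, Callable[[dict], dict]] = {}

    def register(self, command: ImageProcessingCommand,
                 handler: Callable[[dict], dict]) -> None:
        self._handlers[command] = handler

    def handle(self, command: ImageProcessingCommand, payload: dict) -> dict:
        """Run the handler for the command and wrap its result as response information."""
        handler = self._handlers.get(command)
        if handler is None:
            return {"status": "error", "reason": f"unsupported command: {command.value}"}
        return {"status": "ok", "result": handler(payload)}
```

In such a design, the image search unit 106, the image registration unit 107, and the image difference extraction unit 108 would each be registered as one handler, and the returned dictionary would play the role of the response information sent back to the user terminal 100.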

<Image search unit>
The image search unit 106 has a function of executing, in accordance with the image search command, image search processing that searches the image database 110 for similar images that resemble the query image specified by the user. The image database 110 stores a plurality of images in advance. The image database 110 may be hosted on a storage device or the like separate from the image processing server 104. Since various known techniques exist for image search processing, a detailed description of the specific method is omitted.

<Image registration unit>
The image registration unit 107 has a function of executing image registration processing on two designated images selected from an image group that includes the query image and the similar images retrieved by the image search unit 106. Image registration processing estimates the geometric transformation model that exists between two images and uses that transformation model to correct geometric differences between the images, such as size, position, angle, and distortion. Performing image registration processing makes it easier to extract the differences between the two images.

In this embodiment, the image registration unit 107 does not estimate the transformation model through feature calculation and mapping based on general image processing. Instead, it estimates the transformation model using a learning model (specifically, a machine learning model using a neural network). Hereinafter, the learning model used by the image registration unit 107 may be referred to as the image registration model.

<Image difference extraction unit>
The image difference extraction unit 108 has a function of taking as target images the two designated images on which the image registration unit 107 has performed image registration processing, and executing image difference extraction processing that extracts and visualizes the differences between those target images. In this embodiment, like the image registration unit 107, the image difference extraction unit 108 executes the image difference extraction processing using a machine learning model based on a neural network as its learning model. Hereinafter, the machine learning model used by the image difference extraction unit 108 may be referred to as the image difference extraction model.

The image registration unit 107 is described in more detail below.

<Image registration model>
FIG. 2 is a diagram showing an example of the image registration model used by the image registration unit 107. The image registration model 200 shown in FIG. 2 includes image feature extraction models 201a and 201b, a transformation parameter regression model 202, and a feature relevance map calculation unit 203. The image registration unit 107 also includes an execution unit 204 that actually executes the image registration processing based on the processing result of the image registration model.

The image registration model 200 takes two designated images as input and estimates the transformation parameters, which are the parameters of the transformation model used to perform image registration processing on the two designated images. The transformation model is, for example, an affine transformation model or a homography transformation model. In image registration processing, the geometric differences between the two designated images can be corrected by using the transformation model to deform an image through operations such as scaling, rotation, and translation. One of the two designated images is input to the image feature extraction model 201a, and the other is input to the image feature extraction model 201b.

<Image feature extraction model>
FIG. 3 is a diagram showing an example of the image feature extraction model. The image feature extraction models 201a and 201b constitute a designated feature extraction unit that generates designated feature maps, which are two feature maps indicating the features of each of the two input designated images. The image feature extraction models 201a and 201b are the same model, which is shown in FIG. 3 as the image feature extraction model 300.

The image feature extraction model 300 shown in FIG. 3 is a learning model for calculating, from an input designated image, a designated feature map indicating the features of that designated image. The image feature extraction model 300 can be realized by a neural network such as a CNN (Convolutional Neural Network). Here, the designated image is assumed to be represented by a third-order tensor consisting of a width component (horizontal pixel component), a height component (vertical pixel component), and a color component. The position given by the width component and the height component may be called the pixel position, the width (the number of width components) and the height (the number of height components) the image size, the color component c the channel, and the number of color components the number of channels.

The image feature extraction model 300 includes a plurality of processing layers 301, each containing a convolution layer 302 (denoted Conv in the figure) and a pooling layer 303 (denoted Pool in the figure). The processing layers 301 are connected to one another in series. In each processing layer 301, the convolution layer 302 performs convolution processing on the input tensor, and the pooling layer 303 performs pooling processing, which reduces the image size, on the output tensor of the preceding convolution layer 302. The result is output from the processing layer 301 as a feature map indicating the features of the input tensor. The feature map output from the processing layer 301 is also a tensor. Here, each processing layer 301 is assumed to output a tensor with twice the number of channels and half the image size of its input, although the model is not limited to this example. In the first processing layer 301, however, the number of channels is not simply doubled.
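As an illustration of how a pooling layer can halve the image size, the following sketch applies 2x2 max pooling with stride 2 to a single channel. The patent does not state the pooling type or window size, so both are assumptions, and the function name is hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def max_pool_2x2(channel: Matrix) -> Matrix:
    """Apply 2x2 max pooling with stride 2 to one channel.

    Assumes the height and width of the channel are both even.
    The output has half the height and half the width of the input.
    """
    height, width = len(channel), len(channel[0])
    return [
        [
            max(channel[r][c], channel[r][c + 1],
                channel[r + 1][c], channel[r + 1][c + 1])
            for c in range(0, width, 2)
        ]
        for r in range(0, height, 2)
    ]
```

Applying this to every channel of a convolution output leaves the number of channels unchanged, so in this picture the channel doubling would come from the convolution layer and the size halving from the pooling layer.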

Each processing layer 301 may include a plurality of convolution layers 302 and perform convolution processing multiple times, but here each is assumed to have only one convolution layer 302. Although FIG. 3 shows two processing layers 301, this embodiment assumes four. The number of processing layers 301 is not limited to four. As in a typical neural network, post-processing such as batch normalization or an activation function (for example, a ramp function, that is, ReLU) may be applied after the convolution processing.

<Feature relevance map calculation unit>
The feature relevance map calculation unit 203 in FIG. 2 is an arithmetic unit that calculates the inner product of the designated feature maps output from the image feature extraction models 201a and 201b and outputs it as a feature relevance map indicating the relationship between the designated feature maps.

<Transformation parameter regression model>
FIG. 4 is a diagram showing an example of the transformation parameter regression model 202. The transformation parameter regression model 202 is a model that estimates the transformation parameters, which are the parameters of the transformation model, based on the feature relevance map output from the feature relevance map calculation unit 203.

Here, the transformation model is an affine transformation model, which performs an affine transformation. An affine transformation is a combination of a translation and a linear transformation, where a linear transformation is a transformation that includes scaling, shearing, and rotation of an image. Using homogeneous coordinates, the affine transformation is expressed by the following Equation (1).

$$
\begin{pmatrix} X' \\ Y' \\ 1 \end{pmatrix}
=
\begin{pmatrix} a & b & t_x \\ c & d & t_y \\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} X \\ Y \\ 1 \end{pmatrix}
\tag{1}
$$

Equation (1) indicates that the point at coordinates (X, Y) of an image is mapped to the point (X', Y') by the affine transformation. The values a, b, t_x, c, d, and t_y in Equation (1) are the transformation parameters. By changing the set of transformation parameters to be estimated, the approach can also be applied to other transformation models such as a homography transformation model.
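To make the role of the six transformation parameters concrete, the sketch below builds a 3x3 homogeneous matrix and applies it to a point. It assumes that Equation (1) has the usual homogeneous form with the parameters ordered a, b, t_x, c, d, t_y. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def affine_matrix(params: Sequence[float]) -> List[List[float]]:
    """Build the 3x3 homogeneous matrix from (a, b, t_x, c, d, t_y)."""
    a, b, t_x, c, d, t_y = params
    return [
        [a, b, t_x],
        [c, d, t_y],
        [0.0, 0.0, 1.0],
    ]


def transform_point(matrix: List[List[float]], x: float, y: float) -> Tuple[float, float]:
    """Map (X, Y) to (X', Y') by multiplying the matrix with (X, Y, 1)."""
    source = (x, y, 1.0)
    result = [sum(row[k] * source[k] for k in range(3)) for row in matrix]
    return result[0], result[1]
```

For example, the parameters (1, 0, 2, 0, 1, 3) describe a pure translation by (2, 3), and (2, 0, 0, 0, 2, 0) describes a uniform scaling by a factor of 2.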

The transformation parameter regression model 400 shown in FIG. 4 includes processing layers 401, each containing a convolution layer 402 (denoted Conv in the figure) and a pooling layer 403 (denoted Pool in the figure), and an FC (Fully Connected) layer 404. There are a plurality of processing layers 401, and they are connected to one another in series. The FC layer 404 is provided as the final layer, after the last processing layer 401.

In each processing layer 401, the convolution layer 402 performs convolution processing on the input tensor, and the pooling layer 403 performs pooling processing on the output tensor of the convolution layer 402 and outputs the result as a feature map. Here, each processing layer 401 is assumed to output a tensor with twice the number of channels and half the image size of its input, although the model is not limited to this example. In the first processing layer 401, however, the number of channels is not simply doubled.

Each processing layer 401 may include a plurality of convolution layers 402 and perform convolution processing multiple times, but here each is assumed to have only one convolution layer 402. Although FIG. 4 shows two processing layers 401, this embodiment assumes four. The number of processing layers 401 is not limited to four. The post-processing described above may also be applied after the convolution processing.

The FC layer 404 performs fully connected processing on the feature map output from the last processing layer 401, converting the feature map into a one-dimensional output vector. Each element of the output vector corresponds to one transformation parameter of the transformation model, so the number of elements in the output vector equals the number of transformation parameters. Because an affine transformation model is used here, the output vector has six elements.
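A fully connected layer of this kind can be sketched as a flatten step followed by one weighted sum per output element. The nested-list layout of the feature map, the function name, and the example weights are assumptions made for illustration. In the patent the weights would come from the learning processing.

```python
from typing import List, Sequence


def fully_connected(feature_map: Sequence[Sequence[Sequence[float]]],
                    weights: Sequence[Sequence[float]],
                    biases: Sequence[float]) -> List[float]:
    """Flatten a (height, width, channels) feature map and apply one linear layer.

    weights holds one row per output element, so an affine transformation
    model needs six rows, one per transformation parameter.
    """
    flat = [value for row in feature_map for pixel in row for value in pixel]
    for weight_row in weights:
        if len(weight_row) != len(flat):
            raise ValueError("each weight row must match the flattened feature map length")
    return [
        sum(w * x for w, x in zip(weight_row, flat)) + bias
        for weight_row, bias in zip(weights, biases)
    ]
```

With all weights set to zero and the biases set to (1, 0, 0, 0, 1, 0), the layer outputs the identity affine transformation regardless of its input. This is only a convenient illustration, not something the patent prescribes.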

The transformation parameter regression model 202 and the feature relevance map calculation unit 203 described above constitute a parameter estimation unit that estimates the transformation parameters based on the designated feature maps generated by the image feature extraction models.

<Execution unit>
The execution unit 204 executes the image registration processing by applying an affine transformation to one of the two designated images, using the transformation parameters given by the elements of the output vector from the transformation parameter regression model 202.
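One way such an execution step could warp an image is inverse mapping with nearest-neighbour sampling, sketched below for a single-channel image stored as nested lists. The patent does not describe the resampling method, so the interpolation scheme, the fill value, the (a, b, t_x, c, d, t_y) parameter order, and the function names are all assumptions.

```python
from typing import List, Sequence, Tuple

Image = List[List[float]]


def invert_affine(params: Sequence[float]) -> Tuple[float, ...]:
    """Return the parameters of the inverse of the affine map (a, b, t_x, c, d, t_y)."""
    a, b, t_x, c, d, t_y = params
    det = a * d - b * c
    if det == 0:
        raise ValueError("affine transformation is not invertible")
    ia, ib, ic, id_ = d / det, -b / det, -c / det, a / det
    return (ia, ib, -(ia * t_x + ib * t_y), ic, id_, -(ic * t_x + id_ * t_y))


def warp_affine(image: Image, params: Sequence[float], fill: float = 0) -> Image:
    """Warp an image so that source pixel (X, Y) lands at (X', Y').

    Each output pixel looks up its source pixel through the inverse map,
    which avoids holes in the output. Pixels with no source get `fill`.
    """
    height, width = len(image), len(image[0])
    ia, ib, itx, ic, id_, ity = invert_affine(params)
    output = [[fill] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            src_x = round(ia * x + ib * y + itx)
            src_y = round(ic * x + id_ * y + ity)
            if 0 <= src_x < width and 0 <= src_y < height:
                output[y][x] = image[src_y][src_x]
    return output
```

A production implementation would more likely use bilinear interpolation and a library routine, but the inverse-mapping structure stays the same.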

<Operation of the image registration unit>
FIG. 5 is a diagram for explaining an example of the operation of the image registration unit 107, and FIG. 6 is a flowchart for explaining an example of the operation of the image registration unit 107.

First, the image registration unit 107 receives two designated images 501a and 501b specified by the user, inputs the designated image 501a to the image feature extraction model 201a, and inputs the designated image 501b to the image feature extraction model 201b (step S600). The designated images 501a and 501b are assumed to be similar to each other. In FIG. 5, "w", "h", and "c" represent the width, height, and number of channels of the designated images 501a and 501b, respectively.

Next, the image feature extraction models 201a and 201b apply the convolution processing of the convolution layer 302 and the pooling processing of the pooling layer 303 to the designated images 501a and 501b four times in succession, generating designated feature maps 502a and 502b (step S601). In FIG. 5, the rectangular boxes 512a and 512b show how each processing layer 301 halves the image size of the designated images 501a and 501b while doubling the number of channels.

The feature relevance map calculation unit 203 then performs inner product processing, which calculates the inner product of the designated feature maps 502a and 502b output from the image feature extraction models 201a and 201b as the feature relevance map 503 (step S602).

Here, let the designated feature map 502a be f_a and the designated feature map 502b be f_b, and let (i, j) and (i_k, j_k) be indices representing pixel positions in the designated feature maps. Let f_b(i, j) be the feature vector (the vector of channel components) at position (i, j) of the designated feature map f_b, and let f_a(i_k, j_k) be the feature vector at position (i_k, j_k) of the designated feature map f_a, where k enumerates the pixel positions of f_a. The feature relevance map 503, denoted c_ab(i, j, k), can then be calculated from the following Equation (2).

$$
c_{ab}(i, j, k) = f_b(i, j)^{T} \cdot f_a(i_k, j_k)
\tag{2}
$$

In Equation (2), "^T" denotes the transpose of a vector and "·" denotes the inner product of vectors. The (i, j, k) component of the feature relevance map c_ab is therefore the inner product of the feature vector at position (i, j) of the designated feature map f_b and the feature vector at the pixel position (i_k, j_k) of the designated feature map f_a. In other words, the vector at position (i, j) of the feature relevance map c_ab indicates the relationship between the pixel region of the designated image 501b that corresponds to position (i, j) of the designated feature map f_b and the entire designated image 501a.
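The inner product processing can be sketched as follows, under the assumption that the feature relevance map pairs each position (i, j) of f_b with every position (i_k, j_k) of f_a. The feature maps are stored as nested lists indexed as map[i][j][channel], and k enumerates the positions of f_a in row-major order. This storage layout, the ordering of k, and the function names are illustrative assumptions.

```python
from typing import List, Sequence

FeatureMap = Sequence[Sequence[Sequence[float]]]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product of two feature vectors of equal length."""
    return sum(a * b for a, b in zip(u, v))


def feature_relevance_map(f_a: FeatureMap, f_b: FeatureMap) -> List[List[List[float]]]:
    """Compute c_ab[i][j][k] = f_b[i][j] . f_a[i_k][j_k].

    Both feature maps are assumed to have the same height, width, and
    channel count. The third axis of the result has height * width entries.
    """
    height, width = len(f_a), len(f_a[0])
    positions = [(i_k, j_k) for i_k in range(height) for j_k in range(width)]
    return [
        [
            [dot(f_b[i][j], f_a[i_k][j_k]) for (i_k, j_k) in positions]
            for j in range(width)
        ]
        for i in range(height)
    ]
```

Each position of the result therefore holds a vector that compares one local feature of f_b with every local feature of f_a, which is what allows the subsequent regression model to infer how the two images are displaced relative to each other.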

例えば、指定画像５０１ａが次元（ｗ，ｈ，ｃ）＝（２４０，２４０，３）のＲＧＢ画像の場合、画像特徴抽出モデル２０１ａでは、４つの処理層３０１により特徴マップの画像サイズは、元の指定画像５０１ａの画像サイズの１／１６となる。また、最前段の処理層３０１の出力テンソルのチャネル数を６４とすると、４つの処理層３０１にてチャネル数は、６４→１２８→２５６→５１２と変化する。このため、特徴マップｆ_ａ及びｆ_ｂは、それぞれ次元（ｗ，ｈ，ｃ）＝（１５，１５，５１２）となり、特徴関連性マップｃ_ａｂは（１５，１５，１５×１５）＝（１５，１５，２２５）次元のテンソルとなる。 For example, if the designated image 501a is an RGB image with dimensions (w, h, c) = (240, 240, 3), the four processing layers 301 of the image feature extraction model 201a reduce the image size of the feature map to 1/16 of the image size of the original designated image 501a (240 / 16 = 15 along each side). Further, assuming that the output tensor of the first-stage processing layer 301 has 64 channels, the number of channels changes as 64 → 128 → 256 → 512 across the four processing layers 301. The feature maps f_a and f_b therefore each have dimensions (w, h, c) = (15, 15, 512), and the feature relevance map c_ab is a tensor with dimensions (15, 15, 15 × 15) = (15, 15, 225).
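The dimension bookkeeping in this example can be checked with a short NumPy sketch. It is only an illustration under stated assumptions: the function name `feature_relevance_map`, the height-by-width-by-channel array layout, and the row-major ordering k = i_k × 15 + j_k are choices made here, since the description does not fix how k enumerates the pixel positions of f_a.

```python
import numpy as np

def feature_relevance_map(f_a: np.ndarray, f_b: np.ndarray) -> np.ndarray:
    """Return c_ab with c_ab[i, j, k] = f_b[i, j] . f_a[i_k, j_k].

    Assumption: k enumerates the positions of f_a in row-major order,
    i.e. k = i_k * W + j_k.
    """
    h, w, c = f_a.shape
    assert f_b.shape == (h, w, c), "both feature maps must have the same shape"
    f_a_flat = f_a.reshape(h * w, c)  # row k holds the feature vector f_a(i_k, j_k)
    # Inner product of every f_b(i, j) with every f_a(i_k, j_k).
    return np.einsum("ijc,kc->ijk", f_b, f_a_flat)

# Random stand-ins for the 15 x 15 x 512 designated feature maps.
rng = np.random.default_rng(0)
f_a = rng.standard_normal((15, 15, 512))
f_b = rng.standard_normal((15, 15, 512))
c_ab = feature_relevance_map(f_a, f_b)
print(c_ab.shape)  # (15, 15, 225)
```

Each (i, j) position of the result holds 225 inner products, one per position of f_a, which is why the third dimension equals 15 × 15.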

特徴関連性マップ５０３を算出すると、画像レジストレーション部１０７は、特徴関連性マップ５０３を変換パラメータ回帰モデル４００に入力する（ステップＳ６０３）。変換パラメータ回帰モデル４００は、入力された特徴関連性マップ５０３に対して畳み込み層４０２による畳み込み処理とプーリング層４０３によるプーリング処理とを４回繰り返し実行し、画像特徴マップ５０４を生成し、さらに画像特徴マップ５０４に対してＦＣ層４０４による全結合処理５１０を行い、６つの変換パラメータ５０５を生成する（ステップＳ６０４）。なお、ステップＳ６００～Ｓ６０４までの処理が画像レジストレーションモデルの処理である。 After calculating the feature relevance map 503, the image registration unit 107 inputs the feature relevance map 503 into the transformation parameter regression model 400 (step S603). The transformation parameter regression model 400 applies the convolution process of the convolution layer 402 and the pooling process of the pooling layer 403 to the input feature relevance map 503 four times in succession to generate an image feature map 504, and then performs the fully connected process 510 of the FC layer 404 on the image feature map 504 to generate six transformation parameters 505 (step S604). Note that the processing from step S600 to step S604 is the processing of the image registration model.

さらに実行部２０４は、６つの変換パラメータ５０５に基づいて、画像５０１ａ及び５０１ｂのいずれかに対してアフィン変換を実行することで、画像レジストレーション処理を実行し（ステップＳ６０５）、処理を終了する。 Further, the execution unit 204 executes the image registration process by performing affine transformation on either of the images 501a and 501b based on the six transformation parameters 505 (step S605), and ends the process.
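As an illustration of how six parameters can drive an affine transformation, the following NumPy sketch warps an image with x' = A x + t. The parameter ordering (a11, a12, tx, a21, a22, ty), the pixel-coordinate convention, and the nearest-neighbour inverse mapping are all assumptions made for this example; the description does not specify how the execution unit 204 interprets or applies the parameters.

```python
import numpy as np

def affine_warp(image: np.ndarray, params) -> np.ndarray:
    """Warp `image` by x' = A x + t (pixel coordinates, x to the right, y down).

    `params` is assumed to be (a11, a12, tx, a21, a22, ty).
    Output pixels whose source falls outside the image are left at zero.
    """
    a11, a12, tx, a21, a22, ty = params
    A = np.array([[a11, a12], [a21, a22]], dtype=float)
    t = np.array([tx, ty], dtype=float)
    h, w = image.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    dst = np.stack([xs.ravel(), ys.ravel()]).astype(float)  # (2, h*w) output coords
    src = np.linalg.inv(A) @ (dst - t[:, None])             # inverse mapping
    sx = np.rint(src[0]).astype(int)                        # nearest neighbour
    sy = np.rint(src[1]).astype(int)
    valid = (sx >= 0) & (sx < w) & (sy >= 0) & (sy < h)
    out = np.zeros(image.shape, dtype=image.dtype)
    out_flat = out.reshape(h * w, -1)                       # view into `out`
    img_flat = image.reshape(h * w, -1)
    out_flat[valid] = img_flat[sy[valid] * w + sx[valid]]
    return out

# Identity parameters leave the image unchanged; tx = 1 shifts it right by one pixel.
demo = np.arange(12, dtype=float).reshape(3, 4)
unchanged = affine_warp(demo, (1, 0, 0, 0, 1, 0))
shifted = affine_warp(demo, (1, 0, 1, 0, 1, 0))
```

Inverse mapping is used so that every output pixel receives exactly one value; a production implementation would more likely use bilinear interpolation from an image processing library.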

次に画像差分抽出部１０８による画像差分抽出処理についてより詳細に説明する。 Next, the image difference extraction process performed by the image difference extraction unit 108 will be described in more detail.

＜画像差分抽出モデル＞ <Image difference extraction model>
図７は、画像差分抽出部１０８で使用する画像差分抽出モデルの一例を示す図である。図７に示す画像差分抽出モデル７００は、画像特徴抽出モデル７０１ａ及び７０１ｂと、類似度算出部７０２と、画像処理部７０３とを備える。画像特徴抽出モデル７０１ａ及び７０１ｂは、画像レジストレーション部１０７が使用する画像レジストレーションモデルの画像特徴抽出モデル２０１ａ及び２０１ｂと同じモデルであり、入力された２つの対象画像のそれぞれの特徴を示す特徴マップである対象特徴マップを生成する特徴抽出部を構成する。 FIG. 7 is a diagram showing an example of an image difference extraction model used by the image difference extraction unit 108. The image difference extraction model 700 shown in FIG. 7 includes image feature extraction models 701a and 701b, a similarity calculation unit 702, and an image processing unit 703. The image feature extraction models 701a and 701b are the same models as the image feature extraction models 201a and 201b of the image registration model used by the image registration unit 107, and constitute a feature extraction unit that generates target feature maps, which are feature maps indicating the respective features of the two input target images.

類似度算出部７０２は、画像特徴抽出モデル７０１ａ及び７０１ｂで算出された２つの対象特徴マップ間の位置（具体的には、画素位置）ごとの類似度を示す類似度マップを生成する。本実施形態では、類似度は、コサイン類似度である。 The similarity calculation unit 702 generates a similarity map that indicates the similarity for each position (specifically, pixel position) between the two target feature maps calculated by the image feature extraction models 701a and 701b. In this embodiment, the similarity is a cosine similarity.

画像処理部７０３は、類似度算出部７０２にて算出された類似度マップに応じた差分抽出画像を生成する画像処理を行う。画像処理は、ここでは、差分抽出画像として、類似度マップの各値を色又は濃淡で表すヒートマップ画像を生成するヒートマップ化処理である。また、画像処理は、差分抽出画像を対象画像と同じサイズに拡大する処理を含んでもよい。 The image processing unit 703 performs image processing to generate a difference extraction image according to the similarity map calculated by the similarity calculation unit 702. Here, the image processing is a heat mapping process that generates a heat map image representing each value of the similarity map by color or shading as a difference extraction image. Further, the image processing may include processing to enlarge the difference extracted image to the same size as the target image.

＜画像差分抽出部の動作＞ <Operation of image difference extraction unit>

図８は、画像差分抽出部１０８の動作の一例を説明するための図であり、図９は、画像差分抽出部１０８の動作の一例を説明するためのフローチャートである。 FIG. 8 is a diagram for explaining an example of the operation of the image difference extraction unit 108, and FIG. 9 is a flowchart for explaining an example of the operation of the image difference extraction unit 108.

先ず、画像差分抽出部１０８は、ユーザにて指定された２つの対象画像８０１ａ及び８０１ｂを受け付け、対象画像８０１ａを画像特徴抽出モデル７０１ａに入力し、対象画像８０１ｂを画像特徴抽出モデル７０１ｂに入力する（ステップＳ９００）。対象画像８０１ａ及び８０１ｂは、画像レジストレーション部１０７により画像レジストレーション処理が行われた指定画像である。 First, the image difference extraction unit 108 receives two target images 801a and 801b specified by the user, inputs the target image 801a into the image feature extraction model 701a, and inputs the target image 801b into the image feature extraction model 701b. (Step S900). The target images 801a and 801b are designated images that have been subjected to image registration processing by the image registration unit 107.

続いて、画像特徴抽出モデル７０１ａ及び７０１ｂは、対象画像８０１ａ及び８０１ｂに対して畳み込み層４０２による畳み込み処理とプーリング層４０３によるプーリング処理とを４回繰り返し実行し、対象特徴マップ８０２ａ及び８０２ｂを生成する（ステップＳ９０１）。図８では、各処理層３０１によって、対象画像８０１ａ及び８０１ｂが画像サイズを半減させながら、チャネル数を２倍にしていく過程が直方体８１２ａ及び８１２ｂで示されている。 Subsequently, the image feature extraction models 701a and 701b apply the convolution process of the convolution layer 402 and the pooling process of the pooling layer 403 to the target images 801a and 801b four times in succession to generate target feature maps 802a and 802b (step S901). In FIG. 8, the rectangular parallelepipeds 812a and 812b represent how each processing layer 301 halves the image size of the target images 801a and 801b while doubling the number of channels.

類似度算出部７０２は、画像特徴抽出モデル７０１ａ及び７０１ｂで算出された対象特徴マップ８０２ａ及び８０２ｂの間の類似度を示す類似度マップ８０３を算出する類似度算出処理を行う（ステップＳ９０２）。 The similarity calculation unit 702 performs similarity calculation processing to calculate a similarity map 803 indicating the similarity between the target feature maps 802a and 802b calculated by the image feature extraction models 701a and 701b (step S902).

ここで、対象特徴マップ８０２ａをｇ_ａ、対象特徴マップ８０２ｂをｇ_ｂとし、対象特徴マップｇ_ａ及びｇ_ｂの画素位置を表すインデックスを（ｉ，ｊ）及び（ｉ_ｋ，ｊ_ｋ）とし、対象特徴マップｇ_ａ及びｇ_ｂの（ｉ，ｊ）成分の特徴ベクトルをｇ_ａ（ｉ，ｊ）及びｇ_ｂ（ｉ，ｊ）とすると、類似度マップ８０３であるＣ_ｓｉｍ（ｉ，ｊ）は、以下の式（３）から算出することができる。 Here, let the target feature map 802a be g_a and the target feature map 802b be g_b, let (i, j) and (i_k, j_k) be the indices representing pixel positions in the target feature maps g_a and g_b, and let g_a(i, j) and g_b(i, j) be the feature vectors of the (i, j) components of the target feature maps g_a and g_b. The similarity map 803, denoted C_sim(i, j), can then be calculated from the following equation (3).

$$C_{sim}(i, j) = \frac{g_a(i, j)^{T} \cdot g_b(i, j)}{\lVert g_a(i, j) \rVert \, \lVert g_b(i, j) \rVert} \qquad (3)$$

式（３）において、「^Ｔ」はベクトルの転置、「・」はベクトルの内積、「｜｜」は、ベクトルのユークリッド距離を示す。したがって、類似度マップＣ_ｓｉｍ（ｉ，ｊ）では、画素位置（ｉ，ｊ）ごとに対象特徴マップｇ_ａ及びｇ_ｂのコサイン類似度が示されることとなる。なお、コサイン類似度はスカラ値であるため、類似度マップは、１チャネルのテンソル（画像）となる。また、コサイン類似度は、２つのベクトルがどの程度同じ向きを向いているかを示す指標であり、－１から１までの値を取る。コサイン類似度は、値が１に近づくほど、２つのベクトルが類似しており、－１に近づくほど、２つのベクトルが類似していないことを示す。 In equation (3), "^T" denotes the transpose of a vector, "・" denotes the inner product of vectors, and "||" denotes the Euclidean norm (length) of a vector. The similarity map C_sim(i, j) therefore gives the cosine similarity between the target feature maps g_a and g_b at each pixel position (i, j). Because the cosine similarity is a scalar value, the similarity map is a one-channel tensor (image). The cosine similarity is an index of how closely two vectors point in the same direction and takes values from -1 to 1. The closer the value is to 1, the more similar the two vectors are; the closer it is to -1, the less similar they are.
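The per-position cosine similarity described here can be written in a few lines of NumPy. This is a minimal sketch, not the embodiment's implementation: the function name and the small `eps` guard against all-zero feature vectors are additions made for this example.

```python
import numpy as np

def cosine_similarity_map(g_a: np.ndarray, g_b: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Return C_sim[i, j], the cosine similarity of g_a[i, j] and g_b[i, j].

    Inputs have shape (H, W, C); the output has shape (H, W), i.e. one channel.
    `eps` (an assumption of this sketch) avoids division by zero.
    """
    dot = np.sum(g_a * g_b, axis=-1)  # inner product at each position
    norms = np.linalg.norm(g_a, axis=-1) * np.linalg.norm(g_b, axis=-1)
    return dot / np.maximum(norms, eps)

# Random stand-in for a 15 x 15 x 512 target feature map.
rng = np.random.default_rng(0)
g = rng.standard_normal((15, 15, 512))
same = cosine_similarity_map(g, g)       # close to +1 everywhere
opposite = cosine_similarity_map(g, -g)  # close to -1 everywhere
```

Because the result depends only on the direction of the feature vectors, scaling either feature map by a positive constant leaves the similarity map unchanged.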

類似度マップを算出すると、画像処理部７０３は、類似度マップ８０３に対してヒートマップ化処理を行い、類似度マップ８０３をヒートマップ画像８０４に変換する（ステップＳ９０３）。ヒートマップ画像は、ここでは、類似度マップの値が１に近づくほど赤く、－１に近づくほど青くなるＲＧＢ画像であるとするが、ヒートマップ画像はこの例に限らない。 After calculating the similarity map, the image processing unit 703 performs heat mapping processing on the similarity map 803 to convert the similarity map 803 into a heat map image 804 (step S903). Here, it is assumed that the heat map image is an RGB image that becomes redder as the value of the similarity map approaches 1 and becomes bluer as it approaches -1, but the heat map image is not limited to this example.

そして、画像処理部７０３は、ヒートマップ画像８０４を対象画像と同じサイズに拡大し（ステップＳ９０４）、処理を終了する。 The image processing unit 703 then enlarges the heat map image 804 to the same size as the target image (step S904), and ends the process.
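The heat mapping and enlargement steps can be sketched as follows. The linear red-to-blue colour ramp and the nearest-neighbour enlargement by an integer factor are assumptions chosen for brevity; the description only requires that values near 1 appear red and values near -1 appear blue, and it does not name an interpolation method.

```python
import numpy as np

def to_heatmap(sim: np.ndarray) -> np.ndarray:
    """Map similarities in [-1, 1] to RGB: +1 is pure red, -1 is pure blue."""
    t = (np.clip(sim, -1.0, 1.0) + 1.0) / 2.0  # 0 at -1, 1 at +1
    rgb = np.stack([t, np.zeros_like(t), 1.0 - t], axis=-1)
    return np.rint(rgb * 255).astype(np.uint8)

def enlarge(img: np.ndarray, factor: int) -> np.ndarray:
    """Nearest-neighbour enlargement by an integer factor along both axes."""
    return np.repeat(np.repeat(img, factor, axis=0), factor, axis=1)

# A 15 x 15 similarity map enlarged by 16 matches a 240 x 240 target image.
heat = to_heatmap(np.zeros((15, 15)))
full = enlarge(heat, 16)
print(full.shape)  # (240, 240, 3)
```

A factor of 16 is used because the four processing layers reduce each side of the image by that amount in the example given earlier.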

＜全体フロー＞ <Overall flow>
図１０は、本実施形態の類似画像差分抽出システム全体の動作の一例を説明するためのフローチャートである。図１１～図１３は、類似画像差分抽出システムで扱われる画像の具体例を示す図である。具体的には、図１１は、画像検索処理の結果の一例を示す図であり、図１２は、画像レジストレーション処理の結果の一例を示す図であり、図１３は、画像差分抽出処理の結果の一例を示す図である。 FIG. 10 is a flowchart for explaining an example of the operation of the entire similar image difference extraction system of this embodiment. FIGS. 11 to 13 are diagrams showing specific examples of images handled by the similar image difference extraction system. Specifically, FIG. 11 shows an example of the result of the image search process, FIG. 12 shows an example of the result of the image registration process, and FIG. 13 shows an example of the result of the image difference extraction process.

先ず、利用者端末１００の入出力部１０１は、ユーザから画像検索の対象となるクエリ画像を選択する操作を受け付ける（ステップＳ１０００）。ここでは、図１１に示す画像１１００がクエリ画像として選択されたとする。 First, the input/output unit 101 of the user terminal 100 receives an operation from the user to select a query image to be searched for (step S1000). Here, it is assumed that image 1100 shown in FIG. 11 is selected as the query image.

続いて、入出力部１０１は、ユーザから画像検索コマンドを受け付ける。端末管理部１０２は、画像検索コマンドにクエリ画像を示す情報を加え、その画像検索コマンドを、ネットワーク部１０３を介して画像処理サーバ１０４に送信する（ステップＳ１００１）。 Subsequently, the input/output unit 101 receives an image search command from the user. The terminal management unit 102 adds information indicating a query image to the image search command, and transmits the image search command to the image processing server 104 via the network unit 103 (step S1001).

画像処理サーバ１０４のサーバ管理部１０５は、ネットワーク部１０９を介して画像検索コマンドを受信し、その画像検索コマンドに従って画像検索部１０６に画像検索処理の実行を指示する。画像検索部１０６は、画像検索コマンドに基づいて、クエリ画像と類似する類似画像を画像データベース１１０から検索する。サーバ管理部１０５は、検索された類似画像を検索結果として、ネットワーク部１０９を介して利用者端末１００に送信する（ステップＳ１００２）。 The server management unit 105 of the image processing server 104 receives the image search command via the network unit 109, and instructs the image search unit 106 to execute image search processing in accordance with the image search command. The image search unit 106 searches the image database 110 for similar images similar to the query image based on the image search command. The server management unit 105 transmits the searched similar images as a search result to the user terminal 100 via the network unit 109 (step S1002).

ここでは、図１１に示す類似画像群１１０１に含まれる画像１１０２～１１０５が類似画像として検索されたとする。画像１１０２はクエリ画像である画像１１００に対して位置ずれがあり、画像１１０３は画像１１００とはサイズが異なり、画像１１０４は画像１１００に対して角度が異なる。また、画像１１０５は、位置、サイズ及び角度などの幾何学的な特徴に関しては画像１１００と一致している。 Here, it is assumed that images 1102 to 1105 included in the similar image group 1101 shown in FIG. 11 are searched as similar images. Image 1102 has a positional shift with respect to image 1100 which is a query image, image 1103 has a different size from image 1100, and image 1104 has a different angle from image 1100. Image 1105 also matches image 1100 with respect to geometrical characteristics such as position, size, and angle.

利用者端末１００の端末管理部１０２は、ネットワーク部１０３を介して検索結果を受信し、その検索結果を入出力部１０１に表示する。入出力部１０１は、ユーザから、検索結果に含まれる類似画像とクエリ画像とのうち、互いに比較する２つの画像を指定する操作を受け付ける（ステップＳ１００３）。ここでは、クエリ画像である画像１１００と類似画像である画像１１０２とが指定されたとする。なお、本操作において２つの類似画像が指定されてもよい。 The terminal management unit 102 of the user terminal 100 receives the search results via the network unit 103 and displays the search results on the input/output unit 101. The input/output unit 101 receives an operation from the user to designate two images to be compared with each other from among the similar images included in the search results and the query image (step S1003). Here, it is assumed that image 1100, which is a query image, and image 1102, which is a similar image, are specified. Note that two similar images may be specified in this operation.

続いて、入出力部１０１は、ユーザから画像レジストレーションコマンドを受け付ける。端末管理部１０２は、画像レジストレーションコマンドに、指定された２つの指定画像を示す情報を加え、その画像レジストレーションコマンドを、ネットワーク部１０３を介して画像処理サーバ１０４に送信する（ステップＳ１００４）。 Subsequently, the input/output unit 101 receives an image registration command from the user. The terminal management unit 102 adds information indicating the two designated images to the image registration command, and transmits the image registration command to the image processing server 104 via the network unit 103 (step S1004).

画像処理サーバ１０４のサーバ管理部１０５は、ネットワーク部１０９を介して画像レジストレーションコマンドを受信し、その画像レジストレーションコマンドに従って画像レジストレーション部１０７に画像レジストレーション処理の実行を指示する。画像レジストレーション部１０７は、画像レジストレーションコマンドに基づいて、画像レジストレーション処理を実行する。サーバ管理部１０５は、画像レジストレーション処理の処理結果であるレジストレーション結果を、ネットワーク部１０９を介して利用者端末１００に送信する（ステップＳ１００５）。画像レジストレーション処理は、具体的には、図５及び図６を用いて説明した処理である。ここでは、図１２に示す２つの画像１１００及び画像１１０２のうち画像１１００に対してアフィン変換が施されて、画像１２００がレジストレーション結果として取得されたとする。 The server management unit 105 of the image processing server 104 receives the image registration command via the network unit 109, and instructs the image registration unit 107 to execute image registration processing in accordance with the image registration command. The image registration unit 107 executes image registration processing based on the image registration command. The server management unit 105 transmits the registration result, which is the result of the image registration process, to the user terminal 100 via the network unit 109 (step S1005). The image registration process is specifically the process described using FIGS. 5 and 6. Here, it is assumed that affine transformation is performed on image 1100 of the two images 1100 and 1102 shown in FIG. 12, and image 1200 is obtained as a registration result.

その後、利用者端末１００の端末管理部１０２は、ネットワーク部１０３を介してレジストレーション結果を受信し、そのレジストレーション結果を入出力部１０１に表示する。入出力部１０１は、レジストレーション結果を確認したユーザから、差分抽出コマンドを受け付ける。端末管理部１０２は、差分抽出コマンドに画像１１００及び１２００を対象画像として示す情報を加え、その差分抽出コマンドを、ネットワーク部１０３を介して画像処理サーバ１０４に送信する（Ｓ１００６）。 Thereafter, the terminal management unit 102 of the user terminal 100 receives the registration result via the network unit 103 and displays the registration result on the input/output unit 101. The input/output unit 101 receives a difference extraction command from a user who has confirmed the registration result. The terminal management unit 102 adds information indicating images 1100 and 1200 as target images to the difference extraction command, and transmits the difference extraction command to the image processing server 104 via the network unit 103 (S1006).

画像処理サーバ１０４のサーバ管理部１０５は、ネットワーク部１０９を介して差分抽出コマンドを受信し、その差分抽出コマンドに従って画像差分抽出部１０８に画像差分抽出処理の実行を指示する。画像差分抽出部１０８は、差分抽出コマンドに基づいて、２つの対象画像に対して画像差分抽出処理を実行して、ヒートマップ画像を生成する（ステップＳ１００７）。画像差分抽出処理は、具体的には、図７及び図８を用いて説明した処理である。ここでは、図１３に示す２つの画像１１００及び画像１２００に対して画像差分抽出処理が実行され、画像１３００がヒートマップ画像として生成されたとする。 The server management unit 105 of the image processing server 104 receives the difference extraction command via the network unit 109, and instructs the image difference extraction unit 108 to execute image difference extraction processing in accordance with the difference extraction command. The image difference extraction unit 108 executes image difference extraction processing on the two target images based on the difference extraction command to generate a heat map image (step S1007). The image difference extraction process is specifically the process described using FIGS. 7 and 8. Here, it is assumed that image difference extraction processing is performed on the two images 1100 and 1200 shown in FIG. 13, and image 1300 is generated as a heat map image.

サーバ管理部１０５は、画像差分抽出処理の処理結果であるヒートマップ画像を、ネットワーク部１０９を介して利用者端末１００に送信する。利用者端末１００の端末管理部１０２は、ネットワーク部１０３を介してヒートマップ画像を受信し、そのヒートマップ画像を入出力部１０１に表示し（ステップＳ１００８）、処理を終了する。ユーザは、ヒートマップ画像を確認することにより、２つ画像のどの部分がどの程度異なるかを確認することが可能になる。 The server management unit 105 transmits the heat map image, which is the processing result of the image difference extraction process, to the user terminal 100 via the network unit 109. The terminal management unit 102 of the user terminal 100 receives the heat map image via the network unit 103, displays the heat map image on the input/output unit 101 (step S1008), and ends the process. By checking the heat map images, the user can check which parts of the two images differ and to what extent.

＜分割画像差分抽出処理＞ <Divided image difference extraction processing>
次に、画像差分抽出部１０８による画像差分抽出処理の変形例である分割画像差分抽出処理について説明する。図１４は、分割画像差分抽出処理の一例を説明するためのフローチャートである。図１５は、分割画像差分抽出処理の結果の一例を示す図である。 Next, divided image difference extraction processing, which is a modification of the image difference extraction processing performed by the image difference extraction unit 108, will be described. FIG. 14 is a flowchart for explaining an example of divided image difference extraction processing. FIG. 15 is a diagram illustrating an example of the results of the divided image difference extraction process.

分割画像差分抽出処理では、先ず、図１０のステップＳ１０００～Ｓ１００５の処理が実行される。続いて、利用者端末１００の端末管理部１０２は、ネットワーク部１０３を介してレジストレーション結果を受信し、そのレジストレーション結果を入出力部１０１に表示する。入出力部１０１は、レジストレーション結果を確認したユーザから、画像レジストレーション処理が実行された指定画像である対象画像を複数の部分画像にグリッド分割するグリッド分割操作を受け付ける（ステップＳ１４００）。グリッド分割操作は、分割内容（例えば、部分画像の数、縦横比及び分割位置など）を指定するものでもよい。 In the divided image difference extraction process, first, the processes of steps S1000 to S1005 in FIG. 10 are executed. Subsequently, the terminal management unit 102 of the user terminal 100 receives the registration result via the network unit 103 and displays the registration result on the input/output unit 101. The input/output unit 101 receives, from the user who has confirmed the registration result, a grid division operation for dividing the target image, which is the designated image on which the image registration process has been performed, into a grid of partial images (step S1400). The grid division operation may specify the division details (for example, the number of partial images, the aspect ratio, and the division positions).

入出力部１０１は、レジストレーション結果を確認したユーザから、差分抽出コマンドを受け付ける。端末管理部１０２は、差分抽出コマンドに対象画像とグリッド分割操作に応じた分割内容とを示す情報を加え、その差分抽出コマンドを、ネットワーク部１０３を介して画像処理サーバ１０４に送信する（ステップＳ１４０１）。 The input/output unit 101 receives a difference extraction command from a user who has confirmed the registration result. The terminal management unit 102 adds information indicating the target image and the division details according to the grid division operation to the difference extraction command, and transmits the difference extraction command to the image processing server 104 via the network unit 103 (step S1401 ).

画像処理サーバ１０４のサーバ管理部１０５は、ネットワーク部１０９を介して差分抽出コマンドを受信し、その差分抽出コマンドに従って画像差分抽出部１０８に画像差分抽出処理の実行を指示する。画像差分抽出部１０８は、差分抽出コマンドに基づいて、対象画像を、指定された分割内容に従って複数の部分画像にグリッド分割する（ステップＳ１４０２）。ここでは、図１５に示すように画像１５００及び１５０２が対象画像としてグリッド分割されたとする。なお、図１５では、説明のためにグリッド線１５０１が表記されているが、実際には、画像にグリッド線１５０１を引く必要はない。 The server management unit 105 of the image processing server 104 receives the difference extraction command via the network unit 109, and instructs the image difference extraction unit 108 to execute image difference extraction processing in accordance with the difference extraction command. Based on the difference extraction command, the image difference extraction unit 108 divides each target image into a grid of partial images according to the specified division details (step S1402). Here, it is assumed that images 1500 and 1502 are grid-divided as the target images, as shown in FIG. 15. Although grid lines 1501 are drawn in FIG. 15 for the sake of explanation, the grid lines 1501 do not actually need to be drawn on the images.

画像差分抽出部１０８は、グリッド分割された部分画像ごとにステップＳ１４０３～１４０４を繰り返すループ処理（Ａ）を実行する。 The image difference extraction unit 108 executes a loop process (A) in which steps S1403 to S1404 are repeated for each grid-divided partial image.

ループ処理（Ａ）では、画像差分抽出部１０８は、各対象画像の互いに対応する位置の部分画像を画像差分抽出モデル７００の特徴量抽出モデル７０１ａ及び７０１ｂに入力する（ステップＳ１４０３）。特徴量抽出モデル７０１ａ及び７０１ｂは、入力された部分画像の特徴マップである部分特徴マップを算出する（ステップＳ１４０４）。 In loop processing (A), the image difference extraction unit 108 inputs partial images at mutually corresponding positions of each target image to the feature amount extraction models 701a and 701b of the image difference extraction model 700 (step S1403). The feature extraction models 701a and 701b calculate partial feature maps that are feature maps of the input partial images (step S1404).

画像差分抽出部１０８は、全ての部分画像に対してステップＳ１４０３～１４０４処理を行うと、ループ処理（Ａ）を抜ける。そして、類似度算出部７０２は、各部分特徴マップの位置関係が各部分特徴マップに対応する各部分画像の位置関係と同じになるように、各部分特徴マップを結合して対象特徴マップとして生成する（ステップＳ１４０５）。 After the image difference extraction unit 108 performs steps S1403 to S1404 on all partial images, the process exits from the loop process (A). Then, the similarity calculation unit 702 combines each partial feature map to generate a target feature map so that the positional relationship of each partial feature map is the same as the positional relationship of each partial image corresponding to each partial feature map. (Step S1405).

続いて、画像差分抽出部１０８の類似度算出部７０２は、各対象特徴マップの間の類似度を示す類似度マップを算出する類似度算出処理を行う（ステップＳ１４０６）。画像処理部７０３は、類似度マップに対してヒートマップ化処理を行い、類似度マップをヒートマップ画像に変換する（ステップＳ１４０７）。画像処理部７０３は、ヒートマップ画像を入力画像と同じサイズに拡大し（ステップＳ１４０８）、処理を終了する。これにより、図１５に示す画像１５０４がヒートマップ画像として生成される。なお、ステップＳ１４０８の処理の後に、図１０のステップＳ１００８の処理が実行される。 Subsequently, the similarity calculation unit 702 of the image difference extraction unit 108 performs similarity calculation processing to calculate a similarity map indicating the similarity between each target feature map (step S1406). The image processing unit 703 performs heat mapping processing on the similarity map and converts the similarity map into a heat map image (step S1407). The image processing unit 703 enlarges the heat map image to the same size as the input image (step S1408), and ends the process. As a result, an image 1504 shown in FIG. 15 is generated as a heat map image. Note that after the process in step S1408, the process in step S1008 in FIG. 10 is executed.
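The grid division and recombination of partial feature maps can be sketched as below. The helper names are hypothetical, the image is assumed to divide evenly into the grid, and `stand_in_extractor` (simple average pooling with stride 16) merely stands in for the trained image feature extraction models 701a and 701b so that the example runs on its own.

```python
import numpy as np

def split_grid(image: np.ndarray, rows: int, cols: int):
    """Split an (H, W, C) image into rows x cols equally sized tiles."""
    h, w = image.shape[:2]
    assert h % rows == 0 and w % cols == 0, "image must divide evenly"
    th, tw = h // rows, w // cols
    return [[image[r * th:(r + 1) * th, c * tw:(c + 1) * tw]
             for c in range(cols)] for r in range(rows)]

def stand_in_extractor(tile: np.ndarray, stride: int = 16) -> np.ndarray:
    """Placeholder feature extractor: average pooling over stride x stride blocks."""
    h, w, c = tile.shape
    return tile.reshape(h // stride, stride, w // stride, stride, c).mean(axis=(1, 3))

def stitched_feature_map(image: np.ndarray, rows: int, cols: int, extractor) -> np.ndarray:
    """Extract features per tile and join them in the tiles' original layout."""
    tiles = split_grid(image, rows, cols)
    row_maps = [np.concatenate([extractor(t) for t in row], axis=1) for row in tiles]
    return np.concatenate(row_maps, axis=0)

rng = np.random.default_rng(0)
image = rng.random((64, 64, 3))
target_map = stitched_feature_map(image, 2, 2, stand_in_extractor)
print(target_map.shape)  # (4, 4, 3)
```

With this purely local placeholder the stitched map equals the map of the undivided image; with a real convolutional model the two would generally differ near tile borders, because each tile is processed without seeing its neighbours.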

＜学習処理の一例＞ <Example of learning process>
図１６は、画像レジストレーションモデル２００及び画像差分抽出モデル７００に対して行う学習処理の一例を説明するためのフローチャートである。 FIG. 16 is a flowchart for explaining an example of a learning process performed on the image registration model 200 and the image difference extraction model 700.

学習処理では、先ず、入出力部１０１がユーザから学習コマンドを受け付けると、端末管理部１０２は、その学習コマンドを、ネットワーク部１０３を介して画像処理サーバ１０４に送信する。画像処理サーバ１０４のサーバ管理部１０５は、ネットワーク部１０９を介して学習コマンドを受信する（ステップＳ１６００）。 In the learning process, first, when the input/output unit 101 receives a learning command from the user, the terminal management unit 102 transmits the learning command to the image processing server 104 via the network unit 103. The server management unit 105 of the image processing server 104 receives the learning command via the network unit 109 (step S1600).

サーバ管理部１０５は、所定の条件を満たすまで、ステップＳ１６０１～Ｓ１６０７の処理を繰り返すループ処理（Ｂ）を実行する。所定の条件は、学習回数（ループ回数）が所定回数に到達すること、又は、学習時間が所定時間に到達することなどである。 The server management unit 105 executes a loop process (B) in which steps S1601 to S1607 are repeated until a predetermined condition is satisfied. The predetermined conditions include that the number of learning times (the number of loops) reaches a predetermined number of times, or that the learning time reaches a predetermined time.

ループ処理（Ｂ）では、サーバ管理部１０５は、画像データベース１１０から第１の学習用画像として任意の画像を取得する（ステップＳ１６０１）。ここで取得された画像を画像Ａと呼ぶ。 In loop processing (B), the server management unit 105 acquires an arbitrary image as a first learning image from the image database 110 (step S1601). The image acquired here is called image A.

サーバ管理部１０５は、ランダムな変換パラメータである第１の変換パラメータを生成する（ステップＳ１６０２）。第１の変換パラメータの数は変換モデルに応じて予め定められている。例えば、変換モデルがアフィン変換モデルの場合、第１の変換パラメータの数は６である。 The server management unit 105 generates a first conversion parameter that is a random conversion parameter (step S1602). The number of first conversion parameters is predetermined according to the conversion model. For example, when the transformation model is an affine transformation model, the number of first transformation parameters is six.

サーバ管理部１０５は、第１の変換パラメータを設定した変換モデルを用いて画像Ａを第２の学習用画像である画像Ｂに変換する（ステップＳ１６０３）。サーバ管理部１０５は、画像Ａ及びＢを指定画像として画像レジストレーションモデル２００に入力する（ステップＳ１６０４）。画像レジストレーションモデル２００は、画像Ａ及びＢに基づく変換パラメータを第２の変換パラメータとして推定する（ステップＳ１６０５）。 The server management unit 105 converts image A into image B, which is a second learning image, using the conversion model in which the first conversion parameter is set (step S1603). The server management unit 105 inputs images A and B as designated images to the image registration model 200 (step S1604). The image registration model 200 estimates a transformation parameter based on images A and B as a second transformation parameter (step S1605).

サーバ管理部１０５は、所定の損失関数を用いて、ステップＳ１６０５で推定された第２の変換パラメータと、ステップＳ１６０２で生成された第１の変換パラメータとの差を示す損失値を算出する（ステップＳ１６０６）。なお、損失関数は、例えば、ＭＳＥ（Mean Square Error）などである。 The server management unit 105 uses a predetermined loss function to calculate a loss value indicating the difference between the second transformation parameter estimated in step S1605 and the first transformation parameter generated in step S1602 (step S1606). Note that the loss function is, for example, MSE (Mean Square Error).

サーバ管理部１０５は、損失値が減少するように、画像レジストレーションモデル２００の重み値を調整する（ステップＳ１６０７）。 The server management unit 105 adjusts the weight value of the image registration model 200 so that the loss value decreases (step S1607).

そして、所定の条件が満たされると、サーバ管理部１０５は、ループ処理（Ｂ）を抜けて、処理を終了する。これにより、画像レジストレーションモデル２００が学習モデルとして構築される。画像レジストレーションモデル２００が学習されることで、内部の画像特徴抽出モデル２０１ａ及び２０１ｂが学習される。これにより、本実施形態では、画像差分抽出モデル７００の内部の画像特徴抽出モデル７０１ａ及び７０１ｂとして、画像特徴抽出モデル２０１ａ及び２０１ｂと同じモデルを使用するため、画像特徴抽出モデル７０１ａ及び７０１ｂも学習されることとなる。 Then, when the predetermined condition is satisfied, the server management unit 105 exits the loop process (B) and ends the process. As a result, the image registration model 200 is constructed as a learning model. Training the image registration model 200 also trains its internal image feature extraction models 201a and 201b. Because this embodiment uses the same models as the image feature extraction models 201a and 201b for the image feature extraction models 701a and 701b inside the image difference extraction model 700, the image feature extraction models 701a and 701b are trained as well.
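The self-supervised structure of this learning process (generate random parameters, warp image A into image B, then penalise the difference between the estimated and the generated parameters) can be sketched as follows. Everything named here is an assumption for illustration: the sampling ranges, the restriction to rotation, uniform scale and translation, the parameter ordering, and the `warp` callable are not taken from the description, and the weight update of step S1607 is left out because it depends on the model framework.

```python
import numpy as np

def random_affine_params(rng, max_angle=np.pi / 12, max_scale=0.2, max_shift=0.2):
    """Sample six affine parameters (a11, a12, tx, a21, a22, ty).

    This sketch samples only rotation, uniform scale and translation;
    a general affine model could also include shear and anisotropic scale.
    """
    angle = rng.uniform(-max_angle, max_angle)
    scale = 1.0 + rng.uniform(-max_scale, max_scale)
    tx, ty = rng.uniform(-max_shift, max_shift, size=2)
    c, s = scale * np.cos(angle), scale * np.sin(angle)
    return np.array([c, -s, tx, s, c, ty])

def mse_loss(predicted, target) -> float:
    """Mean squared error between estimated and ground-truth parameters."""
    diff = np.asarray(predicted, dtype=float) - np.asarray(target, dtype=float)
    return float(np.mean(diff ** 2))

def make_training_pair(image_a, warp, rng):
    """Return (image_b, theta): image A warped by random parameters theta.

    `warp(image, params)` is a hypothetical callable supplied by the caller.
    """
    theta = random_affine_params(rng)
    return warp(image_a, theta), theta

rng = np.random.default_rng(0)
theta = random_affine_params(rng)
print(theta.shape)  # (6,)
```

Because the ground-truth parameters are generated rather than annotated, any image in the database can serve as a training sample without manual labelling.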

＜学習処理の他の例＞ <Other examples of learning processing>
図１７は、画像レジストレーションモデル２００及び画像差分抽出モデル７００に対して行う学習処理の他の例を説明するためのフローチャートである。本学習処理を用いることで、図１６で示した学習処理よりもさらに精度の良いモデルを構築することが可能となる。 FIG. 17 is a flowchart for explaining another example of the learning process performed on the image registration model 200 and the image difference extraction model 700. Using this learning process makes it possible to construct a model that is more accurate than one built with the learning process shown in FIG. 16.

まず、画像処理サーバ１０４のサーバ管理部１０５は、位置、サイズ及び角度などの幾何学的な特徴が類似する２つの画像のセットの集合を画像データベース１１０に登録する（ステップＳ１７００）。類似する２つの画像のセットとしては、例えば、図１１の画像１１００と画像１１０５とのセットが挙げられる。画像のセットはユーザにて作成されてもよい。また、画像のセットは、一般的な画像処理ライブラリなどの既存のプログラムなどを用いて生成されてもよい。また、既存のプログラムにて生成された画像からユーザが画像のセットを選択する構成でもよい。 First, the server management unit 105 of the image processing server 104 registers, in the image database 110, a set of two images that are similar in geometrical characteristics such as position, size, and angle (step S1700). An example of a set of two similar images is the set of image 1100 and image 1105 in FIG. 11. The set of images may be created by the user. Further, the set of images may be generated using an existing program such as a general image processing library. Alternatively, the configuration may be such that the user selects a set of images from images generated using an existing program.

続いて、入出力部１０１がユーザから学習コマンドを受け付けると、端末管理部１０２は、その学習コマンドを、ネットワーク部１０３を介して画像処理サーバ１０４に送信する。画像処理サーバ１０４のサーバ管理部１０５は、ネットワーク部１０９を介して学習コマンドを受信する（ステップＳ１７０１）。 Subsequently, when the input/output unit 101 receives a learning command from the user, the terminal management unit 102 transmits the learning command to the image processing server 104 via the network unit 103. The server management unit 105 of the image processing server 104 receives the learning command via the network unit 109 (step S1701).

サーバ管理部１０５は、画像のセットごとに、ステップＳ１７０２～Ｓ１７０８の処理を繰り返すループ処理（Ｃ）を実行する。 The server management unit 105 executes a loop process (C) that repeats the processes of steps S1702 to S1708 for each set of images.

ループ処理（Ｃ）では、サーバ管理部１０５は、画像データベース１１０から任意の画像のセットに含まれる２つの画像を、第１の学習用画像及び第１の学習用画像に対応する対応類似画像として取得する（Ｓ１７０２）。ここで取得された第１の学習用画像を画像Ｃ、対応類似画像を画像Ｄと呼ぶ。 In loop processing (C), the server management unit 105 acquires, from the image database 110, the two images included in an arbitrary image set as a first learning image and a corresponding similar image that corresponds to the first learning image (step S1702). The first learning image acquired here is called image C, and the corresponding similar image is called image D.

サーバ管理部１０５は、ランダムな変換パラメータである第１の変換パラメータを生成する（ステップＳ１７０３）。 The server management unit 105 generates a first conversion parameter that is a random conversion parameter (step S1703).

サーバ管理部１０５は、第１の変換パラメータを設定した変換モデルを用いて画像Ｄを第２の学習用画像である画像Ｄ’に変換する（ステップＳ１７０４）。サーバ管理部１０５は、画像Ｃ及びＤ’を指定画像として画像レジストレーションモデル２００に入力する（ステップＳ１７０５）。画像レジストレーションモデル２００は、画像Ｃ及びＤ’に基づく変換パラメータを第２の変換パラメータとして推定する（ステップＳ１７０６）。 The server management unit 105 converts the image D into the image D' which is the second learning image using the conversion model in which the first conversion parameter is set (step S1704). The server management unit 105 inputs images C and D' into the image registration model 200 as designated images (step S1705). The image registration model 200 estimates a transformation parameter based on images C and D' as a second transformation parameter (step S1706).

サーバ管理部１０５は、所定の損失関数を用いて、ステップＳ１７０６で推定された第２の変換パラメータと、ステップＳ１７０３で生成された第１の変換パラメータとの差を示す損失値を算出する（ステップＳ１７０７）。 The server management unit 105 uses a predetermined loss function to calculate a loss value indicating the difference between the second transformation parameter estimated in step S1706 and the first transformation parameter generated in step S1703 (step S1707).

サーバ管理部１０５は、損失値が減少するように、画像レジストレーションモデル２００の重み値を調整する（ステップＳ１７０８）。 The server management unit 105 adjusts the weight value of the image registration model 200 so that the loss value decreases (step S1708).

そして、全ての対応類似画像のセットについてステップＳ１７０２～Ｓ１７０８の処理を実行すると、サーバ管理部１０５は、ループ処理（Ｃ）を抜けて、処理を終了する。これにより、画像レジストレーションモデル２００が学習モデルとして構築される。 After executing the processes of steps S1702 to S1708 for all sets of corresponding similar images, the server management unit 105 exits the loop process (C) and ends the process. As a result, the image registration model 200 is constructed as a learning model.

以上説明したように本実施形態では、画像特徴抽出モデル７０１ａ及び７０１ｂで構成される特徴抽出部は、２つの対象画像のそれぞれの特徴を示す特徴マップである対象特徴マップを生成する。類似度算出部７０２は、各対象特徴マップの位置ごとの類似度を示す類似度マップを生成する。画像処理部７０３は、類似度マップに応じた差分抽出画像を生成する。 As described above, in this embodiment, the feature extraction unit including the image feature extraction models 701a and 701b generates a target feature map that is a feature map indicating the respective features of two target images. The similarity calculation unit 702 generates a similarity map that indicates the degree of similarity for each position of each target feature map. The image processing unit 703 generates a difference extraction image according to the similarity map.

したがって、２つの対象画像の特徴の類似度を示す類似度マップに応じた差分抽出画像が生成されるため、ユーザは、２つの対象画像を目視で比較しなくても、差分抽出画像を目視するだけで、２つの対象画像の差を把握することが可能となる。このため、画像間の差分を効率よく把握することが可能になる。また、ユーザは、バイナリー記述子の種類などを画像に応じて指定する必要もないため、その観点からも、画像間の差分を効率よく把握することが可能になる。 Therefore, since a difference extraction image is generated according to a similarity map indicating the degree of similarity between the features of the two target images, the user can grasp the difference between the two target images simply by viewing the difference extraction image, without visually comparing the two target images. This makes it possible to grasp the differences between images efficiently. Furthermore, since the user does not need to specify the type of binary descriptor or the like for each image, the differences between images can be grasped efficiently from this point of view as well.

また、本実施形態では、学習モデルである画像特徴抽出モデル７０１ａ及び７０１ｂを用いて対象特徴マップが生成されるため、対象画像間の類似度を適切に反映した差分抽出画像を生成することが可能となる。 Furthermore, in this embodiment, because the target feature maps are generated using the image feature extraction models 701a and 701b, which are learning models, it is possible to generate a difference extraction image that appropriately reflects the similarity between the target images.

また、本実施形態では、画像レジストレーション部１０７が２つの指定画像に対して画像レジストレーション処理を行って対象画像を生成するため、対象画像間の類似度を適切に反映した差分抽出画像を生成することが可能となる。 Furthermore, in this embodiment, because the image registration unit 107 generates the target images by performing image registration processing on the two designated images, it is possible to generate a difference extraction image that appropriately reflects the similarity between the target images.

また、本実施形態では、画像レジストレーション処理の変換モデルの変換パラメータが２つの指定画像の特徴マップである２つの指定特徴マップに基づいて推定されるため、各指定画像の特徴に応じた適切な変換パラメータを推定することが可能となる。 Furthermore, in this embodiment, because the transformation parameters of the transformation model for the image registration processing are estimated based on two designated feature maps, which are the feature maps of the two designated images, it is possible to estimate appropriate transformation parameters according to the features of each designated image.

また、本実施形態では、学習モデルである画像特徴抽出モデル２０１ａ及び２０１ｂを用いて指定特徴マップが生成されるため、各指定画像の特徴を適切に反映した変換パラメータを推定することが可能となる。 Furthermore, in this embodiment, because the designated feature maps are generated using the image feature extraction models 201a and 201b, which are learning models, it is possible to estimate transformation parameters that appropriately reflect the features of each designated image.

また、本実施形態では、画像レジストレーション部１０７が用いる画像特徴抽出モデル２０１ａ及び２０１ｂと、画像差分抽出部１０８が用いる画像特徴抽出モデル７０１ａ及び７０１ｂとが同じモデルであるため、効率の良い学習が可能となる。 Furthermore, in this embodiment, because the image feature extraction models 201a and 201b used by the image registration unit 107 and the image feature extraction models 701a and 701b used by the image difference extraction unit 108 are the same models, efficient learning is possible.

また、本実施形態では、サーバ管理部１０５は、ランダムに生成した第１の変換パラメータと、画像Ａと画像Ａを第１の変換パラメータで変換した画像Ｂとを指定画像としたときに画像レジストレーション部１０７にて推定される第２の変換パラメータとの差を示す損失値に基づいて、画像特徴抽出モデル２０１ａ及び２０１ｂが構築される。このため、精度の良い画像特徴抽出モデル２０１ａ及び２０１ｂを構築することが可能になる。 Furthermore, in this embodiment, the server management unit 105 constructs the image feature extraction models 201a and 201b based on a loss value indicating the difference between a randomly generated first transformation parameter and a second transformation parameter that the image registration unit 107 estimates when image A and image B, obtained by transforming image A with the first transformation parameter, are used as the designated images. This makes it possible to construct highly accurate image feature extraction models 201a and 201b.

また、本実施形態では、サーバ管理部１０５は、ランダムに生成した第１の変換パラメータと、画像Ｃと画像Ｃに対して予め対応付けられた画像Ｄを第１の変換パラメータで変換した画像Ｄ’とを指定画像としたときに画像レジストレーション部１０７にて推定される第２の変換パラメータとの差を示す損失値に基づいて、画像特徴抽出モデル２０１ａ及び２０１ｂが構築される。このため、より精度の良い画像特徴抽出モデル２０１ａ及び２０１ｂを構築することが可能になる。 Furthermore, in this embodiment, the server management unit 105 constructs the image feature extraction models 201a and 201b based on a loss value indicating the difference between a randomly generated first transformation parameter and a second transformation parameter that the image registration unit 107 estimates when image C and image D', obtained by transforming image D (which is associated with image C in advance) with the first transformation parameter, are used as the designated images. This makes it possible to construct even more accurate image feature extraction models 201a and 201b.
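
The training scheme above can be sketched as follows: a first transformation parameter is generated at random, an image is transformed with it, and the loss value measures how far the estimated second transformation parameter is from the first. This passage does not fix the form of the transformation model or of the loss, so the six-value affine parameterization, the nearest-neighbor warp, the mean squared error, and all function names below are assumptions made for illustration. The estimate `theta_2` is a placeholder for the output of the image registration unit 107.

```python
import random

def random_affine(rng, max_shift=2.0, max_skew=0.1):
    """Hypothetical parameters [a, b, tx, c, d, ty] for x' = a*x + b*y + tx, y' = c*x + d*y + ty."""
    return [
        1.0 + rng.uniform(-max_skew, max_skew),
        rng.uniform(-max_skew, max_skew),
        rng.uniform(-max_shift, max_shift),
        rng.uniform(-max_skew, max_skew),
        1.0 + rng.uniform(-max_skew, max_skew),
        rng.uniform(-max_shift, max_shift),
    ]

def warp(image, params, fill=0.0):
    """Fill each output pixel (x, y) from the nearest source pixel at the transformed coordinates."""
    a, b, tx, c, d, ty = params
    h, w = len(image), len(image[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sx = int(round(a * x + b * y + tx))
            sy = int(round(c * x + d * y + ty))
            if 0 <= sx < w and 0 <= sy < h:
                out[y][x] = image[sy][sx]
    return out

def parameter_loss(true_params, estimated_params):
    """Mean squared error between the generated and the estimated parameters."""
    return sum((t - e) ** 2 for t, e in zip(true_params, estimated_params)) / len(true_params)

rng = random.Random(0)
image_a = [[float(x + 4 * y) for x in range(4)] for y in range(4)]
theta_1 = random_affine(rng)               # first transformation parameter
image_b = warp(image_a, theta_1)           # image B: image A transformed with theta_1
theta_2 = [1.0, 0.0, 0.0, 0.0, 1.0, 0.0]   # placeholder estimate (identity transform)
loss_value = parameter_loss(theta_1, theta_2)
```

For the second variant, the only change would be to warp the pre-associated image D instead of image C itself, giving the pair (C, D') as the designated images.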

また、本実施形態では、特徴抽出部は、２つの対象画像のそれぞれについて、当該対象画像を分割した複数の部分画像のそれぞれの特徴を示す特徴マップである部分特徴マップを生成し、各部分特徴マップの位置関係が各部分画像に対応する各部分画像の位置関係と同じになるように各部分特徴マップを結合して、対象特徴マップを生成する。このため、精度の良い対象特徴マップを生成することができる。 Furthermore, in this embodiment, for each of the two target images, the feature extraction unit generates partial feature maps, which are feature maps indicating the features of each of a plurality of partial images obtained by dividing the target image, and generates the target feature map by combining the partial feature maps so that their positional relationship is the same as the positional relationship of the corresponding partial images. This makes it possible to generate a highly accurate target feature map.
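
The divide-and-combine procedure above can be sketched as follows: the target image is split into a grid of partial images, a feature map is computed for each one, and the partial feature maps are joined in the same layout as the partial images. The function names are hypothetical, and `toy_feature_map` (2 x 2 average pooling) is only a placeholder assumed here in place of the learned image feature extraction model.

```python
def split_into_tiles(image, tile_h, tile_w):
    """Split an H x W image into a grid of tiles; H and W must be multiples of the tile size."""
    h, w = len(image), len(image[0])
    assert h % tile_h == 0 and w % tile_w == 0
    return [
        [[row[x:x + tile_w] for row in image[y:y + tile_h]] for x in range(0, w, tile_w)]
        for y in range(0, h, tile_h)
    ]

def toy_feature_map(tile):
    """Placeholder feature extractor: 2 x 2 average pooling (tile sides must be even)."""
    h, w = len(tile), len(tile[0])
    return [
        [(tile[y][x] + tile[y][x + 1] + tile[y + 1][x] + tile[y + 1][x + 1]) / 4.0
         for x in range(0, w, 2)]
        for y in range(0, h, 2)
    ]

def stitch(feature_grid):
    """Join partial feature maps so that their layout matches the layout of the tiles."""
    stitched = []
    for grid_row in feature_grid:
        for r in range(len(grid_row[0])):
            stitched.append([v for fmap in grid_row for v in fmap[r]])
    return stitched

image = [[float(x + 4 * y) for x in range(4)] for y in range(4)]
tiles = split_into_tiles(image, 2, 2)
partial_maps = [[toy_feature_map(t) for t in row] for row in tiles]
target_feature_map = stitch(partial_maps)
```

Keeping the layout identical is what allows a position in the combined feature map to be compared with the same position in the other image's feature map.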

また、本実施形態では、差分抽出画像は、類似度マップの各値を色又は濃淡で表したヒートマップ画像であるため、２つの画像の差を直感的に素早く把握することが可能となる。 Furthermore, in this embodiment, the difference extracted image is a heat map image in which each value of the similarity map is represented by color or shading, so it is possible to quickly and intuitively grasp the difference between the two images.
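
A heat map of this kind can be sketched as a simple mapping from each similarity value to a color. The color scale is not specified in this passage, so the red-to-blue ramp, the assumed value range of -1 to 1, and the names `to_heat_color` and `heat_map_image` are illustrative assumptions.

```python
def to_heat_color(similarity, low=-1.0, high=1.0):
    """Map a similarity value to (R, G, B): low similarity is red, high similarity is blue."""
    t = (similarity - low) / (high - low)
    t = min(1.0, max(0.0, t))  # clamp to [0, 1]
    red = int(round(255 * (1.0 - t)))
    blue = int(round(255 * t))
    return (red, 0, blue)

def heat_map_image(sim_map):
    """Convert every value of a similarity map into a pixel color."""
    return [[to_heat_color(s) for s in row] for row in sim_map]

# Hypothetical similarity map with values in [-1, 1].
sim_map = [[1.0, 0.0], [1.0, -1.0]]
heat = heat_map_image(sim_map)
```

A grayscale variant would return a single intensity per position instead of an RGB tuple.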

上述した本開示の実施形態は、本開示の説明のための例示であり、本開示の範囲をそれらの実施形態にのみ限定する趣旨ではない。当業者は、本開示の範囲を逸脱することなしに、他の様々な態様で本開示を実施することができる。 The embodiments of the present disclosure described above are examples for explaining the present disclosure, and are not intended to limit the scope of the present disclosure only to those embodiments. Those skilled in the art can implement the present disclosure in various other ways without departing from the scope of the disclosure.

１００：利用者端末、１０１：入出力部、１０１：処理フロー作成サーバ、１０２：端末管理部、１０３：ネットワーク部、１０４：画像処理サーバ、１０５：サーバ管理部、１０６：画像検索部、１０７：画像レジストレーション部、１０８：画像差分抽出部、１０９：ネットワーク部、１１０：画像データベース、１１１：ネットワーク、１１２：メモリ、２０１：指定特徴抽出部、２０１ａ～２０１ｂ：画像特徴抽出モデル、２０２：変換パラメータ回帰モデル、２０３：特徴関連性マップ算出部、２０４：実行部、３００：画像特徴抽出モデル、３０１：処理層、３０２：畳み込み層、３０３：プーリング層、４００：変換パラメータ回帰モデル、４０１：処理層、４０２：畳み込み層、４０３：プーリング層、４０４：ＦＣ層、７００：画像差分抽出モデル、７０１：特徴抽出部、７０１ａ～７０１ｂ：画像特徴抽出モデル、７０２：類似度算出部、７０３：画像処理部

100: User terminal, 101: Input/output unit, 101: Process flow creation server, 102: Terminal management unit, 103: Network unit, 104: Image processing server, 105: Server management unit, 106: Image search unit, 107: Image registration unit, 108: Image difference extraction unit, 109: Network unit, 110: Image database, 111: Network, 112: Memory, 201: Designated feature extraction unit, 201a to 201b: Image feature extraction models, 202: Transformation parameter regression model, 203: Feature relevance map calculation unit, 204: Execution unit, 300: Image feature extraction model, 301: Processing layer, 302: Convolution layer, 303: Pooling layer, 400: Transformation parameter regression model, 401: Processing layer, 402: Convolution layer, 403: Pooling layer, 404: FC layer, 700: Image difference extraction model, 701: Feature extraction unit, 701a to 701b: Image feature extraction models, 702: Similarity calculation unit, 703: Image processing unit

Claims

A similar image difference extraction device comprising:
a feature extraction unit that generates a plurality of target feature maps, which are feature maps indicating the features of each of a plurality of target images;
a similarity calculation unit that generates a similarity map indicating the similarity at each position of the target feature maps;
an image processing unit that generates a difference extraction image according to the similarity map; and
an image registration unit that generates the target images by performing image registration processing that corrects geometric differences on a plurality of designated images,
wherein the image registration unit includes:
a designated feature extraction unit that generates a plurality of designated feature maps, which are feature maps of each of the plurality of designated images;
a parameter estimation unit that estimates, based on the designated feature maps, transformation parameters of a transformation model for correcting geometric differences between the plurality of images; and
an execution unit that performs the image registration processing using the transformation parameters.

The similar image difference extraction device according to claim 1, wherein the feature extraction unit generates the target feature maps using a learning model that generates a feature map of an image.

The similar image difference extraction device according to claim 1, wherein the designated feature extraction unit generates the designated feature maps using a learning model that generates a feature map of an image.

The similar image difference extraction device according to claim 3, wherein the feature extraction unit generates the target feature maps using a learning model that generates a feature map of an image, and
the learning model used by the designated feature extraction unit and the learning model used by the feature extraction unit are the same model.

The similar image difference extraction device according to claim 3, further comprising a learning management unit that constructs the learning model based on a loss value indicating the difference between a first transformation parameter, which is a randomly generated transformation parameter, and a second transformation parameter, which is the transformation parameter estimated by the image registration unit when a first learning image and a second learning image obtained by transforming the first learning image with the first transformation parameter are used as the plurality of designated images.

The similar image difference extraction device according to claim 3, further comprising a learning management unit that constructs the learning model based on a loss value indicating the difference between a first transformation parameter, which is a randomly generated transformation parameter, and a second transformation parameter, which is the transformation parameter estimated by the image registration unit when a first learning image and a second learning image are used as the plurality of designated images,
the second learning image being obtained by transforming, with the first transformation parameter, a corresponding similar image that is associated in advance with the first learning image.

The similar image difference extraction device according to claim 1, wherein, for each of the plurality of target images, the feature extraction unit generates partial feature maps, which are feature maps indicating the features of each of a plurality of partial images obtained by dividing the target image, and generates the target feature map by combining the partial feature maps so that their positional relationship is the same as the positional relationship of the corresponding partial images.

The similar image difference extraction device according to claim 1, wherein the difference extraction image is a heat map image in which each value of the similarity map is expressed in color or shade.

A similar image difference extraction method performed by a similar image difference extraction device, the method comprising:
generating a plurality of target images by performing image registration processing that corrects geometric differences on a plurality of designated images;
generating a plurality of target feature maps, which are feature maps indicating the features of each of the plurality of target images;
generating a similarity map indicating the similarity at each position of the target feature maps; and
generating a difference extraction image according to the similarity map,
wherein generating the target images includes:
generating a plurality of designated feature maps, which are feature maps of each of the plurality of designated images;
estimating, based on the designated feature maps, transformation parameters of a transformation model for correcting geometric differences between the plurality of images; and
performing the image registration processing using the transformation parameters.

A program that causes a computer to execute:
a step of generating a plurality of target images by performing image registration processing that corrects geometric differences on a plurality of designated images;
a step of generating a plurality of target feature maps, which are feature maps indicating the features of each of the plurality of target images;
a step of generating a similarity map indicating the similarity at each position of the target feature maps; and
a step of generating a difference extraction image according to the similarity map,
wherein the step of generating the target images includes:
a step of generating a plurality of designated feature maps, which are feature maps of each of the plurality of designated images;
a step of estimating, based on the designated feature maps, transformation parameters of a transformation model for correcting geometric differences between the plurality of images; and
a step of performing the image registration processing using the transformation parameters.

A recording medium on which the program according to claim 10 is recorded.