WO2019230666A1 - Feature amount extraction device, method, and program - Google Patents

Feature amount extraction device, method, and program Download PDF

Info

Publication number
WO2019230666A1
WO2019230666A1 (PCT/JP2019/020948)
Authority
WO
WIPO (PCT)
Prior art keywords
feature
image
weight
feature map
category
Prior art date
Application number
PCT/JP2019/020948
Other languages
French (fr)
Japanese (ja)
Inventor
之人 渡邉
周平 田良島
島村 潤
杵渕 哲也
Original Assignee
日本電信電話株式会社 (Nippon Telegraph and Telephone Corporation)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 日本電信電話株式会社 (Nippon Telegraph and Telephone Corporation)
Publication of WO2019230666A1

Links

Images

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/50 - Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F 16/58 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F 16/583 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually, using metadata automatically derived from the content
    • G06F 16/90 - Details of database functions independent of the retrieved data types
    • G06F 16/907 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F 16/908 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually, using metadata automatically derived from the content
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 - Image analysis

Definitions

  • The present invention relates to a feature quantity extraction apparatus, method, and program, and more particularly to a feature quantity extraction apparatus, method, and program for extracting feature quantities used to search for an object in an image.
  • In Non-Patent Document 1, a feature vector is extracted from an image by spatially pooling the output of a convolution layer for each region, using a CNN trained for image classification.
  • The inner product of the feature quantity vectors is then calculated for two different images; the larger the value, the more likely the two images are considered to show the same object.
  • Non-Patent Document 2 describes a method of training a CNN using teacher labels that indicate the object contained in each image. Triplets of images are created from a group of about 200,000 teacher images, and the CNN is trained so that the distance between image pairs with the same teacher label becomes smaller than the distance between image pairs with different teacher labels. As a result, highly accurate object retrieval is realized.
  • The method of Non-Patent Document 1 extracts a feature vector from a convolutional neural network (CNN) trained to distinguish the categories of a group of training images.
  • The method of Non-Patent Document 2 can obtain feature vectors that discriminate different objects with high accuracy by training the CNN with teacher images prepared for object search.
  • However, preparing a large number of teacher images for object search requires a great deal of cost.
  • As an example, preparing 10,000 teacher images of the target category "automobile" and preparing 10,000 teacher images of one specific car model within that category are not equally hard: the latter is more difficult and more costly.
  • The present invention has been made to solve the above problems, and its purpose is to provide a feature quantity extraction apparatus, method, and program capable of extracting feature quantities for accurately searching for an object in an image in a target category.
  • A feature quantity extraction device according to a first invention is a feature quantity extraction device that extracts a feature quantity from an arbitrary image. It includes a feature map calculation unit that inputs the arbitrary image to a convolutional neural network trained in advance to identify the category of an image from a plurality of categories including a target category, and calculates the output of a convolutional layer of the convolutional neural network, obtained for each image for each channel and each output position, as a feature map representing the features of the image; a weight calculation unit that calculates a weight representing the influence of the feature map on the target category using a previously trained classifier that classifies the image category with the feature map as input; and a feature conversion unit that calculates a feature quantity vector based on the feature map obtained by applying the weight to the feature map so as to remove the influence on the target category.
  • In the feature quantity extraction device according to the first invention, the weight calculation unit may use the classifier to calculate a weight representing the influence of the feature map on the target category for each channel and for each output position.
  • The weight calculation unit may also calculate such per-channel, per-position weights and then integrate them: a weight for each output position obtained by integrating the weights of all channels at the same output position, or a weight for each channel obtained by integrating the weights of all output positions within the same channel.
  • The classifier may output a probability for each category as the classification result, and the weight calculation unit may calculate the weight using the derivative of the probability of the target category.
  • A matching unit may further be included that collates the feature quantity vector calculated for the image with feature quantity vectors extracted in advance from each reference image of the target category, and outputs the reference images corresponding to the image as search results.
  • A feature quantity extraction method according to a second invention is a feature quantity extraction method in a feature quantity extraction apparatus that extracts a feature quantity from an arbitrary image. It includes a step in which a feature map calculation unit inputs the arbitrary image to a convolutional neural network trained in advance to identify the category of an image from a plurality of categories including a target category, and calculates the output of a convolutional layer of the convolutional neural network, obtained for each image for each channel and each output position, as a feature map representing the features of the image; a step in which a weight calculation unit calculates a weight representing the influence of the feature map on the target category using a previously trained classifier that classifies the image category with the feature map as input; and a step in which a feature conversion unit calculates a feature quantity vector based on the feature map obtained by applying the weight to the feature map so as to remove the influence on the target category.
  • A program according to a third invention is a program for causing a computer to function as each unit of the feature quantity extraction device according to the first invention.
  • According to the feature quantity extraction apparatus, method, and program of the present invention, an arbitrary image is input to a convolutional neural network trained in advance to identify the category of an image from a plurality of categories including a target category; the output of a convolutional layer of the convolutional neural network, obtained for each image for each channel and each output position, is calculated as a feature map representing the features of the image; a previously trained classifier that classifies the image category with the feature map as input is used to calculate a weight representing the influence of the feature map on the target category; and a feature quantity vector is calculated based on the feature map obtained by applying the weight to the feature map so as to remove the influence on the target category. This makes it possible to extract feature quantities for accurately searching for an object in an image in the target category.
  • The feature quantity extraction apparatus 1 illustrated in FIG. 1 extracts a feature quantity vector 6 with which a reference image containing the same object as a query image, in which an object belonging to a specific target category appears, can be searched with high accuracy.
  • In the following description, the image 4 corresponds to the query image, and the reference image set 5 corresponds to an image set consisting of one or more reference images. A reference image of the reference image set 5 may also be used as the image 4.
  • The feature quantity extraction device 1 can be configured as a computer including a CPU, a RAM, and a ROM that stores a program for executing a feature quantity extraction processing routine described later and various data.
  • The feature quantity extraction device 1 includes a feature map calculation unit 11, a weight calculation unit 12, a feature conversion unit 13, and a matching unit 14.
  • The feature quantity extraction device 1 of this embodiment exchanges information with a database 2 via communication means (not shown).
  • The database 2 can be implemented, for example, by a file system mounted on a general-purpose computer.
  • The database 2 stores in advance the data of each reference image of the reference image set 5 for each category and the feature vector extracted from each reference image.
  • Each reference image is given an identifier that can uniquely identify it, such as a serial-number ID (identification) or a unique reference image file name.
  • For each reference image, the database 2 stores the identifier of the reference image and the image data of the reference image in association with each other.
  • Alternatively, the database 2 may be implemented with an RDBMS (Relational Database Management System) or the like.
  • The information stored in the database 2 may additionally include, as metadata, information expressing the content of a reference image (such as its title, a summary sentence, or keywords) and information about the format of a reference image (such as its data size or thumbnail size), but storing such information is not essential to the implementation of the present disclosure.
  • The database 2 may be provided either inside or outside the feature quantity extraction apparatus 1, and any known communication means can be used. In this embodiment, the database 2 is assumed to be provided outside the feature quantity extraction device 1 and to be communicably connected to it via the Internet and a network protocol such as TCP/IP (Transmission Control Protocol/Internet Protocol).
  • Each unit of the feature quantity extraction device 1 and the database 2 may be configured by a computer or server provided with an arithmetic processing device such as a CPU (Central Processing Unit) or GPU (Graphics Processing Unit) and storage devices such as a RAM (Random Access Memory), a ROM (Read Only Memory), and an HDD (Hard Disk Drive), and the processing of each unit may be executed by a program.
  • This program may be stored in advance in the storage device of the feature quantity extraction device 1, or it may be stored on a recording medium such as a magnetic disk, an optical disk, or a semiconductor memory, or provided through a network.
  • Of course, none of the components need be realized by a single computer or server; they may be distributed over a plurality of computers connected by a network.
  • The feature map calculation unit 11 inputs the image 4 to a convolutional neural network (CNN) trained in advance to identify the category of an image from a plurality of categories including the target category, and calculates the output of a convolutional layer of the CNN, obtained for each image 4 for each channel and each output position, as a feature map representing the features of the image 4.
  • The feature map can be obtained, for example, by extracting the output of an arbitrary intermediate layer of the CNNs called VGG-16 and VGG-19 described in Non-Patent Document 3, or ResNet-50 and ResNet-101 described in Non-Patent Document 4. In the most preferred example, the feature map is obtained from the final convolutional layer, i.e. the layer immediately before the fully connected layers. In the following, for explanation, the feature map is assumed to be obtained from the final convolution layer of VGG-16 (for example, the third layer in the fifth block). A feature map can thus be extracted from the image 4.
  • Non-Patent Document 3 K. Simonyan, A. Zisserman, Very deep convolutional networks for large-scale image recognition, In ICLR, 2015.
  • Non-Patent Document 4 K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, In CVPR, 2016.
  • The weight calculation unit 12 calculates a weight representing the influence of the extracted feature map on the target category, using a previously trained classifier that classifies the category of the image with the feature map as input.
  • The weight calculation unit 12 calculates weights that are large where the influence on the target category is large and small where the influence is small.
  • First, the feature map is input to the classifier, and a probability for each category is obtained as the category classification result of the image 4.
  • Any classifier may be used as long as it outputs a probability for each category as the classification result; any known classifier is acceptable, but a fully connected multilayer neural network whose final layer is a softmax layer is preferred.
  • In one example of this embodiment, the layers after the final convolutional layer of the VGG-16 used in the feature map extraction process are used as the classifier.
  • Next, for the target category, a weight representing the degree of influence on the target category in the feature map is calculated for each channel and each output position of the convolution layer.
  • In this case, the weight corresponds to the output positions (vertical and horizontal) and the channels of the feature map, and has the same size as the feature map (height, width, and number of channels).
  • The method for obtaining the degree of influence is not limited; as one example, the weight is calculated using the derivative of the probability of the target category with respect to the extracted feature map, as described in Non-Patent Document 5.
  • Alternatively, a weight for each channel can be calculated by integrating the weights for each output position within the same channel.
  • In this case, the derivative is averaged within each channel, and the value averaged for each channel can be used as the weight.
  • Such a weight corresponds to the channels of the feature map, has as many values as the feature map has channels, and takes a larger value for channels that are more strongly influenced by the category.
  • Likewise, a weight for each output position can be calculated by integrating the weights of all channels at the same output position.
  • In this case, the weight corresponds to the output positions of the feature map and has the same height and width as the feature map.
  • The integration can be performed by a known method such as summing, averaging, or taking the maximum of the values of all channels at each output position.
  • Finally, the weight calculation unit 12 normalizes the weight calculated by the above procedure and outputs it. Since the weight may contain negative values, it is preferable to apply a process such as subtracting the minimum value, subtracting the minimum value and then dividing by the maximum value, or replacing negative values with 0.
  • With the above processing, a weight is obtained that takes a larger value at locations (feature map channels, output positions, or both) where the influence of the category is larger.
  • The feature conversion unit 13 calculates the feature quantity vector 6 based on the feature map obtained by applying the weight to the feature map so as to remove the influence on the target category.
  • The feature conversion unit 13 first applies the weight to the feature map.
  • When the weight corresponds to both the output positions and the channels of the feature map, it has the same size as the feature map, so the simplest approach is to subtract the corresponding weight value from each pixel (output position) of the feature map.
  • In this way, the influence of the category in the feature map can be suppressed.
  • Alternatively, the weight values may be normalized to the range 0 to 1, subtracted from 1, and multiplied with the feature map. If the weight corresponds only to the output positions or only to the channels of the feature map, it can be applied by performing the same processing on each value of the feature map at the corresponding output position or channel.
  • The influence of the category can also be suppressed by setting the value of the feature map to 0 at locations where the weight is equal to or greater than a certain value.
  • The certain value may be defined in advance or may be, for example, the average of the weights.
  • The feature conversion unit 13 then obtains the feature quantity vector 6 from the feature map to which the weight has been applied.
  • A known method may be used to calculate the feature quantity vector 6 from the feature map; for example, the method described in Non-Patent Document 1 may be used. In this case, rectangles of various sizes are first defined, and a vector of size (number of rectangles × number of channels) is obtained by taking the maximum value within each rectangle for each channel. By normalizing this group of vectors, summing the values of the same channel, and normalizing again, the result can be expressed as a feature quantity vector 6 whose dimension equals the number of channels.
  • Any known normalization may be used, but L2 normalization is preferred.
  • The weight may also be applied after the feature quantity vector 6 is calculated; for example, a feature vector computed from the weights corresponding to the output positions and channels of the feature map may be subtracted from the feature quantity vector 6 computed from the feature map. In this case, it is preferable to normalize each feature vector before and after the subtraction.
  • The matching unit 14 collates the feature quantity vector 6 calculated for the image 4 with the feature quantity vectors extracted from the reference images of the target category stored in the database 2, and outputs the reference images corresponding to the image 4 as the search result 7.
  • The similarity may be obtained by any known measure such as the inner product or the cosine similarity.
  • Reference images whose semantic content is identical or close are output as the search result 7 in descending order of this similarity.
  • A known indexing method may also be used here; for example, the feature quantity vector 6 may be hashed using the method disclosed in Patent Document 1 to find approximately similar reference images.
  • The feature quantity extraction apparatus 1 executes the feature quantity extraction processing routine shown in FIG. 2.
  • In step S101, the feature map calculation unit 11 inputs the image 4 to the CNN trained in advance to identify the category of an image from a plurality of categories including the target category, and calculates the output of a convolutional layer of the CNN, obtained for each image 4 for each channel and each output position, as a feature map representing the features of the image 4.
  • In step S102, the weight calculation unit 12 calculates a weight representing the influence of the extracted feature map on the target category, using a previously trained classifier that classifies the category of the image with the feature map as input.
  • Specifically, the feature map is input to the classifier, the probability for each category is obtained as the category classification result of the image 4, and a weight representing the degree of influence on the target category in the feature map is calculated for each channel and each output position of the feature map, using the feature map and the probability of the target category.
  • In step S103, the feature conversion unit 13 calculates the feature quantity vector 6 based on the feature map obtained by applying the weight to the feature map so as to remove the influence on the target category.
  • In step S104, the feature quantity vector 6 calculated for the image 4 is collated with the feature quantity vectors extracted from the reference images of the target category stored in the database 2, and the reference images corresponding to the image 4 are output as the search result 7.
  • As described above, according to the feature quantity extraction apparatus of this embodiment, an arbitrary image is input to a CNN trained in advance to identify the category of an image from a plurality of categories including a target category; the output of a convolutional layer of the CNN, obtained for each image for each channel and each output position, is calculated as a feature map representing the features of the image; a previously trained classifier that classifies the image category with the feature map as input is used to calculate a weight representing the influence of the feature map on the target category; and a feature quantity vector is calculated based on the feature map obtained by applying the weight to the feature map so as to remove the influence on the target category. In this way, feature quantities for accurately searching for an object in an image in the target category can be extracted.
  • Although the matching unit 14 is provided inside the feature quantity extraction device 1 in the embodiment described above, the present invention is not limited to this, and a matching device may be provided externally. In that case, the matching device is connected to the feature quantity extraction device and the database so that they can communicate with each other.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Library & Information Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention makes it possible to extract a feature amount for accurately searching for an object within an image in a target category. An arbitrary image is input into a convolutional neural network that has been trained in advance to identify image categories from among a plurality of categories including a target category. The output obtained for each image, for each output position and each channel of a convolutional layer of the convolutional neural network, is calculated as a feature map representing the features of the image. A classifier that has been trained in advance and that classifies the category of an image with the feature map as input is used to calculate a weight representing the influence of the feature map on the target category. The weight is applied to the feature map so as to remove the influence on the target category, and a feature amount vector is calculated on the basis of the resulting feature map.

Description

Feature amount extraction apparatus, method, and program
The present invention relates to a feature quantity extraction apparatus, method, and program, and more particularly to a feature quantity extraction apparatus, method, and program for extracting feature quantities used to search for an object in an image.

With the spread of small imaging devices such as smartphones, there is a growing demand for technology that searches for objects appearing in images of arbitrary subjects taken in various places and environments.

Conventionally, various techniques for searching for objects using a convolutional neural network (CNN) have been devised and disclosed. A typical procedure is described here following the technique of Non-Patent Document 1. First, using a CNN trained for image classification, a feature vector is extracted from an image by spatially pooling the output of a convolution layer for each region. Next, the inner product of the feature vectors of two different images is calculated; the larger the value, the more likely the two images are considered to show the same object. By building a reference image database in advance from images containing the objects to be recognized (reference images) and searching for reference images that show the same object as a newly input image (query image), the object present in the query image can be identified.

Non-Patent Document 2 describes a method of training a CNN using teacher labels that indicate the object contained in each image. Triplets of images are created from a group of about 200,000 teacher images, and the CNN is trained so that the distance between image pairs with the same teacher label becomes smaller than the distance between image pairs with different teacher labels. As a result, highly accurate object retrieval is realized.

The method of Non-Patent Document 1 extracts a feature vector from a convolutional neural network (CNN) trained to distinguish the categories of a group of training images. When multiple objects of the same category are present, the distances between their feature vectors become unnecessarily small. As a result, there is a problem that the search accuracy for objects that look similar but are in fact different deteriorates.

The method of Non-Patent Document 2 can obtain feature vectors that discriminate different objects with high accuracy by training the CNN with teacher images prepared for object search. However, preparing a large number of teacher images for object search requires a great deal of cost. As an example, preparing 10,000 teacher images of the target category "automobile" is easier and cheaper than preparing 10,000 teacher images of one specific car model within that category.

As described above, no method has so far been devised that achieves high-accuracy object search with a CNN without using teacher images for object search.

The present invention has been made to solve the above problems, and its purpose is to provide a feature quantity extraction apparatus, method, and program capable of extracting feature quantities for accurately searching for an object in an image in a target category.
In order to achieve the above object, a feature quantity extraction device according to a first invention is a feature quantity extraction device that extracts a feature quantity from an arbitrary image, and includes: a feature map calculation unit that inputs the arbitrary image to a convolutional neural network trained in advance to identify the category of an image from a plurality of categories including a target category, and calculates the output of a convolutional layer of the convolutional neural network, obtained for each image for each channel and each output position, as a feature map representing the features of the image; a weight calculation unit that calculates a weight representing the influence of the feature map on the target category using a previously trained classifier that classifies the image category with the feature map as input; and a feature conversion unit that calculates a feature quantity vector based on the feature map obtained by applying the weight to the feature map so as to remove the influence on the target category.

In the feature quantity extraction device according to the first invention, the weight calculation unit may use the classifier to calculate a weight representing the influence of the feature map on the target category for each channel and for each output position.

In the feature quantity extraction device according to the first invention, the weight calculation unit may use the classifier to calculate a weight representing the influence of the feature map on the target category for each channel and for each output position, and may then calculate a weight for each output position obtained by integrating the weights of all channels at the same output position, or a weight for each channel obtained by integrating the weights of all output positions within the same channel.

In the feature quantity extraction device according to the first invention, the classifier may output a probability for each category as the classification result, and the weight calculation unit may calculate the weight using the derivative of the probability of the target category.

The feature quantity extraction device according to the first invention may further include a matching unit that collates the feature quantity vector calculated for the image with feature quantity vectors extracted in advance from each reference image of the target category, and outputs the reference images corresponding to the image as search results.

A feature quantity extraction method according to a second invention is a feature quantity extraction method in a feature quantity extraction apparatus that extracts a feature quantity from an arbitrary image, and includes: a step in which a feature map calculation unit inputs the arbitrary image to a convolutional neural network trained in advance to identify the category of an image from a plurality of categories including a target category, and calculates the output of a convolutional layer of the convolutional neural network, obtained for each image for each channel and each output position, as a feature map representing the features of the image; a step in which a weight calculation unit calculates a weight representing the influence of the feature map on the target category using a previously trained classifier that classifies the image category with the feature map as input; and a step in which a feature conversion unit calculates a feature quantity vector based on the feature map obtained by applying the weight to the feature map so as to remove the influence on the target category.

A program according to a third invention is a program for causing a computer to function as each unit of the feature quantity extraction device according to the first invention.

According to the feature quantity extraction apparatus, method, and program of the present invention, an arbitrary image is input to a convolutional neural network trained in advance to identify the category of an image from a plurality of categories including a target category; the output of a convolutional layer of the convolutional neural network, obtained for each image for each channel and each output position, is calculated as a feature map representing the features of the image; a previously trained classifier that classifies the image category with the feature map as input is used to calculate a weight representing the influence of the feature map on the target category; and a feature quantity vector is calculated based on the feature map obtained by applying the weight to the feature map so as to remove the influence on the target category. This makes it possible to extract feature quantities for accurately searching for an object in an image in the target category.
FIG. 1 is a block diagram showing the configuration of a feature quantity extraction apparatus according to an embodiment of the present invention. FIG. 2 is a flowchart showing a feature quantity extraction processing routine in the feature quantity extraction apparatus according to the embodiment of the present invention.
Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.

<Configuration of Feature Quantity Extraction Device According to Embodiment of the Present Invention>
Next, the configuration of the feature quantity extraction device according to the embodiment of the present invention will be described. The feature quantity extraction apparatus 1 shown in FIG. 1 extracts a feature quantity vector 6 with which a reference image containing the same object as a query image, in which an object belonging to a specific target category appears, can be searched with high accuracy. In the following description, the image 4 corresponds to the query image, and the reference image set 5 corresponds to an image set consisting of one or more reference images. A reference image of the reference image set 5 may also be used as the image 4.

The feature quantity extraction device 1 shown in FIG. 1 can be configured as a computer including a CPU, a RAM, and a ROM that stores a program for executing a feature quantity extraction processing routine described later and various data. The feature quantity extraction device 1 includes a feature map calculation unit 11, a weight calculation unit 12, a feature conversion unit 13, and a matching unit 14.

The feature quantity extraction device 1 of this embodiment exchanges information with a database 2 via communication means (not shown).

The database 2 can be implemented, for example, by a file system mounted on a general-purpose computer. In this embodiment, as an example, the database 2 stores in advance the data of each reference image of the reference image set 5 for each category and the feature vector extracted from each reference image. Each reference image is given an identifier that can uniquely identify it, such as a serial-number ID (identification) or a unique reference image file name. For each reference image, the database 2 stores the identifier of the reference image and the image data of the reference image in association with each other. Alternatively, the database 2 may be implemented with an RDBMS (Relational Database Management System) or the like. The information stored in the database 2 may additionally include, as metadata, information expressing the content of a reference image (such as its title, a summary sentence, or keywords) and information about the format of a reference image (such as its data size or thumbnail size), but storing such information is not essential to the implementation of the present disclosure.

The database 2 may be provided either inside or outside the feature quantity extraction apparatus 1, and any known communication means can be used. In this embodiment, the database 2 is assumed to be provided outside the feature quantity extraction device 1 and to be communicably connected to it via the Internet and a network protocol such as TCP/IP (Transmission Control Protocol/Internet Protocol).

Each unit of the feature quantity extraction device 1 and the database 2 may be configured by a computer or server provided with an arithmetic processing device such as a CPU (Central Processing Unit) or GPU (Graphics Processing Unit) and storage devices such as a RAM (Random Access Memory), a ROM (Read Only Memory), and an HDD (Hard Disk Drive), and the processing of each unit may be executed by a program. This program may be stored in advance in the storage device of the feature quantity extraction device 1, or it may be stored on a recording medium such as a magnetic disk, an optical disk, or a semiconductor memory, or provided through a network. Of course, none of the components need be realized by a single computer or server; they may be distributed over a plurality of computers connected by a network.
The feature map calculation unit 11 inputs the image 4 to a convolutional neural network (CNN) trained in advance to identify the category of an image from a plurality of categories including the target category, and calculates the output of a convolutional layer of the CNN, obtained for each image 4 for each channel and each output position, as a feature map representing the features of the image 4.

Any known network may be used as the CNN, provided that it has been trained in advance to distinguish the categories of the image 4. The feature map can be obtained, for example, by extracting the output of an arbitrary intermediate layer of the CNNs called VGG-16 and VGG-19 described in Non-Patent Document 3, or ResNet-50 and ResNet-101 described in Non-Patent Document 4. In the most preferred example, the feature map is obtained from the final convolutional layer, i.e. the layer immediately before the fully connected layers. In the following, for explanation, the feature map is assumed to be obtained from the final convolution layer of VGG-16 (for example, the third layer in the fifth block). A feature map can thus be extracted from the image 4.
[Non-Patent Document 3] K. Simonyan, A. Zisserman, Very deep convolutional networks for large-scale image recognition, In ICLR, 2015.

[Non-Patent Document 4] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, In CVPR, 2016.
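The feature map extraction just described can be sketched in a few lines of Python. This is only a minimal illustration, assuming PyTorch and the torchvision VGG-16 model (the patent does not prescribe a framework); the helper name compute_feature_map, the preprocessing values, and the use of ImageNet weights as a stand-in for a category-trained CNN are choices made for this example.

```python
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

# VGG-16 trained for category classification (ImageNet weights as a stand-in).
vgg = models.vgg16(pretrained=True).eval()
# Truncate the convolutional stack at the ReLU after conv5_3, i.e. the final
# convolution layer (third layer of the fifth block), before the last max-pool.
conv_layers = vgg.features[:30]

preprocess = T.Compose([
    T.Resize((224, 224)),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def compute_feature_map(image_path: str) -> torch.Tensor:
    """Return the final convolutional layer output of shape (channels, height, width)."""
    x = preprocess(Image.open(image_path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        fmap = conv_layers(x)          # e.g. (1, 512, 14, 14) for a 224x224 input
    return fmap.squeeze(0)
```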
The weight calculation unit 12 calculates a weight representing the influence of the extracted feature map on the target category, using a previously trained classifier that classifies the category of the image with the feature map as input.
When the image 4 is classified into the target category, the weight calculation unit 12 calculates weights that are large where the influence on the target category is large and small where the influence is small.

First, the feature map is input to the classifier, and a probability for each category is obtained as the category classification result of the image 4. Any classifier may be used as long as it outputs a probability for each category as the classification result; any known classifier is acceptable, but a fully connected multilayer neural network whose final layer is a softmax layer is preferred. In one example of this embodiment, the layers after the final convolutional layer of the VGG-16 used in the feature map extraction process are used as the classifier.

Next, for the target category, a weight representing the degree of influence on the target category in the feature map is calculated for each channel and each output position of the convolution layer, using the feature map and the probability of the target category. In this case, the weight corresponds to the output positions (vertical and horizontal) and the channels of the feature map, and has the same size as the feature map (height, width, and number of channels). The method for obtaining the degree of influence is not limited; as one example, the weight is calculated using the derivative of the probability of the target category with respect to the extracted feature map, as described in Non-Patent Document 5.

[Non-Patent Document 5] R. R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, D. Batra, Grad-CAM: visual explanations from deep networks via gradient-based localization, In ICCV, 2017.

A weight for each channel can also be calculated by integrating the weights for each output position within the same channel. In this case, the derivative is averaged within each channel, and the value averaged for each channel can be used as the weight. Such a weight corresponds to the channels of the feature map, has as many values as the feature map has channels, and takes a larger value for channels that are more strongly influenced by the category.

A weight for each output position can also be calculated by integrating the weights of all channels at the same output position. In this case, the weight corresponds to the output positions of the feature map and has the same height and width as the feature map. The integration can be performed by a known method such as summing, averaging, or taking the maximum of the values of all channels at each output position.

Finally, the weight calculation unit 12 normalizes the weight calculated by the above procedure and outputs it. Since the weight may contain negative values, it is preferable to apply a process such as subtracting the minimum value, subtracting the minimum value and then dividing by the maximum value, or replacing negative values with 0.

With the above processing, a weight is obtained that takes a larger value at locations (feature map channels, output positions, or both) where the influence of the category is larger.
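To make the weight computation concrete, the following sketch follows the Grad-CAM-style derivative described above (Non-Patent Document 5). It assumes the PyTorch VGG-16 and the compute_feature_map helper from the earlier sketch; the function name compute_weights and the use of the softmax probability as the quantity being differentiated are illustrative assumptions, not a definitive implementation of the patent.

```python
import torch
import torch.nn.functional as F

def compute_weights(vgg, fmap: torch.Tensor, target_category: int,
                    per_channel: bool = True) -> torch.Tensor:
    """Weight expressing how strongly the feature map influences the target category."""
    fmap = fmap.unsqueeze(0).requires_grad_(True)          # (1, C, H, W)
    # Use the layers of VGG-16 after the final convolution as the classifier:
    # remaining max-pool, average pool, fully connected layers, then softmax.
    x = vgg.features[30](fmap)
    x = vgg.avgpool(x).flatten(1)
    probs = F.softmax(vgg.classifier(x), dim=1)            # probability per category
    # Derivative of the target-category probability with respect to the feature map.
    grads, = torch.autograd.grad(probs[0, target_category], fmap)
    grads = grads.squeeze(0)                                # (C, H, W)
    if per_channel:
        w = grads.mean(dim=(1, 2))   # integrate over output positions: one weight per channel
    else:
        w = grads                    # one weight per channel and per output position
    # Normalise: shift so the minimum is 0, then divide by the maximum.
    w = w - w.min()
    return w / (w.max() + 1e-12)
```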
The feature conversion unit 13 calculates the feature quantity vector 6 based on the feature map obtained by applying the weight to the feature map so as to remove the influence on the target category.

The processing of the feature conversion unit 13 consists of a weight application process and a vectorization process.

The feature conversion unit 13 first applies the weight to the feature map. When the weight corresponds to both the output positions and the channels of the feature map, it has the same size as the feature map, so the simplest approach is to subtract the corresponding weight value from each pixel (output position) of the feature map; in this way, the influence of the category in the feature map can be suppressed. Alternatively, the weight values may be normalized to the range 0 to 1, subtracted from 1, and multiplied with the feature map. If the weight corresponds only to the output positions or only to the channels of the feature map, it can be applied by performing the same processing on each value of the feature map at the corresponding output position or channel.

The influence of the category can also be suppressed by setting the value of the feature map to 0 at locations where the weight is equal to or greater than a certain value. The certain value may be defined in advance or may be, for example, the average of the weights.

The feature conversion unit 13 then obtains the feature quantity vector 6 from the feature map to which the weight has been applied. A known method may be used to calculate the feature quantity vector 6 from the feature map; for example, the method described in Non-Patent Document 1 may be used. In this case, rectangles of various sizes are first defined, and a vector of size (number of rectangles × number of channels) is obtained by taking the maximum value within each rectangle for each channel. By normalizing this group of vectors, summing the values of the same channel, and normalizing again, the result can be expressed as a feature quantity vector 6 whose dimension equals the number of channels. Any known normalization may be used, but L2 normalization is preferred.

The weight may also be applied after the feature quantity vector 6 is calculated; for example, a feature vector computed from the weights corresponding to the output positions and channels of the feature map may be subtracted from the feature quantity vector 6 computed from the feature map. In this case, it is preferable to normalize each feature vector before and after the subtraction.
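As a concrete illustration of one of the variants above, the sketch below multiplies the feature map by (1 - weight) for the per-channel case and then reduces the weighted map to a vector by a single global max pooling followed by L2 normalization. This is a deliberate simplification of the multi-rectangle pooling of Non-Patent Document 1, and the helper name to_feature_vector is an assumption of this example.

```python
import torch
import torch.nn.functional as F

def to_feature_vector(fmap: torch.Tensor, channel_weight: torch.Tensor) -> torch.Tensor:
    """Apply a per-channel category weight to the feature map and vectorize it."""
    # Weights lie in [0, 1]; (1 - w) suppresses the channels that most
    # strongly drive the target category.
    weighted = fmap * (1.0 - channel_weight).view(-1, 1, 1)   # (C, H, W)
    # Single global region instead of the rectangles of Non-Patent Document 1.
    v = weighted.amax(dim=(1, 2))                              # (C,)
    return F.normalize(v, p=2, dim=0)                          # L2 normalization
```

A fuller version would instead take the maximum within several rectangles per channel, normalize each resulting vector, sum over the rectangles, and L2-normalize again, as described above.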
The matching unit 14 collates the feature quantity vector 6 calculated for the image 4 with the feature quantity vectors extracted from the reference images of the target category stored in the database 2, and outputs the reference images corresponding to the image 4 as the search result 7.

The similarity may be obtained by any known measure such as the inner product or the cosine similarity. Reference images whose semantic content is identical or close are output as the search result 7 in descending order of this similarity. Alternatively, a known indexing method may be used; for example, the feature quantity vector 6 may be hashed using the method disclosed in Patent Document 1 to find approximately similar reference images.

[Patent Document 1] JP 2013-68884 A
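The matching step can likewise be sketched as a simple exhaustive comparison. The in-memory dictionary reference_vecs stands in for the feature vectors stored in the database 2; in a large-scale setting, hashing or another indexing scheme such as that of Patent Document 1 would replace the linear scan.

```python
import torch

def search(query_vec: torch.Tensor, reference_vecs: dict, top_k: int = 10):
    """Rank reference images by similarity to the query feature vector.

    reference_vecs maps an image identifier to its L2-normalised feature vector,
    so the inner product below equals the cosine similarity.
    """
    scores = {image_id: float(torch.dot(query_vec, vec))
              for image_id, vec in reference_vecs.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:top_k]
```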
<Operation of Feature Quantity Extraction Device According to Embodiment of the Present Invention>
Next, the operation of the feature quantity extraction device 1 according to the embodiment of the present invention will be described. The feature quantity extraction apparatus 1 executes the feature quantity extraction processing routine shown in FIG. 2.

First, in step S101, the feature map calculation unit 11 inputs the image 4 to the CNN trained in advance to identify the category of an image from a plurality of categories including the target category, and calculates the output of a convolutional layer of the CNN, obtained for each image 4 for each channel and each output position, as a feature map representing the features of the image 4.

Next, in step S102, the weight calculation unit 12 calculates a weight representing the influence of the extracted feature map on the target category, using a previously trained classifier that classifies the category of the image with the feature map as input. Specifically, the feature map is input to the classifier, the probability for each category is obtained as the category classification result of the image 4, and a weight representing the degree of influence on the target category in the feature map is calculated for each channel and each output position of the feature map, using the feature map and the probability of the target category.

In step S103, the feature conversion unit 13 calculates the feature quantity vector 6 based on the feature map obtained by applying the weight to the feature map so as to remove the influence on the target category.

In step S104, the feature quantity vector 6 calculated for the image 4 is collated with the feature quantity vectors extracted from the reference images of the target category stored in the database 2, and the reference images corresponding to the image 4 are output as the search result 7.
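Putting the four steps together, a minimal end-to-end sketch of this routine, reusing the hypothetical helpers from the earlier sketches, might look as follows; the target-category index, the file name, and the precomputed reference_vecs dictionary are placeholders for this example.

```python
TARGET_CATEGORY = 0   # index of the target category in the classifier output (placeholder)
# reference_vecs: {image_id: vector} computed in advance from the reference images.

fmap = compute_feature_map("query.jpg")                  # S101: feature map
weight = compute_weights(vgg, fmap, TARGET_CATEGORY)     # S102: category weight
query_vec = to_feature_vector(fmap, weight)              # S103: feature quantity vector
results = search(query_vec, reference_vecs)              # S104: match against database
for image_id, score in results:
    print(image_id, score)
```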
As described above, according to the feature quantity extraction device of the embodiment of the present invention, an arbitrary image is input to a CNN trained in advance to identify the category of an image from a plurality of categories including a target category; the output of a convolutional layer of the CNN, obtained for each image for each channel and each output position, is calculated as a feature map representing the features of the image; a previously trained classifier that classifies the image category with the feature map as input is used to calculate a weight representing the influence of the feature map on the target category; and a feature quantity vector is calculated based on the feature map obtained by applying the weight to the feature map so as to remove the influence on the target category. In this way, feature quantities for accurately searching for an object in an image in the target category can be extracted.

Note that the present invention is not limited to the embodiment described above, and various modifications and applications are possible without departing from the gist of the invention.

For example, although the case where the matching unit 14 is provided in the feature quantity extraction device 1 has been described, the present invention is not limited to this, and a matching device may be provided externally. In that case, the matching device is connected to the feature quantity extraction device and the database so that they can communicate with each other.
DESCRIPTION OF SYMBOLS
1 Feature amount extraction device
2 Database
4 Image
5 Reference image set
6 Feature vector
7 Search result
11 Feature map calculation unit
12 Weight calculation unit
13 Feature conversion unit
14 Matching unit

Claims (7)

  1.  A feature amount extraction device that extracts a feature amount from an arbitrary image, the device comprising:
     a feature map calculation unit that inputs the arbitrary image to a convolutional neural network trained in advance to identify the category of an image from a plurality of categories including a target category, and calculates, as a feature map representing features of the image, the outputs of a convolutional layer of the convolutional neural network obtained for the image for each channel and each output position;
     a weight calculation unit that calculates a weight representing an influence of the feature map on the target category, using a classifier trained in advance to classify the category of the image with the feature map as input; and
     a feature conversion unit that calculates a feature vector based on a feature map obtained by applying the weight to the feature map so as to remove the influence on the target category.
  2.  The feature amount extraction device according to claim 1, wherein the weight calculation unit uses the classifier to calculate, for each channel and each output position, a weight representing the influence of the feature map on the target category.
  3.  The feature amount extraction device according to claim 1, wherein the weight calculation unit uses the classifier to calculate, for each channel and each output position, a weight representing the influence of the feature map on the target category, and calculates either a weight for each output position, obtained by integrating the weights of the channels at the same output position, or a weight for each channel, obtained by integrating the weights of the output positions in the same channel.
  4.  The feature amount extraction device according to any one of claims 1 to 3, wherein the classifier outputs a probability for each category as a classification result, and the weight calculation unit calculates the weight using a derivative of the probability of the target category.
  5.  The feature amount extraction device according to any one of claims 1 to 4, further comprising a matching unit that matches the feature vector calculated for the image against feature vectors extracted in advance from each of reference images for the target category, and outputs each of the reference images corresponding to the image as a search result.
  6.  A feature amount extraction method in a feature amount extraction device that extracts a feature amount from an arbitrary image, the method comprising:
     a step in which a feature map calculation unit inputs the arbitrary image to a convolutional neural network trained in advance to identify the category of an image from a plurality of categories including a target category, and calculates, as a feature map representing features of the image, the outputs of a convolutional layer of the convolutional neural network obtained for the image for each channel and each output position;
     a step in which a weight calculation unit calculates a weight representing an influence of the feature map on the target category, using a classifier trained in advance to classify the category of the image with the feature map as input; and
     a step in which a feature conversion unit calculates a feature vector based on a feature map obtained by applying the weight to the feature map so as to remove the influence on the target category.
  7.  A program for causing a computer to function as each unit of the feature amount extraction device according to any one of claims 1 to 5.
PCT/JP2019/020948 2018-06-01 2019-05-27 Feature amount extraction device, method, and program WO2019230666A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2018-106237 2018-06-01
JP2018106237A JP2019211913A (en) 2018-06-01 2018-06-01 Feature quantity extraction device, method, and program

Publications (1)

Publication Number Publication Date
WO2019230666A1 true WO2019230666A1 (en) 2019-12-05

Family

ID=68698150

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2019/020948 WO2019230666A1 (en) 2018-06-01 2019-05-27 Feature amount extraction device, method, and program

Country Status (2)

Country Link
JP (1) JP2019211913A (en)
WO (1) WO2019230666A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113239965A (en) * 2021-04-12 2021-08-10 北京林业大学 Bird identification method based on deep neural network and electronic equipment
CN113821670A (en) * 2021-07-23 2021-12-21 腾讯科技(深圳)有限公司 Image retrieval method, device, equipment and computer readable storage medium

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200143238A1 (en) * 2018-11-07 2020-05-07 Facebook, Inc. Detecting Augmented-Reality Targets
JP7357454B2 (en) * 2019-03-25 2023-10-06 三菱電機株式会社 Feature identification device, feature identification method, and feature identification program
WO2021130881A1 (en) * 2019-12-25 2021-07-01 三菱電機株式会社 Object detection device, monitoring device, and learning device
TWI790572B (en) * 2021-03-19 2023-01-21 宏碁智醫股份有限公司 Detecting method and detecting apparatus related to image

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2017129990A (en) * 2016-01-19 2017-07-27 国立大学法人豊橋技術科学大学 Device, method, and program for image recognition

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2017129990A (en) * 2016-01-19 2017-07-27 国立大学法人豊橋技術科学大学 Device, method, and program for image recognition

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113239965A (en) * 2021-04-12 2021-08-10 北京林业大学 Bird identification method based on deep neural network and electronic equipment
CN113239965B (en) * 2021-04-12 2023-05-02 北京林业大学 Bird recognition method based on deep neural network and electronic equipment
CN113821670A (en) * 2021-07-23 2021-12-21 腾讯科技(深圳)有限公司 Image retrieval method, device, equipment and computer readable storage medium
CN113821670B (en) * 2021-07-23 2024-04-16 腾讯科技(深圳)有限公司 Image retrieval method, device, equipment and computer readable storage medium

Also Published As

Publication number Publication date
JP2019211913A (en) 2019-12-12

Similar Documents

Publication Publication Date Title
WO2019230666A1 (en) Feature amount extraction device, method, and program
US11416710B2 (en) Feature representation device, feature representation method, and program
CN105354307B (en) Image content identification method and device
Kim et al. An Efficient Color Space for Deep‐Learning Based Traffic Light Recognition
US8533162B2 (en) Method for detecting object
CN110073367B (en) Multi-view embedding with SOFT-MAX based compatibility function for zero sample learning
EP3029606A2 (en) Method and apparatus for image classification with joint feature adaptation and classifier learning
US11928790B2 (en) Object recognition device, object recognition learning device, method, and program
CN110175615B (en) Model training method, domain-adaptive visual position identification method and device
US20170262478A1 (en) Method and apparatus for image retrieval with feature learning
JP6892606B2 (en) Positioning device, position identification method and computer program
CN111079847A (en) Remote sensing image automatic labeling method based on deep learning
CN113033438B (en) Data feature learning method for modal imperfect alignment
Choi et al. Face video retrieval based on the deep CNN with RBF loss
CN110852327A (en) Image processing method, image processing device, electronic equipment and storage medium
CN112784754A (en) Vehicle re-identification method, device, equipment and storage medium
CN111611395B (en) Entity relationship identification method and device
JP6373292B2 (en) Feature generation apparatus, method, and program
Kishan et al. Handwritten character recognition using CNN
Aljuaidi et al. Mini-batch vlad for visual place retrieval
Ouni et al. Deep learning for robust information retrieval system
Ayech et al. A content-based image retrieval using PCA and SOM
Holliday et al. Scale-invariant localization using quasi-semantic object landmarks
Bicego et al. Combining free energy score spaces with information theoretic kernels: Application to scene classification
Wang et al. Joint Face Detection and Initialization for Face Alignment

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19811751

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19811751

Country of ref document: EP

Kind code of ref document: A1