
JP6842745B2 - Image discrimination device and image discrimination method

Info

Publication number: JP6842745B2
Application number: JP2016148267A
Authority: JP
Inventors: 公俊山崎; アーノードソービ
Original assignee: Shinshu University NUC
Current assignee: Shinshu University NUC
Priority date: 2016-07-28
Filing date: 2016-07-28
Publication date: 2021-03-17
Anticipated expiration: 2036-07-28
Also published as: JP2018018313A

Description

The present invention relates to image discrimination technology, and more particularly to a device that extracts regions from an arbitrary image and discriminates subjects based on the features of each region.

In recent years, computer-based general object recognition has advanced remarkably, driven in large part by research on convolutional neural networks (CNNs). A CNN is an extension of the classical multilayer perceptron. It is a multilayer neural network whose convolutional layers extract features from images, modeled on the structure of the brain's visual cortex (Non-Patent Document 1).

Patent Document 1: JP 2015-100110

Non-Patent Document 1: Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proc. of the IEEE, pp. 2278-2324, 1998.
Non-Patent Document 2: J. Masci, U. Meier, D. Ciresan, and J. Schmidhuber. Stacked Convolutional Auto-Encoders for Hierarchical Feature Extraction. Artificial Neural Networks and Machine Learning - ICANN 2011, Lecture Notes in Computer Science, vol. 6791, pp. 52-59, 2011.
Non-Patent Document 3: G. Mori. Guiding Model Search Using Segmentation. IEEE International Conference on Computer Vision, 2005.
Non-Patent Document 4: M. Ester, H.-P. Kriegel, J. Sander, and X. Xu. A density-based algorithm for discovering clusters in large spatial databases with noise. AAAI Press, pp. 226-231, 1996.
Non-Patent Document 5: M. Ankerst, M. M. Breunig, H.-P. Kriegel, and J. Sander. OPTICS: Ordering Points To Identify the Clustering Structure. ACM SIGMOD International Conference on Management of Data, ACM Press, pp. 49-60, 1999.

General object recognition using CNNs requires a large annotated training dataset from which the computer learns to recognize images. However, some situations are difficult to anticipate, such as the interior of a facility that collapsed during a disaster. In such situations no training data has been accumulated, and building a suitable training dataset is also difficult.

To solve this problem, the present inventors combined the inherent strengths of CNNs with the concept of an active learning approach. Learning is unsupervised in principle and is refined by minimal feedback from the user. On this basis, the inventors found a device that can segment an image into regions and classify its subjects without a prior training dataset, and thereby completed the present invention. An object of the present invention is to provide an image discrimination device and an image discrimination method that can determine the category of subjects in still-image or moving-image data without a prior training dataset.

Specifically, the present invention according to claim 1 is an image discrimination device that processes an externally input image and discriminates and outputs each subject of the image by object and category. The device comprises:

- an image acquisition unit that acquires image data from outside the device;
- a local region generation unit that divides the image data into local regions, each consisting of a plurality of pixels connected based on the feature value of each pixel;
- a coding reference holding unit that holds parameters for extracting the features of the local regions;
- a coding processing unit that reduces the dimensionality of the local regions by encoding them with the parameters held in the coding reference holding unit;
- a region discrimination unit that classifies the dimension-reduced local regions into an arbitrary number of sets, each containing similar local regions;
- a labeling unit that assigns labels to the sets classified by the region discrimination unit;
- a label confirmation unit through which the user confirms, under an arbitrary name, a label assigned by the labeling unit;
- a label holding unit that holds the labels confirmed by the label confirmation unit; and
- a result display unit that displays to the user each region determined by the region discrimination.

The coding processing unit uses an autoencoder to encode each local region with the parameters held in the coding reference holding unit. It also optimizes the dimensionality reduction achieved by the encoding and updates those parameters accordingly.

The labeling unit refers to the label holding unit. For each set determined by the region discrimination unit, it checks whether there is a set bearing a confirmed label that contains local regions similar to those in the set. If such a set is found, the labeling unit assigns that confirmed label to the set; otherwise, it temporarily assigns a provisional label to the set.

Further, the present invention according to claim 2 is the image discrimination device according to claim 1, wherein, when two or more labels confirmed by the label confirmation unit would be assigned to a single set, the labeling unit changes the parameters of the coding reference holding unit so as to eliminate the duplication.

The present invention according to claim 3 is an image discrimination method that processes an externally input image and discriminates and outputs each subject of the image by object and category. The method comprises:

- an image acquisition step of acquiring image data from outside the device;
- a local region generation step of dividing the image data into local regions, each consisting of a plurality of pixels connected based on the feature value of each pixel;
- a coding processing step of recording parameters for extracting the features of the local regions in a coding reference holding unit, and reducing the dimensionality of the local regions by encoding them with the parameters held in that unit;
- a region discrimination step of classifying the dimension-reduced local regions into an arbitrary number of sets, each containing similar local regions;
- a labeling step of assigning labels to the sets classified in the region discrimination step;
- a label confirmation step in which the user confirms, under an arbitrary name, a label assigned in the labeling step; and
- a result display step of displaying to the user each region determined by the region discrimination.

In the coding processing step, an autoencoder encodes each local region with the parameters held in the coding reference holding unit. The dimensionality reduction achieved by the encoding is optimized, and the parameters are updated accordingly.

In the labeling step, a label holding unit that holds the labels confirmed in the label confirmation step is consulted. For each set determined in the region discrimination step, the method checks whether there is a set bearing a confirmed label that contains local regions similar to those in the set. If such a set is found, that confirmed label is assigned to the set; otherwise, a provisional label is temporarily assigned to the set.

Further, the present invention according to claim 4 is the image discrimination method according to claim 3, further comprising a parameter changing step. When two or more labels confirmed in the label confirmation step have been assigned to a single set, this step changes the parameters held in the coding reference holding unit so as to eliminate the duplication.

According to the present invention, it is possible to provide an image discrimination device that can discriminate the subjects of an image without a prior training dataset and with only a small amount of feedback from the user.

FIG. 1 is a functional block diagram of the image discrimination device according to an example of the embodiment.
FIG. 2 is an illustration of a convolutional autoencoder.
FIG. 3 shows the hardware configuration of the image discrimination device according to an example of the embodiment.
FIG. 4 is a flowchart showing the flow of recognition processing in an example of the embodiment.
FIG. 5 shows an example of an input image and the same image after superpixel generation.
FIG. 6 is an illustration of clustering by the region generation unit.
FIG. 7 is an illustration of the result display screen produced by the result display unit.

Embodiments of the image discrimination device and the image discrimination method according to the present invention are described below.

(Device Configuration)
FIG. 1 is a functional block diagram of the image discrimination device according to the present embodiment. The device processes an externally input image, discriminates each subject of the image by object and category, and outputs the result. It comprises an image acquisition unit 100, a local region generation unit 110, a coding processing unit 120, a coding reference holding unit 200, a region discrimination unit 130, a labeling unit 140, a label holding unit 210, a result display unit 150, and a label confirmation unit 160.

The image acquisition unit 100 serves as the interface of the device according to the present embodiment. It acquires the image to be discriminated from a data input unit provided in the device or from an external input device, and sends it to the local region generation unit 110. An example of the data input unit is a network interface for receiving images as data. Examples of the external input device include cameras and video cameras.

The local region generation unit 110 receives the image acquired by the image acquisition unit 100 and divides it into local regions, each consisting of a plurality of pixels connected based on the feature values of the individual pixels. It records the image so that subsequent processing treats each local region as the pixel unit, and then sends it to the coding processing unit 120. A local region, also called a superpixel, is a set of pixels whose values for a given feature are similar. Examples of such features include RGB, CMYK, and HSL color values, line information, and shape information. The input may be a single still image or a moving image consisting of successively input still images. The discrimination results come closer to the user's perception as more images are processed, so the present invention is particularly well suited to moving images.
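The following Python sketch roughly illustrates dividing an image into connected regions of similar pixels. It grows 4-connected regions whose RGB color stays within a distance `threshold` of the region's seed pixel. It is a simple stand-in, not the superpixel method of Patent Document 1 or Non-Patent Document 3. The function name, the Euclidean RGB distance, and the nested-list image layout are assumptions made for illustration.

```python
import math
from collections import deque


def region_grow(image, threshold):
    """Label 4-connected pixels whose RGB color lies within `threshold`
    (Euclidean distance) of their region's seed pixel.

    `image` is a list of rows, each row a list of (R, G, B) tuples.
    Returns a label map of the same shape; labels start at 0.
    """
    height, width = len(image), len(image[0])
    labels = [[-1] * width for _ in range(height)]  # -1 = not yet assigned
    next_label = 0
    for sy in range(height):
        for sx in range(width):
            if labels[sy][sx] != -1:
                continue
            seed = image[sy][sx]          # color every member is compared to
            labels[sy][sx] = next_label
            queue = deque([(sy, sx)])
            while queue:                  # breadth-first flood fill
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and labels[ny][nx] == -1
                            and math.dist(image[ny][nx], seed) <= threshold):
                        labels[ny][nx] = next_label
                        queue.append((ny, nx))
            next_label += 1
    return labels


# A 2x4 image: reddish pixels on the left, bluish pixels on the right.
image = [
    [(255, 0, 0), (250, 5, 0), (0, 0, 255), (0, 0, 250)],
    [(255, 0, 0), (252, 0, 3), (0, 0, 255), (5, 0, 255)],
]
print(region_grow(image, threshold=20))
```

Comparing each pixel with the seed rather than with its immediate neighbor keeps a slow color gradient from merging into one huge region. Practical superpixel methods additionally bound the region size and shape.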

The coding processing unit 120 reduces the dimensionality of each local region received from the local region generation unit 110 by encoding it based on predetermined parameters. Coding, also called encoding, means converting digital data into codes suited to a given purpose according to fixed rules. A neural-network autoencoder is well suited as the coding algorithm (Non-Patent Document 2). It allows the parameters to be optimized over the local regions. Alternately with, or independently of, that optimization, it also allows each local region to be expressed in lower-dimensional form. A convolutional autoencoder is a preferred example for the coding processing unit 120; FIG. 2 illustrates one.
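To make encoding for dimensionality reduction concrete, here is a minimal sketch. It assumes a linear autoencoder with a single hidden unit, trained by per-sample gradient descent on a squared reconstruction error. It is far simpler than the convolutional autoencoder of FIG. 2. All names (`encode`, `decode`, `train_linear_autoencoder`) and hyperparameters are hypothetical.

```python
import random


def encode(enc, x):
    """Project one input vector onto the single hidden unit (the code)."""
    return sum(w * xi for w, xi in zip(enc, x))


def decode(dec, h):
    """Map the scalar code back into input space."""
    return [w * h for w in dec]


def reconstruction_loss(enc, dec, samples):
    """Mean of 0.5 * ||decode(encode(x)) - x||^2 over the samples."""
    total = 0.0
    for x in samples:
        x_hat = decode(dec, encode(enc, x))
        total += 0.5 * sum((a - b) ** 2 for a, b in zip(x_hat, x))
    return total / len(samples)


def train_linear_autoencoder(samples, epochs=500, lr=0.01, seed=0):
    """Fit encoder/decoder weights by per-sample gradient descent."""
    rng = random.Random(seed)
    dim = len(samples[0])
    enc = [rng.uniform(-0.5, 0.5) for _ in range(dim)]
    dec = [rng.uniform(-0.5, 0.5) for _ in range(dim)]
    for _ in range(epochs):
        for x in samples:
            h = encode(enc, x)
            err = [a - b for a, b in zip(decode(dec, h), x)]
            # Gradients of 0.5 * ||dec * h - x||^2, using the current weights.
            grad_dec = [e * h for e in err]
            grad_h = sum(e * w for e, w in zip(err, dec))
            grad_enc = [grad_h * xi for xi in x]
            dec = [w - lr * g for w, g in zip(dec, grad_dec)]
            enc = [w - lr * g for w, g in zip(enc, grad_enc)]
    return enc, dec


# 2-D samples lying on one line, so a 1-D code can reconstruct them.
samples = [[t, 2 * t] for t in (-1.0, -0.5, 0.5, 1.0)]
enc, dec = train_linear_autoencoder(samples)
print(reconstruction_loss(enc, dec, samples))
```

Because these samples all lie on one line, a one-dimensional code is enough to reconstruct them almost exactly. Real local-region descriptors would need a wider code and a nonlinear, convolutional encoder.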

The coding reference holding unit 200 records the parameters that the coding processing unit 120 uses for dimensionality reduction. The coding processing unit 120 reads these parameters and updates them as appropriate through its optimization process, so that input images are discriminated in a way closer to the user's criteria. The parameters are basically updated each time the coding processing unit 120 runs. In addition, a discrepancy may arise in region discrimination after the user has confirmed a label. In that case, the parameters are changed gradually to bring the discrimination criteria closer to the user's perception.
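The role of the coding reference holding unit 200 can be sketched as a small parameter store. It supports both a wholesale replacement after each optimization pass and a gradual change toward a target. The class name, the scalar-parameter dictionary, and the linear interpolation used for the gradual change are illustrative assumptions. The source does not specify how the target values or the step size are chosen.

```python
class CodingReferenceStore:
    """Hypothetical holder for encoder parameters (name -> float)."""

    def __init__(self, params):
        self._params = dict(params)
        self.version = 0  # incremented on every change

    def get(self):
        """Return a copy so callers cannot mutate the stored values."""
        return dict(self._params)

    def replace(self, new_params):
        """Routine update after an optimization pass of the encoder."""
        self._params = dict(new_params)
        self.version += 1

    def nudge(self, target, rate=0.1):
        """Move each parameter a fraction `rate` of the way toward `target`,
        e.g. after a conflict with user-confirmed labels."""
        self._params = {k: v + rate * (target[k] - v)
                        for k, v in self._params.items()}
        self.version += 1


store = CodingReferenceStore({"w": 0.0})
store.nudge({"w": 1.0}, rate=0.5)
store.nudge({"w": 1.0}, rate=0.5)
print(store.get(), store.version)
```

The interpolation keeps each change small, which matches the text's "gradually" without committing to any particular update rule.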

The region discrimination unit 130 classifies the data reduced in dimension by the coding processing unit 120 into an arbitrary number of sets of similar items. By the time the coding processing unit 120 has reduced their dimensionality, the items in each set already share similar properties. The classification result therefore corresponds to the discrimination of subjects in the image. Each classified set is also called a cluster, and the classification process is also called clustering. The classified data are passed to the labeling unit 140, which gives each set a provisional or confirmed label.

The labeling unit 140 assigns a label to each set determined by the region discrimination unit 130. Labels are drawn from the confirmed labels that the user has recorded in the label holding unit 210. If no suitable label exists there, a label is assigned arbitrarily from a set of predefined provisional labels.
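Below is a minimal sketch of the labeling rule. It assumes that "similar" means a Euclidean distance between feature vectors at or below a hypothetical `threshold`, and that provisional labels are generated as `provisional-0`, `provisional-1`, and so on. The data layout is also an assumption: clusters are lists of feature vectors, and confirmed labels are (vector, name) pairs.

```python
import math


def assign_labels(cluster_features, confirmed, threshold):
    """Give each cluster the nearest confirmed label within `threshold`,
    or a fresh provisional label if none is close enough.

    cluster_features: {cluster_id: [feature_vector, ...]}
    confirmed: [(feature_vector, label_name), ...] from the label store
    """
    labels = {}
    provisional_count = 0
    for cid, features in cluster_features.items():
        best, best_dist = None, threshold
        for f in features:
            for ref, name in confirmed:
                d = math.dist(f, ref)
                if d <= best_dist:        # closest confirmed region so far
                    best, best_dist = name, d
        if best is None:
            labels[cid] = f"provisional-{provisional_count}"
            provisional_count += 1
        else:
            labels[cid] = best
    return labels


clusters = {0: [(0.0, 0.0), (0.1, 0.0)], 1: [(5.0, 5.0)]}
confirmed = [((0.05, 0.0), "pillar")]
print(assign_labels(clusters, confirmed, threshold=0.5))
```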

The label holding unit 210 records the confirmed labels that the labeling unit uses when assigning labels. A confirmed label is one that the user assigns under an arbitrary name, by his or her own judgment, after checking the discrimination result shown on the result display unit 150. The user can change or update confirmed labels, but the image discrimination device cannot update them on its own.
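The access rule described here might be sketched as follows: the user may write confirmed labels, and the device may only read them. The class and method names are hypothetical. In Python the restriction is only partly enforced. The labeling side receives a `types.MappingProxyType` view that rejects item assignment, while the underlying dictionary is protected only by convention.

```python
from types import MappingProxyType


class LabelStore:
    """Hypothetical label holding unit: written only via user confirmation."""

    def __init__(self):
        self._labels = {}  # region_id -> confirmed label name

    def confirm(self, region_id, name):
        """Called from the user-facing label confirmation step only."""
        self._labels[region_id] = name

    def view(self):
        """Read-only, live view handed to the labeling unit."""
        return MappingProxyType(self._labels)


store = LabelStore()
store.confirm(3, "wall")
labels = store.view()
print(labels[3])
```

Because the proxy is a live view, labels the user confirms later become visible to the labeling unit without handing out a new object.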

The result display unit 150 displays to the user each set determined by the region discrimination, together with its label. The result is shown in a viewable form, for example on a display screen, so that the user can check the device's result for each local region. After checking, the user may wish to confirm the result for a local region by his or her own judgment. To communicate this to the device, the user may assign a confirmed label to that region through the label confirmation unit 160, described below. Confirmed labels correspond to the user's own discrimination, such as "pillar" or "wall". A label confirmed by the user is recorded in the label holding unit 210 and referenced by the labeling unit 140 in subsequent processing.

(Hardware Configuration)
FIG. 3 shows the hardware configuration of the image discrimination device according to the present embodiment. A general-purpose personal computer can be used as the hardware. In this case, the device includes the following:

- an external interface serving as the image acquisition unit;
- storage devices, such as a hard disk drive (HDD) and semiconductor memory, for storing and executing the other functions;
- a CPU for arithmetic processing;
- memory (RAM) for holding intermediate information during processing; and
- a display for outputting the discrimination results.

A graphics processing unit (GPU) may also be provided for parallel processing.

(Description of the Discrimination Process)
FIG. 4 is a flowchart showing the flow of recognition processing in the present embodiment. The specific processing of the image discrimination device according to the present embodiment is described below, following the flowchart of FIG. 4.

In step S100, the image to be discriminated is input. The image may be input in real time from an external device such as a camera, or image data stored in advance in a storage device or the like may be input. The input may be a still image or a moving image; for a moving image, the discrimination process is performed frame by frame.

In step S110, the input image data is divided into local regions (superpixels), each consisting of a plurality of pixels connected based on the feature values of the individual pixels. Any superpixel generation algorithm known to those skilled in the art may be chosen, for example the methods of Patent Document 1 and Non-Patent Document 3. From this step onward, the superpixels generated here are treated as the element units of the image. FIG. 5 shows an example of an input image (left) and the same image after superpixel generation (right). The figure shows that the input image has been divided into local regions (superpixels).

In step S120, the dimensionality of each input local region is reduced by encoding. An autoencoder built on a convolutional neural network, for example, is well suited as the coding algorithm. When CNNs are used, the encoder learns the parameters used for discrimination while optimizing over the local regions. At the same time, each local region can be expressed in lower-dimensional form using those parameters. An autoencoder is trained so that its output reproduces its input, so the learned parameters make it possible to reconstruct the input image from the reduced-dimension representation. The parameters learned here are subsequently updated as appropriate and optimized to approach the user's judgment criteria.
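One common way to write the autoencoder training in this step is shown below. It assumes a squared-error reconstruction loss, which the source does not state. Here $\mathbf{x}_i$ is the $i$-th of $n$ local regions, $f_\theta$ is the encoder, $g_\phi$ is the decoder, and $\mathbf{h}_i$ is the low-dimensional code:

$$
\mathbf{h}_i = f_\theta(\mathbf{x}_i), \qquad
\hat{\mathbf{x}}_i = g_\phi(\mathbf{h}_i), \qquad
\dim \mathbf{h}_i < \dim \mathbf{x}_i,
$$

$$
(\theta^\ast, \phi^\ast) = \arg\min_{\theta,\,\phi} \; \frac{1}{n} \sum_{i=1}^{n} \bigl\lVert \mathbf{x}_i - g_\phi\bigl(f_\theta(\mathbf{x}_i)\bigr) \bigr\rVert_2^2 .
$$

Under this reading, $\theta$ (and $\phi$) would correspond to the parameters kept in the coding reference holding unit 200. The code $\mathbf{h}_i$ would be the representation passed on to clustering in step S130.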

In step S130, the set of dimension-reduced data is classified into an arbitrary number of sets of similar items. Specifically, the local-region data encoded into N-dimensional vectors are mapped into an N-dimensional space, and a predetermined clustering algorithm then groups the data points by distance. Any clustering method known to those skilled in the art may be chosen. Representative examples are K-means, DBSCAN (Non-Patent Document 4), Mean-shift, and OPTICS (Non-Patent Document 5), with OPTICS being particularly preferable. FIG. 6 illustrates the clustering. Each output cluster corresponds to a subject in the image. For example, joining the local regions that belong to the same cluster identifies each object in the image.
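A compact version of DBSCAN (Non-Patent Document 4) illustrates how encoded local regions can be grouped by distance into clusters without fixing their number in advance. This is a plain O(n²) sketch for illustration, not the OPTICS variant the text prefers. `eps` and `min_pts` are the usual DBSCAN parameters, and noise points receive the label -1.

```python
import math


def dbscan(points, eps, min_pts):
    """Return one cluster id per point; -1 marks noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)

    def neighbors(i):
        # All points within eps of point i, including i itself.
        return [j for j in range(len(points))
                if math.dist(points[i], points[j]) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE             # may later become a border point
            continue
        labels[i] = cluster               # i is a core point: start a cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster       # border point: join, do not expand
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also core: keep expanding
        cluster += 1
    return labels


codes = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
print(dbscan(codes, eps=1.5, min_pts=2))
```

OPTICS generalizes this by ordering points by reachability distance, so clusters of different densities can be extracted without a single global `eps`.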

In step S140, a label is assigned to each local region discriminated by subject. When assigning labels, the labeling unit 140 refers to the label holding unit 210 connected to it. It checks whether there is a local region bearing a confirmed label whose properties are similar to those of the target cluster. The label holding unit 210 stores the confirmed labels that the user assigns to sets in a later step of the process. If the check finds a confirmed label that should be given to the target set, that confirmed label is assigned to it.

The search may find no local region in the label holding unit 210 that bears a confirmed label and is contained in the target set. In that case, a provisional label is temporarily assigned to the target and processing moves to the next step. Conversely, two or more confirmed labels may be found that should be given to the target set. The device's discrimination is then judged to diverge from the user's perception, and the parameters recorded in the coding reference holding unit 200 are updated. The sets bearing confirmed or provisional labels are then passed to step S150, which displays the results to notify the user.
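The duplicate-label check that triggers a parameter update can be sketched as below. The sketch assumes that regions are identified by IDs and that each region's cluster is known. It also assumes that "two or more confirmed labels" means two or more distinct label names within one cluster. The function name and data layout are hypothetical.

```python
from collections import defaultdict


def find_label_conflicts(region_cluster, confirmed_labels):
    """Return {cluster_id: set_of_label_names} for every cluster that
    would receive two or more distinct confirmed labels.

    region_cluster: {region_id: cluster_id} from the clustering step
    confirmed_labels: {region_id: label_name} confirmed by the user
    """
    per_cluster = defaultdict(set)
    for region_id, label in confirmed_labels.items():
        if region_id in region_cluster:
            per_cluster[region_cluster[region_id]].add(label)
    return {cid: names for cid, names in per_cluster.items() if len(names) >= 2}


region_cluster = {1: 0, 2: 0, 3: 1}
confirmed_labels = {1: "pillar", 2: "wall", 3: "wall"}
print(find_label_conflicts(region_cluster, confirmed_labels))
```

A non-empty result would be the signal to update the parameters held in the coding reference holding unit 200. The text does not specify how that update is computed.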

In step S150, the result of discriminating each local region by subject is displayed to the user. For example, the images of the local regions are displayed with a predetermined color or text for each cluster, so that the user can tell which regions form a single subject. The user checks the displayed result and can discriminate the subjects himself or herself. If the user judges that a region is a particular subject, the user assigns a confirmed label, for example through the label confirmation unit shown on the result display unit. The device then records the confirmed label in the label holding unit 210. There, labeling step S140 references it as a confirmed label when subsequent images or regions are processed.
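The color-per-cluster display could be sketched as follows: each pixel takes the color of the cluster its superpixel belongs to. The palette, the gray color for noise (cluster -1), and the function name are assumptions for illustration.

```python
NOISE_COLOR = (128, 128, 128)
PALETTE = [(230, 25, 75), (60, 180, 75), (0, 130, 200),
           (245, 130, 48), (145, 30, 180)]


def colorize(superpixel_map, cluster_of, palette=PALETTE):
    """Paint every pixel with the color of its superpixel's cluster.

    superpixel_map: rows of superpixel ids (one per pixel)
    cluster_of: {superpixel_id: cluster_id}, with -1 meaning noise
    """
    def color(sp):
        cid = cluster_of[sp]
        # Clusters beyond the palette size reuse colors cyclically.
        return NOISE_COLOR if cid < 0 else palette[cid % len(palette)]

    return [[color(sp) for sp in row] for row in superpixel_map]


overlay = colorize([[0, 0, 1], [2, 2, 1]], {0: 0, 1: 1, 2: -1})
print(overlay)
```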

Suppose an autoencoder is used to reduce the dimensionality of the local regions in the image discrimination device according to the present embodiment. The device then learns the discrimination criteria itself and discriminates the local regions while optimizing them. Image discrimination is therefore possible without a pre-training process using teaching material supplied in advance by the user. However, the discrimination result does not always match the user's criteria. Depending on the parameters used for dimensionality reduction, a region in the image may be discriminated as a subject different from the one the user recognizes. In this case, a step S160 can be provided in which the user confirms the discrimination result by his or her own judgment. Reflecting that result in subsequent processing corrects the discrepancy. FIG. 7 shows an example of the result display screen, in which the system presumes each identified cluster to be a single object.

As described above, the image discrimination device according to the present embodiment can classify the subjects of images input from a camera or the like, making judgments close to a human's while referring to the user's judgment. The present invention can also be implemented by supplying a program that realizes the functions of the above embodiment to an apparatus such as a computer, via a network or a storage medium. The CPU or GPU of that system then reads and executes the program.

The image discrimination device according to the present invention can, for example, support an unmanned robot surveying an environment that humans cannot easily enter, such as a disaster site. It makes it possible to distinguish collapsed building materials, rubble, people requiring rescue, and the like, contributing to faster rescue.

100 Image acquisition unit
110 Local region generation unit
120 Coding processing unit
130 Region discrimination unit
140 Label assignment unit
150 Result display unit
160 Label confirmation unit
200 Coding reference holding unit
210 Label holding unit

Claims

An image discrimination device that processes an image input from outside, discriminates each subject in the image by object and by category, and outputs the result, the device comprising:
an image acquisition unit that acquires image data from outside the device;
a local region generation unit that divides the image data into local regions, each consisting of a plurality of connected pixels, based on the feature amount of each pixel;
a coding reference holding unit that holds parameters for extracting features of the local regions;
a coding processing unit that reduces the dimensionality of the local regions by coding them using the parameters of the coding reference holding unit;
a region discrimination unit that classifies the local regions reduced in dimensionality by the coding processing unit into an arbitrary number of sets, each containing similar local regions;
a label assignment unit that assigns a label to each set classified by the region discrimination unit;
a label confirmation unit with which the user confirms the label assigned by the label assignment unit under an arbitrary name;
a label holding unit that holds the labels confirmed by the label confirmation unit; and
a result display unit that displays each region discriminated by the region discrimination unit to the user,
wherein the coding processing unit uses an autoencoder to code the local regions with the parameters held in the coding reference holding unit, optimizes the dimensionality reduction achieved by the coding, and updates the parameters accordingly; and
the label assignment unit refers to the label holding unit and checks, for each set discriminated by the region discrimination unit, whether there exists a set bearing a confirmed label that contains a local region similar to a local region included in that set,
and, if such a set bearing a confirmed label is found, assigns the confirmed label to the set, and if not, temporarily assigns a temporary label to the set.
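The label assignment rule of the claim above can be sketched as follows. This assumes that each set is a list of code vectors, that the label holding unit is a dictionary from confirmed label to the code vectors of local regions in sets the user has already confirmed, and that "similar" means a Euclidean distance below a hypothetical threshold; the claims do not define the similarity measure. The function name `assign_labels` and the `tmp<n>` naming of temporary labels are likewise illustrative.

```python
import math


def assign_labels(sets, label_store, threshold=0.5):
    """Hypothetical sketch of the label assignment unit (140).

    sets: dict set_id -> list of code vectors (output of region discrimination)
    label_store: dict confirmed_label -> list of code vectors of local regions
        in sets the user has confirmed (stand-in for the label holding unit, 210)
    threshold: assumed distance below which two local regions count as similar
    Returns dict set_id -> (label, is_confirmed).
    """
    result = {}
    n_temp = 0
    for set_id, regions in sets.items():
        # Find the confirmed label whose stored region is closest to any region of this set.
        best_label, best_dist = None, threshold
        for label, stored in label_store.items():
            for r in regions:
                for s in stored:
                    d = math.dist(r, s)
                    if d < best_dist:
                        best_label, best_dist = label, d
        if best_label is not None:
            result[set_id] = (best_label, True)  # carry over the confirmed label
        else:
            result[set_id] = (f"tmp{n_temp}", False)  # temporary label
            n_temp += 1
    return result


store = {"rubble": [[0.0, 0.0], [0.2, 0.1]]}
sets = {0: [[0.1, 0.1], [0.3, 0.0]], 1: [[5.0, 5.0]]}
labels = assign_labels(sets, store)
# labels == {0: ("rubble", True), 1: ("tmp0", False)}
```

Set 0 inherits the confirmed label because one of its regions lies close to a stored "rubble" region; set 1 has no similar stored region and receives a temporary label, which the user could later confirm.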

The image discrimination device according to claim 1, wherein, when two or more labels confirmed by the label confirmation unit are assigned to a single set in duplicate, the label assignment unit changes the parameters of the coding reference holding unit so as to eliminate the duplication.
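The claim leaves open how the duplication is detected and how the parameters are then changed. A minimal sketch of the detection half only, assuming that each local region the user has previously confirmed carries its confirmed label: any set that gathers regions with two or more distinct confirmed labels is flagged as needing a parameter change. The function name and data layout are hypothetical, and the parameter change itself is not shown because the source does not describe it.

```python
def find_duplicate_labels(sets, confirmed):
    """Return {set_id: sorted labels} for sets holding two or more confirmed labels.

    sets: dict set_id -> iterable of local-region ids
    confirmed: dict region_id -> label the user confirmed earlier
    """
    conflicts = {}
    for set_id, region_ids in sets.items():
        labels = {confirmed[r] for r in region_ids if r in confirmed}
        if len(labels) >= 2:
            conflicts[set_id] = sorted(labels)
    return conflicts


sets = {0: [1, 2, 3], 1: [4, 5]}
confirmed = {1: "rubble", 3: "person", 4: "rubble"}
conflicts = find_duplicate_labels(sets, confirmed)
# conflicts == {0: ["person", "rubble"]}
```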

An image discrimination method that processes an image input from outside, discriminates each subject in the image by object and by category, and outputs the result, the method comprising:
an image acquisition step of acquiring image data from outside the device;
a local region generation step of dividing the image data into local regions, each consisting of a plurality of connected pixels, based on the feature amount of each pixel;
a coding processing step of reducing the dimensionality of the local regions by coding them using parameters for extracting features of the local regions, the parameters being recorded in a coding reference holding unit;
a region discrimination step of classifying the local regions reduced in dimensionality by the coding processing step into an arbitrary number of sets, each containing similar local regions;
a label assignment step of assigning a label to each set classified by the region discrimination step;
a label confirmation step in which the user confirms the label assigned by the label assignment step under an arbitrary name; and
a result display step of displaying each region discriminated by the region discrimination step to the user,
wherein, in the coding processing step, an autoencoder is used to code the local regions with the parameters held in the coding reference holding unit, the dimensionality reduction achieved by the coding is optimized, and the parameters are updated accordingly; and
in the label assignment step, a label holding unit that holds the labels confirmed by the label confirmation step is referred to, and it is checked, for each set discriminated by the region discrimination step, whether there exists a set bearing a confirmed label that contains a local region similar to a local region included in that set,
and, if such a set bearing a confirmed label is found, the confirmed label is assigned to the set, and if not, a temporary label is temporarily assigned to the set.
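The region discrimination step classifies the codes into "an arbitrary number of sets", so a clustering method that does not need the number of clusters in advance fits naturally. The non-patent literature cited for this patent includes the density-based DBSCAN (Ester et al., 1996) and OPTICS (Ankerst et al., 1999) algorithms. The following is a plain DBSCAN sketch in Python, offered as one plausible choice rather than the embodiment's confirmed method; the `eps` and `min_pts` values in the usage are hypothetical.

```python
import math


def dbscan(points, eps, min_pts):
    """Plain DBSCAN. Returns one cluster id per point; -1 marks noise."""
    n = len(points)
    labels = [None] * n  # None = not yet visited

    def neighbours(i):
        # Every point within eps of point i, including i itself.
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        nbrs = neighbours(i)
        if len(nbrs) < min_pts:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        cluster += 1  # i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in nbrs if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point -> border point
            if labels[j] is not None:
                continue  # already assigned; border points are not expanded
            labels[j] = cluster
            j_nbrs = neighbours(j)
            if len(j_nbrs) >= min_pts:
                queue.extend(j_nbrs)  # j is also a core point: keep expanding
    return labels


codes = [[0, 0], [0, 0.1], [0.1, 0], [5, 5], [5, 5.1], [5.1, 5], [10, 0]]
cluster_ids = dbscan(codes, eps=0.5, min_pts=2)
# cluster_ids == [0, 0, 0, 1, 1, 1, -1]
```

Points marked -1 are noise in DBSCAN terms; how the embodiment would treat such isolated local regions is not stated in the source.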

The image discrimination method according to claim 3, further comprising a parameter change step of changing the parameters of the coding reference holding unit so as to eliminate the duplication when two or more labels confirmed by the label confirmation step are assigned to a single set in duplicate.