JP7269013B2

JP7269013B2 - Image recognition system and image recognition method capable of suppressing false recognition

Info

Publication number: JP7269013B2
Application number: JP2019001909A
Authority: JP
Inventors: 洋一三谷; 晋一郎安藤
Original assignee: Kawasaki Jukogyo KK
Current assignee: Kawasaki Motors Ltd
Priority date: 2019-01-09
Filing date: 2019-01-09
Publication date: 2023-05-08
Anticipated expiration: 2039-01-09
Also published as: JP2020112926A

Description

The present invention relates to an image recognition system and an image recognition method capable of suppressing misrecognition in image recognition using deep learning.

Machine learning is a technology in which a "machine" learns by itself from a prepared data set and thereby discovers or predicts rules, regularities, judgment criteria, and the like (learning results) from that data set. Deep learning has attracted attention in recent years as one machine-learning method. Deep learning uses a neural network with many layers (several tens to several hundreds), which makes the learning process more accurate, and it has been applied (or its application has been considered) in a variety of fields.

A representative application of deep learning is image recognition. Image recognition using deep learning is known to have high generalization ability. For example, tens of thousands of training images of a recognition target X (e.g., a cat) are prepared, a deep learning model is trained on them, and an image recognition system is built using the trained model. Even when an image of target X (a cat) that was not used for training is input to this system, the system can correctly recognize the input image as target X (a cat).

In the field of image recognition using machine learning, various techniques have been proposed to suppress misrecognition. For example, Patent Document 1 proposes a technique for a radiographic imaging apparatus that detects an object in a radiographic image by image recognition, with the aim of suppressing false detection (misrecognition) when the image recognition uses machine learning. In this technique, learning-result data for image recognition are obtained in advance by machine learning on a plurality of rotated images, produced by rotating an image of the object through a plurality of angles, and are stored; the object is then detected in a captured radiographic image by image recognition based on these learning-result data.

Furthermore, it has recently become clear that image recognition using deep learning can produce unexpected misrecognitions. For example, such systems are known to recognize as target X an image that looks like mere noise to a human. It has also been reported that deep learning can be made to misrecognize an image that was deliberately crafted to cause misrecognition.

For example, Non-Patent Document 1 reports that a trained deep neural network (DNN) misrecognizes an image to which fine perturbations have been added, and calls such a perturbed example an "Adversarial Example". Non-Patent Document 2 reports that Adversarial Examples can also be created by a method different from that of Non-Patent Document 1.
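To make the idea of a "fine perturbation" concrete, the following is a minimal sketch of the fast gradient sign method (FGSM) described in Non-Patent Document 2, applied here to a toy logistic-regression "cat" classifier rather than to a DNN. The classifier weights, the 100-pixel input, and the step size eps are hypothetical values chosen so the effect is easy to see; this is an illustration of the attack principle, not the attack used in either cited document.

```python
import numpy as np


def sigmoid(z):
    """Logistic function: maps a score to P(label = "cat")."""
    return 1.0 / (1.0 + np.exp(-z))


def fgsm_linear(x, y, w, b, eps):
    """Fast gradient sign method against a logistic-regression classifier.

    For binary cross-entropy J on p = sigmoid(w @ x + b), the input
    gradient is dJ/dx = (p - y) * w; FGSM moves every pixel by eps in
    the direction of the gradient's sign, then clips to [0, 1].
    """
    p = sigmoid(w @ x + b)
    grad_x = (p - y) * w
    return np.clip(x + eps * np.sign(grad_x), 0.0, 1.0)


# Hypothetical toy "cat" classifier over a flattened 100-pixel image.
d = 100
w = np.where(np.arange(d) % 2 == 0, 1.0, -1.0)
b = 0.0
x = 0.5 + 0.02 * np.sign(w)          # clean image; score w @ x = 2.0

x_adv = fgsm_linear(x, y=1.0, w=w, b=b, eps=0.05)
p_clean = sigmoid(w @ x + b)         # about 0.88 -> "cat"
p_adv = sigmoid(w @ x_adv + b)       # about 0.05 -> "not cat"
print(f"clean P(cat)={p_clean:.3f}, adversarial P(cat)={p_adv:.3f}, "
      f"max pixel change={np.max(np.abs(x_adv - x)):.3f}")
```

No pixel changes by more than 0.05, yet the classifier's decision flips, which is the kind of behavior the cited documents report for trained DNNs.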

Patent Document 1: JP 2017-185007 A

Non-Patent Document 1: "Intriguing Properties of Neural Networks", Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, Rob Fergus, arXiv.org, arXiv:1312.6199 (version 4), December 21, 2013 (version 1)
Non-Patent Document 2: "Explaining and Harnessing Adversarial Examples", Ian J. Goodfellow, Jonathon Shlens, Christian Szegedy, arXiv.org, arXiv:1412.6572 (version 3), December 20, 2014 (version 1)

Patent Document 1 aims to suppress misrecognition caused by various changes in the surrounding structure, and uses machine learning for this purpose. As concrete machine-learning methods it mentions discriminators such as neural networks, support vector machines (SVM), and boosting, and it gives boosting, a type of ensemble learning, as a specific example. However, the method of Patent Document 1 does not suppress unexpected misrecognition or misrecognition of intentionally crafted images.

As described above, Non-Patent Documents 1 and 2 report that "Adversarial Examples", i.e., image data that are misrecognized, can be created intentionally. Non-Patent Document 2, for example, proposes suppressing misrecognition by using such Adversarial Examples as training data for deep learning. However, this approach may still misrecognize Adversarial Examples created by other methods.

The present invention was made to solve these problems, and its object is to effectively suppress unexpected misrecognition, or misrecognition of intentionally crafted images, in image recognition using deep learning.

To solve the above problems, an image recognition system according to the present invention includes: a plurality of image recognition processing units arranged in parallel; an input image distribution unit that distributes input image data to the plurality of image recognition processing units; and a recognition result integration processing unit that integrates a plurality of individual recognition results, which are the recognition results of the plurality of image recognition processing units. Each image recognition processing unit performs image recognition processing using deep learning that has been trained on a recognition target, and generates an individual recognition result for a recognition candidate contained in the input image data; the deep learning used by each image recognition processing unit differs from that of the others. When the plurality of individual recognition results all match, the recognition result integration processing unit outputs that individual recognition result as the integrated recognition result for the recognition target; when they do not all match, it outputs the individual recognition result with the most matches as the integrated recognition result for the recognition target.
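The integration rule just described (unanimity, otherwise the result with the most matches) can be sketched in a few lines of Python. This is a minimal illustration, assuming each individual recognition result has been reduced to a hashable label such as "cat"; the function name integrate is hypothetical. The passage does not say how ties are broken, so this sketch simply returns the first-seen label among equally common ones, which is how collections.Counter orders equal counts.

```python
from collections import Counter
from typing import Hashable, Sequence


def integrate(individual_results: Sequence[Hashable]) -> Hashable:
    """Combine the individual recognition results of parallel units.

    (i) If every unit reports the same label, return it (unanimity).
    (ii) Otherwise return the label reported by the most units
         (majority vote); ties go to the first-seen label.
    """
    if not individual_results:
        raise ValueError("need at least one individual recognition result")
    counts = Counter(individual_results)
    if len(counts) == 1:                   # all results match
        return individual_results[0]
    label, _ = counts.most_common(1)[0]    # most matches wins
    return label


print(integrate(["cat", "cat", "cat"]))    # unanimity -> cat
print(integrate(["cat", "dog", "cat"]))    # majority  -> cat
```

A single unit fooled into answering "dog" is outvoted by the other two, which is the mechanism the invention relies on.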

With this configuration, a plurality of image recognition processing units are provided in parallel, each performing image recognition by deep learning on the recognition target, and each deep learning model has been trained on the recognition target differently from the others. When the individual recognition results obtained from these units are integrated, a result is selected either because the individual results are "unanimous" or because it has the most matches under a "majority vote", and that result is output as the integrated recognition result. Thus, even if one of the image recognition processing units produces an unexpected misrecognition of the recognition target, or the input image was deliberately crafted to cause misrecognition, the unanimity or majority-vote process can substantially eliminate the misrecognition. Therefore, in image recognition using deep learning, unexpected misrecognition and misrecognition of intentionally crafted images can both be effectively suppressed.

In the image recognition system configured as above, each image recognition processing unit may include: an individual recognition processing unit that performs image recognition processing by deep learning and thereby generates a recognition result for a recognition candidate contained in the input image data together with a confidence level; and an individual recognition result determination unit that determines the validity of the recognition result based on the generated recognition result and confidence level, and outputs the recognition result together with its validity to the recognition result integration processing unit as the individual recognition result. When integrating the plurality of individual recognition results, the recognition result integration processing unit may decide, based at least on the validity, whether to cast a vote for each individual recognition result, and generate the integrated recognition result from the voting outcome.

In the image recognition system configured as above, the recognition result integration processing unit may also output "unknown" as the integrated recognition result when none of the plurality of individual recognition results is valid, or when invalid individual recognition results outnumber valid ones.
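The validity-based voting and the "unknown" fallback can be sketched as follows. This assumes each individual recognition result carries a label and a boolean validity, and that only valid results cast a vote; the names IndividualResult, integrate_with_validity, and UNKNOWN are hypothetical, and the passage does not fix any particular data format.

```python
from collections import Counter
from dataclasses import dataclass

UNKNOWN = "unknown"  # hypothetical sentinel for the "unknown" output


@dataclass(frozen=True)
class IndividualResult:
    label: str    # recognised label, e.g. "cat"
    valid: bool   # validity judged by the determination unit


def integrate_with_validity(results):
    """Vote with valid results only; fall back to "unknown".

    Returns UNKNOWN when no result is valid, or when invalid results
    outnumber valid ones; otherwise the most-voted valid label.
    """
    valid_labels = [r.label for r in results if r.valid]
    n_invalid = len(results) - len(valid_labels)
    if not valid_labels or n_invalid > len(valid_labels):
        return UNKNOWN
    label, _ = Counter(valid_labels).most_common(1)[0]
    return label


print(integrate_with_validity([IndividualResult("cat", True),
                               IndividualResult("cat", True),
                               IndividualResult("dog", False)]))  # cat
```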

In the image recognition system configured as above, the plurality of image recognition processing units may each use deep learning trained on different teacher images, deep learning with different initial learning values, or deep learning that differs in at least one of the number of intermediate layers and the number of cells.
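The three ways of making the deep learning differ can be expressed as per-unit training configurations. The sketch below is hypothetical: the MemberConfig fields, the base layer sizes (64, 64), and the extra 32-cell layers are illustrative assumptions, and each mode varies only one factor so that the units differ in exactly one respect.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class MemberConfig:
    data_shard: int                  # which subset of teacher images to train on
    init_seed: int                   # seed for the initial weight values
    hidden_layers: Tuple[int, ...]   # cells in each intermediate layer


def diversified_configs(n: int, mode: str) -> List[MemberConfig]:
    """Build n mutually different training configs (hypothetical)."""
    base = (64, 64)
    configs = []
    for i in range(n):
        if mode == "data":        # different teacher images
            configs.append(MemberConfig(i, 0, base))
        elif mode == "init":      # different initial learning values
            configs.append(MemberConfig(0, i, base))
        elif mode == "arch":      # different number of intermediate layers
            configs.append(MemberConfig(0, 0, base + (32,) * i))
        else:
            raise ValueError(f"unknown mode: {mode}")
    return configs


for cfg in diversified_configs(3, "arch"):
    print(cfg)
```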

The image recognition system configured as above may further include at least one of an image supply unit that supplies input image data to the input image distribution unit, and a recognition result use unit that executes predetermined processing using the integrated recognition result output from the recognition result integration processing unit.

In an image recognition processing method according to the present invention, a plurality of image recognition processing units, each performing image recognition processing by deep learning trained on the recognition target differently from the others, are arranged in parallel; input image data are distributed to these units; each unit performs image recognition processing to generate an individual recognition result for a recognition candidate contained in the input image data; and the resulting individual recognition results are integrated. In this integration, when the plurality of individual recognition results all match, that individual recognition result is output as the integrated recognition result for the recognition target; when they do not all match, the individual recognition result with the most matches is output as the integrated recognition result for the recognition target.

With the above configuration, the present invention has the effect of effectively suppressing unexpected misrecognition, or misrecognition of intentionally crafted images, in image recognition using deep learning.

FIG. 1(A) is a block diagram showing an example configuration of the image recognition system according to Embodiment 1 of the present disclosure, and FIG. 1(B) is a block diagram showing an example of the specific configuration of the image recognition processing units included in the image recognition system shown in FIG. 1(A).
FIG. 2 is a flowchart showing a representative example of integration processing by the recognition result integration processing unit included in the image recognition system shown in FIG. 1(A).
FIG. 3 is a flowchart showing a representative example of integration processing by the recognition result integration processing unit included in an image recognition system according to Embodiment 2 of the present disclosure.
FIG. 4 is a block diagram showing an example configuration of an image recognition system according to Embodiment 3 of the present disclosure.
FIG. 5 is a block diagram showing an example configuration of the image recognition system shown in FIG. 4.

Representative embodiments of the present invention are described below with reference to the drawings. Throughout the drawings, identical or corresponding elements are given the same reference numerals, and duplicate descriptions of them are omitted.

(Embodiment 1)
As shown in FIG. 1(A), an image recognition system 10A according to the present disclosure includes an image recognition unit 11A, which comprises an input image distribution unit 12, image recognition processing units 13A to 13C, and a recognition result integration processing unit 14.

Input image data are supplied to the input image distribution unit 12 ("image supply" in the figure), which distributes them to the image recognition processing units 13A to 13C. The input image data may or may not contain an image that could be a recognition target. For convenience, an image that could be a recognition target is called a "recognition candidate".

The image recognition processing units 13A to 13C are arranged in parallel. Each performs image recognition processing on the input image data distributed by the input image distribution unit 12 and generates recognition results for the recognition candidates contained in the input image data. This image recognition processing uses deep learning that has already been trained on the recognition target. The recognition result integration processing unit 14 integrates the recognition results output by the units 13A to 13C and outputs the final recognition result, called the "integrated recognition result".

The image recognition processing of units 13A to 13C determines whether each recognition candidate in the input image data is the recognition target; for convenience, the recognition result for a recognition candidate is called a "candidate recognition result". Units 13A to 13C may each obtain several candidate recognition results, in which case an appropriate one is adopted from among them. The appropriate recognition result adopted by each of units 13A to 13C is therefore called an "individual recognition result" to distinguish it from the candidate recognition results. The three individual recognition results from units 13A to 13C are integrated by the recognition result integration processing unit 14 into the final recognition result, which, as stated above, is called the "integrated recognition result".
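The data flow and terminology above (candidate recognition results inside each unit, one individual recognition result per unit) can be sketched as follows. The stub models, the (label, confidence) tuple format, and the names make_unit and distribute are hypothetical; the unit here adopts its most confident candidate, which corresponds to validity method (1) described later, and running the units on a thread pool is only one way to realize "in parallel".

```python
from concurrent.futures import ThreadPoolExecutor


def make_unit(candidates_fn):
    """Wrap a (hypothetical) trained model as an image recognition processing unit.

    candidates_fn(image) returns candidate recognition results as
    (label, confidence) pairs; the unit adopts the most confident one
    as its individual recognition result, or None if there are none.
    """
    def unit(image):
        candidates = candidates_fn(image)
        if not candidates:
            return None
        return max(candidates, key=lambda c: c[1])
    return unit


def distribute(image, units):
    """Input image distribution: send the same image to every unit in parallel."""
    with ThreadPoolExecutor(max_workers=len(units)) as pool:
        return list(pool.map(lambda unit: unit(image), units))


# Stub models standing in for units 13A to 13C (outputs are made up).
unit_a = make_unit(lambda img: [("cat", 0.9), ("cat", 0.4)])
unit_b = make_unit(lambda img: [("cat", 0.7)])
unit_c = make_unit(lambda img: [("dog", 0.6), ("cat", 0.3)])

individual_results = distribute("frame_0001", [unit_a, unit_b, unit_c])
print(individual_results)  # [('cat', 0.9), ('cat', 0.7), ('dog', 0.6)]
```

ThreadPoolExecutor.map returns results in the order of the input units, so the list positions correspond to the first, second, and third individual recognition results.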

The image recognition processing units 13A to 13C each use different deep learning to perform image recognition on recognition candidates that may be contained in the input image data. "Different deep learning" here is not particularly limited; examples include deep learning trained on different training images, deep learning with different initial learning values, and deep learning that differs in at least one of the number of intermediate layers and the number of cells. Alternatively, different deep-learning program development environments may be used.

This embodiment uses, for example, deep learning trained on different training images. Specifically, 30,000 target images of a recognition target X (e.g., a cat) are prepared and divided into three training image groups of 10,000 images each (a first, second, and third image group). The first image recognition processing unit 13A is trained on the first image group, the second image recognition processing unit 13B on the second image group, and the third image recognition processing unit 13C on the third image group.
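The division of the 30,000 images into three groups of 10,000 can be sketched as below. The file names are hypothetical, and shuffling with a fixed seed before splitting is an assumption; the passage only says the images are divided into three groups, which this sketch takes to mean disjoint groups of equal size.

```python
import random


def split_into_groups(items, n_groups, seed=0):
    """Shuffle items reproducibly and cut them into n_groups disjoint groups.

    Any remainder (len(items) % n_groups) is dropped so that all groups
    have the same size.
    """
    items = list(items)
    random.Random(seed).shuffle(items)
    size = len(items) // n_groups
    return [items[i * size:(i + 1) * size] for i in range(n_groups)]


# Hypothetical file names for the 30,000 images of target X.
image_ids = [f"cat_{i:05d}.png" for i in range(30000)]
first_group, second_group, third_group = split_into_groups(image_ids, 3)
print(len(first_group), len(second_group), len(third_group))  # 10000 10000 10000
```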

The specific deep-learning configuration of units 13A to 13C is not particularly limited, and known configurations can be used, for example convolutional neural networks (CNN), recurrent neural networks (RNN) or LSTM (Long Short-Term Memory), autoencoders, Boltzmann machines, and generative adversarial networks (GAN). This embodiment uses, for example, a CNN, which may consist of several tens of layers in which convolutional layers that perform pattern matching alternate with pooling layers that aggregate data.
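The alternation of convolutional layers (pattern matching) and pooling layers (data aggregation) can be illustrated with a small NumPy sketch. It is a single-channel toy with random kernels, not a trainable CNN; the ReLU after each convolution, the 3x3 kernel size, and the 2x2 max pooling are common choices assumed here, not details given in the text.

```python
import numpy as np


def conv2d_valid(image, kernel):
    """Single-channel 'valid' convolution (cross-correlation, as CNN libraries compute it)."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # pattern matching: similarity of the local patch to the kernel
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


def max_pool2d(x, size=2):
    """Non-overlapping max pooling; trailing rows/columns that do not fill a window are dropped."""
    h, w = x.shape[0] // size * size, x.shape[1] // size * size
    x = x[:h, :w]
    return x.reshape(h // size, size, w // size, size).max(axis=(1, 3))


def conv_pool_stack(image, kernels):
    """Alternate convolution (+ ReLU, an assumption) and pooling layers."""
    x = image
    for k in kernels:
        x = np.maximum(conv2d_valid(x, k), 0.0)
        x = max_pool2d(x)
    return x


rng = np.random.default_rng(0)
kernels = [rng.normal(size=(3, 3)) for _ in range(2)]   # untrained, illustrative
features = conv_pool_stack(rng.uniform(size=(28, 28)), kernels)
print(features.shape)  # (5, 5): 28 -> 26 -> 13 -> 11 -> 5
```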

The image recognition processing units 13A to 13C need only perform image recognition processing, using trained deep learning, on recognition candidates that may be contained in the input image data; their specific configuration is not particularly limited. As shown in FIG. 1(B), for example, each of units 13A to 13C may include an individual recognition processing unit 131 and an individual recognition result determination unit 132.

The individual recognition processing unit 131 performs image recognition processing on the recognition candidates by deep learning and generates candidate recognition results; this processing also generates a confidence level for each candidate recognition result. The individual recognition result determination unit 132 determines the validity of each candidate recognition result from the generated candidate recognition result and its confidence, and outputs the candidate recognition result together with its validity to the recognition result integration processing unit 14 as the individual recognition result.

The specific configuration of the individual recognition processing unit 131 is not particularly limited and may be any of the deep-learning methods described above (e.g., a CNN); as stated, unit 131 generates each candidate recognition result together with its confidence. In deep learning, the confidence of a recognition result is usually expressed as a percentage, but it may also be expressed as one of several ranks. In this embodiment, the deep learning of the individual recognition processing unit 131 is assumed to have been trained on only one type of recognition target X (e.g., a cat).

For example, if the input image data contain three recognition candidates, confidences of 90%, 50%, and 20% may be generated for the first, second, and third candidates respectively (that is, each candidate is a "cat" with that confidence). Alternatively, the confidence may be generated as rank A, indicating high confidence, for the first candidate; rank B, indicating intermediate confidence, for the second; and rank C, indicating low confidence, for the third.

The specific configuration of the individual recognition result determination unit 132 is not particularly limited; as stated, it determines the validity of a candidate recognition result from the candidate recognition results and confidences generated by the individual recognition processing unit 131. The validity determination method is likewise not particularly limited. Examples include: (1) adopting the candidate recognition result with the highest confidence; (2) adopting the highest-confidence candidate recognition result only if its confidence is sufficiently high; and (3) adopting the highest-confidence candidate recognition result only if its confidence is sufficiently high and it exceeds the confidence of the next-highest candidate recognition result by a sufficient margin.

Continuing the example of three recognition candidates, with method (1) the individual recognition result determination unit 132 simply adopts the candidate recognition result of the first recognition candidate, which has the highest confidence, and outputs it as the individual recognition result.

Method (2) addresses the case where all of the candidate recognition results have low confidence; adopting the highest among them may then be inappropriate. In method (2), the individual recognition result determination unit 132 therefore judges, for example against a threshold, whether the confidence of a candidate recognition result is sufficiently high, and adopts it as the individual recognition result only if its confidence is at or above the threshold. For example, with percentage confidences and a 50% threshold, candidate recognition results with a confidence of 50% or more are adopted as individual recognition results. With rank confidences, candidate recognition results of rank B or higher are adopted.

Method (3) addresses the case where the highest and second-highest confidences among the candidate recognition results are very close; adopting the top result as-is may then be inappropriate. In method (3), the individual recognition result determination unit 132 therefore confirms, as in method (2), that the top candidate recognition result has sufficiently high confidence (for example, against a threshold), and adopts and outputs the top candidate recognition result as the individual recognition result only when its confidence is sufficiently far above the second-highest.
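The three validity-determination methods can be sketched as selection functions over (label, confidence) pairs, returning None when no candidate is adopted. The 0.5 threshold follows the 50% example in the text; the 0.2 margin for method (3) is a hypothetical value, since the text only requires a "sufficient" difference, and the function names are illustrative.

```python
from typing import List, Optional, Tuple

Candidate = Tuple[str, float]  # (label, confidence in [0, 1])


def select_method1(cands: List[Candidate]) -> Optional[Candidate]:
    """(1) Adopt the candidate with the highest confidence."""
    return max(cands, key=lambda c: c[1]) if cands else None


def select_method2(cands: List[Candidate], threshold: float = 0.5) -> Optional[Candidate]:
    """(2) Adopt the top candidate only if its confidence reaches the threshold."""
    best = select_method1(cands)
    return best if best is not None and best[1] >= threshold else None


def select_method3(cands: List[Candidate], threshold: float = 0.5,
                   margin: float = 0.2) -> Optional[Candidate]:
    """(3) As (2), and the top confidence must beat the runner-up by `margin` (hypothetical)."""
    ranked = sorted(cands, key=lambda c: c[1], reverse=True)
    if not ranked or ranked[0][1] < threshold:
        return None
    if len(ranked) > 1 and ranked[0][1] - ranked[1][1] < margin:
        return None
    return ranked[0]


cands = [("cat", 0.9), ("cat", 0.5), ("cat", 0.2)]   # the 90/50/20% example
print(select_method1(cands), select_method2(cands), select_method3(cands))
```

In the 90/50/20% example all three methods adopt the first candidate; the methods only diverge when confidences are low or the top two are close.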

With methods (2) and (3), a candidate recognition result is not adopted as an individual recognition result if its confidence is below the threshold or if the top two confidences are too close. The individual recognition result determination unit 132 may therefore set a flag indicating the validity of the candidate recognition result. One such flag is a valid flag that is set when the confidence is 50% or more. The determination unit 132 first initializes the valid flag (valid flag: 0); if the individual recognition processing unit 131 generates a candidate recognition result with a confidence of 50% or more, the valid flag is set (valid flag: 1). If the generated candidate recognition result has a confidence below 50%, the valid flag is not set (valid flag: 0) and the candidate recognition result is invalid.

If the image recognition processing units 13A to 13C judge candidate recognition results valid or invalid in this way, they can output both valid and invalid candidate recognition results to the recognition result integration processing unit 14 as individual recognition results. The recognition result integration processing unit 14 integrates the individual recognition results output by units 13A to 13C, and in doing so can use the validity or invalidity contained in each individual recognition result.

In the present disclosure, the recognition result integration processing unit 14 integrates the individual recognition results as follows: (i) when the individual recognition results all match, it outputs that individual recognition result as the integrated recognition result for the recognition target; (ii) when they do not all match, it outputs the individual recognition result with the most matches as the integrated recognition result for the recognition target. For convenience, step (i) is called the "unanimity" step and step (ii) the "majority vote" step.

The specific integration method comprising (i) the unanimity step and (ii) the majority vote step is not particularly limited. In the present embodiment, for example, an integration method may be used that decides, on the basis of the validity contained in each individual recognition result, whether or not to cast a vote for that individual recognition result. The recognition result integration processing unit 14 then generates the integrated recognition result from the results of voting on the individual recognition results.

The voting-based integration method in the recognition result integration processing unit 14 is described concretely with reference to FIG. 2. The flowchart in FIG. 2 schematically breaks the voting-based integration method into steps; in this example it consists of nine steps in total, S11 to S19. The integration method is, of course, not limited to this example.

As shown in FIG. 1(A), the image recognition processing units 13A to 13C each output an individual recognition result to the recognition result integration processing unit 14. Hereinafter, the individual recognition result from the first image recognition processing unit 13A is called the "first individual recognition result", that from the second image recognition processing unit 13B the "second individual recognition result", and that from the third image recognition processing unit 13C the "third individual recognition result".

As indicated by the loop of steps S11 and S12 in FIG. 2, the recognition result integration processing unit 14 judges the validity of every individual recognition result output from the image recognition processing units 13A to 13C. For example, in step S11 it judges whether the first individual recognition result, which identifies the candidate as recognition target X (for example, a cat), is valid; if so, one vote is cast for recognition target X from the first individual recognition result. If it is not valid, no vote is cast for the first individual recognition result. Next, in step S12, it is determined whether the validity of all individual recognition results has been judged. If only the first individual recognition result has been judged so far (NO in step S12), the process returns to step S11 and the validity of the second individual recognition result is judged. The validity of the third individual recognition result is judged in the same way.
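The S11 to S12 loop can be sketched as a single pass that counts votes. This assumes, for illustration only, that each individual recognition result is represented as a `(label, valid)` pair; `tally_votes` is a hypothetical name.

```python
def tally_votes(results: list, target: str = "cat") -> tuple:
    """Steps S11-S12: judge each individual result once and count votes.

    Each result is a (label, valid) pair. In Embodiment 1 only target X is
    learned, so every result either votes for X or casts no vote.
    """
    votes = 0
    no_vote = 0
    for label, valid in results:  # loop until every result is judged (S12)
        if valid and label == target:
            votes += 1            # S11: valid -> one vote for target X
        else:
            no_vote += 1          # S11: not valid -> no vote
    return votes, no_vote
```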

Once the validity of all individual recognition results has been judged (YES in step S12), the number of votes obtained is evaluated. In the example of FIG. 2, step S13 determines whether any individual recognition result has obtained a vote, that is, whether any votes have been cast at all. If at least one individual recognition result has obtained a vote (YES in step S13), step S14 determines whether every individual recognition result has obtained a vote, that is, whether the individual recognition results are unanimous.

When the first to third individual recognition results are all valid, the recognition result for a given recognition candidate contained in the input image data (the candidate recognition result) obtains three votes for recognition target X, so this candidate recognition result is judged to be a valid individual recognition result by unanimity (YES in step S14). Accordingly, in step S15, an integrated recognition result is generated stating that the recognition candidate is, unanimously, recognition target X (for example, a cat).

When exactly one of the first to third individual recognition results is invalid, the results are not unanimous (NO in step S14), and the candidate recognition result obtains two votes for recognition target X. In step S16 it is then determined whether recognition target X has more votes than there are invalid votes, that is, whether X has the most votes when invalid votes are also counted. If recognition target X has more votes than the invalid votes, the candidate recognition result is judged to be a valid individual recognition result by majority vote (YES in step S16). Accordingly, in step S17, an integrated recognition result is generated stating that the recognition candidate is, by majority vote, recognition target X (for example, a cat).

Likewise, when two of the first to third individual recognition results are invalid and only one is valid, the results are not unanimous (NO in step S14), and the candidate recognition result obtains one vote. Judged by valid votes alone, this candidate recognition result could be regarded as a valid individual recognition result by majority vote, even with only one vote. However, if invalid votes are also counted as "votes" when deciding the majority, it is more appropriate to treat this candidate recognition result as "invalid". Therefore, when there are more invalid votes than votes for recognition target X (NO in step S16), it is judged that the image recognition unit 11A could not recognize the recognition candidate as recognition target X, and an integrated recognition result of "unknown" is generated in step S18.

When all of the first to third individual recognition results are invalid, the candidate recognition result obtains zero votes (NO in step S13), and the candidate recognition result is therefore judged to be invalid. In this case the image recognition unit 11A has failed to recognize the recognition candidate as recognition target X, so an integrated recognition result of "unknown" is generated in step S18.

The integrated recognition result generated in any of steps S15 to S18 is output from the recognition result integration processing unit 14 in step S19 (see FIG. 1(A)). This integrated recognition result is used by a recognition result using device, which is not shown in FIG. 1(A) and is described later.
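The S13 to S18 branches of FIG. 2 reduce to a few comparisons once the votes have been counted. A minimal sketch, assuming the vote count and the number of units are already known; the function name and return strings are hypothetical. For an even number of units, a tie between valid and invalid votes is not addressed by the text, and this sketch conservatively treats it as "unknown".

```python
def integrate_fig2(votes: int, total: int, target: str = "X") -> str:
    """Steps S13-S18 of FIG. 2 for a single recognition target."""
    invalid = total - votes
    if votes == 0:                          # S13: NO, no vote was obtained
        return "unknown"                    # S18
    if votes == total:                      # S14: YES, unanimous
        return f"{target} (unanimity)"      # S15
    if votes > invalid:                     # S16: YES, X beats the invalid votes
        return f"{target} (majority vote)"  # S17
    return "unknown"                        # S16: NO -> S18
```

With three units this yields exactly the four outcomes walked through above: 3 votes (unanimity), 2 votes (majority vote), 1 vote (unknown), 0 votes (unknown).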

As described above, the image recognition system or image recognition processing method according to the present disclosure includes a plurality of image recognition processing units arranged in parallel (for example, the three image recognition processing units 13A to 13C). Each unit performs image recognition using deep learning on the recognition target, and the deep learning models have each been trained on the recognition target in mutually different ways. When the individual recognition results obtained from these units are integrated, the individual recognition result that is either "unanimous" or has the most matches by "majority vote" is selected and output as the integrated recognition result.

In the vote-based example of FIG. 2 described above, a unanimous result indicates that no unexpected or intentionally induced misrecognition has occurred in the plurality of image recognition processing units. In the majority-vote case, when there are more valid votes than invalid votes, an unexpected or intentionally induced misrecognition may have occurred in the individual recognition results that were invalid; however, because the multiple valid individual recognition results agree, they can be judged to represent appropriate image recognition.

If, on the other hand, all of the individual recognition results are invalid, an unexpected or intentionally induced misrecognition may have occurred, or the candidate may not be a suitable recognition candidate in the first place. In this case, outputting an integrated recognition result of "unknown" avoids outputting an inappropriate image recognition result. Likewise, when invalid votes outnumber valid votes, there is, as in the all-invalid case, a possibility of unexpected or intentionally induced misrecognition or of an unsuitable recognition candidate, so an integrated recognition result of "unknown" may be output. However, if image recognition can be judged appropriate whenever even a single valid vote is obtained, the invalid votes may be ignored and the majority decided from the valid votes alone.
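The alternative in the last sentence, ignoring invalid votes so that a single valid vote can carry the decision, amounts to a policy switch. A hypothetical sketch for a single recognition target; the `count_invalid` parameter and the function name are illustrative and do not come from the source.

```python
def integrate_policy(votes: int, total: int, count_invalid: bool = True) -> str:
    """FIG. 2-style decision with a switch for whether invalid votes count."""
    if votes == 0:
        return "unknown"        # every individual result was invalid
    if votes == total:
        return "unanimity"      # all results valid and matching
    if count_invalid and votes <= total - votes:
        return "unknown"        # invalid votes are not outnumbered
    return "majority vote"      # valid votes decide the outcome
```

With `count_invalid=False`, one valid vote out of three is accepted as a majority-vote result, whereas the default policy reports "unknown" for the same input.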

Therefore, in the image recognition system or image recognition processing method according to the present disclosure, even if one of the image recognition processing units produces an unexpected misrecognition of the recognition target, or the image of the recognition target was intentionally crafted to cause misrecognition, the unanimity or majority vote process can substantially eliminate the misrecognition. Consequently, in image recognition using deep learning, both unexpected misrecognition and misrecognition of intentionally crafted images can be effectively suppressed.

The image recognition system 10A according to Embodiment 1 includes a total of three image recognition processing units in parallel (image recognition processing units 13A to 13C), but it may instead include two image recognition processing units, or four or more, in parallel.

The input image distribution unit 12, the image recognition processing units 13A to 13C, and the recognition result integration processing unit 14 constituting the image recognition system 10A according to Embodiment 1 may be any functional configuration implemented by processors (computing elements or computing devices) known in the field of deep learning, such as a GPU (Graphics Processing Unit), an FPGA (Field-Programmable Gate Array), an ASIC (Application Specific Integrated Circuit), or a CPU (Central Processing Unit). Alternatively, at least the image recognition processing units 13A to 13C that implement deep learning, or the individual recognition processing unit 131 included in them, may be implemented by a processor such as a GPU. Part of the configuration of the image recognition system 10A may also be implemented as logic circuits using known switching elements, subtractors, comparators, and the like, or as an independent device.

(Embodiment 2)
In Embodiment 1, the deep learning in the image recognition processing units 13A to 13C was trained on only one type of recognition target X (for example, a cat). In the present disclosure, however, the deep learning in the image recognition processing units 13A to 13C may be trained on multiple types of recognition targets. Embodiment 2 is described below with reference to FIG. 3, assuming that the deep learning has been trained on recognition target X (for example, a cat) and recognition target Y (for example, a dog).

The configuration of the image recognition system 10A according to Embodiment 2 is as described in Embodiment 1, except that, as noted above, the deep learning has been trained on multiple types of recognition targets, X (cat) and Y (dog). Like FIG. 2, the flowchart in FIG. 3 schematically breaks the voting-based integration method in the recognition result integration processing unit 14 into steps; in this example it consists of nine steps in total, S21 to S29. The integration method is, of course, not limited to this example.

As indicated by the loop of steps S21 and S22 in FIG. 3, the recognition result integration processing unit 14 judges the validity of every individual recognition result output from the image recognition processing units 13A to 13C. This loop is similar to the loop of steps S11 and S12 in FIG. 2. For example, it judges whether the first individual recognition result is valid as recognition target X (cat) and whether it is valid as recognition target Y (dog). If it is valid as recognition target X, one vote is cast for recognition target X from the first individual recognition result; if it is valid as recognition target Y, one vote is cast for recognition target Y. If it is not valid, no vote is cast for the first individual recognition result. The validity of the second and third individual recognition results is then judged in the same way.
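With several recognition targets, the S21 to S22 loop keeps one vote count per target. A sketch under the same illustrative assumption as before, namely that each individual recognition result is a `(label, valid)` pair; the function name and default targets are hypothetical.

```python
from collections import Counter


def tally_multiclass(results: list, targets=("cat", "dog")) -> tuple:
    """Steps S21-S22: each valid result casts one vote for the target it names."""
    votes = Counter({t: 0 for t in targets})
    no_vote = 0
    for label, valid in results:
        if valid and label in votes:
            votes[label] += 1   # one vote for X (cat) or Y (dog)
        else:
            no_vote += 1        # invalid result: no vote
    return votes, no_vote
```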

Once the validity of all individual recognition results has been judged (YES in step S22), the number of votes obtained is evaluated. In the example of FIG. 3, step S23 determines whether any individual recognition result has obtained a vote. If so (YES in step S23), step S24 determines whether all of the individual recognition results have obtained votes for recognition target X, or all have obtained votes for recognition target Y, that is, whether the individual recognition results are unanimous.

When the first to third individual recognition results are all valid as, for example, recognition target X (cat), the recognition result for a given recognition candidate contained in the input image data (the candidate recognition result) obtains three votes for X. This candidate recognition result is therefore judged to be a valid individual recognition result for X by unanimity (YES in step S24), and in step S25 an integrated recognition result is generated stating that the recognition candidate is, unanimously, recognition target X (cat). The same applies to recognition target Y (dog).

When, on the other hand, the first to third individual recognition results are not unanimous (NO in step S24), they may include results valid as recognition target X and results valid as recognition target Y, and may also include invalid results that obtained no vote. Therefore, in step S26, it is determined whether recognition target X or Y has obtained the most votes among the first to third individual recognition results.

For example, in the following cases the candidate recognition result is judged, by majority vote, to be a valid individual recognition result for recognition target X or for recognition target Y (YES in step S26): two votes for X and one vote for Y; two votes for X with the remaining result invalid; two votes for Y and one vote for X; or two votes for Y with the remaining result invalid. Accordingly, in step S27, an integrated recognition result is generated stating that the recognition candidate is, by majority vote, recognition target X (cat) or recognition target Y (dog).

If all of the first to third individual recognition results are invalid, neither recognition target X nor Y has obtained a single vote (zero votes) (NO in step S23). This candidate recognition result is therefore judged to be invalid, and an integrated recognition result of "unknown" is generated in step S28.

Alternatively, when one of the first to third individual recognition results obtains one vote for recognition target X, another obtains one vote for recognition target Y, and the remaining one is invalid, neither X nor Y obtains the most votes (NO in step S26). Similarly, when only one vote is obtained for recognition target X, or only one vote for recognition target Y, and the remaining results are invalid, the invalid votes outnumber the valid vote, so neither X nor Y obtains the most votes (NO in step S26). In these cases it cannot be determined whether the candidate recognition result is X or Y, so an integrated recognition result of "unknown" is generated in step S28.

Once an integrated recognition result has been generated in any of steps S25, S27, and S28 as described above, it is output from the recognition result integration processing unit 14 to the recognition result using device in step S29.
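The S23 to S28 branches of FIG. 3 can be sketched as follows, taking per-target vote counts and the number of results that cast no vote. This is an illustrative sketch, not the patented implementation: in S26 the invalid votes are treated as a competing "candidate", so a target must strictly outpoll every other target and the invalid votes. How a tie between the leading target and the invalid votes should be handled is not stated in the text; this sketch assumes "unknown".

```python
def integrate_fig3(votes: dict, no_vote: int) -> str:
    """Steps S23-S28 of FIG. 3 for several recognition targets."""
    cast = sum(votes.values())
    total = cast + no_vote
    if cast == 0:                              # S23: NO, no vote for X or Y
        return "unknown"                       # S28
    for label, n in votes.items():
        if n == total:                         # S24: YES, unanimous
            return f"{label} (unanimity)"      # S25
    best = max(votes.values())
    leaders = [label for label, n in votes.items() if n == best]
    # S26: one target must outpoll every other target and the invalid votes
    if len(leaders) == 1 and best > no_vote:
        return f"{leaders[0]} (majority vote)"  # S27
    return "unknown"                           # S26: NO -> S28
```

Feeding it the cases listed above reproduces the text: `{"cat": 2, "dog": 1}` gives a majority for X, while `{"cat": 1, "dog": 1}` with one invalid result, or `{"cat": 1, "dog": 0}` with two, gives "unknown".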

Thus, the image recognition system or image recognition processing method according to the present disclosure may include, in parallel, a plurality of image recognition processing units that use deep learning trained on a plurality of recognition targets. In this case as well, when the plurality of individual recognition results are integrated, the individual recognition result that is either "unanimous" or has the most matches by "majority vote" is selected and output as the integrated recognition result.

As a result, even if one of the image recognition processing units produces an unexpected misrecognition of the recognition target, or the image of the recognition target was intentionally crafted to cause misrecognition, the unanimity or majority vote process can substantially eliminate the misrecognition. Therefore, in image recognition using deep learning, both unexpected misrecognition and misrecognition of intentionally crafted images can be effectively suppressed.

(Embodiment 3)
In Embodiment 1 or 2, the image recognition system 10A consisted only of the image recognition unit 11A, which includes the input image distribution unit 12, the image recognition processing units 13A to 13C, and the recognition result integration processing unit 14; the present disclosure, however, is not limited to this. For example, as shown in FIG. 4, the system may be an image recognition system 10B comprising an image recognition device 11B, an image supply device 15, and a recognition result using device 16A. Alternatively, as shown in FIG. 5, it may be an image recognition system 10C comprising an image recognition/result using device 11C and an image supply device 15.

The image recognition device 11B has the same configuration as the image recognition unit 11A described in Embodiment 1 or 2. The image supply device 15 may be any device that supplies input image data to the image recognition device 11B or to the image recognition/result using device 11C. Specifically, it may be, for example, a camera (imaging device) that captures still images or moving images (video), or an information terminal device (a personal computer, smartphone, tablet, or the like) that stores previously captured still-image or video data so that it can be supplied.

The recognition result using device 16A may be any device that executes predetermined processing using the integrated recognition result output from the recognition result integration processing unit 14. Specific examples include actuators and similar devices used in the various applications of image recognition described below. For example, when the image recognition system 10B or 10C is applied to an autonomous vehicle, the recognition result using device may be a driving decision device; when the image recognition system 10B or 10C is applied to a face authentication system for access restriction, it may be a locking/unlocking device.

The image recognition/result using device 11C combines the image recognition device 11B and the recognition result using device 16A into one unit. It includes the same configuration as the image recognition unit 11A (the input image distribution unit 12, the image recognition processing units 13A to 13C, and the recognition result integration processing unit 14) together with a recognition result using unit 16B. The recognition result using unit 16B may be the same as the recognition result using device 16A.

Applications of the image recognition systems 10B and 10C according to Embodiment 3, or of the image recognition system 10A according to Embodiment 1 or 2, are not particularly limited. As in the examples given for the recognition result using device 16A, they may be any control system, authentication system, diagnostic system, or the like that uses image recognition.

More specific examples include: an image recognition system for recognizing the surroundings of an autonomously operated aircraft, such as other aircraft and equipment, during ground taxiing; an image recognition system for recognizing the surroundings of an autonomous vehicle, such as other vehicles and pedestrians; an image recognition system for recognizing the surroundings of an autonomous ship, such as other ships and port facilities; a cancer diagnosis system using X-ray images in the medical field; an automatic identification system for dangerous articles using X-ray images in baggage inspection; and a face authentication system for restricting access to high-security areas.

In short, the image recognition system or image recognition processing method according to the present disclosure need only be configured as follows. A plurality of image recognition processing units, each performing image recognition with deep learning trained on the recognition target differently from the others, are arranged in parallel; input image data is distributed to these units; each unit performs image recognition processing to generate an individual recognition result for a recognition candidate contained in the input image data; and the individual recognition results are integrated. In this integration, when the plurality of individual recognition results all match, that result is output as the integrated recognition result for the recognition target, and when they do not all match, the individual recognition result with the most matches is output as the integrated recognition result for the recognition target.
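Putting the pieces together, the configuration summarized above might be wired as follows. This is a sketch under stated assumptions: each image recognition processing unit is modeled as a plain Python callable returning a `(label, valid)` pair, and `recognize`, `lock_controller`, and `"registered_face"` are hypothetical names standing in for the image recognition unit and for a locking/unlocking device such as the one mentioned for face authentication.

```python
from collections import Counter
from typing import Callable, Sequence, Tuple

# Hypothetical recognizer interface: image bytes -> (label, valid flag)
Recognizer = Callable[[bytes], Tuple[str, bool]]


def recognize(image: bytes, units: Sequence[Recognizer]) -> str:
    """Distribute one image to every unit, then integrate the results."""
    results = [unit(image) for unit in units]  # same input to all units
    votes = Counter(label for label, valid in results if valid)
    no_vote = len(results) - sum(votes.values())
    if not votes:
        return "unknown"                       # no valid result at all
    ranked = votes.most_common()
    label, best = ranked[0]
    if best == len(results):
        return label                           # all results match
    tied = len(ranked) > 1 and ranked[1][1] == best
    if not tied and best > no_vote:
        return label                           # most matches wins
    return "unknown"


def lock_controller(result: str, registered: str = "registered_face") -> bool:
    """Hypothetical recognition result using device: unlock only on a match."""
    return result == registered
```

Keeping the result using device as a separate function mirrors the split between systems 10B and 10C: the same integration step can feed a standalone device 16A or an integrated unit 16B.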

Accordingly, the specific configuration of the image recognition system according to the present disclosure is not particularly limited. It may be, for example, the image recognition system 10A shown in FIG. 1(A) described in Embodiment 1 or 2, the image recognition system 10B shown in FIG. 4 or the image recognition system 10C shown in FIG. 5 described in Embodiment 3, or a system having a configuration other than those described in Embodiments 1 to 3.

The present invention is not limited to the description of the above embodiments, and various modifications are possible within the scope of the claims. Embodiments obtained by appropriately combining technical means disclosed in different embodiments or in a plurality of modifications are also included in the technical scope of the present invention.

The present invention can be widely and suitably used in the field of image recognition using deep learning.

10A to 10C: Image recognition system
11A: Image recognition unit
11B: Image recognition device (image recognition unit)
11C: Image recognition/result using device (image recognition unit, recognition result using unit)
12: Input image distribution unit
13: Image recognition processing unit
13A: First image recognition processing unit
13B: Second image recognition processing unit
13C: Third image recognition processing unit
14: Recognition result integration processing unit
15: Image supply device
16A: Recognition result using device (recognition result using unit)
16B: Recognition result using unit
131: Individual recognition processing unit
132: Individual recognition result determination unit

Claims

1. An image recognition system comprising:
a plurality of image recognition processing units that perform image recognition processing on input image data using trained deep learning and generate individual recognition results for recognition candidates included in the input image data, the deep learning differing from one unit to another; and
a recognition result integration processing unit that integrates the individual recognition results output from the plurality of image recognition processing units, outputs the individual recognition result as the integrated recognition result when all of the individual recognition results match, and outputs the individual recognition result with the greatest number of matches as the integrated recognition result when they do not all match;
wherein each image recognition processing unit, in the image recognition processing of a recognition candidate, generates a recognition result for the recognition candidate together with its certainty factor, and determines the validity of the recognition result based on the recognition result and the certainty factor, judging the certainty to be high when it is equal to or greater than a predetermined threshold, by any one of the following methods:
a method of adopting a first recognition result, which is the recognition result of the recognition candidate with the highest certainty;
a method of adopting a second recognition result selected from the recognition candidates whose certainty is equal to or greater than the predetermined threshold; or
a method of determining, as the final recognition result, a recognition candidate selected according to the magnitude of the difference in certainty between the recognition results in the recognition candidate group consisting of the first recognition result and the second recognition result;
and outputs the validity to the recognition result integration processing unit as the individual recognition result.
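Claim 1 does not spell out the exact decision rules, so the following is only a minimal sketch of one possible reading of the per-unit validity check. It takes the highest-certainty candidate as the first recognition result, marks it invalid when its certainty is below the threshold, and, when several candidates reach the threshold (the second-result group), accepts the leader only if it beats the runner-up by a margin. The names `decide_individual` and `IndividualResult`, and the values `threshold=0.6` and `margin=0.2`, are hypothetical and not taken from the patent.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class IndividualResult:
    label: str        # first recognition result (highest-certainty candidate)
    certainty: float  # its certainty factor
    valid: bool       # validity flag sent to the integration unit


def decide_individual(scores: Dict[str, float],
                      threshold: float = 0.6,  # hypothetical threshold
                      margin: float = 0.2      # hypothetical certainty margin
                      ) -> IndividualResult:
    """One possible reading of the per-unit validity decision (an assumption).

    `scores` maps each recognition candidate to its certainty; it must be non-empty.
    """
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    first_label, first_score = ranked[0]
    # Certainty below the threshold: keep the label but mark it invalid.
    if first_score < threshold:
        return IndividualResult(first_label, first_score, False)
    # Candidates at or above the threshold (the "second recognition result" group).
    # The list is sorted by certainty, so group[0] is the first recognition result.
    group = [(label, s) for label, s in ranked if s >= threshold]
    # Several strong candidates: accept the leader only if it clearly wins.
    if len(group) >= 2 and group[0][1] - group[1][1] < margin:
        return IndividualResult(first_label, first_score, False)
    return IndividualResult(first_label, first_score, True)
```

For example, `{"cat": 0.9, "dog": 0.05, "fox": 0.05}` yields a valid "cat", whereas `{"cat": 0.9, "dog": 0.8}` (for instance, from independent sigmoid outputs) is marked invalid because the two strong candidates are too close.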

2. The image recognition system according to claim 1, wherein, when integrating the plurality of individual recognition results, the recognition result integration processing unit determines, based on at least the validity, whether each individual recognition result casts a vote, and generates the integrated recognition result based on the voting result.

3. The image recognition system according to claim 2, wherein the recognition result integration processing unit outputs an "unknown" integrated recognition result when none of the plurality of individual recognition results is valid, or when the number of invalid individual recognition results is greater than the number of valid individual recognition results.
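Claims 1 to 3 together describe a majority vote filtered by validity. The following Python sketch assumes that each unit reports a `(label, valid)` pair. The names `integrate` and `UNKNOWN` are hypothetical, and ties between equally common labels are broken by first occurrence, a detail the claims leave open.

```python
from collections import Counter
from typing import List, Optional, Tuple

UNKNOWN = "unknown"  # hypothetical marker for an "unknown" integrated result


def integrate(results: List[Tuple[Optional[str], bool]]) -> str:
    """Integrate (label, valid) pairs from the image recognition processing units."""
    # Only valid individual results are allowed to vote.
    valid_labels = [label for label, valid in results if valid and label is not None]
    n_invalid = len(results) - len(valid_labels)
    # No valid result, or more invalid than valid results: output "unknown".
    if not valid_labels or n_invalid > len(valid_labels):
        return UNKNOWN
    # Unanimous valid votes, or otherwise the label with the most matches, wins.
    # Counter.most_common keeps first-seen order among equal counts.
    return Counter(valid_labels).most_common(1)[0][0]
```

With three units reporting `("cat", True)`, `("dog", True)` and `("cat", True)`, the integrated result is "cat". If only one of the three results is valid, the invalid results outnumber it and the function returns "unknown".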

4. The image recognition system according to any one of claims 1 to 3, wherein the plurality of image recognition processing units use any one of: deep learning trained with different teacher images; deep learning trained from different initial values; or deep learning that differs in at least one of the number of intermediate layers and the number of cells.

5. The image recognition system according to any one of claims 1 to 4, further comprising:
an input image distribution unit that distributes and supplies the input image data to the plurality of image recognition processing units; and
at least one of an image supply unit that supplies the input image data to the input image distribution unit and a recognition result usage unit that executes predetermined processing using the integrated recognition result output from the recognition result integration processing unit.

6. An image recognition processing method in which a plurality of image recognition processing units that perform image recognition processing on a recognition target, each trained by different deep learning, are arranged in parallel, the method comprising:
generating, by each of the image recognition processing units in the image recognition processing of the recognition target, a recognition result for each recognition candidate together with its certainty factor, and determining the validity of the recognition result based on the recognition result and the certainty factor, judging the certainty to be high when it is equal to or greater than a predetermined threshold, by any one of the following methods:
a method of adopting a first recognition result, which is the recognition result of the recognition candidate with the highest certainty;
a method of adopting a second recognition result selected from the recognition candidates whose certainty is equal to or greater than the predetermined threshold; or
a method of determining, as the final recognition result, a recognition candidate selected according to the magnitude of the difference in certainty between the recognition results in the recognition candidate group consisting of the first recognition result and the second recognition result;
outputting the validity to a recognition result integration processing unit as the individual recognition result, thereby generating a plurality of individual recognition results for the recognition candidates included in the input image data;
when all of the plurality of individual recognition results match, outputting that individual recognition result as the integrated recognition result for the recognition target; and
when the plurality of individual recognition results do not all match, outputting the individual recognition result with the greatest number of matches as the integrated recognition result for the recognition target.
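Claim 6 arranges the units in parallel and applies the all-match and most-matches rule. The sketch below runs hypothetical recognizer callables concurrently with Python's standard `concurrent.futures.ThreadPoolExecutor`. The stand-in lambdas replace trained networks and are assumptions for illustration only, and validity filtering is omitted to keep the parallel flow visible.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, List, Sequence

# Hypothetical stand-in for one image recognition processing unit:
# it takes input image data and returns a recognized label.
Recognizer = Callable[[Any], str]


def recognize_in_parallel(image: Any, units: Sequence[Recognizer]) -> str:
    if not units:
        raise ValueError("at least one recognition unit is required")
    # Distribute the same input image to every unit and run them concurrently;
    # pool.map returns results in the same order as `units`.
    with ThreadPoolExecutor(max_workers=len(units)) as pool:
        individual: List[str] = list(pool.map(lambda unit: unit(image), units))
    counts = Counter(individual)
    if len(counts) == 1:
        # All individual recognition results match.
        return individual[0]
    # Otherwise the label with the greatest number of matches wins
    # (ties fall to the label seen first, an assumption).
    return counts.most_common(1)[0][0]


if __name__ == "__main__":
    stand_ins = [lambda img: "cat", lambda img: "cat", lambda img: "dog"]
    print(recognize_in_parallel("frame-0", stand_ins))
```

Threads are used here only because the stand-ins are trivial. Real units running heavy models might instead be dispatched to separate processes or devices.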