JP7046239B2

JP7046239B2 - Methods and systems for generating neural networks for object recognition in images

Info

Publication number: JP7046239B2
Application number: JP2021004662A
Authority: JP
Inventors: アールマニカンダン; センシバシッシュ
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2020-01-24
Filing date: 2021-01-15
Publication date: 2022-04-01
Anticipated expiration: 2041-01-15
Also published as: JP2021117993A

Description

The present disclosure relates to the field of neural networks. In particular, but not exclusively, the present disclosure relates to methods and systems for generating neural networks for object recognition in images.

In recent years, the need for real-time video analytics has increased significantly. Video analytics, or video content analysis, involves recognizing one or more objects in a video, determining in real time a particular movement or action performed by those objects, and providing insights or alerts to users based on the determined movement or action. For example, video analytics is used for visual surveillance of settings ranging from crowded places to factories, to monitor the movements and activities of one or more objects. A key aspect of video analytics is object recognition in images captured by video equipment. Object recognition involves detecting and identifying one or more objects in an image, ranging from people to animals and from cars to safety equipment. Object recognition performed using neural networks achieves higher accuracy than traditional image processing techniques. Existing techniques perform object recognition using a class of deep neural networks, namely convolutional neural networks. A convolutional neural network is trained by providing it with multiple images corresponding to the one or more objects to be recognized and comparing its output with the class labels associated with those images. One or more parameters of the network are then modified using supervised learning techniques. The trained convolutional neural network is used to recognize one or more objects in video images.

A problem with existing techniques is that a convolutional neural network must be trained with a large number of images for every object it is to recognize. The time required to train the network therefore increases significantly with the number of objects to be recognized and the number of images in the training dataset. Moreover, a large number of images may not be available for each object.

Another problem with existing techniques is that training a convolutional neural network with fewer images leads to overfitting, which reduces the accuracy of object recognition.

The information disclosed in this background section is provided solely to enhance understanding of the general background of the invention. It should not be taken as an acknowledgement, or as any form of suggestion, that this information forms prior art already known to a person skilled in the art.

Additional features and advantages are realized through the techniques of the present disclosure. Other embodiments and aspects of the disclosure are described in detail herein and are considered part of the claimed disclosure.

Disclosed herein is a method of generating a neural network for object recognition in images. The method comprises receiving information about a hierarchical relationship between one or more objects. Further, the method comprises providing, based on the information, one or more images of a training dataset corresponding to each of the one or more objects as input to a base neural network, the base neural network being associated with one or more parameters. Further, the method comprises determining, for each input image, a loss value of the base neural network based on the output of the base neural network and a class label corresponding to the input image. The output indicates similarity values corresponding to the one or more objects. Finally, the method comprises updating the one or more parameters for each input image to generate the neural network. The update is based on at least one of the output, the loss value of the base neural network, and a second user input. The generated neural network is used for object recognition.

Further, the present disclosure discloses a training system comprising a processor and a memory communicatively coupled to the processor. The memory stores processor instructions which, on execution, cause the processor to receive information about a hierarchical relationship between one or more objects. Further, the processor is configured to provide, based on the information, one or more images of a training dataset corresponding to each of the one or more objects as input to a base neural network, the base neural network being associated with one or more parameters. Further, the processor is configured to determine, for each input image, a loss value of the base neural network based on the output of the base neural network and a class label corresponding to the input image. The output indicates similarity values corresponding to the one or more objects. Finally, the processor is configured to update the one or more parameters for each input image to generate the neural network. The update is based on at least one of the output, the loss value of the base neural network, and a second user input. The generated neural network is used for object recognition.

The foregoing summary is illustrative only and is not intended to be limiting. In addition to the aspects, embodiments, and features described above, further aspects, embodiments, and features will become apparent by reference to the drawings and the following detailed description.

The novel features and characteristics of the disclosure are set forth in the appended claims. The disclosure itself, however, will best be understood by reference to the following detailed description of illustrative embodiments when read in conjunction with the accompanying drawings. This also applies to its preferred mode of use and to its further objectives and advantages. The accompanying drawings are incorporated in and constitute a part of this disclosure. They illustrate exemplary embodiments and, together with the description, serve to explain the disclosed principles. In the figures, the leftmost digit(s) of a reference number identifies the figure in which the reference number first appears. One or more embodiments are now described, by way of example only, with reference to the accompanying figures, in which like reference numerals represent like elements.

An exemplary environment for generating a neural network for object recognition in images, according to some embodiments of the present disclosure.
A detailed block diagram of a training system, according to some embodiments of the present disclosure.
Exemplary information received from a user, according to some embodiments of the present disclosure.
A flowchart of a method for generating a neural network for object recognition in images, according to some embodiments of the present disclosure.
An exemplary training dataset, according to some embodiments of the present disclosure.
Exemplary information represented using a tree structure, according to some embodiments of the present disclosure.
One or more exemplary sorted images, according to some embodiments of the present disclosure.
An exemplary block diagram of a base neural network, according to some embodiments of the present disclosure.
An exemplary convolution of a grayscale input image with one or more kernels, according to some embodiments of the present disclosure.
An exemplary convolution of a color input image with one or more kernels, according to some embodiments of the present disclosure.
An exemplary convolution layer of the base neural network, according to some embodiments of the present disclosure.
An exemplary operation of a rectified linear unit (ReLU) layer of the base neural network, according to some embodiments of the present disclosure.
An exemplary operation of a pooling layer of the base neural network, according to some embodiments of the present disclosure.
An exemplary fully connected layer of the base neural network, according to some embodiments of the present disclosure.
Exemplary first and second layers of the base neural network having one or more connections, according to some embodiments of the present disclosure.
An exemplary kernel activation list of the base neural network, according to some embodiments of the present disclosure.
Exemplary kernels of the base neural network, each having a plurality of values, according to some embodiments of the present disclosure.
An exemplary modification of the values of one or more kernels based on the output of the base neural network, according to some embodiments of the present disclosure.
An exemplary base neural network having one or more layers, according to some embodiments of the present disclosure.
An exemplary addition of one or more kernels to the base neural network, according to some embodiments of the present disclosure.
An exemplary modification of one or more connections of the base neural network, according to some embodiments of the present disclosure.
An exemplary addition of one or more objects to the kernel activation list of the base neural network, according to some embodiments of the present disclosure.
An exemplary addition of one or more objects to the kernel activation list of the base neural network, according to some embodiments of the present disclosure.
An exemplary modification of the indices of one or more kernels in the kernel activation list of the base neural network, according to some embodiments of the present disclosure.
An exemplary modification of the received information based on a first user input, according to some embodiments of the present disclosure.
An exemplary generated neural network and kernel activation list, according to some embodiments of the present disclosure.
An exemplary test dataset, according to some embodiments of the present disclosure.
An exemplary output of the first layer of the generated neural network for a test image, according to some embodiments of the present disclosure.
An exemplary propagation of a test image through one or more identified kernels, according to some embodiments of the present disclosure.
A general-purpose computer system for generating a neural network for object recognition in images, according to some embodiments of the present disclosure.
Various matrices, according to some embodiments of the present disclosure.
Various mathematical formulas, according to some embodiments of the present disclosure.
Various mathematical formulas, according to some embodiments of the present disclosure.

It should be appreciated by those skilled in the art that any block diagrams herein represent conceptual views of illustrative systems embodying the principles of the present subject matter. Similarly, any flowcharts, flow diagrams, state transition diagrams, pseudocode, and the like represent various processes. These processes may be substantially represented in a computer-readable medium and executed by a computer or processor, whether or not such a computer or processor is explicitly shown.

The word "exemplary" is used herein to mean "serving as an example, instance, or illustration." Any embodiment or implementation of the present subject matter described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.

While the disclosure is susceptible to various modifications and alternative forms, specific embodiments thereof have been shown by way of example in the drawings and are described in detail below. It should be understood, however, that this is not intended to limit the disclosure to the particular forms disclosed. On the contrary, the disclosure covers all modifications, equivalents, and alternatives falling within its scope.

The terms "comprises", "comprising", "includes", "including", and any other variations thereof are intended to cover a non-exclusive inclusion. A setup, device, or method that comprises a list of components or steps therefore does not include only those components or steps. It may also include other components or steps that are not expressly listed or that are inherent to such a setup, device, or method. In other words, one or more elements of a system or apparatus preceded by "comprises... a" or "includes... a" does not, without further constraints, preclude the existence of other or additional elements in the system or apparatus.

The present disclosure describes a method of generating a neural network for object recognition in images. The method comprises receiving information about a hierarchical relationship between one or more objects. Further, the method comprises providing, based on the information, one or more images of a training dataset corresponding to each of the one or more objects as input to a base neural network, the base neural network being associated with one or more parameters. Further, the method comprises determining, for each input image, a loss value of the base neural network based on the output of the base neural network and the class label corresponding to the input image. The output indicates similarity values corresponding to the one or more objects. Finally, the method comprises updating the one or more parameters for each input image to generate the neural network. The update is based on at least one of the output, the loss value of the base neural network, and a second user input. The generated neural network is used for object recognition.

The following detailed description of embodiments of the disclosure refers to the accompanying drawings, which form a part hereof. These drawings show, by way of illustration, specific embodiments in which the disclosure may be practiced. The embodiments are described in sufficient detail to enable those skilled in the art to practice the disclosure. It is to be understood that other embodiments may be utilized and that changes may be made without departing from the scope of the disclosure. The following description is therefore not to be taken in a limiting sense.

FIG. 1 shows an exemplary environment for generating a neural network for object recognition in images, according to some embodiments of the present disclosure.

In one embodiment, a user (101) interacts with a training system (102) via a user interface (103) to generate a neural network for object recognition in images. The training system (102) may be implemented on a server (not shown in the figure). The training system (102) is connected to the user interface (103) through either a wired or a wireless interface. The user interface (103) includes at least one of a display unit, a touch screen, a keypad, a microphone, a speaker, and the like. The user (101) may use the user interface (103) to provide inputs to the training system (102) and to receive an output (503) from it.

The training system (102) trains a base neural network upon receiving, from the user (101), information about the hierarchical relationship between one or more objects. The information includes the arrangement of the objects in a hierarchy, for example a tree structure, that is, whether a first object is "above", "below", or "at the same level as" a second object. The user (101) may modify the received information using a first user input via the user interface (103). The base neural network is a convolutional neural network associated with one or more parameters, which the user (101) may configure via the user interface (103). The parameters include, for example, data about one or more layers, the kernels in each layer, the values of those kernels, and the connections between kernels in different layers.

The training system (102) trains the base neural network by providing, as input, one or more images corresponding to each of the one or more objects. The images are stored in a training dataset (104), which also stores the class label associated with each image. A class label corresponds to the name of an object. For example, a first image stored in the training dataset (104) may be an image of a car with the class label "sedan", indicating the car's body type. The training dataset (104) may be implemented as a database connected to the training system (102), as shown in FIG. 1, or it may be stored in the training system (102).
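The "above / below / same level" arrangement can be illustrated by storing the hierarchy as a child-to-parent map and comparing node depths. This is a minimal sketch. It assumes a depth-based reading of "above" and "below". The `PARENT` map, `depth`, and `relation` names are hypothetical, and the tree reuses the vehicle example given later in this description.

```python
# Hypothetical child -> parent map for the vehicle hierarchy (the root's parent is None).
PARENT = {
    "automobile": None,
    "vehicle with two wheels": "automobile",
    "vehicle with four wheels": "automobile",
    "vehicle with more than four wheels": "automobile",
    "bike": "vehicle with two wheels",
    "scooter": "vehicle with two wheels",
    "sedan": "vehicle with four wheels",
    "hatchback": "vehicle with four wheels",
    "SUV": "vehicle with four wheels",
    "bus": "vehicle with more than four wheels",
    "truck": "vehicle with more than four wheels",
}


def depth(node: str) -> int:
    """Number of edges between `node` and the root."""
    d = 0
    while PARENT[node] is not None:
        node = PARENT[node]
        d += 1
    return d


def relation(first: str, second: str) -> str:
    """Return whether `first` is 'above', 'below', or at the 'same level' as `second`.

    Assumption: 'above' means closer to the root (smaller depth).
    """
    d1, d2 = depth(first), depth(second)
    if d1 < d2:
        return "above"
    if d1 > d2:
        return "below"
    return "same level"


print(relation("vehicle with four wheels", "sedan"))  # above
print(relation("bus", "scooter"))                     # same level
```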

In one embodiment, the training system (102) processes each image provided as input to the base neural network. For each such image, it determines a loss value of the base neural network based on the network's output and the class label corresponding to that image. The loss value indicates the misclassification rate of the base neural network. The training system (102) then updates, for each input image, the one or more parameters associated with the base neural network. The update is based on at least one of the output, the loss value, and a second user input. The user (101) may use the second user input, via the user interface (103), to modify one or more parameters of the base neural network.
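The source does not specify how the parameters are updated from the output and the loss. One common choice, shown here purely as an assumed stand-in, is a gradient-descent step. The sketch assumes a single fully connected softmax layer with weights `W` and a cross-entropy loss. `forward`, `sgd_step`, and the learning rate are hypothetical names and values, not the disclosed method.

```python
import math


def softmax(z):
    """Numerically stable softmax: turns logits into values that sum to 1."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    return [e / s for e in exps]


def forward(W, x):
    """Logits z_k = sum_j W[k][j] * x[j], mapped to per-class similarity values by softmax."""
    return softmax([sum(w * xj for w, xj in zip(row, x)) for row in W])


def sgd_step(W, x, label, lr=0.5):
    """One gradient step on the cross-entropy loss; returns (new_W, loss_before_step)."""
    p = forward(W, x)
    loss = -math.log(p[label])
    # For softmax + cross-entropy, dL/dz_k = p_k - y_k, so dL/dW[k][j] = (p_k - y_k) * x_j.
    new_W = [
        [w - lr * (p[k] - (1.0 if k == label else 0.0)) * xj for w, xj in zip(row, x)]
        for k, row in enumerate(W)
    ]
    return new_W, loss


W0 = [[0.0, 0.0], [0.0, 0.0]]    # two classes, two input features
W1, loss0 = sgd_step(W0, [1.0, 2.0], label=1)
_, loss1 = sgd_step(W1, [1.0, 2.0], label=1)
print(loss0 > loss1)  # True: the update lowered the loss for this image
```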

In one embodiment, the base neural network is trained by providing one or more images for each object from the training dataset (104) as input and updating the one or more parameters until the loss value falls below a predetermined threshold. For example, the predetermined threshold may be 0.01. When the loss value falls below the threshold for each of the one or more objects, training of the base neural network ends. A neural network comprising the updated parameters of the base neural network is then generated and used for object recognition in images.
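The stopping rule can be sketched as a loop that keeps updating the parameters until every object's loss is below the predetermined threshold (0.01 in the example above). This sketch assumes that a caller-supplied `update` function runs one pass over an object's images and returns that object's loss. `train_until_threshold`, `update`, and `max_epochs` are hypothetical names. The epoch cap is an added safeguard that the source does not mention.

```python
from typing import Callable, Dict, List


def train_until_threshold(
    objects: List[str],
    update: Callable[[str], float],
    threshold: float = 0.01,
    max_epochs: int = 1000,
) -> Dict[str, float]:
    """Repeat update passes over all objects until each object's loss < threshold."""
    losses = {obj: float("inf") for obj in objects}
    for _ in range(max_epochs):
        # One pass over every object's images; each call returns that object's loss.
        losses = {obj: update(obj) for obj in objects}
        if all(loss < threshold for loss in losses.values()):
            break
    return losses


state = {"sedan": 1.0, "bus": 0.5}  # toy per-object losses, halved on each update


def toy_update(obj: str) -> float:
    state[obj] *= 0.5
    return state[obj]


print(train_until_threshold(list(state), toy_update))  # {'sedan': 0.0078125, 'bus': 0.00390625}
```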

In one embodiment, the user (101) provides the training system (102) with a test image stored in a test dataset (105), so that an object in the test image can be recognized. The test dataset (105) includes one or more images without class labels, for which the class label, or name, of the object must be recognized. The training system (102) provides the test image as input to the neural network. Based on the network's output, the class label of the test image is determined, or the object in the test image is recognized. The test dataset (105) may be implemented as a database connected to the training system (102), as shown in FIG. 1, or it may be stored in the training system (102).
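Recognition at test time can be sketched as picking the class label with the highest similarity value in the network's output. The source does not state how the output is mapped to a label, so the argmax rule is an assumption. `recognize` and the example scores are hypothetical.

```python
from typing import Sequence


def recognize(similarities: Sequence[float], labels: Sequence[str]) -> str:
    """Return the class label whose similarity value is highest (argmax)."""
    if len(similarities) != len(labels):
        raise ValueError("one similarity value per class label is required")
    best = max(range(len(labels)), key=lambda i: similarities[i])
    return labels[best]


print(recognize([0.05, 0.80, 0.15], ["bike", "sedan", "bus"]))  # sedan
```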

FIG. 2A shows a detailed block diagram of the training system (102), according to some embodiments of the present disclosure.

The training system (102) may include a central processing unit ("CPU" or "processor") (203) and a memory (202) that stores instructions executable by the processor (203). The processor (203) may include at least one data processor for executing program components that carry out user- or system-generated requests. The memory (202) may be communicatively coupled to the processor (203). The training system (102) further includes an input/output (I/O) interface (201). The I/O interface (201) may be coupled to the processor (203), and input and/or output signals may be communicated through it. In one embodiment, the training system (102) may receive input images, test images, the first user input, and the second user input through the I/O interface (201).

In some implementations, the training system (102) may include data (204) and modules (209). As an example, the data (204) and the modules (209) may be stored in the memory (202) configured in the training system (102). In one embodiment, the data (204) may include, for example, relationship data (205), parameter data (206), loss value data (207), and other data (208). The data (204) shown in FIG. 2A are described in detail below.

In one embodiment, the relationship data (205) includes information about the hierarchical relationship between one or more objects. The objects are determined from the class labels associated with the images stored in the training dataset (104). The relationships between the objects are also determined from the class labels, and the objects are arranged in a hierarchy (for example, a tree structure or a chart) based on those relationships. For example, suppose the images of the training dataset (104) include the objects "bike", "scooter", "sedan", "hatchback", "sport utility vehicle (SUV)", "bus", and "truck". As shown in FIG. 2B, the information (217) is arranged as a tree structure in which the nodes represent the objects and the edges represent the hierarchical relationships between them. As further shown in FIG. 2B, "bike" and "scooter" are classified as "vehicle with two wheels". "Sedan", "hatchback", and "sport utility vehicle (SUV)" are classified as "vehicle with four wheels". "Bus" and "truck" are classified as "vehicle with more than four wheels". In turn, "vehicle with two wheels", "vehicle with four wheels", and "vehicle with more than four wheels" are classified as "automobile".
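The tree in FIG. 2B can be sketched as a nested dictionary. A breadth-first walk then visits the hierarchy level by level, and a leaf listing recovers the class labels. The `HIERARCHY`, `breadth_first`, and `leaves` names are hypothetical. The traversal order is shown only as one plausible way to walk the information and is not taken from the source.

```python
from collections import deque

# Vehicle hierarchy from the description: internal nodes map to children, leaves to {}.
HIERARCHY = {
    "automobile": {
        "vehicle with two wheels": {"bike": {}, "scooter": {}},
        "vehicle with four wheels": {"sedan": {}, "hatchback": {}, "SUV": {}},
        "vehicle with more than four wheels": {"bus": {}, "truck": {}},
    }
}


def breadth_first(tree):
    """Yield (depth, node) pairs level by level, starting at the root."""
    queue = deque((0, name, children) for name, children in tree.items())
    while queue:
        d, name, children = queue.popleft()
        yield d, name
        queue.extend((d + 1, child, grandchildren) for child, grandchildren in children.items())


def leaves(tree):
    """Return the leaf objects (the class labels) in left-to-right order."""
    out = []
    for name, children in tree.items():
        out.extend(leaves(children) if children else [name])
    return out


for d, name in breadth_first(HIERARCHY):
    print("  " * d + name)
```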

In one embodiment, the parameter data (206) includes one or more parameters associated with the base neural network. The parameters include at least one of the following:
- the one or more kernels in each layer of the base neural network;
- one or more connections between the kernels of a first layer and the kernels of a second layer that follows it;
- a plurality of values in each kernel;
- a kernel activation list.

The kernel activation list includes at least one of the output of the first layer of the base neural network, the indices of one or more kernels, and class labels. For example, a base neural network with four layers may have seven kernels in the first layer, twelve in the second, fifteen in the third, and eighteen in the fourth. As another example, consider a first kernel of size 3×3. Its values are represented by the matrix shown in FIG. 9A (matrix 1).
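The parameter data can be sketched as a small container holding the items listed above. The field and class names (`kernels_per_layer`, `connections`, `KernelActivationEntry`, and so on) are hypothetical. The kernel counts (7, 12, 15, 18) come from the example above. The 3×3 kernel values are placeholders, because matrix 1 of FIG. 9A is not reproduced in this text.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class KernelActivationEntry:
    first_layer_output: List[float]  # output of the first layer for an image
    kernel_indices: List[int]        # indices of kernels associated with the object
    class_label: str


@dataclass
class BaseNetworkParameters:
    kernels_per_layer: List[int]
    # Kernel values keyed by (layer index, kernel index); each kernel is a 2-D list.
    kernel_values: Dict[Tuple[int, int], List[List[float]]] = field(default_factory=dict)
    # Connections from (layer index, kernel index) to kernel indices in the next layer.
    connections: Dict[Tuple[int, int], List[int]] = field(default_factory=dict)
    activation_list: List[KernelActivationEntry] = field(default_factory=list)

    def total_kernels(self) -> int:
        return sum(self.kernels_per_layer)


params = BaseNetworkParameters(kernels_per_layer=[7, 12, 15, 18])
# Placeholder 3x3 values for kernel 0 of the first layer (layer index 0).
params.kernel_values[(0, 0)] = [[0.0, 1.0, 0.0], [1.0, -4.0, 1.0], [0.0, 1.0, 0.0]]
# Kernel 0 of the first layer feeds kernels 0, 3, and 5 of the second layer.
params.connections[(0, 0)] = [0, 3, 5]
params.activation_list.append(KernelActivationEntry([0.2, 0.7], [0, 3, 5], "sedan"))
print(params.total_kernels())  # 52
```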

In one embodiment, the loss value data (207) comprises the loss value determined for the base neural network using a loss function. The loss function is computed from the predicted output of the base neural network and the class label associated with each input image. The loss value may indicate the misclassification rate of the base neural network, that is, how often it predicts the wrong class label for a given input image. A higher loss value indicates a higher misclassification rate, and a lower loss value indicates a lower one. For example, a loss value of 2.4 indicates a higher misclassification rate than a loss value of 0.1.
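The text does not say which loss function produces the example values 2.4 and 0.1. Suppose the loss is per-image cross-entropy, one of the options listed for the loss determination module (213). In that case the loss is the negative natural log of the probability the network gives the true class. A loss of about 0.1 then corresponds to roughly 0.905 on the true class, and a loss of about 2.4 to roughly 0.09. Below is a minimal Python sketch using made-up probability vectors.

```python
import math


def cross_entropy(probs, true_index):
    """Cross-entropy for one image: -ln of the probability given to the true class."""
    return -math.log(probs[true_index])


confident_right = cross_entropy([0.905, 0.05, 0.045], true_index=0)  # about 0.10
mostly_wrong = cross_entropy([0.09, 0.86, 0.05], true_index=0)       # about 2.41
```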

In one embodiment, the other data (208) may include:
- a predetermined threshold;
- the first user input and the second user input;
- values associated with the zero padding, stride, and output of both the base neural network and the generated neural network;
- the loss function, and the like.

In some embodiments, the data (204) may be stored in the memory (202) using various data structures. It may be organized with a data model such as a relational data model or a hierarchical data model. The other data (208) may hold data generated by the modules (209) while they perform the various functions of the training system (102), including temporary data and temporary files.

In some embodiments, the data (204) stored in the memory (202) may be processed by the modules (209) of the training system (102). The modules (209) may be stored in the memory (202). In one example, shown in FIG. 2A, the modules (209) are communicatively coupled to the processor (203) of the training system (102). In that example they reside outside the memory (202) and are implemented as hardware. As used herein, the term module (209) may refer to any of the following:
- an application-specific integrated circuit (ASIC);
- a field-programmable gate array (FPGA);
- an electronic circuit;
- a processor (203) (shared, dedicated, or group) together with memory (202) that executes one or more software or firmware programs;
- a combinational logic circuit;
- other suitable components that provide the described functionality.

In some other embodiments, the modules (209) may be implemented using at least one of an ASIC and an FPGA.

In one embodiment, the modules (209) may include a communication module (210), an information modification module (211), an input module (212), a loss determination module (213), an update module (214), a recognition module (215), and other modules (216). These modules (209) may be implemented as a single module or as a combination of separate modules (209).

In one embodiment, the communication module (210) receives the first user input and the second user input from the user (101) via the user interface (103). The first user input contains modifications to the information (217) about the hierarchical relationships between the objects. The second user input contains updates to one or more parameters of the base neural network. The communication module (210) connects to the user interface (103) through either a wired or a wireless interface. It also sends the user (101), via the user interface (103), at least one of the following: the output of the base neural network, the parameters of the base neural network, the generated neural network, the training data set (104), and the test data set (105).

In one embodiment, the information modification module (211) modifies the information (217) about the hierarchical relationships between the objects, based on the first user input received from the user (101) via the user interface (103). The first user input includes at least one of the following:
- adding a new object to the information (217), based on the training data set (104);
- swapping the positions of objects in the information (217), based on how many images each object has in the training data set (104);
- deleting objects from the information (217), based on criteria such as the loss value of the base neural network.

In one embodiment, the input module (212) retrieves images and their class labels from the training data set (104). It then sorts the images by class label. Those skilled in the art will recognize that any of several techniques can be used for this, including bubble sort, selection sort, merge sort, insertion sort, and quicksort. Finally, the input module (212) feeds the images of each object to the base neural network, in the order given by each object's position in the information (217). For example, the images of "motorcycle", the first object in the information (217) shown in FIG. 2B, may be fed to the base neural network first. The images of "scooter" follow, and so on.

In one embodiment, the loss determination module (213) determines the loss value of the base neural network for each input image. The loss value is computed with a loss function from two inputs: the output of the base neural network and the class label of the input image. The loss function may be, for example, mean squared error, cross-entropy, or hierarchical cross-entropy.

In one embodiment, the update module (214) updates one or more parameters of the base neural network. An update is based on at least one of the following: the loss value of the base neural network, the output of the base neural network for the input image, and the second user input received from the user (101) via the user interface (103). The update module (214) performs at least one of these operations:
- adding kernels to at least one layer of the base neural network;
- modifying one or more kernel values in one or more layers;
- modifying the connections between the kernels of a first layer and the kernels of the second layer that follows it;
- modifying the kernel activation list.

For example, suppose the third kernel of the first layer is connected to the second kernel of the second layer. That connection may be changed so that the third kernel of the first layer connects to the fourth kernel of the second layer instead. A new connection may also be added between the fifth kernel of the first layer and the first kernel of the second layer.
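The connection edits in this example can be pictured as a set of (first-layer kernel, second-layer kernel) pairs. The sketch below assumes that representation and uses 1-based kernel indices to match the text. The starting connections and the helper names `rewire` and `add_connection` are hypothetical.

```python
# Hypothetical starting connections: (layer-1 kernel, layer-2 kernel), 1-based.
connections = {(1, 1), (2, 3), (3, 2)}


def rewire(conns, src, old_dst, new_dst):
    """Return a copy in which the connection src->old_dst becomes src->new_dst."""
    if (src, old_dst) not in conns:
        raise KeyError(f"no connection {src}->{old_dst}")
    updated = set(conns)
    updated.discard((src, old_dst))
    updated.add((src, new_dst))
    return updated


def add_connection(conns, src, dst):
    """Return a copy with an extra connection src->dst."""
    return set(conns) | {(src, dst)}


# Third kernel of layer 1: move its link from kernel 2 to kernel 4 of layer 2.
connections = rewire(connections, src=3, old_dst=2, new_dst=4)
# New link from the fifth kernel of layer 1 to the first kernel of layer 2.
connections = add_connection(connections, src=5, dst=1)
```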

In one embodiment, the recognition module (215) uses the generated neural network to recognize objects in a test image. The user (101) may select a test image stored in the test data set (105) via the user interface (103). Recognition then proceeds as follows:
1. The recognition module (215) retrieves the test image from the test data set (105) and feeds it to the first layer of the neural network.
2. It compares the output of the first layer with the kernel activation list.
3. Based on that comparison, it identifies one or more kernels from the kernel activation list.
4. It propagates the test image through the identified kernels in the layers of the neural network.
5. It recognizes the object in the test image from the output of the neural network.
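The disclosure does not state how the first-layer output is compared with the kernel activation list. The sketch below makes two assumptions: each entry stores a flattened first-layer output, and the matching entry is the one whose stored output is nearest in Euclidean distance. The entries, their values, and `select_kernels` are made up for illustration. The step that propagates the image through the selected kernels is omitted.

```python
import math

# Hypothetical kernel activation list: stored first-layer output (flattened),
# the kernel indices to use, and the associated class label.
activation_list = [
    {"first_layer_output": [0.9, 0.1, 0.0], "kernels": [0, 2, 5], "label": "dog"},
    {"first_layer_output": [0.1, 0.8, 0.3], "kernels": [1, 3, 4], "label": "cat"},
]


def select_kernels(first_layer_output, entries):
    """Return the entry whose stored output is closest in Euclidean distance."""
    return min(entries, key=lambda e: math.dist(first_layer_output, e["first_layer_output"]))


entry = select_kernels([0.85, 0.15, 0.05], activation_list)
```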

In one embodiment, the other modules (216) add images received from the user (101) via the user interface (103) to the training data set (104) and the test data set (105). The other modules (216) also generate the information (217) based on at least one of the training data set (104) and the first user input.

FIG. 3 is a flowchart of a method of generating a neural network for object recognition in an image, according to some embodiments of the present disclosure.

The order in which the method (300) is described is not intended to be construed as a limitation. Any number of the described method blocks may be combined in any order to implement the method. Individual blocks may also be deleted from the method without departing from the spirit and scope of the subject matter described herein. The method may be implemented in any suitable hardware, software, firmware, or combination thereof.

At step 301, the training system (102) receives the information (217) about the hierarchical relationships between the objects. These relationships describe how the objects are classified into classes and subclasses. In one embodiment, the user (101) creates the information (217) manually. In another embodiment, the training system (102) may generate it automatically from the training data set (104). The information (217) may be represented as a tree structure, a chart, a table, or a similar form.

In one embodiment, shown in FIG. 4A, the training data set (104) stores images (401A, 401B, ... 401N) of each object together with their corresponding class labels (402A, 402B, ... 402N). Each image (401A, 401B, ... 401N) is a black-and-white image, a grayscale image, or a color image. Each class label (402A, 402B, ... 402N) is represented as text, a vector, or a similar form. For example, a class label stored as text may read "Beagle". The vector form of the class label for images of a "Beagle" is shown in FIG. 9A (matrix 2).
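FIG. 9A (matrix 2) is not reproduced here, so the exact vector form of the label is unknown. One-hot encoding is a common choice, and the sketch below assumes it. The class order follows the first level of FIG. 4B. `CLASSES` and `one_hot` are hypothetical names.

```python
CLASSES = ["Labrador", "Shiba Inu", "Beagle", "Pit Bull",
           "Russian Blue", "Siamese cat", "Persian cat"]


def one_hot(label, classes=CLASSES):
    """Return a vector with 1 at the label's position and 0 everywhere else."""
    vec = [0] * len(classes)
    vec[classes.index(label)] = 1
    return vec


one_hot("Beagle")  # [0, 0, 1, 0, 0, 0, 0]
```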

The images (401A, 401B, ... 401N) and class labels (402A, 402B, ... 402N) shown in FIG. 4A are for illustration only and should not be construed as limiting. Based on a first input from the user (101), the training system (102) determines the objects in the training data set (104). For each object, it also determines the total number of images (401A, 401B, ... 401N) in the training data set (104). The objects are found by identifying the unique class labels in the training data set (104). The images under each unique class label are then counted to give the total number of images for that object.
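Determining the objects and their image counts comes down to collecting the unique class labels and counting the images under each label. Below is a minimal sketch using Python's `collections.Counter`. The image identifiers are made up and stand in for the images (401A, 401B, ... 401N).

```python
from collections import Counter

# Hypothetical (image_id, class_label) pairs standing in for 401A..401N / 402A..402N.
dataset = [
    ("img_001", "Beagle"), ("img_002", "Labrador"), ("img_003", "Beagle"),
    ("img_004", "Persian cat"), ("img_005", "Labrador"), ("img_006", "Beagle"),
]

image_counts = Counter(label for _, label in dataset)  # images per object
objects = sorted(image_counts)                         # unique class labels
```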

In one embodiment, the training system (102) builds a hierarchical tree structure with one or more nodes and one or more edges. It bases this on at least one of the following: the total number of images (401A, 401B, ... 401N) for each object, the first user input, and the relationships between the objects. In the tree, nodes represent objects and edges represent the relationships between them. The training system (102) may place objects with similar relationships as nodes on the same level of the hierarchy, namely the first level (403). For example, as shown in FIG. 4B, "Labrador", "Shiba Inu", "Beagle", "Pit Bull", "Russian Blue", "Siamese cat", and "Persian cat" sit at the first level (403).

The first-level (403) objects are then grouped, through edges, into higher categories on a second level (404) placed above them. For example, as shown in FIG. 4B, "dog" and "cat" sit at the second level (404). The edges mark each first-level (403) object as a subcategory of the corresponding second-level (404) object. The second-level (404) objects are in turn grouped into a third level (405), and so on. A hierarchical tree structure may have any number of levels; the three levels in FIG. 4B are for illustration only and should not be construed as limiting. The information (217) consists of the hierarchical tree structure produced by arranging the objects as shown in FIG. 4B.
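Suppose the tree is stored as a child-to-parent mapping. The levels of FIG. 4B can then be recovered by starting from the leaves and moving up to the parents one level at a time. The first two levels below follow the text. The single third-level node "animal" is a hypothetical placeholder, because the text does not name the third-level (405) objects. The helper assumes every leaf sits at the same depth, as in FIG. 4B.

```python
# Child -> parent map for FIG. 4B. "animal" is a hypothetical placeholder root.
parent = {
    "Labrador": "dog", "Shiba Inu": "dog", "Beagle": "dog", "Pit Bull": "dog",
    "Russian Blue": "cat", "Siamese cat": "cat", "Persian cat": "cat",
    "dog": "animal", "cat": "animal",
}


def levels(parent_map):
    """Group nodes by level: level 1 holds the leaves, each parent sits one level up.

    Assumes all leaves are at the same depth.
    """
    internal = set(parent_map.values())
    current = [node for node in parent_map if node not in internal]  # leaves
    result = []
    while current:
        result.append(current)
        nxt = []
        for node in current:
            p = parent_map.get(node)
            if p is not None and p not in nxt:
                nxt.append(p)
        current = nxt
    return result
```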

In one embodiment, the training system (102) receives from the user (101) a hierarchical tree structure that represents the information (217) about the hierarchical relationships between the objects.

At step 302, the training system (102) feeds the images (401A, 401B, ... 401N) of the training data set (104) for each object to the base neural network, in an order based on the information (217). The base neural network is associated with one or more parameters.

In one embodiment, the training system (102) retrieves the images (401A, 401B, ... 401N) and class labels (402A, 402B, ... 402N) from the training data set (104). It sorts the images (401A, 401B, ... 401N) by class label (402A, 402B, ... 402N), producing the sorted images (406) shown in FIG. 4C. Any of bubble sort, selection sort, merge sort, insertion sort, quicksort, or a similar algorithm may be used; those skilled in the art will recognize that other existing sorting techniques also work. The training system (102) then feeds each object's images to the base neural network in the order given by that object's position in the information (217). For example, given the positions in FIG. 4B, the images of "Labrador" are fed first, followed by the images of "Shiba Inu", and so on.
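The ordering step can be sketched as a single sort keyed on each object's position in the hierarchy. The sketch assumes the first-level order of FIG. 4B and uses made-up image identifiers. Python's built-in `sorted` (a stable Timsort) stands in for the bubble, merge, or quick sorts named in the text. Because the sort is stable, images with the same label keep their original relative order.

```python
# First-level order of the hierarchy in FIG. 4B, left to right.
leaf_order = ["Labrador", "Shiba Inu", "Beagle", "Pit Bull",
              "Russian Blue", "Siamese cat", "Persian cat"]
position = {label: i for i, label in enumerate(leaf_order)}

# Hypothetical (image_id, class_label) pairs.
images = [("img_07", "Beagle"), ("img_02", "Labrador"), ("img_05", "Shiba Inu"),
          ("img_01", "Labrador"), ("img_09", "Persian cat")]

# Group images by class and order the groups by their position in the tree.
feed_order = sorted(images, key=lambda item: position[item[1]])
```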

At step 303, for each input image, the training system (102) determines the loss value of the base neural network. The loss value is based on the output of the base neural network and the class label (402A, 402B, ... 402N) of that input image. The output of the base neural network gives a similarity value for each object.

In one embodiment, shown in FIG. 5A, the base neural network (501) is a convolutional neural network with one or more layers. Each layer is one of the following: a convolutional layer (505A, 505B), a pooling layer (507), a rectified linear unit (ReLU) layer (506A, 506B), a fully connected layer (508), or a loss layer. Each input image (502) is fed to the base neural network (501) and propagated through its layers, as shown in FIG. 5A. This produces the output (503) of the base neural network (501), which indicates the class label (402A, 402B, ... 402N) of the input image (502).

The convolutional neural network may also include an image preprocessing layer (504). This layer may apply noise removal, cropping, morphological operations, resizing or scaling, normalization, dimensionality reduction, or similar operations. Any existing image preprocessing operation may be applied to the input image (502).

The base neural network (501) is associated with one or more parameters, including at least one of the following:
- the kernels of each layer;
- for one or more layers, the connections between the kernels of a first layer and the kernels of the second layer that follows it;
- the values in each kernel;
- a kernel activation list.

The kernel activation list contains at least one of the output of the first layer of the base neural network (501), the indices of one or more kernels, and the class labels (402A, 402B, ... 402N).

In one embodiment, the training system (102) may initialize the values of each kernel to random values with zero mean and a predetermined variance. The user (101) provides this variance via the user interface (103). In another embodiment, the training system (102) may instead receive the values of each kernel from the user (101) via the user interface (103). The values of each kernel are represented as a matrix. For example, the first kernel is represented by the matrix shown in FIG. 9A (matrix 3).
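Below is a minimal initializer. It assumes the zero-mean random values are drawn from a Gaussian distribution, since the text specifies only the mean and the variance. Python's `random.gauss` takes a standard deviation rather than a variance, so the user-supplied variance is square-rooted first. `init_kernel` is a hypothetical name.

```python
import random


def init_kernel(rows, cols, variance, seed=None):
    """Fill a rows x cols kernel with Gaussian values of mean 0 and the given variance."""
    rng = random.Random(seed)
    std = variance ** 0.5  # random.gauss expects a standard deviation
    return [[rng.gauss(0.0, std) for _ in range(cols)] for _ in range(rows)]


kernel = init_kernel(3, 3, variance=0.01, seed=0)
```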

In one embodiment, each convolutional layer (505A, 505B) contains one or more kernels. Each kernel is slid across the width and height of its input, which is either the input image (502) or the output of the previous layer. At each position it computes the dot product between its own values and the overlapping input values. Convolving a kernel with its input in this way produces a two-dimensional output for that kernel, called an activation map or feature map. The output of the convolutional layer (505A, 505B) identifies which kernels activate, that is, which detect a feature at some spatial position in the input.

For example, as shown in FIG. 5B, an input image (502) that is a grayscale image is represented as a 6×6 matrix. The first of the kernels (509) of the convolutional layer (505A) is a 3×3 matrix. The resulting output of the convolutional layer (505A), the activation map (510), is a 4×4 matrix.

An input image (502) that is a color image is represented by three matrices, one per component of a color space. The three matrices may correspond, for example, to "red", "green", and "blue" in the RGB color space; to "cyan", "magenta", and "yellow" in the CMYK color space; or to "luma", "chrominance U", and "chrominance V" in the YUV color space. To convolve a color image, the first kernel is replicated for each of the three matrices and each matrix is convolved with its copy. The three results are then added to form the output of the convolutional layer (505A), the activation map (510). This output is a single matrix, as shown in FIG. 5C. Finally, FIG. 5D shows an input image (502) fed to a convolutional layer (505A) that contains several kernels (509). The layer produces one activation map (510) for each kernel (509).
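The 6×6 to 4×4 example in FIG. 5B is consistent with a stride of 1 and no zero padding, since 6 - 3 + 1 = 4. The pure-Python sketch below assumes those settings. Like most CNN implementations, it computes the sliding dot product without flipping the kernel. The image and kernel values are made up. A color input is handled as described above: one kernel is replicated across the three channel matrices, and the three results are added.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding), taking a dot product at each position."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]


def conv2d_multichannel(channels, kernel):
    """Convolve each channel with the same (replicated) kernel and add the results."""
    maps = [conv2d_valid(c, kernel) for c in channels]
    return [[sum(m[i][j] for m in maps) for j in range(len(maps[0][0]))]
            for i in range(len(maps[0]))]


gray = [[(r * 6 + c) % 5 for c in range(6)] for r in range(6)]  # made-up 6x6 image
edge = [[1, 0, -1], [1, 0, -1], [1, 0, -1]]                      # made-up 3x3 kernel
activation_map = conv2d_valid(gray, edge)                         # 4x4 result
rgb_map = conv2d_multichannel([gray, gray, gray], edge)           # single 4x4 matrix
```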

In one embodiment, each rectified linear unit (ReLU) layer (506A, 506B) contains an activation function. This function maps the layer's input to its output, the activation map (510). The activation function may be, for example, the rectified linear function, the hyperbolic tangent function, or the sigmoid function. The rectified linear function is given by:

$$\mathrm{output\_relu} = \max(0,\ \mathrm{input\_relu}) \qquad (1)$$

In equation (1), input_relu is the input to the ReLU layer (506A, 506B), and output_relu is its output, the activation map (510). FIG. 5E shows the activation map (510) produced by the ReLU layer (506A) with the rectified linear function as its activation function. Its input is either the input image (502) or the output of the previous layer (511).
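Below is a minimal Python sketch of the rectified linear function, output_relu = max(0, input_relu), applied element by element to a 2-D activation map. The input values are made up.

```python
def relu(feature_map):
    """Apply output_relu = max(0, input_relu) to each element of a 2-D activation map."""
    return [[max(0, v) for v in row] for row in feature_map]


relu([[-3, 2], [0, -1]])  # [[0, 2], [0, 0]]
```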

In one embodiment, the pooling layer (507) applies a non-linear function to reduce the spatial size of the output of the previous layer (511). The pooling layer of the base neural network (501) may use, for example, max pooling, average pooling, or region-of-interest pooling. A max-pooling layer (507) with a 2×2 kernel works as follows. It divides the output of the previous layer (511) into 2×2 blocks with a stride of 2 and takes the maximum value of each block. As shown in FIG. 5F, these maxima are concatenated to form the output of the pooling layer (507), the activation map (510).
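Below is a pure-Python sketch of 2×2 max pooling with a stride of 2. It assumes the input height and width are even. The feature-map values are made up.

```python
def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2 (assumes even height and width)."""
    return [[max(feature_map[i][j], feature_map[i][j + 1],
                 feature_map[i + 1][j], feature_map[i + 1][j + 1])
             for j in range(0, len(feature_map[0]), 2)]
            for i in range(0, len(feature_map), 2)]


fm = [[1, 3, 2, 0],
      [4, 2, 1, 5],
      [0, 1, 7, 2],
      [6, 2, 3, 1]]
max_pool_2x2(fm)  # [[4, 5], [6, 7]]
```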

In one embodiment, shown in FIG. 5G, the fully connected layer (508) contains one or more neurons (512). Every output of the previous layer (511) is fed to every neuron (512). The number of neurons (512) in the fully connected layer (508) equals the number of unique class labels (402A, 402B, ... 402N) in the training data set (104). The neurons (512) contain an activation function that maps the input of the fully connected layer (508) to its output. This may be, for example, a sigmoid function, a softmax function, or a Gaussian function. The softmax function, for example, is given by Equation 2 in FIG. 9B.

In Equation 2 of FIG. 9B, K denotes the number of neurons (512) in the fully connected layer (508), and Z_i and Z_j denote the outputs of individual neurons (512). The output of the fully connected layer (508) is the output (503) of the base neural network (501). With a softmax function, the output (503) of the base neural network (501) is a vector of values in the range zero to one that sum to one. Each value in the vector is a similarity value for one of the objects in the training data set (104). For example, a vector of such values and the corresponding objects is shown in Equation 3 of FIG. 9B.
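Equation 2 itself appears only in FIG. 9B, which is not reproduced here. Assuming it is the standard softmax, softmax(z)_i = exp(z_i) / Σ_{j=1..K} exp(z_j) over K neurons, the sketch below computes it in plain Python and picks the label with the highest value, as in the similarity-vector example. The function names, neuron outputs, and labels are hypothetical.

```python
import math


def softmax(logits):
    """softmax(z)_i = exp(z_i) / sum_j exp(z_j) for j = 1..K."""
    # Subtracting the maximum leaves the result unchanged but prevents overflow in exp().
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(logits, labels):
    """Return the label with the highest softmax value and that value."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


# Hypothetical raw outputs of three fully connected neurons.
label, score = predict([0.2, 2.0, 0.5], ["Labrador", "Shiba Inu", "Beagle"])
print(label, round(score, 2))
```

Because softmax is monotonic, the highest-scoring label is also the neuron with the largest raw output; the softmax step matters when the values are read as similarities or fed to a loss.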

Here, the output (503) of the base neural network (501) indicates that, among the objects of the training data set (104), the object "Shiba Inu" has the highest similarity value, 0.85, and is therefore the object present in the input image (502) provided to the base neural network (501).

In one embodiment, for each input image (502), the loss value of the base neural network (501) is determined from the output (503) of the base neural network (501) and the class label (402A, 402B, ... 402N) corresponding to that input image (502). A loss layer of the base neural network (501) is used to determine the loss value. The loss value indicates how well the base neural network (501) predicts an output (503) equal to the class label (402A, 402B, ... 402N) of each input image (502). The training system (102) determines the loss value using a hierarchical cross-entropy technique. A person skilled in the art will appreciate that other loss functions, such as softmax loss or Euclidean loss, may be used instead. The hierarchical cross-entropy loss value of the base neural network (501) is determined using Equation 4 in FIG. 9C.

In Equation 4, M denotes the number of objects in the information (217), and N denotes the number of unique class labels (402A, 402B, ... 402N) in the training data set (104), which equals the number of neurons (512) in the fully connected layer (508) of the base neural network (501).
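The hierarchical cross entropy of Equation 4 (FIG. 9C), which also involves the M objects of the information (217), is not reproduced here. As a point of reference only, the sketch below shows ordinary (flat) categorical cross entropy over the N class labels, computed from a softmax output; it is not the patent's hierarchical loss, and the function names are hypothetical.

```python
import math


def cross_entropy(probs, target_index, eps=1e-12):
    """Categorical cross entropy for one image: -log of the probability of its true label.

    `eps` guards against log(0) when the network assigns zero probability.
    """
    return -math.log(max(probs[target_index], eps))


def mean_cross_entropy(batch_probs, targets):
    """Average loss over a batch of softmax outputs and their true label indices."""
    losses = [cross_entropy(p, t) for p, t in zip(batch_probs, targets)]
    return sum(losses) / len(losses)


# A confident correct prediction costs little; a hesitant one costs more.
print(cross_entropy([0.05, 0.90, 0.05], 1))
print(cross_entropy([0.40, 0.30, 0.30], 1))
```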

In one embodiment, as shown in FIG. 5H, the training system (102) may initialize, for the layers of the base neural network (501), one or more connections (515A, 515B, ... 515N) between the kernels (509) of a first layer (513) and the kernels (509) of a second layer (514) that follows the first layer (513). In another embodiment, the training system (102) may initialize the connections (515A, 515B, ... 515N) based on a second user input received from the user (101) via the user interface (103).

In one embodiment, the training system (102) may initialize a kernel activation list based on at least one of the second user input received from the user (101) via the user interface (103) and the output (503) of the base neural network (501). FIG. 5I shows the kernel activation list (516) initialized when an input image (502) of the object "Labrador" is provided to the base neural network (501); it contains at least one of the output of the base neural network (501) (i.e., the activation map (510)), the indices (517) of one or more kernels (509), and the class labels (402A, 402B, ... 402N). The indices (517) identify the kernels (509) in the layers of the base neural network (501) whose activation maps (510) are more similar to the input image (502) than those of the other kernels. In one embodiment, the user (101) may select the indices (517) via the user interface (103) based on the outputs of the layers of the base neural network (501) (i.e., the activation maps (510)).
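The text does not say how the "more similar" kernels are chosen automatically. The sketch below assumes, purely for illustration, that each kernel is scored by the mean value of its activation map and the top-k indices are stored together with the maps and the class label. The dictionary keys, the scoring rule, and `k` are all hypothetical.

```python
def mean_activation(activation_map):
    """Average value of one kernel's 2-D activation map."""
    values = [v for row in activation_map for v in row]
    return sum(values) / len(values)


def top_kernels(activation_maps, k):
    """Indices of the k kernels whose maps respond most strongly (ties -> lower index)."""
    ranked = sorted(range(len(activation_maps)),
                    key=lambda idx: (-mean_activation(activation_maps[idx]), idx))
    return sorted(ranked[:k])


def make_entry(class_label, activation_maps, k=2):
    """One kernel activation list entry: activation maps, kernel indices, class label."""
    return {
        "label": class_label,
        "activation_maps": activation_maps,
        "kernel_indices": top_kernels(activation_maps, k),
    }


# Four hypothetical 2x2 activation maps produced by one image of "Labrador".
maps = [[[0, 0], [0, 0]], [[1, 1], [1, 1]], [[0.5, 0.5], [0.5, 0.5]], [[2, 2], [2, 2]]]
kernel_activation_list = [make_entry("Labrador", maps)]
print(kernel_activation_list[0]["kernel_indices"])  # [1, 3]
```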

In step 304, the training system (102) updates one or more parameters for each input image (502) based on at least one of the output (503), the loss value of the base neural network (501), and the second user input, to generate the neural network. The generated neural network is used for object recognition.

In one embodiment, updating the one or more parameters includes at least one of: adding one or more kernels (509) to at least one layer of the base neural network (501); modifying one or more of the values of the kernels (509) in the layers of the base neural network (501); modifying one or more connections (515A, 515B, ... 515N) between the kernels (509) of a first layer and the kernels (509) of a second layer following the first layer; and modifying the kernel activation list (516). The method steps of providing, for each object in the information (217), the one or more images (401A, 401B, ... 401N) as input to the base neural network (501) and updating the parameters of the base neural network (501) are referred to as training the base neural network (501).

In one embodiment, the training system (102) may update the parameters by modifying one or more of the values of the kernels (509) in the layers of the base neural network (501) based on the output (503) and the loss value of the base neural network (501). The training system (102) updates these kernel values using backpropagation. For example, consider an input image (502) of dimensions H × W and kernels (509) each of dimensions k1 × k2. The change to the values of each kernel (509) is determined using Equation 5 in FIG. 9C.

In Equation 5, E denotes the loss function used to determine the loss value, the x terms denote the values of the input image (502) or of the output of the previous layer (511), the W terms denote the values of the kernels (509) in the layers of the base neural network (501), and the left-hand side denotes the change added to those kernel values. FIG. 6A shows kernel (509) values before modification, and FIG. 6B shows the same values after the training system (102) has modified them by backpropagation.
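Equation 5 is not reproduced here. As an illustration only, the sketch below assumes the layer computes a valid 2-D cross-correlation y[i][j] = Σ_m Σ_n W[m][n]·x[i+m][j+n] (as most deep-learning frameworks do), so that ∂E/∂W[m][n] = Σ_i Σ_j (∂E/∂y[i][j])·x[i+m][j+n] over the (H-k1+1) × (W-k2+1) output, and that the change added to each kernel value is a plain gradient-descent step -η·∂E/∂W with an assumed learning rate η.

```python
def kernel_gradient(x, delta, k1, k2):
    """dE/dW for a valid 2-D cross-correlation y[i][j] = sum_m sum_n W[m][n] * x[i+m][j+n].

    x:     H x W input (image or previous-layer output), list of rows
    delta: dE/dy, of shape (H - k1 + 1) x (W - k2 + 1)
    """
    out_h, out_w = len(delta), len(delta[0])
    grad = [[0.0] * k2 for _ in range(k1)]
    for m in range(k1):
        for n in range(k2):
            grad[m][n] = sum(delta[i][j] * x[i + m][j + n]
                             for i in range(out_h) for j in range(out_w))
    return grad


def sgd_update(kernel, grad, lr=0.1):
    """Add the change -lr * dE/dW to every kernel value (lr is an assumed learning rate)."""
    return [[w - lr * g for w, g in zip(w_row, g_row)]
            for w_row, g_row in zip(kernel, grad)]


x = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]     # H = W = 3
delta = [[1, 1], [1, 1]]                  # dE/dy for a 2x2 kernel (k1 = k2 = 2)
grad = kernel_gradient(x, delta, 2, 2)    # [[12, 16], [24, 28]]
kernel = sgd_update([[0.0, 0.0], [0.0, 0.0]], grad)
```

With all-ones `delta` (i.e. E is the sum of the outputs), each gradient entry is just the sum of the input values that the corresponding kernel weight touches.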

In one embodiment, the training system (102) displays at least one of the information (217), the one or more input images (401A, 401B, ..., 401N) and class labels (402A, 402B, ... 402N) stored in the training data set (104), the one or more test images stored in the test data set (105), the base neural network (501), the loss value, the parameters associated with the base neural network (501), and the like. The training system (102) may update the parameters, including adding one or more kernels (509) to at least one layer of the base neural network (501), based on the second user input received from the user (101) via the user interface (103) and the loss value of the base neural network (501). The user (101) may add kernels (509) after selecting a new object, from the objects in the information (217), on which to train the base neural network (501). The user (101) may also add kernels (509) based on the outputs of the layers (i.e., the activation maps (510)) and the loss value of the base neural network (501). For example, for a base neural network (501) with three layers and seven kernels as shown in FIG. 6C, the user (101) may add one or more kernels (509), shown by dotted lines in FIG. 6D, based on the outputs of the layers (i.e., the activation maps (510)).
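Adding a kernel to one layer also adds an output channel, so every kernel in the following layer needs an extra input slice. The sketch below, which grows a hypothetical three-layer, seven-kernel network to eight kernels, shows one way to keep shapes consistent. The nested-list weight layout and the choice to zero-initialize the new slice are assumptions, not details from the patent.

```python
import random


def new_kernel(in_channels, k, scale=0.01, rng=random):
    """A k x k grid of small random weights for each input channel."""
    return [[[rng.uniform(-scale, scale) for _ in range(k)] for _ in range(k)]
            for _ in range(in_channels)]


def add_kernel(layers, layer_idx, rng=random):
    """Append one kernel to layers[layer_idx] and widen the following layer to match.

    Each layer is a list of kernels; each kernel is indexed [in_channel][row][col].
    """
    layer = layers[layer_idx]
    in_channels, k = len(layer[0]), len(layer[0][0])
    layer.append(new_kernel(in_channels, k, rng=rng))
    if layer_idx + 1 < len(layers):
        for kernel in layers[layer_idx + 1]:
            k_next = len(kernel[0])
            # Zero weights for the new input channel, so the next layer's output is unchanged
            # until training adjusts them.
            kernel.append([[0.0] * k_next for _ in range(k_next)])
    return layers


# Three layers holding 3 + 2 + 2 = 7 kernels of size 3x3 (a single-channel input image).
rng = random.Random(0)
layers = [[new_kernel(1, 3, rng=rng) for _ in range(3)],
          [new_kernel(3, 3, rng=rng) for _ in range(2)],
          [new_kernel(2, 3, rng=rng) for _ in range(2)]]
add_kernel(layers, 0, rng=rng)
print(sum(len(layer) for layer in layers))  # 8
```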

In one embodiment, the user (101) may modify one or more connections (515A, 515B, ... 515N) between the kernels (509) of a first layer and the kernels (509) of a second layer following the first layer, based on the outputs of the layers (i.e., the activation maps (510)) and the loss value of the base neural network (501). For example, for a base neural network (501) with three layers and eight kernels as shown in FIG. 6D, the user (101) may modify the connections (515A, 515B, ... 515N) between kernels (509) shown by dotted lines in FIG. 6E. Modifying the connections (515A, 515B, ... 515N) includes at least one of adding connections and deleting connections between the kernels (509) of the first layer and the kernels (509) of the second layer.
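One simple way to represent the inter-layer connections is a boolean mask in which entry [j][i] says whether kernel j of the second layer reads the output of kernel i of the first layer. The sketch below, an assumption for illustration rather than the patent's data structure, initializes such a mask, adds and deletes connections, and shows a deleted connection being ignored when inputs are combined.

```python
def init_connections(n_first, n_second, fully_connected=True):
    """mask[j][i] is True when kernel j of the second layer reads kernel i of the first."""
    return [[fully_connected] * n_first for _ in range(n_second)]


def add_connection(mask, i, j):
    """Connect first-layer kernel i to second-layer kernel j."""
    mask[j][i] = True


def delete_connection(mask, i, j):
    """Disconnect first-layer kernel i from second-layer kernel j."""
    mask[j][i] = False


def combine_inputs(mask, j, channel_responses):
    """Sum only the per-channel responses that kernel j is still connected to."""
    return sum(r for r, connected in zip(channel_responses, mask[j]) if connected)


mask = init_connections(3, 2)          # 3 kernels feeding 2 kernels, fully connected
delete_connection(mask, 1, 0)          # second-layer kernel 0 stops reading kernel 1
print(combine_inputs(mask, 0, [1.0, 2.0, 3.0]))  # 4.0
```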

In one embodiment, the user (101) may modify the kernel activation list (516) based on the outputs of the layers of the base neural network (501) (i.e., the activation maps (510)) and the loss value of the base neural network (501). When the user (101) selects a new object, from the objects in the information (217), on which to train the base neural network (501), the user may add an entry to the kernel activation list containing the output of the first layer (i.e., the activation map (510)) and the class label (402A, 402B, ... 402N). For example, after the base neural network (501) has been trained with one or more images (401A, 401B, ... 401N) of the object "Labrador", the kernel activation list (516) is as shown in FIG. 5I; when the user (101) then provides images (401A, 401B, ... 401N) of the object "pit bull" to train the base neural network (501), the kernel activation list (516) may be updated as shown in FIG. 6F. When the user (101) further provides images (401A, 401B, ... 401N) of the object "Beagle", the kernel activation list (516) may be updated as shown in FIG. 6G. In one embodiment, the user (101) may modify the indices (517) of the kernels (509) shown in FIG. 6G by adding kernel indices (517) to one or more objects in the kernel activation list (516), as shown in FIG. 6H, based on at least one of the outputs of the layers (i.e., the activation maps (510)) and the loss value of the base neural network (501). Modifying the kernel indices (517) includes at least one of adding and deleting kernel indices (517) for one or more objects in the kernel activation list (516).
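The edits to the kernel activation list described above (a new entry for a newly trained object, and adding or deleting kernel indices for an existing object) can be sketched as follows. The list-of-dicts layout and all index values are hypothetical and are not taken from FIGS. 5I and 6F to 6H.

```python
def find_entry(activation_list, label):
    """Return the entry for `label`, or None if the object is not in the list yet."""
    for entry in activation_list:
        if entry["label"] == label:
            return entry
    return None


def add_indices(activation_list, label, indices):
    """Add kernel indices for an object, creating its entry if needed."""
    entry = find_entry(activation_list, label)
    if entry is None:
        # A newly trained object (e.g. "pit bull") gets its own entry.
        entry = {"label": label, "kernel_indices": []}
        activation_list.append(entry)
    entry["kernel_indices"] = sorted(set(entry["kernel_indices"]) | set(indices))


def remove_indices(activation_list, label, indices):
    """Delete kernel indices for an object; unknown objects are ignored."""
    entry = find_entry(activation_list, label)
    if entry is not None:
        to_drop = set(indices)
        entry["kernel_indices"] = [i for i in entry["kernel_indices"] if i not in to_drop]


activation_list = [{"label": "Labrador", "kernel_indices": [1, 3]}]
add_indices(activation_list, "pit bull", [5, 2])
add_indices(activation_list, "Labrador", [4])
remove_indices(activation_list, "Labrador", [3])
print(activation_list)
```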

In one embodiment, the user (101) modifies the hierarchical tree structure through the user interface (103) using the first user input, based on at least one of the loss value of the base neural network (501), the output of the first layer of the base neural network (501) (i.e., the activation map (510)), and the addition of one or more images (401A, 401B, ... 401N) to the training data set (104). The modification includes at least one of adding one or more objects to the hierarchical tree structure and changing the position of one or more objects within it. For example, on identifying one or more images (401A, 401B, ... 401N) of a new object in the training data set (104), the user (101) may add the new object (i.e., "Congo lion") to the information (217), as shown in FIG. 6I, and further train the base neural network (501).
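The two tree edits named above, adding an object and moving an object to a new position, can be sketched with a dictionary that maps each node to its children. The tree contents are hypothetical (the placement of "Congo lion" under a "big cat" node is an assumption, not FIG. 6I), and `move_object` deliberately does not check for cycles.

```python
def add_object(tree, name, parent):
    """Add a new leaf object under `parent`; tree maps each node to its children."""
    if parent not in tree:
        raise KeyError(f"unknown parent {parent!r}")
    if name in tree:
        raise ValueError(f"{name!r} already exists")
    tree[parent].append(name)
    tree[name] = []


def move_object(tree, name, new_parent):
    """Re-attach `name` (with its subtree) under `new_parent`. No cycle check."""
    if new_parent not in tree:
        raise KeyError(f"unknown parent {new_parent!r}")
    for children in tree.values():
        if name in children:
            children.remove(name)
            break
    tree[new_parent].append(name)


def leaves(tree, node):
    """Leaf objects under `node`; these correspond to the class labels being predicted."""
    if not tree[node]:
        return [node]
    return [leaf for child in tree[node] for leaf in leaves(tree, child)]


tree = {"animal": ["dog", "big cat"], "dog": ["Labrador", "Beagle"],
        "big cat": [], "Labrador": [], "Beagle": []}
add_object(tree, "Congo lion", "big cat")
print(leaves(tree, "animal"))  # ['Labrador', 'Beagle', 'Congo lion']
```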

In one embodiment, the training system (102) provides the one or more images (401A, 401B, ... 401N) corresponding to each object as input to the base neural network (501) and updates the parameters of the base neural network (501) based on a comparison of the loss value with a predetermined threshold. For example, the predetermined threshold may be 0.01. Training of the base neural network (501) for each object is repeated until the loss value falls below the predetermined threshold. After training of the base neural network (501) is complete, the training system (102) generates the neural network using the parameters associated with the base neural network (501); for example, the generated neural network may be the trained base neural network (501) itself. As another example, FIG. 6J shows the generated neural network and the kernel activation list (516) obtained after training the base neural network (501) on the objects of the class "dog". The generated neural network (601) is then used for object recognition, which is performed on the test images of the test data set (105). As shown in FIG. 7A, the test images in the test data set (105) do not include class labels (402A, 402B, ... 402N).
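The stopping rule above (keep training until the loss falls below a threshold such as 0.01) can be sketched as a small loop around a caller-supplied training pass. The `max_epochs` safety cap is an addition not mentioned in the text, and the one-parameter toy model merely stands in for the base neural network.

```python
def train_until_threshold(train_one_epoch, threshold=0.01, max_epochs=1000):
    """Repeat training passes until the loss falls below `threshold`.

    `train_one_epoch()` runs one pass over the current object's images, updates the
    parameters, and returns the resulting loss. `max_epochs` is an added safety cap.
    """
    loss = float("inf")
    for epoch in range(1, max_epochs + 1):
        loss = train_one_epoch()
        if loss < threshold:
            return epoch, loss
    raise RuntimeError(f"loss {loss:.4f} still >= {threshold} after {max_epochs} epochs")


# Toy stand-in for the network: one weight w fitted to a target of 3.0 by gradient descent.
state = {"w": 0.0}


def toy_epoch(lr=0.1):
    grad = 2 * (state["w"] - 3.0)      # d/dw of (w - 3)^2
    state["w"] -= lr * grad
    return (state["w"] - 3.0) ** 2


epochs, final_loss = train_until_threshold(toy_epoch)
print(epochs, final_loss)  # the error shrinks by 0.8x per step, so this stops at epoch 16
```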

In one embodiment, to perform object recognition with the generated neural network (601), the training system (102) may provide a test image, received from the user (101) from among the test images stored in the test data set (105), as input to the first layer of the neural network (601). The training system (102) then compares the output of the first layer of the neural network (601) (i.e., the activation map (510)) with the kernel activation list (516) and, based on the comparison, identifies one or more kernels (509) from the kernel activation list (516). Next, the training system (102) propagates the test image through the identified kernels (509) in the layers of the neural network (601). Finally, the training system (102) recognizes the object in the test image (701) from the output (503) of the neural network (601). For example, when a test image (701) is provided as input to the first layer of the neural network (601), the output of that layer (i.e., the activation map (510)) is as shown in FIG. 7B. This output is compared with the kernel activation list (516) associated with the neural network (601), shown in FIG. 6J, and is found to be similar to the entry for the object "Beagle". The neural network (601) therefore propagates the test image (701) through the kernels (509) with indices (517) "7" and "11", and the output (503) of the neural network (601) recognizes the object in the test image (701) as a "Beagle".
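The recognition steps above can be sketched as: compare the test image's first-layer activation maps with each stored entry, take the most similar entry's kernel indices, and evaluate only those kernels. The patent does not specify the similarity measure; cosine similarity over flattened maps is an assumption here, as are the dictionary layout, the stored maps, and the toy "kernels" (plain weight vectors). Only the Beagle indices 7 and 11 come from the text.

```python
import math


def flatten(maps):
    """Concatenate a list of 2-D activation maps into one vector."""
    return [v for m in maps for row in m for v in row]


def cosine(a, b):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def select_kernels(first_layer_maps, activation_list):
    """Return (label, kernel_indices) of the stored entry most similar to this output."""
    query = flatten(first_layer_maps)
    best = max(activation_list,
               key=lambda e: cosine(query, flatten(e["activation_maps"])))
    return best["label"], best["kernel_indices"]


def propagate_subset(features, kernels, indices):
    """Evaluate only the selected kernels; every other kernel is skipped (output 0.0)."""
    selected = set(indices)
    return [sum(w * f for w, f in zip(kernel, features)) if i in selected else 0.0
            for i, kernel in enumerate(kernels)]


activation_list = [
    {"label": "Labrador", "activation_maps": [[[1, 0], [0, 0]]], "kernel_indices": [1, 3]},
    {"label": "Beagle", "activation_maps": [[[0, 0], [0, 1]]], "kernel_indices": [7, 11]},
]
label, indices = select_kernels([[[0, 0], [0.1, 0.9]]], activation_list)
print(label, indices)  # Beagle [7, 11]
```

Skipping the unselected kernels is where the claimed speed-up comes from: only a subset of the network's kernels is evaluated for each test image.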

The method of generating the neural network (601) for object recognition in images includes training, by the training system (102), the base neural network (501) on the one or more images (401A, 401B, ... 401N) of the training data set (104) to generate the neural network (601). For a class of similar objects (e.g., "dog"), the base neural network (501) does not need a large training data set (104) for each class of similar object (i.e., "Labrador", "pit bull", "Beagle", etc.). The features used to recognize similar objects may share one or more kernels (509), so kernels (509) that are already trained can be reused for objects that have a smaller training data set (104). Using the information (217) allows the base neural network (501) to be trained faster. Further, because the test image (701) is propagated through only a subset of the kernels (509) of the neural network (601), the neural network (601) performs object recognition in less time and with higher accuracy.

Computer System
FIG. 8 shows a block diagram of an exemplary computer system (800) for implementing embodiments consistent with the present disclosure. In one embodiment, the computer system (800) may be used to implement the method. The computer system (800) may include a central processing unit ("CPU" or "processor") (802). The processor (802) may include at least one processor for executing program components for dynamic resource allocation at run time. The processor (802) may include specialized processing units such as integrated system (bus) controllers, memory management control units, floating-point units, graphics processing units, digital signal processing units, etc.

The processor (802) may communicate with one or more I/O devices (not shown) via an input/output (I/O) interface (801). The I/O interface (801) may employ communication protocols/methods such as, without limitation, audio, analog, digital, monaural, RCA, stereo, IEEE-1394, serial bus, Universal Serial Bus (USB), infrared, PS/2, BNC, coaxial, component, composite, Digital Visual Interface (DVI), High-Definition Multimedia Interface (HDMI®), RF antennas, S-Video, VGA, IEEE 802.n/b/g/n/x, Bluetooth, and cellular (e.g., Code-Division Multiple Access (CDMA), High-Speed Packet Access (HSPA+), Global System for Mobile Communications (GSM), Long-Term Evolution (LTE), WiMax, or the like).

Using the I/O interface (801), the computer system (800) may communicate with one or more I/O devices. For example, the input device (810) may be an antenna, keyboard, mouse, joystick, (infrared) remote control, camera, card reader, fax machine, dongle, biometric reader, microphone, touch screen, touchpad, trackball, stylus, scanner, storage device, transceiver, video device/source, etc. The output device (811) may be a printer, fax machine, video display (e.g., cathode ray tube (CRT), liquid crystal display (LCD), light-emitting diode (LED), plasma, plasma display panel (PDP), organic light-emitting diode display (OLED), or the like), audio speaker, etc.

In some embodiments, the computer system (800) is connected to a service provider through a communication network (809). The processor (802) may communicate with the communication network (809) via a network interface (803). The network interface (803) may employ connection protocols including, without limitation, direct connect, Ethernet (e.g., twisted pair 10/100/1000 Base T), Transmission Control Protocol/Internet Protocol (TCP/IP), token ring, IEEE 802.11a/b/g/n/x, etc. The communication network (809) may include, without limitation, a direct interconnection, an e-commerce network, a peer-to-peer (P2P) network, a local area network (LAN), a wide area network (WAN), a wireless network (e.g., using Wireless Application Protocol), the Internet, Wi-Fi, etc. Using the network interface (803) and the communication network (809), the computer system (800) may communicate with one or more service providers.

In some embodiments, the processor (802) may communicate with a memory (805) (e.g., RAM, ROM, etc., not shown in FIG. 8) via a storage interface (804). The storage interface (804) may connect to the memory (805), including, without limitation, memory drives and removable disk drives, using connection protocols such as Serial Advanced Technology Attachment (SATA), Integrated Drive Electronics (IDE), IEEE-1394, Universal Serial Bus (USB), Fibre Channel, Small Computer Systems Interface (SCSI), etc. The memory drives may further include a drum, magnetic disk drive, magneto-optical drive, optical drive, Redundant Array of Independent Disks (RAID), solid-state memory devices, solid-state drives, etc.

The memory (805) may store a collection of program or database components, including, without limitation, a user interface (806), an operating system (807), a web server (808), etc. In some embodiments, the computer system (800) may store user/application data, such as the data, variables, and records described in this disclosure. Such databases may be implemented as fault-tolerant, relational, scalable, secure databases such as Oracle or Sybase.

The operating system (807) may facilitate resource management and operation of the computer system (800). Examples of operating systems include, without limitation, APPLE® MACINTOSH® OS X®, UNIX®, Unix-like system distributions (e.g., BERKELEY SOFTWARE DISTRIBUTION® (BSD), FREEBSD®, NETBSD®, OPENBSD, etc.), LINUX® distributions (e.g., RED HAT®, UBUNTU®, KUBUNTU®, etc.), IBM® OS/2®, MICROSOFT® WINDOWS® (XP®, VISTA®, 7/8/10, etc.), APPLE® IOS®, GOOGLE® ANDROID®, BLACKBERRY® OS, and the like.

In some embodiments, the computer system (800) may implement a web browser (not shown) as a stored program component. The web browser may be a hypertext viewing application such as MICROSOFT® INTERNET EXPLORER®, GOOGLE® CHROME®, MOZILLA® FIREFOX®, APPLE® SAFARI®, etc. Secure web browsing may be provided using Secure Hypertext Transfer Protocol (HTTPS), Secure Sockets Layer (SSL), Transport Layer Security (TLS), etc. The web browser (808) may use facilities such as AJAX, DHTML, ADOBE® FLASH®, JAVASCRIPT®, JAVA®, application programming interfaces (APIs), etc. In some embodiments, the computer system (800) may implement a mail server as a stored program component. The mail server may be an Internet mail server such as Microsoft Exchange. The mail server may use facilities such as Active Server Pages (ASP), ACTIVEX®, ANSI® C++/C#, MICROSOFT® .NET, CGI SCRIPTS, JAVA®, JAVASCRIPT®, PERL®, PHP, PYTHON®, WEBOBJECTS®, etc. The mail server may use communication protocols such as Internet Message Access Protocol (IMAP), Messaging Application Programming Interface (MAPI), MICROSOFT® Exchange, Post Office Protocol (POP), Simple Mail Transfer Protocol (SMTP), etc. In some embodiments, the computer system (800) may implement a mail client as a stored program component. The mail client may be a mail viewing application such as APPLE® MAIL, MICROSOFT® ENTOURAGE®, MICROSOFT® OUTLOOK®, MOZILLA® THUNDERBIRD®, etc.

Furthermore, one or more computer-readable storage media may be utilized in implementing embodiments consistent with the present invention. A computer-readable storage medium refers to any type of physical memory (805) on which information or data readable by a processor (802) may be stored. Thus, a computer-readable storage medium may store instructions for execution by one or more processors, including instructions for causing the processor(s) to perform steps or stages consistent with the embodiments described herein. The term "computer-readable medium" should be understood to include tangible items and to exclude carrier waves and transient signals, i.e., to be non-transitory. Examples include random access memory (RAM), read-only memory (ROM), volatile memory, non-volatile memory, hard drives, compact disc ROMs (CD-ROMs), digital video discs (DVDs), flash drives, disks, and any other known physical storage media.

In one embodiment, the computer system (800) may include a remote device (812). The computer system (800) may receive the one or more images (401A, 401B, ... 401N), the first user input, the second user input, and one or more test images from the remote device (812) via the communication network (809).

The terms "an embodiment", "embodiment", "embodiments", "the embodiment", "the embodiments", "one or more embodiments", "some embodiments", and "one embodiment" mean "one or more (but not all) embodiments of the invention(s)" unless expressly specified otherwise.

The terms "including", "comprising", "having", and variations thereof mean "including but not limited to", unless expressly specified otherwise.

The enumerated listing of items does not imply that any or all of the items are mutually exclusive, unless expressly specified otherwise. The terms "a", "an", and "the" mean "one or more", unless expressly specified otherwise.

A description of an embodiment with several components in communication with each other does not imply that all such components are required. On the contrary, a variety of optional components are described to illustrate the wide variety of possible embodiments of the invention.

When a single device or article is described herein, it will be readily apparent that more than one device/article (whether or not they cooperate) may be used in place of a single device/article. Similarly, where more than one device or article is described herein (whether or not they cooperate), it will be readily apparent that a single device/article may be used in place of the more than one device or article, or that a different number of devices/articles may be used instead of the shown number of devices or programs. The functionality and/or features of a device may alternatively be embodied by one or more other devices that are not explicitly described as having such functionality/features. Thus, other embodiments of the invention need not include the device itself.

The illustrated operations of FIG. 3 show certain events occurring in a certain order. In alternative embodiments, certain operations may be performed in a different order, modified, or removed. Moreover, steps may be added to the above-described logic and still conform to the described embodiments. Further, operations described herein may occur sequentially, or certain operations may be processed in parallel. Yet further, operations may be performed by a single processing unit or by distributed processing units.

Finally, the language used in the specification has been principally selected for readability and instructional purposes, and it may not have been selected to delineate or circumscribe the inventive subject matter. It is therefore intended that the scope of the invention be limited not by this detailed description, but rather by any claims that issue on an application based hereon. Accordingly, the disclosure of the embodiments of the invention is intended to be illustrative, but not limiting, of the scope of the invention, which is set forth in the following claims.

While various aspects and embodiments have been disclosed herein, other aspects and embodiments will be apparent to those skilled in the art. The various aspects and embodiments disclosed herein are for purposes of illustration and are not intended to be limiting, with the true scope and spirit being indicated by the following claims.

101 User
102 Training system
103 User interface
104 Training dataset
105 Test dataset
201 I/O interface
202 Memory
203 Processor
204 Data
205 Relationship data
206 Parameter data
207 Loss value data
208 Other data
209 Modules
210 Communication module
211 Information modification module
212 Input module
213 Loss determination module
214 Update module
215 Recognition module
216 Other modules
217 Information
401A, 401B, ... 401N One or more images
402A, 402B, ... 402N Class labels
403 First level
404 Second level
405 Third level
406 Sorted one or more images
501 Base neural network
502 Input image
503 Output of the base neural network
504 Preprocessing layer
505A, 505B Convolutional layers
506A, 506B Rectified linear unit layers
507 Pooling layer
508 Fully connected layer
509 One or more kernels
510 Activation map
511 Output of the previous layer
512 Neuron
513 First layer
514 Second layer
515A, 515B, ... 515N One or more connections
516 Kernel activation list
517 Index
601 Neural network
701 Test image
800 Computer system
801 I/O interface
802 Processor
803 Network interface
804 Storage interface
805 Memory
806 User interface
807 Operating system
808 Web server
809 Communication network
810 Input device
811 Output device
812 Remote device

Claims

A method of generating a neural network for object recognition in an image, the method comprising:
receiving, by a training system, information about a hierarchical relationship between one or more objects;
providing, by the training system, one or more images of a training dataset corresponding to each object of the one or more objects as input to a base neural network based on the information about the hierarchical relationship between the one or more objects, wherein the base neural network is associated with one or more parameters;
determining, by the training system, for each input image, a loss value of the base neural network based on an output of the base neural network and a class label corresponding to the input image, wherein the output indicates similarity values for the one or more objects;
updating, by the training system, for each input image, the one or more parameters based on at least one of the output, the loss value of the base neural network, and a second user input, to generate the neural network, wherein the neural network is used for the object recognition; and
recognizing, by the training system, the object using the neural network by:
providing a test image received from a user as input to a first layer of the neural network;
comparing an output of the first layer of the neural network with a kernel activation list;
identifying one or more kernels from the kernel activation list based on the comparison;
propagating the test image through the identified kernels of one or more layers of the neural network; and
recognizing the object in the test image based on an output of the neural network.
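The recognition steps at the end of the claim above can be pictured with a small sketch. It is not the patented implementation: it assumes, hypothetically, that each kernel activation list entry stores a class label, a reference first-layer output, and the indices of the kernels to run; that "comparing" means picking the entry nearest to the test image's first-layer output by Euclidean distance; and that a kernel is a weight vector applied by dot product followed by ReLU. The names `ActivationEntry`, `select_kernels`, and `propagate` are illustrative only.

```python
import math
from dataclasses import dataclass


@dataclass
class ActivationEntry:
    """One entry of a kernel activation list (hypothetical layout)."""
    class_label: str
    first_layer_output: list[float]  # reference first-layer output for the class
    kernel_indices: list[int]        # kernels to run in the later layer(s)


def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_kernels(first_layer_output, activation_list):
    """Compare the test image's first-layer output with every entry and return
    the kernel indices of the closest entry (assumed nearest-neighbour rule)."""
    best = min(activation_list,
               key=lambda e: distance(first_layer_output, e.first_layer_output))
    return best.kernel_indices


def propagate(features, layer_kernels, selected):
    """Evaluate only the selected kernels of a layer. Each hypothetical kernel
    is a weight vector applied by dot product followed by ReLU."""
    return [max(0.0, sum(w * f for w, f in zip(layer_kernels[i], features)))
            for i in selected]
```

For example, with a "person" entry at `[1.0, 0.0]` (kernels 0 and 2) and a "vehicle" entry at `[0.0, 1.0]` (kernel 1), a first-layer output of `[0.9, 0.1]` lies closest to the "person" entry, so only kernels 0 and 2 are evaluated.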

The method of claim 1, wherein receiving the information about the hierarchical relationship between the one or more objects comprises:
determining a total number of the one or more objects in the training dataset and a total number of the one or more images in the training dataset corresponding to each object;
constructing a hierarchical tree structure having one or more nodes and one or more edges based on at least one of the total number of the one or more images corresponding to each object, a first user input, and a relationship between the one or more objects, wherein the one or more nodes indicate the one or more objects and the one or more edges indicate the relationships between the one or more objects;
receiving, from the user, the hierarchical tree structure indicating the information about the hierarchical relationship between the one or more objects; and
modifying the hierarchical tree structure via a user interface using the first user input, based on at least one of the loss value of the base neural network, an output of a first layer of the base neural network, and addition of one or more images to the training dataset.
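The tree construction recited in the preceding claim might look like the following minimal sketch. The inputs are assumptions: relationships arrive as (child, parent) pairs, the first user input is modeled as a `{child: new_parent}` mapping that overrides those pairs, and every object without a known parent hangs from a hypothetical `ROOT` node. The function returns the tree as an adjacency map together with the image count per object, which the claim names as one basis for building the tree.

```python
from collections import Counter, defaultdict

ROOT = "root"  # hypothetical name for the top of the tree


def _attach(parent, child, par):
    """Set the parent of `child`; a parent seen for the first time hangs off the root."""
    parent[child] = par
    if par != ROOT:
        parent.setdefault(par, ROOT)


def build_hierarchy(labels, relations, user_edits=None):
    """Build a hierarchical tree of objects.

    labels:     class label of every training image.
    relations:  (child, parent) pairs describing relationships between objects.
    user_edits: optional {child: new_parent} mapping standing in for the
                first user input (hypothetical representation).
    Returns (children, counts): node -> list of child nodes, and the number
    of training images per object.
    """
    counts = Counter(labels)                 # images per object
    parent = {obj: ROOT for obj in counts}   # every labelled object is a node
    for child, par in relations:             # edges from known relationships
        _attach(parent, child, par)
    for child, par in (user_edits or {}).items():  # user corrections take precedence
        _attach(parent, child, par)
    children = defaultdict(list)
    for child, par in parent.items():
        children[par].append(child)
    return dict(children), counts
```

Applying the user edits last is a design choice of this sketch: it lets a manual correction override a relationship derived from the data.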

The method of claim 1, wherein providing the one or more images comprises:
retrieving the one or more images and the class labels from the training dataset;
sorting the one or more images based on the class labels; and
providing the one or more images corresponding to each object based on a position of the object in the information about the hierarchical relationship between the one or more objects.
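The preceding claim does not say how an object's "position" in the hierarchy orders the images. One plausible reading, sketched here purely as an assumption, is a coarse-to-fine order: compute each class's depth in the tree, sort the retrieved images by depth and class label, and hand them over level by level. `depth` and `order_by_hierarchy` are hypothetical helper names, and the tree is given as a child-to-parent mapping assumed to be acyclic.

```python
from itertools import groupby


def depth(obj, parent, root="root"):
    """Number of edges between `obj` and the root (tree assumed acyclic)."""
    d = 0
    while obj != root:
        obj = parent[obj]
        d += 1
    return d


def order_by_hierarchy(samples, parent, root="root"):
    """samples: (image_id, class_label) pairs retrieved from the training set.
    Sorts them by the depth of their class, then by label and image id, and
    groups them into (level, [image_ids]) batches, shallowest level first."""
    def level_of(sample):
        return depth(sample[1], parent, root)

    ranked = sorted(samples, key=lambda s: (level_of(s), s[1], s[0]))
    return [(level, [img for img, _ in group])
            for level, group in groupby(ranked, key=level_of)]
```

`groupby` only merges adjacent items, which is why the samples are sorted by level before grouping.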

The method of claim 1, wherein determining the loss value is based on a hierarchical cross entropy technique.
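The claim above names a "hierarchical cross entropy technique" without defining it. A common formulation, used here only as an illustrative assumption, scores the prediction at every level of the tree: an internal node's probability is the total softmax probability of the leaf classes beneath it, and the loss adds the negative log-probability of the true class's ancestor at each level. The function names and the child-to-parent mapping are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def path_to_root(node, parent, root="root"):
    """Nodes from `node` up to, but excluding, the root."""
    path = []
    while node != root:
        path.append(node)
        node = parent[node]
    return path


def hierarchical_cross_entropy(logits, leaf_classes, true_leaf, parent, root="root"):
    """Sum over tree levels of -log P(true ancestor), where the probability of
    a node is the total softmax mass of the leaf classes beneath it."""
    probs = dict(zip(leaf_classes, softmax(logits)))
    loss = 0.0
    for node in path_to_root(true_leaf, parent, root):
        mass = sum(p for leaf, p in probs.items()
                   if node in path_to_root(leaf, parent, root))
        loss -= math.log(mass)
    return loss
```

With a single level the sum reduces to ordinary cross-entropy, -log p(y). The additional terms raise the loss when probability mass falls outside the true class's broader category.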

The method of claim 1, wherein providing the one or more images corresponding to each of the objects is based on the result of comparison between the loss value and a predetermined threshold.
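The preceding claim ties the provision of images to a comparison between the loss value and a predetermined threshold. A minimal sketch follows. It assumes, as one possible reading, that the threshold decides when training moves on to the images of the next hierarchy level. `train_step` is a hypothetical callback that trains on a batch of images and returns its mean loss, and `max_epochs` is an added safeguard not mentioned in the claims.

```python
def train_by_levels(levels, train_step, threshold, max_epochs=100):
    """levels: (level, images) pairs ordered from the top of the hierarchy down.
    train_step(images) -> mean loss after one pass (hypothetical callback).
    Images of the next level are provided only once the loss on the current
    level falls below `threshold` or `max_epochs` passes have been made.
    Returns a history of (level, epoch, loss) tuples."""
    history = []
    for level, images in levels:
        for epoch in range(max_epochs):
            loss = train_step(images)
            history.append((level, epoch, loss))
            if loss < threshold:
                break  # loss is low enough: move on to the next level
    return history
```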

The method of claim 1, wherein the base neural network and the neural network are convolutional neural networks having one or more layers, the one or more layers being at least one of a convolutional layer, a pooling layer, a rectified linear unit layer, a fully connected layer, and a loss layer.

The method of claim 1, wherein the one or more parameters include at least one of: one or more kernels of each layer of the base neural network; one or more connections, for the one or more layers of the base neural network, between the one or more kernels of a first layer and one or more kernels of a second layer following the first layer; a plurality of values of the one or more kernels; and the kernel activation list.

The method of claim 7, wherein the kernel activation list comprises at least one of the output of the first layer of the base neural network, an index of the one or more kernels, and the class labels.
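The preceding claim lists what a kernel activation list may contain: the first-layer output, kernel indices, and class labels. The sketch below shows one hypothetical way such a list could be recorded during training. It assumes per-class averaging of the first-layer output and treats a kernel as "activated" for a class when its mean response exceeds a fixed `cutoff`; neither rule comes from the claims, and the dictionary keys are illustrative.

```python
from collections import defaultdict


def mean_vector(vectors):
    """Element-wise mean of equal-length vectors."""
    vectors = list(vectors)
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def build_activation_list(records, cutoff):
    """records: (class_label, first_layer_output, kernel_responses) tuples
    gathered while training. Returns one entry per class holding the mean
    first-layer output and the indices of kernels whose mean response
    exceeds `cutoff` (a hypothetical activation criterion)."""
    grouped = defaultdict(list)
    for label, first_out, responses in records:
        grouped[label].append((first_out, responses))
    activation_list = []
    for label, items in grouped.items():
        mean_out = mean_vector(out for out, _ in items)
        mean_resp = mean_vector(resp for _, resp in items)
        activation_list.append({
            "class_label": label,
            "first_layer_output": mean_out,
            "kernel_indices": [i for i, v in enumerate(mean_resp) if v > cutoff],
        })
    return activation_list
```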

The method of claim 8, wherein updating the one or more parameters comprises at least one of: adding one or more kernels in at least one of the one or more layers of the base neural network; modifying one or more values of the one or more kernels in the one or more layers of the base neural network; modifying, for the one or more layers of the base neural network, one or more connections between the one or more kernels of the first layer and the one or more kernels of the second layer following the first layer; and modifying the kernel activation list.
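Two of the parameter updates in the preceding claim, adding kernels to a layer and modifying the connections to the next layer, can be illustrated with plain data structures. In this hypothetical model, a layer is a list of kernel weight vectors and `connections[i]` is the set of next-layer kernels fed by kernel `i`; adding a kernel appends to both without touching the existing ones. A real convolutional layer would also need the next layer's input weights resized, which this sketch omits.

```python
def add_kernel(kernels, connections, weights, targets=()):
    """Append a kernel to a layer without disturbing existing kernels.

    kernels:     weight vectors of the layer (modified in place).
    connections: connections[i] is the set of next-layer kernel indices
                 fed by kernel i (modified in place).
    weights:     weight vector of the new kernel.
    targets:     next-layer kernels the new kernel should feed.
    Returns the index of the new kernel, e.g. for recording it in a
    kernel activation list."""
    kernels.append(list(weights))
    connections.append(set(targets))
    return len(kernels) - 1


def rewire(connections, source, targets):
    """Replace the outgoing connections of kernel `source`."""
    connections[source] = set(targets)
```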

The method of claim 2, wherein the first user input and the second user input are received via a user interface.

A training system for generating a neural network for object recognition in an image, the training system comprising:
a processor; and
a memory communicatively coupled to the processor,
wherein the memory stores processor instructions which, when executed, cause the processor to:
receive information about a hierarchical relationship between one or more objects;
provide, based on the information about the hierarchical relationship between the one or more objects, one or more images of a training dataset corresponding to each object of the one or more objects as input to a base neural network, wherein the base neural network is associated with one or more parameters;
determine, for each input image, a loss value of the base neural network based on an output of the base neural network and a class label corresponding to the input image, wherein the output indicates similarity values for the one or more objects; and
update, for each input image, the one or more parameters based on at least one of the output, the loss value of the base neural network, and a second user input, to generate the neural network, wherein the neural network is used for the object recognition;
wherein, to receive the information about the hierarchical relationship between the one or more objects, the processor is configured to:
determine a total number of the one or more objects in the training dataset and a total number of the one or more images in the training dataset corresponding to each object;
construct a hierarchical tree structure having one or more nodes and one or more edges based on at least one of the total number of the one or more images corresponding to each object, a first user input, and a relationship between the one or more objects, wherein the one or more nodes indicate the one or more objects and the one or more edges indicate the relationships between the one or more objects;
receive, from a user, the hierarchical tree structure indicating the information about the hierarchical relationship between the one or more objects; and
modify the hierarchical tree structure via a user interface using the first user input, based on at least one of the loss value of the base neural network, an output of a first layer of the base neural network, and addition of one or more images to the training dataset; and
wherein, to recognize the object using the neural network, the processor is configured to:
provide a test image received from the user as input to a first layer of the neural network;
compare an output of the first layer of the neural network with a kernel activation list;
identify one or more kernels from the kernel activation list based on the comparison;
propagate the test image through the identified kernels of one or more layers of the neural network; and
recognize the object in the test image based on an output of the neural network.

The training system of claim 11, wherein, to provide the one or more images, the processor is configured to:
retrieve the one or more images and the class labels from the training dataset;
sort the one or more images based on the class labels; and
provide the one or more images corresponding to each object based on a position of the object in the information about the hierarchical relationship between the one or more objects.

The training system of claim 11, wherein the processor is configured to determine the loss value based on a hierarchical cross-entropy technique.

The training system of claim 11, wherein the processor is configured to provide the one or more images corresponding to each of the objects based on a result of comparing the loss value with a predetermined threshold.

The training system of claim 11, wherein the processor is configured to generate the neural network as a convolutional neural network having one or more layers, the one or more layers being at least one of a convolutional layer, a pooling layer, a rectified linear unit layer, a fully connected layer, and a loss layer.

The training system of claim 11, wherein the one or more parameters include at least one of: one or more kernels of each layer of the base neural network; one or more connections, for the one or more layers of the base neural network, between the one or more kernels of a first layer and one or more kernels of a second layer following the first layer; a plurality of values of the one or more kernels; and the kernel activation list.

The training system of claim 16, wherein the kernel activation list comprises at least one of the output of the first layer of the base neural network, an index of the one or more kernels, and the class labels.

The training system of claim 11, wherein, to update the one or more parameters, the processor is configured to perform at least one of: adding one or more kernels in at least one of the one or more layers of the base neural network; modifying one or more values of the one or more kernels of the one or more layers of the base neural network; modifying, for the one or more layers of the base neural network, one or more connections between the one or more kernels of the first layer and the one or more kernels of the second layer following the first layer; and modifying the kernel activation list.

The training system of claim 11, wherein the processor is configured to receive the first user input and the second user input via a user interface.

The training system of claim 11, wherein, to recognize the object using the neural network, the processor is configured to:
provide a test image received from the user as input to the first layer of the neural network;
compare the output of the first layer of the neural network with the kernel activation list;
identify one or more kernels from the kernel activation list based on the comparison;
propagate the test image through the identified kernels of one or more layers of the neural network; and
recognize the object in the test image based on the output of the neural network.