JP2018528521A

JP2018528521A - Media classification

Info

Publication number: JP2018528521A
Application number: JP2018504642A
Authority: JP
Inventors: タデッセ、ヘノク・テフェラ; チャクラボルティー、アビジット; ジュリアン、デイビッド・ジョナサン; ストークマン、ヘンリクス・マイナルドゥス; デ・ローイ、オーク; バン・デ・サンデ、クーン・エリック・アドリアーン; アンナプレディー、ベンカタ・スリーカンタ・レッディ
Original assignee: Qualcomm Inc
Current assignee: Qualcomm Inc
Priority date: 2015-07-31
Filing date: 2016-07-19
Publication date: 2018-09-27
Also published as: CN107851198A; EP3329425A1; US20170032247A1; BR112018002025A2; WO2017023539A1; KR20180036709A

Abstract

Multi-label classification is improved by determining thresholds and/or scale factors. Selecting a threshold for multi-label classification includes sorting a set of label scores associated with a first label to create an ordered list. Precision values and recall values corresponding to a set of candidate thresholds are calculated from the score values. A threshold is selected from the candidate thresholds for the first label based on a target precision value or a target recall value. In addition, a scale factor is selected for an activation function for multi-label classification, for which a metric of the scores within a range is calculated. The scale factor is adjusted when the metric of the scores is not within the range.

Description

Cross-Reference to Related Applications
[0001] This application claims the benefit of U.S. Provisional Patent Application No. 62/199,865, entitled "MEDIA CLASSIFICATION," filed on July 31, 2015, the disclosure of which is expressly incorporated herein by reference in its entirety.

[0002] Certain aspects of the present disclosure relate generally to machine learning and, more particularly, to improving systems and methods for classifying media, in particular for labeling media files, including picture files.

[0003] An artificial neural network, which may comprise an interconnected group of artificial neurons (e.g., neuron models), is a computational device or represents a method to be performed by a computational device.

[0004] A convolutional neural network is a type of feed-forward artificial neural network. A convolutional neural network may include a collection of neurons that each have a receptive field and that collectively tile an input space. Convolutional neural networks (CNNs) have numerous applications. In particular, CNNs are widely used in the areas of pattern recognition and classification.

[0005] Deep learning architectures, such as deep belief networks and deep convolutional networks, are layered neural network architectures in which the output of a first layer of neurons becomes the input to a second layer of neurons, the output of the second layer of neurons becomes the input to a third layer of neurons, and so on. Deep neural networks may be trained to recognize a hierarchy of features, and so they are increasingly used in object recognition applications. As with convolutional neural networks, computation in these deep learning architectures may be distributed over a population of processing nodes, which may be configured in one or more computational chains. These multi-layered architectures may be trained one layer at a time and may be fine-tuned using back propagation.

[0006] Other models are also available for object recognition. For example, a support vector machine (SVM) is a learning tool that can be applied to classification. A support vector machine includes a separating hyperplane (e.g., a decision boundary) that categorizes data. The hyperplane is defined by supervised learning. A desired hyperplane increases the margin of the training data. In other words, the hyperplane should have the greatest minimum distance to the training examples.

[0007] Although these solutions achieve excellent results on a number of classification benchmarks, their computational complexity can be very high. In addition, training the models can be difficult.

[0008] In one aspect, a method of selecting a threshold for multi-label classification is disclosed. The method includes sorting a set of label scores associated with a first label to create an ordered list. The method also includes calculating, from a plurality of score values, precision values and recall values corresponding to a set of candidate thresholds. The method further includes selecting a threshold from the candidate thresholds for the first label based at least in part on a target precision value or a target recall value.
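The three steps of the threshold-selection method (sort, compute precision and recall per candidate, select against a target) can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions, not the claimed implementation: the function name `select_threshold`, the use of every distinct score as a candidate threshold, the `score >= threshold` decision rule, and the tie-break toward the highest recall are all assumptions introduced here.

```python
def select_threshold(scores, is_positive, target_precision):
    """Pick the lowest candidate threshold whose precision meets the target.

    scores: label scores for one label, one per media file.
    is_positive: ground-truth flags (True if the file carries the label).
    Returns (threshold, precision, recall), or None if no candidate qualifies.
    Assumption: a file is labeled when its score >= threshold.
    """
    # Step 1: sort the (score, truth) pairs into an ordered list, highest first.
    ordered = sorted(zip(scores, is_positive), key=lambda p: p[0], reverse=True)
    total_positives = sum(1 for _, pos in ordered if pos)
    if total_positives == 0:
        return None

    best = None
    true_pos = 0
    for rank, (score, pos) in enumerate(ordered, start=1):
        if pos:
            true_pos += 1
        # Evaluate a candidate only at the end of a run of equal scores.
        if rank < len(ordered) and ordered[rank][0] == score:
            continue
        # Step 2: precision and recall if the threshold were set at this score.
        precision = true_pos / rank
        recall = true_pos / total_positives
        # Step 3: keep the latest (lowest) candidate that meets the target,
        # since lowering the threshold can only keep or raise recall.
        if precision >= target_precision:
            best = (score, precision, recall)
    return best


# Hypothetical scores for a single label across five media files.
example = select_threshold(
    [0.9, 0.8, 0.7, 0.6, 0.5],
    [True, True, False, True, False],
    target_precision=0.75,
)
```

Selecting against a target recall instead would follow the same loop with the comparison applied to `recall`.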

[0009] Another aspect discloses a method of selecting a scale factor for an activation function for multi-label classification. The method includes calculating a metric of scores within a range and adjusting the scale factor when the metric of the scores is not within the range.
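A minimal sketch of such a scale-factor adjustment loop is given below. The summary above does not specify the metric, the activation function, or the adjustment rule, so every concrete choice here is an assumption made for illustration: a logistic activation, the population standard deviation of the activated scores as the metric, a multiplicative step, and pre-activation values roughly centered on zero (so that a steeper slope widens the spread).

```python
import math
import statistics


def scaled_sigmoid(x, scale):
    """Logistic activation with an adjustable scale factor (slope)."""
    return 1.0 / (1.0 + math.exp(-scale * x))


def score_spread(pre_activations, scale):
    """Metric of the scores: population standard deviation (an assumption)."""
    return statistics.pstdev([scaled_sigmoid(x, scale) for x in pre_activations])


def select_scale_factor(pre_activations, lo, hi, scale=1.0, step=1.25, max_iters=50):
    """Adjust the scale factor until the score metric lies within [lo, hi]."""
    for _ in range(max_iters):
        spread = score_spread(pre_activations, scale)
        if lo <= spread <= hi:
            break  # metric is within the range: keep this scale factor
        # Too narrow: steepen the activation. Too wide: flatten it.
        scale = scale * step if spread < lo else scale / step
    return scale


raw = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]  # hypothetical pre-activation values
scale = select_scale_factor(raw, lo=0.29, hi=0.40)
```

The iteration cap is there because, for inputs that are not centered on zero, a steeper slope can saturate all scores toward one end and the simple rule above need not converge.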

[0010] In another aspect, an apparatus for selecting a threshold for multi-label classification in wireless communications is disclosed. The apparatus includes means for sorting a set of label scores associated with a first label to create an ordered list. The apparatus also includes means for calculating, from a plurality of score values, precision values and recall values corresponding to a set of candidate thresholds. The apparatus further includes means for selecting a threshold from the candidate thresholds for the first label based at least in part on a target precision value or a target recall value.

[0011] Another aspect discloses an apparatus for selecting a scale factor for an activation function for multi-label classification. The apparatus includes means for calculating a metric of scores within a range and means for adjusting the scale factor when the metric of the scores is not within the range.

[0012] In another aspect, an apparatus for selecting a threshold for multi-label classification in wireless communications is disclosed. The apparatus has a memory and at least one processor coupled to the memory. The processor(s) is configured to sort a set of label scores associated with a first label to create an ordered list. The processor(s) is also configured to calculate, from a plurality of score values, precision values and recall values corresponding to a set of candidate thresholds. The processor(s) is further configured to select a threshold from the candidate thresholds for the first label based at least in part on a target precision value or a target recall value.

[0013] Another aspect discloses an apparatus for selecting a scale factor for an activation function in wireless communications. The apparatus has a memory and at least one processor coupled to the memory. The processor(s) is configured to calculate a metric of scores within a range and to adjust the scale factor when the metric of the scores is not within the range.

[0014] In another aspect, a non-transitory computer-readable medium for selecting a threshold for multi-label classification is disclosed. The non-transitory computer-readable medium has non-transitory program code recorded thereon which, when executed by the processor(s), causes the processor(s) to sort a set of label scores associated with a first label to create an ordered list. The program code also causes the processor(s) to calculate, from a plurality of score values, precision values and recall values corresponding to a set of candidate thresholds. The program code further causes the processor(s) to select a threshold from the candidate thresholds for the first label based at least in part on a target precision value or a target recall value.

[0015] Another aspect discloses a non-transitory computer-readable medium for selecting a scale factor for an activation function. The non-transitory computer-readable medium has non-transitory program code recorded thereon which, when executed by the processor(s), causes the processor(s) to calculate a metric of scores within a range and to adjust the scale factor when the metric of the scores is not within the range.

[0016] The foregoing has outlined, rather broadly, the features and technical advantages of the present disclosure so that the detailed description that follows may be better understood. Additional features and advantages of the disclosure are described below. Those skilled in the art should appreciate that this disclosure may be readily used as a basis for modifying or designing other structures for carrying out the same purposes of the present disclosure. Those skilled in the art should also realize that such equivalent constructions do not depart from the teachings of the disclosure as set forth in the appended claims. The novel features believed to be characteristic of the disclosure, both as to its organization and its method of operation, together with further objects and advantages, will be better understood from the following description when considered in connection with the accompanying figures. It is to be expressly understood, however, that each of the figures is provided for the purpose of illustration and description only and is not intended as a definition of the limits of the present disclosure.

[0017] The features, nature, and advantages of the present disclosure will become more apparent from the detailed description set forth below when taken in conjunction with the drawings, in which like reference characters identify like elements throughout.

[0018] A diagram illustrating an example implementation of designing a neural network using a system-on-chip (SOC), including a general-purpose processor, in accordance with certain aspects of the present disclosure.
[0019] A diagram illustrating an example implementation of a system in accordance with aspects of the present disclosure.
[0020] A diagram illustrating a neural network in accordance with aspects of the present disclosure.
[0021] A block diagram illustrating an exemplary deep convolutional network (DCN) in accordance with aspects of the present disclosure.
[0022] A block diagram illustrating an exemplary software architecture that may modularize artificial intelligence (AI) functions in accordance with aspects of the present disclosure.
[0023] A block diagram illustrating the run-time operation of an AI application on a smartphone in accordance with aspects of the present disclosure.
[0024] A block diagram illustrating an exemplary binary classification process.
[0025] A diagram illustrating the concepts of precision and recall.
[0026] A diagram illustrating an overall example of a classification process in accordance with aspects of the present disclosure.
[0027] A block diagram illustrating an exemplary slope selection function of the classification process in accordance with aspects of the present disclosure.
[0028] A block diagram illustrating an exemplary threshold selection function of the classification process in accordance with aspects of the present disclosure.
[0029] A graph illustrating scores for a label in accordance with aspects of the present disclosure.
[0030] A graph illustrating threshold selection utilizing an F-measure in accordance with aspects of the present disclosure.
[0031] A flow diagram illustrating a method for selecting a threshold for multi-label classification in accordance with aspects of the present disclosure.
[0032] A flow diagram illustrating a method for selecting a scale factor for an activation function in accordance with aspects of the present disclosure.

[0033] The detailed description set forth below, in connection with the appended drawings, is intended as a description of various configurations and is not intended to represent the only configurations in which the concepts described herein may be practiced. The detailed description includes specific details for the purpose of providing a thorough understanding of the various concepts. However, it will be apparent to those skilled in the art that these concepts may be practiced without these specific details. In some instances, well-known structures and components are shown in block diagram form in order to avoid obscuring such concepts.

[0034] Based on these teachings, one skilled in the art should appreciate that the scope of the disclosure is intended to cover any aspect of the disclosure, whether implemented independently of or in combination with any other aspect of the disclosure. For example, an apparatus may be implemented or a method may be practiced using any number of the aspects set forth. In addition, the scope of the disclosure is intended to cover such an apparatus or method practiced using other structure, functionality, or structure and functionality in addition to or other than the various aspects of the disclosure set forth. It should be understood that any aspect of the disclosure disclosed may be embodied by one or more elements of a claim.

[0035] The word "exemplary" is used herein to mean "serving as an example, instance, or illustration." Any aspect described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other aspects.

[0036] Although particular aspects are described herein, many variations and permutations of these aspects fall within the scope of the disclosure. Although some benefits and advantages of the preferred aspects are mentioned, the scope of the disclosure is not intended to be limited to particular benefits, uses, or objectives. Rather, aspects of the disclosure are intended to be broadly applicable to different technologies, system configurations, networks, and protocols, some of which are illustrated by way of example in the figures and in the following description of the preferred aspects. The detailed description and drawings are merely illustrative of the disclosure rather than limiting, the scope of the disclosure being defined by the appended claims and equivalents thereof.

[0037] Aspects of the present disclosure are directed to systems and methods for labeling media files. A database of media files may associate each stored media file with one or more labels. In addition, a function calculates a score for each label based on the media file. For example, for a photograph of a boat on a lake, the function may calculate high scores for the labels "boat" and "lake" and low scores for the remaining labels in the database (e.g., "car" and "barn"). The function may be a neural network, and the scores may be the activation levels of the output layer of the neural network.

[0038] One aspect of the present disclosure is directed to a method of selecting classifier thresholds for a labeling system on a per-label basis. For the example image of a boat on a lake, the calculated score for "boat" may be 0.8 and the calculated score for "lake" may be 0.9. It may be separately determined that images in the database that actually contain a boat (and are labeled as such) reliably have a score of 0.6 or higher, and that images that contain a lake (and are labeled as such) reliably have a score of 0.8 or higher. This means that an image in the database for which the function (neural network) calculated a score of 0.7 for "lake" is less likely to contain a lake, whereas an image with a calculated score of 0.7 for "boat" is more likely to contain a boat. This information about the database may then be applied to set a different threshold for the classifier system for each label. In the present example, the threshold for "boat" may be set to 0.6 and the threshold for "lake" may be set to 0.8.
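The per-label thresholds in this example can be applied as in the short Python sketch below. The function name `assign_labels`, the dictionary layout, and the `score >= threshold` rule are assumptions for illustration; the thresholds 0.6 and 0.8 and the scores are the values used in the paragraph above.

```python
def assign_labels(scores, thresholds):
    """Return the labels whose score reaches that label's own threshold.

    scores: mapping from label to the score computed for one media file.
    thresholds: mapping from label to its per-label threshold.
    Labels without a score for this file are treated as absent.
    """
    return sorted(
        label
        for label, threshold in thresholds.items()
        if label in scores and scores[label] >= threshold
    )


thresholds = {"boat": 0.6, "lake": 0.8}

# The boat-on-a-lake image from the text: both labels clear their thresholds.
boat_on_lake = assign_labels({"boat": 0.8, "lake": 0.9}, thresholds)

# The same score of 0.7 is enough for "boat" but not for "lake".
ambiguous = assign_labels({"boat": 0.7, "lake": 0.7}, thresholds)
```

With a single global threshold the second image would receive either both labels or neither, which is the situation per-label thresholds avoid.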

[0039] Another aspect of the present disclosure is directed to modifying the calculation of scores in the final layer of a neural network. Over a database of images, the original function (neural network) may calculate a set of scores for a given label that can be characterized as having a very narrow distribution. For example, when the allowable range is between -1.0 and 1.0, all of the values may fall between 0.7 and 0.9. As a result, the threshold-setting operation disclosed above may not generalize sufficiently to new images. For example, if images of lakes tend to be scored at values of 0.8 to 0.9, but images that do not contain a lake frequently have calculated scores for "lake" between 0.75 and 0.79, the performance of the labeling system becomes extremely sensitive to the exact placement of the threshold at 0.8.

[0040] Furthermore, due to normal variation among images, the function (neural network) may be expected to calculate a score slightly below 0.8 for some new images that contain a lake. Similarly, a new image that does not contain a lake may have a calculated score slightly above 0.8. Accordingly, setting the threshold for "lake" at 0.8 may produce many false-negative and false-positive results. To mitigate this sensitivity, aspects of the present disclosure are directed to modifying the activation function for the final layer of the neural network. As a result of this modification, the scores for a given label may have a wider, more uniform distribution across the distribution of images. Aspects of the present disclosure provide improved generalization because the calculated scores of positive examples and the calculated scores of negative examples may be spread farther apart.
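The sensitivity to threshold placement can be made concrete with a small Python sketch that counts false positives and false negatives for a narrowly distributed set of scores. The function name `count_errors` and the specific score lists are hypothetical; the score ranges (0.8 to 0.9 for lake images, 0.75 to 0.79 for non-lake images) follow the example in the text.

```python
def count_errors(scores, is_positive, threshold):
    """Return (false_positives, false_negatives) for a score >= threshold rule."""
    false_pos = sum(1 for s, pos in zip(scores, is_positive) if s >= threshold and not pos)
    false_neg = sum(1 for s, pos in zip(scores, is_positive) if s < threshold and pos)
    return false_pos, false_neg


# Hypothetical "lake" scores: lake images first, then non-lake images.
lake_scores = [0.81, 0.85, 0.90, 0.75, 0.77, 0.79]
has_lake = [True, True, True, False, False, False]

at_080 = count_errors(lake_scores, has_lake, 0.80)  # threshold exactly as chosen
at_078 = count_errors(lake_scores, has_lake, 0.78)  # nudged down by 0.02
at_082 = count_errors(lake_scores, has_lake, 0.82)  # nudged up by 0.02
```

Moving the threshold by only 0.02 in either direction already introduces an error in this toy data set, which is why spreading positive and negative scores farther apart is expected to help generalization.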

[0041] FIG. 1 illustrates an example implementation of the aforementioned labeling of media files using a system-on-chip (SOC) 100, which may include a general-purpose processor (CPU) or multi-core general-purpose processors (CPUs) 102, in accordance with certain aspects of the present disclosure. Variables (e.g., neural signals and synaptic weights), system parameters associated with a computational device (e.g., a neural network with weights), delays, frequency bin information, and task information may be stored in a memory block associated with a neural processing unit (NPU) 108, in a memory block associated with the CPU 102, in a memory block associated with a graphics processing unit (GPU) 104, in a memory block associated with a digital signal processor (DSP) 106, in a dedicated memory block 118, or may be distributed across multiple blocks. Instructions executed at the general-purpose processor 102 may be loaded from a program memory associated with the CPU 102 or may be loaded from the dedicated memory block 118.

[0042] The SOC 100 may also include additional processing blocks tailored to specific functions, such as the GPU 104 and the DSP 106; a connectivity block 110, which may include fourth generation long term evolution (4G LTE®) connectivity, unlicensed Wi-Fi® connectivity, USB connectivity, Bluetooth® connectivity, and the like; and a multimedia processor 112 that may, for example, detect and recognize gestures. In one implementation, the NPU is implemented in the CPU, DSP, and/or GPU. The SOC 100 may also include a sensor processor 114, image signal processors (ISPs), and/or navigation 120, which may include a global positioning system.

[0043] The SOC may be based on an ARM instruction set. In one aspect of the present disclosure, instructions are loaded into at least one processor, such as the general-purpose processor 102, that is coupled to a memory. The instructions may comprise code for sorting a set of label scores associated with a first label to create an ordered list. The instructions loaded into the general-purpose processor 102 may also comprise code for calculating, from a set of score values, precision values and recall values corresponding to a set of candidate thresholds. In addition, the instructions loaded into the general-purpose processor 102 may comprise code for selecting a threshold from the candidate thresholds for the first label based on a target precision value or a target recall value.

[0044] In another aspect of the present disclosure, the instructions loaded into the general-purpose processor 102 may comprise code for calculating a metric of scores within a range. In addition, the instructions loaded into the general-purpose processor 102 may comprise code for adjusting the scale factor when the metric of the scores is not within the range.

[0045] FIG. 2 illustrates an example implementation of a system 200 in accordance with certain aspects of the present disclosure. As illustrated in FIG. 2, the system 200 may have multiple local processing units 202 that may perform various operations of the methods described herein. Each local processing unit 202 may comprise a local state memory 204 and a local parameter memory 206 that may store parameters of a neural network. In addition, the local processing unit 202 may have a local (neuron) model program (LMP) memory 208 for storing a local model program, a local learning program (LLP) memory 210 for storing a local learning program, and a local connection memory 212. Furthermore, as illustrated in FIG. 2, each local processing unit 202 may interface with a configuration processor unit 214, which provides configurations for the local memories of the local processing unit, and with a routing connection processing unit 216, which provides routing between the local processing units 202.

[0046] Deep learning architectures may perform an object recognition task by learning to represent inputs at successively higher levels of abstraction in each layer, thereby building up a useful feature representation of the input data. In this way, deep learning addresses a major bottleneck of traditional machine learning. Prior to the advent of deep learning, a machine learning approach to an object recognition problem may have relied heavily on human-engineered features, perhaps in combination with a shallow classifier. A shallow classifier may be, for example, a two-class linear classifier, in which a weighted sum of the feature vector components is compared with a threshold to predict to which class the input belongs. Human-engineered features may be templates or kernels tailored to a specific problem domain by engineers with domain expertise. Deep learning architectures, in contrast, may learn to represent features that are similar to what a human engineer might design, but they do so through training. Furthermore, a deep network may learn to represent and recognize new types of features that a human might not have considered.
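As a concrete illustration of the shallow classifier mentioned above, the following Python sketch implements a two-class linear classifier that compares a weighted sum of feature vector components with a threshold. The function name `linear_classify`, the class encoding (1 and 0), and the example weights are assumptions chosen for illustration.

```python
def linear_classify(features, weights, threshold):
    """Two-class linear classifier: 1 if the weighted sum exceeds the threshold."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    weighted_sum = sum(w * x for w, x in zip(weights, features))
    return 1 if weighted_sum > threshold else 0


# Hypothetical hand-engineered feature vector and weights.
features = [1.0, 2.0]
weights = [0.5, 0.25]  # weighted sum = 0.5 * 1.0 + 0.25 * 2.0 = 1.0

above = linear_classify(features, weights, threshold=0.9)
below = linear_classify(features, weights, threshold=1.5)
```

In the traditional pipeline the feature vector would come from human-engineered templates or kernels; in a deep network, the earlier layers learn that representation instead.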

[0047] A deep learning architecture may learn a hierarchy of features. If presented with visual data, for example, the first layer may learn to recognize simple features, such as edges, in the input stream. If presented with auditory data, the first layer may learn to recognize spectral power at specific frequencies. The second layer, taking the output of the first layer as input, may learn to recognize combinations of features, such as simple shapes for visual data or combinations of sounds for auditory data. Higher layers may learn to represent complex shapes in visual data or words in auditory data. Still higher layers may learn to recognize common visual objects or spoken phrases.

[0048] Deep learning architectures may perform especially well when applied to problems that have a natural hierarchical structure. For example, the classification of motorized vehicles may benefit from first learning to recognize wheels, windshields, and other features. These features may be combined at higher layers in different ways to recognize cars, trucks, and airplanes.

[0049] Neural networks may be designed with a variety of connectivity patterns. In feedforward networks, information is passed from lower to higher layers, with each neuron in a given layer communicating to neurons in higher layers. As described above, a hierarchical representation may be built up in successive layers of a feedforward network. Neural networks may also have recurrent or feedback (also called top-down) connections. In a recurrent connection, the output from a neuron in a given layer is communicated to another neuron in the same layer. A recurrent architecture may be helpful in recognizing patterns that unfold in time. A connection from a neuron in a given layer to a neuron in a lower layer is called a feedback (or top-down) connection. A network with many feedback connections may be helpful when the recognition of a high-level concept may aid in discriminating particular low-level features of an input.

[0050] Referring to FIG. 3A, the connections between layers of a neural network may be fully connected 302 or locally connected 304. In a fully connected network 302, a neuron in a given layer may communicate its output to every neuron in the next layer. Alternatively, in a locally connected network 304, a neuron in a given layer may be connected to a limited number of neurons in the next layer. A convolutional network 306 may be locally connected, and is furthermore a special case in which the connection strengths associated with each neuron in a given layer are shared (e.g., 308). More generally, a locally connected layer of a network may be configured so that each neuron in the layer has the same or a similar connectivity pattern, but with connection strengths that may have different values (e.g., 310, 312, 314, and 316). The locally connected connectivity pattern may give rise to spatially distinct receptive fields in a higher layer, because the higher layer neurons in a given region may receive inputs that are tuned through training to the properties of a restricted portion of the total input to the network.

[0051] Locally connected neural networks may be well suited to problems in which the spatial location of inputs is meaningful. For instance, a network 300 designed to recognize visual features from a car-mounted camera may develop higher layer neurons with different properties depending on their association with the lower versus the upper portion of the image. Neurons associated with the lower portion of the image may learn to recognize lane markings, for example, while neurons associated with the upper portion of the image may learn to recognize traffic lights, traffic signs, and the like.

[0052] A DCN may be trained with supervised learning. During training, a DCN may be presented with an image 326, such as a cropped image of a speed limit sign, and a "forward pass" may then be computed to produce an output 328. The output 328 may be a vector of values corresponding to features such as "sign," "60," and "100." The network designer may want the DCN to output a high score for some of the neurons in the output feature vector, for example the ones corresponding to "sign" and "60" as shown in the output 328 for a network 300 that has been trained. Before training, the output produced by the DCN is likely to be incorrect, and so an error may be calculated between the actual output and the target output. The weights of the DCN may then be adjusted so that the output scores of the DCN are more closely aligned with the target.

[0053] To adjust the weights properly, a learning algorithm may compute a gradient vector for the weights. The gradient may indicate the amount by which the error would increase or decrease if a weight were adjusted slightly. At the top layer, the gradient may correspond directly to the value of a weight connecting an activated neuron in the penultimate layer and a neuron in the output layer. In lower layers, the gradient may depend on the value of the weights and on the computed error gradients of the higher layers. The weights may then be adjusted so as to reduce the error. This manner of adjusting the weights may be referred to as "back propagation" because it involves a "backward pass" through the neural network.

[0054] In practice, the error gradient of the weights may be calculated over a small number of examples, so that the calculated gradient approximates the true error gradient. This approximation method may be referred to as stochastic gradient descent. Stochastic gradient descent may be repeated until the achievable error rate of the entire system has stopped decreasing or until the error rate has reached a target level.
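The weight update described in the two preceding paragraphs can be illustrated on a deliberately tiny model. The sketch below fits a one-weight, one-bias linear model with mini-batch stochastic gradient descent under a squared-error loss. The model, loss, learning rate, batch size, and step count are all assumptions chosen for illustration; a DCN would obtain its gradients by back propagation through many layers instead.

```python
import random


def sgd_fit(xs, ys, lr=0.1, batch_size=4, steps=3000, seed=0):
    """Fit y ~ w * x + b by mini-batch stochastic gradient descent.

    Each step estimates the error gradient from a small random batch,
    which approximates the gradient over the full data set.
    """
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        batch = [rng.randrange(n) for _ in range(batch_size)]
        grad_w = 0.0
        grad_b = 0.0
        for i in batch:
            err = (w * xs[i] + b) - ys[i]          # actual output minus target
            grad_w += 2.0 * err * xs[i] / batch_size
            grad_b += 2.0 * err / batch_size
        # Move the weights a small step against the gradient to reduce error.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Noise-free data generated from y = 2x + 1 (hypothetical example data).
xs = [i / 10 for i in range(11)]
ys = [2.0 * x + 1.0 for x in xs]
w, b = sgd_fit(xs, ys)
```

A fixed step count stands in here for the stopping rules named in the text (error no longer decreasing, or a target error level reached).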

[0055] After learning, the DCN may be presented with new images 326, and a forward pass through the network may yield an output 328 that may be considered an inference or a prediction of the DCN.

[0056] Deep belief networks (DBNs) are probabilistic models comprising multiple layers of hidden nodes. DBNs may be used to extract a hierarchical representation of training data sets. A DBN may be obtained by stacking layers of Restricted Boltzmann Machines (RBMs). An RBM is a type of artificial neural network that can learn a probability distribution over a set of inputs. Because RBMs can learn a probability distribution in the absence of information about the class to which each input should be categorized, RBMs are often used in unsupervised learning. Using a hybrid unsupervised and supervised paradigm, the bottom RBMs of a DBN may be trained in an unsupervised manner and may serve as feature extractors, and the top RBM may be trained in a supervised manner (on a joint distribution of inputs from the previous layer and target classes) and may serve as a classifier.

[0057] Deep convolutional networks (DCNs) are networks of convolutional networks, configured with additional pooling and normalization layers. DCNs have achieved state-of-the-art performance on many tasks. DCNs may be trained using supervised learning, in which both the input and output targets are known for many exemplars and are used to modify the weights of the network by use of gradient descent methods.

[0058] DCNs may be feedforward networks. In addition, as described above, the connections from a neuron in a first layer of a DCN to a group of neurons in the next higher layer are shared across the neurons in the first layer. The feedforward and shared connections of DCNs may be exploited for fast processing. The computational burden of a DCN may be much less, for example, than that of a similarly sized neural network that comprises recurrent or feedback connections.

[0059] The processing of each layer of a convolutional network may be considered a spatially invariant template or basis projection. If the input is first decomposed into multiple channels, such as the red, green, and blue channels of a color image, then the convolutional network trained on that input may be considered three-dimensional, with two spatial dimensions along the axes of the image and a third dimension capturing color information. The outputs of the convolutional connections may be considered to form a feature map in the subsequent layers 318, 320, and 322, with each element of the feature map (e.g., 320) receiving input from a range of neurons in the previous layer (e.g., 318) and from each of the multiple channels. The values in the feature map may be further processed with a non-linearity, such as a rectification, max(0, x). Values from adjacent neurons may be further pooled (324), which corresponds to down sampling and may provide additional local invariance and dimensionality reduction. Normalization, which corresponds to whitening, may also be applied through lateral inhibition between neurons in the feature map.
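The rectification and pooling steps named above can be sketched on a small feature map held as a list of rows. The 2x2 window, the stride of 2, and the use of max pooling are assumptions for illustration; the text only says that values from adjacent neurons are pooled.

```python
def relu(feature_map):
    """Apply the rectification max(0, x) to every element of a 2-D feature map."""
    return [[max(0.0, v) for v in row] for row in feature_map]


def max_pool_2x2(feature_map):
    """Down-sample a 2-D feature map by taking the max over 2x2 blocks.

    Assumes non-overlapping windows (stride 2); a trailing odd row or
    column is dropped.
    """
    pooled = []
    for r in range(0, len(feature_map) - 1, 2):
        row = []
        for c in range(0, len(feature_map[r]) - 1, 2):
            row.append(max(feature_map[r][c], feature_map[r][c + 1],
                           feature_map[r + 1][c], feature_map[r + 1][c + 1]))
        pooled.append(row)
    return pooled


feature_map = [
    [-1, 2, 0, -3],
    [4, -5, 1, 1],
    [0, 0, -2, -2],
    [-1, -1, 3, -4],
]
pooled = max_pool_2x2(relu(feature_map))  # a 2x2 map summarizing the 4x4 input
```

Pooling halves each spatial dimension here, which is the dimensionality reduction the paragraph refers to, and a small shift of a feature inside a window leaves the pooled value unchanged.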

[0060] The performance of deep learning architectures may increase as more labeled data points become available or as computational power increases. Modern deep neural networks are routinely trained with computing resources that are thousands of times greater than what was available to a typical researcher just fifteen years ago. New architectures and training paradigms may further boost the performance of deep learning. Rectified linear units may reduce a training issue known as vanishing gradients. New training techniques may reduce over-fitting and thus enable larger models to achieve better generalization. Encapsulation techniques may abstract data in a given receptive field and further boost overall performance.

[0061] FIG. 3B is a block diagram illustrating an exemplary deep convolutional network 350. The deep convolutional network 350 may include multiple different types of layers based on connectivity and weight sharing. As shown in FIG. 3B, the exemplary deep convolutional network 350 includes multiple convolution blocks (e.g., C1 and C2). Each of the convolution blocks may be configured with a convolution layer, a normalization layer (LNorm), and a pooling layer. The convolution layers may include one or more convolutional filters, which may be applied to the input data to generate a feature map. Although only two convolution blocks are shown, the present disclosure is not so limiting, and instead any number of convolution blocks may be included in the deep convolutional network 350 according to design preference. The normalization layer may be used to normalize the output of the convolution filters. For example, the normalization layer may provide whitening or lateral inhibition. The pooling layer may provide down sampling aggregation over space for local invariance and dimensionality reduction.

[0062] The parallel filter banks of a deep convolutional network may, for example, be loaded on a CPU 102 or GPU 104 of an SOC 100, optionally based on an ARM instruction set, to achieve high performance and low power consumption. In alternative embodiments, the parallel filter banks may be loaded on the DSP 106 or an ISP 116 of an SOC 100. In addition, the DCN may access other processing blocks that may be present on the SOC, such as processing blocks dedicated to sensors 114 and navigation 120.

[0063] The deep convolutional network 350 may also include one or more fully connected layers (e.g., FC1 and FC2). The deep convolutional network 350 may further include a logistic regression (LR) layer. Between each layer of the deep convolutional network 350 are weights (not shown) that are to be updated. The output of each layer may serve as an input to a succeeding layer in the deep convolutional network 350 in order to learn hierarchical feature representations from input data (e.g., image, audio, video, sensor data, and/or other input data) supplied at the first convolution block C1.

[0064] FIG. 4 is a block diagram illustrating an exemplary software architecture 400 that may modularize artificial intelligence (AI) functions. Using the architecture, applications 402 may be designed that may cause various processing blocks of an SOC 420 (e.g., a CPU 422, a DSP 424, a GPU 426, and/or an NPU 428) to perform supporting computations during run-time operation of the application 402.

[0065] The AI application 402 may be configured to call functions defined in a user space 404 that may, for example, provide for the detection and recognition of a scene indicative of the location in which the device currently operates. The AI application 402 may, for example, configure a microphone and a camera differently depending on whether the recognized scene is an office, a lecture hall, a restaurant, or an outdoor setting such as a lake. The AI application 402 may make a request to compiled program code associated with a library defined in a SceneDetect application programming interface (API) 406 to provide an estimate of the current scene. This request may ultimately rely on the output of a deep neural network configured to provide scene estimates based on, for example, video and positioning data.

[0066] A run-time engine 408, which may be compiled code of a runtime framework, may be further accessible to the AI application 402. The AI application 402 may cause the run-time engine, for example, to request a scene estimate at a particular time interval or when triggered by an event detected by the user interface of the application. When caused to estimate the scene, the run-time engine may in turn send a signal to an operating system 410, such as a Linux (registered trademark) kernel 412, running on the SOC 420. The operating system 410, in turn, may cause a computation to be performed on the CPU 422, the DSP 424, the GPU 426, the NPU 428, or some combination thereof. The CPU 422 may be accessed directly by the operating system, and other processing blocks may be accessed through a driver, such as a driver 414-418 for the DSP 424, for the GPU 426, or for the NPU 428. In the exemplary example, the deep neural network may be configured to run on a combination of processing blocks, such as the CPU 422 and the GPU 426, or may be run on the NPU 428, if present.

[0067] FIG. 5 is a block diagram illustrating the run-time operation 500 of an AI application on a smartphone 502. The AI application may include a pre-process module 504 that may be configured (e.g., using the JAVA (registered trademark) programming language) to convert the format of an image 506 and then crop and/or resize the image 508. The pre-processed image may then be communicated to a classify application 510 that contains a SceneDetect backend engine 512, which may be configured (e.g., using the C programming language) to detect and classify scenes based on visual input. The SceneDetect backend engine 512 may be configured to further preprocess 514 the image by scaling 516 and cropping 518. For example, the image may be scaled and cropped so that the resulting image is 224 pixels by 224 pixels. These dimensions may map to the input dimensions of a neural network. The neural network may be configured by a deep neural network block 520 to cause various processing blocks of the SOC 100 to further process the image pixels with a deep neural network. The results of the deep neural network may then be thresholded 522 and passed through an exponential smoothing block 524 in the classify application 510. The smoothed results may then cause a change of the settings and/or the display of the smartphone 502.
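The thresholding 522 and exponential smoothing 524 stages at the end of the FIG. 5 pipeline could look roughly like the following. The disclosure does not give their formulas, so the zeroing of sub-threshold scores, the smoothing weight `alpha`, and both function names are assumptions made only for illustration.

```python
def threshold_scores(scores, threshold):
    """Keep scores at or above the threshold; set the others to 0.0 (assumed behavior)."""
    return [s if s >= threshold else 0.0 for s in scores]


def exponential_smoothing(previous, current, alpha=0.5):
    """Blend the current per-label scores with the previously smoothed scores.

    alpha near 1 follows the newest frame closely; alpha near 0 changes slowly.
    """
    return [alpha * c + (1.0 - alpha) * p for p, c in zip(previous, current)]


# Two labels, two consecutive frames (hypothetical network outputs).
smoothed = [0.0, 1.0]
frame_scores = threshold_scores([1.0, 0.2], threshold=0.5)        # [1.0, 0.0]
smoothed = exponential_smoothing(smoothed, frame_scores, alpha=0.25)
```

Smoothing across frames keeps a single noisy classification from immediately changing the settings or display of the device.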
Scale factor and threshold selection for classification
[0068] Aspects of the present disclosure are directed to media classification, in particular for labeling media files including picture files. Aspects are directed to binary and multi-label classification. In particular, in the illustrative example, three separate sample images contain different colored soccer balls. The first image includes only a blue soccer ball, the second image includes only a green soccer ball, and the third image includes only a red soccer ball. Each image can be labeled based on the color of the soccer ball in the image. This process of assigning labels is called classification. In another case, a single image contains several colored soccer balls. For the same task, images are labeled with multiple colors. This is called multi-label classification.

[0069] In machine learning, a classifier provides a score for each label, together with a decision function. The decision function checks whether a score is above a certain threshold. For single-label classifiers, the scores for all labels are taken into account to decide which label is correct.

[0070] For multi-label classification, each label may be correct regardless of the scores of the other labels. The threshold is therefore important in deciding which labels belong to an object. Working with classifiers that output false positives with very high scores or false negatives with very low scores makes it difficult to find the right threshold. Aspects of the present disclosure are directed to improving scale factor and threshold selection for classification.

[0071] FIG. 6 is an exemplary flow diagram 600 illustrating a binary classification process. In one example, the classification process includes a training phase 601 and a prediction phase 602. In the training phase 601, images are input to a feature extractor 610. Those skilled in the art will appreciate that any type of multimedia file, including sound or images, may be input to the feature extractor. In this illustrative example, each image is passed through the feature extractor 610 to obtain the features and the classification of the image. In this example, a binary classification of the image is obtained. The binary classification may be a positive or a negative response. Alternatively, the output may be a "yes" or "no" label. A learning function 612 learns the features for a particular concept or element of training.

[0072] Next, in the prediction phase 602, an image is passed through a feature extractor 620. The features are fed to a classifier 622, and based on the learned model utilized by the learning function 612, the classifier 622 outputs a score. A decision function 624 receives the score. In one aspect, the decision function 624 determines whether the score is greater than or less than 0. When the score is greater than 0 and the threshold is 0 (or there is no threshold), the output is "yes." Otherwise, the output is "no." The decision function may be based on a global threshold (e.g., 0) utilized by the binary classifier.

[0073] Additional criteria, such as precision and recall, may be utilized in determining the performance of a classifier. Precision is the number of true positives (i.e., the number of items correctly labeled as belonging to the positive class) divided by the total number of elements labeled as belonging to the positive class (i.e., the sum of true positives and false positives, which are items incorrectly labeled as belonging to the class). Recall is the number of true positives divided by the total number of elements that actually belong to the positive class (i.e., the sum of true positives and false negatives, which are items that were not labeled as belonging to the positive class but should have been). FIG. 7 illustrates the concepts of precision, recall, and the F-measure formula (which is based on precision and recall).

[0074] The following is an illustrative example of media classification. A machine is configured to perform the task of labeling soccer balls in sample images. In particular, the machine utilizes a classifier that takes an image as input and outputs a list of labels (e.g., colors) for the image. In this example, the machine is given three images with blue balls, three images with green balls, and four images with red balls. The classifier outputs the label "red" for only two of the images that have a red ball, and incorrectly outputs the label "red" for one image that has a green ball. Precision is the number of images correctly labeled "red" divided by the total number of images labeled "red." In this example, the precision for the label "red" is 2/3. Recall is the number of images correctly labeled "red" divided by the total number of images that should have been labeled "red." In this example, the recall is 2/4 = 1/2.
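The arithmetic of the soccer ball example can be checked with a short sketch. The helper name is hypothetical, and the predictions for the two red-ball images that were not labeled "red" are invented here (the text does not say which labels they received); they do not affect the result for the label "red."

```python
from fractions import Fraction


def precision_recall(actual, predicted, label):
    """Return (precision, recall) for one label as exact fractions."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == label and p == label)  # true positives
    fp = sum(1 for a, p in pairs if a != label and p == label)  # false positives
    fn = sum(1 for a, p in pairs if a == label and p != label)  # false negatives
    precision = Fraction(tp, tp + fp) if tp + fp else Fraction(0)
    recall = Fraction(tp, tp + fn) if tp + fn else Fraction(0)
    return precision, recall


# Three blue, three green, and four red images, as in the example.
actual = ["blue"] * 3 + ["green"] * 3 + ["red"] * 4
# One green image is mislabeled "red"; only two red images are labeled "red".
predicted = ["blue"] * 3 + ["green", "green", "red"] + ["red", "red", "blue", "green"]

precision, recall = precision_recall(actual, predicted, "red")
print(precision, recall)  # prints "2/3 1/2"
```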

[0075] The optimal threshold is one at which precision and recall are both 1. This rarely happens, because false positives and false negatives affect accuracy. Precision and recall are equal when the number of objects assigned to a label equals the number of objects that should be assigned to that label. In the previous example, labeling four images "red" would make precision and recall equal. Labeling more than four images would very likely decrease precision, because it increases the chance of labeling a wrong image "red." Labeling fewer than four images may decrease recall, because the numerator shrinks whenever a correctly labeled image is left out. There is therefore a trade-off between precision and recall. In other words, higher precision is obtained at the expense of recall, and vice versa.

[0076] FIG. 8A is a block diagram illustrating an overall example of a classification process 800 in accordance with aspects of the present disclosure. The classification process includes a training phase 801 and a prediction phase 802. In the training phase 801, a feature extractor 810 receives each image and/or media file and outputs the features and binary classification of the received image. A learning function 812 learns the specific features for a particular concept or element of training.

[0077] In the prediction phase 802, a feature extractor 820 receives each image and outputs the features of the image to a classifier 822. Based on the received features and the trained model, the classifier 822 outputs a raw score to an activation function 824. The activation function 824 normalizes the score to fall within a range, for example between 0 and 1 or between -1 and 1. Additionally, a slope selection function 830 determines a scaling factor (e.g., a slope) for use by the activation function 824. Various parameters may be changed to affect the factor used by the activation function 824, as described below. The activation function 824 may be a logistic function, a tanh function, or a linear normalization function.
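One plausible reading of an activation function with a scale factor is that the raw score is multiplied by the factor before being squashed, so that the factor sets the slope of the curve around zero. The sketch below takes that reading for the logistic and tanh cases; the function names and the exact placement of the scale factor are assumptions rather than details given in the text.

```python
import math


def scaled_logistic(raw_score, scale=1.0):
    """Map a raw score into (0, 1); `scale` controls the slope at 0."""
    z = scale * raw_score
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)              # avoids overflow for very negative z
    return ez / (1.0 + ez)


def scaled_tanh(raw_score, scale=1.0):
    """Map a raw score into (-1, 1); `scale` controls the slope at 0."""
    return math.tanh(scale * raw_score)


# The same raw score lands closer to the end of the range with a larger scale.
gentle = scaled_logistic(1.0, scale=1.0)
steep = scaled_logistic(1.0, scale=5.0)
```

With a small scale factor, normalized scores cluster near the middle of the range; with a large one, they saturate toward its ends, which is why the choice of factor changes the distribution of scores.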

[0078] The normalized score output by the activation function 824 is received by a decision function 826. A threshold selection function 840 determines the threshold for use by the decision function 826. In some aspects, the threshold selection function 840 determines a threshold other than 0. The threshold selection function 840 is described in more detail below.

[0079] FIG. 8B illustrates an example of the slope selection function 830. The slope selection function 830 uses an image data set to create a list of raw scores for a particular concept/label. To obtain a desirable distribution of scores, the slope selection function 830 determines a scale factor (e.g., a slope). In particular, raw scores 832 from a database of images are supplied. An activation function 833 is applied to the raw scores 832. The scores are then sorted at block 835. In one example, the sorted scores are also shown in a graph. The proportion of scores that fall within a particular range is calculated at block 837. A target proportion is also established. The target proportion indicates the percentage of images that should fall within a range of values. When the target proportion is met, the scale factor 838 is set to the value that produced that number of images within the range. For example, if the target proportion is 90%, then once 90% of the images fall within the particular range, the scale factor 838 is set to the value that yielded that quantity of images in the range.

[0080] When the target percentage is not met, the scale factor is adjusted. For example, the scale factor may be adjusted incrementally by a value alpha at block 839. The adjusted scale factor 836 is applied by the activation function at block 833, and the process repeats. The scale factor is adjusted incrementally in this way until the target percentage is achieved. In another aspect, the slope selection function 830 uses a target slope instead of a target percentage. For example, a particular slope may be targeted over the range between "a" and "b". Optionally, in another aspect, rather than incrementing the scale factor, an alternative search function may be used by defining a minimum scale factor and a maximum scale factor. In particular, for example, the scale factor may be adjusted by dividing the difference between the minimum and maximum scale factors by two to determine a new scale factor. In another optional aspect, only the range end points are used when iterating through different scale factors. In yet another aspect, the scale factor may be approximated by using the inverse of the activation function at the range end points.
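The two search strategies described above (stepping by alpha, and halving the interval between a minimum and a maximum scale factor) might look like the sketch below. Several things here are assumptions and not taken from the disclosure: the tanh activation, a range that contains 0 (so the in-range fraction does not increase as the scale factor grows), a negative alpha, the stopping tolerance `tol`, and all function names.

```python
import math

def fraction_in_range(raw_scores, scale, lo, hi):
    """Fraction of tanh-normalized scores that fall inside [lo, hi]."""
    normalized = [math.tanh(scale * r) for r in raw_scores]
    return sum(1 for s in normalized if lo <= s <= hi) / len(normalized)

def incremental_search(raw_scores, target, lo, hi,
                       scale=1.0, alpha=-0.05, max_steps=1000):
    """Step the scale factor by alpha until the target fraction is met."""
    for _ in range(max_steps):
        if fraction_in_range(raw_scores, scale, lo, hi) >= target:
            return scale
        scale += alpha
    raise ValueError("target fraction not reached")

def bisection_search(raw_scores, target, lo, hi,
                     min_scale=0.0, max_scale=10.0, tol=1e-3):
    """Halve the [min_scale, max_scale] interval until it is narrower than tol.

    Assumes the target is met at min_scale and that the in-range
    fraction does not increase as the scale factor grows.
    """
    while max_scale - min_scale > tol:
        # New scale factor: half the difference, measured from the minimum.
        mid = min_scale + (max_scale - min_scale) / 2.0
        if fraction_in_range(raw_scores, mid, lo, hi) >= target:
            min_scale = mid  # target still met, try a steeper slope
        else:
            max_scale = mid  # too many scores saturate, back off
    return min_scale

# Hypothetical raw scores for one label.
raw = [-3.0, -2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0, 3.0, 4.0]
print(incremental_search(raw, target=0.9, lo=-0.9, hi=0.9))
print(bisection_search(raw, target=0.9, lo=-0.9, hi=0.9))
```

For this data the largest scale factor that keeps 90% of the scores inside the range is about 0.49. The incremental search stops at the first step that satisfies the target (about 0.45 with this step size), while the interval-halving search converges to within `tol` of the bound in far fewer evaluations.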

[0081] The threshold selection function 840 may be used to adjust the threshold, as shown in FIG. 8C. Improved accuracy may be observed when the threshold is adjusted to a value other than 0. Adjusting the threshold also makes it possible to trade off precision against recall. For example, the threshold may be tuned to obtain a desired precision at the expense of recall, or vice versa. In addition, adjusting the threshold excludes ambient values (values reflecting the objects that surround the particular object of interest in the image). For example, if an image contains a tree and a chair on a meadow against a blue sky, the classifier may be trained to treat trees, grass, and sky as general surroundings. Adjusting the threshold excludes the ambient values associated with the tree and the grass, so that the values associated with the chair are taken into account.

[0082] In one aspect, the threshold may be determined by sorting the scores for each label, calculating precision and recall after sorting, and then performing a calculation to select the threshold. FIG. 8C illustrates an example of a threshold selection function 840 that determines the threshold. First, for a particular label, the normalized scores for all inputs are obtained. A sort function 842 sorts the normalized scores and may optionally create an ordered list. For example, the scores may be sorted in descending order. Using the sorted list of scores, a calculation function 844 calculates precision and recall by treating each score as a threshold. In other words, a precision value and a recall value are calculated for each threshold in the corresponding set of candidate thresholds. A threshold may then be selected from the candidate thresholds. The selection may be based at least in part on a target precision value and/or a target recall value.
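The sort and calculation steps (functions 842 and 844) can be illustrated with a short sketch. It assumes ground-truth labels are available for each scored image, that scores are distinct, and that an image is predicted positive when its score is at or above the candidate threshold. The function name `precision_recall_per_threshold` and the sample data are hypothetical.

```python
def precision_recall_per_threshold(scores, truths):
    """Return (threshold, precision, recall) for every score used as a threshold.

    scores: normalized scores for one label, one per image.
    truths: 1 if the image really carries the label, else 0.
    """
    # Sort image indices by score, highest first (sort function 842).
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    total_positive = sum(truths)
    results = []
    true_positive = 0
    for rank, i in enumerate(order, start=1):
        # With scores[i] as the threshold, the top `rank` images are predicted positive.
        true_positive += truths[i]
        precision = true_positive / rank
        recall = true_positive / total_positive if total_positive else 0.0
        results.append((scores[i], precision, recall))
    return results

# Hypothetical scores and ground truth for one label.
scores = [0.9, 0.8, 0.7, 0.4, 0.2, -0.3]
truths = [1, 1, 0, 1, 0, 0]
curve = precision_recall_per_threshold(scores, truths)
for threshold, precision, recall in curve:
    print(f"{threshold:+.1f}  precision={precision:.2f}  recall={recall:.2f}")
```

Lowering the threshold from 0.8 to 0.4 in this example raises recall from 2/3 to 1.0 while precision falls from 1.0 to 0.75, which is the trade-off the selection step has to resolve.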

[0083] Alternatively, rather than using every score, the average of consecutive scores may be used as the threshold value. After the precision and recall are calculated, a threshold is selected by the selection function 846 based on the precision and recall. The selection function analyzes combinations of thresholds and their associated precision and/or recall values.
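One reading of "the average of consecutive scores" is the midpoint between each pair of neighbours in the sorted list. The sketch below follows that reading; it is an interpretation, and the function name `midpoint_thresholds` is hypothetical.

```python
def midpoint_thresholds(scores):
    """Candidate thresholds halfway between neighbouring sorted scores."""
    ordered = sorted(scores, reverse=True)
    # Pair each score with the next lower one and average the pair.
    return [(a + b) / 2.0 for a, b in zip(ordered, ordered[1:])]

print(midpoint_thresholds([0.5, 1.0, -0.5, 0.0]))  # [0.75, 0.25, -0.25]
```

A midpoint threshold sits between two observed scores, so a small change in either neighbouring score does not change which side of the threshold it falls on.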

[0084] In another aspect, the threshold may be based on the value corresponding to the maximum F-score. This may occur, for example, when no candidate has a precision value above the target precision or a recall value above the target recall, or when the recall is too low once the target precision is met (or the precision is too low once the target recall is met). Further, the threshold may be selected based on an F-score that uses a beta value to lean toward precision or recall.

[0085] FIG. 9 is a graph 900 showing scores for a particular label (e.g., "sky"). A classifier may be trained to learn different concepts in images. Several thousand images are passed through the classifier, and the sorted, normalized scores for "sky" are shown as line 901. Each score can take a value between -1.0 and 1.0. Precision and recall are then calculated and plotted as lines 902 and 903, respectively. The precision line 902 and the recall line 903 use a different scale, 0.0 to 1.0, shown on the right side of the graph. Line 904 is the threshold line. Line 904 indicates the selected threshold, which is the classifier score at the point where the dashed line intersects the sorted score line 901. Each score along line 901 may be selected as a candidate threshold, and a vertical threshold line (e.g., 904) is analyzed to determine the precision and recall for that candidate threshold.

[0086] Various methods may be used to select the threshold, including, but not limited to, target precision and maximum F-measure. With target precision, for example, the score whose precision is slightly above the target precision is selected. For example, the threshold may be selected by targeting 90% precision.
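A target-precision selection could be sketched as below. The text only says that a score with precision slightly above the target is chosen; picking the candidate whose precision is closest above the target, and breaking ties in favour of higher recall, is an assumption of this sketch, as are the function name and the sample candidates.

```python
def select_by_target_precision(candidates, target_precision):
    """Pick a threshold whose precision is just above the target.

    candidates: list of (threshold, precision, recall) tuples.
    Returns None when no candidate reaches the target, so the caller
    can fall back to another method.
    """
    eligible = [c for c in candidates if c[1] >= target_precision]
    if not eligible:
        return None
    # Closest precision above the target; among equals prefer higher recall.
    best = min(eligible, key=lambda c: (c[1], -c[2]))
    return best[0]

# Hypothetical (threshold, precision, recall) candidates for one label.
candidates = [
    (0.9, 1.0, 1 / 3),
    (0.8, 1.0, 2 / 3),
    (0.7, 2 / 3, 2 / 3),
    (0.4, 0.75, 1.0),
    (0.2, 0.6, 1.0),
    (-0.3, 0.5, 1.0),
]
print(select_by_target_precision(candidates, 0.7))  # 0.4
print(select_by_target_precision(candidates, 0.9))  # 0.8
print(select_by_target_precision(candidates, 1.1))  # None
```

Returning `None` instead of raising keeps the no-candidate case explicit, which is where a fallback such as the maximum F-measure would take over.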

[0087] In some scenarios, the threshold may not meet the target percentage, and a fallback method is used. For example, the F-measure function 848 of FIG. 8C may use the F-measure formula and select the threshold based on the value corresponding to the maximum F-score. The F-measure formula is as follows.
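The formula itself does not appear in this text. The standard F-measure with a beta parameter, which matches the surrounding description (indexed by image count i, with β > 1 weighting recall more heavily), is given here as an assumed reconstruction:

```latex
F_\beta(i) = \frac{(1 + \beta^2)\,\mathrm{precision}(i)\,\mathrm{recall}(i)}
                  {\beta^2\,\mathrm{precision}(i) + \mathrm{recall}(i)},
\qquad
i^{*} = \operatorname*{arg\,max}_{i} F_\beta(i)
```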

where i is the image count. argmax(F_β) is calculated to determine an index into the list of scores. The score at that location is the threshold. The beta (β) parameter provides a way to lean toward recall or precision. When beta is greater than 1 (β > 1), more weight is placed on recall. Adjusting the F-measure provides feedback on precision and/or recall. In addition, the beta value in the F-measure formula may be manipulated to influence the precision value or the recall value. FIG. 10 is a graph 1000 illustrating threshold selection using the F-measure. Lines 1005, 1006, and 1007 are the results of using different beta values for the F-measure.
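The argmax step can be sketched in a few lines. The code uses the standard F-beta definition, which is assumed to be the formula intended here. The function names and the sample (threshold, precision, recall) candidates are hypothetical.

```python
def f_beta(precision, recall, beta=1.0):
    """Standard F-beta score; beta > 1 weights recall more, beta < 1 precision."""
    denominator = beta ** 2 * precision + recall
    if denominator == 0:
        return 0.0
    return (1 + beta ** 2) * precision * recall / denominator

def select_by_max_f(candidates, beta=1.0):
    """Return the threshold at the index that maximizes F-beta."""
    best_index = max(
        range(len(candidates)),
        key=lambda i: f_beta(candidates[i][1], candidates[i][2], beta),
    )
    return candidates[best_index][0]

# Hypothetical (threshold, precision, recall) candidates for one label.
candidates = [
    (0.9, 1.0, 1 / 3),
    (0.8, 1.0, 2 / 3),
    (0.7, 2 / 3, 2 / 3),
    (0.4, 0.75, 1.0),
    (0.2, 0.6, 1.0),
    (-0.3, 0.5, 1.0),
]
print(select_by_max_f(candidates, beta=1.0))  # balanced weighting
print(select_by_max_f(candidates, beta=0.5))  # leans toward precision
```

With these candidates, the balanced score picks the threshold 0.4, while β = 0.5 moves the choice up to 0.8, where precision is higher and recall lower.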

[0088] Optionally, in an alternative aspect, a bias value is used rather than a threshold. In particular, instead of applying a threshold, the threshold may be embedded in the scores by adding a bias or by normalizing the scores based on the threshold. Further, in an optional aspect, rather than using the actual scores, the score for each concept may be encoded, so that the encoded score does not represent the score of each concept.
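Embedding the threshold as a bias can be illustrated as follows. The sketch assumes the bias is simply the negated threshold and that a label is assigned when the score is at or above the decision boundary. The function name `embed_threshold` and the sample values are hypothetical.

```python
def embed_threshold(scores, threshold):
    """Shift scores by -threshold so the decision boundary moves to zero."""
    return [s - threshold for s in scores]

scores = [0.9, 0.4, 0.2]
threshold = 0.4

# Decision with an explicit per-label threshold.
with_threshold = [s >= threshold for s in scores]
# Same decision after the threshold is folded into the scores.
with_bias = [s >= 0.0 for s in embed_threshold(scores, threshold)]

print(with_threshold)  # [True, True, False]
print(with_bias)       # [True, True, False]
```

After the shift, every label can be compared against the same boundary of zero, so the decision function no longer needs a per-label threshold table.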

[0089] In one configuration, a model is configured for sorting a set of label scores associated with a first label to create an ordered list. The model is also configured for calculating, from a set of score values (e.g., multiple score values), precision values and recall values corresponding to a set of candidate thresholds. The model is further configured for selecting a threshold from the candidate thresholds for the first label based on a target precision or a target recall. The model includes means for sorting, means for calculating, and/or means for selecting. In one aspect, the sorting means, calculating means, and/or selecting means may be the general-purpose processor 102, program memory associated with the general-purpose processor 102, memory block 118, local processing units 202, and/or the routing connection processing units 216 configured to perform the functions recited. In another configuration, the aforementioned means may be any module or any apparatus configured to perform the functions recited by the aforementioned means.

[0090] In another configuration, a model is configured for sorting a set of label scores associated with a first label to create an ordered list. The model is also configured for calculating a metric of scores within a range and for adjusting a scale factor when the metric of scores is not within the range. The model includes means for calculating a metric and/or means for adjusting. In one aspect, the metric calculating means and/or adjusting means may be the general-purpose processor 102, program memory associated with the general-purpose processor 102, memory block 118, local processing units 202, and/or the routing connection processing units 216 configured to perform the functions recited. In another configuration, the aforementioned means may be any module or any apparatus configured to perform the functions recited by the aforementioned means.

[0091] In addition, the model may also include means for incrementing the scale factor and/or means for dividing. In one aspect, the incrementing means and the dividing means may be the general-purpose processor 102, program memory associated with the general-purpose processor 102, memory block 118, local processing units 202, and/or the routing connection processing units 216 configured to perform the functions recited. In another configuration, the aforementioned means may be any module or any apparatus configured to perform the functions recited by the aforementioned means.

[0092] According to certain aspects of the present disclosure, each local processing unit 202 may be configured to determine parameters of the network based on one or more desired functional features of the network, and to develop the one or more functional features toward the desired functional features as the determined parameters are further adapted, tuned, and updated.

[0093] FIG. 11 illustrates a method 1100 for selecting a threshold for multi-label classification. At block 1102, the process sorts a set of label scores associated with a first label to create an ordered list. At block 1104, the process calculates, from a set of score values, precision values and recall values corresponding to a set of candidate thresholds. At block 1106, the process selects a threshold from the candidate thresholds for the first label based on a target precision or a target recall.

[0094] FIG. 12 illustrates a method 1200 for selecting a scale factor for an activation function. At block 1202, the process calculates a metric of scores within a range. At block 1204, the process adjusts the scale factor when the metric of scores is not within the range.

[0095] The various operations of the methods described above may be performed by any suitable means capable of performing the corresponding functions. The means may include various hardware and/or software component(s) and/or module(s), including, but not limited to, a circuit, an application specific integrated circuit (ASIC), or a processor. Generally, where there are operations illustrated in the figures, those operations may have corresponding counterpart means-plus-function components with similar numbering.

[0096] As used herein, the term "determining" encompasses a wide variety of actions. For example, "determining" may include calculating, computing, processing, deriving, investigating, looking up (e.g., looking up in a table, a database, or another data structure), ascertaining, and the like. Additionally, "determining" may include receiving (e.g., receiving information), accessing (e.g., accessing data in a memory), and the like. Furthermore, "determining" may include resolving, selecting, choosing, establishing, and the like.

[0097] As used herein, a phrase referring to "at least one of" a list of items refers to any combination of those items, including single members. As an example, "at least one of: a, b, or c" is intended to cover a, b, c, a-b, a-c, b-c, and a-b-c.

[0098] The various illustrative logical blocks, modules, and circuits described in connection with the present disclosure may be implemented or performed with a general-purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device (PLD), discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein. A general-purpose processor may be a microprocessor, but in the alternative, the processor may be any commercially available processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration.

[0099] The steps of a method or algorithm described in connection with the present disclosure may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in any form of storage medium known in the art. Some examples of storage media that may be used include random access memory (RAM), read only memory (ROM), flash memory, erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM (registered trademark)), registers, a hard disk, a removable disk, a CD-ROM, and so forth. A software module may comprise a single instruction or many instructions, and may be distributed over several different code segments, among different programs, and across multiple storage media. A storage medium may be coupled to a processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor.

[00100] The methods disclosed herein comprise one or more steps or actions for achieving the described method. The method steps and/or actions may be interchanged with one another without departing from the scope of the claims. In other words, unless a specific order of steps or actions is specified, the order and/or use of specific steps and/or actions may be modified without departing from the scope of the claims.

[00101] The functions described may be implemented in hardware, software, firmware, or any combination thereof. If implemented in hardware, an example hardware configuration may comprise a processing system in a device. The processing system may be implemented with a bus architecture. The bus may include any number of interconnecting buses and bridges depending on the specific application of the processing system and the overall design constraints. The bus may link together various circuits including a processor, machine-readable media, and a bus interface. The bus interface may be used to connect, among other things, a network adapter to the processing system via the bus. The network adapter may be used to implement signal processing functions. For certain aspects, a user interface (e.g., keypad, display, mouse, joystick, etc.) may also be connected to the bus. The bus may also link various other circuits such as timing sources, peripherals, voltage regulators, power management circuits, and the like, which are well known in the art and therefore will not be described any further.

[00102] The processor may be responsible for managing the bus and general processing, including the execution of software stored on the machine-readable media. The processor may be implemented with one or more general-purpose and/or special-purpose processors. Examples include microprocessors, microcontrollers, DSP processors, and other circuitry that can execute software. Software shall be construed broadly to mean instructions, data, or any combination thereof, whether referred to as software, firmware, middleware, microcode, hardware description language, or otherwise. Machine-readable media may include, by way of example, random access memory (RAM), flash memory, read only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), registers, magnetic disks, optical disks, hard drives, or any other suitable storage medium, or any combination thereof. The machine-readable media may be embodied in a computer program product. The computer program product may comprise packaging materials.

[00103] In a hardware implementation, the machine-readable media may be part of the processing system separate from the processor. However, as those skilled in the art will readily appreciate, the machine-readable media, or any portion thereof, may be external to the processing system. By way of example, the machine-readable media may include a transmission line, a carrier wave modulated by data, and/or a computer product separate from the device, all of which may be accessed by the processor through the bus interface. Alternatively, or in addition, the machine-readable media, or any portion thereof, may be integrated into the processor, as may be the case with cache and/or general register files. Although the various components discussed, such as the local component, may be described as having a specific location, they may also be configured in various ways, such as certain components being configured as part of a distributed computing system.

[00104] The processing system may be configured as a general-purpose processing system with one or more microprocessors providing the processor functionality and external memory providing at least a portion of the machine-readable media, all linked together with other supporting circuitry through an external bus architecture. Alternatively, the processing system may comprise one or more neuromorphic processors for implementing the neuron models and models of neural systems described herein. As another alternative, the processing system may be implemented with an application specific integrated circuit (ASIC) with the processor, the bus interface, the user interface, supporting circuitry, and at least a portion of the machine-readable media integrated into a single chip, or with one or more field programmable gate arrays (FPGAs), programmable logic devices (PLDs), controllers, state machines, gated logic, discrete hardware components, or any other suitable circuitry, or any combination of circuits that can perform the various functions described throughout this disclosure. Those skilled in the art will recognize how best to implement the described functionality for the processing system depending on the particular application and the overall design constraints imposed on the overall system.

[00105] The machine-readable media may comprise a number of software modules. The software modules include instructions that, when executed by the processor, cause the processing system to perform various functions. The software modules may include a transmission module and a receiving module. Each software module may reside in a single storage device or be distributed across multiple storage devices. By way of example, a software module may be loaded into RAM from a hard drive when a triggering event occurs. During execution of the software module, the processor may load some of the instructions into cache to increase access speed. One or more cache lines may then be loaded into a general register file for execution by the processor. When referring to the functionality of a software module below, it will be understood that such functionality is implemented by the processor when executing instructions from that software module. Furthermore, it should be appreciated that aspects of the present disclosure result in improvements to the functioning of the processor, computer, machine, or other system implementing such aspects.

[00106] If implemented in software, the functions may be stored on, or transmitted over, a non-transitory computer-readable medium as one or more instructions or code. Computer-readable media include both computer storage media and communication media, including any medium that facilitates transfer of a computer program from one place to another. A storage medium may be any available medium that can be accessed by a computer. By way of example, and not limitation, such computer-readable media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. In addition, any connection is properly termed a computer-readable medium. For example, if the software is transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared (IR), radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. Disk and disc, as used herein, include compact disc (CD), laser disc (registered trademark), optical disc, digital versatile disc (DVD), floppy (registered trademark) disk, and Blu-ray (registered trademark) disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Thus, in some aspects, computer-readable media may comprise non-transitory computer-readable media (e.g., tangible media). In addition, for other aspects, computer-readable media may comprise transitory computer-readable media (e.g., a signal). Combinations of the above should also be included within the scope of computer-readable media.

[00107] Thus, certain aspects may comprise a computer program product for performing the operations presented herein. For example, such a computer program product may comprise a computer-readable medium having instructions stored (and/or encoded) thereon, the instructions being executable by one or more processors to perform the operations described herein. For certain aspects, the computer program product may include packaging material.

[00108] Further, it should be appreciated that modules and/or other appropriate means for performing the methods and techniques described herein can be downloaded and/or otherwise obtained by a user terminal and/or base station as applicable. For example, such a device can be coupled to a server to facilitate the transfer of means for performing the methods described herein. Alternatively, the various methods described herein can be provided via storage means (e.g., RAM, ROM, or a physical storage medium such as a compact disc (CD) or floppy disk), such that a user terminal and/or base station can obtain the various methods upon coupling or providing the storage means to the device. Moreover, any other suitable technique for providing the methods and techniques described herein to a device can be utilized.

[00109]特許請求の範囲は、上記で示された厳密な構成および構成要素に限定されないことを理解されたい。上記で説明された方法および装置の構成、動作および詳細において、特許請求の範囲から逸脱することなく、様々な改変、変更および変形が行われ得る。 [00109] It is to be understood that the claims are not limited to the precise configuration and components illustrated above. Various modifications, changes and variations may be made in the arrangement, operation and details of the methods and apparatus described above without departing from the scope of the claims.

以下に本願の出願当初の特許請求の範囲に記載された発明を付記する。
［Ｃ１］
マルチラベル分類のためのしきい値を選択する方法であって、
順序付きリストを作成するために、第１のラベルに関連するラベルスコアのセットをソートすることと、
複数のスコア値から、候補しきい値のセットに対応する適合率値および再現率値を計算することと、
ターゲット適合率値またはターゲット再現率値に少なくとも部分的に基づいて、前記第１のラベルについて前記候補しきい値からしきい値を選択することとを備える、方法。
［Ｃ２］
前記しきい値は、
適合率値が前記ターゲット適合率値を上回る値がない、または前記再現率値が前記ターゲット再現率値を上回る値がない、あるいは
前記ターゲット再現率値が満たされるときに前記適合率値が低すぎる、または前記ターゲット適合率値が満たされるときに前記再現率値が低すぎる、のいずれかのとき、最大Ｆスコアに対応する値に少なくとも部分的に基づく、Ｃ１に記載の方法。
［Ｃ３］
前記選択することが、適合率または再現率のほうへ傾くベータ値を使用するＦスコアに少なくとも部分的に基づく、Ｃ２に記載の方法。
［Ｃ４］
マルチラベル分類のための活性化関数のためのスケールファクタを選択する方法であって、
範囲内のスコアのメトリックを計算することと、
スコアの前記メトリックが前記範囲内にないとき、前記スケールファクタを調節することとを備える、方法。
［Ｃ５］
前記活性化関数が、ロジスティック関数、ｔａｎ−ｈ関数、または線形正規化関数を備える、Ｃ４に記載の方法。
［Ｃ６］
スコアの前記メトリックが割合を備える、Ｃ４に記載の方法。
［Ｃ７］
スコアの前記メトリックが傾きを備える、Ｃ４に記載の方法。
［Ｃ８］
前記スケールファクタを調節することが、
値だけ前記スケールファクタを増分することと、
最小スケールファクタと最大スケールファクタとの間の差を２で除算することとのうちの１つを備える、Ｃ４に記載の方法。
［Ｃ９］
ワイヤレス通信におけるマルチラベル分類のためのしきい値を選択するための装置であって、
メモリと、
前記メモリに結合された少なくとも１つのプロセッサとを備え、前記少なくとも１つのプロセッサが、
順序付きリストを作成するために、第１のラベルに関連するラベルスコアのセットをソートすることと、
複数のスコア値から、候補しきい値のセットに対応する適合率値および再現率値を計算することと、
ターゲット適合率値またはターゲット再現率値に少なくとも部分的に基づいて、前記第１のラベルについて前記候補しきい値からしきい値を選択することと
を行うように構成された、装置。
［Ｃ１０］
前記しきい値は、
適合率値が前記ターゲット適合率値を上回る値がない、または前記再現率値が前記ターゲット再現率値を上回る値がない、あるいは
前記ターゲット再現率値が満たされるときに前記適合率値が低すぎる、または前記ターゲット適合率値が満たされるときに前記再現率値が低すぎる、のいずれかのとき、最大Ｆスコアに対応する値に少なくとも部分的に基づく、Ｃ９に記載の装置。
［Ｃ１１］
前記少なくとも１つのプロセッサが、適合率または再現率のほうへ傾くベータ値を使用するＦスコアに少なくとも部分的に基づいて選択するように構成された、Ｃ１０に記載の装置。
［Ｃ１２］
ワイヤレス通信における活性化関数のためのスケールファクタを選択するための装置であって、
メモリと、
前記メモリに結合された少なくとも１つのプロセッサとを備え、前記少なくとも１つのプロセッサは、
範囲内のスコアのメトリックを計算することと、
スコアの前記メトリックが前記範囲内にないとき、前記スケールファクタを調節することと
を行うように構成された、装置。
［Ｃ１３］
前記活性化関数が、ロジスティック関数、ｔａｎ−ｈ関数、または線形正規化関数を備える、Ｃ１２に記載の装置。
［Ｃ１４］
スコアの前記メトリックが割合を備える、Ｃ１２に記載の装置。
［Ｃ１５］
スコアの前記メトリックが傾きを備える、Ｃ１２に記載の装置。
［Ｃ１６］
前記少なくとも１つのプロセッサが、
値だけ前記スケールファクタを増分することと、
最小スケールファクタと最大スケールファクタとの間の差を２で除算することとのうちの少なくとも１つによって前記スケールファクタを調節するように構成された、Ｃ１２に記載の装置。
［Ｃ１７］
マルチラベル分類のためのしきい値を選択するための非一時的コンピュータ可読媒体であって、前記非一時的コンピュータ可読媒体がそれに記録された非一時的プログラムコードを有し、前記プログラムコードが、
順序付きリストを作成するために、第１のラベルに関連するラベルスコアのセットをソートするためのプログラムコードと、
複数のスコア値から、候補しきい値のセットに対応する適合率値および再現率値を計算するためのプログラムコードと、
ターゲット適合率値またはターゲット再現率値に少なくとも部分的に基づいて、前記第１のラベルについて前記候補しきい値からしきい値を選択するためのプログラムコードとを備える、非一時的コンピュータ可読媒体。
［Ｃ１８］
前記しきい値は、適合率値が前記ターゲット適合率値を上回る値がない、または前記再現率値が前記ターゲット再現率値を上回る値がない、あるいは前記ターゲット再現率値が満たされるときに前記適合率値が低すぎる、または前記ターゲット適合率値が満たされるときに前記再現率値が低すぎる、のいずれかのとき、最大Ｆスコアに対応する値に少なくとも部分的に基づく、Ｃ１７に記載の非一時的コンピュータ可読媒体。
［Ｃ１９］
前記プログラムコードが、適合率または再現率のほうへ傾くベータ値を使用するＦスコアに少なくとも部分的に基づいて選択するように構成された、Ｃ１８に記載の非一時的コンピュータ可読媒体。
［Ｃ２０］
活性化関数のためのスケールファクタを選択するための非一時的コンピュータ可読媒体であって、前記非一時的コンピュータ可読媒体がそれに記録された非一時的プログラムコードを有し、前記プログラムコードは、
範囲内のスコアのメトリックを計算するためのプログラムコードと、
スコアの前記メトリックが前記範囲内にないとき、前記スケールファクタを調節するためのプログラムコードとを備える、非一時的コンピュータ可読媒体。
［Ｃ２１］
前記活性化関数が、ロジスティック関数、ｔａｎ−ｈ関数、または線形正規化関数を備える、Ｃ２０に記載の非一時的コンピュータ可読媒体。
［Ｃ２２］
スコアの前記メトリックが割合を備える、Ｃ２０に記載の非一時的コンピュータ可読媒体。
［Ｃ２３］
スコアの前記メトリックが傾きを備える、Ｃ２０に記載の非一時的コンピュータ可読媒体。
［Ｃ２４］
前記プログラムコードが、
値だけ前記スケールファクタを増分することと、
最小スケールファクタと最大スケールファクタとの間の差を２で除算することとのうちの少なくとも１つによって前記スケールファクタを調節するように構成された、Ｃ２０に記載の非一時的コンピュータ可読媒体。
［Ｃ２５］
ワイヤレス通信におけるマルチラベル分類のためのしきい値を選択するための装置であって、
順序付きリストを作成するために、第１のラベルに関連するラベルスコアのセットをソートするための手段と、
複数のスコア値から、候補しきい値のセットに対応する適合率値および再現率値を計算するための手段と、
ターゲット適合率値またはターゲット再現率値に少なくとも部分的に基づいて、前記第１のラベルについて前記候補しきい値からしきい値を選択するための手段とを備える、装置。
［Ｃ２６］
前記しきい値は、適合率値が前記ターゲット適合率値を上回る値がない、または前記再現率値が前記ターゲット再現率値を上回る値がない、あるいは前記ターゲット再現率値が満たされるときに前記適合率値が低すぎる、または前記ターゲット適合率値が満たされるときに前記再現率値が低すぎる、のいずれかのとき、最大Ｆスコアに対応する値に少なくとも部分的に基づく、Ｃ２５に記載の装置。
［Ｃ２７］
選択するための前記手段が、適合率または再現率のほうへ傾くベータ値を使用するＦスコアに少なくとも部分的に基づく、Ｃ２６に記載の装置。
［Ｃ２８］
ワイヤレス通信におけるマルチラベル分類のための活性化関数のためのスケールファクタを選択する装置であって、
範囲内のスコアのメトリックを計算するための手段と、
スコアの前記メトリックが前記範囲内にないとき、前記スケールファクタを調節するための手段とを備える、装置。
［Ｃ２９］
前記活性化関数が、ロジスティック関数、ｔａｎ−ｈ関数、または線形正規化関数を備える、Ｃ２８に記載の装置。
［Ｃ３０］
スコアの前記メトリックが割合を備える、Ｃ２８に記載の装置。
［Ｃ３１］
スコアの前記メトリックが傾きを備える、Ｃ２８に記載の装置。
［Ｃ３２］
前記スケールファクタを調節するための前記手段が、
値だけ前記スケールファクタを増分するための手段と、
最小スケールファクタと最大スケールファクタとの間の差を２で除算するための手段とのうちの１つを備える、Ｃ２８に記載の装置。
The inventions recited in the claims as originally filed are appended below.
[C1]
A method of selecting a threshold for multi-label classification, comprising:
sorting a set of label scores associated with a first label to create an ordered list;
calculating, from a plurality of score values, precision values and recall values corresponding to a set of candidate thresholds; and
selecting a threshold for the first label from the candidate thresholds based at least in part on a target precision value or a target recall value.
[C2]
The method of C1, wherein the threshold is based at least in part on a value corresponding to a maximum F-score when either:
no precision value exceeds the target precision value or no recall value exceeds the target recall value; or
the precision value is too low when the target recall value is met, or the recall value is too low when the target precision value is met.
[C3]
The method of C2, wherein the selecting is based at least in part on an F-score using a beta value that leans towards precision or recall.
[C4]
A method of selecting a scale factor for an activation function for multi-label classification, comprising:
calculating a metric of scores within a range; and
adjusting the scale factor when the metric of scores is not within the range.
[C5]
The method of C4, wherein the activation function comprises a logistic function, a tan-h function, or a linear normalization function.
[C6]
The method of C4, wherein the metric of scores comprises a percentage.
[C7]
The method of C4, wherein the metric of scores comprises a slope.
[C8]
The method of C4, wherein adjusting the scale factor comprises one of:
incrementing the scale factor by a value; and
dividing a difference between a minimum scale factor and a maximum scale factor by two.
[C9]
An apparatus for selecting a threshold for multi-label classification in wireless communication, comprising:
a memory; and
at least one processor coupled to the memory, the at least one processor being configured to:
sort a set of label scores associated with a first label to create an ordered list;
calculate, from a plurality of score values, precision values and recall values corresponding to a set of candidate thresholds; and
select a threshold for the first label from the candidate thresholds based at least in part on a target precision value or a target recall value.
[C10]
The apparatus of C9, wherein the threshold is based at least in part on a value corresponding to a maximum F-score when either:
no precision value exceeds the target precision value or no recall value exceeds the target recall value; or
the precision value is too low when the target recall value is met, or the recall value is too low when the target precision value is met.
[C11]
The apparatus of C10, wherein the at least one processor is configured to select based at least in part on an F-score that uses a beta value that leans toward precision or recall.
[C12]
An apparatus for selecting a scale factor for an activation function in wireless communication, comprising:
a memory; and
at least one processor coupled to the memory, the at least one processor being configured to:
calculate a metric of scores within a range; and
adjust the scale factor when the metric of scores is not within the range.
[C13]
The apparatus of C12, wherein the activation function comprises a logistic function, a tan-h function, or a linear normalization function.
[C14]
The apparatus of C12, wherein the metric of scores comprises a percentage.
[C15]
The apparatus of C12, wherein the metric of scores comprises a slope.
[C16]
The apparatus of C12, wherein the at least one processor is configured to adjust the scale factor by at least one of:
incrementing the scale factor by a value; and
dividing a difference between a minimum scale factor and a maximum scale factor by two.
[C17]
A non-transitory computer readable medium for selecting a threshold for multi-label classification, the non-transitory computer readable medium having non-transitory program code recorded thereon, the program code comprising:
program code for sorting a set of label scores associated with a first label to create an ordered list;
program code for calculating, from a plurality of score values, precision values and recall values corresponding to a set of candidate thresholds; and
program code for selecting a threshold for the first label from the candidate thresholds based at least in part on a target precision value or a target recall value.
[C18]
The non-transitory computer readable medium of C17, wherein the threshold is based at least in part on a value corresponding to a maximum F-score when either (i) no precision value exceeds the target precision value or no recall value exceeds the target recall value, or (ii) the precision value is too low when the target recall value is met or the recall value is too low when the target precision value is met.
[C19]
The non-transitory computer readable medium of C18, wherein the program code is configured to select based at least in part on an F-score that uses a beta value that leans toward precision or recall.
[C20]
A non-transitory computer readable medium for selecting a scale factor for an activation function, the non-transitory computer readable medium having non-transitory program code recorded thereon, the program code comprising:
program code for calculating a metric of scores within a range; and
program code for adjusting the scale factor when the metric of scores is not within the range.
[C21]
The non-transitory computer readable medium of C20, wherein the activation function comprises a logistic function, a tan-h function, or a linear normalization function.
[C22]
The non-transitory computer readable medium of C20, wherein the metric of scores comprises a percentage.
[C23]
The non-transitory computer readable medium of C20, wherein the metric of scores comprises a slope.
[C24]
The non-transitory computer readable medium of C20, wherein the program code is configured to adjust the scale factor by at least one of:
incrementing the scale factor by a value; and
dividing a difference between a minimum scale factor and a maximum scale factor by two.
[C25]
An apparatus for selecting a threshold for multi-label classification in wireless communication, comprising:
means for sorting a set of label scores associated with a first label to create an ordered list;
means for calculating, from a plurality of score values, precision values and recall values corresponding to a set of candidate thresholds; and
means for selecting a threshold for the first label from the candidate thresholds based at least in part on a target precision value or a target recall value.
[C26]
The apparatus of C25, wherein the threshold is based at least in part on a value corresponding to a maximum F-score when either (i) no precision value exceeds the target precision value or no recall value exceeds the target recall value, or (ii) the precision value is too low when the target recall value is met or the recall value is too low when the target precision value is met.
[C27]
The apparatus of C26, wherein the means for selecting is based at least in part on an F-score that uses a beta value that leans towards precision or recall.
[C28]
An apparatus for selecting a scale factor for an activation function for multi-label classification in wireless communication, comprising:
means for calculating a metric of scores within a range; and
means for adjusting the scale factor when the metric of scores is not within the range.
[C29]
The apparatus of C28, wherein the activation function comprises a logistic function, a tan-h function, or a linear normalization function.
[C30]
The apparatus of C28, wherein the metric of scores comprises a percentage.
[C31]
The apparatus of C28, wherein the metric of scores comprises a slope.
[C32]
The apparatus of C28, wherein the means for adjusting the scale factor comprises one of:
means for incrementing the scale factor by a value; and
means for dividing a difference between a minimum scale factor and a maximum scale factor by two.
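
As an informal illustration of the threshold selection recited in C1 to C3 (not part of the claims), the following Python sketch sorts the scores for one label, computes precision and recall at each candidate threshold, and falls back to the maximum F-score when no candidate meets the target. The function names, the `min_recall` parameter standing in for "recall too low", the tie handling, and the rule of picking the highest-recall candidate among those that meet the target precision are all assumptions made for illustration; the source does not specify them.

```python
from typing import List, Sequence, Tuple


def f_beta(precision: float, recall: float, beta: float = 1.0) -> float:
    """F-score; beta > 1 leans toward recall, beta < 1 toward precision."""
    b2 = beta * beta
    denom = b2 * precision + recall
    return 0.0 if denom == 0 else (1.0 + b2) * precision * recall / denom


def pr_candidates(
    scores: Sequence[float], is_positive: Sequence[bool]
) -> List[Tuple[float, float, float]]:
    """Return (threshold, precision, recall) for each distinct score.

    Scores for one label are sorted in descending order; an item counts as
    predicted positive when its score is >= the candidate threshold.
    """
    ordered = sorted(zip(scores, is_positive), key=lambda p: p[0], reverse=True)
    total_pos = sum(1 for _, y in ordered if y)
    out: List[Tuple[float, float, float]] = []
    tp = 0
    for i, (score, y) in enumerate(ordered):
        tp += int(y)
        last_of_tie = i + 1 == len(ordered) or ordered[i + 1][0] != score
        if last_of_tie:  # evaluate once per distinct score value
            precision = tp / (i + 1)
            recall = tp / total_pos if total_pos else 0.0
            out.append((score, precision, recall))
    return out


def select_threshold(
    scores: Sequence[float],
    is_positive: Sequence[bool],
    target_precision: float,
    min_recall: float = 0.0,  # hypothetical floor for "recall too low"
    beta: float = 1.0,
) -> float:
    """Pick a per-label threshold from the candidate thresholds."""
    candidates = pr_candidates(scores, is_positive)
    if not candidates:
        raise ValueError("no scores supplied")
    ok = [c for c in candidates if c[1] >= target_precision and c[2] >= min_recall]
    if ok:
        # Assumed rule: among acceptable candidates, keep the most recall.
        return max(ok, key=lambda c: c[2])[0]
    # Fallback: no acceptable candidate, so use the maximum F-score.
    return max(candidates, key=lambda c: f_beta(c[1], c[2], beta))[0]
```

Calling `select_threshold(scores, labels, target_precision=0.9)` once per label yields one threshold per label. Passing a `beta` other than 1 tilts the fallback toward precision or recall, in the spirit of C3.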

Claims

A method of selecting a threshold for multi-label classification, comprising:
sorting a set of label scores associated with a first label to create an ordered list;
calculating, from a plurality of score values, precision values and recall values corresponding to a set of candidate thresholds; and
selecting a threshold for the first label from the candidate thresholds based at least in part on a target precision value or a target recall value.

The method of claim 1, wherein the threshold is based at least in part on a value corresponding to a maximum F-score when either:
no precision value exceeds the target precision value or no recall value exceeds the target recall value; or
the precision value is too low when the target recall value is met, or the recall value is too low when the target precision value is met.

The method of claim 2, wherein the selecting is based at least in part on an F-score using a beta value that leans towards precision or recall.

A method of selecting a scale factor for an activation function for multi-label classification, comprising:
calculating a metric of scores within a range; and
adjusting the scale factor when the metric of scores is not within the range.

The method of claim 4, wherein the activation function comprises a logistic function, a tan-h function, or a linear normalization function.

The method of claim 4, wherein the metric of scores comprises a percentage.

The method of claim 4, wherein the metric of scores comprises a slope.

The method of claim 4, wherein adjusting the scale factor comprises one of:
incrementing the scale factor by a value; and
dividing a difference between a minimum scale factor and a maximum scale factor by two.

An apparatus for selecting a threshold for multi-label classification in wireless communication, comprising:
a memory; and
at least one processor coupled to the memory, the at least one processor being configured to:
sort a set of label scores associated with a first label to create an ordered list;
calculate, from a plurality of score values, precision values and recall values corresponding to a set of candidate thresholds; and
select a threshold for the first label from the candidate thresholds based at least in part on a target precision value or a target recall value.

The apparatus of claim 9, wherein the threshold is based at least in part on a value corresponding to a maximum F-score when either:
no precision value exceeds the target precision value or no recall value exceeds the target recall value; or
the precision value is too low when the target recall value is met, or the recall value is too low when the target precision value is met.

The apparatus of claim 10, wherein the at least one processor is configured to select based at least in part on an F-score that uses a beta value that leans toward precision or recall.

An apparatus for selecting a scale factor for an activation function in wireless communication, comprising:
a memory; and
at least one processor coupled to the memory, the at least one processor being configured to:
calculate a metric of scores within a range; and
adjust the scale factor when the metric of scores is not within the range.

The apparatus of claim 12, wherein the activation function comprises a logistic function, a tan-h function, or a linear normalization function.

The apparatus of claim 12, wherein the metric of scores comprises a percentage.

The apparatus of claim 12, wherein the metric of scores comprises a slope.

The apparatus of claim 12, wherein the at least one processor is configured to adjust the scale factor by at least one of:
incrementing the scale factor by a value; and
dividing a difference between a minimum scale factor and a maximum scale factor by two.

A non-transitory computer readable medium for selecting a threshold for multi-label classification, the non-transitory computer readable medium having non-transitory program code recorded thereon, the program code comprising:
program code for sorting a set of label scores associated with a first label to create an ordered list;
program code for calculating, from a plurality of score values, precision values and recall values corresponding to a set of candidate thresholds; and
program code for selecting a threshold for the first label from the candidate thresholds based at least in part on a target precision value or a target recall value.

The non-transitory computer readable medium of claim 17, wherein the threshold is based at least in part on a value corresponding to a maximum F-score when either (i) no precision value exceeds the target precision value or no recall value exceeds the target recall value, or (ii) the precision value is too low when the target recall value is met or the recall value is too low when the target precision value is met.

The non-transitory computer-readable medium of claim 18, wherein the program code is configured to select based at least in part on an F-score that uses a beta value that leans toward precision or recall.

A non-transitory computer readable medium for selecting a scale factor for an activation function, the non-transitory computer readable medium having non-transitory program code recorded thereon, the program code comprising:
program code for calculating a metric of scores within a range; and
program code for adjusting the scale factor when the metric of scores is not within the range.

The non-transitory computer readable medium of claim 20, wherein the activation function comprises a logistic function, a tan-h function, or a linear normalization function.

The non-transitory computer readable medium of claim 20, wherein the metric of scores comprises a percentage.

The non-transitory computer readable medium of claim 20, wherein the metric of scores comprises a slope.

The non-transitory computer readable medium of claim 20, wherein the program code is configured to adjust the scale factor by at least one of:
incrementing the scale factor by a value; and
dividing a difference between a minimum scale factor and a maximum scale factor by two.

An apparatus for selecting a threshold for multi-label classification in wireless communication, comprising:
means for sorting a set of label scores associated with a first label to create an ordered list;
means for calculating, from a plurality of score values, precision values and recall values corresponding to a set of candidate thresholds; and
means for selecting a threshold for the first label from the candidate thresholds based at least in part on a target precision value or a target recall value.

The apparatus of claim 25, wherein the threshold is based at least in part on a value corresponding to a maximum F-score when either (i) no precision value exceeds the target precision value or no recall value exceeds the target recall value, or (ii) the precision value is too low when the target recall value is met or the recall value is too low when the target precision value is met.

The apparatus of claim 26, wherein the means for selecting is based at least in part on an F-score that uses a beta value that leans toward precision or recall.

An apparatus for selecting a scale factor for an activation function for multi-label classification in wireless communication, comprising:
means for calculating a metric of scores within a range; and
means for adjusting the scale factor when the metric of scores is not within the range.

The apparatus of claim 28, wherein the activation function comprises a logistic function, a tan-h function, or a linear normalization function.

The apparatus of claim 28, wherein the metric of scores comprises a percentage.

The apparatus of claim 28, wherein the metric of scores comprises a slope.

The apparatus of claim 28, wherein the means for adjusting the scale factor comprises one of:
means for incrementing the scale factor by a value; and
means for dividing a difference between a minimum scale factor and a maximum scale factor by two.
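
As an informal illustration of the scale factor selection recited in the claims above (not part of the claims), the following Python sketch adjusts the scale factor of a logistic activation by repeatedly halving the interval between a minimum and a maximum scale factor. It rests on several assumptions that the source does not state: the metric is taken to be the fraction of activated scores falling inside a band such as (0.1, 0.9), the goal is for that fraction to land in a target interval, and the names `fraction_in_band` and `tune_scale` and all default values are hypothetical.

```python
import math
from typing import Sequence, Tuple


def logistic(x: float, scale: float) -> float:
    """Logistic activation 1 / (1 + exp(-scale * x)), written with tanh
    so that large arguments cannot overflow."""
    return 0.5 * (1.0 + math.tanh(0.5 * scale * x))


def fraction_in_band(
    raw_scores: Sequence[float],
    scale: float,
    band: Tuple[float, float] = (0.1, 0.9),  # assumed "unsaturated" band
) -> float:
    """Metric: share of activated scores lying strictly inside the band."""
    lo, hi = band
    inside = sum(1 for x in raw_scores if lo < logistic(x, scale) < hi)
    return inside / len(raw_scores)


def tune_scale(
    raw_scores: Sequence[float],
    target: Tuple[float, float] = (0.3, 0.7),  # assumed acceptable metric range
    scale_min: float = 0.01,
    scale_max: float = 100.0,
    max_iter: int = 50,
) -> float:
    """Bisect the scale factor until the metric falls inside the target range.

    A larger scale pushes activations toward 0 or 1, so the metric can only
    stay the same or shrink as the scale grows; that makes bisection usable.
    """
    if not raw_scores:
        raise ValueError("no scores supplied")
    lo_s, hi_s = scale_min, scale_max
    scale = lo_s + (hi_s - lo_s) / 2.0
    for _ in range(max_iter):
        frac = fraction_in_band(raw_scores, scale)
        if target[0] <= frac <= target[1]:
            break  # metric is within range: keep this scale factor
        if frac > target[1]:
            lo_s = scale  # too many mid-range outputs: increase the scale
        else:
            hi_s = scale  # too many saturated outputs: decrease the scale
        # New scale: minimum plus half the (max - min) difference.
        scale = lo_s + (hi_s - lo_s) / 2.0
    return scale
```

The other adjustment option named in the claims, incrementing the scale factor by a fixed value, would replace the bisection step with a line such as `scale += step` inside the same loop.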