JP2022546883A

JP2022546883A - Target object recognition method, apparatus and system

Info

Publication number: JP2022546883A
Application number: JP2021523386A
Authority: JP
Inventors: ▲進▼ ▲呉▼; ▲凱▼歌 ▲陳▼; ▲帥▼ 伊
Original assignee: Sensetime International Pte Ltd
Current assignee: Sensetime International Pte Ltd
Priority date: 2020-08-01
Filing date: 2020-10-30
Publication date: 2022-11-10
Anticipated expiration: 2040-10-30
Also published as: KR20220018467A; AU2020294280A1; US11631240B2; JP7250924B2; CN112513877A; US20220036067A1

Abstract

The present invention provides a target object recognition method, apparatus and system. The method includes: cropping, from an acquired image, a target image containing a plurality of stacked target objects to be recognized; adjusting the height of the target image to a predetermined height; extracting a feature map of the adjusted target image; segmenting the feature map along the dimension corresponding to the height direction of the target image to obtain features of a predetermined number of segments; and recognizing the target objects based on the feature of each of the predetermined number of segments, wherein the height direction of the target image is the direction in which the plurality of target objects to be recognized are stacked. [Selected drawing] Fig. 1

Description

<Cross-reference to related applications>
The present invention claims priority to Singapore Patent Application No. 10202007347V, entitled "Target Object Recognition Method, Apparatus, and System" and filed on August 1, 2020, the entire contents of which are incorporated herein by reference.
The present invention relates to the technical field of computer vision, and more particularly to a target object recognition method, apparatus and system.

In daily production and life, it is often necessary to recognize certain target objects. Taking tabletop games as an example of an entertainment scenario, some tabletop games require the game coins on the table to be recognized so that the type and number of the game coins can be obtained. However, conventional recognition methods have low recognition accuracy.

One aspect of the present invention provides a target object recognition method. The method includes: cropping, from an acquired image, a target image containing a plurality of stacked target objects to be recognized; adjusting the height of the target image to a predetermined height; extracting a feature map of the adjusted target image; segmenting the feature map along the dimension corresponding to the height direction of the target image to obtain features of a predetermined number of segments; and recognizing the target objects based on the feature of each of the predetermined number of segments, wherein the height direction of the target image is the direction in which the plurality of target objects to be recognized are stacked.

According to any embodiment of the present invention, adjusting the height of the target image to the predetermined height includes: scaling the height and width of the target image proportionally until the width of the scaled target image reaches a predetermined width; and, if the height of the scaled target image is greater than the predetermined height, proportionally reducing the height and width of the scaled target image until the height of the reduced target image equals the predetermined height.

According to any embodiment of the present invention, adjusting the height of the target image to the predetermined height includes: scaling the height and width of the target image proportionally until the width of the scaled target image reaches a predetermined width; and, if the height of the scaled target image is less than the predetermined height, padding the scaled target image with first pixels so that the height of the padded target image equals the predetermined height.

According to any embodiment of the present invention, the target objects to be recognized in the target image are sheet-like objects of equal thickness, the plurality of target objects to be recognized are stacked along the thickness direction, and the predetermined height is an integer multiple of the thickness.

According to any embodiment of the present invention, both the extraction of the feature map and the recognition of the target objects are performed by a neural network, and the neural network is trained using sample images and their label information.

According to any embodiment of the present invention, the label information of a sample image includes the labeled type of each target object in the sample image, and the neural network is trained by: performing feature extraction on the resized sample image to obtain a feature map of the resized sample image; recognizing the target objects in the sample image based on the feature of each segment obtained by segmenting the feature map, to obtain a predicted type for each target object in the sample image; and adjusting parameter values of the neural network based on the predicted type and the labeled type of each target object in the sample image.

According to any embodiment of the present invention, the label information of the sample image further includes the number of target objects of each labeled type, and adjusting the parameter values of the neural network includes adjusting them based on: the predicted type of each target object in the sample image; the labeled type of each target object in the sample image; the number of target objects of each labeled type in the sample image; and the number of target objects of each predicted type in the sample image.

According to any embodiment of the present invention, the label information of the sample image further includes the total number of target objects in the sample image, and adjusting the parameter values of the neural network includes adjusting them based on: the predicted type of each target object in the sample image; the labeled type of each target object in the sample image; the sum of the numbers of target objects of each predicted type in the sample image; and the total number of target objects in the sample image.

According to any embodiment of the present invention, the target object recognition method further includes: testing the trained neural network; ranking, based on the test results, the neural network's recognition accuracy for each type of target object to obtain a recognition accuracy ranking; ranking, based on the test results, the neural network's misrecognition rate for each type of target object to obtain a misrecognition rate ranking; and further training the neural network based on the recognition accuracy ranking and the misrecognition rate ranking.

One aspect of the present invention provides a target object recognition apparatus. The apparatus includes: an acquisition unit for cropping, from an acquired image, a target image containing a plurality of stacked target objects to be recognized; an adjustment unit for adjusting the height of the target image to a predetermined height; an extraction unit for extracting a feature map of the adjusted target image; a segmentation unit for segmenting the feature map along the dimension corresponding to the height direction of the target image to obtain features of a predetermined number of segments; and a recognition unit for recognizing the target objects based on the feature of each of the predetermined number of segments, wherein the height direction of the target image is the direction in which the plurality of target objects to be recognized are stacked.

According to any embodiment of the present invention, the adjustment unit scales the height and width of the target image proportionally until the width of the scaled target image reaches a predetermined width and, if the height of the scaled target image is greater than the predetermined height, proportionally reduces the height and width of the scaled target image until the height of the reduced target image equals the predetermined height.

According to any embodiment of the present invention, the adjustment unit scales the height and width of the target image proportionally until the width of the scaled target image reaches a predetermined width and, if the height of the scaled target image is less than the predetermined height, pads the scaled target image with first pixels so that the height of the padded target image equals the predetermined height.

According to any embodiment of the present invention, the target objects to be recognized in the target image are sheet-like objects of equal thickness, the plurality of target objects to be recognized are stacked along the thickness direction, and the predetermined height is an integer multiple of the thickness.

According to any embodiment of the present invention, the label information of a sample image includes the labeled type of each target object in the sample image, and the target object recognition apparatus further includes a training unit. The training unit trains the neural network by: performing feature extraction on the resized sample image to obtain a feature map of the resized sample image; recognizing the target objects in the sample image based on the feature of each segment obtained by segmenting the feature map, to obtain a predicted type for each target object in the sample image; and adjusting parameter values of the neural network based on the predicted type and the labeled type of each target object in the sample image.

According to any embodiment of the present invention, the label information of the sample image further includes the number of target objects of each labeled type, and the training unit adjusts the parameter values of the neural network based on: the predicted type of each target object in the sample image; the labeled type of each target object in the sample image; the number of target objects of each labeled type in the sample image; and the number of target objects of each predicted type in the sample image.

According to any embodiment of the present invention, the label information of the sample image further includes the total number of target objects in the sample image, and the training unit adjusts the parameter values of the neural network based on: the predicted type of each target object in the sample image; the labeled type of each target object in the sample image; the sum of the numbers of target objects of each predicted type in the sample image; and the total number of target objects in the sample image.

According to any embodiment of the present invention, the target object recognition apparatus further includes a test unit. The test unit tests the trained neural network; ranks, based on the test results, the neural network's recognition accuracy for each type of target object to obtain a recognition accuracy ranking; ranks, based on the test results, the neural network's misrecognition rate for each type of target object to obtain a misrecognition rate ranking; and further trains the neural network based on the recognition accuracy ranking and the misrecognition rate ranking.

One aspect of the present invention provides an electronic device. The electronic device includes a processor and a memory for storing processor-executable instructions, and the processor is configured to carry out the target object recognition method of any embodiment of the present invention by invoking the instructions stored in the memory.

One aspect of the present invention provides a computer-readable storage medium. The computer-readable storage medium stores computer program instructions which, when executed by a processor, carry out the target object recognition method of any embodiment of the present invention.

In the target object recognition method, apparatus, electronic device and storage medium according to one or more embodiments of the present invention, the height of a target image cropped from an acquired image is adjusted to a predetermined height, a feature map of the adjusted target image is extracted, the feature map is segmented along the dimension corresponding to the height direction of the target image to obtain features of a predetermined number of segments, and the target objects are recognized based on the feature of each of the predetermined number of segments. Because each segment feature obtained by the segmentation corresponds to the feature map of one target object, recognizing the target objects from the segment features prevents the number of target objects from affecting recognition accuracy, and thus improves the accuracy of target object recognition.

It should be understood that the foregoing general description and the following detailed description are merely exemplary and explanatory, and are not intended to limit the present invention.

The accompanying drawings are incorporated into and constitute a part of this specification. They illustrate embodiments consistent with the present invention and, together with the description, serve to explain the technical solutions of the present invention.
FIG. 1 is a flowchart of a target object recognition method according to at least one embodiment of the present invention.
FIG. 2A is a schematic diagram of a plurality of target objects stacked upright in a target object recognition method according to at least one embodiment of the present invention.
FIG. 2B is a schematic diagram of a plurality of target objects stacked on their sides in a target object recognition method according to at least one embodiment of the present invention.
The remaining drawings are a block diagram of a target object recognition apparatus and a block diagram of an electronic device, each according to at least one embodiment of the present invention.

To help those skilled in the art better understand the present invention, several embodiments of the present invention are described below clearly and completely with reference to the drawings. The described embodiments are only some of the possible embodiments of the present invention. All other embodiments that a person skilled in the art obtains from one or more embodiments of the present invention without creative effort shall fall within the scope of protection of the present invention.

The terminology used in the present invention is for the purpose of describing particular embodiments only and is not intended to limit the present invention. As used in the present invention and the appended claims, the singular forms "a", "said" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should be understood that the term "and/or" as used herein refers to any and all possible combinations of one or more of the associated listed items. In addition, the term "at least one" herein refers to any one of a plurality of items or any combination of at least two of them.

It should be understood that, although the terms first, second, third and so on are used in the present invention to describe various kinds of information, the information is not limited by these terms. The terms are used only to distinguish pieces of information of the same type from one another. For example, without departing from the scope of the present invention, first information may be referred to as second information and, similarly, second information may be referred to as first information. Depending on the context, the word "if" as used herein may be interpreted as "when", "at the time that", or "in response to a particular circumstance".

FIG. 1 is a flowchart of a target object recognition method according to at least one embodiment of the present invention. As shown in FIG. 1, the method may include steps 101 to 105.

In step 101, a target image is cropped from an acquired image, the target image containing a plurality of stacked target objects to be recognized.

In some common situations, the target objects to be recognized are sheet-like objects of various shapes, for example game coins, and the thickness (height) of each target object is usually the same. The plurality of target objects to be recognized are usually stacked along the thickness direction. As shown in FIG. 2A, a plurality of game coins may be stacked along the vertical direction (upright, or "stand", stacking); in that case the height direction (H) of the target image is the vertical direction, and the width direction (W) of the target image is the direction perpendicular to the height direction (H). As shown in FIG. 2B, a plurality of game coins may instead be stacked along the horizontal direction (on their sides, or "float", stacking); in that case the height direction (H) of the target image is the horizontal direction, and the width direction (W) of the target image is the direction perpendicular to the height direction (H).

The target objects to be recognized may be target objects placed in a target area. The target area may be a flat surface (for example, a tabletop), a container (for example, a box), or the like. An image of the target area may be acquired by an image acquisition device near the target area, such as a camera or a camera head.

In embodiments of the present invention, a deep learning network, for example an RCNN (Region Convolutional Neural Network), may be applied to the acquired image to obtain a target object detection result, and the detection result may be a detection box. The detection box may then be used to crop, from the acquired image, the target image containing the plurality of stacked target objects to be recognized. As those skilled in the art will understand, RCNN is merely an example; other deep learning networks may be used for target detection, and the present invention is not limited in this respect.
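
The cropping step can be illustrated with a minimal sketch. It assumes the detector has already produced a detection box; the image is represented as a plain list of pixel rows, and the function name `crop_target_image` and the `(x0, y0, x1, y1)` box convention with exclusive upper bounds are hypothetical choices made for illustration, not taken from the patent.

```python
def crop_target_image(image, box):
    """Crop a rectangular region from an image stored as a list of pixel rows.

    box is (x0, y0, x1, y1) with exclusive x1/y1 (a hypothetical convention).
    """
    x0, y0, x1, y1 = box
    height, width = len(image), len(image[0])
    # Clamp the box to the image bounds so a slightly oversized box still works.
    x0, x1 = max(0, x0), min(width, x1)
    y0, y1 = max(0, y0), min(height, y1)
    if x0 >= x1 or y0 >= y1:
        raise ValueError("detection box does not overlap the image")
    return [row[x0:x1] for row in image[y0:y1]]


# Toy 4x5 image whose pixel value encodes its position: 10 * row + column.
image = [[10 * r + c for c in range(5)] for r in range(4)]
target = crop_target_image(image, (1, 1, 4, 3))
print(target)
```

In practice the box would come from whichever detection network is used, and the crop would normally be done on an array or tensor rather than nested lists.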

In step 102, the height of the target image is adjusted to a predetermined height.

Here, the height direction of the target image is the direction in which the plurality of target objects to be recognized are stacked. The predetermined height may be an integer multiple of the thickness of a target object to be recognized. Taking the stacked game coins shown in FIGS. 2A and 2B as an example, the stacking direction of the game coins may be determined as the height direction of the target image, and the radial direction of the game coins is correspondingly determined as the width direction of the target image.
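
One way to make the stacking direction coincide with the row axis of the stored image is sketched below. This is an illustrative preprocessing assumption, not a step stated in the patent: the helper `stack_axis_to_rows` and its `stacked_horizontally` flag are hypothetical, and the image is again a plain list of pixel rows.

```python
def stack_axis_to_rows(image, stacked_horizontally):
    """Return a copy of the image whose rows run along the stacking direction."""
    if not stacked_horizontally:
        # Upright stack: the stacking direction already runs top to bottom.
        return [list(row) for row in image]
    # Side-lying stack: transpose so a left-to-right stack runs top to bottom.
    return [list(column) for column in zip(*image)]


# A 2x3 image of a stack laid out left to right.
sideways = [[1, 2, 3],
            [4, 5, 6]]
upright = stack_axis_to_rows(sideways, stacked_horizontally=True)
print(upright)
```

After this step the "height" of the stored image is the stacking direction in both the upright and the side-lying case, so the later height adjustment and segmentation can use a single code path.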

In step 103, a feature map of the adjusted target image is extracted.

For the adjusted target image, a pre-trained feature extraction network may be used to obtain the feature map. The feature extraction network may include a plurality of convolutional layers, or a plurality of convolutional layers together with pooling layers, and so on. Through multiple layers of feature extraction, low-level features are gradually converted into mid-level or high-level features, which improves how expressively the target image is represented and benefits subsequent processing.

In step 104, the feature map is segmented along the dimension corresponding to the height direction of the target image to obtain features of a predetermined number of segments.

By segmenting the feature map along the height direction of the target image, features of a predetermined number of segments can be obtained. The feature of each segment may be regarded as corresponding to one target object, and the predetermined number is also the maximum number of target objects to be recognized.

In one example, the feature map may include a plurality of dimensions, for example a channel dimension, a height dimension, a width dimension and a batch dimension, and its format may be expressed as, for example, [B C H W], where B denotes the batch dimension, C the channel dimension, H the height dimension and W the width dimension. The directions indicated by the height dimension and the width dimension of the feature map may be determined based on the height direction and the width direction of the target image.
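
The segmentation along the height dimension can be sketched as follows. This is a minimal illustration under stated assumptions: a single sample (the batch dimension is omitted) is stored as nested lists in [C][H][W] order, H is assumed to be divisible by the number of segments, and the function name `split_along_height` is hypothetical.

```python
def split_along_height(feature_map, num_segments):
    """Split a [C][H][W] nested-list feature map into equal pieces along H."""
    height = len(feature_map[0])
    if height % num_segments != 0:
        raise ValueError("H must be divisible by the number of segments")
    step = height // num_segments
    # Each segment keeps every channel and the full width, but only `step` rows.
    return [
        [channel[i * step:(i + 1) * step] for channel in feature_map]
        for i in range(num_segments)
    ]


# Toy feature map with C=2, H=6, W=3; each value encodes (c, h, w) as c*100 + h*10 + w.
fmap = [[[c * 100 + h * 10 + w for w in range(3)] for h in range(6)]
        for c in range(2)]
segments = split_along_height(fmap, num_segments=3)
print(len(segments), segments[1][0])
```

With a tensor library the same operation is typically a single chunk or split call along the H axis; the nested-list version is used here only to keep the sketch dependency-free.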

In step 105, the target objects are recognized based on the feature of each of the predetermined number of segments.

Because the feature of each segment corresponds to one target object, recognizing the target objects segment by segment, rather than directly from the feature map of the whole target image, removes the influence of the number of target objects and improves the accuracy with which the target objects in the target image are recognized.
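
Per-segment recognition can be sketched as picking one type per segment and then tallying the results. The sketch assumes that some classifier has already turned each segment feature into a vector of class scores; that classifier, the function name `recognize_segments` and the coin type names are all hypothetical and serve only to illustrate the idea of one prediction per segment.

```python
from collections import Counter


def recognize_segments(segment_scores, class_names):
    """Pick the highest-scoring class for each segment and tally the results."""
    predictions = []
    for scores in segment_scores:
        # Index of the largest score in this segment's score vector.
        best = max(range(len(scores)), key=scores.__getitem__)
        predictions.append(class_names[best])
    return predictions, Counter(predictions)


class_names = ["red", "green", "blue"]  # hypothetical coin types
scores = [
    [0.1, 0.7, 0.2],  # segment 0
    [0.8, 0.1, 0.1],  # segment 1
    [0.2, 0.6, 0.2],  # segment 2
]
predictions, counts = recognize_segments(scores, class_names)
print(predictions, dict(counts))
```

Because every segment yields exactly one prediction, the type and the number of stacked objects fall out of the same pass.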

In some embodiments, a target image containing a plurality of upright target objects (referred to as a side-view image) may be captured by an image acquisition device arranged at the side of the target area, or a target image containing a plurality of target objects lying on their sides (referred to as a top-view image) may be captured by an image acquisition device arranged above the target area.

In some embodiments, the height of the target image may be adjusted as follows.

First, a predetermined height and a predetermined width corresponding to the target image are obtained and used to resize the target image. The predetermined width may be set according to the average width of the target objects, and the predetermined height may be set according to the average height of the target objects and the maximum number of target objects to be recognized.

In one example, the height and width of the target image may be scaled proportionally until the width of the target image reaches the predetermined width. Here, proportional scaling means enlarging or reducing the target image while keeping the ratio between its height and width unchanged. The predetermined width and the predetermined height may be expressed in pixels or in other units; the present invention is not limited in this respect.

If the width of the scaled target image has reached the predetermined width but the height of the scaled target image is greater than the predetermined height, the height and width of the scaled target image are proportionally reduced until the height of the reduced target image equals the predetermined height.

For example, if the target objects are game coins, the predetermined width may be set to 224 pix (pixels) based on the average width of the game coins, and the predetermined height may be set to 1344 pix based on the average height of the game coins and the maximum number of game coins to be recognized, for example 72. First, the width of the target image is adjusted to 224 pix and the height of the target image is adjusted in the same proportion. If the adjusted height is greater than 1344 pix, the target image is adjusted again so that its height becomes 1344 pix, with its width adjusted in the same proportion; the height of the target image is thereby adjusted to the predetermined height of 1344 pix. If the adjusted height is equal to 1344 pix, no further adjustment is needed, and the height of the target image is already the predetermined height of 1344 pix.
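
The two-stage scaling in this example can be worked through numerically. The sketch below only computes the resulting dimensions; the function name `fit_size` and the rounding to whole pixels are assumptions made for illustration, while the defaults of 224 and 1344 come from the example above.

```python
def fit_size(width, height, target_w=224, target_h=1344):
    """Return (new_w, new_h) after the two-stage proportional scaling."""
    # Stage 1: scale both sides by the same factor so the width equals target_w.
    scale = target_w / width
    new_w, new_h = target_w, round(height * scale)
    # Stage 2: if the result is too tall, shrink both sides again so that
    # the height equals target_h.
    if new_h > target_h:
        shrink = target_h / new_h
        new_w, new_h = round(new_w * shrink), target_h
    return new_w, new_h


print(fit_size(112, 1000))  # too tall after stage 1, so it is shrunk again
print(fit_size(112, 672))   # exactly the predetermined height after stage 1
print(fit_size(448, 1000))  # shorter than the predetermined height
```

In the first case the width ends up below 224 pix after the second stage, and in the third case the height stays below 1344 pix, which is the situation the padding variant addresses.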

In one example, the height and width of the target image are scaled proportionally until the width of the target image equals the predetermined width. If the width of the scaled target image has reached the predetermined width but its height is smaller than the predetermined height, the scaled target image is filled with first pixels so that the height of the filled target image equals the predetermined height.

Here, the first pixel may be a pixel whose value is zero, that is, a black pixel. The first pixel may also be set to another pixel value. The specific pixel value does not affect the effect of the embodiments of the present invention.

Continuing the example in which the target objects are game coins, the predetermined width is 224 pix, the predetermined height is 1344 pix, and the maximum number is 72: first, the width of the target image may be scaled to 224 pix, with the height scaled in the same proportion. If the height of the scaled target image is less than 1344 pix, the part of the height that falls short of 1344 pix is filled with black pixels so that the height of the filled target image is 1344 pix. If the height of the scaled target image is equal to 1344 pix, no filling is needed, and the height of the target image is already the predetermined height of 1344 pix.
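The two height-adjustment branches described above (proportional reduction when the scaled image is too tall, filling when it is too short) can be sketched as follows. This is a minimal illustration only; the function name `plan_height_adjustment`, the rounding rule, and the return format are assumptions made for this sketch and are not part of the disclosure.

```python
def plan_height_adjustment(width, height, target_w=224, target_h=1344):
    """Return (new_width, new_height, pad_rows) for a cropped stack image.

    new_width and new_height describe the image after proportional scaling;
    pad_rows is the number of rows of first pixels (e.g. black) to add so
    that new_height + pad_rows equals target_h.
    """
    # Step 1: scale proportionally so that the width equals the predetermined width.
    scale = target_w / width
    new_w = target_w
    new_h = round(height * scale)  # rounding rule is an assumption
    pad_rows = 0

    if new_h > target_h:
        # Too tall: reduce proportionally until the height equals the predetermined height.
        shrink = target_h / new_h
        new_w = round(new_w * shrink)
        new_h = target_h
    elif new_h < target_h:
        # Too short: keep the scaled image and fill the missing rows.
        pad_rows = target_h - new_h

    return new_w, new_h, pad_rows
```

For instance, a 448 x 2000 crop is scaled to 224 x 1000 and then needs 344 filled rows, whereas a 224 x 2688 crop is reduced to 112 x 1344 with no filling.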

After the height of the target image has been adjusted to the predetermined height, the feature map of the adjusted target image may be segmented along the dimension corresponding to the height direction of the target image to obtain the features of a predetermined number of segments.

Taking a feature map [B C H W] as an example, the feature map [B C H W] is segmented in the H dimension (height dimension) based on the predetermined number, that is, the maximum number of target objects awaiting recognition, for example 72. If the height of the adjusted target image is smaller than the predetermined height, the target image is filled so that its height reaches the predetermined height. If the height of the adjusted target image is greater than the predetermined height, the height is brought to the predetermined height by proportional reduction. In either case, the feature map of the target image is obtained from a target image of the predetermined height. Moreover, because the predetermined height is set according to the maximum number of target objects awaiting recognition, segmenting the feature map according to that maximum number associates the feature map of each segment with an individual target object. Recognizing the target objects based on the feature map of each segment therefore reduces the influence of the number of target objects and improves the accuracy with which each target object is recognized.
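The segmentation along the H dimension can be sketched as follows for a single sample whose feature map is held as a nested list indexed [C][H][W]. This is a minimal sketch: the function name `split_along_height` is hypothetical, and the requirement that H be evenly divisible by the number of segments is an assumption of this illustration, not something the description states.

```python
def split_along_height(feature_map, num_segments):
    """Split a [C][H][W] nested list into num_segments equal chunks along H.

    Returns a list of num_segments feature maps, each of shape
    [C][H // num_segments][W], ordered from the top of the image downward.
    """
    height = len(feature_map[0])
    if height % num_segments != 0:
        # Assumption of this sketch: only evenly divisible heights are handled.
        raise ValueError("H must be divisible by the number of segments")
    step = height // num_segments
    return [
        [channel[i * step:(i + 1) * step] for channel in feature_map]
        for i in range(num_segments)
    ]
```

Each returned segment would then be passed to the classifier separately, so that one classification result is produced per segment.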

In some embodiments, for the predetermined number of segment features obtained by segmenting the filled target image, when the segment features are classified, the classification result of a segment feature corresponding to a region filled with the first pixels is empty. For example, for segment features corresponding to regions filled with black pixels, the corresponding classification results can be determined to be empty. The number of non-empty classification results for the target image may be determined from the difference between the maximum number of target objects and the number of empty classification results, or the non-empty classification results of the segment features corresponding to target objects may be counted directly. In this way, the number of target objects contained in the target image can be determined from the number of non-empty classification results obtained.

Assuming that the maximum number of target objects awaiting recognition is 72, the feature map of the target image is divided into 72 segments, and recognizing the target objects based on the feature map of each segment yields 72 classification results. If the target image contains a region filled with black pixels, the classification results corresponding to the feature maps of the segments in that filled region are empty. For example, if 16 empty classification results are obtained, then 56 non-empty classification results are obtained. It can thus be determined that the target image contains 56 target objects.
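The counting rule above amounts to subtracting the number of empty results from the maximum number. A minimal sketch follows, in which `None` is a hypothetical marker for an empty classification result and `count_targets` is a name introduced only for this illustration.

```python
EMPTY = None  # hypothetical marker for an "empty" classification result


def count_targets(segment_labels, max_count=72):
    """Count target objects from per-segment classification results."""
    if len(segment_labels) != max_count:
        raise ValueError("expected one classification result per segment")
    empty = sum(1 for label in segment_labels if label is EMPTY)
    # Number of target objects = maximum number - number of empty results.
    return max_count - empty
```

With 56 labelled segments followed by 16 empty ones, the function returns 56, matching the worked example in the text.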

As those skilled in the art will understand, the predetermined width, the predetermined height, and the maximum number of target objects awaiting recognition given above are all examples, and the specific values of these parameters may be set according to actual needs. The embodiments of the present invention are not limited in this respect.

In some embodiments, both the extraction of the feature map and the recognition of the target objects are performed by a neural network that has been trained using sample images and their label information. The neural network may include a feature extraction network and a classification network. The feature extraction network extracts the feature map of the resized target image, and the classification network recognizes the target objects based on the feature of each segment among the features of the predetermined number of segments. Each sample image contains a plurality of target objects.

In one example, the label information of the sample image includes the label type of each target object in the sample image, and the neural network has been trained by the following operations: performing feature extraction on the resized sample image to obtain a feature map of the resized sample image; recognizing the target objects in the sample image based on the feature of each segment obtained by segmenting the feature map, to obtain a predicted type of each target object in the sample image; and adjusting the parameter values of the neural network based on the predicted type of each target object in the sample image and the label type of each target object in the sample image.

Taking game coins as an example, the type of each game coin is associated with its denomination, and game coins of the same denomination belong to the same type. For a sample image containing a plurality of game coins stacked upright, the sample image is labeled with the denomination of each game coin. A neural network for recognizing the target objects is trained on the sample images labeled with denominations. The neural network predicts the denomination of each game coin from the sample image, and the parameter values of the neural network, including for example the parameter values of the feature extraction network and of the classification network, are adjusted according to the difference between the predicted types and the label types. Training is completed when the difference between the predicted types and the label types becomes smaller than a set threshold, or when the number of iterations reaches a set number.

In one example, the label information of the sample image further includes the number of target objects of each label type. In this case, the parameter values of the neural network are adjusted based on the predicted type of each target object in the sample image, the label type of each target object in the sample image, the number of target objects of each label type in the sample image, and the number of target objects of each predicted type in the sample image.

Again taking a plurality of game coins stacked upright as an example, the sample image is labeled with the denomination of each game coin and the number of game coins of each denomination. A neural network for recognizing the target objects is trained on the sample images labeled with this information. The neural network predicts, from the sample image, the denomination of each game coin and the number of game coins of each denomination. The parameter values of the neural network are adjusted based on the difference between the prediction results and the label information.

In one example, the label information of the sample image further includes the total number of target objects in the sample image. In this case, the parameter values of the neural network are adjusted based on the predicted type of each target object in the sample image, the label type of each target object in the sample image, the sum of the numbers of target objects of each predicted type in the sample image, and the total number of target objects in the sample image.

Again taking a plurality of game coins stacked upright as an example, the sample image is labeled with the denomination of each game coin and the total number of game coins. A neural network for recognizing the target objects is trained on the sample images labeled with this information. The neural network predicts, from the sample image, the denomination of each game coin and the total number of game coins (that is, the prediction results). The parameter values of the neural network are adjusted based on the difference between the prediction results and the label information.

In one example, the label information of the sample image includes the label type of each target object in the sample image, the number of target objects of each label type in the sample image, and the total number of target objects in the sample image. In this case, the parameter values of the neural network are adjusted based on the predicted type of each target object in the sample image, the label type of each target object in the sample image, the number of target objects of each label type in the sample image, the number of target objects of each predicted type in the sample image, the sum of the numbers of target objects of each predicted type in the sample image, and the total number of target objects in the sample image.

Again taking a plurality of game coins stacked upright as an example, the sample image is labeled with the denomination of each game coin, the number of game coins of each denomination, and the total number of game coins. A neural network for recognizing the target objects is trained on the sample images labeled with this information. The neural network predicts, from the sample image, the denomination of each game coin, the number of game coins of each denomination, and the total number of game coins. The parameter values of the neural network are adjusted based on the difference between the prediction results and the label information.

In the embodiments of the present invention, the loss function used to train the neural network includes at least one of a cross-entropy loss, a loss on the number of target objects of each type, and a loss on the total number of target objects. That is, in addition to the cross-entropy loss, the loss function may also include the per-type count loss and the total count loss, which improves the ability to recognize the number of target objects.
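One way the three loss terms could be combined is sketched below. The description does not give formulas for the count losses, so the absolute-difference form, the weights `w_type` and `w_total`, the use of class index 0 as the empty class, and all function names are assumptions of this sketch. The count terms here are computed on hard (argmax) predictions for readability; an actual training setup would need a differentiable formulation.

```python
import math
from collections import Counter


def cross_entropy(probs, labels):
    """Mean negative log-likelihood of the labelled class over all segments."""
    return -sum(math.log(p[y]) for p, y in zip(probs, labels)) / len(labels)


def count_losses(pred_labels, true_labels, empty=0):
    """Return (per-type count loss, total count loss) as absolute differences."""
    pred = Counter(label for label in pred_labels if label != empty)
    true = Counter(label for label in true_labels if label != empty)
    per_type = sum(abs(pred[t] - true[t]) for t in set(pred) | set(true))
    total = abs(sum(pred.values()) - sum(true.values()))
    return per_type, total


def combined_loss(probs, labels, w_type=1.0, w_total=1.0, empty=0):
    """Cross-entropy plus weighted per-type and total count terms."""
    # Hard prediction per segment: index of the most probable class.
    pred_labels = [max(range(len(p)), key=p.__getitem__) for p in probs]
    per_type, total = count_losses(pred_labels, labels, empty)
    return cross_entropy(probs, labels) + w_type * per_type + w_total * total
```

In this sketch, a prediction that confuses two denominations but finds the right number of coins is penalized by the cross-entropy and per-type terms only, while a missed or spurious coin also raises the total count term.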

In some embodiments, the training data may be augmented when training the neural network, so that the neural network for recognizing the type and number of target objects according to the embodiments of the present invention is better suited to real scenes. For example, data augmentation may be performed using any one or more of the following: flipping the sample image horizontally, rotating the sample image by a set angle, applying a color transformation to the sample image, and applying a brightness transformation to the sample image.
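Two of the listed augmentations can be sketched as follows on a grayscale image stored as a list of rows of 0 to 255 values. The function names and the clamping rule are assumptions made for this illustration. Note that a horizontal flip leaves the top-to-bottom order of the stacked objects unchanged, so the per-object labels of a sample image still apply after flipping.

```python
def horizontal_flip(image):
    """Mirror each row left to right; the row (stack) order is unchanged."""
    return [row[::-1] for row in image]


def adjust_brightness(image, factor):
    """Multiply each pixel by factor and clamp the result to the 0..255 range."""
    return [
        [min(255, max(0, round(pixel * factor))) for pixel in row]
        for row in image
    ]
```

Rotation by a set angle and color transformation would normally be delegated to an image library and are omitted from this sketch.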

The target object recognition method according to the embodiments of the present invention may be used to recognize target objects of multiple types. Because the target objects are recognized using the segmented feature map, the recognition accuracy for each type of target object does not degrade as the number of types increases.

In some embodiments, the trained neural network may be tested; based on the test results, the recognition accuracies of the neural network for the respective types of target objects are ranked to obtain a recognition accuracy ranking, and the misrecognition rates of the neural network for the respective types of target objects are ranked to obtain a misrecognition rate ranking; and the neural network may be further trained based on the recognition accuracy ranking and the misrecognition rate ranking.

The recognition accuracy ranking and the misrecognition rate ranking of the respective types of target objects may be stored in a two-dimensional table. For example, the recognition accuracy ranking may be stored in the table from top to bottom, and the misrecognition rate ranking may be stored in the table from left to right. Further training is then performed on the types within a set range of the table, for example the types within the first three columns of the third row, to improve the recognition accuracy and accuracy rate of the neural network for those types.
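The ranking step can be sketched as below. The description does not specify the sort direction or exactly how the set range of the table is chosen, so sorting worst-first and selecting the types that fall within the first `top_k` positions of both rankings are assumptions of this sketch, as is the function name `rank_for_retraining`.

```python
def rank_for_retraining(accuracy, error_rate, top_k=3):
    """Rank types by accuracy and misrecognition rate, then pick a focus set.

    accuracy and error_rate map each type (e.g. a denomination) to a value
    measured on the test set.
    """
    # Lowest recognition accuracy first.
    by_accuracy = sorted(accuracy, key=accuracy.get)
    # Highest misrecognition rate first.
    by_error = sorted(error_rate, key=error_rate.get, reverse=True)
    # Assumed selection rule: types near the top of both rankings.
    focus = [t for t in by_accuracy[:top_k] if t in by_error[:top_k]]
    return by_accuracy, by_error, focus
```

The types returned in `focus` would then be emphasized in further training, for example by adding more sample images containing them.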

FIG. 3 is a block diagram of a target object recognition apparatus according to at least one embodiment of the present invention. As shown in FIG. 3, the apparatus includes: an acquisition unit 301 for cropping, from an acquired image, a target image including a plurality of stacked target objects awaiting recognition; an adjustment unit 302 for adjusting the height of the target image to a predetermined height; an extraction unit 303 for extracting a feature map of the adjusted target image; a segmentation unit 304 for segmenting the feature map along the dimension corresponding to the height direction of the target image to obtain the features of a predetermined number of segments; and a recognition unit 305 for recognizing the target objects based on the feature of each segment among the features of the predetermined number of segments. Here, the height direction of the target image is the direction in which the plurality of target objects awaiting recognition are stacked.

In some embodiments, the adjustment unit 302 scales the height and width of the target image proportionally until the width of the scaled target image equals the predetermined width and, if the width of the scaled target image has reached the predetermined width but its height is greater than the predetermined height, reduces the height and width of the scaled target image proportionally until the height of the reduced target image equals the predetermined height.

In some embodiments, the adjustment unit 302 scales the height and width of the target image proportionally until the width of the scaled target image equals the predetermined width and, if the width of the scaled target image has reached the predetermined width but its height is smaller than the predetermined height, fills the scaled target image with first pixels so that the height of the filled target image equals the predetermined height.

In some embodiments, the target objects awaiting recognition in the target image are sheet-like objects of equal thickness, the plurality of target objects awaiting recognition are stacked along their thickness direction, and the predetermined height is an integer multiple of the thickness of a target object awaiting recognition.

In some embodiments, both the extraction of the feature map and the recognition of the target objects are performed by a neural network that has been trained using sample images and their label information.

In some embodiments, the label information of the sample image includes the label type of each target object in the sample image, and the apparatus further includes a training unit. The training unit trains the neural network by: performing feature extraction on the resized sample image to obtain a feature map of the resized sample image; recognizing the target objects in the sample image based on the feature of each segment obtained by segmenting the feature map, to obtain a predicted type of each target object in the sample image; and adjusting the parameter values of the neural network based on the predicted type of each target object in the sample image and the label type of each target object in the sample image.

In some embodiments, the label information of the sample image further includes the number of target objects of each label type. When adjusting the parameter values of the neural network based on the predicted type and the label type of each target object in the sample image, the training unit specifically adjusts the parameter values of the neural network based on the predicted type of each target object in the sample image, the label type of each target object in the sample image, the number of target objects of each label type in the sample image, and the number of target objects of each predicted type in the sample image.

In some embodiments, the label information of the sample image further includes the total number of target objects in the sample image. When adjusting the parameter values of the neural network based on the predicted type and the label type of each target object in the sample image, the training unit adjusts the parameter values of the neural network based on the predicted type of each target object in the sample image, the label type of each target object in the sample image, the sum of the numbers of target objects of each predicted type in the sample image, and the total number of target objects in the sample image.

In some embodiments, the apparatus further includes a test unit. The test unit tests the trained neural network; based on the test results, ranks the recognition accuracies of the neural network for the respective types of target objects to obtain a recognition accuracy ranking; based on the test results, ranks the misrecognition rates of the neural network for the respective types of target objects to obtain a misrecognition rate ranking; and further trains the neural network based on the recognition accuracy ranking and the misrecognition rate ranking.

FIG. 4 is a block diagram of an electronic device according to at least one embodiment of the present invention. As shown in FIG. 4, the electronic device may include a processor and a memory for storing processor-executable instructions. The processor implements the target object recognition method according to any one of the embodiments of the present invention by executing the instructions.

At least one embodiment of the present invention further provides a computer-readable storage medium. Computer program instructions are stored on the computer-readable storage medium, and when the program instructions are executed by a processor, the target object recognition method according to any one of the embodiments of the present invention is implemented.

As those skilled in the art will understand, one or more embodiments of the present invention may be provided as a method, a system, or a computer program product. Accordingly, one or more embodiments of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. One or more embodiments of the present invention may also take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk memory, CD-ROM, and optical memory) containing computer-usable program code.

In the present invention, "and/or" means including at least one of the two. For example, "A and/or B" covers three cases: A alone, B alone, and both A and B.

The embodiments of the present invention are described in a progressive manner. Each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be understood by reference to one another. In particular, because the data processing device embodiment is substantially similar to the method embodiment, its description is relatively brief, and the relevant parts may be understood by reference to the description of the method embodiment.

Specific embodiments of the present invention have been described above. Other embodiments fall within the scope of the appended claims. In some cases, the actions or steps recited in the claims may be performed in an order different from that in the embodiments and still achieve the desired results. In addition, the processes depicted in the drawings do not necessarily require the particular order shown, or sequential order, to achieve the desired results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.

Embodiments of the subject matter and functional operations described in the present invention may be implemented in digital electronic circuitry, in tangibly embodied computer software or firmware, in computer hardware including the structures disclosed in the present invention and their structural equivalents, or in a combination of one or more of them. Embodiments of the subject matter described in the present invention may be implemented as one or more computer programs, that is, one or more modules of computer program instructions encoded on a tangible, non-transitory program carrier for execution by, or to control the operation of, a data processing apparatus. Alternatively or additionally, the program instructions may be encoded on an artificially generated propagated signal, for example a machine-generated electrical, optical, or electromagnetic signal, which is generated to encode information for transmission to a suitable receiver apparatus for execution by the data processing apparatus. The computer storage medium may be a machine-readable storage device, a machine-readable storage substrate, a random or serial access memory device, or a combination of one or more of them.

The processes and logic flows described in the present invention may be performed by one or more programmable computers executing one or more computer programs to perform the corresponding functions by operating on input data and generating output. The processes and logic flows may also be performed by special-purpose logic circuitry, for example an FPGA (field-programmable gate array) or an ASIC (application-specific integrated circuit), and the apparatus may also be implemented as special-purpose logic circuitry.

Computers suitable for executing a computer program include, for example, general-purpose and/or special-purpose microprocessors, or any other type of central processing unit. Typically, a central processing unit receives instructions and data from a read-only memory and/or a random access memory. The basic components of a computer are a central processing unit for carrying out or executing instructions and one or more memory devices for storing instructions and data. Typically, a computer also includes one or more mass storage devices for storing data, such as magnetic disks, magneto-optical disks, or optical disks, or is operatively coupled to such mass storage devices to receive data from them, transfer data to them, or both. However, a computer need not have such devices. Moreover, a computer may be embedded in another device, for example a mobile phone, a personal digital assistant (PDA), a mobile audio or video player, a game console, a Global Positioning System (GPS) receiver, or a portable storage device such as a universal serial bus (USB) flash drive. These are only a few examples.

Computer-readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media, and memory devices, including, for example, semiconductor memory devices (e.g., EPROM, EEPROM, and flash memory devices), magnetic disks (e.g., internal hard disks or removable disks), magneto-optical disks, and CD-ROM and DVD-ROM disks. The processor and the memory may be supplemented by, or incorporated in, dedicated logic circuitry.

Although the present invention contains many specific implementation details, these details should not be construed as limiting the invention; they serve mainly to describe features of specific embodiments of the invention. Certain features described in multiple embodiments may also be implemented in combination in a single embodiment. Conversely, various features described in a single embodiment may be implemented separately in multiple embodiments or in any suitable sub-combination. Moreover, although features may act in certain combinations as described above, and may even be claimed as such, one or more features of a claimed combination may in some cases be removed from that combination, and the claimed combination may be directed to a sub-combination or a variation of a sub-combination.

Similarly, although operations are depicted in the drawings in a particular order, this should not be understood as requiring that the operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, in order to achieve the desired results. In some cases, multitasking and parallel processing may be advantageous. Moreover, the separation of the various system modules and units in the above embodiments should not be understood as requiring such separation in all embodiments. It should further be understood that the described program units and systems may generally be integrated into a single software product or packaged into multiple software products.

Thus, specific embodiments of the subject matter have been described. Other embodiments are within the scope of the appended claims. In some cases, the actions recited in the claims can be performed in a different order and still achieve the desired results. In addition, the processes depicted in the drawings do not necessarily require the particular order shown, or sequential order, to achieve the desired results. In some implementations, multitasking and parallel processing may be advantageous.

The above are merely some embodiments of the present invention and are not intended to limit it. Any modification, equivalent substitution, improvement, and the like made within the spirit and principles of the present invention shall fall within the scope of the present invention.

Claims

A target object recognition method comprising:
cropping, from an acquired image, a target image that includes a plurality of stacked target objects awaiting recognition;
adjusting the height of the target image to a predetermined height;
extracting a feature map of the adjusted target image;
segmenting the feature map along a dimension corresponding to the height direction of the target image to obtain features for a predetermined number of segments;
performing target object recognition based on the feature of each segment among the features of the predetermined number of segments,
wherein the height direction of the target image is the direction in which the plurality of target objects awaiting recognition are stacked.
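
The segmenting and per-segment recognition steps recited above can be illustrated with a minimal sketch. It assumes the feature map is a plain list of rows ordered along the stacking (height) direction and that the rows divide evenly into the predetermined number of segments. The names `segment_feature_map`, `recognize_segments`, and `toy_classify` are hypothetical, and the threshold classifier merely stands in for a trained recognizer; none of this is taken from the specification.

```python
from typing import Callable, List

# Assumption: one inner list per row of the feature map, ordered along the height direction.
FeatureMap = List[List[float]]


def segment_feature_map(feature_map: FeatureMap, num_segments: int) -> List[FeatureMap]:
    """Split the rows of a feature map into num_segments equal groups along the height."""
    rows = len(feature_map)
    if num_segments <= 0 or rows == 0 or rows % num_segments != 0:
        raise ValueError("feature-map height must be a positive multiple of num_segments")
    step = rows // num_segments
    return [feature_map[i * step:(i + 1) * step] for i in range(num_segments)]


def recognize_segments(segments: List[FeatureMap],
                       classify: Callable[[FeatureMap], str]) -> List[str]:
    """Apply a per-segment classifier and return one predicted type per segment."""
    return [classify(segment) for segment in segments]


def toy_classify(segment: FeatureMap) -> str:
    """Hypothetical stand-in for a trained classifier: thresholds the mean feature value."""
    values = [v for row in segment for v in row]
    return "A" if sum(values) / len(values) > 0.5 else "B"
```

For example, `recognize_segments(segment_feature_map(fm, 4), toy_classify)` returns one label per segment, ordered along the stacking direction.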

The target object recognition method according to claim 1, wherein adjusting the height of the target image to the predetermined height comprises:
scaling the height and width of the target image in equal proportions until the width of the scaled target image reaches a predetermined width; and
if the height of the scaled target image is greater than the predetermined height, reducing the height and width of the scaled target image in equal proportions until the height of the reduced target image equals the predetermined height.

The target object recognition method according to claim 1, wherein adjusting the height of the target image to the predetermined height comprises:
scaling the height and width of the target image in equal proportions until the width of the scaled target image reaches a predetermined width; and
if the height of the scaled target image is smaller than the predetermined height, filling the scaled target image with first pixels so that the height of the filled target image reaches the predetermined height.
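
The two height-adjustment branches recited above (shrinking when the scaled image is too tall, filling when it is too short) can be sketched as a size computation. This is an illustrative sketch only: the function name `adjusted_size`, the rounding policy, and the example dimensions are assumptions, and the actual resampling and pixel filling are left to whatever image library is used.

```python
from typing import Tuple


def adjusted_size(width: int, height: int,
                  target_width: int, target_height: int) -> Tuple[int, int, int]:
    """Return (new_width, new_height, fill_rows) for the two-step height adjustment."""
    if min(width, height, target_width, target_height) <= 0:
        raise ValueError("all dimensions must be positive")

    # Step 1: scale both sides by the same factor so the width reaches target_width.
    scale = target_width / width
    new_width = target_width
    new_height = max(1, round(height * scale))

    fill_rows = 0
    if new_height > target_height:
        # Too tall: shrink both sides by the same factor until the height fits.
        shrink = target_height / new_height
        new_width = max(1, round(new_width * shrink))
        new_height = target_height
    elif new_height < target_height:
        # Too short: keep the scaled image and fill the missing rows with fill pixels.
        fill_rows = target_height - new_height
    return new_width, new_height, fill_rows
```

With assumed target dimensions of 224 by 1344, a 64 by 300 crop scales to 224 by 1050 and needs 294 filled rows, while a 64 by 512 crop first scales to 224 by 1792 and is then reduced to 168 by 1344.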

The target object recognition method according to claim 1, wherein the target objects awaiting recognition in the target image are sheet-like objects of equal thickness, and the plurality of target objects awaiting recognition are stacked along the thickness direction; and
the predetermined height is an integer multiple of the thickness.

The target object recognition method according to any one of claims 1 to 4, wherein the extraction of the feature map and the recognition of the target objects are both performed by a neural network, and the neural network is trained using sample images and label information of the sample images.

The target object recognition method according to claim 5, wherein the label information of a sample image includes a label type of each target object in the sample image; and
the neural network is trained by:
performing feature extraction on the size-adjusted sample image to obtain a feature map of the size-adjusted sample image;
recognizing the target objects in the sample image based on the feature of each segment obtained by segmenting the feature map, to obtain a prediction type of each target object in the sample image; and
adjusting parameter values of the neural network based on the prediction type of each target object in the sample image and the label type of each target object in the sample image.

The target object recognition method according to claim 6, wherein the label information of the sample image further includes the number of target objects of each label type; and
adjusting the parameter values of the neural network comprises:
adjusting the parameter values of the neural network based on the prediction type of each target object in the sample image, the label type of each target object in the sample image, the number of target objects of each label type in the sample image, and the number of target objects of each prediction type in the sample image.

The target object recognition method according to claim 6, wherein the label information of the sample image further includes the total number of target objects in the sample image; and
adjusting the parameter values of the neural network comprises:
adjusting the parameter values of the neural network based on the prediction type of each target object in the sample image, the label type of each target object in the sample image, the sum of the numbers of target objects of each prediction type in the sample image, and the total number of target objects in the sample image.
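
The count-based quantities named in the two preceding claims (the number of target objects per type, and the sum of predicted objects against the labeled total) can be made concrete with a small sketch. The claims only state that parameter values are adjusted based on these quantities; the absolute-difference measures below, the function names, and the `"empty"` marker for segments that contain no object are all assumptions for illustration. A real training loss would need a differentiable formulation.

```python
from collections import Counter
from typing import Dict, Sequence

EMPTY = "empty"  # hypothetical marker for a segment that contains no target object


def count_mismatch_per_type(predicted: Sequence[str], labels: Sequence[str]) -> Dict[str, int]:
    """Absolute difference between predicted and labeled object counts for each type."""
    pred_counts = Counter(p for p in predicted if p != EMPTY)
    label_counts = Counter(y for y in labels if y != EMPTY)
    types = set(pred_counts) | set(label_counts)
    # Counter returns 0 for missing keys, so types seen on only one side are handled.
    return {t: abs(pred_counts[t] - label_counts[t]) for t in types}


def total_count_mismatch(predicted: Sequence[str], labeled_total: int) -> int:
    """Absolute difference between the summed per-type predictions and the labeled total."""
    predicted_total = sum(1 for p in predicted if p != EMPTY)
    return abs(predicted_total - labeled_total)
```

Note that the total can match even when the per-type counts do not, which is why the two quantities carry different information.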

The target object recognition method according to claim 5, further comprising:
testing the trained neural network;
sorting, according to the test results, the recognition accuracy of the neural network for each type of target object to obtain a recognition accuracy sorting result;
sorting, according to the test results, the misrecognition rate of the neural network for each type of target object to obtain a misrecognition rate sorting result; and
further training the neural network based on the recognition accuracy sorting result and the misrecognition rate sorting result.
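
The sorting of per-type recognition accuracy and misrecognition rate can be sketched as follows. The claim does not define the two rates, so this sketch assumes that accuracy for a type is the share of objects labeled with that type that were predicted correctly, and that the misrecognition rate for a type is the share of predictions of that type that were wrong. The function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def per_type_rates(predicted: Sequence[str],
                   labels: Sequence[str]) -> Dict[str, Tuple[float, float]]:
    """Return {type: (recognition_accuracy, misrecognition_rate)} under the assumed definitions."""
    if len(predicted) != len(labels):
        raise ValueError("predicted and labels must have the same length")
    rates = {}
    for t in sorted(set(predicted) | set(labels)):
        labeled = sum(1 for y in labels if y == t)
        predicted_as = sum(1 for p in predicted if p == t)
        correct = sum(1 for p, y in zip(predicted, labels) if p == y == t)
        accuracy = correct / labeled if labeled else 0.0
        misrecognition = (predicted_as - correct) / predicted_as if predicted_as else 0.0
        rates[t] = (accuracy, misrecognition)
    return rates


def rank_types(rates: Dict[str, Tuple[float, float]]) -> Tuple[List[str], List[str]]:
    """Sort types by ascending accuracy and by descending misrecognition rate."""
    by_accuracy = sorted(rates, key=lambda t: (rates[t][0], t))
    by_misrecognition = sorted(rates, key=lambda t: (-rates[t][1], t))
    return by_accuracy, by_misrecognition
```

Types near the front of either list are the natural candidates for additional training samples, though how the sorting results drive further training is not specified in the claim.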

A target object recognition device comprising:
an acquisition unit for cropping, from an acquired image, a target image that includes a plurality of stacked target objects awaiting recognition;
an adjustment unit for adjusting the height of the target image to a predetermined height;
an extraction unit for extracting a feature map of the adjusted target image;
a segmentation unit for segmenting the feature map along a dimension corresponding to the height direction of the target image to obtain features of a predetermined number of segments; and
a recognition unit for performing target object recognition based on the feature of each segment among the features of the predetermined number of segments,
wherein the height direction of the target image is the direction in which the plurality of target objects awaiting recognition are stacked.

The target object recognition device according to claim 10, wherein the adjustment unit is configured to:
scale the height and width of the target image in equal proportions until the width of the scaled target image reaches a predetermined width; and
if the height of the scaled target image is greater than the predetermined height, reduce the height and width of the scaled target image in equal proportions until the height of the reduced target image equals the predetermined height.

The target object recognition device according to claim 10, wherein the adjustment unit is configured to:
scale the height and width of the target image in equal proportions until the width of the scaled target image reaches a predetermined width; and
if the height of the scaled target image is smaller than the predetermined height, fill the scaled target image with first pixels so that the height of the filled target image reaches the predetermined height.

The target object recognition device according to claim 10, wherein the target objects awaiting recognition in the target image are sheet-like objects of equal thickness, and the plurality of target objects awaiting recognition are stacked along the thickness direction; and
the predetermined height is an integer multiple of the thickness.

The target object recognition device according to any one of claims 10 to 13, wherein the extraction of the feature map and the recognition of the target objects are both performed by a neural network, and the neural network is trained using sample images and label information of the sample images.

The target object recognition device according to claim 14, wherein the label information of a sample image includes a label type of each target object in the sample image;
the target object recognition device further comprises a training unit; and
the training unit is configured to train the neural network by:
performing feature extraction on the size-adjusted sample image to obtain a feature map of the size-adjusted sample image;
recognizing the target objects in the sample image based on the feature of each segment obtained by segmenting the feature map, to obtain a prediction type of each target object in the sample image; and
adjusting parameter values of the neural network based on the prediction type of each target object in the sample image and the label type of each target object in the sample image.

The target object recognition device according to claim 15, wherein the label information of the sample image further includes the number of target objects of each label type; and
the training unit is configured to adjust the parameter values of the neural network based on the prediction type of each target object in the sample image, the label type of each target object in the sample image, the number of target objects of each label type in the sample image, and the number of target objects of each prediction type in the sample image.

The target object recognition device according to claim 15, wherein the label information of the sample image further includes the total number of target objects in the sample image; and
the training unit is configured to adjust the parameter values of the neural network based on the prediction type of each target object in the sample image, the label type of each target object in the sample image, the sum of the numbers of target objects of each prediction type in the sample image, and the total number of target objects in the sample image.

The target object recognition device according to claim 14, further comprising a test unit,
wherein the test unit is configured to:
test the trained neural network;
sort, according to the test results, the recognition accuracy of the neural network for each type of target object to obtain a recognition accuracy sorting result;
sort, according to the test results, the misrecognition rate of the neural network for each type of target object to obtain a misrecognition rate sorting result; and
further train the neural network based on the recognition accuracy sorting result and the misrecognition rate sorting result.

An electronic device comprising:
a processor; and
a memory for storing processor-executable instructions,
wherein the processor is configured to implement the method according to any one of claims 1 to 9 by executing the instructions.

A computer-readable storage medium storing computer program instructions,
wherein the computer program instructions, when executed by a processor, implement the target object recognition method according to any one of claims 1 to 9.