JP2022544893A - Network training method and apparatus, target detection method and apparatus, and electronic equipment

Info

Publication number: JP2022544893A
Application number: JP2021569189A
Authority: JP
Inventors: ハオシュエンドウ; イールーワン; ウェイハオガン; シャオチンルー; ウェイウー; ジュンジエイエン
Original assignee: ベイジン・センスタイム・テクノロジー・デベロップメント・カンパニー・リミテッド
Priority date: 2020-07-15
Filing date: 2020-11-02
Publication date: 2022-10-24
Also published as: CN111881956A; TWI780751B; CN111881956B; TW202205151A; KR20220009965A; WO2022011892A1

Abstract

Embodiments of the present application relate to a network training method and apparatus, a target detection method and apparatus, and an electronic device. The network training method includes: inputting unlabeled sample images into a target detection network for processing to obtain a target detection result, the target detection result including an image region, feature information, and classification probabilities of each target; determining a category confidence of each target based on the classification probabilities of the target; for a first target whose category confidence is greater than or equal to a first threshold, treating the sample image in which the first target is located as a labeled sample image and adding it to a training set; for a second target whose category confidence is less than the first threshold, performing feature correlation mining on the second target to determine a fourth target, and adding the sample image in which the fourth target is located to the training set; and training the target detection network based on the training set.

Description

（関連出願の相互参照）
(Cross-reference to related applications)
This application claims priority to Chinese patent application No. 202010681178.2, filed on July 15, 2020, the entire contents of which are incorporated herein by reference.

The present application relates to the field of computer technology, and in particular to a network training method and apparatus, a target detection method and apparatus, and an electronic device.

Computer vision is an important direction in artificial intelligence technology. Computer vision processing generally requires detecting targets (for example, pedestrians or objects) in images or videos. Target detection on large-scale long-tail data has important applications in many fields, such as abnormal object detection, abnormal behavior detection, and emergency alerts in urban surveillance. However, the volume of long-tail data is enormous, and positive and negative samples are severely imbalanced: most of the pictures are background images, and only a very small fraction contain detectable targets. As a result, the target detection schemes of the related art perform poorly on long-tail data.

Embodiments of the present application provide technical solutions for network training and target detection.

According to one aspect of the embodiments of the present application, a network training method is provided. The method includes:
inputting an unlabeled first sample image into a target detection network for processing to obtain a target detection result of the first sample image, wherein the target detection result includes an image region, feature information, and classification probabilities of each target in the first sample image;
determining a category confidence of each target based on the classification probabilities of the target;
for a first target, among the targets, whose category confidence is greater than or equal to a first threshold, treating the first sample image in which the first target is located as a labeled second sample image and adding it to a training set, wherein the labeling information of the second sample image includes the image region of the first target and the category corresponding to the category confidence of the first target, and the training set includes labeled third sample images;
for second targets, among the targets, whose category confidence is less than the first threshold, performing feature correlation mining on the second targets based on feature information of third targets in the third sample images, determining, through the feature correlation mining, a fourth target from the second targets and the first sample image in which the fourth target is located, and adding the first sample image in which the fourth target is located to the training set as a fourth sample image; and
training the target detection network based on the labeling information of the fourth sample image and on the second sample image, the third sample image, and the fourth sample image in the training set.
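By way of non-limiting illustration, the split of detected targets into first targets (pseudo-labeled) and second targets (passed on to feature correlation mining) can be sketched as follows. The `Detection` type, the function names, and the choice of the maximum classification probability as the category confidence are assumptions made for this sketch; the text above only states that the confidence is determined from the classification probabilities.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Detection:
    """One detected target (hypothetical container, not from the source)."""
    image_id: str
    box: Tuple[float, float, float, float]   # image region of the target
    features: Sequence[float]                # feature information
    class_probs: Sequence[float]             # classification probabilities


def category_confidence(det: Detection) -> Tuple[int, float]:
    # Assumption: the confidence is the largest classification probability,
    # and the corresponding index is the predicted category.
    best = max(range(len(det.class_probs)), key=lambda i: det.class_probs[i])
    return best, det.class_probs[best]


def split_by_confidence(detections: List[Detection], first_threshold: float):
    """Return (pseudo_labels, second_targets)."""
    pseudo_labels = []    # (image_id, box, category) for first targets
    second_targets = []   # low-confidence targets kept for mining
    for det in detections:
        category, confidence = category_confidence(det)
        if confidence >= first_threshold:
            pseudo_labels.append((det.image_id, det.box, category))
        else:
            second_targets.append(det)
    return pseudo_labels, second_targets
```

In this sketch a target whose confidence equals the threshold is pseudo-labeled, which matches the "greater than or equal to" wording of the method.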

In a possible implementation, training the target detection network based on the labeling information of the fourth sample image and on the second sample image, the third sample image, and the fourth sample image in the training set includes: determining, based on the categories of the targets in the positive sample images of the training set, a first number of samples to be drawn from the positive sample images of each category, wherein a positive sample image is a sample image that contains a target; sampling the positive sample images of each category according to the first number for that category to obtain a plurality of fifth sample images; sampling the negative sample images of the training set to obtain a plurality of sixth sample images, wherein a negative sample image is a sample image that contains no target; and training the target detection network based on the fifth sample images and the sixth sample images.
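As a non-limiting sketch, the per-category sampling of positive sample images and the separate sampling of negative sample images might look as follows. The function name, the fixed per-category quota `per_category`, and the `seed` parameter are assumptions for illustration; the text above does not specify how the first number for each category is chosen.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def sample_training_batch(
    positives: Sequence[Tuple[str, str]],   # (image_id, category) pairs
    negatives: Sequence[str],               # image_ids with no target
    per_category: int,                      # assumed "first number", same for every category
    num_negatives: int,
    seed: int = 0,
) -> Tuple[List[str], List[str]]:
    rng = random.Random(seed)

    # Group positive sample images by the category of their target.
    by_category: Dict[str, List[str]] = defaultdict(list)
    for image_id, category in positives:
        by_category[category].append(image_id)

    # Draw up to `per_category` images from each category (fifth sample images).
    fifth: List[str] = []
    for category in sorted(by_category):
        images = by_category[category]
        fifth.extend(rng.sample(images, min(per_category, len(images))))

    # Draw negative sample images (sixth sample images).
    sixth = rng.sample(list(negatives), min(num_negatives, len(negatives)))
    return fifth, sixth
```

Using the same quota for every category keeps rare categories from being swamped by frequent ones, which is one way to address the imbalance described in the background.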

In a possible implementation, performing feature correlation mining on the second targets based on the feature information of the third targets in the third sample images, and determining, through the feature correlation mining, the fourth target from the second targets and the first sample image in which the fourth target is located includes: determining an information entropy of each second target based on the classification probabilities of the second target; selecting fifth targets from the second targets based on the category confidence and the information entropy of the second targets; determining, based on the categories of the third targets in the third sample images and a total number of sample images to be mined, a second number of sample images to be mined for each category; and determining, from the fifth targets, the fourth target and the first sample image in which the fourth target is located, based on the feature information of the third targets in the third sample images, the feature information of the fifth targets, and the second number of sample images to be mined for each category.
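As a non-limiting illustration, if the information entropy is taken to be the Shannon entropy (an assumption; the text above states only that it is determined from the classification probabilities), then for a second target with classification probabilities $p_1, \dots, p_C$ over $C$ categories (symbols introduced here for illustration):

```latex
H = -\sum_{c=1}^{C} p_c \log p_c, \qquad \sum_{c=1}^{C} p_c = 1 .
```

Under this definition $H$ is zero when one category has probability 1 and reaches its maximum $\log C$ when all categories are equally likely, so a high entropy marks a target about which the network is uncertain.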

In a possible implementation, selecting the fifth targets from the second targets based on the category confidence and the information entropy of the second targets includes: ranking the second targets by category confidence and by information entropy respectively, and selecting a third number of sixth targets and a fourth number of seventh targets; and merging the sixth targets and the seventh targets to obtain the fifth targets.
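The selection above can be sketched as follows, as a non-limiting example. Several choices here are assumptions rather than statements of the source: the confidence is the maximum classification probability, the entropy is the Shannon entropy, the lowest-confidence targets are taken as sixth targets, and the highest-entropy targets are taken as seventh targets. The function and parameter names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def entropy(probs: Sequence[float]) -> float:
    # Shannon entropy in nats; zero-probability terms contribute nothing.
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_fifth_targets(
    class_probs: Dict[str, Sequence[float]],  # target id -> classification probabilities
    third_number: int,
    fourth_number: int,
) -> List[str]:
    # Rank by category confidence (assumed: lowest first).
    by_confidence = sorted(class_probs, key=lambda t: max(class_probs[t]))
    # Rank by information entropy (assumed: highest first).
    by_entropy = sorted(class_probs, key=lambda t: entropy(class_probs[t]), reverse=True)

    sixth = by_confidence[:third_number]
    seventh = by_entropy[:fourth_number]

    # Merge: drop sixth targets that also appear among the seventh targets,
    # then combine the remainder with the seventh targets.
    seventh_set = set(seventh)
    remaining = [t for t in sixth if t not in seventh_set]
    return remaining + seventh
```

For example, with four targets whose probabilities are `[0.5, 0.5]`, `[0.7, 0.3]`, `[0.6, 0.2, 0.2]`, and `[0.4, 0.3, 0.3]`, taking two targets from each ranking yields three distinct fifth targets, because one target appears in both rankings.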

In a possible implementation, determining the second number of sample images to be mined for each category based on the categories of the third targets in the third sample images and the total number of sample images to be mined includes: determining the proportion of third targets in each category based on the categories of the third targets in the third sample images; determining a sampling weight for each category based on the proportion of third targets in that category; and determining the second number of sample images to be mined for each category based on the sampling weight of that category.
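A minimal sketch of this three-step computation follows. The mapping from proportion to sampling weight is not given in the text above; the inverse-proportion weighting used here, which mines more images for rarer categories, is an assumption chosen only for illustration, as are the function and parameter names.

```python
from collections import Counter
from typing import Dict, Iterable


def mining_quotas(third_target_categories: Iterable[str], total_to_mine: int) -> Dict[str, int]:
    counts = Counter(third_target_categories)
    total = sum(counts.values())
    if total == 0:
        return {}

    # Step 1: proportion of third targets in each category.
    proportions = {c: n / total for c, n in counts.items()}

    # Step 2 (assumed rule): weight each category by the inverse of its
    # proportion, normalized so that the weights sum to 1.
    inverse = {c: 1.0 / p for c, p in proportions.items()}
    norm = sum(inverse.values())
    weights = {c: v / norm for c, v in inverse.items()}

    # Step 3: second number of sample images to be mined per category.
    # Rounding means the quotas may not sum exactly to total_to_mine.
    return {c: round(total_to_mine * w) for c, w in weights.items()}
```

With three labeled "car" targets and one labeled "bike" target, a total of 100 images to be mined is split as 25 for "car" and 75 for "bike" under this assumed rule.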

In a possible implementation, determining, from the fifth targets, the fourth target and the first sample image in which the fourth target is located, based on the feature information of the third targets in the third sample images, the feature information of the fifth targets, and the second number of sample images to be mined for each category includes: determining, based on the distances between the feature information of the third targets of a first category and the feature information of each fifth target, the third target of the first category that is closest to each fifth target, and taking it as an eighth target, wherein the first category is any one of the categories of the third targets; and determining, among the eighth targets, the target with the largest distance as the fourth target.

In a possible implementation, determining, from the fifth targets, the fourth target and the first sample image in which the fourth target is located, based on the feature information of the third targets in the third sample images, the feature information of the fifth targets, and the second number of sample images to be mined for each category further includes: adding the determined fourth target to the third targets of the first category, and removing the determined fourth target from the unlabeled fifth targets.
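Read together, these steps resemble a greedy farthest-point selection: for each candidate (fifth target) the nearest labeled target of the category is found, the candidate whose nearest labeled target is farthest away is mined, and that candidate then joins the labeled set. The following non-limiting sketch follows that reading. The Euclidean distance, the function name, and the simplification of counting mined targets rather than mined sample images are assumptions.

```python
import math
from typing import Dict, List, Sequence


def mine_category(
    labeled_features: Sequence[Sequence[float]],       # third targets of one category
    candidate_features: Dict[str, Sequence[float]],    # fifth targets: id -> features
    quota: int,                                        # second number for this category
) -> List[str]:
    labeled = [list(f) for f in labeled_features]
    candidates = dict(candidate_features)
    mined: List[str] = []

    # Stop when the quota is reached or no candidates remain.
    while len(mined) < quota and candidates and labeled:
        # Distance from each candidate to its nearest labeled target.
        nearest = {
            name: min(math.dist(feat, ref) for ref in labeled)
            for name, feat in candidates.items()
        }
        # Mine the candidate that is farthest from everything already labeled.
        chosen = max(nearest, key=nearest.get)
        mined.append(chosen)
        # Add it to the labeled set and remove it from the candidates.
        labeled.append(list(candidates.pop(chosen)))
    return mined
```

Because each mined target is added to the labeled set before the next round, later selections are pushed away from earlier ones, so the mined targets tend to cover parts of the feature space that the labeled data does not yet represent.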

In a possible implementation, the method further includes: inputting the third sample images into the target detection network for processing to obtain the feature information of the third targets in the third sample images.

In a possible implementation, before the step of inputting the unlabeled first sample image into the target detection network for processing to obtain the target detection result of the first sample image, the method further includes: pre-training the target detection network with the labeled third sample images.

In a possible implementation, the first sample image includes a long-tail image.

According to one aspect of the embodiments of the present application, a target detection method is provided. The method includes: inputting an image to be processed into a target detection network for processing to obtain a target detection result of the image to be processed, wherein the target detection result includes the position and category of each target in the image to be processed, and the target detection network has been trained by the network training method described above.

According to one aspect of the embodiments of the present application, a network training apparatus is provided. The apparatus includes:
a target detection unit configured to input an unlabeled first sample image into a target detection network for processing to obtain a target detection result of the first sample image, wherein the target detection result includes an image region, feature information, and classification probabilities of each target in the first sample image;
a confidence determination unit configured to determine a category confidence of each target based on the classification probabilities of the target;
a labeling unit configured to, for a first target, among the targets, whose category confidence is greater than or equal to a first threshold, treat the first sample image in which the first target is located as a labeled second sample image and add it to a training set, wherein the labeling information of the second sample image includes the image region of the first target and the category corresponding to the category confidence of the first target, and the training set includes labeled third sample images;
a feature mining unit configured to, for second targets, among the targets, whose category confidence is less than the first threshold, perform feature correlation mining on the second targets based on feature information of third targets in the third sample images, determine, through the feature correlation mining, a fourth target from the second targets and the first sample image in which the fourth target is located, and add the first sample image in which the fourth target is located to the training set as a fourth sample image; and
a training unit configured to train the target detection network based on the labeling information of the fourth sample image and on the second sample image, the third sample image, and the fourth sample image in the training set.

In a possible implementation, the training unit includes: a sampling number determination sub-unit configured to determine, based on the categories of the targets in the positive sample images of the training set, a first number of samples to be drawn from the positive sample images of each category, wherein a positive sample image is a sample image that contains a target; a first sampling sub-unit configured to sample the positive sample images of each category according to the first number for that category to obtain a plurality of fifth sample images; a second sampling sub-unit configured to sample the negative sample images of the training set to obtain a plurality of sixth sample images, wherein a negative sample image is a sample image that contains no target; and a training sub-unit configured to train the target detection network based on the fifth sample images and the sixth sample images.

In a possible implementation, the feature mining unit includes: an information entropy determination sub-unit configured to determine an information entropy of each second target based on the classification probabilities of the second target; a target selection sub-unit configured to select fifth targets from the second targets based on the category confidence and the information entropy of the second targets; a mining number determination sub-unit configured to determine, based on the categories of the third targets in the third sample images and a total number of sample images to be mined, a second number of sample images to be mined for each category; and a target and image determination sub-unit configured to determine, from the fifth targets, the fourth target and the first sample image in which the fourth target is located, based on the feature information of the third targets in the third sample images, the feature information of the fifth targets, and the second number of sample images to be mined for each category.

In a possible implementation, the target selection sub-unit is configured to rank the second targets by category confidence and by information entropy respectively, select a third number of sixth targets and a fourth number of seventh targets, and merge the sixth targets and the seventh targets to obtain the fifth targets.

In a possible implementation, the mining number determination sub-unit is configured to determine the proportion of third targets in each category based on the categories of the third targets in the third sample images, determine a sampling weight for each category based on the proportion of third targets in that category, and determine the second number of sample images to be mined for each category based on the sampling weight of that category.

In a possible implementation, the target and image determination sub-unit is configured to determine, based on the distances between the feature information of the third targets of a first category and the feature information of each fifth target, the third target of the first category that is closest to each fifth target, and take it as an eighth target, wherein the first category is any one of the categories of the third targets; and to determine, among the eighth targets, the target with the largest distance as the fourth target.

In a possible implementation, the target and image determination sub-unit is further configured to add the determined fourth target to the third targets of the first category and to remove the determined fourth target from the unlabeled fifth targets.

In a possible implementation, the apparatus further includes a feature extraction unit configured to input the third sample images into the target detection network for processing to obtain the feature information of the third targets in the third sample images.

In a possible implementation, the apparatus further includes, ahead of the target detection unit, a pre-training unit configured to pre-train the target detection network with the labeled third sample images.

According to one aspect of the embodiments of the present application, a target detection apparatus is provided. The apparatus includes a detection processing unit configured to input an image to be processed into a target detection network for processing to obtain a target detection result of the image to be processed, wherein the target detection result includes the position and category of each target in the image to be processed, and the target detection network has been trained by the network training method described above.

In a possible implementation, before the step of determining, based on the categories of the targets in the positive sample images of the training set, the first number of samples to be drawn from the positive sample images of each category, the method includes: sampling the positive sample images and the negative sample images in the training set to obtain positive sample images and negative sample images that are equal or close in number.

In a possible implementation, the total number of sample images to be mined is 5% to 25% of the total number of first sample images.

In a possible implementation, merging the sixth targets and the seventh targets to obtain the fifth targets includes: removing, from the sixth targets, any target that is the same as one of the seventh targets, to obtain the remaining sixth targets that differ from the seventh targets; and taking the remaining targets together with the seventh targets as the fifth targets.

In a possible implementation, the method further includes: ending the feature correlation mining for the first category when the number of fourth sample images of the first category reaches the second number of sample images to be mined for the first category.

In a possible implementation, after the step of determining, based on the distances between the feature information of the third targets of the first category and the feature information of each fifth target, the third target of the first category that is closest to each fifth target and taking it as an eighth target, the method further includes: ending the determination of eighth targets when the number of first sample images in which fourth targets are located reaches the second number of sample images to be mined for the first category.

In a possible implementation, after the step of determining, based on the distances between the feature information of the third targets of the first category and the feature information of each fifth target, the third target of the first category that is closest to each fifth target and taking it as an eighth target, the method further includes: ending the determination of eighth targets when the number of first sample images in which fourth targets are located has not reached the second number of sample images to be mined for the first category but the set storing the feature information of the fifth targets is empty.

In a possible implementation, inputting the third sample images into the target detection network for processing to obtain the feature information of the third targets in the third sample images includes: inputting the third sample images into the target detection network and obtaining the feature vectors output by a hidden layer of the target detection network; and taking the feature vectors as the feature information of the third targets.

According to one aspect of the embodiments of the present application, an electronic device is provided. The electronic device includes a processor and a memory storing instructions executable by the processor, wherein the processor is configured to call the instructions stored in the memory to execute the network training method or the target detection method described above.

According to one aspect of the embodiments of the present application, a computer-readable storage medium is provided. The computer-readable storage medium stores computer program instructions that, when executed by a processor, cause the processor to implement the network training method or the target detection method described above.

According to the embodiments of the present application, the target detection network produces target detection results for unlabeled sample images. Based on those results, pseudo-labeling and feature correlation mining are performed respectively, so that high-value sample images are labeled, collected, and added to the training set, and the target detection network is trained on the expanded training set. This increases the number of positive samples in the training set, alleviates the imbalance between positive and negative samples, and improves the training effect of the target detection network.

It should be understood that the general description above and the detailed description below are exemplary and explanatory only and do not limit the embodiments of the present application. Other features and aspects of the embodiments of the present application will become apparent from the following detailed description of exemplary embodiments with reference to the drawings.

The accompanying drawings are incorporated into and constitute a part of this specification. They show embodiments consistent with the present application and, together with the specification, serve to explain the technical solutions of the embodiments of the present application. The drawings are:
a flowchart illustrating a network training method according to an embodiment of the present application;
a schematic diagram illustrating the processing procedure of the network training method according to an embodiment of the present application;
a block diagram illustrating a network training apparatus according to an embodiment of the present application;
a block diagram illustrating an electronic device according to an embodiment of the present application;
a block diagram illustrating an electronic device according to an embodiment of the present application.

Various exemplary embodiments, features, and aspects of the present application are described in detail below with reference to the drawings. The same reference numerals in the drawings denote elements having the same or similar functions. Although the drawings show various aspects of the embodiments, they are not necessarily drawn to scale unless otherwise specified.

The term "exemplary" as used herein means "serving as an example, an embodiment, or an illustration". Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.

In this specification, the term "and/or" describes an association between associated objects and indicates that three relationships are possible. For example, "A and/or B" covers three cases: A alone, both A and B, and B alone. In addition, the term "at least one" in this specification denotes any one of a plurality of items, or any combination of at least two of them. For example, "including at least one of A, B, and C" means including any one or more elements selected from the set consisting of A, B, and C.

なお、本願の実施例をより良く説明するために、以下の具体的な実施形態において具体的な細部を多く記載した。当業者は、これら具体的な詳細に関わらず、本開示は同様に実施可能であると理解すべきである。本願の実施例の主旨を明確にするために、いくつかの実例において、当業者に熟知されている方法、手段、素子及び回路については詳しく説明しないことにする。 It is noted that many specific details are set forth in the specific embodiments below in order to better describe the embodiments of the present application. It should be understood by those skilled in the art that the present disclosure may be similarly practiced regardless of these specific details. In some instances, methods, means, devices, and circuits that are well known to those skilled in the art are not described in detail in order to avoid obscuring the subject matter of the embodiments of the present application.

図１は、本願の実施例によるネットワーク訓練方法を示すフローチャートである。図１に示すように、前記ネットワーク訓練方法は、以下を含む。 FIG. 1 is a flowchart illustrating a network training method according to embodiments of the present application. As shown in FIG. 1, the network training method includes:

ステップＳ１１において、ラベリングされていない第１サンプル画像をターゲット検出ネットワークに入力して処理を行い、前記第１サンプル画像のターゲット検出結果を得て、前記ターゲット検出結果は、前記第１サンプル画像におけるターゲットの画像領域、特徴情報及び分類確率を含む。 In step S11, an unlabeled first sample image is input to a target detection network for processing to obtain a target detection result of the first sample image, where the target detection result includes the image region, feature information and classification probability of each target in the first sample image.

ステップＳ１２において、前記ターゲットの分類確率に基づいて、前記ターゲットのカテゴリ信頼度を決定する。 In step S12, the category confidence of the target is determined based on the classification probability of the target.

ステップＳ１３において、前記ターゲットのうちのカテゴリ信頼度が第１閾値以上である第１ターゲットに対して、前記第１ターゲットが位置する第１サンプル画像をラベリングされた第２サンプル画像とし、訓練集合に加え、ここで、前記第２サンプル画像のラベリング情報は、前記第１ターゲットの画像領域と、前記第１ターゲットのカテゴリ信頼度に対応するカテゴリと、を含み、前記訓練集合に、ラベリングされた第３サンプル画像が含まれる。 In step S13, for a first target among the targets whose category confidence is greater than or equal to a first threshold, the first sample image in which the first target is located is taken as a labeled second sample image and added to a training set, where the labeling information of the second sample image includes the image region of the first target and the category corresponding to the category confidence of the first target, and the training set includes labeled third sample images.

ステップＳ１４において、前記ターゲットのうちのカテゴリ信頼度が前記第１閾値よりも小さい第２ターゲットに対して、前記第３サンプル画像における第３ターゲットの特徴情報に基づいて、前記第２ターゲットに対して特徴相関マイニングを行い、特徴相関マイニングにより、前記第２ターゲットから、第４ターゲット及び前記第４ターゲットが位置する第１サンプル画像を決定し、前記第４ターゲットが位置する第１サンプル画像を第４サンプル画像とし、前記訓練集合に加える。 In step S14, for second targets among the targets whose category confidence is less than the first threshold, feature correlation mining is performed on the second targets based on the feature information of third targets in the third sample images; the mining determines, from the second targets, a fourth target and the first sample image in which the fourth target is located, and that first sample image is taken as a fourth sample image and added to the training set.

ステップＳ１５において、前記第４サンプル画像のラベリング情報、前記訓練集合における第２サンプル画像、第３サンプル画像及び前記第４サンプル画像に基づいて、前記ターゲット検出ネットワークを訓練する。 In step S15, training the target detection network based on the labeling information of the fourth sample image, the second sample image, the third sample image and the fourth sample image in the training set.

可能な実現形態において、前記方法は、端末機器又はサーバなどの電子機器により実行されてもよく、端末機器は、ユーザ装置（Ｕｓｅｒ Ｅｑｕｉｐｍｅｎｔ：ＵＥ）、携帯機器、ユーザ端末、端末、セルラ電話、コードレス電話、パーソナルデジタルアシスタント（Ｐｅｒｓｏｎａｌ Ｄｉｇｉｔａｌ Ａｓｓｉｓｔａｎｔ：ＰＤＡ）、ハンドヘルドデバイス、コンピューティングデバイス、車載機器、ウェアラブル機器などであってもよい。前記方法は、プロセッサによりメモリに記憶されているコンピュータ可読命令を呼び出すことで実現することができる。又は、サーバにより前記方法を実行することができる。 In a possible implementation, the method may be performed by an electronic device such as a terminal device or a server. The terminal device may be a User Equipment (UE), a mobile device, a user terminal, a terminal, a cellular phone, a cordless phone, a Personal Digital Assistant (PDA), a handheld device, a computing device, an in-vehicle device, a wearable device, or the like. The method may be implemented by a processor invoking computer-readable instructions stored in a memory. Alternatively, the method may be performed by a server.

例えば、第１サンプル画像は、画像収集機器（例えばカメラ）により収集された画像であってもよい。第１サンプル画像は、大規模なロングテール（Ｌｏｎｇ－ｔａｉｌｅｄ）画像であってもよく、即ち、大部分の画像は、背景画像であり、画像のごく一部に検出可能なターゲットが含まれる。検出可能なターゲットは、例えば、人体、顔、車両、物体などを含んでもよい。例えば、防犯領域において、カメラにより、ある地理的エリアの画像を収集することができる。ごく一部の時間に、誰かが該地理的エリアを通っている可能性がある。それにより、収集された画像の大部分が背景画像であり、画像のごく一部のみに顔及び／又は人体が含まれる。この場合、収集された複数の画像は、ロングテールデータ集合を構成することができる。本願の実施例は、第１サンプル画像の取得方式及び第１サンプル画像におけるターゲットのカテゴリを限定しない。 For example, the first sample images may be images acquired by an image acquisition device (e.g., a camera). The first sample images may be large-scale long-tailed images; that is, most of the images are background images, and only a small fraction of the images contain detectable targets. Detectable targets may include, for example, human bodies, faces, vehicles, objects, and the like. For example, in the security field, a camera may collect images of a geographic area, and someone passes through that area only a small fraction of the time. As a result, most of the collected images are background images, and only a small fraction contain faces and/or human bodies. In this case, the collected images can constitute a long-tailed data set. Embodiments of the present application do not limit the manner of acquiring the first sample images or the categories of targets in them.

可能な実現形態において、画像におけるターゲットの位置（即ち、検出枠）及びカテゴリを検出するためのターゲット検出ネットワークが予め設定されてもよい。該ターゲット検出ネットワークは、例えば、畳み込みニューラルネットワークであってもよく、本願の実施例は、ターゲット検出ネットワークのネットワーク構造を限定しない。 In a possible implementation, a target detection network may be preconfigured to detect the position (ie detection window) and category of the target in the image. The target detection network may be, for example, a convolutional neural network, and embodiments herein do not limit the network structure of the target detection network.

可能な実現形態において、ステップＳ１１の前に、該方法は、ラベリングされた第３サンプル画像により、前記ターゲット検出ネットワークに対して事前訓練を行うことを更に含む。つまり、訓練集合が予め設定されてもよく。該訓練集合に、ラベリングされた第３サンプル画像が含まれる。第３サンプル画像のラベリング情報は、画像におけるターゲットの検出枠及びカテゴリを含んでもよい。該訓練集合に基づいて、関連技術における方式を用いて、ターゲット検出ネットワークに対して事前訓練を行い、該ターゲット検出ネットワークに、一定の検出精度を持たせることができる。 In a possible implementation, before step S11, the method further comprises pre-training the target detection network with labeled third sample images. That is, the training set may be preset. The training set includes labeled third sample images. The labeling information for the third sample image may include detection windows and categories of targets in the image. Based on the training set, a method in the related art can be used to pre-train a target detection network to make the target detection network have a certain detection accuracy.

しかしながら、事前訓練されたターゲット検出ネットワークの、大規模なロングテール画像に対する検出効果が悪いため、能動的学習の方式で、ラベリングされていない第１サンプル画像を用いてターゲット検出ネットワークを更に訓練することができる。 However, because the pre-trained target detection network performs poorly on large-scale long-tailed images, the target detection network can be further trained in an active-learning manner using the unlabeled first sample images.

可能な実現形態において、ステップＳ１１において、ラベリングされていない第１サンプル画像をターゲット検出ネットワークに入力して処理を行い、第１サンプル画像のターゲット検出結果を得ることができる。該ターゲット検出結果は、第１サンプル画像におけるターゲットの画像領域、特徴情報及び分類確率を含んでもよい。ターゲットが位置する画像領域は、画像における検出枠であってもよく、ターゲットの特徴情報は、例えば、ターゲット検出ネットワークの隠れ層（例えば、畳み込み層）から出力された特徴ベクトルであってもよく、ターゲットの分類確率は、該ターゲットが各カテゴリに属する分類事後確率を表すことができる。 In a possible implementation, in step S11, an unlabeled first sample image can be input to a target detection network for processing to obtain a target detection result for the first sample image. The target detection result may include an image region of the target in the first sample image, feature information and classification probability. The image region where the target is located may be a detection frame in the image, the feature information of the target may be, for example, a feature vector output from a hidden layer (e.g., convolutional layer) of the target detection network, The classification probabilities of a target can represent the classification posterior probabilities that the target belongs to each category.

可能な実現形態において、第１サンプル画像におけるターゲットは、インスタンスとよばれてもよい。各第１サンプル画像から、１つ又は複数のターゲットを検出する可能性がある。実際の処理において、検出されたターゲットの数レベルは、画像数レベルの数倍から数十倍である可能性がある。 In a possible implementation, a target in a first sample image may be called an instance. One or more targets may be detected in each first sample image. In actual processing, the number of detected targets may be several times to tens of times the number of images.

可能な実現形態において、ステップＳ１２において、ターゲットの分類確率に基づいて、分類確率の最大値を求めて該ターゲットのカテゴリ信頼度として決定することができる。 In a possible implementation, in step S12, based on the classification probabilities of the target, the maximum value of the classification probabilities can be determined as the category confidence of the target.
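
As a minimal sketch of this step (the function name `category_confidence` and the plain list of probabilities are illustrative assumptions, not part of the application), the category confidence can be taken as the largest classification probability, and the index at which it occurs gives the corresponding category:

```python
def category_confidence(class_probs):
    """Return (category_index, confidence) for one detected target.

    class_probs is assumed to be a non-empty sequence of posterior
    probabilities, one entry per category.
    """
    category = max(range(len(class_probs)), key=lambda c: class_probs[c])
    return category, class_probs[category]


# Hypothetical classification probabilities for a single target.
category, confidence = category_confidence([0.05, 0.90, 0.05])
```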

可能な実現形態において、ステップＳ１３において、カテゴリ信頼度が第１閾値以上であるターゲット（第１ターゲットとよばれてもよい）に対して、該第１ターゲットが位置する第１サンプル画像をラベリングされたサンプル画像（第２サンプル画像と呼ばれてもよい）とし、訓練集合に加えることができる。第１ターゲットの画像領域をラベリングされる画像領域とし、該第１ターゲットのカテゴリ信頼度に対応するカテゴリを該第１ターゲットのラベリングカテゴリとする。同一の第２サンプル画像は、該第２サンプル画像における複数の第１ターゲットにより複数回ラベリングされる可能性がある。ここで、第１閾値は例えば０．９９であり、本願の実施例は、第１閾値の値を限定しない。 In a possible implementation, in step S13, for a target whose category confidence is greater than or equal to the first threshold (which may be called a first target), the first sample image in which the first target is located can be taken as a labeled sample image (which may be called a second sample image) and added to the training set. The image region of the first target is used as the labeled image region, and the category corresponding to the category confidence of the first target is used as the labeled category of the first target. The same second sample image may be labeled multiple times, once for each of the first targets it contains. Here, the first threshold is, for example, 0.99; embodiments of the present application do not limit the value of the first threshold.

可能な実現形態において、ステップＳ１３の処理プロセスは、疑似ラベリング（ｐｓｅｕｄｏ－ｌａｂｅｌｉｎｇ）と呼ばれてもよい。即ち、信頼度が高いターゲットが位置する画像を価値が高いサンプルとし、ターゲット検出の推理結果を直接的にターゲットのラベリング結果とする。このような方式により、訓練集合におけるポジティブサンプルデータの数を拡張し、ポジティブサンプルの収集が困難であるという問題を解決することができる。 In a possible implementation, the process of step S13 may be called pseudo-labeling. That is, an image in which a target with high reliability is located is taken as a high-value sample, and the inference result of target detection is directly taken as the labeling result of the target. This method can expand the number of positive sample data in the training set and solve the problem of difficulty in collecting positive samples.
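
The pseudo-labeling split can be sketched as follows. This is only an illustration under stated assumptions: the dictionary keys `image_id`, `box` and `probs`, and the function name `split_by_confidence`, are hypothetical, and the threshold 0.99 is the example value mentioned in the text.

```python
FIRST_THRESHOLD = 0.99  # example value from the text; not a fixed requirement


def split_by_confidence(detections, threshold=FIRST_THRESHOLD):
    """Split detected targets into pseudo-labeled first targets and
    low-confidence second targets.

    Each detection is assumed to be a dict with keys
    'image_id', 'box' and 'probs' (hypothetical layout).
    """
    pseudo_labels, second_targets = [], []
    for det in detections:
        probs = det["probs"]
        category = max(range(len(probs)), key=lambda c: probs[c])
        if probs[category] >= threshold:
            # The detection result is used directly as the label.
            pseudo_labels.append(
                {"image_id": det["image_id"], "box": det["box"], "category": category}
            )
        else:
            second_targets.append(det)
    return pseudo_labels, second_targets


detections = [
    {"image_id": "img0", "box": (10, 10, 50, 80), "probs": [0.995, 0.005]},
    {"image_id": "img1", "box": (0, 0, 20, 20), "probs": [0.60, 0.40]},
]
pseudo_labels, second_targets = split_by_confidence(detections)
```

An image that contains several high-confidence targets simply receives one label entry per target, which matches the remark that the same second sample image may be labeled multiple times.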

可能な実現形態において、ステップＳ１４において、カテゴリ信頼度が第１閾値よりも小さいターゲット（第２ターゲットと呼ばれてもよい）に対して、訓練集合におけるラベリングされた第３サンプル画像におけるターゲット（第３ターゲットと呼ばれてもよい）の特徴情報に基づいて、第２ターゲットに対して特徴相関マイニングを行い、第２ターゲットから、要件を満たすターゲット（第４ターゲットと呼ばれてもよい）をマイニングすることができる。例えば、第３ターゲットの特徴情報と第２ターゲットの特徴情報との距離又は相関度を算出し、距離又は相関度に基づいて、所定の数のターゲットを選択し、選択された所定の数のターゲットを第４ターゲットとすることができる。 In a possible implementation, in step S14, for targets whose category confidence is less than the first threshold (which may be called second targets), feature correlation mining can be performed on the second targets based on the feature information of the targets in the labeled third sample images in the training set (which may be called third targets), so as to mine, from the second targets, targets that satisfy the requirements (which may be called fourth targets). For example, the distance or degree of correlation between the feature information of the third targets and that of the second targets is calculated, a predetermined number of targets is selected based on the distance or degree of correlation, and the selected targets are taken as the fourth targets.

可能な実現形態において、マイニングされた第４ターゲットが位置する第１サンプル画像を第４サンプル画像とし、前記訓練集合に加えることにより、特徴相関マイニングの処理プロセスを完了することができる。このような方式により、訓練集合におけるサンプルデータの数を更に拡張することができる。 In a possible implementation, the first sample image where the mined fourth target is located is taken as the fourth sample image and added to the training set to complete the process of feature correlation mining. Such schemes can further expand the number of sample data in the training set.

可能な実現形態において、手動ラベリングの方式により、第４サンプル画像のラベリング情報を取得することができる。例えば、第４サンプル画像におけるターゲットの検出枠及びカテゴリを手動で決定する。本願の実施例は、これを限定しない。 In a possible implementation, the labeling information of the fourth sample image can be obtained by means of manual labeling. For example, manually determine the detection window and category of the target in the fourth sample image. Embodiments of the present application do not limit this.

可能な実現形態において、ステップＳ１５において、第４サンプル画像のラベリング情報を得た後、訓練集合における第２サンプル画像、第３サンプル画像及び第４サンプル画像に基づいて、ターゲット検出ネットワークを訓練することができる。 In a possible implementation, in step S15, after obtaining the labeling information of the fourth sample image, training a target detection network based on the second, third and fourth sample images in the training set. can be done.

可能な実現形態において、ステップＳ１１の処理により、各第１サンプル画像のターゲット検出結果を得て、Ｓ１２の処理により、各第１サンプル画像におけるターゲットのカテゴリ信頼度を得る。ステップＳ１３において、カテゴリ信頼度が第１閾値以上である第１ターゲットが位置するサンプル画像を訓練集合に加え、疑似ラベリング方式により、ラベリングされた第２サンプル画像を得ることができる。ステップＳ１４において、カテゴリ信頼度が第１閾値よりも小さい第２ターゲットに対してマイニングを行うことができる。 In a possible implementation, the process of step S11 obtains the target detection result for each first sample image, and the process of S12 obtains the category confidence of the target in each first sample image. In step S13, the sample images in which the first targets with category confidence greater than or equal to the first threshold are located are added to the training set, and the labeled second sample images can be obtained by the pseudo-labeling method. In step S14, mining may be performed for a second target whose category confidence is less than the first threshold.

可能な実現形態において、ステップＳ１４は、
前記第２ターゲットの分類確率に基づいて、前記第２ターゲットの情報エントロピーを決定することと、
前記第２ターゲットのカテゴリ信頼度及び情報エントロピーに基づいて、前記第２ターゲットから、第５ターゲットを選択することと、
前記第３サンプル画像における第３ターゲットのカテゴリ及びマイニング待ちサンプル画像の総数に基づいて、各カテゴリのマイニング待ちサンプル画像の第２数をそれぞれ決定することと、
前記第３サンプル画像における第３ターゲットの特徴情報、前記第５ターゲットの特徴情報及び各カテゴリのマイニング待ちサンプル画像の第２数に基づいて、前記第５ターゲットから、第４ターゲット及び前記第４ターゲットが位置する第１サンプル画像を決定することと、を含んでもよい。 In a possible implementation, step S14 includes:
determining the information entropy of the second target based on the classification probability of the second target;
selecting a fifth target from the second target based on the category confidence and information entropy of the second target;
respectively determining a second number of sample images to be mined for each category based on a third target category and a total number of sample images to be mined in the third sample images;
Based on the feature information of the third target in the third sample image, the feature information of the fifth target, and the second number of sample images awaiting mining in each category, from the fifth target, the fourth target and the fourth target and determining a first sample image in which is located.

例えば、第２ターゲットの分類確率に基づいて、第２ターゲットの情報エントロピーを算出することができる。該情報エントロピーは、第２ターゲットの不確実性の程度を表すためのものである。つまり、第２ターゲットの情報エントロピーが大きいほど、第２ターゲットの不確実性の程度が大きくなる。逆に、第２ターゲットの情報エントロピーが小さいほど、第２ターゲットの不確実性の程度が小さくなる。本願の実施例は、情報エントロピーの演算方式を限定しない。 For example, the information entropy of the second target can be calculated based on the classification probability of the second target. The information entropy is intended to represent the degree of uncertainty of the second target. That is, the greater the information entropy of the second target, the greater the degree of uncertainty of the second target. Conversely, the smaller the information entropy of the second target, the smaller the degree of uncertainty of the second target. Embodiments of the present application do not limit the method of calculating information entropy.
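
Since the application does not limit how the information entropy is computed, the following sketch assumes the standard Shannon entropy with the natural logarithm; the function name `information_entropy` is illustrative only.

```python
import math


def information_entropy(class_probs):
    """Shannon entropy (in nats) of one target's classification probabilities.

    Zero-probability categories contribute nothing to the sum.
    """
    return -sum(p * math.log(p) for p in class_probs if p > 0.0)


# A near-uniform distribution is more uncertain than a peaked one.
uncertain = information_entropy([0.5, 0.5])
confident = information_entropy([0.9, 0.1])
```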

可能な実現形態において、第２ターゲットのカテゴリ信頼度及び情報エントロピーに基づいて、複数の第２ターゲットから、一定の条件を満たすターゲット（第５ターゲットと呼ばれてもよい）をそれぞれ選択することができる。例えば、カテゴリ信頼度が大きいターゲット、情報エントロピーが大きいターゲットなどを選択する。 In a possible implementation, targets that satisfy a certain condition (which may be called fifth targets) can be selected from the plurality of second targets based on the category confidence and the information entropy of the second targets, respectively. For example, targets with high category confidence and targets with high information entropy are selected.

可能な実現形態において、前記第２ターゲットのカテゴリ信頼度及び情報エントロピーに基づいて、前記第２ターゲットから、第５ターゲットを選択するステップは、
前記第２ターゲットのカテゴリ信頼度及び情報エントロピーに基づいて、前記第２ターゲットに対してそれぞれ順序付けを行い、第３数の第６ターゲット及び第４数の第７ターゲットを選択することと、
前記第６ターゲットと前記第７ターゲットに対してマージを行い、前記第５ターゲットを得ることと、を含んでもよい。 In a possible implementation, the step of selecting a fifth target from said second target based on the categorical confidence and information entropy of said second target comprises:
ordering the second targets based on their categorical confidence and information entropy, respectively, and selecting a third number of sixth targets and a fourth number of seventh targets;
performing a merge on the sixth target and the seventh target to obtain the fifth target.

つまり、第２ターゲットのカテゴリ信頼度に基づいて、複数の第２ターゲットに対して順序付けを行い、順序付け結果に基づいて、複数の第２ターゲットから、所定の第３数のターゲット（第６ターゲットと呼ばれてもよい）を選択する。同様に、第２ターゲットの情報エントロピーに基づいて、複数の第２ターゲットに対して順序付けを行い、順序付け結果に基づいて、複数の第２ターゲットから、所定の第４数のターゲット（第７ターゲットと呼ばれてもよい）を選択する。ここで、第３数と第４数はそれぞれ３Ｋであってもよく、Ｋは、マイニング待ちサンプル画像の数を表し、Ｋの値は例えば１００００である。実際の処理において、Ｋの値は、ラベリングされていない第１サンプル画像の総数の５％～２５％である可能性がある。本願の実施例は、Ｋの値、第３数及び第４数とＫとの定量的関係をいずれも限定しない。 That is, the second targets are sorted by category confidence, and a predetermined third number of targets (which may be called sixth targets) is selected from them according to the sorting result. Similarly, the second targets are sorted by information entropy, and a predetermined fourth number of targets (which may be called seventh targets) is selected from them according to the sorting result. Here, the third number and the fourth number may each be 3K, where K represents the number of sample images to be mined and is, for example, 10000. In actual processing, K may be 5% to 25% of the total number of unlabeled first sample images. Embodiments of the present application limit neither the value of K nor the quantitative relationship between K and the third and fourth numbers.

当業者が実際の状況に応じてマイニング待ちサンプル画像の数Ｋ、第３数及び第４数の値を設定することができ、且つ第３数と第４数は異なってもよく、本願の実施例はこれを限定しないことは、理解されるべきである。 A person skilled in the art can set the values of the number K of sample images awaiting mining, the third number and the fourth number according to the actual situation, and the third number and the fourth number can be different. It should be understood that the examples do not limit this.

可能な実現形態において、選択された第６ターゲットと第７ターゲットをマージし、マージされた複数のターゲットを第５ターゲットとし、そのうちの存在する可能性がある重複ターゲットを除去することができる。実際の処理において、約６Ｋ個の第５ターゲットを得ることができる。 In a possible implementation, the 6th and 7th targets selected can be merged, and the merged multiple targets can be the 5th target, of which possible duplicate targets can be removed. In actual processing, about 6K fifth targets can be obtained.
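
The selection and merge described above can be sketched as below. The sketch assumes that "ordering" means sorting in descending order of confidence and of entropy (in line with the example of choosing targets with high confidence and high entropy), that entropy is Shannon entropy, and that each target carries a hypothetical unique `id` used for de-duplication.

```python
import math


def select_fifth_targets(second_targets, third_number, fourth_number):
    """Pick the top targets by confidence and by entropy, then merge them.

    Each target is assumed to be a dict with keys 'id' and 'probs'.
    """
    def confidence(target):
        return max(target["probs"])

    def entropy(target):
        return -sum(p * math.log(p) for p in target["probs"] if p > 0.0)

    sixth = sorted(second_targets, key=confidence, reverse=True)[:third_number]
    seventh = sorted(second_targets, key=entropy, reverse=True)[:fourth_number]

    merged, seen = [], set()
    for target in sixth + seventh:
        if target["id"] not in seen:  # drop targets selected by both rankings
            seen.add(target["id"])
            merged.append(target)
    return merged


second_targets = [
    {"id": "a", "probs": [0.9, 0.1]},
    {"id": "b", "probs": [0.5, 0.5]},
    {"id": "c", "probs": [0.7, 0.3]},
    {"id": "d", "probs": [0.6, 0.4]},
]
fifth_targets = select_fifth_targets(second_targets, third_number=2, fourth_number=3)
```

With the third and fourth numbers both set to 3K, the merged set contains at most 6K targets, fewer when the two rankings overlap.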

上記処理方式は、ブートストラッピング法（ｂｏｏｔｓｔｒａｐｐｉｎｇ）と呼ばれてもよい。このような方式により、第２ターゲットから、一定の数の、可能性が高いポジティブサンプルとネガティブサンプルを同時に選択することができる。それにより、後続の特徴相関マイニングを容易にし、特徴相関マイニングの演算量を低減させ、処理効率を向上させる。 The above processing scheme may be referred to as bootstrapping. In such a manner, a certain number of probable positive and negative samples can be selected simultaneously from the second target. This facilitates subsequent feature correlation mining, reduces the computational complexity of feature correlation mining, and improves processing efficiency.

可能な実現形態において、前記第３サンプル画像における第３ターゲットのカテゴリ及びマイニング待ちサンプル画像の総数に基づいて、各カテゴリのマイニング待ちサンプル画像の第２数をそれぞれ決定するステップは、
前記第３サンプル画像における第３ターゲットのカテゴリに基づいて、各カテゴリの第３ターゲットの割合を決定することと、
各カテゴリの第３ターゲットの割合に基づいて、各カテゴリのサンプリング比重を決定することと、
各カテゴリのサンプリング比重に基づいて、各カテゴリのマイニング待ちサンプル画像の第２数をそれぞれ決定することと、を含んでもよい。 In a possible implementation, determining a second number of sample images to be mined for each category based on a third target category and a total number of sample images to be mined in said third sample images respectively comprises:
determining a percentage of third targets in each category based on categories of third targets in the third sample image;
determining a sampling weight for each category based on the proportion of the third target for each category;
respectively determining a second number of sample images to be mined for each category based on the sampling weight of each category.

例えば、訓練集合における既存のラベリングされた第３サンプル画像における第３ターゲットのカテゴリに基づいて、各カテゴリの第３ターゲットの割合ｆ_ｃを決定することができる。該割合ｆ_ｃに基づいて、下記式（１）及び（２）により、各カテゴリのサンプリング比重を算出することができる。式（１）及び（２）において、ｒ_ｃは、カテゴリｃのサンプリング値を表し、ｔは、ハイパーパラメータであり、その値が例えば０．１であり、Ｃは、カテゴリの数を表し、ｒ_ｉは、Ｃ個のカテゴリのうちのｉ番目のカテゴリのサンプリング値を表す。 For example, based on the categories of the third targets in the existing labeled third sample images in the training set, the proportion f_c of third targets in each category can be determined. Based on the proportion f_c, the sampling weight of each category can be calculated by equations (1) and (2). In equations (1) and (2), r_c represents the sampling value of category c, t is a hyperparameter whose value is, for example, 0.1, C represents the number of categories, and r_i represents the sampling value of the i-th category among the C categories.

[Equations (1) and (2): formula images not reproduced in this text.]

式（１）及び（２）の処理により、割合が小さいカテゴリに対応するサンプリング比重を向上させ、割合が大きいカテゴリに対応するサンプリング比重を低減させることができ、それにより異なるカテゴリのサンプルの数のバランスが取られていないという問題を軽減し、ネットワークの訓練効果を向上させる。 The processing of equations (1) and (2) can increase the sampling weight corresponding to low-proportion categories and reduce the sampling weight corresponding to high-proportion categories, thereby reducing the number of samples in different categories. Alleviate the problem of imbalance and improve the training effect of the network.

可能な実現形態において、各カテゴリのサンプリング比重及びマイニング待ちサンプル画像の総数（Ｋ個）に基づいて、各カテゴリのマイニング待ちサンプル画像の第２数を決定することができる。更に、第２数に基づいて、特徴相関マイニングを行うことができる。 In a possible implementation, the second number of sample images to be mined for each category can be determined based on the sampling weight of each category and the total number of sample images to be mined (K). Feature correlation mining can then be performed based on the second number.
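
The exact forms of equations (1) and (2) are not available in this text, so the sketch below rests on a loudly stated assumption: the sampling value is taken as r_c = max(1, sqrt(t / f_c)), a repeat-factor style rule, and the sampling weight is r_c normalized over all C categories. Only the roles of f_c, r_c, t and C come from the text; the formula itself, the function names and the rounding rule are illustrative.

```python
import math


def sampling_weights(proportions, t=0.1):
    """Map each category's proportion f_c to a normalized sampling weight.

    ASSUMED form: r_c = max(1, sqrt(t / f_c)), weight_c = r_c / sum_i r_i.
    Every proportion must be greater than zero.
    """
    raw = {c: max(1.0, math.sqrt(t / f)) for c, f in proportions.items()}
    total = sum(raw.values())
    return {c: r / total for c, r in raw.items()}


def second_numbers(weights, total_to_mine):
    """Allocate the K images to be mined across categories by weight.

    Rounding means the allocations may not sum exactly to K.
    """
    return {c: round(w * total_to_mine) for c, w in weights.items()}


proportions = {"person": 0.90, "car": 0.09, "bicycle": 0.01}
weights = sampling_weights(proportions)
numbers = second_numbers(weights, total_to_mine=100)
```

Under this assumed form, rare categories receive a larger share of the mining budget than frequent ones, which is the qualitative behavior the text attributes to equations (1) and (2).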

つまり、訓練集合におけるラベリングされた第３サンプル画像をターゲット検出ネットワークに入力し、ターゲット検出ネットワークの隠れ層（例えば、畳み込み層）から、該第三サンプル画像の特徴情報、例えば特徴ベクトルを出力することができる。このような方式により、第３サンプル画像の特徴を得ることができ、後続の特徴相関マイニングに寄与する。 That is, inputting the labeled third sample image in the training set into the target detection network, and outputting feature information, e.g., a feature vector, of the third sample image from the hidden layer (e.g., convolutional layer) of the target detection network. can be done. With such a scheme, features of the third sample image can be obtained to contribute to subsequent feature correlation mining.

可能な実現形態において、前記第３サンプル画像における第３ターゲットの特徴情報、前記第５ターゲットの特徴情報及び各カテゴリのマイニング待ちサンプル画像の第２数に基づいて、前記第５ターゲットから、第４ターゲット及び前記第４ターゲットが位置する第１サンプル画像を決定することは、
第１カテゴリの第３ターゲットの特徴情報と各第５ターゲットの特徴情報との距離に基づいて、前記第１カテゴリの第３ターゲットのうち、各第５ターゲットとの距離が最も小さい第３ターゲットをそれぞれ決定し、第８ターゲットとすることであって、前記第１カテゴリは、第３ターゲットのカテゴリのうちのいずれか１つである、ことと、
前記第８ターゲットのうち、距離が最も大きいターゲットを第４ターゲットとして決定することと、を含む。 In a possible implementation, based on the feature information of the third target in the third sample image, the feature information of the fifth target and a second number of sample images to be mined for each category, from the fifth target, the fourth Determining a first sample image in which the target and the fourth target are located comprises:
Based on the distance between the feature information of the third target in the first category and the feature information of each fifth target, the third target with the smallest distance from each fifth target among the third targets in the first category is selected. each determining an eighth target, wherein the first category is any one of the categories of the third target;
determining a target having the longest distance among the eighth targets as a fourth target.

例えば、各カテゴリのマイニング待ちサンプル画像の第２数を決定した後、ｋセンター（ｋ－ｃｅｎｔｅｒ）方式を用いて、第５ターゲットが位置するサンプル画像から、該当する数のサンプル画像をマイニングすることができる。第３ターゲットの複数のカテゴリのうちのいずれか１つのカテゴリ（第１カテゴリと呼ばれてもよい）に対して、該第１カテゴリの第３ターゲットの特徴情報と各第５ターゲットの特徴情報との距離を算出することができる。該距離は例えば、ユークリッド距離であってもよい。いずれか１つの第５ターゲットに対して、第１カテゴリの第３ターゲットのうち、該第５ターゲットとの距離が最も小さい第３ターゲットを決定することができ、それにより各第５ターゲットとの距離が最も小さい第３ターゲットを決定することができ、これは、第８ターゲットと呼ばれてもよい。 For example, after determining the second number of sample images waiting to be mined for each category, using a k-center method to mine the corresponding number of sample images from the sample images where the fifth target is located. can be done. For any one category (which may be called a first category) among the plurality of categories of the third target, the feature information of the third target of the first category and the feature information of each fifth target can be calculated. The distance may for example be the Euclidean distance. For any one fifth target, a third target having the smallest distance from the fifth target among the third targets in the first category can be determined, whereby the distance from each fifth target is can be determined, which may be referred to as the eighth target.

可能な実現形態において、各第８ターゲットから、距離が最も大きい１つのターゲットを選択し、今回の特徴相関マイニングにより得られた第４ターゲットとして決定することができる。これは、式（３）に示すとおりである。式（３）において、ｕは、特徴相関マイニングにより得られた第４ターゲットを表し、その他の記号は、ｊ番目の第５ターゲットの特徴情報と第１カテゴリｃの１番目の第３ターゲットの特徴情報との距離、第５ターゲットの特徴情報の集合、及び第１カテゴリｃの第３ターゲットの特徴情報の集合をそれぞれ表す。 In a possible implementation, from the eighth targets, the one target with the largest distance can be selected and determined as the fourth target obtained by the current round of feature correlation mining, as expressed by equation (3). In equation (3), u represents the fourth target obtained by feature correlation mining; the other symbols denote, respectively, the distance between the feature information of the j-th fifth target and the feature information of the first third target of the first category c, the set of feature information of the fifth targets, and the set of feature information of the third targets of the first category c.

[Equation (3): formula image and symbol images not reproduced in this text.]
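
One reading of equation (3) that is consistent with the surrounding description (nearest labeled target for each fifth target, then the largest of those distances) is the greedy k-center rule below. The symbols $x_j$, $x_l$, $\Delta$, $\mathcal{U}$ and $\mathcal{L}_c$ are illustrative assumptions, since the original symbols are not available here; the running index $l$ over the third targets is likewise assumed.

```latex
u = \arg\max_{x_j \in \mathcal{U}} \; \min_{x_l \in \mathcal{L}_c} \Delta(x_j, x_l)
```

Here $\mathcal{U}$ stands for the set of feature information of the fifth targets, $\mathcal{L}_c$ for the set of feature information of the third targets of the first category $c$, and $\Delta(x_j, x_l)$ for the distance (for example, the Euclidean distance) between the two feature vectors.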

可能な実現形態において、該第４ターゲットが位置する第１サンプル画像を決定し、該サンプル画像を訓練集合に加え、第４サンプル画像とすることができ、それにより、今回の特徴相関マイニングプロセスを完了する。 In a possible implementation, the first sample image in which the fourth target is located can be determined and added to the training set to become the fourth sample image, thereby making the current feature correlation mining process complete.

可能な実現形態において、前記第３サンプル画像における第３ターゲットの特徴情報、前記第５ターゲットの特徴情報及び各カテゴリのマイニング待ちサンプル画像の第２数に基づいて、前記第５ターゲットから、第４ターゲット及び前記第４ターゲットが位置する第１サンプル画像を決定するステップは、
決定された第４ターゲットを前記第１カテゴリの第３ターゲットに加え、前記決定された第４ターゲットをラベリングされていない第５ターゲットから除去することを更に含む。 In a possible implementation, based on the feature information of the third target in the third sample image, the feature information of the fifth target and a second number of sample images to be mined for each category, from the fifth target, the fourth Determining a first sample image in which the target and the fourth target are located comprises:
Adding the determined fourth target to the third target of the first category and removing the determined fourth target from the unlabeled fifth target.

つまり、今回の特徴相関マイニングにより得られた第４ターゲットをラベリングされたターゲットとし、該第４ターゲットをラベリングされていないターゲットから除去する。この場合、該第４ターゲットの特徴情報を第１カテゴリｃの第３ターゲットの特徴情報の集合に加え、第５ターゲットの特徴情報の集合から除去することができる。このように、次回の特徴相関マイニングにおいて、式（３）により、更新後の２つの集合に対してマイニングを行い、上記プロセスを繰り返すことができる。 That is, the fourth target obtained by the current round of feature correlation mining is treated as a labeled target and removed from the unlabeled targets. In this case, the feature information of the fourth target can be added to the set of feature information of the third targets of the first category c and removed from the set of feature information of the fifth targets. In the next round of feature correlation mining, equation (3) is then applied to the two updated sets, and the above process is repeated.

可能な実現形態において、第１カテゴリの第４サンプル画像の数が第１カテゴリの第２数に達するか又は第２数に達しておらず且つ第５ターゲットがなくなる（第５ターゲットの特徴情報の集合がヌルである）場合、該第１カテゴリの特徴相関マイニングを完了することができる。 In a possible implementation, feature correlation mining for the first category can be completed when the number of fourth sample images of the first category reaches the second number of the first category, or when the second number has not been reached but no fifth targets remain (the set of feature information of the fifth targets is empty).
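
The per-category mining loop described above can be sketched as a greedy k-center selection. The sketch makes several assumptions that are not taken from the application: features are plain tuples of floats, the distance is Euclidean, the stopping count is applied to selected targets rather than to distinct images, and the names `mine_category`, `labeled_feats` and `unlabeled_feats` are hypothetical.

```python
import math


def mine_category(labeled_feats, unlabeled_feats, second_number):
    """Greedy k-center mining for one category.

    labeled_feats:   non-empty list of feature vectors of third targets
    unlabeled_feats: dict mapping target id -> feature vector (fifth targets)
    Returns the ids of the mined fourth targets, in selection order.
    """
    labeled = list(labeled_feats)
    pool = dict(unlabeled_feats)
    selected = []
    # Stop when the quota is met or when no fifth targets remain.
    while pool and len(selected) < second_number:
        # Distance from each fifth target to its nearest labeled target.
        nearest = {
            tid: min(math.dist(feat, ref) for ref in labeled)
            for tid, feat in pool.items()
        }
        # The target farthest from everything labeled is mined next.
        u = max(nearest, key=nearest.get)
        selected.append(u)
        labeled.append(pool.pop(u))  # move it from the unlabeled to the labeled set
    return selected


labeled = [(0.0, 0.0)]
pool = {"a": (1.0, 0.0), "b": (3.0, 0.0), "c": (6.0, 0.0)}
mined = mine_category(labeled, pool, second_number=2)
```

Because each mined target joins the labeled set before the next round, later picks are pushed away from earlier ones, so the mined targets tend to spread over the feature space.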

このような方式により、各カテゴリに対してそれぞれ特徴相関マイニングを行い、最終的に、十分な数の第４サンプル画像（一般的には、Ｋ個のサンプル画像）を得ることができ、それにより訓練集合におけるサンプル画像の数を更に拡張し、ポジティブサンプルとネガティブサンプルとのバランスが取られていないことを軽減する。 With such a scheme, we can perform feature correlation mining for each category separately, and finally obtain a sufficient number of fourth sample images (generally, K sample images), so that We further expand the number of sample images in the training set to mitigate the imbalance between positive and negative samples.

可能な実現形態において、マイニングされた第４サンプル画像に対して手動ラベリング（ｈｕｍａｎａｎｎｏｔａｔｉｏｎ）を行い、第４サンプル画像のラベリング情報を得ることができる。第４サンプル画像に、ポジティブサンプル画像（即ち、画像にターゲットが含まれる第４サンプル画像）とネガティブサンプル画像（即ち、画像にターゲットが含まれない第４サンプル画像）が同時に存在する可能性があるため、第４サンプル画像のラベリング情報は、画像がポジティブサンプル画像又はネガティブサンプル画像であることを示すサンプルカテゴリ情報、ポジティブサンプル画像におけるターゲットが位置する画像枠及びターゲットのカテゴリを含んでもよい。 In a possible implementation, human annotation can be performed on the mined fourth sample image to obtain the labeling information of the fourth sample image. The fourth sample image may have a positive sample image (i.e., the fourth sample image in which the image contains the target) and a negative sample image (i.e., the fourth sample image in which the image contains no target) at the same time. Therefore, the labeling information of the fourth sample image may include sample category information indicating whether the image is a positive sample image or a negative sample image, the image box in which the target is located in the positive sample image, and the category of the target.

可能な実現形態において、手動ラベリングを完了した後、ステップＳ１５において、第４サンプル画像のラベリング情報、前記訓練集合における第２サンプル画像、第３サンプル画像及び第４サンプル画像に基づいて、ターゲット検出ネットワークを訓練することができる。 In a possible implementation, after completing the manual labeling, in step S15, based on the labeling information of the fourth sample image, the second sample image, the third sample image and the fourth sample image in the training set, a target detection network can be trained.

ここで、ステップＳ１５は、前記訓練集合のポジティブサンプル画像におけるターゲットのカテゴリに基づいて、各カテゴリのポジティブサンプル画像からサンプリングされる第１数をそれぞれ決定することであって、前記ポジティブサンプル画像は、画像にターゲットが含まれるサンプル画像である、ことと、
各カテゴリのポジティブサンプル画像からサンプリングされる第１数に基づいて、各カテゴリのポジティブサンプル画像に対してサンプリングを行い、複数の第５サンプル画像を得ることと、
前記訓練集合のネガティブサンプル画像に対してサンプリングを行い、複数の第６サンプル画像を得ることであって、前記ネガティブサンプル画像は、画像にターゲットが含まれないサンプル画像である、ことと、
前記第５サンプル画像及び前記第６サンプル画像に基づいて、前記ターゲット検出ネットワークを訓練することと、を含んでもよい。 wherein step S15 is respectively determining a first number to be sampled from each category of positive sample images based on target categories in the training set of positive sample images, wherein the positive sample images are: that the image is a sample image containing the target;
sampling the positive sample images of each category based on a first number sampled from the positive sample images of each category to obtain a plurality of fifth sample images;
sampling the negative sample images of the training set to obtain a plurality of sixth sample images, wherein the negative sample images are sample images in which the target is not included in the images;
training the target detection network based on the fifth sample image and the sixth sample image.

例えば、リサンプリング（ｒｅｓａｍｐｌｉｎｇ）の方式により、ターゲット検出ネットワークを訓練することができる。リサンプリングにより、データのうち、出現頻度が低いデータのサンプリング周波数を増加させ、これらのデータに対する、ネットワークの性能を改善し、ポジティブサンプルとネガティブサンプルとのバランスが取られていないことを更に改善する。 For example, the target detection network can be trained by a resampling scheme. Resampling increases the sampling frequency of data with low frequency of occurrence, improves the performance of the network on these data, and further improves the imbalance between positive and negative samples. .

可能な実現形態において、訓練集合（第２サンプル画像、第３サンプル画像及び第４サンプル画像を含む）におけるポジティブサンプル画像とネガティブサンプル画像に対してそれぞれサンプリングを行い、サンプリング後のポジティブサンプル画像の数とネガティブサンプル画像の数を同じであるか又は近くなるようにすることができる。 In a possible implementation, each positive sample image and negative sample image in the training set (including the second sample image, the third sample image, and the fourth sample image) are sampled, and the number of positive sample images after sampling and the number of negative sample images can be the same or close.

In a possible implementation, a total number of positive sample images to be sampled may be preset. Based on the categories of the targets in the positive sample images of the training set, the first number to be sampled from the positive sample images of each category is determined.

As in the process described above, the proportion of targets in each category can be determined from the categories of the targets in the positive sample images. From these proportions, the sampling weight of each category can be calculated by the following formula.

(4)

In equation (4), R_h denotes the sampling weight of the positive sample images of the h-th category, q_h denotes the proportion of targets in the h-th category, and t_1 is a hyperparameter whose value is, for example, 0.1.

The processing of equation (4) raises the sampling weight of categories with a small proportion and lowers the sampling weight of categories with a large proportion, which mitigates the imbalance in the number of positive sample images across categories and improves the training of the network.
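The effect described above can be illustrated with a small sketch. Because the formula of equation (4) is not reproduced in this text, the function below assumes a square-root weight, R_h = sqrt(t_1 / q_h). This form, the function name `sampling_weights`, and the example proportions are assumptions made for illustration only and are not the formula of the application.

```python
import math

def sampling_weights(proportions, t1=0.1):
    """Return an assumed sampling weight R_h for each category proportion q_h.

    Assumed form (NOT taken from the application): R_h = sqrt(t1 / q_h).
    Categories with q_h < t1 get a weight above 1, and categories with
    q_h > t1 get a weight below 1.
    """
    if any(q <= 0 for q in proportions.values()):
        raise ValueError("every category proportion must be positive")
    return {cat: math.sqrt(t1 / q) for cat, q in proportions.items()}

# Hypothetical long-tailed proportions of targets per category.
proportions = {"person": 0.70, "vehicle": 0.25, "helmet": 0.05}
weights = sampling_weights(proportions)
```

Under this assumed form, the rare category ends up with the largest weight and the dominant category with the smallest, which is the qualitative behavior the paragraph above attributes to equation (4).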

In a possible implementation, the first number for each category can be determined from the sampling weight of that category's positive sample images and the total number of positive sample images to be sampled.

In a possible implementation, for any one category, the first number of positive sample images is randomly sampled from that category's positive sample images and used as fifth sample images. Sampling the positive sample images of each category in this way yields fifth sample images whose count equals the sampling total.
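The allocation and per-category sampling described above can be sketched as follows. The text does not state how the first number is derived from the weight and the total, nor whether sampling is with replacement. The proportional split, the rounding rule, sampling with replacement, and all names and example values below are assumptions.

```python
import random

def first_numbers(weights, total):
    """Split the preset positive sampling total across categories by sampling weight."""
    weight_sum = sum(weights.values())
    counts = {cat: int(total * w / weight_sum) for cat, w in weights.items()}
    # Rounding down can leave a few samples unassigned; give them to the heaviest categories.
    leftover = total - sum(counts.values())
    for cat in sorted(weights, key=weights.get, reverse=True)[:leftover]:
        counts[cat] += 1
    return counts

def sample_positives(images_by_category, counts, rng):
    """Draw counts[cat] images at random from each category.

    Assumption: sampling is with replacement, so a rare category can be drawn
    more often than it has images.
    """
    fifth_samples = []
    for cat, n in counts.items():
        fifth_samples.extend(rng.choices(images_by_category[cat], k=n))
    return fifth_samples

# Hypothetical weights and image pools for three categories.
weights = {"person": 0.4, "vehicle": 0.6, "helmet": 1.4}
images_by_category = {
    "person": [f"person_{i}" for i in range(700)],
    "vehicle": [f"vehicle_{i}" for i in range(250)],
    "helmet": [f"helmet_{i}" for i in range(50)],
}
counts = first_numbers(weights, total=100)
fifth_samples = sample_positives(images_by_category, counts, random.Random(0))
```

Passing a seeded `random.Random` instance keeps the draw reproducible; sampling without replacement would be an equally valid reading when every category has at least as many images as its first number.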

In a possible implementation, the negative sample images in the training set are randomly sampled directly, according to a preset sampling total, to obtain that number of sixth sample images. The sampling total for negative sample images may be the same as or different from the sampling total for positive sample images; the embodiments of the present application do not limit this.

In a possible implementation, the target detection network can be trained on the fifth and sixth sample images. That is, the fifth and sixth sample images are input to the target detection network to obtain their target detection results; the loss of the target detection network is determined from the target detection results and the labeling information; the parameters of the network are adjusted backward based on the loss; and after multiple iterations, once a predetermined condition (for example, network convergence) is met, the trained target detection network is obtained.

In this way, the detection performance of the trained target detection network on long-tail images can be significantly improved.

In a possible implementation, the step of pre-training the target detection network on the labeled third sample images before step S11 may also use the resampling training scheme described above, which improves the pre-training of the target detection network.

In practice, the whole process of steps S11 to S15 can be repeated to achieve continuous incremental training. That is, when new unlabeled sample images are collected, the target detection network from the current round of training serves as the initial target detection network, the training set expanded in the current round serves as the initial training set, and the process of pseudo-labeling, feature correlation mining, and resampling training is repeated, continuously improving the performance of the target detection network.
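The incremental loop described above can be sketched as a small driver function. The five callables, their signatures, and the toy stand-ins are hypothetical; the sketch only shows how the model and training set from one round seed the next.

```python
def incremental_training(model, training_set, batches, pseudo_label, mine, resample, train):
    """Repeat pseudo-labeling, feature correlation mining, and resampling training per batch.

    All five callables are hypothetical stand-ins supplied by the caller.
    """
    for unlabeled in batches:
        confident, uncertain = pseudo_label(model, unlabeled)
        mined = mine(model, uncertain, training_set)
        # The expanded training set becomes the initial training set of the next round.
        training_set = training_set + confident + mined
        # The newly trained model becomes the initial model of the next round.
        model = train(model, resample(training_set))
    return model, training_set

# Toy stand-ins: the "model" is just a round counter and samples are strings.
model, data = incremental_training(
    model=0,
    training_set=["labeled_0"],
    batches=[["u1", "u2"], ["u3", "u4"]],
    pseudo_label=lambda m, batch: (batch[:1], batch[1:]),
    mine=lambda m, uncertain, ts: uncertain,
    resample=lambda ts: ts,
    train=lambda m, ts: m + 1,
)
```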

FIG. 2 is a schematic diagram of the processing flow of the network training method according to an embodiment of the present application. As shown in FIG. 2, the data source contains a large number of unlabeled first sample images 20. The first sample images 20 are input to the target detection network for prediction, yielding a target detection result 21 for each first sample image 20. The target detection result includes the image region of each target in the first sample image (not shown), its feature vector, and its classification probabilities.

As shown in FIG. 2, in this example the target detection network may include a CNN backbone network 211, a feature pyramid network (FPN) 212, and a fully connected network 213, where the fully connected network 213 is, for example, a bbox head. After a first sample image 20 is input to the target detection network, it is processed by the CNN backbone network 211 and the FPN 212 to obtain a feature map 214 of the first sample image. The feature map 214 is then processed by the fully connected network 213 to obtain the target detection result 21.

In this example, the category confidence of each target can be determined from its classification probabilities. For first targets whose category confidence is greater than or equal to the first threshold (for example, 0.99), the first sample images in which those first targets are located are taken as second sample images 22 and pseudo-labeled. That is, the image region of the first target and the category corresponding to the first target's category confidence are used as the labeling information of the second sample image 22. The labeled second sample images 22 are added to the training set 25, which expands the positive samples in the training set.
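The thresholding step above can be sketched as follows. This passage does not define how the category confidence is computed from the classification probabilities, so the sketch assumes it is the largest probability; the dictionary layout and the function name `pseudo_label` are also assumptions.

```python
def pseudo_label(detections, threshold=0.99):
    """Split detections into confident pseudo-labels and low-confidence candidates.

    Each detection is a dict with 'image', 'box' and 'probs' (category -> probability).
    Assumption: the category confidence is the largest classification probability.
    """
    labeled, uncertain = [], []
    for det in detections:
        category, confidence = max(det["probs"].items(), key=lambda kv: kv[1])
        if confidence >= threshold:
            # The image region and the winning category become the labeling information.
            labeled.append({"image": det["image"], "box": det["box"], "category": category})
        else:
            uncertain.append(det)
    return labeled, uncertain

# Hypothetical detections from two unlabeled images.
detections = [
    {"image": "img_1", "box": (10, 20, 50, 80), "probs": {"person": 0.995, "vehicle": 0.005}},
    {"image": "img_2", "box": (5, 5, 30, 40), "probs": {"person": 0.60, "vehicle": 0.40}},
]
labeled, uncertain = pseudo_label(detections)
```

Detections that fall below the threshold are returned separately so that they can be passed on to feature correlation mining.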

In this example, from the second targets whose category confidence is less than the first threshold, a certain number of fifth targets are selected by bootstrapping, giving the sample images 23 in which the fifth targets are located. Based on the feature vectors (not shown) of the third targets in the labeled third sample images of the training set, feature correlation mining is performed on the fifth targets to determine the fourth targets and the first sample images in which they are located; these become the fourth sample images 24. The fourth sample images 24 are labeled manually and added to the training set 25, which further expands the labeled images in the training set.
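A simplified sketch of feature correlation mining is given below. It is not the procedure of the application: it assumes Euclidean distance between feature vectors and simply ranks each low-confidence candidate by its distance to the nearest labeled feature of one category, keeping the closest images up to a budget. The function name and the data layout are hypothetical.

```python
import math

def mine_correlated(labeled_features, candidates, budget):
    """Pick up to `budget` candidate images whose targets lie closest to a labeled feature.

    labeled_features: feature vectors of labeled targets in one category.
    candidates: list of (image_id, feature_vector) for low-confidence targets.
    """
    scored = []
    for image_id, feature in candidates:
        # Distance from this candidate to its nearest labeled feature.
        nearest = min(math.dist(feature, ref) for ref in labeled_features)
        scored.append((nearest, image_id))
    scored.sort()
    mined = []
    for _, image_id in scored:
        if len(mined) >= budget:
            break
        if image_id not in mined:  # one image may contain several candidate targets
            mined.append(image_id)
    return mined

# Hypothetical 2-D features: one labeled target and three low-confidence candidates.
labeled_features = [(0.0, 0.0)]
candidates = [("a", (5.0, 5.0)), ("b", (0.1, 0.0)), ("c", (1.0, 1.0))]
mined = mine_correlated(labeled_features, candidates, budget=2)
```

The images returned this way would then go to manual labeling, as described above for the fourth sample images.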

In this example, after the two expansions, the training set 25 contains the labeled second, third, and fourth sample images. The training set 25 is resampled to balance the numbers of positive and negative samples and the numbers of positive samples across categories, giving the resampled training set 26. The target detection network is then trained on the resampled training set 26, which completes the whole process.

According to an embodiment of the present application, a target detection method is further provided. The method includes:
inputting an image to be processed into a target detection network for processing to obtain a target detection result for the image, where the target detection result includes the position and category of each target in the image, and the target detection network has been trained by the network training method described above.

That is, a target detection network trained by the above method can be deployed to detect targets in images to be processed. An image to be processed may be, for example, an image captured by an image capture device (such as a camera). The image may contain targets to be detected, such as human bodies, faces, vehicles, or objects. The embodiments of the present application do not limit this.

In a possible implementation, the image to be processed is input to the target detection network for processing to obtain its target detection result. The result includes the position and category of each target in the image, for example the detection box in which a face is located and the identity corresponding to that face.

In this way, the accuracy of target detection can be improved, and target detection on large-scale long-tail image data can be realized.

According to the network training method of the embodiments of the present application, an active learning mining method is used to mine potential unlabeled data, and a semi-supervised learning method is used to assist in labeling the unlabeled data and to expand the amount of positive sample data. This addresses the problem that, in large-scale long-tail detection, the data scale is large and positive samples are hard to collect, and it mitigates to some extent the imbalance between positive and negative samples. Model performance is improved effectively under limited labeling and computing resources.

According to the network training method of the embodiments of the present application, training the target detection network with resampling eliminates the adverse effect on network training of the imbalance between positive and negative samples and reduces the adverse effect of the imbalance among positive samples of different categories, so that the target detection network converges effectively during training and its performance improves.

According to the network training method of the embodiments of the present application, active learning is used to mine, from a large volume of unlabeled data, potentially high-value samples that help improve model performance. This improves model performance effectively under limited labeling and computing resources and greatly reduces the labor and computing costs of applying a deep learning model to a new task. With resampling, the target detection network can be trained effectively even when the samples are imbalanced, without extensive manual parameter tuning, which further reduces the labor cost of applying a deep learning model to a new task.

The network training method of the embodiments of the present application is applicable to fields such as intelligent video analysis and security. With limited human and computing resources, the method can be used to detect potential targets online in intelligent video analysis or intelligent monitoring and to iterate quickly on the detection network in use, so that the performance required by the application is reached quickly at low labor and computing cost and the network's performance can then continue to improve.

The network training method of the embodiments of the present application is applicable to online intelligent video analysis or intelligent monitoring. With limited human and computing resources, potential target detection applications in intelligent video analysis or intelligent monitoring can be iterated on quickly online, so that the performance required by the application is reached quickly at low labor and computing cost and the model's performance can then continue to improve.

It should be understood that the method embodiments mentioned in the present application can be combined with one another to form combined embodiments, provided that doing so does not depart from their principles and logic; for reasons of space, these combinations are not described one by one. Those skilled in the art should also understand that, in the methods of the specific embodiments above, the order in which the steps are executed is determined by their functions and possible internal logic.

The embodiments of the present application further provide a network training apparatus, a target detection apparatus, a computer-readable storage medium, and a program, each of which can be used to implement any of the network training methods or target detection methods provided in the embodiments of the present application. For the corresponding technical solutions and descriptions, refer to the related description of the methods; details are not repeated here.

FIG. 3 is a block diagram of a network training apparatus according to an embodiment of the present application. The apparatus includes a processor (not shown in FIG. 3) configured to execute program parts stored in a memory (not shown in FIG. 3). As shown in FIG. 3, the program parts stored in the memory include:
a target detection part 31 configured to input unlabeled first sample images into a target detection network for processing to obtain target detection results for the first sample images, where a target detection result includes the image region, feature information, and classification probabilities of each target in the first sample image;
a confidence determination part 32 configured to determine the category confidence of each target based on the target's classification probabilities;
a labeling part 33 configured to, for first targets among the targets whose category confidence is greater than or equal to a first threshold, take the first sample images in which the first targets are located as labeled second sample images and add them to a training set, where the labeling information of a second sample image includes the image region of the first target and the category corresponding to the first target's category confidence, and the training set contains labeled third sample images;
a feature mining part 34 configured to, for second targets among the targets whose category confidence is less than the first threshold, perform feature correlation mining on the second targets based on the feature information of third targets in the third sample images, determine from the second targets, by the feature correlation mining, fourth targets and the first sample images in which the fourth targets are located, take those first sample images as fourth sample images, and add them to the training set; and
a training part 35 configured to train the target detection network based on the labeling information of the fourth sample images and on the second sample images, the third sample images, and the fourth sample images in the training set.

In a possible implementation, the apparatus further includes a pre-training part configured to pre-train the target detection network on labeled third sample images.

In a possible implementation, the sampling number determination sub-part is further configured to, before determining the first number to be sampled from the positive sample images of each category based on the categories of the targets in the positive sample images of the training set, sample the positive sample images and the negative sample images in the training set to obtain positive and negative sample images that are equal or close in number.

In a possible implementation, the target selection sub-part is further configured to remove from the sixth targets those that are identical to a seventh target, obtain the remaining sixth targets that differ from the seventh targets, and take the remaining targets together with the seventh targets as the fifth targets.

In a possible implementation, the method further includes: after determining, based on the distances between the feature information of the third targets of a first category and the feature information of each fifth target, the third target of the first category that is closest to each fifth target as an eighth target, ending the determination of eighth targets if the number of first sample images in which fourth targets are located has reached the second number of sample images to be mined for the first category.

In a possible implementation, the target and image determination sub-part is further configured to, after determining, based on the distances between the feature information of the third targets of a first category and the feature information of each fifth target, the third target of the first category that is closest to each fifth target as an eighth target, end the determination of eighth targets if the number of first sample images in which fourth targets are located has not reached the second number of sample images to be mined for the first category and the set storing the feature information of the fifth targets is empty.
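The two stopping conditions described in the preceding paragraphs (the quota for the category is reached, or the candidate set runs out first) can be sketched as a simple loop. The function name, the priority-ordered input list, and the returned reason strings are hypothetical; how candidates are ranked is outside this sketch.

```python
def mine_category(candidate_images, quota):
    """Collect mined images for one category until a stopping condition holds.

    candidate_images: image ids ordered by mining priority (hypothetical input).
    quota: the number of images to mine for this category.
    Returns the mined images and the reason mining stopped.
    """
    pool = list(candidate_images)
    mined = []
    while True:
        if len(mined) >= quota:
            return mined, "quota reached"
        if not pool:
            return mined, "candidate set empty"
        image = pool.pop(0)
        if image not in mined:  # skip images that were already mined
            mined.append(image)
```

For instance, `mine_category(["a", "b", "c"], 2)` stops on the quota, while `mine_category(["a", "a"], 3)` stops because the candidate set is exhausted before the quota is met.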

In a possible implementation, the feature extraction part is further configured to input the third sample images into the target detection network, obtain the feature vectors output by a hidden layer of the target detection network, and take those feature vectors as the feature information of the third targets.

In some embodiments, the functions or parts of the apparatus provided in the embodiments of the present application may be configured to perform the methods described in the method embodiments above. For their specific implementation, refer to the description of those method embodiments; for brevity, details are not repeated here.

In some embodiments, a "part" may be part of a circuit, part of a processor, or part of a program or software; it may also be a unit, and it may be modular or non-modular.

The embodiments of the present application further provide a computer-readable storage medium storing computer program instructions that, when executed by a processor, implement the above method. The computer-readable storage medium may be a non-volatile computer-readable storage medium.

The embodiments of the present application further provide an electronic device. The electronic device includes a processor and a memory configured to store instructions executable by the processor, where the processor is configured to call the instructions stored in the memory to perform the above method.

The embodiments of the present application further provide a computer program product including computer-readable code. When the computer-readable code runs on a device, a processor in the device executes instructions for implementing the network training method or the target detection method provided in any of the embodiments above.

The embodiments of the present application further provide another computer program product configured to store computer-readable instructions that, when executed, cause a computer to perform the operations of the network training method or the target detection method provided in any of the embodiments above.

The electronic device may be provided as a terminal, a server, or a device in another form.

FIG. 5 is a block diagram of an electronic device 800 according to an embodiment of the present application. For example, the electronic device 800 may be a terminal such as a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, or a personal digital assistant.

Referring to FIG. 5, the electronic device 800 may include one or more of the following: a processing component 802, a memory 804, a power component 806, a multimedia component 808, an audio component 810, an input/output (I/O) interface 812, a sensor component 814, and a communication component 816.

The processing component 802 generally controls the overall operation of the electronic device 800, such as operations related to display, phone calls, data communication, camera operation, and recording. The processing component 802 may include one or more processors 820 that execute instructions to perform all or some of the steps of the above method. The processing component 802 may also include one or more modules for interaction with other components. For example, the processing component 802 may include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.

The memory 804 is configured to store various kinds of data to support operation of the electronic device 800. Examples of such data include instructions for any application program or method operated on the electronic device 800, contact data, phone book data, messages, images, and videos. The memory 804 may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, or a magnetic or optical disk.

The power component 806 supplies power to the various components of the electronic device 800. It may include a power management system, one or more power sources, and other components related to generating, managing, and distributing power for the electronic device 800.

The multimedia component 808 includes a screen that provides an output interface between the electronic device 800 and the user. In some embodiments, the screen includes a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen is implemented as a touch screen that receives input signals from the user. The touch panel includes one or more touch sensors that sense touches, slides, and gestures on the panel. The touch sensors can sense not only the boundary of a touch or slide action but also the duration and pressure associated with it. In some embodiments, the multimedia component 808 includes a front camera and/or a rear camera. When the electronic device 800 is in an operating mode such as a shooting mode or a video mode, the front camera and/or the rear camera can receive external multimedia data. Each front camera and rear camera may be a fixed optical lens system or may have focus and optical zoom capability.

The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a microphone (MIC). When the electronic device 800 is in an operating mode such as a call mode, a recording mode, or a voice recognition mode, the microphone is configured to receive external audio signals. A received audio signal can be stored in the memory 804 or transmitted via the communication component 816. In some embodiments, the audio component 810 further includes a speaker configured to output audio signals.

I/O interface 812 provides an interface between processing component 802 and peripheral interface modules. The peripheral interface modules may be a keyboard, a click wheel, buttons, and the like. These buttons include, but are not limited to, a home button, a volume button, a start button, and a lock button.

Sensor component 814 comprises one or more sensors and is configured to assess various states of the electronic device 800. For example, sensor component 814 can detect the on/off state of the electronic device 800 and the relative positioning of units, such as the display and keypad of the electronic device 800. Sensor component 814 can also detect a change in the position of the electronic device 800 or of one unit of the electronic device 800, the presence or absence of contact between the user and the electronic device 800, the orientation or acceleration/deceleration of the electronic device 800, and changes in the temperature of the electronic device 800. Sensor component 814 may comprise a proximity sensor configured to detect the presence of nearby objects without any physical contact. Sensor component 814 may also comprise an optical sensor, such as a complementary metal-oxide-semiconductor (CMOS) or charge-coupled device (CCD) image sensor, for use in imaging applications. In some embodiments, sensor component 814 may comprise an acceleration sensor, a gyro sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.

Communication component 816 is configured to facilitate wired or wireless communication between the electronic device 800 and other devices. The electronic device 800 can access a wireless network based on a communication standard such as WiFi, second-generation mobile communication technology (2G), third-generation mobile communication technology (3G), or a combination thereof. In one exemplary embodiment, communication component 816 receives a broadcast signal or broadcast-related information from an external broadcast channel management system via a broadcast channel. In one exemplary embodiment, communication component 816 further comprises a near field communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.

In an exemplary embodiment, the electronic device 800 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components, and configured to perform the method described above.

An exemplary embodiment further provides a non-volatile computer-readable storage medium, such as memory 804 containing computer program instructions. The computer program instructions are executed by processor 820 of the electronic device 800 to complete the method described above.

FIG. 6 is a block diagram of an electronic device 1900 according to an embodiment of the present application. For example, the electronic device 1900 may be provided as a server. Referring to FIG. 6, the electronic device 1900 includes a processing component 1922, which further includes one or more processors, and memory resources, represented by memory 1932, for storing instructions to be executed by processing component 1922, such as application programs. An application program stored in memory 1932 may include one or more modules, each corresponding to a set of instructions. Processing component 1922 is configured to execute the instructions to perform the method described above.

The electronic device 1900 may further include a power component 1926 configured to perform power management of the electronic device 1900, a wired or wireless network interface 1950 configured to connect the electronic device 1900 to a network, and an input/output (I/O) interface 1958. The electronic device 1900 can run an operating system stored in memory 1932, such as the Microsoft server operating system (Windows Server™), Apple's graphical-user-interface-based operating system (Mac OS X™), the multi-user, multi-process computer operating system (Unix™), the free and open-source Unix-like operating system (Linux™), the open-source Unix-like operating system (FreeBSD™), or the like.

An exemplary embodiment further provides a non-volatile computer-readable storage medium, such as memory 1932 containing computer program instructions. The computer program instructions are executed by processing component 1922 of the electronic device 1900 to complete the method described above.

Embodiments of the present application may be a system, a method, and/or a computer program product. The computer program product may comprise a computer-readable storage medium storing computer-readable program instructions for causing a processor to implement aspects of the present application.

A computer-readable storage medium may be a tangible device that can hold or store instructions for use by an instruction execution device. The computer-readable storage medium may be, for example, but is not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any combination of the above. More specific examples (a non-exhaustive list) of computer-readable storage media include a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash), static random access memory (SRAM), portable compact disc read-only memory (CD-ROM), a digital versatile disc (DVD), a memory stick, a flexible disk, a mechanical encoding device such as a punch card or a raised structure in a groove on which instructions are stored, and any suitable combination of the above. A computer-readable storage medium, as used herein, is not to be construed as a transitory signal per se, such as a radio wave or other freely propagating electromagnetic wave, an electromagnetic wave propagating through a waveguide or other transmission medium (for example, a light pulse passing through a fiber-optic cable), or an electrical signal transmitted through a wire.

The computer-readable program instructions described herein can be downloaded from a computer-readable storage medium to respective computing/processing devices, or downloaded to an external computer or external storage device via a network such as the Internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, optical fiber transmission, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers. A network interface card or network interface in each computing/processing device receives the computer-readable program instructions from the network and forwards them for storage in a computer-readable storage medium within that computing/processing device.

Computer program instructions configured to perform the operations of the present application may be assembler instructions, instruction set architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source code or object code written in one or more programming languages. The programming languages include object-oriented programming languages such as Smalltalk and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, state information of the computer-readable program instructions is used to customize an electronic circuit, such as a programmable logic circuit, a field-programmable gate array (FPGA), or a programmable logic array (PLA). The electronic circuit can execute the computer-readable program instructions to implement aspects of the present application.

Aspects of the present application are described herein with reference to flowcharts and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the present application. It should be understood that each block of the flowcharts and/or block diagrams, and combinations of blocks in the flowcharts and/or block diagrams, can be implemented by computer-readable program instructions.

These computer-readable program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, when executed by the processor of the computer or other programmable data processing apparatus, create an apparatus that implements the functions/operations specified in one or more blocks of the flowcharts and/or block diagrams. These computer-readable program instructions may also be stored in a computer-readable storage medium. The instructions cause a computer, a programmable data processing apparatus, and/or other devices to operate in a particular manner, so that the computer-readable storage medium storing the instructions comprises an article of manufacture that includes instructions implementing aspects of the functions/operations specified in one or more blocks of the flowcharts and/or block diagrams.

The computer-readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other device, so that a series of operational steps is performed on the computer, other programmable data processing apparatus, or other device to produce a computer-implemented process. The instructions executed on the computer, other programmable data processing apparatus, or other device thereby implement the functions/operations specified in one or more blocks of the flowcharts and/or block diagrams.

The flowcharts and block diagrams in the figures illustrate the possible architecture, functionality, and operation of systems, methods, and computer program products according to multiple embodiments of the present application. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of instructions, which comprises one or more executable instructions for implementing the specified logical functions. In some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the figures. For example, two consecutive blocks may in fact be executed substantially in parallel, or may sometimes be executed in the reverse order, depending on the functionality involved. Note also that each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.

The computer program product may be implemented in hardware, software, or a combination thereof. In one optional embodiment, the computer program product is embodied as a computer storage medium. In another optional embodiment, the computer program product is embodied as a software product, such as a software development kit (SDK).

Embodiments of the present application have been described above. The foregoing description is exemplary, not exhaustive, and is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, their practical application, or their improvement over technologies found in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Embodiments of the present application relate to a network training method and apparatus, a target detection method and apparatus, and an electronic device. The network training method includes: inputting unlabeled sample images into a target detection network for processing to obtain a target detection result, the target detection result including the image region, feature information, and classification probability of a target; determining a category confidence of the target based on the classification probability of the target; for a first target whose category confidence is greater than or equal to a first threshold, taking the sample image in which the first target is located as a labeled sample image and adding it to a training set; for a second target whose category confidence is smaller than the first threshold, performing feature correlation mining on the second target, determining a fourth target from the second targets, and adding the sample image in which the fourth target is located to the training set; and training the target detection network based on the sample images in the training set. Embodiments of the present application can improve the training effect of the target detection network.

Claims

A network training method comprising:
inputting an unlabeled first sample image into a target detection network for processing to obtain a target detection result of the first sample image, wherein the target detection result includes an image region, feature information, and a classification probability of a target in the first sample image;
determining a category confidence of the target based on the classification probability of the target;
for a first target among the targets whose category confidence is greater than or equal to a first threshold, taking the first sample image in which the first target is located as a labeled second sample image and adding it to a training set, wherein the labeling information of the second sample image includes the image region of the first target and the category corresponding to the category confidence of the first target, and the training set includes labeled third sample images;
for a second target among the targets whose category confidence is smaller than the first threshold, performing feature correlation mining on the second target based on feature information of a third target in the third sample images, determining, from the second targets, a fourth target and the first sample image in which the fourth target is located, and adding the first sample image in which the fourth target is located to the training set as a fourth sample image; and
training the target detection network based on the labeling information of the fourth sample image and on the second sample images, the third sample images, and the fourth sample images in the training set.
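
The confidence-based split in this claim can be illustrated with a minimal Python sketch. It is not the applicant's implementation: the `Detection` record, its field names, and the choice of the maximum classification probability as the category confidence are assumptions made only for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Detection:
    image_id: str                            # hypothetical id of the first sample image
    box: Tuple[float, float, float, float]   # image region as (x1, y1, x2, y2)
    probs: Sequence[float]                   # classification probabilities, one per category


def category_confidence(det: Detection) -> Tuple[int, float]:
    """Return (category index, confidence).

    Assumption: the confidence is the largest classification probability.
    """
    best = max(range(len(det.probs)), key=lambda i: det.probs[i])
    return best, det.probs[best]


def split_by_confidence(dets: List[Detection], first_threshold: float):
    """Separate confident first targets (pseudo-labeled) from uncertain second targets."""
    pseudo_labeled, uncertain = [], []
    for det in dets:
        category, conf = category_confidence(det)
        if conf >= first_threshold:
            # Labeling information: image region plus the predicted category.
            pseudo_labeled.append((det.image_id, det.box, category))
        else:
            # Left for feature correlation mining.
            uncertain.append(det)
    return pseudo_labeled, uncertain
```

In this sketch the pseudo-labeled entries would play the role of second sample images, while the uncertain detections are the candidates passed on to mining.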

The network training method according to claim 1, wherein training the target detection network based on the labeling information of the fourth sample image and on the second sample images, the third sample images, and the fourth sample images in the training set comprises:
determining, based on the categories of the targets in the positive sample images of the training set, a first number to be sampled from the positive sample images of each category, wherein a positive sample image is a sample image that contains a target;
sampling the positive sample images of each category based on the first number for that category to obtain a plurality of fifth sample images;
sampling the negative sample images of the training set to obtain a plurality of sixth sample images, wherein a negative sample image is a sample image that contains no target; and
training the target detection network based on the fifth sample images and the sixth sample images.
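
As a rough illustration of the per-category sampling described in this claim, the following Python sketch draws the same number of positive images from every category and a separate set of negative images. The claim does not state how the first number is chosen; using one equal value for all categories, sampling with replacement, and the function and parameter names are all assumptions.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple


def balanced_sample(
    positives: Sequence[Tuple[str, Hashable]],  # (image id, target category)
    negatives: Sequence[str],                    # ids of images that contain no target
    per_category: int,                           # assumed "first number", equal for all categories
    num_negatives: int,
    seed: int = 0,
) -> Tuple[List[str], List[str]]:
    rng = random.Random(seed)
    by_category: Dict[Hashable, List[str]] = defaultdict(list)
    for image_id, category in positives:
        by_category[category].append(image_id)

    fifth: List[str] = []
    for category in sorted(by_category, key=str):
        # Sampling with replacement lets rare categories reach per_category images.
        fifth.extend(rng.choices(by_category[category], k=per_category))

    # Negative images are drawn without replacement, capped by availability.
    sixth = rng.sample(list(negatives), k=min(num_negatives, len(negatives)))
    return fifth, sixth
```

Drawing with replacement is one simple way to keep rare (long-tail) categories from being swamped by frequent ones; other quota rules would fit the claim equally well.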

The network training method according to claim 1 or 2, wherein performing feature correlation mining on the second target based on the feature information of the third target in the third sample images, and determining, from the second targets, the fourth target and the first sample image in which the fourth target is located, comprises:
determining the information entropy of the second target based on the classification probability of the second target;
selecting fifth targets from the second targets based on the category confidence and the information entropy of the second targets;
determining a second number of sample images to be mined for each category based on the categories of the third targets in the third sample images and a total number of sample images to be mined; and
determining, from the fifth targets, the fourth target and the first sample image in which the fourth target is located, based on the feature information of the third targets in the third sample images, the feature information of the fifth targets, and the second number of sample images to be mined for each category.

Selecting the fifth targets from the second targets based on the category confidence and the information entropy of the second targets comprises:
ordering the second targets by category confidence and by information entropy, respectively, and selecting a third number of sixth targets and a fourth number of seventh targets; and
merging the sixth targets and the seventh targets to obtain the fifth targets.
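
The entropy computation and the two-ranking selection can be sketched in Python as follows. The claims do not state the sort direction; this sketch assumes that the lowest-confidence and highest-entropy targets are selected. The target ids and function names are hypothetical, and the merge drops sixth targets that also appear among the seventh targets, as a later claim in this document describes.

```python
import math
from typing import Dict, List, Sequence


def information_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (natural logarithm) of a classification probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_fifth_targets(
    probs_by_target: Dict[str, Sequence[float]],  # hypothetical target id -> probabilities
    third_number: int,
    fourth_number: int,
) -> List[str]:
    ids = list(probs_by_target)
    # Assumption: the least confident targets are kept as sixth targets.
    by_confidence = sorted(ids, key=lambda t: max(probs_by_target[t]))
    # Assumption: the highest-entropy targets are kept as seventh targets.
    by_entropy = sorted(
        ids, key=lambda t: information_entropy(probs_by_target[t]), reverse=True
    )

    sixth = by_confidence[:third_number]
    seventh = by_entropy[:fourth_number]
    seventh_set = set(seventh)
    # Merge: sixth targets not already among the seventh targets, then all seventh targets.
    return [t for t in sixth if t not in seventh_set] + seventh
```

Using two rankings rather than one lets a target enter the candidate pool either because no category scores highly or because its probability mass is spread across many categories.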

Determining the second number of sample images to be mined for each category based on the categories of the third targets in the third sample images and the total number of sample images to be mined comprises:
determining the proportion of third targets in each category based on the categories of the third targets in the third sample images;
determining a sampling weight for each category based on the proportion of third targets in that category; and
determining the second number of sample images to be mined for each category based on the sampling weight of that category.
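
One way to turn category proportions into per-category mining quotas is sketched below in Python. The claim does not give the weighting formula; making the weight inversely proportional to a category's share (so that rare categories receive larger quotas) is an assumption, as are the function and parameter names.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def second_numbers(
    third_target_categories: Sequence[Hashable],  # category of each labeled third target
    total_to_mine: int,                            # total number of sample images to be mined
) -> Dict[Hashable, int]:
    counts = Counter(third_target_categories)
    total = sum(counts.values())
    proportions = {c: n / total for c, n in counts.items()}
    # Assumption: the sampling weight is inversely proportional to the category's share.
    inverse = {c: 1.0 / p for c, p in proportions.items()}
    norm = sum(inverse.values())
    weights = {c: v / norm for c, v in inverse.items()}
    # Because of rounding, the quotas may not sum exactly to total_to_mine.
    return {c: round(total_to_mine * w) for c, w in weights.items()}
```

With eight labeled "car" targets and two labeled "bike" targets, this rule assigns the larger quota to "bike", which is the behavior one would want when the unlabeled data is long-tailed.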

The network training method according to any one of claims 3 to 5, wherein determining, from the fifth targets, the fourth target and the first sample image in which the fourth target is located, based on the feature information of the third targets in the third sample images, the feature information of the fifth targets, and the second number of sample images to be mined for each category, comprises:
determining, for each fifth target, based on the distance between the feature information of the third targets of a first category and the feature information of that fifth target, the third target of the first category that has the smallest distance to that fifth target, and taking it as an eighth target, wherein the first category is any one of the categories of the third targets; and
determining the target with the largest distance among the eighth targets as the fourth target.

The network training method according to claim 6, wherein determining, from the fifth targets, the fourth target and the first sample image in which the fourth target is located, based on the feature information of the third targets in the third sample images, the feature information of the fifth targets, and the second number of sample images to be mined for each category, further comprises:
adding the determined fourth target to the third targets of the first category and removing the determined fourth target from the unlabeled fifth targets.
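
The distance-based selection and the add/remove update can be read as a greedy farthest-point loop. The Python sketch below follows that reading and is not a definitive implementation: it assumes Euclidean distance, interprets the fourth target as the candidate whose nearest labeled target is farthest away, counts mined targets rather than mined images against the quota, and uses hypothetical names throughout. The two stopping conditions (quota reached, or candidate set empty) mirror the termination conditions stated in later claims.

```python
import math
from typing import Dict, List, Sequence


def mine_category(
    labeled_feats: List[Sequence[float]],         # features of third targets in the first category (non-empty)
    candidate_feats: Dict[str, Sequence[float]],  # hypothetical id -> features of fifth targets
    quota: int,                                   # second number for this category
) -> List[str]:
    labeled = list(labeled_feats)
    candidates = dict(candidate_feats)
    mined: List[str] = []
    # Stop when the quota is reached or no candidates remain.
    while len(mined) < quota and candidates:
        # Distance from each candidate to its nearest labeled target.
        nearest = {
            cid: min(math.dist(feat, ref) for ref in labeled)
            for cid, feat in candidates.items()
        }
        # Choose the candidate whose nearest labeled target is farthest away.
        chosen = max(nearest, key=nearest.get)
        mined.append(chosen)
        # Move the chosen target from the candidate set into the labeled set.
        labeled.append(candidates.pop(chosen))
    return mined
```

Moving each chosen target into the labeled set before the next iteration discourages the loop from picking several near-duplicate targets from the same region of feature space.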

The network training method comprises:
8. Further comprising inputting said third sample image into said target detection network for processing to obtain characteristic information of a third target in said third sample image. The network training method according to item 1.

Prior to inputting the unlabeled first sample image into a target detection network for processing to obtain a target detection result for the first sample image, the network training method comprises:
A method for training a network according to any one of claims 1 to 8, further comprising pre-training the target detection network with labeled third sample images.

A method for training a network according to any one of claims 1-9, wherein the first sample images comprise long-tail images.

The network training method according to claim 2, wherein, prior to determining, based on the categories of the targets in the positive sample images of the training set, the first number to be sampled from the positive sample images of each category, the network training method comprises:
sampling the positive sample images and the negative sample images in the training set to obtain positive sample images and negative sample images that are equal or close in number.

The network training method according to claim 3, wherein the total number of sample images waiting to be mined is 5% to 25% of the total number of the first sample images.

merging the sixth target and the seventh target to obtain the fifth target,
removing a target among the sixth targets that is the same as the seventh target to obtain a surplus target that is different from the seventh target among the sixth targets;
5. The network training method according to claim 4, comprising setting the surplus target and the seventh target as the fifth target.

The network training method according to claim 6, wherein, after determining, for each fifth target, based on the distance between the feature information of the third targets of the first category and the feature information of that fifth target, the third target of the first category that has the smallest distance to that fifth target as an eighth target, the network training method comprises:
terminating the determination of eighth targets when the number of first sample images in which fourth targets are located reaches the second number of sample images to be mined for the first category.

The network training method according to claim 7, wherein, after determining, for each fifth target, based on the distance between the feature information of the third targets of the first category and the feature information of that fifth target, the third target of the first category that has the smallest distance to that fifth target as an eighth target, the network training method further comprises:
terminating the determination of eighth targets when the number of first sample images in which fourth targets are located has not reached the second number of sample images to be mined for the first category and the set storing the feature information of the fifth targets is empty.

Inputting the third sample image to the target detection network and processing to obtain feature information of a third target in the third sample image includes:
inputting the third sample image into the target detection network and obtaining a feature vector output from a hidden layer of the target detection network;
and determining the feature vector as the feature information of the third target.

A target detection method comprising:
inputting an image to be processed into a target detection network for processing to obtain a target detection result of the image to be processed, the target detection result including the position and category of a target in the image to be processed,
wherein the target detection network is trained by the network training method according to any one of claims 1-10.

A network training device comprising:
a target detection unit configured to input an unlabeled first sample image into a target detection network for processing to obtain a target detection result of the first sample image, wherein the target detection result includes an image region, feature information, and a classification probability of a target in the first sample image;
a confidence determination unit configured to determine a category confidence of the target based on the classification probability of the target;
a labeling unit configured to, for a first target among the targets whose category confidence is greater than or equal to a first threshold, take the first sample image in which the first target is located as a labeled second sample image and add it to a training set, wherein the labeling information of the second sample image includes the image region of the first target and the category corresponding to the category confidence of the first target, and the training set includes labeled third sample images;
a feature mining unit configured to, for a second target among the targets whose category confidence is smaller than the first threshold, perform feature correlation mining on the second target based on feature information of a third target in the third sample images, determine, from the second targets, a fourth target and the first sample image in which the fourth target is located, and add the first sample image in which the fourth target is located to the training set as a fourth sample image; and
a training unit configured to train the target detection network based on the labeling information of the fourth sample image and on the second sample images, the third sample images, and the fourth sample images in the training set.

An electronic device comprising:
a memory for storing instructions executable by a processor; and
a processor configured to invoke the instructions stored in the memory to perform the network training method according to any one of claims 1 to 16 or the target detection method according to claim 17.

A computer-readable storage medium storing computer program instructions that, when executed by a processor, cause the processor to perform the network training method according to any one of claims 1 to 16 or the target detection method according to claim 17.