JP2021043949A

JP2021043949A - Device and method for training target detection model, and electronic apparatus

Info

Publication number: JP2021043949A
Application number: JP2020106745A
Authority: JP
Inventors: 路石; Lu Shi; タヌ・ジミン; Tan Zhiming
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2019-09-10
Filing date: 2020-06-22
Publication date: 2021-03-18
Also published as: CN112560541A

Abstract

To provide a device and a method for training a target detection model, and an electronic apparatus. SOLUTION: A training device includes: a test unit that, at the end of the N-th training in the training process of a target detection model, inputs multiple test images into the target detection model to perform a test and obtains a detection result for each test image; a determination unit that determines a predicted convergence level for each test image based on the detection results of the test images; and an adjustment unit that adjusts the parameters used for the (N+1)-th training based on the predicted convergence levels of the detection results of the test images. SELECTED DRAWING: Figure 1

Description

The present invention relates to the field of information technology.

In recent years, thanks to deep learning, research in the field of computer vision has made great progress. Deep learning refers to a set of algorithms that apply various machine learning algorithms on hierarchical neural networks to solve problems such as image processing and text processing. The core of deep learning is feature learning, whose purpose is to obtain hierarchical feature information through a hierarchical neural network and thereby solve the conventional difficulty that features must be designed manually. Deep learning has been applied to various fields of artificial intelligence.

Target detection is one of the important applications of deep learning. For a target detection model based on deep learning, the performance of the model is determined by how the model is trained.

However, the quality of the training data set is a severe constraint on target detection. Training data sets are generally collected from everyday scenes, and it is difficult to balance them in every respect. For example, a traffic scene data set is generally captured by surveillance cameras, and because the scenes are relatively uniform, it is difficult to obtain a highly robust target detection model for traffic scenes from such a data set. With conventional training methods, it is difficult to discover the imbalance present in the training data set, and the influence of such imbalance on the training effect cannot be eliminated.

An object of the present invention is to provide a device and a method for training a target detection model, and an electronic device, that dynamically control the convergence direction of the model during the training process and effectively eliminate the influence of training data imbalance on the training effect, so that a target detection model with good performance can be trained.

According to a first aspect of the embodiments of the present invention, a training device for a target detection model is provided, the device being configured to:
at the end of the N-th training in the training process of the target detection model, input a plurality of test images into the target detection model to perform a test and obtain a detection result of each test image, N being a positive integer;
determine a predicted convergence level of each test image based on the detection result of each test image; and
adjust a parameter used for the (N+1)-th training based on the predicted convergence level of the detection result of each test image.

According to a second aspect of the embodiments of the present invention, an electronic device is provided, the electronic device including the device described in the first aspect of the embodiments of the present invention.

According to a third aspect of the embodiments of the present invention, a training method for a target detection model is provided, the method including:
at the end of the N-th training in the training process of the target detection model, inputting a plurality of test images into the target detection model to perform a test and obtaining a detection result of each test image, N being a positive integer;
determining a predicted convergence level of each test image based on the detection result of each test image; and
adjusting a parameter used for the (N+1)-th training based on the predicted convergence level of the detection result of each test image.

The beneficial effects of the present invention are as follows. At the end of the N-th training of the target detection model, the predicted convergence level is determined based on the detection results of the test images, and the parameters used for the next training are adjusted based on that predicted convergence level. This dynamically controls the convergence direction of the model during the training process and effectively eliminates the influence of training data imbalance on the training effect, so that a target detection model with good performance can be trained.

FIG. 1 is a diagram showing the training device for a target detection model in Embodiment 1 of the present invention.
FIG. 2 is a diagram showing the determination unit 102 in Embodiment 1 of the present invention.
FIG. 3 is a diagram showing the training process based on the training device 100 in Embodiment 1 of the present invention.
FIG. 4 is a diagram showing the electronic device in Embodiment 2 of the present invention.
FIG. 5 is a diagram showing the system configuration of the electronic device in Embodiment 2 of the present invention.
FIG. 6 is a diagram showing the training method for a target detection model in Embodiment 3 of the present invention.

Hereinafter, preferred embodiments for carrying out the present invention will be described in detail with reference to the attached drawings.

An embodiment of the present invention provides a training device for a target detection model. FIG. 1 is a diagram showing the training device for a target detection model in Embodiment 1 of the present invention.

As shown in FIG. 1, the training device 100 for a target detection model includes the following.

Test unit 101: at the end of the N-th training in the training process of the target detection model, inputs a plurality of test images into the target detection model to perform a test and obtains a detection result of each test image, N being a positive integer;
Determination unit 102: determines a predicted convergence level of each test image based on the detection result of each test image; and
Adjustment unit 103: adjusts a parameter used for the (N+1)-th training based on the predicted convergence level of the detection result of each test image.

In this way, at the end of the N-th training of the target detection model, the predicted convergence level is determined based on the detection results of the test images, and the parameters used for the next training are adjusted based on that predicted convergence level. This dynamically controls the convergence direction of the model during the training process and effectively eliminates the influence of training data imbalance on the training effect, so that a target detection model with good performance can be trained.

In this embodiment, the target detection model may be based on any of various networks; for example, it is a target detection model based on a deep neural network (DNN).

The detection target of the target detection model may be determined according to actual needs. For example, the detection target can be determined based on the application scene.

In this embodiment, the test unit 101 performs the test at the end of the N-th training, where N is a positive integer.

In this embodiment, the training round at whose end the test is performed may be determined according to the actual situation. For example, testing and parameter adjustment may be performed at the end of every training round after the target detection model has reached an initial (a certain degree of) stability through multiple (N) rounds of training. For example, N is 5 or greater.

For example, for a target detection model based on a deep neural network, testing and parameter adjustment may be performed at the end of each training round after the fifth training has ended.

In this embodiment, the training images used as training data and the test images used for testing each have a plurality of labels marking a plurality of features. Marking a plurality of features reflects the content of an image in multiple dimensions and helps reveal imbalance in the training data, which in turn helps optimize the training process of the target detection model.

For example, for a target detection model for traffic scenes, the plurality of features may include three kinds of features: type, image size, and image brightness. In other words, each training image and test image is marked at least with its type (vehicle or pedestrian), its image size (large or small), and its image brightness (bright or dark).

Table 1 shows the statistics obtained after marking 10,000 training images. As shown in Table 1, for the type feature, 7,500 training images are marked positive (vehicle) and 2,500 are marked negative (pedestrian); for the image size feature, 3,500 training images are marked positive (large) and 6,500 are marked negative (small); and for the image brightness feature, 8,000 training images are marked "bright" and 2,000 are marked "dark".
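As a rough illustration of how such label statistics can expose imbalance, the sketch below counts the share of each label value per feature. The record layout, the field names, and the way the three features are combined per image are assumptions made for this example; only the per-feature totals come from the description of Table 1.

```python
from collections import Counter


def label_distribution(labels, feature):
    """Return the share of each value of `feature` across the labeled images."""
    counts = Counter(item[feature] for item in labels)
    total = sum(counts.values())
    return {value: count / total for value, count in counts.items()}


# Hypothetical label records. Only the per-feature totals (7500/2500,
# 3500/6500, 8000/2000) follow the text; the joint combination is made up.
labels = [
    {
        "type": "vehicle" if i < 7500 else "pedestrian",
        "size": "large" if i % 20 < 7 else "small",
        "brightness": "bright" if i % 5 < 4 else "dark",
    }
    for i in range(10000)
]

for feature in ("type", "size", "brightness"):
    print(feature, label_distribution(labels, feature))
```

A share far from 0.5 for a two-valued feature, such as 0.75 for vehicles here, is the kind of imbalance the multi-label marking is meant to make visible.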

In this embodiment, the number of test images is n, and the value of n may be set according to the actual situation.

At the end of the N-th training in the training process of the target detection model, the test unit 101 inputs the n test images into the target detection model to perform a test and obtains n detection results, one for each of the n test images.

For example, the detection result of the first test image is "vehicle, large size, bright", that of the second test image is "pedestrian, small size, bright", that of the third test image is "vehicle, small size, dark", ..., and that of the n-th test image is "pedestrian, small size, dark".

In this embodiment, the determination unit 102 determines the predicted convergence level of each test image based on the detection result of each test image.

FIG. 2 is a diagram showing the determination unit 102 in Embodiment 1 of the present invention. As shown in FIG. 2, the determination unit 102 includes the following.

Calculation unit 201: calculates, based on the detection result of each test image, the detection performance index of each test image and the average of the detection performance indexes of all test images; and
Acquisition unit 202: inputs the detection performance index of each test image and the average into a classifier and obtains the predicted convergence level of each test image.

In this embodiment, the calculation unit 201 calculates the detection performance index of each test image based on the detection result of each test image.

Any of various commonly used indexes may serve as the detection performance index, for example, IOU (Intersection over Union).

For example, the calculation unit 201 can compare the detection result of a test image with the true value (the marked ground truth) of that test image and calculate the IOU from the result of the comparison. For the specific calculation method, reference can be made to the related art, and a detailed description is omitted here.
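The source leaves the IOU calculation to the related art. As a minimal sketch of the usual definition, assuming detections and ground truth are axis-aligned bounding boxes given as `(x1, y1, x2, y2)` tuples (a representation not stated in the source), the index could be computed as follows.

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap; zero when the boxes do not intersect.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter

    # Guard against degenerate (zero-area) boxes.
    return inter / union if union > 0 else 0.0


# Two 2x2 boxes overlapping in a 1x1 square: IOU = 1 / (4 + 4 - 1) = 1/7.
print(iou((0, 0, 2, 2), (1, 1, 3, 3)))
```

An IOU of 1.0 means the detected box coincides with the marked ground truth, and 0.0 means the two do not overlap at all.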

After calculating the IOU of every test image, the calculation unit 201 further calculates the average of the IOUs of all test images.

For example, the calculation unit 201 calculates an IOU of 0.92 for the first test image, 0.83 for the second test image, 0.72 for the third test image, ..., and 0.51 for the n-th test image, and the average IOU of the n test images is 0.75.

After the calculation unit 201 has calculated the detection performance index of each test image and the average, the acquisition unit 202 inputs the detection performance index of each test image and the average into the classifier and obtains the predicted convergence level of each test image.

In this embodiment, the input of the classifier is the detection performance index of each test image together with the average of these detection performance indexes, and the output is the predicted convergence level of each test image. The predicted value reflects the influence of the test image on the convergence of the model in the next training process.
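To make the classifier input concrete, the sketch below pairs each per-image index with the shared average. The exact feature layout is not specified in the source, so the `(x, mu)` pair format and the function name are assumptions for illustration.

```python
from statistics import fmean


def classifier_inputs(indexes):
    """Pair each test image's detection performance index with the average."""
    mu = fmean(indexes)
    return [(x, mu) for x in indexes]


# Four illustrative IOU values; every row carries the same average.
for x, mu in classifier_inputs([0.92, 0.83, 0.72, 0.51]):
    print(f"x={x:.2f}  mu={mu:.3f}  x-mu={x - mu:+.3f}")
```

Supplying the average alongside each index lets the classifier judge an image relative to the current overall performance rather than on an absolute scale.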

In this embodiment, the classifier may be included in the determination unit 102 or may be provided separately from the determination unit 102.

The classifier may be any of various conventional classifiers capable of realizing the above function; for example, the classifier is a random-forest-based classifier. The classifier can be obtained by training with a conventional training method.

In this embodiment, the classifier can determine the predicted convergence level based on a Gaussian-distribution classification criterion.

For example, the predicted convergence level can be determined based on the following formula (1).

Here, f(x) represents the degree of convergence of the test image, x represents the detection performance index of the test image, μ represents the average of the detection performance indexes, and the coefficient σ is a positive number, for example, σ = 0.1.

Alternatively, in order to suppress excessive adjustment in the late stage of training, a constraint may be imposed by adding a standard deviation and an additional coefficient to formula (1) above, specifically as shown in the following formulas (2) and (3).

Here, f(x) represents the degree of convergence of the test image, x represents the detection performance index of the test image, μ represents the average of the detection performance indexes, the coefficient σ is a positive number, λ represents a constraint coefficient, for example, σλ = 0.1, and epoch represents the number of training rounds.

In this embodiment, each test image is assigned to a convergence level based on the calculated degree of convergence. The number of convergence levels may be set according to the actual situation. For example, the convergence levels may be divided into four levels, "very high", "high", "medium", and "low", denoted level 1, level 2, level 3, and level 4, respectively. In other words, the predicted convergence level determined from the degree of convergence is one of 1, 2, 3, and 4. The embodiments of the present invention are not limited to this division.

For example, Table 2 shows each predicted convergence level and the corresponding numerical range of the degree of convergence.
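The range-to-level lookup can be sketched as a simple threshold search. The numerical ranges of Table 2 are not available in this text, so the boundaries below are purely hypothetical placeholders, and the assumption that a higher degree of convergence maps to level 1 ("very high") follows the level naming in the description.

```python
from bisect import bisect_right

# Hypothetical boundaries between levels; NOT the values of Table 2.
HYPOTHETICAL_BOUNDS = (0.25, 0.5, 0.75)


def convergence_level(degree, bounds=HYPOTHETICAL_BOUNDS):
    """Map a degree of convergence to a level 1 (very high) .. 4 (low)."""
    # bisect_right counts how many boundaries lie at or below `degree`;
    # the more boundaries passed, the better the level (smaller number).
    return len(bounds) + 1 - bisect_right(bounds, degree)


for degree in (0.9, 0.6, 0.3, 0.1):
    print(degree, "-> level", convergence_level(degree))
```

With other boundaries or a different number of levels, only the `bounds` tuple changes.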

After the determination unit 102 has determined the predicted convergence level of each test image, the adjustment unit 103 adjusts the parameters used for the (N+1)-th training based on the predicted convergence level of the detection result of each test image.

In this embodiment, the higher the predicted convergence level of a test image, the better the target detection model converges on the data of that test image; conversely, the lower the predicted convergence level of a test image, the poorer the convergence of the target detection model on the data of that test image.

The adjustment unit 103 adjusts the parameters used for the next training based on the predicted convergence levels of the test images so that images with a relatively low predicted convergence level are used relatively more often for training. This removes the influence of training data imbalance on the training effect and prevents the target detection model from achieving good detection accuracy only for certain images or certain kinds of images (those with a high predicted convergence level). As a result, the trained target detection model can have good robustness.

For example, in the next training, the number of times that training images similar to test images with a relatively low predicted convergence level are used can be set to a relatively high value.
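One way to picture this adjustment is a per-image use count for the next round. The source does not define how similarity is measured or how many extra uses are assigned, so the sketch assumes that a training image is "similar" to a test image when both carry the same label tuple, and the `repeats_by_level` table is hypothetical.

```python
def use_counts(train_labels, test_labels, test_levels, repeats_by_level):
    """Return how many times each training image is used in the next round."""
    # For each label tuple, remember the worst level (largest number)
    # observed among the test images carrying those labels.
    worst = {}
    for labels, level in zip(test_labels, test_levels):
        worst[labels] = max(worst.get(labels, 1), level)

    # Training images whose labels match no test image default to level 1.
    return [repeats_by_level[worst.get(labels, 1)] for labels in train_labels]


# Hypothetical use counts per level (level 4 = lowest convergence).
repeats = {1: 1, 2: 1, 3: 2, 4: 3}
train = [("vehicle", "large", "bright"), ("pedestrian", "small", "dark")]
test = [("vehicle", "large", "bright"), ("pedestrian", "small", "dark")]
print(use_counts(train, test, [1, 4], repeats))
```

Here the dark, small pedestrian image would be used three times in the next round because the matching test image converged poorly.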

For example, the coefficient of the gradient used for backpropagation in the next training, that is, the backpropagation gradient coefficient, can be determined according to the predicted convergence level.

Table 3 shows the correspondence between the predicted convergence level and the backpropagation gradient coefficient in the next training.

As can be seen from Table 3, the larger the numerical value of the predicted convergence level, the larger the corresponding backpropagation gradient coefficient in the next training. In other words, images whose current convergence level is relatively poor are given a larger weight in the next training.
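The effect of such a coefficient can be sketched on a toy model. The coefficients of Table 3 are not available in this text, so the mapping below is hypothetical; the one-parameter linear model with squared error and the assumption that each training sample already carries the level of the test images it resembles are likewise illustrative choices, not the patent's detection network.

```python
# Hypothetical level-to-coefficient mapping; NOT the values of Table 3.
HYPOTHETICAL_COEFFICIENTS = {1: 1.0, 2: 1.5, 3: 2.0, 4: 3.0}


def scaled_gradient(w, samples, coefficients=HYPOTHETICAL_COEFFICIENTS):
    """Average gradient of (w*x - y)^2 over samples, scaled per sample."""
    total = 0.0
    for x, y, level in samples:
        grad = 2.0 * (w * x - y) * x          # d/dw of the squared error
        total += coefficients[level] * grad   # larger level -> larger weight
    return total / len(samples)


# Two identical samples, one at level 1 and one at level 4.
samples = [(1.0, 2.0, 1), (1.0, 2.0, 4)]
w = 0.0
w -= 0.1 * scaled_gradient(w, samples)  # one gradient-descent step
print(w)
```

Scaling a sample's gradient by a coefficient has the same effect as multiplying its loss by that coefficient, so poorly converged samples pull the parameters harder in the next round.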

FIG. 3 is a diagram showing the training process based on the training device 100 in Embodiment 1 of the present invention. Taking the N-th and (N+1)-th trainings as an example, as shown in FIG. 3, at the end of the N-th training the test unit 101 of the training device 100 inputs a plurality of test images into the current target detection model 200 and obtains the detection result of each test image; the determination unit 102 determines the predicted convergence level of each test image based on the detection result of each test image; and the adjustment unit 103 adjusts the parameters used for the (N+1)-th training based on the predicted convergence level of each test image.

In this embodiment, the training device 100 for a target detection model performs testing and parameter adjustment at the end of the N-th training and uses the adjusted parameters for the (N+1)-th training. Likewise, at the end of the (N+1)-th training, it may perform testing and parameter adjustment in the same way and use the adjusted parameters for the (N+2)-th training, and so on. In other words, testing and adjustment of the parameters for the next training are performed at the end of every training round from the N-th training onward; for convenience, this embodiment describes only one such round of testing and parameter adjustment.
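The repeated test-and-adjust cycle can be outlined as a loop. This is only a control-flow sketch: the four callables stand in for the training step and for units 101 to 103, their names and signatures are assumptions, and skipping the test after the final round (because no further training follows) is a choice made for this example.

```python
def train_with_feedback(train_one_round, run_test, predict_levels, adjust,
                        total_rounds, first_test_round=5):
    """Run training rounds, adjusting parameters after each tested round."""
    params = None      # no adjustment before the first test
    history = []
    for n in range(1, total_rounds + 1):
        train_one_round(n, params)
        # Test at the end of round n and adjust the parameters for round n+1.
        if first_test_round <= n < total_rounds:
            results = run_test()              # role of test unit 101
            levels = predict_levels(results)  # role of determination unit 102
            params = adjust(levels)           # role of adjustment unit 103
            history.append((n, params))
    return history


# Stub callables, for demonstration only.
seen = []
history = train_with_feedback(
    train_one_round=lambda n, p: seen.append((n, p)),
    run_test=lambda: [0.9, 0.5],
    predict_levels=lambda results: [1 if x > 0.7 else 4 for x in results],
    adjust=lambda levels: max(levels),
    total_rounds=7,
)
print(history)
```

Rounds 1 to 5 run with no adjusted parameters, and each later round receives the parameters derived from the test at the end of the preceding round.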

In this embodiment, apart from what is described in the embodiments of the present invention, the specific training method and the termination determination method required in the training process of the target detection model can be found in the related art, and a detailed description is omitted here.

As can be seen from the above embodiment, at the end of the N-th training of the target detection model, the predicted convergence level is determined based on the detection results of the test images, and the parameters used for the next training are adjusted based on that predicted convergence level. This dynamically controls the convergence direction of the model during the training process and effectively eliminates the influence of training data imbalance on the training effect, so that a target detection model with good performance can be trained.

An embodiment of the present invention further provides an electronic device. FIG. 4 is a diagram showing the electronic device in Embodiment 2 of the present invention. As shown in FIG. 4, the electronic device 400 includes a training device 401 for a target detection model. The structure and functions of the training device 401 are the same as those described in Embodiment 1, so a detailed description is omitted here.

FIG. 5 is a diagram showing the system configuration of the electronic device in Embodiment 2 of the present invention. As shown in FIG. 5, the electronic device 500 may include a processor 501 and a storage device 502, and the storage device 502 is connected to the processor 501. Note that the figure is merely an example; other types of structures may supplement or replace this structure to realize telecommunication functions or other functions.

As shown in FIG. 5, the electronic device 500 may further include an input unit 503, a display 504, and a power supply 505.

In one implementation, the functions of the training device for a target detection model described in Embodiment 1 may be integrated into the processor 501. The processor 501 may be configured to: at the end of the N-th training in the training process of the target detection model, input a plurality of test images into the target detection model to perform a test and obtain a detection result of each test image, N being a positive integer; determine a predicted convergence level of each test image based on the detection result of each test image; and adjust a parameter used for the (N+1)-th training based on the predicted convergence level of the detection result of each test image.

For example, determining the predicted convergence level of each test image based on the detection result of each test image includes: calculating, based on the detection result of each test image, the detection performance index of each test image and the average of the detection performance indexes of all test images; and inputting the detection performance index of each test image and the average into a classifier to obtain the predicted convergence level of each test image.

For example, the classifier is a random-forest-based classifier.

For example, the classifier determines the predicted convergence level of each test image based on a Gaussian-distribution classification criterion.

For example, the parameter is a backpropagation gradient coefficient.

For example, the training images used to train the target detection model and the test images have a plurality of labels marking a plurality of features.

For example, N is 5 or greater.

In another implementation, the training device for a target detection model described in Embodiment 1 may be arranged separately from the processor 501. For example, the training device may be configured as a chip connected to the processor 501, and its functions may be realized under the control of the processor 501.

In this embodiment, the electronic device 500 does not need to include all the components shown in FIG. 5.

As shown in FIG. 5, the processor 501 is sometimes called a controller or an operation controller and may include a microprocessor or another processing device and/or logic device. For example, the processor 501 may include a central processing unit (CPU) and/or a graphics processing unit (GPU). The processor 501 can receive input and control the operation of each component of the electronic device 500.

The storage device 502 may be, for example, one or more of a buffer, a flash memory, a hard disk drive (HDD), a removable medium, a volatile memory, a non-volatile memory, or another suitable device. The processor 501 can also execute a program stored in the storage device 502 to perform information storage, processing, and the like. The functions of the other components are similar to those in the prior art, so a detailed description is omitted here. Each component of the electronic device 500 may be realized by dedicated hardware, firmware, software, or a combination thereof, all of which fall within the scope of the present invention.

An embodiment of the present invention further provides a method for training a target detection model, which corresponds to the training device for the target detection model in Embodiment 1. FIG. 6 is a diagram showing the method for training a target detection model according to Embodiment 3 of the present invention. As shown in FIG. 6, the method includes the following steps.

Step 601: at the end of the N-th training in the training process of the target detection model, input a plurality of test images into the target detection model to perform a test and obtain a detection result for each test image, where N is a positive integer;
Step 602: determine the predicted convergence level of each test image based on the detection result of each test image; and
Step 603: adjust the parameter used for the (N+1)-th training based on the predicted convergence level of the detection result of each test image.

In this embodiment, the specific implementation of each of the above steps is the same as described in Embodiment 1, so a detailed description is omitted here.
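The flow of steps 601 to 603 can be illustrated with a minimal Python sketch. It is not the implementation of the embodiment: the function name `run_training`, the four callables passed to it, and the `first_feedback_round` argument are all hypothetical names introduced here. The default of 5 reflects only the example in which N is 5 or more.

```python
from typing import Any, Callable, Dict, List


def run_training(
    num_rounds: int,
    train_round: Callable[[int, Dict[str, float]], None],
    run_test: Callable[[], Dict[str, Any]],
    predict_levels: Callable[[Dict[str, Any]], Dict[str, int]],
    adjust: Callable[[Dict[str, int]], Dict[str, float]],
    first_feedback_round: int = 5,  # assumption: feedback starts once N >= 5
) -> List[Dict[str, float]]:
    """Run training rounds with test-based feedback between rounds.

    Returns the per-image parameters that were in effect for each round.
    """
    history: List[Dict[str, float]] = []
    params: Dict[str, float] = {}  # empty means "no adjustment yet"
    for n in range(1, num_rounds + 1):
        train_round(n, params)       # N-th training with the current parameters
        history.append(dict(params))
        if n >= first_feedback_round:
            results = run_test()              # step 601: test at the end of round N
            levels = predict_levels(results)  # step 602: convergence level per image
            params = adjust(levels)           # step 603: parameters for round N+1
    return history
```

The callables keep the sketch independent of any particular detector or classifier: `run_test` stands in for the test unit, `predict_levels` for the determination unit, and `adjust` for the adjustment unit.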

An embodiment of the present invention further provides a computer-readable program that, when executed in a training device for a target detection model or in an electronic device, causes a computer to carry out, in the training device or the electronic device, the method for training a target detection model described in Embodiment 3.

An embodiment of the present invention further provides a storage medium storing a computer-readable program, wherein the computer-readable program causes a computer to carry out, in a training device for a target detection model or in an electronic device, the method for training a target detection model described in Embodiment 3.

The methods, devices, and the like described in the embodiments of the present invention can be realized by hardware, by software modules executed by a processor, or by a combination of the two. For example, one or more of the functions in the functional block diagram shown in FIG. 1, and/or one or more combinations of those functions, may correspond to software modules of a computer program or to hardware modules. These software modules can each correspond to a step shown in FIG. 6. The hardware modules can be realized, for example, by hardening the software modules into a field-programmable gate array (FPGA).

The devices, methods, and the like according to the embodiments of the present invention may be realized by software, by hardware, or by a combination of hardware and software. The present invention also relates to a computer-readable program that, when executed by a logic component, causes the logic component to realize the devices or components described above, or to carry out the methods or steps described above. The present invention further relates to a storage medium storing such a program, for example a hard disk, a magnetic disk, an optical disc, a DVD, or a flash memory.

The following appendices are further disclosed in relation to the above embodiments.

(Appendix 1)
A method for training a target detection model, the method comprising:
at the end of the N-th training in the training process of the target detection model, inputting a plurality of test images into the target detection model to perform a test and obtaining a detection result for each test image, where N is a positive integer;
determining the predicted convergence level of each test image based on the detection result of each test image; and
adjusting the parameter used for the (N+1)-th training based on the predicted convergence level of the detection result of each test image.

(Appendix 2)
The method according to Appendix 1, wherein
determining the predicted convergence level of each test image based on the detection result of each test image includes:
calculating, based on the detection result of each test image, a detection performance index for each test image and the average of the detection performance indices of all test images; and
inputting the detection performance index of each test image and the average into a classifier to obtain the predicted convergence level of each test image.
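As an illustration of the calculation recited in Appendix 2, the sketch below builds the classifier input for each test image from a per-image detection performance index and the average over all test images. This passage does not fix the index, so the per-image F1 score used here is an assumption, and the names `f1_score` and `classifier_inputs` are hypothetical. The classifier itself is left out.

```python
from statistics import mean
from typing import Dict, List, Tuple


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Per-image F1 from true positives, false positives, false negatives.

    Assumption: F1 is used only as a stand-in detection performance index.
    """
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def classifier_inputs(
    counts: Dict[str, Tuple[int, int, int]]
) -> Dict[str, List[float]]:
    """Map each image to [its own index, average index over all images]."""
    index = {img: f1_score(*c) for img, c in counts.items()}
    avg = mean(index.values())
    return {img: [score, avg] for img, score in index.items()}
```

Each two-element vector could then be passed to any classifier that maps it to a convergence level, for example a random-forest-based classifier as in Appendix 3.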

(Appendix 3)
The method according to Appendix 2, wherein
the classifier is a random-forest-based classifier.

(Appendix 4)
The method according to Appendix 2 or 3, wherein
the classifier determines the predicted convergence level of each test image based on a Gaussian-distribution classification criterion.
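This passage does not spell out the Gaussian-distribution classification criterion. One possible reading, shown here purely as an assumption, is to grade each test image by how many standard deviations its detection performance index lies below the average of all test images. The function name `gaussian_levels`, the three-level scale, and the cut-offs at one and two standard deviations are hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict


def gaussian_levels(index: Dict[str, float]) -> Dict[str, int]:
    """Assign a convergence level per image from its z-score.

    Assumed scale: 0 = converged, 1 = lagging, 2 = far from converged.
    """
    scores = list(index.values())
    mu = mean(scores)
    sigma = pstdev(scores)  # population standard deviation
    levels: Dict[str, int] = {}
    for img, score in index.items():
        # With zero spread every image sits at the mean.
        z = (score - mu) / sigma if sigma > 0 else 0.0
        if z >= -1.0:
            levels[img] = 0
        elif z >= -2.0:
            levels[img] = 1
        else:
            levels[img] = 2
    return levels
```

Under this reading, levels produced in this way could serve as the classification standard against which a classifier is trained or checked.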

(Appendix 5)
The method according to Appendix 1, wherein
the parameter is a backpropagation gradient coefficient.
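To make the role of a backpropagation gradient coefficient concrete, the sketch below treats it as a multiplier applied to the gradient contributed by one image during a plain gradient-descent update. This interpretation, the table `LEVEL_TO_COEFF` with its values, and the function `sgd_step` are assumptions for illustration only.

```python
from typing import Dict, List

# Hypothetical mapping: images that are further from convergence
# receive a larger coefficient in the next training round.
LEVEL_TO_COEFF: Dict[int, float] = {0: 0.5, 1: 1.0, 2: 2.0}


def sgd_step(
    weights: List[float], grad: List[float], lr: float, coeff: float
) -> List[float]:
    """One gradient-descent update with the gradient scaled by coeff."""
    return [w - lr * coeff * g for w, g in zip(weights, grad)]
```

For example, `sgd_step(w, g, 0.1, LEVEL_TO_COEFF[2])` doubles the effective step taken for an image at level 2 compared with a coefficient of 1.0.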

(Appendix 6)
The method according to Appendix 1, wherein
the training images used to train the target detection model and the test images each have a plurality of labels marking a plurality of features.

(Appendix 7)
The method according to Appendix 1, wherein
N is 5 or more.

Although a preferred embodiment of the present invention has been described above, the present invention is not limited to this embodiment, and any modification that does not depart from the gist of the present invention falls within the technical scope of the present invention.

Claims

A device for training a target detection model, the device comprising:
a test unit that, at the end of the N-th training in the training process of the target detection model, inputs a plurality of test images into the target detection model to perform a test and obtains a detection result for each test image, where N is a positive integer;
a determination unit that determines the predicted convergence level of each test image based on the detection result of each test image; and
an adjustment unit that adjusts the parameter used for the (N+1)-th training based on the predicted convergence level of the detection result of each test image.

The device according to claim 1, wherein
the determination unit includes:
a calculation unit that calculates, based on the detection result of each test image, a detection performance index for each test image and the average of the detection performance indices of all test images; and
an acquisition unit that inputs the detection performance index of each test image and the average into a classifier to obtain the predicted convergence level of each test image.

The device according to claim 2, wherein
the classifier is a random-forest-based classifier.

The device according to claim 2 or 3, wherein
the classifier determines the predicted convergence level of each test image based on a Gaussian-distribution classification criterion.

The device according to claim 1, wherein
the parameter is a backpropagation gradient coefficient.

The device according to claim 1, wherein
the training images used to train the target detection model and the test images each have a plurality of labels marking a plurality of features.

The device according to claim 1, wherein
N is 5 or more.

An electronic device including the device according to claim 1.

A method for training a target detection model, the method comprising:
at the end of the N-th training in the training process of the target detection model, inputting a plurality of test images into the target detection model to perform a test and obtaining a detection result for each test image, where N is a positive integer;
determining the predicted convergence level of each test image based on the detection result of each test image; and
adjusting the parameter used for the (N+1)-th training based on the predicted convergence level of the detection result of each test image.

The method according to claim 9, wherein
determining the predicted convergence level of each test image based on the detection result of each test image includes:
calculating, based on the detection result of each test image, a detection performance index for each test image and the average of the detection performance indices of all test images; and
inputting the detection performance index of each test image and the average into a classifier to obtain the predicted convergence level of each test image.