TW202205151A

TW202205151A - Network training method, target detection method, electronic device and computer readable storage medium

Info

Publication number: TW202205151A
Application number: TW110120818A
Authority: TW
Inventors: 竇浩軒; 王意如; 甘偉豪; 路少卿; 武偉; 閆俊傑
Original assignee: 大陸商北京市商湯科技開發有限公司
Priority date: 2020-07-15
Filing date: 2021-06-08
Publication date: 2022-02-01
Also published as: CN111881956A; TWI780751B; CN111881956B; KR20220009965A; JP2022544893A; WO2022011892A1

Abstract

The disclosed embodiments relate to a network training method, a target detection method, an electronic device and a computer-readable storage medium. The network training method includes: inputting an unlabeled sample image into a target detection network to obtain a target detection result, which includes the image region, feature information and classification probability of each target; determining the category confidence of each target according to its classification probability; for a first target whose category confidence is greater than or equal to a first threshold, adding the sample image containing the first target to a training set as a labeled image; for second targets whose category confidence is less than the first threshold, performing feature correlation mining on the second targets to determine a fourth target, and adding the sample image containing the fourth target to the training set; and training the target detection network on the training set.

Description

Network training method, target detection method, electronic device and computer-readable storage medium

The present invention relates to the field of computer technology, and in particular to a network training method, a target detection method, an electronic device and a computer-readable storage medium.

Computer vision is an important branch of artificial intelligence. Computer vision processing usually requires detecting targets (for example, pedestrians or objects) in images or videos. Target detection on large-scale long-tail data has important applications in many fields, such as abnormal object detection, abnormal behavior detection and emergency alarms in urban surveillance. However, long-tail data is huge in volume and suffers from a severe imbalance between positive and negative samples: most images are background images and only a small fraction contain detectable targets. As a result, target detection approaches in the related art perform poorly on long-tail data.

Embodiments of the present invention provide a technical solution for network training and target detection.

According to an aspect of the embodiments of the present invention, a network training method is provided, including: inputting an unlabeled first sample image into a target detection network for processing to obtain a target detection result of the first sample image, the target detection result including the image region, feature information and classification probability of each target in the first sample image; determining the category confidence of the target according to the classification probability of the target; for a first target, among the targets, whose category confidence is greater than or equal to a first threshold, taking the first sample image containing the first target as a labeled second sample image and adding it to a training set, wherein the annotation information of the second sample image includes the image region of the first target and the category corresponding to the category confidence of the first target, and the training set includes labeled third sample images; for second targets, among the targets, whose category confidence is less than the first threshold, performing feature correlation mining on the second targets according to the feature information of third targets in the third sample images, determining, through the feature correlation mining, a fourth target from the second targets and the first sample image containing the fourth target, taking the first sample image containing the fourth target as a fourth sample image, and adding it to the training set; and training the target detection network according to the annotation information of the fourth sample image and the second sample images, the third sample images and the fourth sample images in the training set.
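The confidence-based split between first targets (pseudo-labeled directly) and second targets (passed on to feature correlation mining) can be sketched as follows. This is a minimal illustration rather than the claimed implementation: the `Detection` type, the function names and the threshold value used in the usage note are hypothetical, and taking the category confidence to be the largest classification probability is an assumption.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Detection:
    image_id: str
    box: Tuple[float, float, float, float]  # image region (x1, y1, x2, y2)
    probs: Sequence[float]                  # classification probabilities


def category_confidence(det: Detection) -> Tuple[int, float]:
    # Assumption: confidence is the largest class probability.
    best = max(range(len(det.probs)), key=lambda i: det.probs[i])
    return best, det.probs[best]


def split_by_confidence(dets: List[Detection], first_threshold: float):
    """Return (pseudo-labels for first targets, low-confidence second targets)."""
    pseudo_labeled, low_confidence = [], []
    for det in dets:
        category, confidence = category_confidence(det)
        if confidence >= first_threshold:
            # Annotation: image region plus the category tied to the confidence.
            pseudo_labeled.append((det.image_id, det.box, category))
        else:
            low_confidence.append(det)
    return pseudo_labeled, low_confidence
```

For example, with a hypothetical threshold of 0.8, a detection with probabilities `[0.1, 0.9]` is pseudo-labeled with category 1, while one with `[0.55, 0.45]` is kept as a second target for mining.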

In a possible implementation, training the target detection network according to the annotation information of the fourth sample image and the second sample images, the third sample images and the fourth sample images in the training set includes: determining, according to the categories of the targets in the positive sample images of the training set, a first number of images to be sampled from the positive sample images of each category, a positive sample image being a sample image that contains a target; sampling the positive sample images of each category according to the first number for that category to obtain a plurality of fifth sample images; sampling the negative sample images of the training set to obtain a plurality of sixth sample images, a negative sample image being a sample image that contains no target; and training the target detection network according to the fifth sample images and the sixth sample images.
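The per-category sampling of positive images and the separate sampling of negative images might look like the sketch below. The text does not state how the first number is chosen, so this example assumes the same quota for every category and samples with replacement so that rare categories can still fill their quota; the function and parameter names are hypothetical.

```python
import random
from collections import defaultdict


def sample_training_batch(positives, negatives, per_class, num_negatives, seed=0):
    """positives: list of (image_id, category); negatives: list of image_id.

    Returns (fifth sample images, sixth sample images).
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for image_id, category in positives:
        by_class[category].append(image_id)

    fifth = []
    for category in sorted(by_class):
        # Assumption: equal quota per category, drawn with replacement.
        fifth.extend(rng.choices(by_class[category], k=per_class))

    # Negative (background-only) images are drawn without replacement.
    sixth = rng.sample(negatives, k=min(num_negatives, len(negatives)))
    return fifth, sixth
```

With equal quotas, a category that has a single labeled image contributes as many samples per batch as a category with thousands, which is one way to counter the long-tail imbalance.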

In a possible implementation, performing feature correlation mining on the second targets according to the feature information of the third targets in the third sample images, and determining, through the feature correlation mining, the fourth target from the second targets and the first sample image containing the fourth target includes: determining the information entropy of each second target according to its classification probability; selecting fifth targets from the second targets according to the category confidence and information entropy of the second targets; determining, according to the categories of the third targets in the third sample images and the total number of sample images to be mined, a second number of sample images to be mined for each category; and determining, from the fifth targets, the fourth target and the first sample image containing the fourth target according to the feature information of the third targets in the third sample images, the feature information of the fifth targets and the second number of sample images to be mined for each category.
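The information entropy of a target can be computed from its classification probabilities as in the short sketch below, which assumes the standard Shannon entropy with the natural logarithm (the text does not fix the base of the logarithm). The function name is hypothetical.

```python
import math


def information_entropy(probs):
    """Shannon entropy of a classification probability vector (natural log)."""
    # Zero-probability classes contribute nothing and are skipped to avoid log(0).
    return -sum(p * math.log(p) for p in probs if p > 0)
```

A uniform distribution over two classes gives the maximum value `math.log(2)`, while a one-hot distribution gives zero, so high entropy marks targets the network is unsure about.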

In a possible implementation, selecting the fifth targets from the second targets according to the category confidence and information entropy of the second targets includes: sorting the second targets by category confidence and by information entropy respectively, and selecting a third number of sixth targets and a fourth number of seventh targets; and merging the sixth targets and the seventh targets to obtain the fifth targets.
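The two rankings and the merge can be sketched as below. The sort direction is not stated here, so the example assumes that the sixth targets are those with the highest confidence and the seventh targets those with the highest entropy; duplicates are dropped from the sixth targets before merging. All names are hypothetical.

```python
def select_fifth_targets(confidence, entropy, third_number, fourth_number):
    """confidence, entropy: dicts mapping a target id to its score."""
    # Assumption: keep the top-ranked targets in each ordering.
    sixth = sorted(confidence, key=confidence.get, reverse=True)[:third_number]
    seventh = sorted(entropy, key=entropy.get, reverse=True)[:fourth_number]

    # Merge: drop sixth targets that also appear among the seventh targets.
    seventh_set = set(seventh)
    remaining = [t for t in sixth if t not in seventh_set]
    return remaining + seventh
```

For instance, if the confidence ranking yields `["a", "b"]` and the entropy ranking yields `["c", "b"]`, the merged candidate list is `["a", "c", "b"]`.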

In a possible implementation, determining the second number of sample images to be mined for each category according to the categories of the third targets in the third sample images and the total number of sample images to be mined includes: determining the proportion of third targets of each category according to the categories of the third targets in the third sample images; determining a sampling weight for each category according to the proportion of third targets of that category; and determining the second number of sample images to be mined for each category according to the sampling weight of that category.
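One way to turn category proportions into per-category mining quotas is sketched below. The mapping from proportion to sampling weight is not given in this passage; the example assumes a weight inversely proportional to the category's share of labeled targets, normalized to sum to one, so that rare categories receive larger quotas. The function name and this weighting rule are assumptions for illustration only.

```python
from collections import Counter


def mining_quotas(third_target_categories, total_to_mine):
    """Return {category: second number of sample images to mine}."""
    counts = Counter(third_target_categories)
    total = sum(counts.values())
    proportion = {c: n / total for c, n in counts.items()}

    # Assumption: sampling weight is inversely proportional to the proportion.
    inverse = {c: 1.0 / p for c, p in proportion.items()}
    norm = sum(inverse.values())
    weight = {c: v / norm for c, v in inverse.items()}

    # Rounding means the quotas may not sum exactly to total_to_mine.
    return {c: round(total_to_mine * w) for c, w in weight.items()}
```

With three labeled "person" targets and one labeled "car" target and 100 images to mine, this rule assigns 25 images to "person" and 75 to "car".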

In a possible implementation, determining, from the fifth targets, the fourth target and the first sample image containing the fourth target according to the feature information of the third targets in the third sample images, the feature information of the fifth targets and the second number of sample images to be mined for each category includes: determining, according to the distances between the feature information of the third targets of a first category and the feature information of each fifth target, for each fifth target, the third target of the first category that is at the smallest distance from that fifth target, as an eighth target, the first category being any one of the categories of the third targets; and determining the target with the largest distance among the eighth targets as the fourth target.
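A single selection step of this kind can be read as a max-min (farthest-point) rule: for every candidate fifth target, find its nearest labeled third target of the category, then pick the candidate whose nearest neighbor is farthest away. The sketch below follows that reading, which is an interpretation of the passage, and assumes Euclidean distance between feature vectors; the names are hypothetical.

```python
import math


def pick_fourth_target(third_features, fifth_features):
    """third_features: non-empty list of feature vectors of labeled targets
    of one category. fifth_features: dict mapping candidate id -> vector.

    Returns the id of the candidate whose nearest labeled target is farthest.
    """
    def nearest_distance(vec):
        # Distance to the closest labeled target (the "eighth target").
        return min(math.dist(vec, ref) for ref in third_features)

    return max(fifth_features, key=lambda tid: nearest_distance(fifth_features[tid]))
```

Picking the candidate that is least similar to anything already labeled favors samples that add new information to the training set rather than near-duplicates.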

In a possible implementation, determining, from the fifth targets, the fourth target and the first sample image containing the fourth target according to the feature information of the third targets in the third sample images, the feature information of the fifth targets and the second number of sample images to be mined for each category further includes: adding the determined fourth target to the third targets of the first category, and removing the determined fourth target from the unlabeled fifth targets.

In a possible implementation, the method further includes: inputting the third sample image into the target detection network for processing to obtain the feature information of the third target in the third sample image.

In a possible implementation, before the step of inputting the unlabeled first sample image into the target detection network for processing to obtain the target detection result of the first sample image, the method further includes: pre-training the target detection network with the labeled third sample images.

In a possible implementation, the first sample images include long-tail images.

According to an aspect of the embodiments of the present invention, a target detection method is provided, including: inputting an image to be processed into a target detection network for processing to obtain a target detection result of the image to be processed, the target detection result including the position and category of each target in the image to be processed, wherein the target detection network is trained according to the network training method described above.

According to an aspect of the embodiments of the present invention, a network training apparatus is provided, including: a target detection part configured to input an unlabeled first sample image into a target detection network for processing to obtain a target detection result of the first sample image, the target detection result including the image region, feature information and classification probability of each target in the first sample image; a confidence determination part configured to determine the category confidence of the target according to the classification probability of the target; a labeling part configured to, for a first target, among the targets, whose category confidence is greater than or equal to a first threshold, take the first sample image containing the first target as a labeled second sample image and add it to a training set, wherein the annotation information of the second sample image includes the image region of the first target and the category corresponding to the category confidence of the first target, and the training set includes labeled third sample images; a feature mining part configured to, for second targets, among the targets, whose category confidence is less than the first threshold, perform feature correlation mining on the second targets according to the feature information of third targets in the third sample images, determine, through the feature correlation mining, a fourth target from the second targets and the first sample image containing the fourth target, take the first sample image containing the fourth target as a fourth sample image, and add it to the training set; and a training part configured to train the target detection network according to the annotation information of the fourth sample image and the second sample images, the third sample images and the fourth sample images in the training set.

In a possible implementation, the training part includes: a sampling number determination sub-part configured to determine, according to the categories of the targets in the positive sample images of the training set, a first number of images to be sampled from the positive sample images of each category, a positive sample image being a sample image that contains a target; a first sampling sub-part configured to sample the positive sample images of each category according to the first number for that category to obtain a plurality of fifth sample images; a second sampling sub-part configured to sample the negative sample images of the training set to obtain a plurality of sixth sample images, a negative sample image being a sample image that contains no target; and a training sub-part configured to train the target detection network according to the fifth sample images and the sixth sample images.

In a possible implementation, the feature mining part includes: an information entropy determination sub-part configured to determine the information entropy of each second target according to its classification probability; a target selection sub-part configured to select fifth targets from the second targets according to the category confidence and information entropy of the second targets; a mining number determination sub-part configured to determine, according to the categories of the third targets in the third sample images and the total number of sample images to be mined, a second number of sample images to be mined for each category; and a target and image determination sub-part configured to determine, from the fifth targets, the fourth target and the first sample image containing the fourth target according to the feature information of the third targets in the third sample images, the feature information of the fifth targets and the second number of sample images to be mined for each category.

In a possible implementation, the target selection sub-part is configured to: sort the second targets by category confidence and by information entropy respectively, and select a third number of sixth targets and a fourth number of seventh targets; and merge the sixth targets and the seventh targets to obtain the fifth targets.

In a possible implementation, the mining number determination sub-part is configured to: determine the proportion of third targets of each category according to the categories of the third targets in the third sample images; determine a sampling weight for each category according to the proportion of third targets of that category; and determine the second number of sample images to be mined for each category according to the sampling weight of that category.

In a possible implementation, the target and image determination sub-part is configured to: determine, according to the distances between the feature information of the third targets of a first category and the feature information of each fifth target, for each fifth target, the third target of the first category that is at the smallest distance from that fifth target, as an eighth target, the first category being any one of the categories of the third targets; and determine the target with the largest distance among the eighth targets as the fourth target.

In a possible implementation, the target and image determination sub-part is further configured to: add the determined fourth target to the third targets of the first category, and remove the determined fourth target from the unlabeled fifth targets.

In a possible implementation, the apparatus further includes: a feature extraction part configured to input the third sample image into the target detection network for processing to obtain the feature information of the third target in the third sample image.

In a possible implementation, before the target detection part, the apparatus further includes: a pre-training part configured to pre-train the target detection network with the labeled third sample images.

According to an aspect of the embodiments of the present invention, a target detection apparatus is provided, including: a detection processing part configured to input an image to be processed into a target detection network for processing to obtain a target detection result of the image to be processed, the target detection result including the position and category of each target in the image to be processed, wherein the target detection network is trained according to the network training method described above.

In a possible implementation, before the step of determining, according to the categories of the targets in the positive sample images of the training set, the first number of images to be sampled from the positive sample images of each category, the method includes: sampling the positive sample images and the negative sample images in the training set to obtain positive sample images and negative sample images that are equal or similar in number.
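Balancing the two groups before per-category sampling could be as simple as the following sketch, which assumes downsampling the larger group to the size of the smaller one. The passage only requires equal or similar counts, so this is one option among several, and the function name is hypothetical.

```python
import random


def balance_positive_negative(positives, negatives, seed=0):
    """Downsample both lists to the size of the smaller one."""
    rng = random.Random(seed)
    n = min(len(positives), len(negatives))
    # Sampling without replacement keeps every selected image unique.
    return rng.sample(positives, n), rng.sample(negatives, n)
```

On long-tail data the negative (background) images usually dominate, so in practice this step mostly discards surplus background images.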

In a possible implementation, the total number of sample images to be mined is 5% to 25% of the total number of first sample images.

In a possible implementation, merging the sixth targets and the seventh targets to obtain the fifth targets includes: removing from the sixth targets those that are the same as seventh targets, to obtain the remaining sixth targets that differ from the seventh targets; and taking the remaining targets and the seventh targets as the fifth targets.

In a possible implementation, the method further includes: ending the feature correlation mining for the first category when the number of fourth sample images of the first category reaches the second number of sample images to be mined for the first category.

In a possible implementation, after the step of determining, according to the distances between the feature information of the third targets of the first category and the feature information of each fifth target, for each fifth target, the third target of the first category that is at the smallest distance from that fifth target, as the eighth target, the method further includes: ending the determination of the eighth target when the number of first sample images containing fourth targets reaches the second number of sample images to be mined for the first category.

In a possible implementation, after the step of determining, according to the distances between the feature information of the third targets of the first category and the feature information of each fifth target, for each fifth target, the third target of the first category that is at the smallest distance from that fifth target, as the eighth target, the method further includes: ending the determination of the eighth target when the number of first sample images containing fourth targets has not reached the second number of sample images to be mined for the first category and the set storing the feature information of the fifth targets is empty.
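Putting the selection step, the set update and the two stopping conditions together, mining for one category can be sketched as a greedy loop. The sketch assumes Euclidean distance between feature vectors and reads the selection rule as "pick the candidate whose nearest labeled target is farthest away"; the function name and data layout are hypothetical.

```python
import math


def mine_category(third_features, fifth_features, second_number):
    """third_features: non-empty list of labeled feature vectors (one category).
    fifth_features: dict mapping target id -> (image_id, feature vector).

    Returns the ids of the mined (fourth) sample images, in selection order.
    """
    labeled = list(third_features)  # grows as fourth targets are added
    pool = dict(fifth_features)     # shrinks as fourth targets are removed
    mined_images = []

    # Stop when the quota is reached or the candidate set is empty.
    while len(mined_images) < second_number and pool:
        def nearest(tid):
            return min(math.dist(pool[tid][1], ref) for ref in labeled)

        chosen = max(pool, key=nearest)
        image_id, vec = pool.pop(chosen)  # remove from the unlabeled candidates
        labeled.append(vec)               # add to the labeled targets
        if image_id not in mined_images:
            mined_images.append(image_id)
    return mined_images
```

Because each selected target joins the labeled set, later picks are pushed away from earlier ones, so the mined images tend to cover different regions of the feature space.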

In a possible implementation, inputting the third sample image into the target detection network for processing to obtain the feature information of the third target in the third sample image includes: inputting the third sample image into the target detection network to obtain a feature vector output by a hidden layer of the target detection network; and determining the feature vector as the feature information of the third target.

According to an aspect of the embodiments of the present invention, an electronic device is provided, including: a processor; and a memory configured to store instructions executable by the processor, wherein the processor is configured to invoke the instructions stored in the memory to execute the network training method described above or the target detection method described above.

According to an aspect of the embodiments of the present invention, a computer-readable storage medium is provided, on which computer program instructions are stored, wherein the computer program instructions, when executed by a processor, implement the network training method described above or the target detection method described above.

According to the embodiments of the present invention, target detection results for unlabeled sample images can be obtained through the target detection network; pseudo-labeling and feature correlation mining are then performed on the basis of the target detection results, so that high-value sample images are labeled, collected and added to the training set; and the target detection network is trained on the expanded training set. This increases the number of positive samples in the training set, alleviates the imbalance between positive and negative samples, and improves the training effect of the target detection network.

It should be understood that the foregoing general description and the following detailed description are exemplary and explanatory only and do not limit the embodiments of the present invention. Other features and aspects of the embodiments of the present invention will become apparent from the following detailed description of exemplary embodiments with reference to the accompanying drawings.

Various exemplary embodiments, features and aspects of the present invention are described in detail below with reference to the accompanying drawings. The same reference numerals in the drawings denote elements that have the same or similar functions. Although various aspects of the embodiments are shown in the drawings, the drawings are not necessarily drawn to scale unless otherwise indicated.

The word "exemplary" is used herein exclusively to mean "serving as an example, embodiment or illustration." Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred over or better than other embodiments.

The term "and/or" herein merely describes an association between associated objects and indicates that three relationships are possible. For example, "A and/or B" may mean that A exists alone, that A and B exist together, or that B exists alone. In addition, the term "at least one" herein means any one of a plurality, or any combination of at least two of a plurality. For example, "including at least one of A, B and C" may mean including any one or more elements selected from the set consisting of A, B and C.

In addition, numerous specific details are given in the following detailed description in order to better illustrate the embodiments of the present invention. Those skilled in the art should understand that the embodiments of the present invention can also be practiced without certain specific details. In some instances, methods, means, elements and circuits well known to those skilled in the art are not described in detail, so as not to obscure the gist of the embodiments of the present invention.

FIG. 1 shows a flowchart of a network training method according to an embodiment of the present invention. As shown in FIG. 1, the network training method includes:

In step S11, an unlabeled first sample image is input into a target detection network for processing to obtain a target detection result of the first sample image, the target detection result including the image region, feature information and classification probability of each target in the first sample image;

In step S12, the category confidence of the target is determined according to the classification probability of the target;

In step S13, for a first target, among the targets, whose category confidence is greater than or equal to a first threshold, the first sample image containing the first target is taken as a labeled second sample image and added to a training set, wherein the annotation information of the second sample image includes the image region of the first target and the category corresponding to the category confidence of the first target, and the training set includes labeled third sample images;

In step S14, for second targets, among the targets, whose category confidence is less than the first threshold, feature correlation mining is performed on the second targets according to the feature information of third targets in the third sample images; through the feature correlation mining, a fourth target and the first sample image containing the fourth target are determined from the second targets, and the first sample image containing the fourth target is taken as a fourth sample image and added to the training set;

In step S15, the target detection network is trained according to the annotation information of the fourth sample image and the second sample images, the third sample images and the fourth sample images in the training set.

In a possible implementation, the method may be performed by an electronic device such as a terminal device or a server. The terminal device may be user equipment (UE), a mobile device, a user terminal, a terminal, a cellular phone, a cordless phone, a personal digital assistant (PDA), a handheld device, a computing device, a vehicle-mounted device, a wearable device, or the like. The method may be implemented by a processor invoking computer-readable instructions stored in a memory. Alternatively, the method may be performed by a server.

For example, the first sample images may be images captured by an image acquisition device (e.g., a camera). The first sample images may comprise large-scale long-tailed images, that is, most of the images are background images and only a small portion contain detectable targets. Detectable targets may include, for example, human bodies, faces, vehicles, and objects. In the security field, for instance, a camera may capture images of a geographic area through which people pass only a small fraction of the time, so that most of the captured images are background images and only a small portion contain faces and/or human bodies. In this case, the captured images form a long-tailed dataset. The embodiments of the present invention do not limit how the first sample images are acquired or the categories of the targets in the first sample images.

In a possible implementation, a target detection network may be provided in advance to detect the position (that is, the detection box) and the category of a target in an image. The target detection network may be, for example, a convolutional neural network; the embodiments of the present invention do not limit its network structure.

In a possible implementation, before step S11, the method further includes: pre-training the target detection network with the labeled third sample images. That is, a training set may be provided in advance that includes the labeled third sample images, and the annotation information of a third sample image may include the detection boxes and categories of the targets in the image. Based on this training set, the target detection network can be pre-trained in a manner known from the related art so that it reaches a certain detection accuracy.

However, the pre-trained target detection network performs poorly on large-scale long-tailed images. The target detection network can therefore be trained further with the unlabeled first sample images by means of active learning.

In a possible implementation, in step S11, the unlabeled first sample image may be input into the target detection network for processing to obtain the target detection result of the first sample image. The target detection result may include the image region, feature information, and classification probability of each target in the first sample image. The image region in which a target is located may be a detection box in the image; the feature information of a target may be, for example, a feature vector output by a hidden layer (such as a convolutional layer) of the target detection network; the classification probability of a target may represent the posterior probabilities that the target belongs to each category.

In a possible implementation, a target in a first sample image may also be referred to as an instance, and one or more targets may be detected in each first sample image. In practice, the number of detected targets may be several times to dozens of times the number of images.

In a possible implementation, in step S12, the maximum value among the classification probabilities of a target may be obtained and determined as the category confidence of that target.

In a possible implementation, in step S13, for a target whose category confidence is greater than or equal to the first threshold (which may be referred to as a first target), the first sample image in which the first target is located may be taken as a labeled sample image (which may be referred to as a second sample image) and added to the training set. The image region of the first target is used as the labeled image region, and the category corresponding to the category confidence of the first target is used as the labeled category of the first target. The same second sample image may be labeled multiple times, once for each first target it contains. The first threshold is, for example, 0.99; the embodiments of the present invention do not limit the value of the first threshold.

In a possible implementation, the processing of step S13 may be referred to as pseudo-labeling. That is, an image containing a high-confidence target is treated as a high-value sample, and the inference result of the target detection is used directly as the annotation of the target. In this way, the number of positive samples in the training set can be increased, which addresses the difficulty of collecting positive samples.
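Steps S12 and S13 can be sketched as follows. This is only a minimal illustration: the dictionary keys `image_id`, `box`, and `probs`, as well as the function name, are hypothetical and not part of the described embodiment; the default threshold 0.99 is the example value given above.

```python
from collections import defaultdict


def pseudo_label(detections, first_threshold=0.99):
    """Split detections into pseudo-labeled first targets and second targets.

    detections: list of dicts with the (hypothetical) keys
        'image_id', 'box' and 'probs' (per-category probabilities).
    Returns (pseudo_labels, second_targets), where pseudo_labels maps an
    image id to a list of (box, category_index) annotations.
    """
    pseudo_labels = defaultdict(list)
    second_targets = []
    for det in detections:
        probs = det["probs"]
        # Category confidence = maximum classification probability (step S12).
        category = max(range(len(probs)), key=probs.__getitem__)
        if probs[category] >= first_threshold:
            # First target: the detection itself becomes the annotation (step S13).
            pseudo_labels[det["image_id"]].append((det["box"], category))
        else:
            # Second target: left for feature correlation mining (step S14).
            second_targets.append(det)
    return dict(pseudo_labels), second_targets
```

An image that contains several first targets receives several annotations, which matches the remark that the same second sample image may be labeled multiple times.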

In a possible implementation, in step S14, for the targets whose category confidence is less than the first threshold (which may be referred to as second targets), feature correlation mining may be performed on the second targets according to the feature information of the targets (which may be referred to as third targets) in the labeled third sample images of the training set, so as to mine from the second targets the targets that meet the requirements (which may be referred to as fourth targets). For example, the distance or correlation between the feature information of the third targets and the feature information of the second targets may be calculated, a preset number of targets may be selected according to the distance or correlation, and the selected targets may be used as the fourth targets.

In a possible implementation, the first sample image in which a mined fourth target is located may be taken as a fourth sample image and added to the training set, which completes the feature correlation mining. In this way, the number of samples in the training set can be expanded further.

In a possible implementation, the annotation information of the fourth sample image may be obtained by manual annotation, for example by manually determining the detection boxes and categories of the targets in the fourth sample image. The embodiments of the present invention do not limit this.

In a possible implementation, in step S15, after the annotation information of the fourth sample image is obtained, the target detection network may be trained according to the second sample images, the third sample images, and the fourth sample images in the training set.

In a possible implementation, the processing of step S11 yields the target detection result of each first sample image, and the processing of step S12 yields the category confidence of each target in each first sample image. In step S13, the sample images containing first targets whose category confidence is greater than or equal to the first threshold may be added to the training set, giving labeled second sample images through pseudo-labeling; in step S14, the second targets whose category confidence is less than the first threshold may be mined.

In a possible implementation, step S14 may include: determining the information entropy of the second targets according to the classification probabilities of the second targets; selecting fifth targets from the second targets according to the category confidence and information entropy of the second targets; determining, for each category, a second number of sample images to be mined according to the categories of the third targets in the third sample images and the total number of sample images to be mined; and determining, from the fifth targets, the fourth targets and the first sample images in which the fourth targets are located according to the feature information of the third targets in the third sample images, the feature information of the fifth targets, and the second number of sample images to be mined for each category.

For example, the information entropy of a second target can be calculated from its classification probabilities and represents the degree of uncertainty of the second target: the larger the information entropy of a second target, the greater its uncertainty; conversely, the smaller the information entropy, the smaller its uncertainty. The embodiments of the present invention do not limit how the information entropy is calculated.
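Since the calculation of the information entropy is left open, the following sketch assumes the common Shannon entropy with the natural logarithm; the function name is illustrative only.

```python
import math


def information_entropy(class_probs):
    """Shannon entropy (in nats) of one target's classification probabilities.

    Assumption: Shannon entropy is used; the embodiment does not fix the formula.
    """
    # Terms with p == 0 are skipped, since p * log(p) tends to 0 as p -> 0.
    return -sum(p * math.log(p) for p in class_probs if p > 0.0)
```

Under this assumption, a target with probabilities spread evenly over the categories has the highest entropy, and a target whose probability mass sits on one category has entropy zero.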

In a possible implementation, targets that satisfy certain conditions (which may be referred to as fifth targets) may be selected from the plurality of second targets according to the category confidence and the information entropy of the second targets, respectively, for example targets with a relatively high category confidence and targets with a relatively high information entropy.

In a possible implementation, the step of selecting the fifth targets from the second targets according to the category confidence and information entropy of the second targets may include: sorting the second targets according to their category confidence and according to their information entropy, respectively, and selecting a third number of sixth targets and a fourth number of seventh targets; and merging the sixth targets and the seventh targets to obtain the fifth targets.

That is, the second targets are sorted according to their category confidence, and a preset third number of targets (which may be referred to as sixth targets) are selected from them according to the sorting result. Similarly, the second targets are sorted according to their information entropy, and a preset fourth number of targets (which may be referred to as seventh targets) are selected from them according to the sorting result. The third number and the fourth number may each be 3K, where K denotes the number of sample images to be mined, for example K = 10000. In practice, K may be 5% to 25% of the total number of unlabeled first sample images. The embodiments of the present invention do not limit the value of K or the relationship between the third number, the fourth number, and K.

It should be understood that those skilled in the art can set the number K of sample images to be mined, the third number, and the fourth number according to the actual situation, and that the third number and the fourth number may differ; the embodiments of the present invention do not limit this.

In a possible implementation, the selected sixth targets and seventh targets may be merged, with any duplicate targets removed, and the merged targets used as the fifth targets. In practice, approximately 6K fifth targets are obtained.
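The selection of the fifth targets can be sketched as below. The keys `id` and `probs` are hypothetical, Shannon entropy is assumed for the information entropy, and both rankings are taken in descending order, in line with the example of selecting targets with higher confidence and higher entropy.

```python
import math


def select_fifth_targets(second_targets, third_number, fourth_number):
    """Merge the top targets by confidence and by entropy, without duplicates.

    second_targets: list of dicts with the (hypothetical) keys 'id' and 'probs'.
    """
    def confidence(target):
        return max(target["probs"])

    def entropy(target):
        # Assumed Shannon entropy; zero-probability terms contribute nothing.
        return -sum(p * math.log(p) for p in target["probs"] if p > 0.0)

    # Sixth targets: highest category confidence.
    sixth = sorted(second_targets, key=confidence, reverse=True)[:third_number]
    # Seventh targets: highest information entropy.
    seventh = sorted(second_targets, key=entropy, reverse=True)[:fourth_number]

    fifth, seen = [], set()
    for target in sixth + seventh:
        if target["id"] not in seen:  # drop targets picked by both rankings
            seen.add(target["id"])
            fifth.append(target)
    return fifth
```

With the third and fourth numbers both set to 3K, the merged list contains at most 6K targets, and fewer when the two rankings overlap.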

The above processing may be referred to as bootstrapping. In this way, a certain number of likely positive samples and likely negative samples can be selected from the second targets at the same time for the subsequent feature correlation mining, which reduces the computation required for the feature correlation mining and improves processing efficiency.

In a possible implementation, the step of determining, for each category, the second number of sample images to be mined according to the categories of the third targets in the third sample images and the total number of sample images to be mined may include: determining the proportion of third targets of each category according to the categories of the third targets in the third sample images; determining the sampling weight of each category according to the proportion of third targets of each category; and determining, for each category, the second number of sample images to be mined according to the sampling weight of each category.

For example, the proportion f_c of third targets of each category can be determined according to the categories of the third targets in the labeled third sample images of the training set; from the proportion f_c, the sampling weight of each category can be calculated by the following formulas:

[Formula (1)]

[Formula (2)]

In formulas (1) and (2), r_c denotes the sampling value of category c; t is a hyperparameter whose value is, for example, 0.1; C denotes the number of categories; r_i denotes the sampling value of the i-th of the C categories.

Through the processing of formulas (1) and (2), the sampling weight of categories with a small proportion can be raised and the sampling weight of categories with a large proportion can be lowered, which alleviates the imbalance in the number of samples between categories and improves the training of the network.

In a possible implementation, the second number of sample images to be mined for each category can be determined according to the sampling weight of each category and the total number (K) of sample images to be mined. Feature correlation mining can then be performed according to the second number.
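One way to turn the sampling weights into per-category counts is sketched below. It assumes the weights are already normalized to sum to 1, and it uses largest-remainder rounding so that the counts add up to K; both the rounding rule and the function name are assumptions made for illustration, not details given in the embodiment.

```python
def per_category_counts(sampling_weights, total_k):
    """Split total_k images across categories in proportion to their weights.

    sampling_weights: dict mapping category -> normalized sampling weight.
    Rounding uses the largest-remainder rule (an assumption of this sketch).
    """
    raw = {c: w * total_k for c, w in sampling_weights.items()}
    counts = {c: int(v) for c, v in raw.items()}  # round down first
    leftover = total_k - sum(counts.values())
    # Hand the remaining images to the categories with the largest remainders.
    by_remainder = sorted(raw, key=lambda c: raw[c] - counts[c], reverse=True)
    for c in by_remainder[:leftover]:
        counts[c] += 1
    return counts
```

For instance, weights of 0.5, 0.25, and 0.25 with K = 9 give counts of 5, 2, and 2.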

For this purpose, the labeled third sample images in the training set may be input into the target detection network, and a hidden layer (such as a convolutional layer) of the target detection network outputs the feature information of the third sample images, for example feature vectors. In this way, the features of the third sample images are obtained for the subsequent feature correlation mining.

In a possible implementation, determining, from the fifth targets, the fourth target and the first sample image in which the fourth target is located according to the feature information of the third targets in the third sample images, the feature information of the fifth targets, and the second number of sample images to be mined for each category includes: according to the distances between the feature information of the third targets of a first category and the feature information of each fifth target, determining, for each fifth target, the third target of the first category with the smallest distance to that fifth target as an eighth target, the first category being any one of the categories of the third targets; and determining the target with the largest distance among the eighth targets as the fourth target.

For example, after the second number of sample images to be mined for each category is determined, a k-center approach may be used to mine the corresponding number of sample images from the sample images in which the fifth targets are located. For any one of the categories of the third targets (which may be referred to as the first category), the distance between the feature information of the third targets of the first category and the feature information of each fifth target may be calculated; the distance may be, for example, the Euclidean distance. For any fifth target, the third target of the first category with the smallest distance to that fifth target can be determined, so that the third target closest to each fifth target is obtained; these may be referred to as eighth targets.

In a possible implementation, the target with the largest distance may be selected by way of the eighth targets, that is, the fifth target whose distance to its nearest third target (its eighth target) is the largest, and determined as the fourth target obtained by the current round of feature correlation mining, as shown in the following formula:

$$u = \arg\max_{j:\, x_j \in X_U}\ \min_{l:\, x_l^c \in X_L^c} d\left(x_j, x_l^c\right) \qquad (3)$$

In formula (3), $u$ denotes the fourth target obtained by the feature correlation mining; $d(x_j, x_l^c)$ denotes the distance between the feature information $x_j$ of the j-th fifth target and the feature information $x_l^c$ of the l-th third target of the first category c; $X_U$ denotes the set of feature information of the fifth targets; $X_L^c$ denotes the set of feature information of the third targets of the first category c.

In a possible implementation, the first sample image in which the fourth target is located can be determined and added to the training set as a fourth sample image, which completes the current round of feature correlation mining.

In a possible implementation, the step of determining, from the fifth targets, the fourth target and the first sample image in which the fourth target is located according to the feature information of the third targets in the third sample images, the feature information of the fifth targets, and the second number of sample images to be mined for each category further includes: adding the determined fourth target to the third targets of the first category, and removing the determined fourth target from the unlabeled fifth targets.

That is, the fourth target obtained by the current round of feature correlation mining is treated as a labeled target and removed from the unlabeled targets. In this case, the feature information of the fourth target may be added to the set of feature information of the third targets of the first category c and removed from the set of feature information of the fifth targets. In the next round of feature correlation mining, the two updated sets can then be mined by formula (3), and the above process is repeated.

In a possible implementation, the feature correlation mining for the first category is complete when the number of fourth sample images of the first category reaches the second number of the first category, or when the second number has not been reached but the fifth targets are exhausted (that is, the set of feature information of the fifth targets is empty).
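The mining loop for a single category can be sketched as a greedy k-center selection. The sketch makes several simplifying assumptions that should be kept in mind: features are plain numeric tuples, the Euclidean distance is used, target ids are hypothetical, and the loop counts mined targets rather than distinct sample images.

```python
import math


def mine_fourth_targets(third_feats, fifth_feats, second_number):
    """Greedy k-center mining for one category.

    third_feats:   feature vectors of the labeled third targets of the category
    fifth_feats:   dict mapping a (hypothetical) target id to the feature
                   vector of an unlabeled fifth target
    second_number: how many targets to mine for this category
    Returns the ids of the mined fourth targets in selection order.
    """
    if not third_feats:
        raise ValueError("need at least one labeled feature vector")
    labeled = list(third_feats)  # copies, so the inputs are not mutated
    pool = dict(fifth_feats)
    mined = []
    # Stop when enough targets are mined or the fifth targets are exhausted.
    while len(mined) < second_number and pool:
        # Distance from every fifth target to its nearest labeled target.
        nearest = {
            tid: min(math.dist(feat, ref) for ref in labeled)
            for tid, feat in pool.items()
        }
        # Pick the fifth target that is farthest from the labeled set.
        u = max(nearest, key=nearest.get)
        mined.append(u)
        # Treat it as labeled from now on and remove it from the pool.
        labeled.append(pool.pop(u))
    return mined
```

The sketch recomputes all distances in every round for clarity; a practical implementation would typically cache each fifth target's current minimum distance and update it only against the newly added feature.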

In this way, feature correlation mining can be performed separately for each category, finally yielding a sufficient number of fourth sample images (usually K sample images), which further expands the number of sample images in the training set and alleviates the imbalance between positive and negative samples.

In a possible implementation, human annotation may be performed on the mined fourth sample images to obtain their annotation information. Because the fourth sample images may include both positive sample images (fourth sample images that contain a target) and negative sample images (fourth sample images that contain no target), the annotation information of a fourth sample image may include sample type information indicating whether the image is a positive or a negative sample image and, for a positive sample image, the detection box in which each target is located and the category of the target.

In a possible implementation, after the manual annotation is completed, the target detection network may be trained in step S15 according to the annotation information of the fourth sample images and the second sample images, the third sample images, and the fourth sample images in the training set.

Step S15 may include: determining, according to the categories of the targets in the positive sample images of the training set, a first number of images to be sampled from the positive sample images of each category, a positive sample image being a sample image that contains a target; sampling the positive sample images of each category according to the first number for that category to obtain a plurality of fifth sample images; sampling the negative sample images of the training set to obtain a plurality of sixth sample images, a negative sample image being a sample image that contains no target; and training the target detection network according to the fifth sample images and the sixth sample images.

For example, the target detection network can be trained by resampling. Resampling raises the sampling frequency of data that occur less often, which improves the performance of the network on such data and further alleviates the imbalance between positive and negative samples.

In a possible implementation, the positive sample images and the negative sample images in the training set (which includes the second, third, and fourth sample images) may be sampled separately so that, after sampling, the numbers of positive and negative sample images are the same or similar.

In a possible implementation, a total number of positive sample images to be sampled may be preset. According to the categories of the targets in the positive sample images of the training set, the first number of images to be sampled from the positive sample images of each category is determined.

Similar to the processing described above, the proportion of targets of each category can be determined according to the categories of the targets in the positive sample images; from this proportion, the sampling weight of each category can be calculated by the following formula:

[Formula (4)]

In formula (4), R_h denotes the sampling weight of the positive sample images of the h-th category; q_h denotes the proportion of targets of the h-th category; t₁ is a hyperparameter whose value is, for example, 0.1.

Through the processing of formula (4), the sampling weight of categories with a small proportion can be raised and the sampling weight of categories with a large proportion can be lowered, which alleviates the imbalance in the number of positive sample images between categories and improves the training of the network.

In a possible implementation, the first number of positive sample images for each category may be determined according to the sampling weight of the positive sample images of each category and the total number of positive sample images to be sampled.

In a possible implementation, for any category, the first number of positive sample images may be randomly sampled from the positive sample images of that category and used as fifth sample images. Sampling the positive sample images of each category in this way yields fifth sample images amounting to the total number to be sampled.

In a possible implementation, the negative sample images in the training set may be randomly sampled directly according to a preset total number to be sampled, yielding that number of sixth sample images. The total number of negative sample images to be sampled may be the same as or different from the total number of positive sample images to be sampled; the embodiments of the present invention do not limit this.
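The resampling step can be sketched as follows. Sampling with replacement is an assumption of this sketch (it lets a rare category be drawn more often than it has images); the embodiment only states that the images are sampled randomly. All names and the fixed seed are illustrative.

```python
import random


def resample(positives_by_category, negatives, first_numbers,
             negative_total, seed=0):
    """Draw fifth (positive) and sixth (negative) sample images.

    positives_by_category: dict category -> list of positive sample images
    negatives:             list of negative sample images
    first_numbers:         dict category -> number of positives to draw
    negative_total:        number of negatives to draw
    """
    rng = random.Random(seed)
    fifth = []
    for category, images in positives_by_category.items():
        n = first_numbers.get(category, 0)
        if images and n > 0:
            # With replacement (assumption), so n may exceed len(images).
            fifth.extend(rng.choices(images, k=n))
    # Negatives are sampled uniformly at random, without per-category weights.
    sixth = rng.choices(negatives, k=negative_total) if negatives else []
    return fifth, sixth
```

Choosing `negative_total` close to the sum of the first numbers keeps the positive and negative sample images roughly balanced, as described above.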

In a possible implementation, the target detection network may be trained according to the fifth sample images and the sixth sample images. That is, the fifth and sixth sample images are input into the target detection network to obtain their target detection results; the loss of the target detection network is determined according to the target detection results and the annotation information; the parameters of the target detection network are adjusted by backpropagating the loss; and after multiple iterations, the trained target detection network is obtained when a preset condition (such as network convergence) is satisfied.

In this way, the detection performance of the trained target detection network on long-tailed images can be improved significantly.

In a possible implementation, the step of pre-training the target detection network with the labeled third sample images before step S11 may also use the resampling training described above, which improves the pre-training of the target detection network.

In practical applications, the entire processing of steps S11 to S15 can be repeated to achieve continuous incremental training. That is, when unlabeled sample images are collected again, the target detection network obtained from the current training can be used as the initial target detection network and the currently expanded training set as the initial training set, and the process of pseudo-labeling, feature correlation mining, and resampling training is repeated, so that the performance of the target detection network keeps improving.

FIG. 2 shows a schematic diagram of the processing of a network training method according to an embodiment of the present invention. As shown in FIG. 2, the data source includes a large number of unlabeled first sample images 20. The first sample images 20 are input into the target detection network for prediction to obtain the target detection result 21 of each first sample image 20, which includes the image region (not shown), feature vector, and classification probability of each target in the first sample image.

As shown in FIG. 2, in this example the target detection network may include a CNN backbone network 211, a feature pyramid network (FPN) 212, and a fully connected network 213, which is, for example, a bbox head. After a first sample image 20 is input into the target detection network, it is processed by the CNN backbone network 211 and the FPN 212 to obtain a feature map 214 of the first sample image, and the feature map 214 is processed by the fully connected network 213 to obtain the target detection result 21.

In this example, the category confidence of each target may be determined from its classification probability. For first targets whose category confidence is greater than or equal to the first threshold (for example, 0.99), the first sample images containing these first targets are identified as second sample images 22, and the second sample images 22 are pseudo-labeled; that is, the image region of the first target and the category corresponding to the category confidence of the first target are used as the annotation information of the second sample image 22. The labeled second sample images 22 are added to the training set 25, thereby expanding the positive samples in the training set.
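A minimal sketch of this pseudo-labeling step is given below. It assumes that the category confidence is simply the largest classification probability, which this passage does not state explicitly; the `Detection` record and the function names are likewise hypothetical. Only the threshold value 0.99 comes from the text.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

FIRST_THRESHOLD = 0.99  # example value given in the text

Box = Tuple[float, float, float, float]  # image region as (x1, y1, x2, y2)


@dataclass
class Detection:
    image_id: str                  # first sample image the target came from
    box: Box                       # image region of the target
    class_probs: Dict[str, float]  # classification probability per category


def category_confidence(det: Detection) -> Tuple[str, float]:
    """Assumption: confidence is the largest classification probability."""
    label = max(det.class_probs, key=det.class_probs.get)
    return label, det.class_probs[label]


def split_by_confidence(
    dets: List[Detection], threshold: float = FIRST_THRESHOLD
) -> Tuple[Dict[str, List[Tuple[Box, str]]], List[Detection]]:
    """Return (pseudo-labels per image, low-confidence detections)."""
    pseudo_labels: Dict[str, List[Tuple[Box, str]]] = {}
    low_confidence: List[Detection] = []
    for det in dets:
        label, conf = category_confidence(det)
        if conf >= threshold:
            # First target: its region and category become the annotation.
            pseudo_labels.setdefault(det.image_id, []).append((det.box, label))
        else:
            # Second target: left for feature correlation mining.
            low_confidence.append(det)
    return pseudo_labels, low_confidence
```

The keys of the returned dictionary correspond to the second sample images; the low-confidence list feeds the mining step described next.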

In this example, for second targets whose category confidence is less than the first threshold, a certain number of fifth targets are selected by bootstrapping, yielding the sample images 23 containing the fifth targets. Based on the feature vectors (not shown) of the third targets in the labeled third sample images in the training set, feature correlation mining is performed on the fifth targets to determine the fourth targets and the first sample images containing them, which serve as fourth sample images 24. The fourth sample images 24 are manually labeled and added to the training set 25, thereby further expanding the labeled images in the training set.
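One simplified way to picture feature correlation mining is a nearest-neighbor search in feature space: candidate targets whose feature vectors lie closest to those of already-labeled targets are selected, and their images are sent for manual labeling. The sketch below is a simplification under stated assumptions, not the exact procedure of the disclosure: Euclidean distance, a single category, a fixed per-category quota, and the name `mine_images` are all choices made for illustration.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mine_images(
    labeled_feats: List[Vector],
    candidates: List[Tuple[str, Vector]],
    quota: int,
) -> List[str]:
    """Return up to `quota` image ids whose candidates are nearest to labeled features.

    labeled_feats: feature vectors of labeled targets of one category.
    candidates:    (image_id, feature vector) pairs for low-confidence targets.
    """
    if not labeled_feats:
        return []
    # Score each candidate by its distance to the closest labeled feature.
    scored = sorted(
        (min(euclidean(feat, ref) for ref in labeled_feats), image_id)
        for image_id, feat in candidates
    )
    selected: List[str] = []
    for _, image_id in scored:
        if len(selected) >= quota:      # quota of images to mine is reached
            break
        if image_id not in selected:    # count each image only once
            selected.append(image_id)
    return selected
```

Mining stops either when the quota is reached or when the candidate set is exhausted, whichever comes first.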

In this example, after the two expansions, the training set 25 includes the labeled second, third, and fourth sample images. The training set 25 is resampled to balance the numbers of positive and negative samples, as well as the numbers of positive samples of different categories, yielding a resampled training set 26; the target detection network is then trained on the resampled training set 26, completing the entire process.
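The resampling step can be sketched as follows. The sketch assumes one category per image, draws the same number of images from every positive category with replacement, and then draws as many negatives as there are positives; the function name `resample`, the `per_class` parameter, and the equal-count policy are assumptions for illustration, since this passage only states that the counts are balanced.

```python
import random
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# (image_id, category); category None marks a negative (background) image.
Sample = Tuple[str, Optional[str]]


def resample(samples: List[Sample], per_class: int, seed: int = 0) -> List[Sample]:
    """Balance categories among positives, then positives against negatives."""
    rng = random.Random(seed)
    by_class: Dict[str, List[Sample]] = defaultdict(list)
    negatives: List[Sample] = []
    for sample in samples:
        if sample[1] is None:
            negatives.append(sample)
        else:
            by_class[sample[1]].append(sample)

    positives: List[Sample] = []
    for cls in sorted(by_class):
        # Sampling with replacement lets rare categories reach the same count.
        positives.extend(rng.choices(by_class[cls], k=per_class))

    # Draw as many negatives as positives so both groups have equal size.
    drawn_neg = rng.choices(negatives, k=len(positives)) if negatives else []
    out = positives + drawn_neg
    rng.shuffle(out)
    return out
```

Sampling with replacement repeats images from rare categories, which is a common but not the only way to equalize category counts.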

According to an embodiment of the present invention, a target detection method is also provided. The method includes: inputting an image to be processed into a target detection network for processing to obtain a target detection result of the image to be processed, where the target detection result includes the position and category of a target in the image to be processed, and the target detection network is trained according to the network training method described above.

That is, the target detection network trained by the above method can be deployed to perform target detection on images to be processed. The image to be processed may be, for example, an image captured by an image acquisition device (such as a camera), and may contain targets to be detected, for example a human body, a face, a vehicle, or an object. The embodiments of the present invention place no limitation on this.

In a possible implementation, the image to be processed may be input into the target detection network for processing to obtain the target detection result of the image to be processed. The target detection result includes the position and category of a target in the image to be processed, for example the detection box containing a face in the image and the identity corresponding to that face.

In this way, the accuracy of target detection can be improved, enabling target detection on large-scale long-tail image data.

According to the network training method of the embodiments of the present invention, an active-learning mining method is used to mine potential unlabeled data, and a semi-supervised learning method is used to assist in labeling the unlabeled data, expanding the amount of positive sample data. This addresses the large data scale and the difficulty of collecting positive samples in large-scale long-tail detection and, to a certain extent, alleviates the imbalance between positive and negative samples. Model performance is effectively improved under limited annotation and computing resources.

According to the network training method of the embodiments of the present invention, training the target detection network by resampling can address the negative impact of positive/negative sample imbalance on network training and alleviate the negative impact of imbalance between different categories of positive samples, so that the target detection network converges effectively during training and network performance improves.

According to the network training method of the embodiments of the present invention, the active learning method makes it possible to mine, from massive amounts of unlabeled data, potentially high-value samples that help improve the model, effectively improving model performance under limited annotation and computing resources and greatly reducing the labor and computing costs of applying deep learning models to new services. The resampling method makes it possible to train the target detection network effectively under sample imbalance without extensive manual parameter tuning, saving the labor cost of applying deep learning models to new services.

The network training method according to the embodiments of the present invention can be applied to fields such as intelligent video analysis and security. With limited labor and computing resources, the method can be used online to detect potential targets in intelligent video analysis or intelligent surveillance and to rapidly iterate on and improve the deployed detection network, quickly reaching the performance required by the service at low labor and computing cost, and continuing to improve network performance afterwards.

The network training method of the embodiments of the present invention can be applied to online intelligent video analysis or intelligent surveillance, so that, with limited labor and computing resources, potential target detection applications in intelligent video analysis or intelligent surveillance can be rapidly iterated on and improved online, quickly reaching the performance required by the service at low labor and computing cost, and continuing to improve model performance afterwards.

It can be understood that the method embodiments mentioned above may be combined with one another to form combined embodiments without departing from their principles and logic; for reasons of space, these are not described further here. Those skilled in the art will understand that, in the above methods of the specific embodiments, the specific execution order of the steps should be determined by their functions and possible internal logic.

In addition, the embodiments of the present invention also provide a network training apparatus, a target detection apparatus, an electronic device, a computer-readable storage medium, and a program, all of which can be used to implement any of the network training methods or target detection methods provided by the embodiments of the present invention. For the corresponding technical solutions and descriptions, refer to the corresponding passages in the method section; they are not repeated here.

FIG. 3 shows a block diagram of a network training apparatus according to an embodiment of the present invention. The apparatus includes a processor (not shown in FIG. 3) for executing program parts stored in a memory (not shown in FIG. 3). As shown in FIG. 3, the program parts stored in the memory include: a target detection part 31, configured to input unlabeled first sample images into a target detection network for processing to obtain target detection results of the first sample images, the target detection results including the image region, feature information, and classification probability of each target in the first sample images; a confidence determination part 32, configured to determine the category confidence of each target according to its classification probability; a labeling part 33, configured to, for first targets whose category confidence is greater than or equal to a first threshold, take the first sample images containing the first targets as labeled second sample images and add them to a training set, where the annotation information of a second sample image includes the image region of the first target and the category corresponding to the category confidence of the first target, and the training set includes labeled third sample images; a feature mining part 34, configured to, for second targets whose category confidence is less than the first threshold, perform feature correlation mining on the second targets according to the feature information of third targets in the third sample images, determine through the feature correlation mining fourth targets among the second targets and the first sample images containing the fourth targets, take those first sample images as fourth sample images, and add them to the training set; and a training part 35, configured to train the target detection network according to the annotation information of the fourth sample images and the second, third, and fourth sample images in the training set.

In a possible implementation, the apparatus further includes a pre-training part, configured to pre-train the target detection network with the labeled third sample images.

In a possible implementation, the sampling quantity determination sub-part is further configured to: before determining, according to the categories of the targets in the positive sample images of the training set, the first quantity to be sampled from the positive sample images of each category, sample the positive sample images and negative sample images in the training set to obtain equal or similar numbers of positive and negative sample images.

In a possible implementation, the target selection sub-part is further configured to: remove from the sixth targets those that are the same as the seventh targets, obtaining the remaining targets among the sixth targets that differ from the seventh targets; and take the remaining targets and the seventh targets as the fifth targets.

In a possible implementation, the method further includes: after determining, according to the distances between the feature information of the third targets of a first category and the feature information of each fifth target, the third target of the first category that has the smallest distance to each fifth target as an eighth target, ending the determination of eighth targets when the number of first sample images containing fourth targets reaches the second quantity of sample images to be mined for the first category.

In a possible implementation, the target and image determination sub-part is further configured to: after determining, according to the distances between the feature information of the third targets of the first category and the feature information of each fifth target, the third target of the first category that has the smallest distance to each fifth target as an eighth target, end the determination of eighth targets when the number of first sample images containing fourth targets has not reached the second quantity of sample images to be mined for the first category and the set storing the feature information of the fifth targets is empty.

In a possible implementation, the feature extraction part is further configured to: input the third sample image into the target detection network to obtain the feature vector output by a hidden layer of the target detection network; and determine the feature vector as the feature information of the third target.

According to an aspect of the present invention, a target detection apparatus is provided. The apparatus includes a detection processing part, configured to input an image to be processed into a target detection network for processing to obtain a target detection result of the image to be processed, where the target detection result includes the position and category of a target in the image to be processed, and the target detection network is trained according to the network training method described above.

In some embodiments, the functions or parts of the apparatus provided by the embodiments of the present invention may be configured to execute the methods described in the method embodiments above. For their specific implementation, refer to the description of the method embodiments above; for brevity, the details are not repeated here.

In some embodiments, a "part" may also be part of a circuit, part of a processor, part of a program or software, and so on; it may of course also be a unit, and may be modular or non-modular.

An embodiment of the present invention further provides a computer-readable storage medium storing computer program instructions that, when executed by a processor, implement the above method. The computer-readable storage medium may be a non-volatile computer-readable storage medium.

An embodiment of the present invention further provides an electronic device, including: a processor; and a memory configured to store instructions executable by the processor; where the processor is configured to call the instructions stored in the memory to execute the above method.

An embodiment of the present invention also provides a computer program product including computer-readable code. When the computer-readable code runs on a device, a processor in the device executes instructions configured to implement the network training method or target detection method provided by any of the above embodiments.

An embodiment of the present invention also provides another computer program product, configured to store computer-readable instructions that, when executed, cause a computer to perform the operations of the network training method or target detection method provided by any of the above embodiments.

The electronic device may be provided as a terminal, a server, or a device in another form.

FIG. 4 shows a block diagram of an electronic device 800 according to an embodiment of the present invention. For example, the electronic device 800 may be a terminal such as a mobile phone, a computer, a digital broadcasting terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, or a personal digital assistant.

Referring to FIG. 4, the electronic device 800 may include one or more of the following components: a processing component 802, a memory 804, a power component 806, a multimedia component 808, an audio component 810, an input/output (I/O) interface 812, a sensor component 814, and a communication component 816.

The processing component 802 generally controls the overall operation of the electronic device 800, such as operations associated with display, telephone calls, data communication, camera operation, and recording. The processing component 802 may include one or more processors 820 to execute instructions to complete all or some of the steps of the above method. In addition, the processing component 802 may include one or more modules that facilitate interaction between the processing component 802 and other components. For example, the processing component 802 may include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.

The memory 804 is configured to store various types of data to support operation of the electronic device 800. Examples of such data include instructions for any application or method operating on the electronic device 800, contact data, phonebook data, messages, pictures, videos, and the like. The memory 804 may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disc.

The power component 806 provides power to the various components of the electronic device 800. The power component 806 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the electronic device 800.

The multimedia component 808 includes a screen that provides an output interface between the electronic device 800 and the user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensors may sense not only the boundary of a touch or swipe action but also the duration and pressure associated with the touch or swipe operation. In some embodiments, the multimedia component 808 includes a front camera and/or a rear camera. When the electronic device 800 is in an operating mode, such as a shooting mode or a video mode, the front camera and/or the rear camera can receive external multimedia data. Each of the front and rear cameras may be a fixed optical lens system or have focal length and optical zoom capability.

The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a microphone (MIC) that is configured to receive external audio signals when the electronic device 800 is in an operating mode, such as a call mode, a recording mode, or a voice recognition mode. The received audio signals may be further stored in the memory 804 or sent via the communication component 816. In some embodiments, the audio component 810 also includes a speaker for outputting audio signals.

The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be a keyboard, a click wheel, buttons, and the like. These buttons may include, but are not limited to, a home button, volume buttons, a start button, and a lock button.

The sensor component 814 includes one or more sensors for providing status assessments of various aspects of the electronic device 800. For example, the sensor component 814 can detect the open/closed state of the electronic device 800 and the relative positioning of components, such as the display and keypad of the electronic device 800. The sensor component 814 can also detect a change in position of the electronic device 800 or of one of its components, the presence or absence of user contact with the electronic device 800, the orientation or acceleration/deceleration of the electronic device 800, and changes in its temperature. The sensor component 814 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor component 814 may also include a light sensor, such as a complementary metal oxide semiconductor (CMOS) or charge-coupled device (CCD) image sensor, for use in imaging applications. In some embodiments, the sensor component 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.

The communication component 816 is configured to facilitate wired or wireless communication between the electronic device 800 and other devices. The electronic device 800 can access a wireless network based on a communication standard, such as Wi-Fi, second-generation mobile communication technology (2G), third-generation mobile communication technology (3G), or a combination thereof. In an exemplary embodiment, the communication component 816 receives broadcast signals or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 816 also includes a near field communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.

In an exemplary embodiment, the electronic device 800 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components, for executing the above method.

In an exemplary embodiment, a non-volatile computer-readable storage medium is also provided, such as the memory 804 including computer program instructions, which can be executed by the processor 820 of the electronic device 800 to complete the above method.

FIG. 5 shows a block diagram of an electronic device 1900 according to an embodiment of the present invention. For example, the electronic device 1900 may be provided as a server. Referring to FIG. 5, the electronic device 1900 includes a processing component 1922, which further includes one or more processors, and memory resources represented by a memory 1932, configured to store instructions executable by the processing component 1922, such as application programs. The application programs stored in the memory 1932 may include one or more modules, each corresponding to a set of instructions. In addition, the processing component 1922 is configured to execute the instructions to perform the above method.

The electronic device 1900 may also include a power component 1926 configured to perform power management of the electronic device 1900, a wired or wireless network interface 1950 configured to connect the electronic device 1900 to a network, and an input/output (I/O) interface 1958. The electronic device 1900 may operate based on an operating system stored in the memory 1932, such as Microsoft's server operating system (Windows Server™), Apple's operating system based on a graphical user interface (Mac OS X™), the multi-user, multi-process computer operating system (Unix™), the free and open-source Unix-like operating system (Linux™), the open-source Unix-like operating system (FreeBSD™), or the like.

In an exemplary embodiment, a non-volatile computer-readable storage medium is also provided, such as the memory 1932 including computer program instructions, which can be executed by the processing component 1922 of the electronic device 1900 to complete the above method.

Embodiments of the present invention may be a system, a method, and/or a computer program product. The computer program product may include a computer-readable storage medium carrying computer-readable program instructions for causing a processor to implement various aspects of the embodiments of the present invention.

The computer-readable storage medium may be a tangible device that can hold and store instructions for use by an instruction execution device. The computer-readable storage medium may be, for example, but is not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of computer-readable storage media include: a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static random access memory (SRAM), portable compact disc read-only memory (CD-ROM), a digital versatile disc (DVD), a memory stick, a floppy disk, a mechanically encoded device such as a punched card or a raised structure in a groove with instructions stored thereon, and any suitable combination of the foregoing. A computer-readable storage medium as used here is not to be construed as a transient signal per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through waveguides or other transmission media (for example, light pulses through fiber-optic cables), or electrical signals transmitted through wires.

The computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to respective computing/processing devices, or downloaded to an external computer or external storage device over a network, such as the Internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, optical fiber transmission, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers. A network adapter card or network interface in each computing/processing device receives the computer-readable program instructions from the network and forwards them for storage in a computer-readable storage medium within the respective computing/processing device.

The computer program instructions for carrying out the operations of the present invention may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk and C++, and conventional procedural programming languages such as the "C" language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, an electronic circuit, such as a programmable logic circuit, a field-programmable gate array (FPGA), or a programmable logic array (PLA), is personalized by using state information of the computer-readable program instructions, and the electronic circuit may execute the computer-readable program instructions to implement various aspects of the present invention.

Aspects of the present invention are described herein with reference to flowcharts and/or block diagrams of methods, apparatuses (systems), and computer program products according to embodiments of the present invention. It should be understood that each block of the flowcharts and/or block diagrams, and combinations of blocks in the flowcharts and/or block diagrams, can be implemented by computer-readable program instructions.

These computer-readable program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, or another programmable data processing apparatus to produce a machine, such that the instructions, when executed by the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams. These computer-readable program instructions may also be stored in a computer-readable storage medium, where they cause a computer, a programmable data processing apparatus, and/or other devices to operate in a particular manner, so that the computer-readable medium storing the instructions comprises an article of manufacture including instructions that implement various aspects of the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams.

The computer-readable program instructions may also be loaded onto a computer, another programmable data processing apparatus, or another device, causing a series of operational steps to be performed on the computer, other programmable data processing apparatus, or other device to produce a computer-implemented process, such that the instructions executed on the computer, other programmable data processing apparatus, or other device implement the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams.

The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present invention. In this regard, each block in the flowcharts or block diagrams may represent a module, a program segment, or a portion of instructions, which contains one or more executable instructions for implementing the specified logical function. In some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the figures. For example, two consecutive blocks may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functionality involved. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or acts, or by a combination of dedicated hardware and computer instructions.

The computer program product may be implemented in hardware, software, or a combination thereof. In one optional embodiment, the computer program product is embodied as a computer storage medium; in another optional embodiment, the computer program product is embodied as a software product, such as a software development kit (SDK).

Various embodiments of the present invention have been described above. The foregoing description is exemplary, not exhaustive, and is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, their practical application, or their improvement over technologies in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Industrial Applicability

Embodiments of the present invention relate to a network training method, a target detection method, an electronic device, and a computer-readable storage medium. The network training method includes: inputting an unlabeled sample image into a target detection network for processing to obtain a target detection result, the result including the image region, feature information, and classification probability of a target; determining the category confidence of the target according to its classification probability; for a first target whose category confidence is greater than or equal to a first threshold, taking the sample image where the first target is located as a labeled image and adding it to a training set; for a second target whose category confidence is less than the first threshold, performing feature correlation mining on the second target, determining a fourth target from the second targets, and adding the sample image where the fourth target is located to the training set; and training the target detection network according to the sample images in the training set. Embodiments of the present invention can improve the training effect of the target detection network.

20: First sample image
21: Target detection result
211: CNN backbone network
212: Feature map pyramid network
213: Fully connected network
214: Feature map
22: Second sample image
23: Sample image where the fifth target is located
24: Fourth sample image
25: Training set
26: Resampled training set
31: Target detection part
32: Confidence determination part
33: Labeling part
34: Feature mining part
35: Training part
800: Electronic device
802: Processing component
804: Memory
806: Power component
808: Multimedia component
810: Audio component
812: Input/output interface
814: Sensor component
816: Communication component
820: Processor
1900: Electronic device
1922: Processing component
1926: Power component
1932: Memory
1950: Network interface
1958: Input/output interface
S11~S15: Steps

The accompanying drawings are incorporated into and constitute a part of this specification. They illustrate embodiments consistent with the present invention and, together with the description, serve to explain the technical solutions of the embodiments of the present invention. FIG. 1 shows a flowchart of a network training method according to an embodiment of the present invention. FIG. 2 shows a schematic diagram of the processing procedure of a network training method according to an embodiment of the present invention. FIG. 3 shows a block diagram of a network training apparatus according to an embodiment of the present invention. FIG. 4 shows a block diagram of an electronic device according to an embodiment of the present invention. FIG. 5 shows a block diagram of an electronic device according to an embodiment of the present invention.


Claims

A network training method, comprising: inputting an unlabeled first sample image into a target detection network for processing to obtain a target detection result of the first sample image, the target detection result including an image region, feature information, and a classification probability of a target in the first sample image; determining a category confidence of the target according to the classification probability of the target; for a first target, among the targets, whose category confidence is greater than or equal to a first threshold, taking the first sample image where the first target is located as a labeled second sample image and adding it to a training set, wherein the labeling information of the second sample image includes the image region of the first target and the category corresponding to the category confidence of the first target, and the training set includes a labeled third sample image; for a second target, among the targets, whose category confidence is less than the first threshold, performing feature correlation mining on the second target according to feature information of a third target in the third sample image, determining, from the second targets, a fourth target and the first sample image where the fourth target is located, and adding the first sample image where the fourth target is located to the training set as a fourth sample image; and training the target detection network according to the labeling information of the fourth sample image, and the second sample image, the third sample image, and the fourth sample image in the training set.
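
As an illustration of the confidence split recited in this claim, the following minimal Python sketch partitions detections around a first threshold. The `Detection` record, the use of the largest classification probability as the category confidence, and any particular threshold value are assumptions made for illustration; the claim does not specify them.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Detection:
    """Hypothetical record for one detected target (not defined in the source)."""
    image_id: str
    box: Tuple[float, float, float, float]  # image region as (x1, y1, x2, y2)
    features: Sequence[float]               # feature information
    class_probs: Sequence[float]            # classification probability per category


def category_confidence(det: Detection) -> Tuple[int, float]:
    """Assumption: the confidence is the largest class probability."""
    best = max(range(len(det.class_probs)), key=lambda i: det.class_probs[i])
    return best, det.class_probs[best]


def split_by_confidence(dets: List[Detection], first_threshold: float):
    """Return (pseudo_labels, second_targets).

    pseudo_labels holds (image_id, box, category) for targets at or above the
    threshold; second_targets holds the remaining detections for later mining.
    """
    pseudo_labels, second_targets = [], []
    for det in dets:
        category, conf = category_confidence(det)
        if conf >= first_threshold:
            pseudo_labels.append((det.image_id, det.box, category))
        else:
            second_targets.append(det)
    return pseudo_labels, second_targets
```

In this sketch the low-confidence detections are returned unchanged so that a later mining step can still use their feature information and classification probabilities.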

The method according to claim 1, wherein training the target detection network according to the labeling information of the fourth sample image, and the second sample image, the third sample image, and the fourth sample image in the training set comprises: determining, according to the categories of the targets in the positive sample images of the training set, a first number of samples to be drawn from the positive sample images of each category, a positive sample image being a sample image that includes a target; sampling the positive sample images of each category according to the first number for that category to obtain a plurality of fifth sample images; sampling the negative sample images of the training set to obtain a plurality of sixth sample images, a negative sample image being a sample image that does not include a target; and training the target detection network according to the fifth sample images and the sixth sample images.
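
The resampling in this claim can be sketched as follows. This is only one possible reading: it assumes each positive image carries a single category, that the first number is the same for every category, and that rare categories are sampled with replacement. None of these choices is stated in the claim, and the function and parameter names are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple


def resample_training_set(
    positives: List[Tuple[str, int]],  # (image_id, category) for images containing a target
    negatives: List[str],              # image_ids of images without any target
    per_category: int,                 # assumed "first number", equal for all categories
    num_negatives: int,
    seed: int = 0,
) -> Tuple[List[str], List[str]]:
    """Return (fifth_sample_images, sixth_sample_images) as lists of image ids."""
    rng = random.Random(seed)
    by_category: Dict[int, List[str]] = defaultdict(list)
    for image_id, category in positives:
        by_category[category].append(image_id)

    fifth: List[str] = []
    for category in sorted(by_category):
        # Sample with replacement so rare categories reach the same count.
        fifth.extend(rng.choices(by_category[category], k=per_category))

    # Negatives are drawn without replacement, capped at what is available.
    sixth = rng.sample(negatives, k=min(num_negatives, len(negatives)))
    return fifth, sixth
```

Drawing the same count from every category is one simple way to counter a long-tailed category distribution; other per-category counts would fit the claim wording equally well.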

The method according to claim 1 or 2, wherein performing feature correlation mining on the second target according to the feature information of the third target in the third sample image, and determining, from the second targets, the fourth target and the first sample image where the fourth target is located, comprises: determining the information entropy of the second target according to the classification probability of the second target; selecting fifth targets from the second targets according to the category confidence and the information entropy of the second targets; determining, according to the categories of the third targets in the third sample images and a total number of sample images to be mined, a second number of sample images to be mined for each category; and determining, from the fifth targets, the fourth target and the first sample image where the fourth target is located according to the feature information of the third targets in the third sample images, the feature information of the fifth targets, and the second number of sample images to be mined for each category.
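
The information entropy of a target can be computed from its classification probability vector. The sketch below assumes the standard Shannon form, H = -Σ p·ln(p), which the claim does not spell out; the function name is hypothetical.

```python
import math
from typing import Sequence


def information_entropy(class_probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a classification probability vector.

    Zero-probability entries are skipped, following the convention 0 * ln(0) = 0.
    """
    return -sum(p * math.log(p) for p in class_probs if p > 0.0)
```

Under this definition a near-uniform probability vector yields high entropy (an uncertain prediction), while a vector concentrated on one category yields entropy close to zero.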

The method according to claim 3, wherein selecting the fifth targets from the second targets according to the category confidence and the information entropy of the second targets comprises: sorting the second targets by category confidence and by information entropy respectively, and selecting a third number of sixth targets and a fourth number of seventh targets; and combining the sixth targets and the seventh targets to obtain the fifth targets.

The method according to claim 3, wherein determining, according to the categories of the third targets in the third sample images and the total number of sample images to be mined, the second number of sample images to be mined for each category comprises: determining the proportion of third targets of each category according to the categories of the third targets in the third sample images; determining a sampling proportion for each category according to the proportion of third targets of that category; and determining the second number of sample images to be mined for each category according to the sampling proportion of that category.
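
A minimal sketch of the three steps in this claim follows. The claim does not say how the sampling proportion is derived from the category proportion; the sketch assumes an inverse-proportional mapping (rarer categories receive a larger share) and uses largest-remainder rounding so the quotas sum to the requested total. Both choices, and all names, are illustrative assumptions.

```python
from collections import Counter
from fractions import Fraction
from typing import Dict, List


def mining_quota(labeled_categories: List[int], total_to_mine: int) -> Dict[int, int]:
    """Return the assumed per-category "second number" of images to mine."""
    counts = Counter(labeled_categories)
    n = sum(counts.values())

    # Step 1: proportion of labeled (third) targets in each category.
    proportion = {c: Fraction(k, n) for c, k in counts.items()}

    # Step 2 (assumption): sampling proportion is inversely related to it.
    inverse = {c: 1 / p for c, p in proportion.items()}
    norm = sum(inverse.values())
    sampling = {c: v / norm for c, v in inverse.items()}

    # Step 3: convert to integer quotas with largest-remainder rounding.
    exact = {c: s * total_to_mine for c, s in sampling.items()}
    quota = {c: int(x) for c, x in exact.items()}
    leftover = total_to_mine - sum(quota.values())
    by_remainder = sorted(exact, key=lambda c: exact[c] - quota[c], reverse=True)
    for c in by_remainder[:leftover]:
        quota[c] += 1
    return quota
```

For example, with three labeled targets of category 0 and one of category 1, `mining_quota([0, 0, 0, 1], 8)` assigns 2 images to category 0 and 6 to category 1 under the inverse-proportion assumption.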

The method according to claim 3, wherein determining, from the fifth targets, the fourth target and the first sample image where the fourth target is located according to the feature information of the third targets in the third sample images, the feature information of the fifth targets, and the second number of sample images to be mined for each category comprises: determining, according to the distances between the feature information of the third targets of a first category and the feature information of each fifth target, the third target of the first category that has the smallest distance to each fifth target, as an eighth target, the first category being any one of the categories of the third targets; and determining the target with the largest distance among the eighth targets as the fourth target.

The method according to claim 6, wherein determining, from the fifth targets, the fourth target and the first sample image where the fourth target is located according to the feature information of the third targets in the third sample images, the feature information of the fifth targets, and the second number of sample images to be mined for each category further comprises: adding the determined fourth target to the third targets of the first category, and removing the determined fourth target from the unlabeled fifth targets.
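
The nearest-then-farthest selection described in the two claims above resembles a greedy farthest-first (k-center style) selection, and can be sketched for a single category as follows. This is one reading of the claim language rather than the authors' implementation. It assumes Euclidean distance, a non-empty labeled set, and a quota counted in targets rather than images; all names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def mine_for_category(
    labeled: List[Sequence[float]],          # features of third targets of one category
    candidates: Dict[str, Sequence[float]],  # candidate id -> features of a fifth target
    quota: int,                              # assumed second number for this category
) -> List[str]:
    """Return ids of selected (fourth) targets, in selection order."""
    labeled = list(labeled)  # local copies; the caller's inputs are not modified
    pool = dict(candidates)
    selected: List[str] = []
    # Stop when the quota is met or no candidates remain.
    while len(selected) < quota and pool:
        # Distance from each candidate to its nearest labeled target.
        nearest = {
            cid: min(math.dist(feat, ref) for ref in labeled)
            for cid, feat in pool.items()
        }
        # Pick the candidate whose nearest labeled target is farthest away.
        pick = max(nearest, key=nearest.get)
        selected.append(pick)
        # Treat the pick as labeled and remove it from the candidate pool.
        labeled.append(pool.pop(pick))
    return selected
```

Because each pick is added to the labeled set before the next iteration, later picks are pushed away from earlier ones, which tends to spread the mined samples across the feature space.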

The method according to claim 1 or 2, further comprising: inputting the third sample image into the target detection network for processing to obtain the feature information of the third target in the third sample image.

The method according to claim 1 or 2, wherein, before the step of inputting the unlabeled first sample image into the target detection network for processing to obtain the target detection result of the first sample image, the method further comprises: pre-training the target detection network with the labeled third sample image.

The method of claim 1 or 2, wherein the first sample image comprises a long-tail image.

The method according to claim 2, wherein, before the step of determining, according to the categories of the targets in the positive sample images of the training set, the first number of samples to be drawn from the positive sample images of each category, the method further comprises: sampling the positive sample images and the negative sample images in the training set to obtain the same or a similar number of positive sample images and negative sample images.

The method according to claim 3, wherein the total number of sample images to be mined is 5% to 25% of the total number of the first sample images.

The method according to claim 4, wherein combining the sixth targets and the seventh targets to obtain the fifth targets comprises: removing from the sixth targets any target that is the same as a seventh target, to obtain the remaining targets among the sixth targets that differ from the seventh targets; and taking the remaining targets and the seventh targets as the fifth targets.
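
The selection by confidence and entropy, followed by the de-duplicating merge described in this claim, can be sketched as below. The sort directions (highest confidence first, highest entropy first) are assumptions, since the claims only say the second targets are sorted; the function and parameter names are hypothetical.

```python
from typing import Dict, List


def select_fifth_targets(
    confidence: Dict[str, float],  # candidate id -> category confidence
    entropy: Dict[str, float],     # candidate id -> information entropy
    third_number: int,
    fourth_number: int,
) -> List[str]:
    """Return ids of the fifth targets: de-duplicated sixth targets plus seventh targets."""
    # Assumption: both rankings are in descending order.
    sixth = sorted(confidence, key=confidence.get, reverse=True)[:third_number]
    seventh = sorted(entropy, key=entropy.get, reverse=True)[:fourth_number]

    # Drop sixth targets that also appear among the seventh targets.
    seventh_set = set(seventh)
    remaining = [cid for cid in sixth if cid not in seventh_set]
    return remaining + seventh
```

The result contains each candidate at most once, so a target that ranks highly on both criteria is not mined twice.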

The method according to claim 6, wherein, after determining, according to the distances between the feature information of the third targets of the first category and the feature information of each fifth target, the third target of the first category that has the smallest distance to each fifth target as the eighth target, the method further comprises: ending the determination of the eighth target when the number of first sample images where the fourth targets are located reaches the second number of sample images to be mined for the first category.

The method according to claim 7, wherein, after determining, according to the distances between the feature information of the third targets of the first category and the feature information of each fifth target, the third target of the first category that has the smallest distance to each fifth target as the eighth target, the method further comprises: ending the determination of the eighth target when the number of first sample images where the fourth targets are located has not reached the second number of sample images to be mined for the first category and the set storing the feature information of the fifth targets is empty.

The method according to claim 8, wherein inputting the third sample image into the target detection network for processing to obtain the feature information of the third target in the third sample image comprises: inputting the third sample image into the target detection network to obtain a feature vector output by a hidden layer of the target detection network; and determining the feature vector as the feature information of the third target.

A target detection method, comprising: inputting an image to be processed into a target detection network for processing to obtain a target detection result of the image to be processed, the target detection result including the position and category of a target in the image to be processed; wherein the target detection network is trained according to the network training method of any one of claims 1 to 10.

An electronic device, comprising: a processor; and a memory configured to store processor-executable instructions; wherein the processor is configured to call the instructions stored in the memory to execute the network training method of any one of claims 1 to 16, or to execute the target detection method of claim 17.

A computer-readable storage medium on which computer program instructions are stored, wherein the computer program instructions, when executed by a processor, implement the network training method of any one of claims 1 to 16, or implement the target detection method of claim 17.