TW202205127A - Target detection method, electronic equipment and computer-readable storage medium

Info

Publication number: TW202205127A
Application number: TW110120819A
Authority: TW
Inventors: 劉李洋; 王波超; 曠章輝; 陳益民; 張偉
Original assignee: 大陸商深圳市商湯科技有限公司
Priority date: 2020-07-30
Filing date: 2021-06-08
Publication date: 2022-02-01
Also published as: TWI785638B; CN111898676B; CN111898676A; WO2022021901A1

Abstract

The present disclosure relates to a target detection method, an electronic device and a computer-readable storage medium. The method includes: constructing a detection network of a target category; and using the detection network of the target category to detect an image to be detected, to obtain a target detection result of the image to be detected; wherein the parameters of the detection network of the target category are obtained by inputting training images of the target category into a parameter generation network. The embodiments of the present disclosure facilitate the dynamic addition of new categories.

Description

Target detection method, electronic device and computer-readable storage medium

The present invention relates to the field of computer technology, and in particular to a target detection method, an electronic device and a computer-readable storage medium.

In the related art, target detection relies on large-scale training data, and collecting and labeling that data requires substantial manpower and material resources; the more object categories there are, the higher the labeling cost. In some specific scenarios, data collection is itself difficult, so only a small number of sample images are available. Moreover, in practical application scenarios the requirements often change dynamically: detection categories may need to be added on the fly, and the added categories may have only a small number of sample images.

The present invention provides a target detection method, an electronic device and a computer-readable storage medium.

According to an aspect of the present invention, a target detection method is provided, comprising: constructing a detection network of a target category; and using the detection network of the target category to detect an image to be detected, to obtain a target detection result of the image to be detected; wherein the parameters of the detection network of the target category are obtained by inputting training images of the target category into a parameter generation network.

In a possible implementation, the method further includes: obtaining one or more target training sets from an image set, wherein each target training set includes training images of K categories, each category includes M training images, and K is an integer greater than 0; and training the parameter generation network based on each target training set.

By training the parameter generation network with relatively few samples, the parameters of a detection network can be obtained conveniently, and a detection network for a category with few samples can then be constructed conveniently.
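The construction of a target training set (K categories, M images per category) can be sketched as follows. This is a minimal illustration, not the patented procedure: the function name `sample_target_training_set`, the dictionary layout of the image set, and the use of uniform random sampling are all assumptions made for the example.

```python
import random

def sample_target_training_set(image_set, k, m, seed=None):
    """Pick K categories and M images per category from {category: [image ids]}.

    Uniform random sampling is an assumption; the source does not say how
    target training sets are drawn from the image set.
    """
    rng = random.Random(seed)
    # Only categories holding at least M images can contribute.
    eligible = sorted(c for c, images in image_set.items() if len(images) >= m)
    if k <= 0 or len(eligible) < k:
        raise ValueError("need K > 0 and at least K categories with M images each")
    return {c: rng.sample(image_set[c], m) for c in rng.sample(eligible, k)}
```

Calling the function repeatedly with different seeds would yield the "one or more target training sets" on which the parameter generation network is trained.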

In a possible implementation, the M training images include N support images and O query images, where N and O are integers greater than 0; and training the parameter generation network based on each target training set includes, for each target training set: inputting each support image of the target training set into the parameter generation network to be trained, to obtain the parameters of a general detection network of the target training set, and constructing the general detection network of the target training set according to those parameters; inputting each query image of the target training set into a feature extraction network to be trained, to obtain a feature map of each query image of the target training set; inputting the feature map of each query image into the general detection network, to obtain a predicted label distribution result for each query image; determining a detection loss of the general detection network according to the predicted label distribution result and the ground-truth label of each query image; and training the parameter generation network to be trained according to the detection loss of the general detection network.

The detection loss of the general detection network allows training to converge quickly, so that training of the parameter generation network can be completed quickly.
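The detection loss compares each query image's predicted label distribution with its ground-truth label. The source does not name the loss function, so the sketch below assumes plain cross-entropy over class probabilities and ignores the box-regression terms a real detector would add. The function name `detection_loss` is hypothetical.

```python
import math

def detection_loss(predicted_distributions, true_labels):
    """Mean cross-entropy between predicted label distributions and true labels.

    Cross-entropy is an assumed choice; each distribution is a list of class
    probabilities and each label is the index of the true class.
    """
    total = 0.0
    for distribution, label in zip(predicted_distributions, true_labels):
        # Clamp to avoid log(0) when a class receives zero probability.
        total += -math.log(max(distribution[label], 1e-12))
    return total / len(true_labels)
```

A perfectly confident correct prediction contributes zero loss, and the loss grows as probability mass moves away from the true class, which is the signal used to update the parameter generation network.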

In a possible implementation, inputting each support image of the target training set into the parameter generation network to be trained, to obtain the parameters of the general detection network of the target training set, includes: inputting each support image of the target training set separately into the parameter generation network to be trained, to obtain the detection network parameters corresponding to each support image; determining the detection network parameters of each category of the target training set according to the detection network parameters corresponding to each support image and the true category of each support image; and determining the parameters of the general detection network of the target training set according to the detection network parameters of each category of the target training set.

Obtaining the parameters of the general detection network of the target training set from multiple support images can improve the accuracy of those parameters.
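The two aggregation steps (per-image parameters to per-category parameters, then per-category parameters to general detection network parameters) might look like the sketch below. Averaging within a category and stacking the category vectors into a K-row matrix are both assumptions; the source only says the parameters are "determined according to" the inputs. The function names are hypothetical.

```python
def class_parameters(per_image_params, labels):
    """Average the generated parameter vectors of support images that share a true category.

    Averaging is an assumed aggregation rule.
    """
    grouped = {}
    for params, label in zip(per_image_params, labels):
        grouped.setdefault(label, []).append(params)
    return {
        label: [sum(column) / len(vectors) for column in zip(*vectors)]
        for label, vectors in grouped.items()
    }

def general_parameters(per_class):
    """Stack the per-category vectors, ordered by category name, into one K-row matrix.

    Stacking is an assumed way of forming the general detection network parameters.
    """
    return [per_class[label] for label in sorted(per_class)]
```

With this layout, row i of the general parameter matrix acts as the detector for the i-th category of the target training set.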

In a possible implementation, the method further includes: training the feature extraction network to be trained according to the detection loss of the general detection network.

Training the feature extraction network with the detection loss of the general detection network can improve the feature discrimination ability of the feature extraction network.

In a possible implementation, training the feature extraction network to be trained according to the detection loss of the general detection network includes: obtaining a reference detection network of the target training set; inputting the feature map of each query image into the reference detection network, to obtain a reference label distribution result for each query image; determining a detection loss of the reference detection network according to the reference label distribution result and the ground-truth label of each query image; and training the feature extraction network to be trained according to the detection loss of the general detection network and the detection loss of the reference detection network.

Training the feature extraction network with both the detection loss of the general detection network and the detection loss of the reference detection network allows training with many samples to guide training with few samples, further improving the discriminative power of the features extracted by the feature extraction network.

In a possible implementation, obtaining the parameters of the reference detection network of the target training set includes: obtaining a randomly initialized detection network; training the randomly initialized detection network based on all query images of the target training set; and taking the parameters of the trained detection network as the parameters of the reference detection network of the target training set.

Obtaining the reference detection network makes it possible to guide the general detection network, so that the general detection network obtained by training with few samples is closer to the reference detection network obtained by training with many samples, reducing the loss caused by having few samples.

In a possible implementation, training the parameter generation network to be trained according to the detection loss of the general detection network includes: determining a gap loss of the general detection network according to the parameters of the general detection network of the target training set and the parameters of the reference detection network of the target training set; and training the parameters of the parameter generation network to be trained according to the detection loss and the gap loss of the general detection network.

In this way, jointly training the parameters of the parameter generation network to be trained according to both the detection loss and the gap loss of the general detection network can make the detection network obtained from the parameter generation network more accurate.
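The gap loss measures how far the generated (general) parameters are from the reference parameters. The source does not give its form; the sketch below assumes a mean squared difference and a weighted sum with the detection loss. The names `gap_loss`, `generator_loss` and `gap_weight` are hypothetical.

```python
def gap_loss(general_params, reference_params):
    """Mean squared difference between two equally shaped parameter matrices.

    The squared-error form is an assumption made for illustration.
    """
    squared = [
        (g - r) ** 2
        for general_row, reference_row in zip(general_params, reference_params)
        for g, r in zip(general_row, reference_row)
    ]
    return sum(squared) / len(squared)

def generator_loss(detection, gap, gap_weight=1.0):
    """Combine the detection loss and the gap loss; the weighting is assumed."""
    return detection + gap_weight * gap
```

Minimizing the combined loss pulls the few-sample parameters toward the many-sample reference while still rewarding correct detections on the query images.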

In a possible implementation, the method further includes: determining an orthogonalization loss of the general detection network; and training the parameter generation network to be trained according to the orthogonalization loss of the general detection network.

Making the detection networks of different categories orthogonal to one another can improve the discriminative ability of the model.
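One common way to encourage orthogonality is to penalize the cosine similarity between every pair of per-category parameter vectors. The source does not define the orthogonalization loss, so the formulation below (sum of squared pairwise cosine similarities) and the name `orthogonalization_loss` are assumptions.

```python
import math

def orthogonalization_loss(class_params):
    """Sum of squared cosine similarities over all distinct pairs of class vectors.

    This particular penalty is an assumed formulation; it is zero exactly when
    all class vectors are mutually orthogonal.
    """
    def unit(vector):
        norm = math.sqrt(sum(x * x for x in vector)) or 1.0  # guard zero vectors
        return [x / norm for x in vector]

    units = [unit(vector) for vector in class_params]
    loss = 0.0
    for i in range(len(units)):
        for j in range(i + 1, len(units)):
            cosine = sum(a * b for a, b in zip(units[i], units[j]))
            loss += cosine * cosine
    return loss
```

Because the vectors are normalized first, the penalty depends only on the angle between category detectors, not on their magnitudes.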

In a possible implementation, constructing the detection network of the target category includes: obtaining training images of the target category; inputting each training image of the target category separately into the parameter generation network, to obtain the detection network parameters corresponding to each training sample of the target category; determining the parameters of the detection network of the target category according to the detection network parameters corresponding to each training sample of the target category; and constructing the detection network of the target category according to the parameters of the detection network of the target category.

According to an aspect of the present invention, a target detection apparatus is provided, comprising: a construction module configured to construct a detection network of a target category; and a detection module configured to use the detection network of the target category to detect an image to be detected, to obtain a target detection result of the image to be detected; wherein the parameters of the detection network of the target category are obtained by inputting training images of the target category into a parameter generation network.

In a possible implementation, the apparatus further includes: an acquisition module configured to obtain one or more target training sets from an image set, wherein each target training set includes training images of K categories, each category includes M training images, and K is an integer greater than 0; and a first training module configured to train the parameter generation network based on each target training set.

In a possible implementation, the M training images include N support images and O query images, where N and O are integers greater than 0; and the first training module is further configured to, for each target training set: input each support image of the target training set into the parameter generation network to be trained, to obtain the parameters of a general detection network of the target training set, and construct the general detection network of the target training set according to those parameters; input each query image of the target training set into a feature extraction network to be trained, to obtain a feature map of each query image of the target training set; input the feature map of each query image into the general detection network, to obtain a predicted label distribution result for each query image; determine a detection loss of the general detection network according to the predicted label distribution result and the ground-truth label of each query image; and train the parameter generation network to be trained according to the detection loss of the general detection network.

In a possible implementation, the first training module is further configured to: input each support image of the target training set separately into the parameter generation network to be trained, to obtain the detection network parameters corresponding to each support image; determine the detection network parameters of each category of the target training set according to the detection network parameters corresponding to each support image and the true category of each support image; and determine the parameters of the general detection network of the target training set according to the detection network parameters of each category of the target training set.

In a possible implementation, the apparatus further includes: a second training module configured to train the feature extraction network to be trained according to the detection loss of the general detection network.

In a possible implementation, the second training module is further configured to: obtain a reference detection network of the target training set; input the feature map of each query image into the reference detection network, to obtain a reference label distribution result for each query image; determine a detection loss of the reference detection network according to the reference label distribution result and the ground-truth label of each query image; and train the feature extraction network to be trained according to the detection loss of the general detection network and the detection loss of the reference detection network.

In a possible implementation, the first training module is further configured to: determine a gap loss of the general detection network according to the parameters of the general detection network of the target training set and the parameters of the reference detection network of the target training set; and train the parameters of the parameter generation network to be trained according to the detection loss and the gap loss of the general detection network.

In a possible implementation, the apparatus further includes: a determination module configured to determine an orthogonalization loss of the general detection network; and a third training module configured to train the parameter generation network to be trained according to the orthogonalization loss of the general detection network.

In a possible implementation, the construction module is further configured to: obtain training images of the target category; input each training image of the target category separately into the parameter generation network, to obtain the detection network parameters corresponding to each training sample of the target category; determine the parameters of the detection network of the target category according to the detection network parameters corresponding to each training sample of the target category; and construct the detection network of the target category according to the parameters of the detection network of the target category.

According to an aspect of the present invention, an electronic device is provided, comprising: a processor; and a memory configured to store instructions executable by the processor; wherein the processor is configured to invoke the instructions stored in the memory to execute the above method.

According to an aspect of the present invention, a computer-readable storage medium is provided, having computer program instructions stored thereon, wherein the computer program instructions implement the above method when executed by a processor.

In the embodiments of the present invention, for a target category with a small number of training images, the parameters of the detection network of the target category can first be obtained through the parameter generation network, and the detection network of the target category can then be constructed from those parameters, thereby achieving target detection for the target category. This both reduces the cost of labeling training images and reduces the risk of overfitting that comes from training a detection network directly on a small number of training images. Furthermore, the embodiments of the present invention facilitate the dynamic addition of new categories.

It should be understood that the foregoing general description and the following detailed description are exemplary and explanatory only, and do not limit the present invention. Other features and aspects of the present invention will become apparent from the following detailed description of exemplary embodiments with reference to the accompanying drawings.

Various exemplary embodiments, features and aspects of the present invention are described in detail below with reference to the accompanying drawings. The same reference numerals in the drawings denote elements with the same or similar functions. Although various aspects of the embodiments are shown in the drawings, the drawings are not necessarily drawn to scale unless otherwise indicated.

The word "exemplary" as used herein means "serving as an example, embodiment, or illustration." Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred over or better than other embodiments.

The term "and/or" herein merely describes an association between associated objects and indicates that three relationships are possible; for example, "A and/or B" may mean A alone, both A and B, or B alone. In addition, the term "at least one" herein means any one of multiple items or any combination of at least two of them; for example, "including at least one of A, B and C" may mean including any one or more elements selected from the set consisting of A, B and C.

In addition, numerous specific details are given in the following detailed description in order to better illustrate the present invention. Those skilled in the art should understand that the present invention can be practiced without some of these specific details. In some instances, methods, means, elements and circuits well known to those skilled in the art are not described in detail, so as to highlight the gist of the present invention.

Target detection is a classic problem in computer vision: it determines whether an image contains objects of a certain category and, if so, gives the position of each object. Target detection is a cornerstone of image content understanding and the basis of many more complex visual understanding tasks, such as tracking and recognition, instance segmentation, scene classification and event detection. As the technology has developed, target detection has found wide application in everyday life, for example in face recognition, autonomous driving, security surveillance and interactive entertainment. Generally, for a detection network to learn a new category, a large number of images of that category are required. In practice, however, a large number of images of a new category may not be available. For example, after a photographer captures a rare bird (or a rare scene, a deep-sea animal, and so on), researchers may need to check whether that kind of bird appears anywhere in a large collection of bird images. Confirming this manually is very time-consuming and labor-intensive, so a detection network is needed to detect that kind of bird. Because such birds are few in number, and images confirmed to contain them are correspondingly few, a detection network that detects them accurately cannot be trained directly on the confirmed images. The target detection method provided by the embodiments of the present invention, by contrast, can construct a relatively accurate detection network for that kind of bird from a small number of images containing it. Target detection can then be performed on the large collection of bird images to determine whether that kind of bird appears.

FIG. 1 shows a flowchart of a target detection method according to an embodiment of the present invention. As shown in FIG. 1, the target detection method may include: step S11, constructing a detection network of a target category; and step S12, using the detection network of the target category to detect an image to be detected, to obtain a target detection result of the image to be detected.

The parameters of the detection network of the target category are obtained by inputting training images of the target category into a parameter generation network.

It can be understood that the target category may also be a category with a large number of training images; the target detection method provided by the embodiments of the present invention is equally applicable to such categories.

In a possible implementation, the target detection method may be executed by an electronic device such as a terminal device or a server. The terminal device may be user equipment (UE), a mobile device, a user terminal, a terminal, a cellular phone, a cordless phone, a personal digital assistant (PDA), a handheld device, a computing device, a vehicle-mounted device, a wearable device, or the like, and the method may be implemented by a processor invoking computer-readable instructions stored in a memory. Alternatively, the method may be executed by a server.

In step S11, the target category may denote the category on which target detection is to be performed. In one example, the target category may be a category with a small number of training images, for example a category with one or a few training images. In practical application scenarios, the target category may be a new category that is added dynamically.

A detection network may denote a network used for target detection. In one example, the detection network may have the structure of a network capable of anchor-free target detection, such as an FCOS (Fully Convolutional One-Stage Object Detection) network. The embodiments of the present invention do not limit the detection network.

The detection network of the target category may denote a network used to perform target detection for the target category. That is, the detection network of the target category can detect whether an object of the target category is present in the image to be detected. In the embodiments of the present invention, the parameters of the detection network of the target category may be obtained first, and the detection network of the target category may then be constructed based on those parameters. The parameters of the detection network of the target category are obtained by inputting training images of the target category into the parameter generation network.

The parameter generation network may be used to generate the parameters of a detection network. It takes training images as input and outputs detection network parameters; inputting the training images of the target category into the parameter generation network yields the parameters of the detection network of the target category. The embodiments of the present invention do not limit the structure of the parameter generation network.

In a possible implementation, the detection network is constructed as follows: first, training images of the target category are obtained, and each training image of the target category is input separately into the parameter generation network, to obtain the detection network parameters corresponding to each training sample of the target category; then, the parameters of the detection network of the target category are determined according to the detection network parameters corresponding to each training sample of the target category; finally, the detection network of the target category is constructed according to the parameters of the detection network of the target category.

In the embodiments of the present invention, each training image of the target category may be input separately into the parameter generation network, to obtain the detection network parameters corresponding to each training image of the target category. Since these training images all belong to the target category, the parameters of the detection network of the target category can be determined from the detection network parameters corresponding to these training images. In one example, the detection network parameters corresponding to the training images of the target category may be averaged, and the averaged parameters determined as the parameters of the detection network of the target category. In another example, weight information for each training image of the target category may first be determined according to information such as the position or size of the target region (the region where the object of the target category is located) in the training image; then, based on the weight information, a weighted average of the detection network parameters corresponding to the training images of the target category is computed, and the weighted-average parameters are determined as the parameters of the detection network of the target category.
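Both examples (plain averaging and weighted averaging) can be sketched with one helper. Using the area of the target region as the weight is only one reading of "position or size"; the box format `(x1, y1, x2, y2)` and the function names `area_weights` and `average_parameters` are assumptions for illustration.

```python
def area_weights(boxes):
    """Weight each training image by the area of its target region (x1, y1, x2, y2).

    Area-based weighting is an assumed instance of weighting by region size.
    """
    return [max(0.0, x2 - x1) * max(0.0, y2 - y1) for x1, y1, x2, y2 in boxes]

def average_parameters(per_image_params, weights=None):
    """Plain average when weights is None, otherwise a weighted average."""
    if weights is None:
        weights = [1.0] * len(per_image_params)
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    size = len(per_image_params[0])
    return [
        sum(w * params[i] for w, params in zip(weights, per_image_params)) / total
        for i in range(size)
    ]
```

With area weights, a training image in which the target object fills more of the frame contributes more to the final detection network parameters.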

Once the parameters of a detection network have been obtained, the corresponding detection network can be constructed based on the structure of the detection network. That is, after the parameters of the detection network of the target category are obtained, the detection network of the target category can be constructed based on the structure of the detection network.

In a possible implementation, a detection network whose parameters are set to the parameters of the detection network of the target category may be used directly as the detection network of the target category. In this way, once the training images of the target category have been input into the parameter generation network, the detection network of the target category can be obtained conveniently and quickly.

In a possible implementation, a detection network whose parameters are set to the parameters of the detection network of the target category may first be taken as the initialized detection network of the target category; the initialized detection network is then fine-tuned to obtain the detection network of the target category. In one example, the initialized detection network is fine-tuned by minimizing a loss. This loss may include the detection loss and the orthogonalization loss of the initialized detection network. The detection loss of the initialized detection network may be determined from the predicted label distribution that the initialized detection network outputs for the training images of the target category and the corresponding ground-truth labels.

In this way, an optimized detection network can be obtained in a short time, which improves the accuracy of the detection network of the target category.

In step S12, the image to be detected may be input into the detection network of the target category to obtain the target detection result of the image to be detected. In one example, the target detection result may include the probability that the image to be detected belongs to the target category and the position information of the object of the target category in the image to be detected.

In embodiments of the present invention, the parameters of the detection network of the target category are first obtained from the parameter generation network, and the detection network of the target category is then constructed from those parameters, thereby achieving target detection for the target category. The parameter generation network is a key component of the target detection method of the embodiments. Its training process is described below.

In a possible implementation, the training process of the parameter generation network may include: acquiring one or more target training sets from an image set; and training the parameter generation network based on each target training set.

The following description takes as an example an image set that includes training images of C categories (referred to as C base categories), with A training images per category. Acquiring one target training set from the image set may include: randomly selecting K categories from the C categories, and randomly selecting M training images from the A training images of each selected category. The target training set then includes training images of K categories, with M training images per category. Repeating this process yields multiple target training sets from the image set.

Here, C, A, K, and M are integers greater than 0, with C > K and A > M.

The values of K and M can be set as required. Because the target detection method of the embodiments addresses detection for categories with few training images, the parameter generation network is trained with a small number of categories and a small number of training images per category. In one example, K may be 5 and M may be 11, 15, or 20, while C may be 1000 or 2000 and A may be 5000 or 10000. It can be understood that the training process for the parameter generation network provided by the embodiments applies equally to categories with many training images; the number of training images per category can therefore be larger, and M may also be 500 or 1000.

It should be noted that, for each category of the target training set, the M training images of that category may include N support images and O query images, where N and O are integers greater than 0 and M ≥ N + O. When M = N + O, N training images are randomly selected from the M training images of the category as support images, and the remaining training images of the category serve as query images. When M > N + O, N training images are randomly selected from the M training images of the category as support images, and O training images are randomly selected from the remaining training images of the category as query images.
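
The sampling procedure for one target training set can be sketched as below. This is an illustrative sketch under stated assumptions: the function name `sample_episode`, the dictionary layout of the image set (category to list of image identifiers), and the file-name strings are hypothetical and not taken from the patent.

```python
import random
from typing import Dict, Hashable, List, Optional, Tuple

Episode = Dict[Hashable, Tuple[List[str], List[str]]]


def sample_episode(
    image_set: Dict[Hashable, List[str]],
    k: int,
    n_support: int,
    n_query: int,
    m: Optional[int] = None,
    rng: Optional[random.Random] = None,
) -> Episode:
    """Draw one target training set: K categories, M images each,
    split into N support images and O query images per category."""
    rng = rng or random.Random()
    m = n_support + n_query if m is None else m
    if m < n_support + n_query:
        raise ValueError("M must satisfy M >= N + O")
    episode: Episode = {}
    # Randomly pick K of the C base categories.
    for cls in rng.sample(sorted(image_set), k):
        # Randomly pick M of the A images; the result is in random order,
        # so slicing it is itself a random support/query split.
        chosen = rng.sample(image_set[cls], m)
        support = chosen[:n_support]
        query = chosen[n_support:n_support + n_query]
        episode[cls] = (support, query)
    return episode


# Hypothetical image set with C = 20 categories and A = 50 images each.
image_set = {c: [f"class{c}_img{a}.jpg" for a in range(50)] for c in range(20)}
episode = sample_episode(image_set, k=5, n_support=5, n_query=10,
                         rng=random.Random(0))
```

Calling `sample_episode` repeatedly yields multiple target training sets; when M > N + O, the images left over after the split are simply unused in that episode.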

The training process of the parameter generation network is described below using a single target training set as an example. Training the parameter generation network with multiple target training sets amounts to repeating the single-set training process multiple times, and is not described separately here.

In a possible implementation, training the parameter generation network based on one target training set may include: first, inputting the support images of the target training set into the parameter generation network to be trained to obtain the parameters of the general detection network of the target training set, and constructing the general detection network of the target training set from those parameters; second, inputting the query images of the target training set into the feature extraction network to be trained to obtain the feature map of each query image; third, inputting the feature map of each query image into the general detection network to obtain the predicted label distribution of each query image; and finally, determining the detection loss of the general detection network from the predicted label distributions and ground-truth labels of the query images, and training the parameter generation network to be trained according to that detection loss.

Inputting the support images of the target training set into the parameter generation network to be trained to obtain the parameters of the general detection network of the target training set may include: inputting each support image of the target training set into the parameter generation network to be trained separately to obtain the detection network parameters corresponding to that support image; determining the parameters of the detection network of each category of the target training set from the per-image parameters and the true category of each support image; and determining the parameters of the general detection network of the target training set from the parameters of the detection networks of the individual categories.

In one example, the detection network parameters corresponding to support images of the same category are averaged, or weighted-averaged, according to the true category of each support image (the weights can be determined from information such as the position or size of the target region in the support image), giving the parameters of the detection network of that category. The parameters of the detection networks of all categories are then concatenated into the parameters of the general detection network of the target training set.
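
The grouping, averaging, and concatenation steps can be sketched as follows. This is a minimal sketch under assumptions made here: `build_general_params` is a hypothetical name, parameters are flat lists of floats, and "concatenation" is modelled as stacking the K class-level vectors into a K × D matrix (one row per category), which the text does not spell out.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple


def build_general_params(
    per_image_params: Sequence[Sequence[float]],
    labels: Sequence[Hashable],
) -> Tuple[List[Hashable], List[List[float]]]:
    """Average per-image parameters within each true category, then stack
    the K class-level vectors into a K x D matrix (one row per category)."""
    if len(per_image_params) != len(labels):
        raise ValueError("one label per support image is required")
    grouped: Dict[Hashable, List[Sequence[float]]] = defaultdict(list)
    for params, label in zip(per_image_params, labels):
        grouped[label].append(params)
    classes = sorted(grouped)  # fixed row order for the stacked matrix
    matrix = []
    for cls in classes:
        rows = grouped[cls]
        dim = len(rows[0])
        matrix.append([sum(r[d] for r in rows) / len(rows) for d in range(dim)])
    return classes, matrix


# Hypothetical outputs of the parameter generation network for four
# support images (K = 2 categories, N = 2 images each, D = 2).
per_image = [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0], [0.0, 4.0]]
true_labels = ["cat", "cat", "dog", "dog"]
classes, general_params = build_general_params(per_image, true_labels)
```

A weighted variant would replace the inner mean with a weighted mean; the plain average is used here only to keep the sketch short.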

FIG. 2 shows a schematic diagram of a network architecture according to an embodiment of the present invention. As shown in FIG. 2, the network architecture 200 includes a parameter generation network 201 and a feature extraction network 202, each of which has its own set of parameters.

As shown in FIG. 2, a target training set is acquired from the image set. The target training set includes a support set and a query set.

The support set includes support images of K categories, with N support images per category, so it has K·N entries in total. Each entry pairs the target region of a support image with the ground-truth label of that target region; the i-th entry of the support set consists of the target region and the ground-truth label of the i-th support image. The ground-truth label itself consists of the category of the target region and the position information of the target region.

The query set includes query images of K categories, with O query images per category, so it has K·O entries in total. Each entry pairs the target region of a query image with the ground-truth label of that target region; the j-th entry of the query set consists of the target region and the ground-truth label of the j-th query image. The ground-truth label itself consists of the category of the target region and the position information of the target region.

With reference to FIG. 2, training the parameter generation network with the target training set D may include the following steps.

(1) Construct the general detection network of the target training set. Specifically, each support image in the support set is cropped to obtain its target region (in one example, the target region may be 224 pixels × 224 pixels). The target region of each support image is input into the parameter generation network to be trained, yielding the detection network parameters corresponding to that support image. Averaging (or weighted-averaging) the parameters corresponding to the support images of one category gives the parameters of the detection network of that category. Formula (1) expresses the parameters of the detection network of category k in this way, as a vector of dimension D, where D denotes the dimension of the parameters of the detection network. In other words, the category of each detection network is the same as the category of the support images from which it is generated.

Concatenating the parameters of the detection networks of the K categories gives the parameters of the general detection network, from which the general detection network of the target training set can be constructed.

(2) Obtain the feature maps of the query images. Specifically, each query image in the query set is cropped to obtain its target region (in one example, the short side of the target region is 600 pixels and the long side does not exceed 1000 pixels), and the target region is input into the feature extraction network to obtain the feature map of that query image.

(3) Determine the detection loss of the general detection network. Specifically, the feature map of each query image is input into the general detection network built from the concatenated parameters, yielding the predicted label distribution of that query image. The detection loss of the general detection network is then obtained from the predicted label distributions and the ground-truth labels of the query images. In one example, this loss is given by formula (2), in which loss(…) denotes a loss function whose arguments are the ground-truth label of a query image and the predicted label distribution obtained by inputting the feature map of that query image into the general detection network. Embodiments of the present invention do not limit the form of the loss function; for example, it may be a mean squared error function or a cross-entropy function.
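
As a rough illustration of how a detection loss could be computed from query features and generated parameters, the sketch below makes several simplifying assumptions that are not in the patent: the general detection network is reduced to a linear classifier whose K rows are the class-level parameters, each query feature is a flat D-dimensional vector, only the category part of the label is scored (position information is ignored), cross-entropy is chosen from the loss functions the text allows, and the per-image losses are averaged. The names `softmax` and `detection_loss` are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of class scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def detection_loss(
    class_params: Sequence[Sequence[float]],  # K x D generated parameters
    features: Sequence[Sequence[float]],      # one D-dim feature per query image
    labels: Sequence[int],                    # ground-truth row index per query
) -> float:
    """Mean cross-entropy of a linear classifier over the query features."""
    losses = []
    for feat, label in zip(features, labels):
        # Score of each category = dot product of its parameters and the feature.
        scores = [sum(w * f for w, f in zip(row, feat)) for row in class_params]
        probs = softmax(scores)  # predicted label distribution
        losses.append(-math.log(probs[label]))
    return sum(losses) / len(losses)


# Hypothetical generated parameters (K = 2, D = 2) and query features.
params = [[1.0, 0.0], [0.0, 1.0]]
feats = [[2.0, 0.0], [0.0, 2.0]]
loss_correct = detection_loss(params, feats, labels=[0, 1])
loss_wrong = detection_loss(params, feats, labels=[1, 0])
```

With parameters that match the features, the loss for the correct labels is smaller than for swapped labels, which is the signal that training would use to adjust the parameter generation network.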

(4) Train the parameter generation network to be trained according to the detection loss of the general detection network.

The parameters of the parameter generation network are adjusted with the goal of minimizing the detection loss of the general detection network, thereby training the parameter generation network.

In this way, a parameter generation network trained with a small number of samples can be used to generate the parameters of detection networks for new categories, and has the potential to transfer its ability to generate detection networks to new categories.

In a possible implementation, the method further includes: training the feature extraction network to be trained according to the detection loss of the general detection network.

As the network architecture in FIG. 2 shows, the feature extraction network can be trained at the same time as the parameter generation network. That is, the parameters of the feature extraction network can also be updated with the goal of minimizing the detection loss of the general detection network.

In a possible implementation, training the feature extraction network to be trained according to the detection loss of the general detection network includes: acquiring the parameters of the reference detection network of the target training set; constructing the reference detection network of the target training set from those parameters; inputting the feature map of each query image into the reference detection network to obtain the reference label distribution of each query image; determining the detection loss of the reference detection network from the reference label distributions and ground-truth labels of the query images; and training the feature extraction network to be trained according to the detection loss of the general detection network and the detection loss of the reference detection network.

Here, the reference detection network may denote a detection network obtained by training on the training images of all categories of the image set.

When the target training set is used to train the reference detection network and the feature extraction network, a single training pass involves only K categories, and repeated passes are still confined to a limited number of categories. As a result, the discriminative ability of the trained feature extraction network is limited to the categories covered by the target training sets, which weakens its ability to extract features. In addition, when the target training set is used to train the reference detection network and the feature extraction network, the training process involves only a small number of training images, and a general detection network trained on a small number of training images has weaker target detection ability than a detection network trained on a large number of training images. Therefore, embodiments of the present invention introduce a reference detection network, trained on a large number of training images of many categories, to optimize the training of the parameter generation network and the feature extraction network.

FIG. 3 shows a schematic diagram of a network architecture according to an embodiment of the present invention. The architecture shown in FIG. 3 adds a reference detection network 301 with parameters θ to the architecture of FIG. 2. Inputting the feature map of each query image into the reference detection network with parameters θ yields the reference label distribution of that query image. The detection loss of the reference detection network is then obtained from the reference label distributions and the ground-truth labels of the query images. In one example, this loss is given by formula (3), in which loss(…) denotes a loss function whose arguments are the ground-truth label of a query image and the reference label distribution obtained by inputting the feature map of that query image into the reference detection network with parameters θ. Embodiments of the present invention do not limit the form of the loss function; for example, it may be a mean squared error function or a cross-entropy function.

It should be noted that the label in formula (2) and the label in formula (3) are both ground-truth labels of the query image; the difference is that the former is one of the K categories, whereas the latter is one of all the categories.

In this way, because the reference detection network is trained on the training images of all categories, jointly training the feature extraction network with the detection loss of the general detection network and the detection loss of the reference detection network can improve the discriminative ability of the features it extracts.

In a possible implementation, acquiring the parameters of the reference detection network of the target training set may include: acquiring a randomly initialized detection network; training the randomly initialized detection network on all the query images of the target training set; and determining the parameters of the trained detection network as the parameters of the reference detection network of the target training set.

First, a detection network is randomly initialized as the detection network to be trained; this network is then trained on all the query images of the target training set to obtain the reference detection network of the target training set. The reference detection network of the target training set can be trained at the same time as the parameter generation network and the feature extraction network. The randomly initialized detection network may be trained on the query images using detection network training methods from the related art, such as YOLO or SSD; the present invention places no limitation on this.

The parameters of the reference detection network of the target training set are likewise obtained by concatenating the parameters of the detection networks of the K categories, and the reference detection network of the target training set can be constructed from them. For example, suppose that the reference detection network parameters of category k, one of the K categories of the target training set, form a vector of dimension D, where D is the dimension of the parameters of the reference detection network. Concatenating the reference detection network parameters of the K categories gives the parameters of the reference detection network of the target training set.

It should be noted that, in embodiments of the present invention, a separate data set covering the K categories may also be constructed for training the reference detection network. The training process is the same as the training with query images described above and is not repeated here.

In a possible implementation, training the parameter generation network to be trained according to the detection loss of the general detection network includes: determining the gap loss of the general detection network from the parameters of the general detection network of the target training set and the parameters of the reference detection network of the target training set; and training the parameters of the parameter generation network to be trained according to the detection loss and the gap loss of the general detection network.

In one example, the gap loss of the general detection network is obtained by formula (4) or formula (5), which are two alternative forms of the gap loss. The formulas are expressed in terms of the parameters of the general detection network and the parameters of the reference detection network corresponding to category c. They use an indicator function, which takes the value 1 when the condition in its brackets is true and 0 when it is false, together with the L1 norm and the L2 norm.
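
A gap loss of this kind can be sketched as below. The exact formulas are not recoverable from the text, so the sketch rests on labelled assumptions: the indicator is taken to test whether category c belongs to the current target training set, the per-category distances are summed, the L2 variant uses the unsquared norm, and parameters are stored as dictionaries from category to flat vector. The name `gap_loss` is hypothetical.

```python
import math
from typing import Dict, Hashable, Sequence, Set


def gap_loss(
    general: Dict[Hashable, Sequence[float]],
    reference: Dict[Hashable, Sequence[float]],
    episode_classes: Set[Hashable],
    norm: str = "l1",
) -> float:
    """Sum over categories c of indicator(c in episode) * ||w_c - theta_c||."""
    total = 0.0
    for cls, ref_row in reference.items():
        # Assumed indicator: 1 if the category is in the episode, else 0.
        if cls not in episode_classes:
            continue
        diff = [g - r for g, r in zip(general[cls], ref_row)]
        if norm == "l1":
            total += sum(abs(d) for d in diff)
        elif norm == "l2":
            total += math.sqrt(sum(d * d for d in diff))
        else:
            raise ValueError("norm must be 'l1' or 'l2'")
    return total


# Hypothetical parameters: the reference network covers three categories,
# while the current target training set covers only two of them.
general = {"cat": [1.0, 2.0], "dog": [0.0, 0.0]}
reference = {"cat": [4.0, 6.0], "dog": [0.0, 1.0], "bird": [9.0, 9.0]}
gap_l1 = gap_loss(general, reference, {"cat", "dog"}, norm="l1")
gap_l2 = gap_loss(general, reference, {"cat", "dog"}, norm="l2")
```

Minimizing such a term pulls the generated parameters of each category toward the reference parameters of the same category.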

In this way, because the reference detection network is trained on the training images of all categories, jointly training the parameters of the parameter generation network with the detection loss and the gap loss of the general detection network can make the detection networks obtained from the parameter generation network more accurate.

In a possible implementation, the method may further include: determining the orthogonalization loss of the general detection network; and training the parameter generation network to be trained according to the orthogonalization loss of the general detection network.

In one example, the orthogonalization loss of the general detection network is determined by formula (6), which is expressed in terms of the row-normalized version of the parameters of the general detection network, the L1 norm, and the identity matrix I.
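
An orthogonalization loss built from a row-normalized matrix, a 1-norm, and the identity matrix can be sketched as follows. This is an assumption-laden illustration rather than the patent's formula: it takes the K × D matrix of class-level parameters, normalizes each row to unit Euclidean length, and sums the absolute entries of the difference between the resulting Gram matrix and the identity (an entrywise 1-norm). The names `row_normalize` and `orthogonality_loss` are hypothetical.

```python
import math
from typing import List, Sequence


def row_normalize(matrix: Sequence[Sequence[float]]) -> List[List[float]]:
    """Scale every row to unit Euclidean length (assumed normalization)."""
    normalized = []
    for row in matrix:
        norm = math.sqrt(sum(v * v for v in row))
        if norm == 0.0:
            raise ValueError("cannot normalise an all-zero row")
        normalized.append([v / norm for v in row])
    return normalized


def orthogonality_loss(matrix: Sequence[Sequence[float]]) -> float:
    """Entrywise 1-norm of (W_hat @ W_hat^T - I) for row-normalized W_hat."""
    w = row_normalize(matrix)
    loss = 0.0
    for i, row_i in enumerate(w):
        for j, row_j in enumerate(w):
            gram = sum(a * b for a, b in zip(row_i, row_j))
            loss += abs(gram - (1.0 if i == j else 0.0))
    return loss


# Two categories with orthogonal parameters: the loss vanishes.
orthogonal = orthogonality_loss([[2.0, 0.0], [0.0, 3.0]])
# Two categories with collinear parameters: the off-diagonal terms are penalised.
collinear = orthogonality_loss([[1.0, 0.0], [2.0, 0.0]])
```

The loss is zero exactly when the parameter vectors of different categories are mutually orthogonal, and it grows as they become more aligned.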

In embodiments of the present invention, making the detection networks of different categories orthogonal to one another can improve the discriminative ability of the model.

Because the parameter generation network, the feature extraction network, and the reference detection network shown in FIG. 3 can be trained simultaneously, embodiments of the present invention can determine a total training loss by formula (7). In formula (7), L denotes the total training loss, which combines the detection loss of the general detection network (see formula (2)), the detection loss of the reference detection network (see formula (3)), the gap loss of the general detection network (see formulas (4) and (5)), and the orthogonalization loss of the general detection network (see formula (6)). The formula also contains two hyperparameters, which can be set as required; in one example, the first may be 0.01 and the second may be 1.
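
Combining the four losses into one training objective can be sketched as below. The placement of the two hyperparameters is an assumption made here, since the formula itself is not recoverable from the text: the sketch leaves the two detection losses unweighted and applies the hyperparameters to the gap loss and the orthogonalization loss, in the order the example values (0.01 and 1) are mentioned. The function and argument names are hypothetical.

```python
def total_training_loss(
    det_general: float,
    det_reference: float,
    gap: float,
    orth: float,
    weight_gap: float = 0.01,   # assumed to weight the gap loss
    weight_orth: float = 1.0,   # assumed to weight the orthogonalization loss
) -> float:
    """Weighted sum of the four losses used to train all three networks."""
    return det_general + det_reference + weight_gap * gap + weight_orth * orth


# Hypothetical loss values for one target training set.
total = total_training_loss(det_general=0.5, det_reference=0.25, gap=8.0, orth=2.0)
```

Because all four terms enter a single scalar, one optimization step on it can update the parameter generation network, the feature extraction network, and the reference detection network together.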

In embodiments of the present invention, the parameter generation network, the feature extraction network, and the reference detection network can be trained simultaneously on the basis of L, adjusting the parameters of the parameter generation network, the parameters of the feature extraction network, and θ.

It can be understood that the method embodiments described above can be combined with one another to form combined embodiments, provided that doing so does not violate their principles or logic; for brevity, these combinations are not described in detail. Those skilled in the art will understand that, in the methods of the specific embodiments above, the order in which the steps are executed should be determined by their functions and possible internal logic.

In addition, the present invention further provides a target detection apparatus, an electronic device, a computer-readable storage medium, and a program, each of which can be used to implement any of the target detection methods provided by the present invention. For the corresponding technical solutions and descriptions, refer to the corresponding passages in the method section; they are not repeated here.

FIG. 4 shows a block diagram of a target detection apparatus according to an embodiment of the present invention. As shown in FIG. 4, the apparatus 40 includes: a construction module 41 configured to construct a detection network of a target category; and a detection module 42 configured to detect an image to be detected using the detection network of the target category to obtain a target detection result of the image to be detected; wherein the parameters of the detection network of the target category are obtained by inputting training images of the target category into a parameter generation network.

In a possible implementation, inputting the support images of the target training set into the parameter generation network to be trained to obtain the parameters of the general detection network of the target training set includes: inputting each support image of the target training set into the parameter generation network to be trained separately to obtain the detection network parameters corresponding to that support image; determining the parameters of the detection network of each category of the target training set from the per-image parameters and the true category of each support image; and determining the parameters of the general detection network of the target training set from the parameters of the detection networks of the individual categories.

In a possible implementation, training the parameter generation network to be trained according to the detection loss of the general detection network includes: determining the gap loss of the general detection network from the parameters of the general detection network of the target training set and the parameters of the reference detection network of the target training set; and training the parameters of the parameter generation network to be trained according to the detection loss and the gap loss of the general detection network.

In some embodiments, the functions or modules of the apparatus provided in the embodiments of the present invention may be used to execute the methods described in the method embodiments above. For their specific implementation, reference may be made to the description of those method embodiments; for brevity, the details are not repeated here.

An embodiment of the present invention further provides a computer-readable storage medium on which computer program instructions are stored; the above method is implemented when the computer program instructions are executed by a processor. The computer-readable storage medium may be a non-volatile computer-readable storage medium.

An embodiment of the present invention further provides an electronic device, including: a processor; and a memory for storing instructions executable by the processor; wherein the processor is configured to invoke the instructions stored in the memory to execute the above method.

An embodiment of the present invention further provides a computer program product including computer-readable code. When the computer-readable code runs on a device, a processor in the device executes instructions for implementing the target detection method provided in any of the above embodiments.

An embodiment of the present invention further provides another computer program product for storing computer-readable instructions which, when executed, cause a computer to perform the operations of the target detection method provided in any of the above embodiments.

The electronic device may be provided as a terminal, a server, or a device in another form.

FIG. 5 shows a block diagram of an electronic device 800 according to an embodiment of the present invention. For example, the electronic device 800 may be a terminal such as a mobile phone, a computer, a digital broadcasting terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, or a personal digital assistant.

Referring to FIG. 5, the electronic device 800 may include one or more of the following components: a processing component 802, a memory 804, a power component 806, a multimedia component 808, an audio component 810, an input/output (I/O) interface 812, a sensor component 814, and a communication component 816.

The processing component 802 generally controls the overall operation of the electronic device 800, such as operations associated with display, telephone calls, data communication, camera operation, and recording. The processing component 802 may include one or more processors 820 that execute instructions to perform all or some of the steps of the methods described above. In addition, the processing component 802 may include one or more modules that facilitate interaction between the processing component 802 and other components. For example, the processing component 802 may include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.

The memory 804 is configured to store various types of data to support operation of the electronic device 800. Examples of such data include instructions for any application or method operating on the electronic device 800, contact data, phonebook data, messages, pictures, videos, and the like. The memory 804 may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disc.

The power component 806 provides power to the various components of the electronic device 800. The power component 806 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the electronic device 800.

The multimedia component 808 includes a screen that provides an output interface between the electronic device 800 and the user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen can be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensors may sense not only the boundary of a touch or swipe action but also the duration and pressure associated with it. In some embodiments, the multimedia component 808 includes a front camera and/or a rear camera. When the electronic device 800 is in an operating mode, such as a shooting mode or a video mode, the front camera and/or the rear camera can receive external multimedia data. Each of the front and rear cameras may be a fixed optical lens system or may have focal length and optical zoom capability.

The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a microphone (MIC) that is configured to receive external audio signals when the electronic device 800 is in an operating mode, such as a call mode, a recording mode, or a voice recognition mode. The received audio signal may be further stored in the memory 804 or transmitted via the communication component 816. In some embodiments, the audio component 810 also includes a speaker for outputting audio signals.

The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be a keyboard, a click wheel, buttons, and the like. These buttons may include, but are not limited to, a home button, volume buttons, a start button, and a lock button.

The sensor component 814 includes one or more sensors for providing status assessments of various aspects of the electronic device 800. For example, the sensor component 814 can detect the open/closed state of the electronic device 800 and the relative positioning of components, for example the display and keypad of the electronic device 800. The sensor component 814 can also detect a change in position of the electronic device 800 or of one of its components, the presence or absence of user contact with the electronic device 800, the orientation or acceleration/deceleration of the electronic device 800, and changes in the temperature of the electronic device 800. The sensor component 814 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor component 814 may also include a light sensor, such as a complementary metal oxide semiconductor (CMOS) or charge-coupled device (CCD) image sensor, for use in imaging applications. In some embodiments, the sensor component 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.

The communication component 816 is configured to facilitate wired or wireless communication between the electronic device 800 and other devices. The electronic device 800 may access a wireless network based on a communication standard, such as WiFi, second generation mobile communication technology (2G), third generation mobile communication technology (3G), or a combination thereof. In an exemplary embodiment, the communication component 816 receives broadcast signals or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 816 also includes a near field communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.

In an exemplary embodiment, the electronic device 800 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components, for performing the above method.

In an exemplary embodiment, a non-volatile computer-readable storage medium is also provided, such as the memory 804 including computer program instructions, which are executable by the processor 820 of the electronic device 800 to perform the above method.

FIG. 6 shows a block diagram of an electronic device 1900 according to an embodiment of the present invention. For example, the electronic device 1900 may be provided as a server. Referring to FIG. 6, the electronic device 1900 includes a processing component 1922, which further includes one or more processors, and memory resources, represented by a memory 1932, for storing instructions executable by the processing component 1922, such as application programs. An application program stored in the memory 1932 may include one or more modules, each corresponding to a set of instructions. In addition, the processing component 1922 is configured to execute the instructions to perform the above method.

The electronic device 1900 may also include a power component 1926 configured to perform power management of the electronic device 1900, a wired or wireless network interface 1950 configured to connect the electronic device 1900 to a network, and an input/output (I/O) interface 1958. The electronic device 1900 can operate based on an operating system stored in the memory 1932, such as the Microsoft server operating system (Windows Server™), the graphical-user-interface-based operating system from Apple Inc. (Mac OS X™), the multi-user, multi-process computer operating system (Unix™), the free and open-source Unix-like operating system (Linux™), the open-source Unix-like operating system (FreeBSD™), or the like.

In an exemplary embodiment, a non-volatile computer-readable storage medium is also provided, such as the memory 1932 including computer program instructions, which are executable by the processing component 1922 of the electronic device 1900 to perform the above method.

The present invention may be a system, a method, and/or a computer program product. The computer program product may include a computer-readable storage medium carrying computer-readable program instructions for causing a processor to implement various aspects of the present invention.

A computer-readable storage medium may be a tangible device that can hold and store instructions for use by an instruction execution device. The computer-readable storage medium may be, for example but not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of computer-readable storage media include: a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static random access memory (SRAM), portable compact disc read-only memory (CD-ROM), a digital versatile disc (DVD), a memory stick, a floppy disk, a mechanically encoded device such as a punched card or a raised structure in a groove with instructions stored thereon, and any suitable combination of the foregoing. A computer-readable storage medium, as used herein, is not to be construed as a transient signal per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission medium (for example, light pulses through a fiber-optic cable), or electrical signals transmitted through a wire.

The computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to respective computing/processing devices, or to an external computer or external storage device over a network, such as the Internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, optical fiber transmission, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers. A network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards them for storage in a computer-readable storage medium within the respective computing/processing device.

The computer program instructions for carrying out the operations of the present invention may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk and C++, and conventional procedural programming languages such as the "C" language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, an electronic circuit, such as a programmable logic circuit, a field programmable gate array (FPGA), or a programmable logic array (PLA), is personalized by using state information of the computer-readable program instructions; the electronic circuit can execute the computer-readable program instructions, thereby implementing various aspects of the present invention.

Aspects of the present invention are described herein with reference to flowcharts and/or block diagrams of methods, apparatuses (systems), and computer program products according to embodiments of the present invention. It should be understood that each block of the flowcharts and/or block diagrams, and combinations of blocks in the flowcharts and/or block diagrams, can be implemented by computer-readable program instructions.

These computer-readable program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, or another programmable data processing apparatus to produce a machine, such that the instructions, when executed by the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams. These computer-readable program instructions may also be stored in a computer-readable storage medium; the instructions cause a computer, a programmable data processing apparatus, and/or other devices to operate in a particular manner, so that the computer-readable medium storing the instructions comprises an article of manufacture that includes instructions implementing aspects of the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams.

The computer-readable program instructions may also be loaded onto a computer, another programmable data processing apparatus, or another device, so that a series of operational steps is performed on the computer, other programmable data processing apparatus, or other device to produce a computer-implemented process, such that the instructions executed on the computer, other programmable data processing apparatus, or other device implement the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams.

The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present invention. In this regard, each block in the flowcharts or block diagrams may represent a module, a program segment, or a portion of instructions, which contains one or more executable instructions for implementing the specified logical function. In some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the figures. For example, two consecutive blocks may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functionality involved. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or acts, or by a combination of dedicated hardware and computer instructions.

The computer program product may be implemented in hardware, software, or a combination thereof. In one optional embodiment, the computer program product is embodied as a computer storage medium; in another optional embodiment, the computer program product is embodied as a software product, such as a software development kit (SDK).

Various embodiments of the present invention have been described above. The foregoing description is exemplary, not exhaustive, and is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, their practical application, or their improvement over technologies in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Industrial Applicability

The present invention provides a target detection method, an electronic device, and a computer-readable storage medium, in which a detection network of a target category is constructed, and an image to be detected is detected by using the detection network of the target category to obtain a target detection result of the image to be detected; wherein the parameters of the detection network of the target category are obtained by inputting training images of the target category into a parameter generation network.

201: Parameter generation network

202: Feature extraction network

40: Target detection apparatus 41: Construction module 42: Detection module 800: Electronic device 802: Processing component 804: Memory 806: Power component 808: Multimedia component 810: Audio component 812: Input/output interface 814: Sensor component 816: Communication component 820: Processor 1900: Electronic device 1922: Processing component 1926: Power component 1932: Memory 1950: Network interface 1958: Input/output interface S11~S12: Steps

The accompanying drawings, which are incorporated into and constitute a part of this specification, illustrate embodiments consistent with the present invention and, together with the description, serve to explain the technical solutions of the present invention. FIG. 1 shows a flowchart of a target detection method according to an embodiment of the present invention; FIG. 2 shows a schematic diagram of a network architecture according to an embodiment of the present invention; FIG. 3 shows a schematic diagram of a network architecture according to an embodiment of the present invention; FIG. 4 shows a block diagram of a target detection apparatus according to an embodiment of the present invention; FIG. 5 shows a block diagram of an electronic device 800 according to an embodiment of the present invention; FIG. 6 shows a block diagram of an electronic device 1900 according to an embodiment of the present invention.

S11~S12: Steps

Claims

A target detection method, comprising: constructing a detection network of a target category; and detecting an image to be detected by using the detection network of the target category, to obtain a target detection result of the image to be detected; wherein the parameters of the detection network of the target category are obtained by inputting training images of the target category into a parameter generation network.

The method according to claim 1, further comprising: obtaining one or more target training sets from an image set, wherein each target training set includes training images of K categories, each category includes M training images, and K is an integer greater than 0; and training the parameter generation network based on each target training set.

The method according to claim 2, wherein the M training images include N support images and O query images, N and O being integers greater than 0; and wherein training the parameter generation network based on each target training set comprises, for each target training set: inputting each support image of the target training set into the parameter generation network to be trained, to obtain the parameters of a general detection network of the target training set, and constructing the general detection network of the target training set based on those parameters; inputting each query image of the target training set into a feature extraction network to be trained, to obtain a feature map of each query image of the target training set; inputting the feature maps of the query images separately into the general detection network, to obtain predicted label distribution results of the query images; determining a detection loss of the general detection network according to the predicted label distribution results and the ground-truth labels of the query images; and training the parameter generation network to be trained according to the detection loss of the general detection network.

The method according to claim 3, wherein inputting each support image of the target training set into the parameter generation network to be trained, to obtain the parameters of the general detection network of the target training set, comprises: inputting each support image of the target training set separately into the parameter generation network to be trained, to obtain the parameters of the detection network corresponding to each support image; determining the parameters of the detection network of each category of the target training set according to the parameters of the detection network corresponding to each support image and the true category of each support image; and determining the parameters of the general detection network of the target training set according to the parameters of the detection networks of the categories of the target training set.

The method according to claim 3 or 4, further comprising: training the feature extraction network to be trained according to the detection loss of the general detection network.

The method according to claim 5, wherein training the feature extraction network to be trained according to the detection loss of the general detection network comprises: obtaining a reference detection network of the target training set; inputting the feature maps of the query images separately into the reference detection network, to obtain reference label distribution results of the query images; determining a detection loss of the reference detection network according to the reference label distribution results and the ground-truth labels of the query images; and training the feature extraction network to be trained according to the detection loss of the general detection network and the detection loss of the reference detection network.

The method according to claim 6, wherein obtaining the parameters of the reference detection network of the target training set comprises: obtaining a randomly initialized detection network; training the randomly initialized detection network based on all query images in the target training set; and determining the parameters of the trained detection network as the reference detection network of the target training set.

The method according to claim 6, wherein training the parameter generation network to be trained according to the detection loss of the general detection network comprises: determining a gap loss of the general detection network according to the parameters of the general detection network of the target training set and the parameters of the reference detection network of the target training set; and training the parameters of the parameter generation network to be trained according to the detection loss and the gap loss of the general detection network.

The method according to claim 3 or 4, further comprising: determining an orthogonalization loss of the general detection network; and training the parameter generation network to be trained according to the orthogonalization loss of the general detection network.

The method according to claim 2, wherein constructing the detection network of the target category comprises: obtaining training images of the target category; inputting each training image of the target category separately into the parameter generation network, to obtain the parameters of the detection network corresponding to each training sample of the target category; determining the parameters of the detection network of the target category according to the parameters of the detection network corresponding to each training sample of the target category; and constructing the detection network of the target category according to the parameters of the detection network of the target category.

An electronic device, comprising: a processor; and a memory for storing processor-executable instructions; wherein the processor is configured to invoke the instructions stored in the memory to execute the method according to any one of claims 1 to 10.

A computer-readable storage medium on which computer program instructions are stored, wherein the computer program instructions, when executed by a processor, implement the method according to any one of claims 1 to 10.