WO2022011892A1 - Network training method and apparatus, target detection method and apparatus, and electronic device - Google Patents

Network training method and apparatus, target detection method and apparatus, and electronic device

Info

Publication number
WO2022011892A1
Authority
WO
WIPO (PCT)
Prior art keywords
target
sample image
category
sample
image
Prior art date
Application number
PCT/CN2020/125972
Other languages
English (en)
Chinese (zh)
Inventor
窦浩轩
王意如
甘伟豪
路少卿
武伟
闫俊杰
Original Assignee
北京市商汤科技开发有限公司
Application filed by 北京市商汤科技开发有限公司 (Beijing SenseTime Technology Development Co., Ltd.)
Priority to JP2021569189A (published as JP2022544893A)
Priority to KR1020217038227A (published as KR20220009965A)
Publication of WO2022011892A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77: Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774: Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks

Definitions

  • the present disclosure relates to the field of computer technologies, and in particular, to a network training method and device, a target detection method and device, and electronic equipment.
  • Computer vision is an important direction of artificial intelligence technology. In computer vision processing, it is usually necessary to detect targets (such as pedestrians, objects, etc.) in images or videos.
  • Target detection of large-scale long-tail data has important applications in many fields, such as abnormal object detection in urban surveillance, abnormal behavior detection and emergency alarm.
  • the embodiments of the present disclosure propose a technical solution for network training and target detection.
  • a network training method, including: inputting an unlabeled first sample image into a target detection network for processing to obtain a target detection result of the first sample image, the target detection result including the image area, feature information and classification probability of the target in the first sample image; determining the category confidence of the target according to the classification probability of the target; for a first target whose category confidence is greater than or equal to a first threshold, taking the first sample image where the first target is located as a labeled second sample image and adding it to a training set, wherein the annotation information of the second sample image includes the image area of the first target and the category corresponding to the category confidence of the first target, and the training set includes labeled third sample images; for a second target whose category confidence is less than the first threshold, performing feature correlation mining on the second target according to the feature information of the third target in the third sample image, and determining, through feature correlation mining, a fourth target and the first sample image where the fourth target is located from the second target; taking the first sample image where the fourth target is located as a fourth sample image and adding it to the training set; and training the target detection network according to the annotation information of the fourth sample image and the second sample image, the third sample image and the fourth sample image in the training set.
  • the training the target detection network according to the annotation information of the fourth sample image and the second sample image, the third sample image and the fourth sample image in the training set includes:
  • according to the category of the target in the positive sample images of the training set, respectively determining a first number of samples to be sampled from the positive sample images of each category, the positive sample images being sample images that include a target; according to the first number for each category, sampling the positive sample images of each category to obtain a plurality of fifth sample images; sampling the negative sample images of the training set to obtain a plurality of sixth sample images, the negative sample images being sample images that do not include a target; and training the target detection network according to the fifth sample images and the sixth sample images.
  • the performing feature correlation mining on the second target according to the feature information of the third target in the third sample image, and determining, through feature correlation mining, the fourth target and the first sample image where the fourth target is located from the second target, includes: determining the information entropy of the second target according to the classification probability of the second target; selecting a fifth target from the second target according to the category confidence and information entropy of the second target; respectively determining, according to the category of the third target in the third sample image and the total number of sample images to be mined, a second number of sample images to be mined for each category; and determining, according to the feature information of the third target in the third sample image, the feature information of the fifth target and the second number of sample images to be mined for each category, the fourth target and the first sample image where the fourth target is located from the fifth target.
  • the selecting a fifth target from the second target according to the category confidence and information entropy of the second target includes: sorting the second targets respectively according to the category confidence and the information entropy of the second target, and selecting a third number of sixth targets and a fourth number of seventh targets; and combining the sixth targets and the seventh targets to obtain the fifth target.
  • the respectively determining, according to the category of the third target in the third sample image and the total number of sample images to be mined, the second number of sample images to be mined for each category includes: determining the proportion of the third targets of each category according to the category of the third target in the third sample image; determining the sampling proportion of each category according to the proportion of the third targets of each category; and respectively determining the second number of sample images to be mined for each category according to the sampling proportion of each category.
  • the determining the fourth target and the first sample image where the fourth target is located from the fifth targets includes: according to the distance between the feature information of the third targets of a first category and the feature information of each fifth target, respectively determining, among the third targets of the first category, the third target with the smallest distance to each fifth target as an eighth target, the first category being any one of the categories of the third targets; and determining, among the eighth targets, the target with the largest distance as the fourth target.
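The nearest-then-farthest selection described in this step can be sketched as follows (a minimal illustration assuming Euclidean distance between feature vectors; the function and variable names are hypothetical, not from the patent):

```python
import numpy as np

def mine_next_target(third_feats, fifth_feats):
    """For each unlabeled fifth target, find the distance to its closest
    labeled third target of the first category (the eighth target); the
    fifth target whose closest labeled target is farthest away is mined
    as the fourth target."""
    # pairwise Euclidean distances, shape (num_third, num_fifth)
    d = np.linalg.norm(third_feats[:, None, :] - fifth_feats[None, :, :], axis=-1)
    nearest = d.min(axis=0)        # smallest distance for each fifth target
    return int(nearest.argmax())   # index of the fifth target to mine next
```

This is a farthest-first heuristic: it favors the candidate least similar to what is already labeled for the category, which tends to add the most new information per mined sample.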
  • the determining the fourth target and the first sample image where the fourth target is located from the fifth targets according to the feature information of the third target in the third sample image, the feature information of the fifth target and the second number of sample images to be mined for each category further includes: adding the determined fourth target to the third targets of the first category, and removing the determined fourth target from the unlabeled fifth targets.
  • the method further includes: inputting the third sample image into the target detection network for processing to obtain feature information of the third target in the third sample image.
  • before the step of inputting the unlabeled first sample image into the target detection network for processing to obtain the target detection result of the first sample image, the method further includes: pre-training the target detection network by using the labeled third sample images.
  • the first sample image includes a long-tail image.
  • a target detection method, including: inputting an image to be processed into a target detection network for processing to obtain a target detection result of the image to be processed, where the target detection result includes the position and category of the target in the image to be processed, and the target detection network is trained according to the above-mentioned network training method.
  • a network training apparatus including:
  • the target detection part is configured to input the unlabeled first sample image into the target detection network for processing to obtain the target detection result of the first sample image, the target detection result including the image area, feature information and classification probability of the target in the first sample image;
  • a confidence determination part configured to determine the category confidence of the target according to the classification probability of the target;
  • the labeling part is configured to, for the first target whose category confidence is greater than or equal to the first threshold, take the first sample image where the first target is located as the labeled second sample image and add it to the training set, wherein the labeling information of the second sample image includes the image area of the first target and the category corresponding to the category confidence of the first target, and the training set includes the labeled third sample image;
  • the feature mining part is configured to, for the second target whose category confidence is less than the first threshold, perform feature correlation mining on the second target according to the feature information of the third target in the third sample image, determine, through feature correlation mining, the fourth target and the first sample image where the fourth target is located from the second target, and add the first sample image where the fourth target is located to the training set as the fourth sample image;
  • the training part is configured to train the target detection network according to the label information of the fourth sample image, the second sample image, the third sample image and the fourth sample image in the training set.
  • the training part includes: a sampling quantity determination sub-part, configured to respectively determine, according to the category of the target in the positive sample images of the training set, the first number of samples to be sampled from the positive sample images of each category, the positive sample images being sample images that include a target; a first sampling sub-part, configured to sample the positive sample images of each category according to the first number for each category to obtain a plurality of fifth sample images; a second sampling sub-part, configured to sample the negative sample images of the training set to obtain a plurality of sixth sample images, the negative sample images being sample images that do not include a target; and a training sub-part, configured to train the target detection network according to the fifth sample images and the sixth sample images.
  • the feature mining part includes: an information entropy determination sub-part, configured to determine the information entropy of the second target according to the classification probability of the second target; a target selection sub-part, configured to select a fifth target from the second target according to the category confidence and information entropy of the second target; a mining quantity determination sub-part, configured to respectively determine, according to the category of the third target in the third sample image and the total number of sample images to be mined, the second number of sample images to be mined for each category; and a target and image determination sub-part, configured to determine the fourth target and the first sample image where the fourth target is located from the fifth target according to the feature information of the third target in the third sample image, the feature information of the fifth target and the second number of sample images to be mined for each category.
  • the target selection sub-part is configured to: sort the second targets respectively according to the category confidence and information entropy of the second target, and select a third number of sixth targets and a fourth number of seventh targets; and combine the sixth targets and the seventh targets to obtain the fifth target.
  • the mining quantity determination sub-part is configured to: determine the proportion of the third targets of each category according to the category of the third target in the third sample image; determine the sampling proportion of each category according to the proportion of the third targets of each category; and respectively determine the second number of sample images to be mined for each category according to the sampling proportion of each category.
  • the target and image determination sub-part is configured to: according to the distance between the feature information of the third targets of the first category and the feature information of each fifth target, respectively determine, among the third targets of the first category, the third target with the smallest distance to each fifth target as the eighth target, the first category being any one of the categories of the third targets; and determine, among the eighth targets, the target with the largest distance as the fourth target.
  • the target and image determination sub-part is further configured to: add the determined fourth target to the third targets of the first category, and remove the determined fourth target from the unlabeled fifth targets.
  • the apparatus further includes: a feature extraction part, configured to input the third sample image into the target detection network for processing to obtain the feature information of the third target in the third sample image.
  • the apparatus further includes: a pre-training part configured to pre-train the target detection network by using the labeled third sample images before the target detection part operates.
  • the first sample image includes a long-tail image.
  • a target detection apparatus, including: a detection processing part configured to input an image to be processed into a target detection network for processing to obtain a target detection result of the image to be processed, the target detection result including the position and category of the target in the image to be processed, and the target detection network being trained according to the above-mentioned network training method.
  • before the step of respectively determining the first number of samples to be sampled from the positive sample images of each category according to the category of the target in the positive sample images of the training set, the method includes: sampling the positive sample images and negative sample images so that the numbers of positive sample images and negative sample images obtained are the same or similar.
  • the total number of sample images to be mined is 5% to 25% of the total number of the first sample images.
  • the combining the sixth target and the seventh target to obtain the fifth target includes: removing, from the sixth targets, the targets that are the same as the seventh targets to obtain the remaining targets in the sixth targets that differ from the seventh targets; and taking the remaining targets and the seventh targets as the fifth target.
  • the method further includes: when the number of fourth sample images of the first category reaches the second number of sample images to be mined for the first category, ending the feature correlation mining for the first category.
  • the method further includes: when the number of first sample images where the fourth target is located reaches the second number of sample images to be mined for the first category, ending the determination of the eighth target.
  • the method further includes: when the number of first sample images where the fourth target is located has not reached the second number of sample images to be mined for the first category, and the set storing the feature information of the fifth target is empty, ending the determination of the eighth target.
  • the inputting the third sample image into the target detection network for processing to obtain the feature information of the third target in the third sample image includes: inputting the third sample image into the target detection network to obtain the feature vector output by a hidden layer of the target detection network; and determining the feature vector as the feature information of the third target.
  • an electronic device comprising: a processor; a memory configured to store instructions executable by the processor; wherein the processor is configured to invoke the instructions stored in the memory, to perform the above-mentioned network training method, or to perform the above-mentioned target detection method.
  • a computer-readable storage medium on which computer program instructions are stored, where the computer program instructions, when executed by a processor, implement the above-mentioned network training method or the above-mentioned target detection method.
  • in this way, the target detection results of unlabeled sample images can be obtained through the target detection network; pseudo-labeling and feature correlation mining are respectively performed according to the target detection results, so that high-value sample images are labeled, collected and added to the training set; and the expanded training set is used to train the target detection network, thereby expanding the amount of positive sample data in the training set, alleviating the imbalance between positive and negative samples, and improving the training effect of the target detection network.
  • FIG. 1 shows a flowchart of a network training method according to an embodiment of the present disclosure.
  • FIG. 2 shows a schematic diagram of a processing procedure of a network training method according to an embodiment of the present disclosure.
  • FIG. 3 shows a block diagram of a network training apparatus according to an embodiment of the present disclosure.
  • FIG. 4 shows a block diagram of an electronic device according to an embodiment of the present disclosure.
  • FIG. 5 shows a block diagram of an electronic device according to an embodiment of the present disclosure.
  • FIG. 1 shows a flowchart of a network training method according to an embodiment of the present disclosure.
  • the network training method includes:
  • step S11: the unlabeled first sample image is input into a target detection network for processing to obtain a target detection result of the first sample image, where the target detection result includes the image area, feature information and classification probability of the target in the first sample image;
  • step S12: the category confidence of the target is determined according to the classification probability of the target;
  • step S13: for the first target whose category confidence is greater than or equal to the first threshold, the first sample image where the first target is located is taken as the labeled second sample image and added to the training set, wherein the labeling information of the second sample image includes the image area of the first target and the category corresponding to the category confidence of the first target, and the training set includes the labeled third sample image;
  • step S14: for the second target whose category confidence is less than the first threshold, feature correlation mining is performed on the second target according to the feature information of the third target in the third sample image; through feature correlation mining, the fourth target and the first sample image where the fourth target is located are determined from the second target, and the first sample image where the fourth target is located is taken as the fourth sample image and added to the training set;
  • step S15: the target detection network is trained according to the labeling information of the fourth sample image and the second sample image, the third sample image and the fourth sample image in the training set.
  • in a possible implementation, the method may be executed by an electronic device such as a terminal device or a server. The terminal device may be a user equipment (User Equipment, UE), a mobile device, a user terminal, a terminal, a cellular phone, a cordless phone, a personal digital assistant (Personal Digital Assistant, PDA), a handheld device, a computing device, a vehicle-mounted device, a wearable device, etc. The method may be implemented by a processor calling computer-readable instructions stored in a memory. Alternatively, the method may be performed by a server.
  • the first sample image may be an image acquired by an image acquisition device (eg, a camera).
  • the first sample image may include a large-scale long-tailed image, that is, most of the images are background images, and a small part of the images include detectable objects.
  • Detectable targets may include, for example, human bodies, faces, vehicles, objects, and the like.
  • for example, images of a certain geographical area can be collected by cameras, and people may pass through the geographical area only a small part of the time, so most of the collected images are background images, and only a small part of the images include human faces and/or human bodies.
  • the collected images can form a long-tailed dataset.
  • the embodiment of the present disclosure does not limit the acquisition method of the first sample image and the category of the target in the first sample image.
  • a target detection network may be preset to detect the position (ie, detection frame) and category of the target in the image.
  • the target detection network may be, for example, a convolutional neural network, and the embodiment of the present disclosure does not limit the network structure of the target detection network.
  • the method further includes: pre-training the target detection network by using the labeled third sample image. That is to say, a training set may be preset, and the training set includes the labeled third sample images, and the labeling information of the third sample images may include the detection frame and category of the target in the image. According to the training set, the target detection network can be pre-trained by the method in the related art, so that the target detection network has a certain detection accuracy.
  • the pre-trained object detection network has poor detection effect on large-scale long-tail images. Therefore, the unlabeled first sample image can be used to further train the object detection network through active learning.
  • the unlabeled first sample image may be input into the target detection network for processing to obtain the target detection result of the first sample image.
  • the target detection result may include the image area, feature information and classification probability of the target in the first sample image.
  • the image area where the target is located can be the detection frame in the image;
  • the feature information of the target can be, for example, the feature vector output by the hidden layer (such as the convolution layer) of the target detection network;
  • the classification probability of the target can represent the posterior probability that the target belongs to each category.
  • the target in the first sample image may also be referred to as an instance, and one or more targets may be detected in each first sample image.
  • the order of magnitude of detected objects may be several to dozens of times the order of magnitude of images.
  • in step S12, according to the classification probability of the target, the maximum value of the classification probability may be taken and determined as the category confidence of the target.
  • in step S13, for a target whose category confidence is greater than or equal to the first threshold (which may be referred to as a first target), the first sample image where the first target is located may be used as a labeled sample image (which may be referred to as a second sample image) and added to the training set.
  • the image area of the first target is taken as the marked image area, and the category corresponding to the category confidence of the first target is taken as the marked category of the first target.
  • the same second sample image may be labeled multiple times by multiple first objects in the second sample image.
  • the first threshold is, for example, 0.99, and the embodiment of the present disclosure does not limit the value of the first threshold.
  • step S13 may be called pseudo-labeling. That is, the image where the target with higher confidence is located is regarded as a high-value sample, and the target detection inference result is directly used as the target annotation result. In this way, the number of positive sample data in the training set can be expanded to solve the problem of difficult collection of positive samples.
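  • The pseudo-labeling of steps S12 and S13 can be sketched as follows (a minimal illustration; the dictionary layout and function name are assumptions, and the 0.99 threshold is the example value given above):

```python
import numpy as np

def pseudo_label(detections, threshold=0.99):
    """detections: list of dicts with 'box' (detection frame) and 'probs'
    (per-class posterior probabilities). Targets whose maximum class
    probability reaches the threshold are pseudo-labeled with the
    predicted box and predicted category (steps S12 and S13)."""
    labels = []
    for det in detections:
        probs = np.asarray(det["probs"], dtype=float)
        conf = probs.max()                 # category confidence (step S12)
        if conf >= threshold:              # first target (step S13)
            labels.append({"box": det["box"],
                           "category": int(probs.argmax()),
                           "confidence": float(conf)})
    return labels
```

Detections below the threshold are left for the feature correlation mining branch (step S14) rather than being discarded.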
  • in step S14, for a target whose category confidence is less than the first threshold (which may be referred to as a second target), feature correlation mining is performed on the second target according to the feature information of the targets in the labeled third sample images of the training set (which may be referred to as third targets), and targets that meet the requirements (which may be referred to as fourth targets) are mined from the second targets.
  • for example, the distance or correlation between the feature information of the third target and the feature information of the second target can be calculated, a preset number of targets can be selected according to the distance or correlation, and the selected targets can be used as the fourth targets.
  • the first sample image where the mined fourth target is located may be taken as the fourth sample image and added to the training set, so as to complete the processing process of feature correlation mining. In this way, the number of sample data in the training set can be further expanded.
  • the annotation information of the fourth sample image may be obtained by manual annotation, for example, manually determining the detection frame and category of the target in the fourth sample image. This embodiment of the present disclosure does not limit this.
  • step S15 after obtaining the label information of the fourth sample image, the target detection network can be trained according to the second sample image, the third sample image and the fourth sample image in the training set.
  • in this way, the target detection results of unlabeled sample images can be obtained through the target detection network; pseudo-labeling and feature correlation mining are respectively performed according to the target detection results, so that high-value sample images are labeled, collected and added to the training set; and the expanded training set is used to train the target detection network, thereby expanding the amount of positive sample data in the training set, alleviating the imbalance between positive and negative samples, and improving the training effect of the target detection network.
  • through the processing of step S11, the target detection result of each first sample image is obtained; through the processing of step S12, the category confidence of the target in each first sample image is obtained.
  • in step S13, the sample image where a first target whose category confidence is greater than or equal to the first threshold is located can be added to the training set as the labeled second sample image obtained by pseudo-labeling; in step S14, feature correlation mining can be performed on the second targets whose category confidence is less than the first threshold.
  • in a possible implementation, step S14 may include: determining the information entropy of the second target according to the classification probability of the second target; selecting a fifth target from the second target according to the category confidence and information entropy of the second target; respectively determining, according to the category of the third target in the third sample image and the total number of sample images to be mined, the second number of sample images to be mined for each category; and determining, according to the feature information of the third target in the third sample image, the feature information of the fifth target and the second number of sample images to be mined for each category, the fourth target and the first sample image where the fourth target is located from the fifth target.
  • for example, the information entropy of the second target can be calculated to indicate the degree of uncertainty of the second target: the greater the information entropy of the second target, the greater its degree of uncertainty; conversely, the smaller the information entropy, the smaller the degree of uncertainty.
  • the embodiment of the present disclosure does not limit the calculation method of the information entropy.
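  • As the text does not fix a particular calculation method, the standard Shannon entropy of the classification probabilities is one natural choice; a minimal sketch:

```python
import numpy as np

def information_entropy(probs, eps=1e-12):
    """Shannon entropy of a target's classification probabilities.
    Larger entropy means greater uncertainty about the target's category;
    eps guards against log(0) for zero-probability classes."""
    p = np.asarray(probs, dtype=float)
    return float(-(p * np.log(p + eps)).sum())
```

A uniform distribution maximizes this entropy, while a one-hot (fully confident) distribution drives it to zero.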
  • in a possible implementation, targets satisfying certain conditions (which may be referred to as fifth targets) may be selected from the plurality of second targets, for example, according to the category confidence and the information entropy.
  • in a possible implementation, the step of selecting a fifth target from the second target according to the category confidence and information entropy of the second target may include: sorting the second targets respectively according to the category confidence and the information entropy, and selecting a third number of sixth targets and a fourth number of seventh targets; and combining the sixth targets and the seventh targets to obtain the fifth target.
  • for example, the plurality of second targets may be sorted by category confidence, and a preset third number of targets (which may be referred to as sixth targets) selected according to the sorting result; the plurality of second targets may also be sorted by information entropy, and a preset fourth number of targets (which may be referred to as seventh targets) selected according to the sorting result.
•   the third number and the fourth number may each be 3K, where K represents the number of sample images to be mined; K is, for example, 10000.
  • the value of K may be 5% to 25% of the total number of unlabeled first sample images.
  • the embodiments of the present disclosure do not limit the value of K and the quantitative relationship between the third quantity and the fourth quantity and K.
•   the selected sixth targets and seventh targets may be combined, with possible duplicate targets removed, and the combined targets may be used as the fifth targets.
•   in this way, about 6K fifth targets are obtained.
•   the above processing method can be called bootstrapping. In this way, a certain number of high-probability positive samples and negative samples can be selected from the second targets at the same time for subsequent feature correlation mining, which reduces the computation amount of feature correlation mining and improves processing efficiency.
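The bootstrapping selection described above can be sketched as follows (a minimal illustration; the function name, the 3K count per criterion, and the de-duplicating union are assumptions based on the description):

```python
import numpy as np

def bootstrap_select(confidences, entropies, k):
    """Select candidate (fifth) targets from the second targets: the 3K
    with the highest category confidence (likely positives) and the 3K
    with the highest information entropy (most uncertain), de-duplicated.
    Returns sorted indices into the input arrays.
    """
    confidences = np.asarray(confidences, dtype=float)
    entropies = np.asarray(entropies, dtype=float)
    n = 3 * k
    sixth = np.argsort(-confidences)[:n]    # top-3K by category confidence
    seventh = np.argsort(-entropies)[:n]    # top-3K by information entropy
    # union1d combines the two selections and removes duplicate targets
    return np.union1d(sixth, seventh)
```

Because a target can rank highly under both criteria, the union typically yields at most 6K fifth targets, matching the count mentioned above.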
•   the step of respectively determining the second number of sample images to be mined in each category may include:
  • the second quantity of sample images to be mined in each category is determined respectively.
•   the proportion f_c of the third targets of each category can be determined; according to the proportion f_c, the sampling proportion of each category can be calculated by the following formula:

R_c = f_c^t / Σ_{i=1}^{C} f_i^t

•   where R_c represents the sampling proportion of the c-th category; t is a hyperparameter, for example with a value of 0.1; C denotes the number of categories; f_i represents the proportion of the i-th category.
•   in this way, the sampling proportion corresponding to a category with a smaller proportion can be increased, and the sampling proportion corresponding to a category with a larger proportion can be reduced, thereby alleviating the quantity imbalance between samples of different categories so as to improve the training effect of the network.
  • the second number of sample images to be mined for each category can be determined. Further, feature correlation mining may be performed according to the second quantity.
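Assuming the temperature-flattening formula reconstructed above, the sampling proportions and the second number of sample images to be mined per category might be computed as in the following sketch (function names and the rounding rule are illustrative assumptions):

```python
import numpy as np

def sampling_proportions(f, t=0.1):
    """Temperature-flattened sampling proportion per category:
    R_c = f_c**t / sum_i(f_i**t).  With t < 1, rare categories get a
    larger share than their raw proportion f_c, and frequent categories a
    smaller share, alleviating the class imbalance.
    """
    f = np.asarray(f, dtype=float)
    w = f ** t
    return w / w.sum()

def second_numbers(f, total_to_mine, t=0.1):
    """Second number of sample images to be mined for each category."""
    return np.round(sampling_proportions(f, t) * total_to_mine).astype(int)
```

For example, with category proportions [0.9, 0.1] and t = 0.1, the rare category's sampling proportion rises from 10% to roughly 45%.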
  • the method further includes: inputting the third sample image into the target detection network for processing to obtain feature information of the third target in the third sample image.
  • the labeled third sample image in the training set can be input into the target detection network, and the feature information of the third sample image, such as a feature vector, is output from the hidden layer (eg, convolution layer) of the target detection network.
•   determining, from the fifth targets, the fourth target and the first sample image where the fourth target is located according to the feature information of the third target in the third sample image, the feature information of the fifth target and the second number of sample images to be mined in each category includes:
•   determining, according to the distance between the feature information of the third targets of the first category and the feature information of each fifth target, the third target with the smallest distance from each fifth target as the eighth target, where the first category is any one of the categories of the third targets;
•   determining the target with the largest distance among the eighth targets as the fourth target.
  • a k-center method may be used to mine a corresponding number of sample images from the sample images where the fifth target is located.
  • the distance between the feature information of the third target of the first category and the feature information of each fifth target may be calculated, the distance It can be, for example, the Euclidean distance.
•   the third target with the smallest distance from the fifth target among the third targets of the first category can be determined; in this way, the third target with the smallest distance from each fifth target can be determined, which can be called the eighth target.
•   the target with the largest distance may then be selected from among the eighth targets and determined as the fourth target obtained by this feature correlation mining, as shown in the following formula (3):

u = argmax_{f_j ∈ F} min_{g_l ∈ G_c} dist(f_j, g_l)    (3)

•   where u represents the fourth target obtained by feature correlation mining; dist(f_j, g_l) represents the distance between the feature information f_j of the j-th fifth target and the feature information g_l of the l-th third target of the first category c; F represents the set of feature information of the fifth targets; G_c represents the set of feature information of the third targets of the first category c.
  • the first sample image where the fourth target is located can be determined, and the sample image is added to the training set as the fourth sample image, thereby completing the feature correlation mining process this time.
•   according to the feature information of the third target in the third sample image, the feature information of the fifth target and the second number of sample images to be mined in each category, the step of determining, from the fifth targets, the fourth target and the first sample image where the fourth target is located further includes:
•   the determined fourth target is added to the third targets of the first category, and the determined fourth target is removed from the unlabeled fifth targets.
  • the fourth target obtained by this feature correlation mining is regarded as the labeled target, and the fourth target is removed from the unlabeled target.
•   for example, the feature information of the fourth target may be added to the set of feature information of the third targets of the first category c, and removed from the set of feature information of the fifth targets.
  • the two updated sets can be mined by formula (3), and the above process can be repeated.
•   when the number of the fourth sample images of the first category reaches the second number of the first category, or when the second number is not reached but the fifth targets are exhausted (that is, the set of feature information of the fifth targets is empty), the feature correlation mining of the first category is completed.
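The iterative mining described above resembles a greedy k-centre selection: repeatedly pick the unlabeled fifth target farthest (in feature space) from all currently labeled features, then treat it as labeled. The sketch below is illustrative only (function and variable names are assumptions; Euclidean distance is used, as in the embodiment):

```python
import numpy as np

def kcenter_mine(labeled_feats, unlabeled_feats, num_to_mine):
    """Greedy k-centre mining: repeatedly select the unlabeled (fifth)
    target whose distance to its nearest labeled (third) target is largest,
    add it to the labeled set, and continue.  Stops when num_to_mine
    targets are mined or the unlabeled pool is empty.
    Returns the mined indices in selection order.
    """
    labeled = [np.asarray(v, dtype=float) for v in labeled_feats]
    pool = {i: np.asarray(v, dtype=float) for i, v in enumerate(unlabeled_feats)}
    mined = []
    while pool and len(mined) < num_to_mine:
        best_j, best_d = None, -1.0
        for j, f in pool.items():
            # distance to the nearest labeled feature (the "eighth target")
            d = min(np.linalg.norm(f - g) for g in labeled)
            if d > best_d:
                best_j, best_d = j, d
        mined.append(best_j)
        # the mined fourth target is now regarded as labeled
        labeled.append(pool.pop(best_j))
    return mined
```

This matches the update rule above: each mined fourth target is moved from the unlabeled set into the labeled set before the next application of formula (3).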
•   human annotation may be performed on the mined fourth sample images to obtain annotation information of the fourth sample images. Since the fourth sample images may include both positive sample images (that is, fourth sample images including a target in the image) and negative sample images (that is, fourth sample images that do not include a target), the annotation information of a fourth sample image can include sample category information indicating whether the image is a positive or a negative sample image, and, for a positive sample image, the image frame where the target is located and the category of the target.
•   in this way, in step S15, the target detection network can be trained according to the annotation information of the fourth sample image and the second sample image, the third sample image and the fourth sample image in the training set.
•   step S15 may include: according to the categories of the targets in the positive sample images of the training set, respectively determining the first number of samples sampled from the positive sample images of each category, the positive sample images being sample images that include a target in the image;
  • the object detection network is trained according to the fifth sample image and the sixth sample image.
•   the target detection network can be trained by resampling; resampling increases the sampling frequency of low-frequency data to improve the performance of the network on such data, and further alleviates the imbalance between positive and negative samples.
•   the positive sample images and the negative sample images in the training set may be sampled respectively, so that the numbers of sampled positive sample images and negative sample images are the same or similar.
  • the total number of samples of the positive sample image may be preset. According to the categories of the objects in the positive sample images in the training set, the first number of samples sampled from the positive sample images of each category is determined respectively.
•   the proportion q_h of the targets of each category can be determined; according to the proportion, the sampling proportion of each category can be calculated by the following formula:

R_h = q_h^{t_1} / Σ_{i=1}^{C} q_i^{t_1}

•   where R_h represents the sampling proportion of the positive sample images of the h-th category; q_h represents the proportion of the targets of the h-th category; t_1 is a hyperparameter, for example with a value of 0.1.
•   in this way, the sampling proportion corresponding to a category with a smaller proportion can be increased, and the sampling proportion corresponding to a category with a larger proportion can be reduced, so as to alleviate the imbalance in the number of positive sample images of different categories and thereby improve the training effect of the network.
  • the first number of positive sample images of each category may be determined according to the sampling proportion of positive sample images of each category and the total number of samples of positive sample images.
  • a first number of positive sample images may be randomly sampled from the positive sample images of the category according to the first number of the category, as the fifth sample image.
  • the positive sample images of each category are sampled respectively, and the fifth sample image with the total number of samples can be obtained.
  • the negative sample images in the training set can be directly randomly sampled according to the preset total number of samples to obtain the sixth sample image with the total number of samples.
  • the total number of samples of negative sample images may be the same as or different from the total number of samples of positive sample images, which is not limited in this embodiment of the present disclosure.
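The resampling procedure described in the preceding paragraphs can be sketched as follows (illustrative only: the function name, the data layout, and the use of sampling with replacement are assumptions; the per-category proportions follow the reconstructed formula with t_1 = 0.1):

```python
import numpy as np

rng = np.random.default_rng(0)

def resample_training_set(pos_by_cat, neg_images, pos_total, neg_total, t1=0.1):
    """Build one resampled training epoch.

    Fifth sample images are drawn per category with temperature-flattened
    proportions R_h = q_h**t1 / sum_i(q_i**t1); sixth sample images are
    drawn uniformly at random from the negative sample images.
    """
    cats = sorted(pos_by_cat)
    q = np.array([len(pos_by_cat[c]) for c in cats], dtype=float)
    q /= q.sum()                       # raw category proportions q_h
    r = q ** t1
    r /= r.sum()                       # flattened sampling proportions R_h
    first_numbers = np.round(r * pos_total).astype(int)
    fifth = []
    for c, n in zip(cats, first_numbers):
        fifth.extend(rng.choice(pos_by_cat[c], size=n, replace=True).tolist())
    sixth = rng.choice(neg_images, size=neg_total, replace=True).tolist()
    return fifth, sixth
```

With 90 positive images of one category and 10 of another, the rare category contributes roughly 45 of 100 sampled fifth images instead of 10, balancing the epoch.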
•   the target detection network can be trained according to the fifth sample image and the sixth sample image. That is, the fifth and sixth sample images are respectively input into the target detection network to obtain the target detection results of the fifth and sixth sample images; the loss of the target detection network is determined according to the target detection results and the annotation information; and the network parameters are adjusted in the reverse direction according to the loss.
•   in step S11, the step of pre-training the target detection network by using the labeled third sample image can also be performed by the above-mentioned resampling training method, thereby improving the pre-training effect of the target detection network.
•   steps S11-S15 can be repeated to achieve continuous incremental training. That is to say, when unlabeled sample images are collected again, the target detection network after this training can be used as the initial target detection network, the expanded training set can be used as the initial training set, and the process of pseudo-labeling, feature correlation mining and resampling training can be repeated, so as to continuously improve the performance of the target detection network.
  • FIG. 2 shows a schematic diagram of a processing procedure of a network training method according to an embodiment of the present disclosure.
•   the data source includes a large number of unlabeled first sample images 20; the first sample images 20 are input into the target detection network for prediction, and the target detection result 21 of each first sample image 20 is obtained. The result 21 includes the image area (not shown), the feature vector and the classification probability of the target in the first sample image.
  • the target detection network may include a CNN backbone network 211, a feature map pyramid network (FPN) 212, and a fully connected network 213, such as a bbox head.
•   after the first sample image 20 is input into the target detection network, it is processed by the CNN backbone network 211 and the FPN 212 to obtain a feature map 214 of the first sample image, and the feature map 214 is processed by the fully connected network 213 to obtain the target detection result 21.
•   the category confidence of a target can be determined according to the classification probability of the target; for first targets whose category confidence is greater than or equal to the first threshold (for example, 0.99), the first sample images where the first targets are located are used as second sample images 22, and pseudo-labeling is performed on the second sample images 22; that is, the image area of the first target and the category corresponding to the category confidence of the first target are used as the annotation information of the second sample image 22.
  • the labeled second sample image 22 is added to the training set 25, thereby realizing the expansion of the positive samples in the training set.
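The confidence-based split of detected targets into pseudo-labelable first targets and mining-candidate second targets can be sketched as follows (an illustrative sketch; the threshold value 0.99 is the example from the embodiment, and taking the maximum classification probability as the category confidence is an assumption):

```python
import numpy as np

FIRST_THRESHOLD = 0.99  # example first threshold from the embodiment

def split_by_confidence(class_probs):
    """Split detected targets into first targets (pseudo-labeled and added
    to the training set) and second targets (candidates for feature
    correlation mining), by category confidence.

    The category confidence of each target is taken here as the maximum of
    its classification probability vector; the argmax category serves as
    the pseudo label.
    """
    probs = np.asarray(class_probs, dtype=float)
    conf = probs.max(axis=1)
    cats = probs.argmax(axis=1)
    first = np.flatnonzero(conf >= FIRST_THRESHOLD)
    second = np.flatnonzero(conf < FIRST_THRESHOLD)
    return first, second, cats
```

A first target's argmax category and image area would then form the annotation information of its second sample image.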
  • a certain number of fifth targets are selected by the bootstrapping method, and the sample image 23 where the fifth target is located is obtained.
•   according to the feature vectors (not shown) of the third targets in the labeled third sample images in the training set, feature correlation mining is performed on the fifth targets, and the fourth targets and the first sample images where the fourth targets are located are determined as the fourth sample images 24.
  • the fourth sample image 24 is manually labeled and added to the training set 25, so as to further expand the labeled images in the training set.
•   the training set 25 includes the labeled second sample images, third sample images and fourth sample images. The training set 25 is resampled to balance the number of positive and negative samples and the number of positive samples of different categories, obtaining a resampled training set 26; the target detection network is then trained according to the resampled training set 26, thereby completing the entire training process.
  • a target detection method comprising:
  • the target detection network trained by the above method can be deployed to realize the target detection of the image to be processed.
  • the image to be processed may be, for example, an image collected by an image collection device (eg, a camera), and the image may include a target to be detected, such as a human body, a face, a vehicle, an object, and the like. This embodiment of the present disclosure does not limit this.
  • the to-be-processed image may be input into a target detection network for processing to obtain a target detection result of the to-be-processed image.
  • the target detection result includes the position and category of the target in the image to be processed, such as the detection frame where the face in the image to be processed is located and the identity corresponding to the face.
  • the active learning mining method is used to mine potential unlabeled data
•   the semi-supervised learning method is used to assist in labeling the unlabeled data
•   the quantity of positive sample data is expanded, thereby solving the problems of large data scale and difficulty in collecting positive samples in large-scale long-tail detection, and to a certain extent alleviating the problem of imbalance between positive and negative samples.
  • the model performance is effectively improved in the environment of limited annotation and computing resources.
•   the target detection network is trained by means of resampling, which can reduce the negative impact of the imbalance between positive and negative samples on network training, and alleviate the negative impact of the imbalance between different categories of positive samples on network training, so that the target detection network can effectively converge during training and the network performance is improved.
•   in the network training method, by using the active learning method, potentially high-value samples that are helpful for model improvement can be mined from a huge amount of unlabeled data, effectively improving model performance in an environment of limited labeling and computing resources and saving much of the manpower and computing cost required to apply deep learning models to new businesses; by using the resampling method, the target detection network can be trained effectively with unbalanced samples and without much manual parameter-tuning intervention, saving the labor cost required to apply deep learning models to new businesses.
•   the network training method according to the embodiments of the present disclosure can be applied to fields such as intelligent video analysis and security.
•   the method can be used to rapidly iterate the detection network of online potential-target detection applications in intelligent video analysis or intelligent monitoring, quickly meeting the performance requirements of the business with less labor and computing cost, and network performance can continue to be improved afterwards.
  • the network training method of the embodiments of the present disclosure can be applied to online intelligent video analysis or intelligent monitoring, so as to rapidly iterate online potential target detection applications in intelligent video analysis or intelligent monitoring under limited labor and computing resources It can quickly achieve the performance requirements required by the business with less labor and computing costs, and can continue to improve the performance of the model afterwards.
•   the embodiments of the present disclosure also provide a network training device, a target detection device, an electronic device, a computer-readable storage medium, and a program, all of which can be used to implement any network training method or target detection method provided by the embodiments of the present disclosure; for the corresponding technical solutions and descriptions, refer to the corresponding records in the method section, which will not be repeated.
•   FIG. 3 shows a block diagram of a network training apparatus according to an embodiment of the present disclosure; the apparatus includes a processor (not shown in FIG. 3) configured to execute a program part stored in a memory (not shown in FIG. 3). As shown in FIG. 3, the program part stored in the memory includes:
  • the target detection part 31 is configured to input the unlabeled first sample image into the target detection network for processing, and obtain the target detection result of the first sample image, and the target detection result includes the first sample image image area, feature information and classification probability of the target;
  • a confidence level determination part 32 configured to determine the category confidence level of the object according to the classification probability of the object
  • the labeling part 33 is configured to take the first sample image where the first target is located as the marked second sample image for the first target whose category confidence is greater than or equal to the first threshold in the target, and add In the training set, the annotation information of the second sample image includes the image area of the first target and the category corresponding to the category confidence of the first target, and the training set includes the labeled third sample image ;
•   the feature mining part 34 is configured to, for second targets among the targets whose category confidence is less than the first threshold, perform feature correlation mining on the second targets according to the feature information of the third target in the third sample image, determine from the second targets the fourth target and the first sample image where the fourth target is located, and add the first sample image where the fourth target is located to the training set as the fourth sample image;
  • the training part 35 is configured to train the target detection network according to the label information of the fourth sample image, the second sample image, the third sample image and the fourth sample image in the training set.
•   the training part includes: a sampling quantity determination sub-part, configured to determine separately, according to the categories of the targets in the positive sample images of the training set, the first number of samples sampled from the positive sample images of each category, the positive sample images being sample images that include a target in the image; a first sampling subsection, configured to sample the positive sample images of each category according to the first number of each category to obtain a plurality of fifth sample images; a second sampling subsection, configured to sample the negative sample images of the training set to obtain a plurality of sixth sample images, the negative sample images being sample images that do not include a target; and a training subsection, configured to train the target detection network according to the fifth sample image and the sixth sample image.
•   the feature mining part includes: an information entropy determination sub-section, configured to determine the information entropy of the second target according to the classification probability of the second target; a target selection sub-section, configured to select a fifth target from the second targets according to the category confidence and information entropy of the second target; a mining quantity determination subsection, configured to determine, according to the category of the third target in the third sample image and the total number of sample images to be mined, the second quantity of sample images to be mined for each category respectively;
•   a target and image determination sub-section, configured to determine, from the fifth targets, the fourth target and the first sample image where the fourth target is located, according to the feature information of the third target in the third sample image, the feature information of the fifth target and the second number of sample images to be mined in each category.
•   the target selection sub-section is configured to: sort the second targets respectively according to the category confidence and information entropy of the second targets, and select a third number of sixth targets and a fourth number of seventh targets; and combine the sixth targets and the seventh targets to obtain the fifth targets.
  • the mining quantity determination subsection is configured to: determine the proportion of the third objects of each category according to the category of the third object in the third sample image; The proportion of the three targets determines the sampling proportion of each category; according to the sampling proportion of each category, the second quantity of sample images to be mined in each category is determined respectively.
•   the target and image determination subsection is configured to: determine, according to the distance between the feature information of the third targets of the first category and the feature information of each fifth target, the third target with the smallest distance from each fifth target as the eighth target, the first category being any one of the categories of the third targets; and determine the target with the largest distance among the eighth targets as the fourth target.
•   the target and image determination subsection is further configured to: add the determined fourth target to the third targets of the first category, and remove the determined fourth target from the unlabeled fifth targets.
  • the apparatus further includes: a feature extraction part, configured to input the third sample image into the target detection network for processing to obtain a third target in the third sample image characteristic information.
  • the apparatus further includes: a pre-training part configured to pre-train the target detection network by using the labeled third sample image.
  • the first sample image includes a long-tail image.
  • the sampling quantity determination sub-section is further configured to: in the category of the target according to the positive sample images of the training set, respectively determine the number of samples sampled from the positive sample images of each category. Before a certain number, the positive sample images and negative sample images in the training set are sampled to obtain the same or similar number of positive sample images and negative sample images.
  • the total number of sample images to be mined is 5% to 25% of the total number of the first sample images.
•   the target selection subsection is further configured to: remove, from the sixth targets, targets that are the same as the seventh targets to obtain the remaining targets of the sixth targets that differ from the seventh targets; and take the remaining targets and the seventh targets as the fifth targets.
•   in a possible implementation, after the third target with the smallest distance from each fifth target among the third targets of the first category is determined as the eighth target according to the distance between the feature information of the third targets of the first category and the feature information of each fifth target, the determination of the eighth target is ended when the number of the first sample images where the fourth targets are located reaches the second number of sample images to be mined of the first category.
•   the target and image determination subsection is further configured to: after determining, according to the distance between the feature information of the third targets of the first category and the feature information of each fifth target, the third target with the smallest distance from each fifth target among the third targets of the first category as the eighth target, end the determination of the eighth target when the number of the first sample images where the fourth targets are located does not reach the second number and the fifth targets are exhausted.
•   the feature extraction part is further configured to: input the third sample image into the target detection network to obtain a feature vector output by a hidden layer of the target detection network; and determine the feature vector as the feature information of the third target.
  • a target detection apparatus includes: a detection processing part configured to input an image to be processed into a target detection network for processing, and obtain a target detection result of the to-be-processed image, where The target detection result includes the position and category of the target in the to-be-processed image, and the target detection network is trained according to the above-mentioned network training method.
•   the functions or included parts of the apparatus provided in the embodiments of the present disclosure may be configured to execute the methods described in the above method embodiments; for specific implementation, refer to the above method embodiments. For brevity, details are not repeated here.
•   a "part" may be part of a circuit, part of a processor, or part of a program or software, and so on; it may also be a unit, and may be modular or non-modular.
  • Embodiments of the present disclosure further provide a computer-readable storage medium, on which computer program instructions are stored, and when the computer program instructions are executed by a processor, the foregoing method is implemented.
  • the computer-readable storage medium may be a non-volatile computer-readable storage medium.
  • An embodiment of the present disclosure further provides an electronic device, including: a processor; a memory configured to store instructions executable by the processor; wherein the processor is configured to invoke the instructions stored in the memory to execute the above method.
  • Embodiments of the present disclosure also provide a computer program product, including computer-readable code, when the computer-readable code is run on a device, a processor in the device executes a network training method configured to implement the network training method provided in any of the above embodiments Or directives for object detection methods.
  • Embodiments of the present disclosure also provide another computer program product configured to store computer-readable instructions, which, when executed, cause the computer to perform the operations of the network training method or the target detection method provided by any of the foregoing embodiments.
  • the electronic device may be provided as a terminal, server or other form of device.
  • FIG. 4 shows a block diagram of an electronic device 800 according to an embodiment of the present disclosure.
•   the electronic device 800 may be a terminal such as a mobile phone, computer, digital broadcast terminal, messaging device, game console, tablet device, medical device, fitness device, or personal digital assistant.
  • the electronic device 800 may include one or more of the following components: a processing component 802, a memory 804, a power supply component 806, a multimedia component 808, an audio component 810, an input/output (I/O) interface 812, a sensor component 814 , and the communication component 816 .
  • the processing component 802 generally controls the overall operation of the electronic device 800, such as operations associated with display, phone calls, data communications, camera operations, and recording operations.
  • the processing component 802 can include one or more processors 820 to execute instructions to perform all or some of the steps of the methods described above.
  • processing component 802 may include one or more modules that facilitate interaction between processing component 802 and other components.
  • processing component 802 may include a multimedia module to facilitate interaction between multimedia component 808 and processing component 802.
  • Memory 804 is configured to store various types of data to support operation at electronic device 800 . Examples of such data include instructions for any application or method operating on electronic device 800, contact data, phonebook data, messages, pictures, videos, and the like. Memory 804 may be implemented by any type of volatile or nonvolatile storage device or combination thereof, such as static random access memory (SRAM), electrically erasable programmable read only memory (EEPROM), erasable Programmable Read Only Memory (EPROM), Programmable Read Only Memory (PROM), Read Only Memory (ROM), Magnetic Memory, Flash Memory, Magnetic or Optical Disk.
  • Power supply assembly 806 provides power to various components of electronic device 800 .
  • Power supply components 806 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power to electronic device 800 .
  • Multimedia component 808 includes a screen that provides an output interface between the electronic device 800 and the user.
  • the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from a user.
  • the touch panel includes one or more touch sensors to sense touch, swipe, and gestures on the touch panel. The touch sensor may not only sense the boundaries of a touch or swipe action, but also detect the duration and pressure associated with the touch or swipe action.
  • the multimedia component 808 includes a front-facing camera and/or a rear-facing camera. When the electronic device 800 is in an operation mode, such as a shooting mode or a video mode, the front camera and/or the rear camera may receive external multimedia data. Each of the front and rear cameras can be a fixed optical lens system or have focal length and optical zoom capability.
  • Audio component 810 is configured to output and/or input audio signals.
  • audio component 810 includes a microphone (MIC) that is configured to receive external audio signals when electronic device 800 is in operating modes, such as calling mode, recording mode, and voice recognition mode.
  • the received audio signal may be further stored in memory 804 or transmitted via communication component 816 .
  • audio component 810 also includes a speaker for outputting audio signals.
  • the I/O interface 812 provides an interface between the processing component 802 and a peripheral interface module, which may be a keyboard, a click wheel, a button, or the like. These buttons may include, but are not limited to: home button, volume buttons, start button, and lock button.
	• Sensor assembly 814 includes one or more sensors for providing status assessments of various aspects of the electronic device 800.
	• the sensor assembly 814 can detect the on/off state of the electronic device 800 and the relative positioning of components, such as the display and keypad of the electronic device 800; the sensor assembly 814 can also detect a change in position of the electronic device 800 or of a component of the electronic device 800, the presence or absence of user contact with the electronic device 800, the orientation or acceleration/deceleration of the electronic device 800, and changes in the temperature of the electronic device 800.
  • Sensor assembly 814 may include a proximity sensor configured to detect the presence of nearby objects in the absence of any physical contact.
  • Sensor assembly 814 may also include a light sensor, such as a complementary metal oxide semiconductor (CMOS) or charge coupled device (CCD) image sensor, for use in imaging applications.
  • the sensor assembly 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
  • Communication component 816 is configured to facilitate wired or wireless communication between electronic device 800 and other devices.
  • the electronic device 800 may access a wireless network based on a communication standard, such as wireless network (WiFi), second generation mobile communication technology (2G) or third generation mobile communication technology (3G), or a combination thereof.
  • the communication component 816 receives broadcast signals or broadcast related information from an external broadcast management system via a broadcast channel.
  • the communication component 816 also includes a near field communication (NFC) module to facilitate short-range communication.
  • the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, Infrared Data Association (IrDA) technology, Ultra Wide Band (UWB) technology, Bluetooth (BT) technology and other technologies.
	• electronic device 800 may be implemented by one or more application specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components, for performing the above method.
  • a non-volatile computer-readable storage medium such as a memory 804 comprising computer program instructions executable by the processor 820 of the electronic device 800 to perform the above method is also provided.
  • FIG. 5 shows a block diagram of an electronic device 1900 according to an embodiment of the present disclosure.
  • the electronic device 1900 may be provided as a server.
  • electronic device 1900 includes processing component 1922, which further includes one or more processors, and a memory resource represented by memory 1932 configured to store instructions executable by processing component 1922, such as an application program.
  • An application program stored in memory 1932 may include one or more modules, each corresponding to a set of instructions.
  • the processing component 1922 is configured to execute instructions to perform the above-described methods.
	• the electronic device 1900 may also include a power supply assembly 1926 configured to perform power management of the electronic device 1900, a wired or wireless network interface 1950 configured to connect the electronic device 1900 to a network, and an input/output (I/O) interface 1958.
  • the electronic device 1900 can operate based on an operating system stored in the memory 1932, such as a Microsoft server operating system (Windows Server TM ), a graphical user interface based operating system (Mac OS X TM ) introduced by Apple, a multi-user multi-process computer operating system (Unix TM ), Free and Open Source Unix-like Operating System (Linux TM ), Open Source Unix-like Operating System (FreeBSD TM ) or the like.
  • a non-volatile computer-readable storage medium such as memory 1932 comprising computer program instructions executable by processing component 1922 of electronic device 1900 to perform the above-described method.
  • Embodiments of the present disclosure may be systems, methods and/or computer program products.
  • the computer program product may include a computer-readable storage medium having computer-readable program instructions loaded thereon for causing a processor to implement various aspects of the embodiments of the present disclosure.
  • a computer-readable storage medium may be a tangible device that can hold and store instructions for use by the instruction execution device.
  • the computer-readable storage medium may be, for example, but not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing.
	• A non-exhaustive list of computer-readable storage media includes: portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static random access memory (SRAM), portable compact disc read-only memory (CD-ROM), digital versatile discs (DVD), memory sticks, floppy disks, and mechanically encoded devices, such as punch cards or raised structures in grooves with instructions stored thereon, as well as any suitable combination of the above.
	• Computer-readable storage media are not to be construed as transient signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through waveguides or other transmission media (e.g., light pulses through fiber optic cables), or electrical signals transmitted through electrical wires.
  • the computer readable program instructions described herein may be downloaded to various computing/processing devices from a computer readable storage medium, or to an external computer or external storage device over a network such as the Internet, a local area network, a wide area network, and/or a wireless network.
  • the network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers.
	• a network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in a computer-readable storage medium within the respective computing/processing device.
	• Computer program instructions for carrying out operations of the present disclosure may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source or object code written in any combination of one or more programming languages, including object-oriented programming languages, such as Smalltalk and C++, and conventional procedural programming languages, such as the "C" language or similar programming languages.
	• the computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server.
	• the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (e.g., connected through the Internet using an Internet service provider).
	• custom electronic circuits, such as programmable logic circuits, field programmable gate arrays (FPGAs), or programmable logic arrays (PLAs), can be personalized by utilizing state information of the computer-readable program instructions, and these electronic circuits can execute the computer-readable program instructions to implement various aspects of the present disclosure.
	• These computer-readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, when executed by the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams.
	• These computer-readable program instructions may also be stored in a computer-readable storage medium; these instructions cause a computer, programmable data processing apparatus, and/or other equipment to operate in a specific manner, so that the computer-readable medium on which the instructions are stored comprises an article of manufacture that includes instructions implementing various aspects of the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams.
	• Computer-readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other equipment to cause a series of operational steps to be performed on the computer, other programmable data processing apparatus, or other equipment, so as to produce a computer-implemented process, whereby the instructions executing on the computer, other programmable data processing apparatus, or other device implement the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams.
	• each block in the flowcharts or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s).
  • the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved.
	• each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented in dedicated hardware-based systems that perform the specified functions or actions, or in a combination of dedicated hardware and computer instructions.
  • the computer program product can be specifically implemented by hardware, software or a combination thereof.
	• the computer program product is embodied as a computer storage medium; in another optional embodiment, the computer program product is embodied as a software product, such as a software development kit (Software Development Kit, SDK) or the like.
  • the embodiments of the present disclosure relate to a network training method and apparatus, a target detection method and apparatus, and an electronic device.
	• the network training method includes: inputting an unlabeled sample image into a target detection network for processing to obtain a target detection result, the result including the image area, feature information, and classification probability of each target; determining the category confidence of a target according to its classification probability; for a first target whose category confidence is greater than or equal to a first threshold, taking the sample image in which the first target is located as a labeled image and adding it to the training set; for a second target whose category confidence is less than the first threshold, performing feature-related mining on the second target, determining a fourth target from the second target, and adding the sample image in which the fourth target is located to the training set; and training the target detection network according to the sample images in the training set.
  • the embodiments of the present disclosure can improve the training effect of the target detection network.
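The selection logic summarized above can be sketched as follows. This is an illustrative approximation only, not the patented implementation: the helper names (`category_confidence`, `split_targets`, `mine_by_features`), the use of the maximum classification probability as the category confidence, and cosine similarity as the feature-mining criterion are all assumptions introduced for this sketch.

```python
import numpy as np

def category_confidence(class_probs):
    """Category confidence of a detected target, taken here as the
    maximum classification probability (one plausible reading of the claim)."""
    return float(np.max(class_probs))

def split_targets(detections, threshold=0.9):
    """Split detections into confident ("first") targets, whose images become
    pseudo-labeled training images, and uncertain ("second") targets."""
    confident, uncertain = [], []
    for det in detections:
        conf = category_confidence(det["probs"])
        (confident if conf >= threshold else uncertain).append(det)
    return confident, uncertain

def mine_by_features(uncertain, confident, sim_threshold=0.8):
    """Feature-related mining: keep an uncertain target (a candidate "fourth"
    target) if its feature vector is cosine-similar to some confident target
    of the same predicted category."""
    mined = []
    for u in uncertain:
        for c in confident:
            if int(np.argmax(u["probs"])) != int(np.argmax(c["probs"])):
                continue  # only compare targets predicted as the same category
            sim = np.dot(u["feat"], c["feat"]) / (
                np.linalg.norm(u["feat"]) * np.linalg.norm(c["feat"]) + 1e-12)
            if sim >= sim_threshold:
                mined.append(u)
                break
    return mined
```

In the disclosed method, the sample images of both the confident targets and the mined targets would then be added to the training set before the next round of training the detection network.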

Abstract

Embodiments of the present disclosure relate to a network training method and apparatus, a target detection method and apparatus, and an electronic device. The network training method comprises: inputting an unlabeled sample image into a target detection network to obtain a target detection result, the result comprising an image area, feature information, and a classification probability of a target; determining a category confidence of the target according to the classification probability of the target; for a first target whose category confidence is greater than or equal to a certain threshold, using the sample image in which the first target is located as a labeled image and adding the labeled image to a training set; for a second target whose category confidence is less than a first threshold, performing feature-related mining on the second target, determining a fourth target, and adding the sample image in which the fourth target is located to the training set; and training the target detection network according to the training set.
PCT/CN2020/125972 2020-07-15 2020-11-02 Network training method and apparatus, target detection method and apparatus, and electronic device WO2022011892A1 (fr)

Priority Applications (2)

Application Number Priority Date Filing Date Title
JP2021569189A JP2022544893A (ja) 2020-07-15 2020-11-02 ネットワーク訓練方法及び装置、ターゲット検出方法及び装置並びに電子機器
KR1020217038227A KR20220009965A (ko) 2020-07-15 2020-11-02 네트워크 트레이닝 방법 및 장치, 타깃 검출 방법 및 장치와 전자 기기

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010681178.2 2020-07-15
CN202010681178.2A CN111881956B (zh) 2020-07-15 2020-07-15 网络训练方法及装置、目标检测方法及装置和电子设备

Publications (1)

Publication Number Publication Date
WO2022011892A1 true WO2022011892A1 (fr) 2022-01-20

Family

ID=73154466

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/125972 WO2022011892A1 (fr) Network training method and apparatus, target detection method and apparatus, and electronic device

Country Status (5)

Country Link
JP (1) JP2022544893A (fr)
KR (1) KR20220009965A (fr)
CN (1) CN111881956B (fr)
TW (1) TWI780751B (fr)
WO (1) WO2022011892A1 (fr)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115100419A (zh) * 2022-07-20 2022-09-23 中国科学院自动化研究所 目标检测方法、装置、电子设备及存储介质
CN115470910A (zh) * 2022-10-20 2022-12-13 晞德软件(北京)有限公司 基于贝叶斯优化及K-center采样的自动调参方法
CN115601749A (zh) * 2022-12-07 2023-01-13 赛维森(广州)医疗科技服务有限公司(Cn) 基于特征峰值图谱的病理图像分类方法、图像分类装置

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112446335A (zh) * 2020-12-02 2021-03-05 电子科技大学中山学院 一种基于深度学习的太赫兹违禁物品检测方法
CN112541928A (zh) * 2020-12-18 2021-03-23 上海商汤智能科技有限公司 网络训练方法及装置、图像分割方法及装置和电子设备
CN112581472B (zh) * 2021-01-26 2022-09-02 中国人民解放军国防科技大学 一种面向人机交互的目标表面缺陷检测方法
CN113052244B (zh) * 2021-03-30 2023-05-26 歌尔股份有限公司 一种分类模型训练方法和一种分类模型训练装置
CN113111960B (zh) * 2021-04-25 2024-04-26 北京文安智能技术股份有限公司 图像处理方法和装置以及目标检测模型的训练方法和系统
CN113159209A (zh) * 2021-04-29 2021-07-23 深圳市商汤科技有限公司 目标检测方法、装置、设备和计算机可读存储介质
CN113344086B (zh) * 2021-06-16 2022-07-01 深圳市商汤科技有限公司 人机回圈方法、装置、系统、电子设备和存储介质
CN113748430A (zh) * 2021-06-28 2021-12-03 商汤国际私人有限公司 对象检测网络的训练与检测方法、装置、设备和存储介质
CN113486957A (zh) * 2021-07-07 2021-10-08 西安商汤智能科技有限公司 神经网络训练和图像处理方法及装置

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104156438A (zh) * 2014-08-12 2014-11-19 德州学院 一种基于置信度和聚类的未标记样本选择的方法
CN104318242A (zh) * 2014-10-08 2015-01-28 中国人民解放军空军工程大学 一种高效的svm主动半监督学习算法
CN108985334A (zh) * 2018-06-15 2018-12-11 广州深域信息科技有限公司 基于自监督过程改进主动学习的通用物体检测系统及方法
US20190073447A1 (en) * 2017-09-06 2019-03-07 International Business Machines Corporation Iterative semi-automatic annotation for workload reduction in medical image labeling
US20190378044A1 (en) * 2013-12-23 2019-12-12 Groupon, Inc. Processing dynamic data within an adaptive oracle-trained learning system using curated training data for incremental re-training of a predictive model

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107341434A (zh) * 2016-08-19 2017-11-10 北京市商汤科技开发有限公司 视频图像的处理方法、装置和终端设备
CN107665353A (zh) * 2017-09-15 2018-02-06 平安科技(深圳)有限公司 基于卷积神经网络的车型识别方法、装置、设备及计算机可读存储介质
CN108764281A (zh) * 2018-04-18 2018-11-06 华南理工大学 一种基于半监督自步学习跨任务深度网络的图像分类方法
US10810460B2 (en) * 2018-06-13 2020-10-20 Cosmo Artificial Intelligence—AI Limited Systems and methods for training generative adversarial networks and use of trained generative adversarial networks
CN109034190B (zh) * 2018-06-15 2022-04-12 拓元(广州)智慧科技有限公司 一种动态选择策略的主动样本挖掘的物体检测系统及方法
JP6678709B2 (ja) * 2018-08-24 2020-04-08 株式会社東芝 情報処理装置、情報処理方法およびプログラム
CN109766991A (zh) * 2019-01-14 2019-05-17 电子科技大学 一种采用对抗性训练的人工智能优化系统及方法



Also Published As

Publication number Publication date
CN111881956A (zh) 2020-11-03
TWI780751B (zh) 2022-10-11
CN111881956B (zh) 2023-05-12
TW202205151A (zh) 2022-02-01
KR20220009965A (ko) 2022-01-25
JP2022544893A (ja) 2022-10-24

Similar Documents

Publication Publication Date Title
WO2022011892A1 (fr) Network training method and apparatus, target detection method and apparatus, and electronic device
CN107491541B (zh) Text classification method and apparatus
TWI766286B (zh) Image processing method and image processing apparatus, electronic device, and computer-readable storage medium
US11120078B2 (en) Method and device for video processing, electronic device, and storage medium
US20210248718A1 (en) Image processing method and apparatus, electronic device and storage medium
WO2022166069A1 (fr) Deep learning network determination method and apparatus, electronic device, and storage medium
WO2021036382A9 (fr) Image processing method and apparatus, electronic device, and storage medium
WO2021027343A1 (fr) Face image recognition method and apparatus, electronic device, and storage medium
CN113326768B (zh) Training method, image feature extraction method, image recognition method, and apparatus
CN111931844B (zh) Image processing method and apparatus, electronic device, and storage medium
CN113792207A (zh) Cross-modal retrieval method based on multi-level feature representation alignment
CN111539410B (zh) Character recognition method and apparatus, electronic device, and storage medium
WO2022247103A1 (fr) Image processing method and apparatus, electronic device, and computer-readable storage medium
CN110532956B (zh) Image processing method and apparatus, electronic device, and storage medium
WO2022021901A1 (fr) Target detection method and apparatus, electronic device, and storage medium
CN109685041B (zh) Image analysis method and apparatus, electronic device, and storage medium
WO2022099989A1 (fr) Liveness recognition and access control device control methods, apparatus, electronic device, storage medium, and computer program
WO2022247128A1 (fr) Image processing method and apparatus, electronic device, and storage medium
CN111259967A (zh) Image classification and neural network training methods, apparatus, device, and storage medium
CN111242303A (zh) Network training method and apparatus, image processing method and apparatus
WO2022247091A1 (fr) Crowd positioning method and apparatus, electronic device, and storage medium
WO2022193456A1 (fr) Target tracking method, apparatus, electronic device, and storage medium
WO2022141969A1 (fr) Image segmentation method and apparatus, electronic device, storage medium, and program
CN111523599B (zh) Target detection method and apparatus, electronic device, and storage medium
WO2023173659A1 (fr) Face matching method and apparatus, electronic device, storage medium, computer program product, and computer program

Legal Events

Date Code Title Description
ENP Entry into the national phase

Ref document number: 2021569189

Country of ref document: JP

Kind code of ref document: A

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20945441

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20945441

Country of ref document: EP

Kind code of ref document: A1