CN110598609A - Weakly supervised target detection method based on saliency guidance

Weakly supervised target detection method based on saliency guidance

Info

Publication number: CN110598609A; application number CN201910824612.5A; granted as CN110598609B
Authority: CN (China)
Prior art keywords: image, target, training, anchor, saliency
Other languages: Chinese (zh)
Inventors: Zhao Danpei (赵丹培), Yuan Zhichao (袁志超), Shi Zhenwei (史振威), Jiang Zhiguo (姜志国), Xie Fengying (谢凤英), Zhang Haopeng (张浩鹏)
Current and original assignee: Beijing University of Aeronautics and Astronautics
Application filed by Beijing University of Aeronautics and Astronautics
Filing/priority date: 2019-09-02
Legal status: Granted; Active

Classifications

    • G06N3/045 Computing arrangements based on biological models; neural networks; architecture; combinations of networks
    • G06N3/08 Computing arrangements based on biological models; neural networks; learning methods
    • G06V10/267 Image preprocessing; segmentation of patterns in the image field by performing operations on regions, e.g. growing, shrinking or watersheds
    • G06V10/462 Extraction of image or video features; salient features, e.g. scale invariant feature transforms [SIFT]
    • G06V20/20 Scenes; scene-specific elements in augmented reality scenes
    • G06V2201/07 Indexing scheme relating to image or video recognition or understanding; target detection

Abstract

The invention discloses a saliency-guided weakly supervised target detection method that uses salient regions in an image as guidance information for target detection and combines them with image-level labels to form pseudo labels for training a supervised deep target detection network. During training, only image-level labeled samples need to be provided to the network, so a large amount of image annotation work is avoided, time and labor costs are reduced, and the method suits practical engineering needs. The model needs no additional algorithm to provide region proposals; once network training is complete, target detection is performed with a single deep network that is simple to use and fast, with a particularly prominent accuracy advantage on images containing a single target.

Description

Weakly supervised target detection method based on saliency guidance
Technical Field
The invention relates to the technical field of computer vision and image processing, in particular to a saliency-guided weakly supervised target detection method.
Background
With the development of convolutional neural networks (CNNs), many target detection algorithms have emerged. Although these CNN-based detection algorithms can achieve high accuracy, they all rely for training on a large number of samples with object-level labels (a bounding box drawn for every target). Moreover, entirely different databases must be constructed for different detection tasks. In practical applications, acquiring a large number of training samples sometimes demands great expenditure of labor and time, and sometimes the samples cannot be obtained at all. This has become a bottleneck in applying CNN-based target detection algorithms.
To address the difficulty of obtaining object-level labels, target detection algorithms based on weakly supervised learning have been developed. These algorithms are also CNN-based but differ in that training no longer uses object-level labels, only image-level labels (which indicate only whether an object is present in the image). On the one hand, image-level annotation is far easier than object-level annotation, so a training dataset can be constructed with higher efficiency. On the other hand, thanks to search engines, samples with specific image-level labels can easily be collected even from the web, further reducing the workload of dataset construction.
Current target detection algorithms divide mainly into traditional methods and deep-learning-based methods. The main idea of traditional algorithms is to extract the target's features and distinguish the target from the background, chiefly by three approaches: image-processing-based, visual-saliency-based, and machine-learning-based target detection. Although they kept evolving, traditional methods are always limited by the capability of hand-crafted features. The advent of deep learning completely changed the landscape of target detection algorithms. Deep-learning-based target detection has developed mainly in two directions: region-proposal-based methods and regression-based methods; the former achieve higher detection accuracy, the latter higher detection speed.
Deep-learning-based target detection far outperforms traditional methods in both accuracy and speed, but it shares the common problem of CNNs: training requires a large number of object-level labeled samples. Obtaining them is very difficult; manual labeling demands much manpower and a long time, and these time and labor costs severely limit the wide application of deep learning in practical engineering.
Therefore, the problem that those skilled in the art urgently need to solve is how to provide a new weakly supervised target detection method that can complete the detection task efficiently and accurately without a large-scale object-level dataset.
Disclosure of Invention
In view of this, the present invention provides a saliency-guided weakly supervised target detection method. In the field of target detection, weakly supervised learning mainly refers to training with only image-level labeled samples instead of object-level labeled samples. Unlike object-level labels, image-level labels provide only the class labels of the image, i.e. which classes of objects the image contains, without other information such as the positions and number of the objects; this labeling mode markedly reduces the workload of constructing a training dataset. The method uses the saliency detection result as guidance information and combines it with image-level labels to form pseudo labels that supervise the training of the deep target detection network. During training, only image-level labeled samples need to be provided to the network, so a large amount of image annotation work is avoided and the time and labor costs are low, suiting practical engineering environments. The model needs no additional algorithm to provide region proposals; once network training is complete, detection is performed with a single deep network that is simple to use and fast, with a particularly prominent accuracy advantage on images containing a single target.
To achieve the above purpose, the invention provides the following technical solution:
A saliency-guided weakly supervised target detection method comprises the following specific steps (a high-level code sketch of the pipeline follows the list):
Step 1: saliency detection is performed on the input training image with a learned saliency model to obtain the visual saliency detection result of the input training image;
Step 2: the visual saliency detection result is segmented with an adaptive threshold, converting the grayscale saliency map into a binary saliency map, and isolated noise points are removed with morphological operations;
Step 3: the boundary of the binary saliency map generated in step 2 is taken as the position of the target in the training image and, combined with the target category information provided by the image-level label, a pseudo label containing both the target category and its position is constructed and stored matched with the training image; it is then judged whether pseudo labels have been generated for all training images, i.e. whether the traversal of the images in the training-set list is complete; if so, the detection network training stage begins with step 4, otherwise pseudo-label generation continues with step 1;
Step 4: the training image with its pseudo label is input into a convolutional neural network for feature extraction, yielding multi-scale feature maps; the multi-scale feature maps are densely sampled, and the prediction boxes are preliminarily refined with the pseudo labels;
Step 5: the multi-scale feature maps are fused by deconvolution, the fused feature maps are trained with full supervision using the pseudo labels, and the refined detection boxes are classified and regressed; when the number of training rounds reaches the set number, step 6 is executed, otherwise step 4;
Step 6: the image to be detected is input into the trained convolutional neural network to obtain the detection result.
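The following minimal Python sketch shows how steps 1 to 6 fit together; every helper name in it (saliency_fn, threshold_fn, box_fn, the detector object and its methods) is a hypothetical placeholder, not a name from this disclosure:

```python
# Hedged sketch of the overall pipeline; every helper passed in here is a
# hypothetical placeholder standing for the corresponding step of the method.
def generate_pseudo_labels(images, image_classes, saliency_fn, threshold_fn, box_fn):
    """Steps 1-3: one pseudo label per training image."""
    pseudo = []
    for img, cls in zip(images, image_classes):
        gray = saliency_fn(img)            # step 1: grayscale saliency map
        binary = threshold_fn(gray)        # step 2: Otsu threshold + opening
        x1, y1, x2, y2 = box_fn(binary)    # step 3: box around salient region
        pseudo.append({"box": (x1, y1, x2, y2), "class": cls})
    return pseudo

def train_and_detect(detector, images, pseudo, num_rounds_E, test_image):
    for _ in range(num_rounds_E):          # steps 4-5: supervised training
        for img, label in zip(images, pseudo):
            detector.train_step(img, label)
    return detector.predict(test_image)    # step 6: single-network inference
```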
Preferably, in the above saliency-guided weakly supervised target detection method, step 1 obtains the visual saliency detection result of the input training image; the specific steps include:
S11: for a training image $X_m$, where m denotes the m-th image, each pixel is denoted $x_{mn}$, where n denotes the n-th pixel in the image; the training image $X_m$ is divided into two regions $C_{m1}$ and $C_{m2}$, representing the salient region and the background region respectively, with $X_m = C_{m1} \cup C_{m2}$;
S12: an embedding function model is constructed with a deep neural network $\phi$ with parameters $\theta$, which maps each pixel of the input training image to a D-dimensional vector:

$\phi_{mn} = \phi(x_{mn}; \theta)$ (1)

and a deep neural network $\psi$ with parameters $\eta$ likewise maps the salient region and the background region to D-dimensional vectors $\mu_{mk}$:

$\mu_{mk} = \psi(C_{mk}; \eta), \quad k = 1, 2$ (2)

where $C_{mk}$ is a region of the training image $X_m$, k = 1 denoting the salient region and k = 2 the background region;
S13: the probability that pixel $x_{mn}$ falls in region $C_{mk}$ is expressed with a Softmax function:

$P(x_{mn} \in C_{mk}) = \frac{\exp(-d(\phi_{mn}, \mu_{mk}))}{\sum_{k'=1}^{2} \exp(-d(\phi_{mn}, \mu_{mk'}))}$ (3)

where $\phi_{mn}$ and $\mu_{mk}$ are the projections computed by formulas (1) and (2), and $d(\cdot)$ denotes the Euclidean distance;
S14: a loss function is defined:

$L(\theta, \eta) = -\sum_{n} \left[ t_{mn} \log P(x_{mn} \in C_{m1}) + (1 - t_{mn}) \log P(x_{mn} \in C_{m2}) \right]$

where $t_{mn}$ is an indicator variable: $t_{mn} = 1$ means the pixel belongs to the salient region, i.e. $x_{mn} \in C_{m1}$, and $t_{mn} = 0$ means the pixel belongs to the background region, i.e. $x_{mn} \in C_{m2}$; the loss function is optimized by gradient descent to obtain the saliency detection result.
Preferably, in the above saliency-guided weakly supervised target detection method, step 2 converts the grayscale saliency map into a binary saliency map by threshold segmentation and removes noise points by morphological operations; the specific steps include:
S21: for the grayscale saliency map $I_g$, an adaptive threshold T is obtained by the maximum between-class variance (Otsu) method, and threshold segmentation yields the binary saliency map $I_b$;
S22: noise points are removed by an opening operation, i.e. erosion followed by dilation.
Preferably, in the above saliency-guided weakly supervised target detection method, in step 3 the binary saliency map provides the target position information, which is combined with the image-level label to generate a pseudo label, and it is judged whether the labeling of all training images is complete; the specific steps include:
S31: the morphologically processed binary saliency map $I'_b$ is divided into a salient region $S_1$ and a non-salient region $S_2$, with $I'_b = S_1 \cup S_2$; find the smallest rectangular region $R_{m_0}$ such that $S_1 \subseteq R_{m_0}$, i.e. no other rectangular region $R_{m_k}$ containing $S_1$ is smaller, where $m_0$ is the index of the minimal region and $m_k$ the indices of the other candidate regions; the corner coordinates $\{x_1, y_1, x_2, y_2\}$ of $R_{m_0}$ and the image-level label C of the training image form the pseudo label of the image, $L_m = \{x_1, y_1, x_2, y_2, C\}$;
S32: when all training images have been pseudo-labeled, the next training stage begins; otherwise step 1 continues to generate the pseudo labels.
Preferably, in the above saliency-guided weakly supervised target detection method, step 4 performs feature extraction with a convolutional neural network, proposes anchors for prediction, and refines the anchors with the pseudo labels; the specific steps include:
S41: feature extraction: VGG16 is used as the base network for feature extraction, with several additional convolutional layers appended; the input image is scaled to H × H; in total 4 feature maps are extracted, from Conv4_3, Conv5_3, Conv7 and the additional convolutional layer, with resolutions $\frac{H}{8} \times \frac{H}{8}$, $\frac{H}{16} \times \frac{H}{16}$, $\frac{H}{32} \times \frac{H}{32}$ and $\frac{H}{64} \times \frac{H}{64}$ respectively;
S42: multi-scale dense sampling of anchor points: anchors are sampled on the feature maps of the different scales simultaneously; on the 4 extracted feature maps a total of $N = 3\left[(\tfrac{H}{8})^2 + (\tfrac{H}{16})^2 + (\tfrac{H}{32})^2 + (\tfrac{H}{64})^2\right]$ anchor points is obtained;
S43: binary classification and preliminary regression of the obtained anchors: the anchor refining module classifies each anchor as target or background and fine-tunes its position; a refining loss function is defined:

$L_{refine} = \frac{1}{N_a}\left(\sum_i L_b(p_i, l_i) + \sum_i l_i \, L_r(x_i, x_i^*)\right)$

where $L_b$ and $L_r$ are the binary classification loss function and the regression loss function respectively; $l_i^*$ is the class of the object within the anchor, $l_i^* = 0$ indicating that the anchor belongs to the background and $l_i^* \ge 1$ that it belongs to some object class; $p_i$ is the probability that the anchor is a target; $x_i$ represents the coordinates of the anchor and $x_i^*$ the position of the target in the pseudo label; $N_a$ represents the number of anchors in the anchor refining module;
the binary classification uses an $L_2$ loss:

$L_b(p_i, l_i) = (p_i - l_i)^2$

with $l_i = 1$ if $l_i^* \ge 1$ and $l_i = 0$ otherwise;
the regression uses the smooth $L_1$ function:

$L_r = \sum_{k \in \{x, y, w, h\}} \operatorname{smooth}_{L_1}(t_k - t_k^*), \qquad \operatorname{smooth}_{L_1}(z) = \begin{cases} 0.5\,z^2, & |z| < 1 \\ |z| - 0.5, & \text{otherwise} \end{cases}$

where $t_k$ and $t_k^*$ represent the offsets of the refined anchor and of the pseudo label relative to the anchor before refining, i denotes the anchor index, and k ranges over x, y, w and h, the abscissa, ordinate, width and height of the anchor and pseudo label; they are defined as:

$t_x = \frac{x - x_a}{w_a}, \quad t_y = \frac{y - y_a}{h_a}, \quad t_w = \log\frac{w}{w_a}, \quad t_h = \log\frac{h}{h_a}$

where $x_i$, $x_{a_i}$ and $x_i^*$ represent the positions of the refined anchor, the anchor before refining, and the pseudo label respectively.
The criterion for positive-sample selection is an intersection-over-union of IoU > 0.5 with a target pseudo label of any class:

$\mathrm{IoU} = \frac{TP}{TP + FP + FN}$

where TP is the region that is target in both the prediction and the annotation, FP represents the region that is target in the prediction but background in the annotation, and FN represents the region that is background in the prediction but target in the annotation;
the criterion for negative-sample selection is to select the highest-scoring among all negative samples, with their number capped at 3 times the number of positive samples.
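As a hedged PyTorch sketch of the refining losses above; the smooth-$L_1$ form and the $(t_x, t_y, t_w, t_h)$ offset encoding follow the common detection convention, which is an assumption here rather than something printed in the original:

```python
import torch

def binary_l2_loss(p: torch.Tensor, l: torch.Tensor) -> torch.Tensor:
    """L_b: p = predicted objectness, l = 1 for positives (IoU > 0.5), else 0."""
    return ((p - l) ** 2).mean()

def smooth_l1(z: torch.Tensor, beta: float = 1.0) -> torch.Tensor:
    """Elementwise smooth-L1: 0.5*z^2/beta if |z| < beta, else |z| - 0.5*beta."""
    absz = z.abs()
    return torch.where(absz < beta, 0.5 * z ** 2 / beta, absz - 0.5 * beta)

def encode_offsets(box: torch.Tensor, anchor: torch.Tensor) -> torch.Tensor:
    """Standard (t_x, t_y, t_w, t_h) encoding of (cx, cy, w, h) boxes
    against anchors of the same shape (..., 4); an assumed convention."""
    tx = (box[..., 0] - anchor[..., 0]) / anchor[..., 2]
    ty = (box[..., 1] - anchor[..., 1]) / anchor[..., 3]
    tw = torch.log(box[..., 2] / anchor[..., 2])
    th = torch.log(box[..., 3] / anchor[..., 3])
    return torch.stack([tx, ty, tw, th], dim=-1)

def regression_loss(t_pred: torch.Tensor, t_star: torch.Tensor) -> torch.Tensor:
    """L_r over positive anchors: both arguments are (N, 4) offset tensors."""
    return smooth_l1(t_pred - t_star).sum(dim=1).mean()
```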
Preferably, in the above saliency-guided weakly supervised target detection method, step 5 fuses the multi-scale feature maps, enlarging the deep feature maps by deconvolution and fusing them with the shallow feature maps; the specific steps include:
S51: feature map fusion: for the 4 feature maps extracted from Conv4_3, Conv5_3, Conv7 and the additional convolutional layer, the smaller-scale map is enlarged by deconvolution and added linearly to the feature map of the adjacent preceding scale to obtain a fused feature map; 4 fused feature maps are obtained in this way; no dense anchor sampling is performed on the fused feature maps, and the anchors refined in step 4 serve as the anchor sampling points of this step;
S52: multi-class classification and regression of the anchors: on the basis of the anchors after the preliminary classification and regression, a finer multi-class classification and regression is performed; a detection loss function is defined:

$L_{detect} = \frac{1}{N_o}\left(\sum_i L_c(c_i, l_i^*) + \sum_i [l_i^* \ge 1] \, L_r(t_i, x_i^*)\right)$

where $L_c$ and $L_r$ are the classification loss function and the regression loss function respectively; $l_i^*$ is the class of the object within the anchor, $l_i^* = 0$ indicating that the anchor belongs to the background and $l_i^* \ge 1$ that it belongs to an object of some class; $c_i$ is the probability that the target belongs to class C; $t_i$ represents the coordinates of the predicted position and $x_i^*$ the position of the target in the pseudo label; $N_o$ represents the number of anchors in the detection module;
the classification uses an $L_2$ loss and the regression uses the smooth $L_1$ function, as in step 4;
S53: a multitask loss function is constructed jointly:

$L = L_{refine} + L_{detect}$

where $p_i$, $x_i$, $c_i$ and $t_i$ represent the probability of whether the anchor is a target, the position of the anchor, the probability that the target belongs to class C, and the coordinates of the predicted position respectively; cascade training is performed with the pseudo labels, a total number of training rounds E is set, and training of the network finishes when the number of rounds reaches E.
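A minimal sketch of the cascade training of S53, assuming a detector callable that returns the pair of refining and detection losses; this interface is an assumption for illustration, not part of the disclosure:

```python
def cascade_train(detector, optimizer, loader, num_rounds_E: int):
    """detector(images, pseudo_labels) is assumed to return (L_refine, L_detect);
    their sum is the joint multitask objective, trained for E rounds."""
    for _ in range(num_rounds_E):
        for images, pseudo_labels in loader:
            l_refine, l_detect = detector(images, pseudo_labels)
            loss = l_refine + l_detect
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```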
Preferably, in the above saliency-guided weakly supervised target detection method, the image to be detected is input into the trained convolutional neural network; the specific steps include: the image to be detected is input into the network, which scales it to the specified size; 4 feature maps are extracted by convolution and densely sampled to obtain anchors; the densely sampled anchors are screened by the anchor refining module to remove easy negative samples, and the anchor positions are fine-tuned; the feature maps are fused by deconvolution to obtain 4 feature maps fused with deep-layer information, and classification and regression with the refined anchors yield the final detection positions; finally, the detection results are scaled back to the original image size, giving the detected target positions.
Compared with the prior art, the saliency-guided weakly supervised target detection method provided by the above technical solution uses the saliency detection result as guidance information and combines it with image-level labels to form pseudo labels for training a supervised deep target detection network. During training, only image-level labeled samples need to be provided to the network, so a large amount of image annotation work is avoided and the time and labor costs are low, suiting practical engineering environments. The model needs no additional algorithm to provide region proposals; once network training is complete, detection is performed with a single deep network that is simple to use and fast, with a particularly prominent accuracy advantage on images containing a single target.
The invention has the following advantages and beneficial effects:
(1) The invention builds on a regression-based deep-learning target detection network, uses no additional region proposal algorithm or network, completes the network training of the target detection part in one step, and trains quickly.
(2) The invention trains the network by weakly supervised learning: the training dataset carries image-level labels, i.e. only the categories of the targets present in an image are labeled, not the category and position of each individual target. Such dataset labeling is far easier to complete, at low cost in labor and time.
(3) The invention uses visual saliency as guidance and provides position information for the image-level labeled images to form pseudo labels; this accords with the visual judgment of the human brain and helps improve the accuracy of target detection in images containing different kinds of targets in different environments.
(4) The method adopts a multi-scale feature-fusion extraction strategy: the shallow feature maps carry high-resolution information, which facilitates detecting small targets, while the deep feature maps carry high-level semantic information, which favors detecting large targets; fusing the deep and shallow feature maps by deconvolution lets the high-resolution information and the strong semantic information complement each other and improves small-target detection accuracy.
(5) The saliency-guided weakly supervised target detection method can accurately detect various targets at different scales and in different environments, with good robustness.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly described below. It is obvious that the drawings in the following description are only embodiments of the present invention; for those skilled in the art, other drawings can be obtained from the provided drawings without creative effort.
FIG. 1 is a flow chart of the saliency-guided weakly supervised target detection algorithm of the present invention;
FIG. 2 is a flow chart of the saliency-guided pseudo-label generation of the present invention;
FIG. 3 shows the deep feature extraction network based on multi-scale feature fusion in the present invention;
FIG. 4 shows detection results of the saliency-guided weakly supervised target detection algorithm under different conditions of background, illumination, scale and the like;
FIG. 5 shows detection results of the saliency-guided weakly supervised target detection algorithm at different scales and different degrees of blur.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The embodiment of the invention discloses a saliency-guided weakly supervised target detection method that uses the saliency detection result as guidance information and combines it with image-level labels to form pseudo labels for training a supervised deep target detection network. During training, only image-level labeled samples need to be provided to the network, so a large amount of image annotation work is avoided and the time and labor costs are low, suiting practical engineering environments. The model needs no additional algorithm to provide region proposals; once network training is complete, detection is performed with a single deep network that is simple to use and fast, with a particularly prominent accuracy advantage on images containing a single target.
Referring to FIG. 1, the method trains the target detection network by weakly supervised learning, using saliency to guide the generation of pseudo labels; the specific implementation steps are as follows:
Step 1: saliency detection is performed on the input training image with a visual saliency model, such as a fully convolutional network (FCN), to obtain the visual saliency detection result of the input training image.
For a training image $X_m$, where m denotes the m-th image, each pixel is denoted $x_{mn}$, where n denotes the n-th pixel in the image. The training image $X_m$ is divided into two regions $C_{m1}$ and $C_{m2}$, representing the salient region and the background region respectively, with $X_m = C_{m1} \cup C_{m2}$.
An embedding function model is constructed with a deep neural network (DNN) $\phi$ with parameters $\theta$, which maps each pixel of the input training image to a D-dimensional vector:

$\phi_{mn} = \phi(x_{mn}; \theta) \qquad (1)$

A DNN $\psi$ with parameters $\eta$ likewise maps the salient region and the background region to D-dimensional vectors $\mu_{mk}$:

$\mu_{mk} = \psi(C_{mk}; \eta), \quad k = 1, 2 \qquad (2)$

where $C_{mk}$ is a region of the training image $X_m$, k = 1 denoting the salient region and k = 2 the background region.
The probability that pixel $x_{mn}$ falls in region $C_{mk}$ can be expressed with a Softmax function:

$P(x_{mn} \in C_{mk}) = \frac{\exp(-d(\phi_{mn}, \mu_{mk}))}{\sum_{k'=1}^{2} \exp(-d(\phi_{mn}, \mu_{mk'}))} \qquad (3)$

where $\phi_{mn}$ and $\mu_{mk}$ are the projections computed by formulas (1) and (2), and $d(\cdot)$ denotes the Euclidean distance.
A loss function is defined:

$L(\theta, \eta) = -\sum_{n} \left[ t_{mn} \log P(x_{mn} \in C_{m1}) + (1 - t_{mn}) \log P(x_{mn} \in C_{m2}) \right]$

where $t_{mn}$ is an indicator variable: $t_{mn} = 1$ means the pixel belongs to the salient region, i.e. $x_{mn} \in C_{m1}$, and $t_{mn} = 0$ means it belongs to the background region, i.e. $x_{mn} \in C_{m2}$. Optimizing this loss with gradient descent gives the network the ability to classify pixels as salient or not.
In the detection process, a rough saliency map is first computed by a prior-based method, and the model is then used for iterative optimization to obtain the saliency detection result.
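A minimal PyTorch sketch of the embedding-based saliency classification described above; the softmax-over-distances and cross-entropy forms are reconstructed from the symbol definitions, and all tensor shapes are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def saliency_loss(pixel_emb: torch.Tensor, region_emb: torch.Tensor,
                  t: torch.Tensor) -> torch.Tensor:
    """pixel_emb: (N, D) embeddings phi_mn of one image's N pixels.
    region_emb: (2, D) embeddings, row 0 = mu_m1 (salient), row 1 = mu_m2 (background).
    t: (N,) indicator t_mn, 1 = salient pixel, 0 = background pixel."""
    # Euclidean distance d(phi_mn, mu_mk) from every pixel to both regions
    d = torch.cdist(pixel_emb, region_emb)        # shape (N, 2)
    # Softmax over negative distances gives P(x_mn in C_mk)
    log_p = F.log_softmax(-d, dim=1)
    # Cross-entropy against the indicator; column 0 is the salient region
    target = (1 - t).long()
    return F.nll_loss(log_p, target)

# The embeddings would come from the pixel network phi and the region network
# psi; optimizing this loss with e.g. torch.optim.SGD realizes the gradient
# descent of S14.
```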
Step 2: the visual saliency detection result is segmented by using a self-adaptive threshold, a grayscale saliency map is converted into a binary saliency map, isolated noise points are removed by using morphological operation, and the quality of the saliency map is improved;
saliency map I for grayscale formgObtaining an adaptive threshold value T by means of maximum inter-class variance, and obtaining a significance map I of a binary form by threshold segmentationb
Many isolated noise points often exist on the significance map of the binary form, the noise points can influence the generation of pseudo labels, and the small points can be removed through open operation, namely expansion and corrosion.
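A minimal OpenCV sketch of this thresholding and denoising step; the 5 × 5 structuring element is an illustrative choice, not specified in the disclosure:

```python
import cv2
import numpy as np

def binarize_saliency(gray_map: np.ndarray, kernel_size: int = 5) -> np.ndarray:
    """gray_map: uint8 grayscale saliency map I_g; returns the binary map."""
    # Otsu's method picks the threshold T maximizing between-class variance
    _, binary = cv2.threshold(gray_map, 0, 255,
                              cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    # Opening (erosion then dilation) removes isolated noise points
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT,
                                       (kernel_size, kernel_size))
    return cv2.morphologyEx(binary, cv2.MORPH_OPEN, kernel)
```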
And step 3: taking the boundary of the significance map in the binary form generated in the step 2 as the position of a target in the training image, combining target category information provided by image-level labels, constructing a pseudo label simultaneously containing target category and position information, and matching and storing the pseudo label with the training image; judging whether the generation of pseudo labels of all training images is finished, namely, checking whether the traversal of the images in the training set list is finished, if so, entering a network detection training stage, executing a step 4, otherwise, continuously generating the pseudo labels, and executing a step 1;
the target to be detected in the image often shows significance, so that the result of the significance detection can roughly represent the position of the target to be detected; significance map I 'for morphologically processed binary form'bIs divided into a significant region S1And an insignificant area S2And l'b=S1∪S2Finding the smallest rectangular areaSo thatAnd an arbitrary rectangular regionAll can not satisfyWherein m is0Number of minimum region, mkIs the serial number of the other area; getFour vertices of { x1,y1,x2,y2And the combined image and the image-level label C of the training image form an imagePseudo label L ofm={x1,y1,x2,y2C }; and (3) when all the training images finish the pseudo-labeling, entering the training of the next stage, otherwise, continuing to perform the step 1 to finish the generation of the pseudo-labeling.
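A minimal sketch of pseudo-label generation from the binary saliency map, assuming a single connected salient region whose bounding rectangle is taken directly:

```python
import numpy as np

def make_pseudo_label(binary_map: np.ndarray, image_level_class: str):
    """binary_map: {0, 255} (or {0, 1}) saliency map after morphology.
    Returns the pseudo label L_m = {x1, y1, x2, y2, C}, or None if empty."""
    ys, xs = np.nonzero(binary_map)
    if xs.size == 0:                       # no salient pixels survived
        return None
    x1, y1, x2, y2 = xs.min(), ys.min(), xs.max(), ys.max()
    return {"x1": int(x1), "y1": int(y1),
            "x2": int(x2), "y2": int(y2), "class": image_level_class}
```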
And 4, step 4: inputting a training image into an anchor point refining module of a target detection network, extracting the features of the image by using a VGG16 network, and taking 4 specific feature maps out of the network, wherein the 4 feature maps have different scales, namely different resolutions, the feature map with less convolution times is a shallow feature map with high resolution, and the feature map with more convolution times is a deep feature map with rich semantic information; carrying out intensive sampling on each feature map, obtaining anchor points with different sizes and aspect ratios, and grading and screening the anchor points by using the pseudo labels generated in the steps 1 to 3 to remove simple negative samples;
feature extraction network VGG 16:
VGG16 is used as a basic network for feature extraction, and a plurality of convolution layers are additionally added; the size of the input image is scaled to H × H, which is generally 320; finally, a total of 4 signatures were extracted from Conv4_3, Conv5_3, Conv7 and the additional convolutional layer, the resolutions of which were respectively When H is 320, the resolutions of the feature maps are 40 × 40, 20 × 20, 10 × 10, and 5 × 5, respectively;
multi-scale dense sampling of anchor points:
the shallow feature map is used for detecting a large target, the deep feature map is used for detecting a small target, and in order to meet the detection requirements of the large target and the small target at the same time, anchor points are sampled on the feature maps with different scales at the same time; on the extracted 4 characteristic diagrams, taking each pixel as a center, taking 3 anchor points with the aspect ratios of 1:1, 1:2 and 2:1 on each pixel point, and obtaining the anchor points on the extracted 4 characteristic diagrams in total according to the methodAnchor point, when H is 320, N is 6375;
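A quick arithmetic check of the anchor count:

```python
def num_anchors(H: int = 320, strides=(8, 16, 32, 64),
                ratios_per_cell: int = 3) -> int:
    """Total dense anchors over the four feature maps."""
    return ratios_per_cell * sum((H // s) ** 2 for s in strides)

# 3 * (40^2 + 20^2 + 10^2 + 5^2) = 3 * 2125 = 6375
assert num_anchors(320) == 6375
```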
two classifications and preliminary regression of anchors:
The densely sampled anchors are numerous; the anchor refining module classifies them into the two categories of target and background and fine-tunes their positions; a refining loss function is defined:

$L_{refine} = \frac{1}{N_a}\left(\sum_i L_b(p_i, l_i) + \sum_i l_i \, L_r(x_i, x_i^*)\right)$

where $L_b$ and $L_r$ are the binary classification loss function and the regression loss function respectively; $l_i^*$ is the class of the object within the anchor, $l_i^* = 0$ indicating that the anchor belongs to the background and $l_i^* \ge 1$ that it belongs to some object class; $p_i$ is the probability that the anchor is a target; $x_i$ represents the coordinates of the anchor and $x_i^*$ the position of the target in the pseudo label; $N_a$ represents the number of anchors in the anchor refining module.
The binary classification uses an $L_2$ loss:

$L_b(p_i, l_i) = (p_i - l_i)^2$

with $l_i = 1$ if $l_i^* \ge 1$ and $l_i = 0$ otherwise.
The regression uses the smooth $L_1$ function:

$L_r = \sum_{k \in \{x, y, w, h\}} \operatorname{smooth}_{L_1}(t_k - t_k^*), \qquad \operatorname{smooth}_{L_1}(z) = \begin{cases} 0.5\,z^2, & |z| < 1 \\ |z| - 0.5, & \text{otherwise} \end{cases}$

where $t_k$ and $t_k^*$ represent the offsets of the refined anchor and of the pseudo label relative to the anchor before refining, i denotes the anchor index, and k ranges over x, y, w and h, the abscissa, ordinate, width and height of the anchor and pseudo label; they are defined as:

$t_x = \frac{x - x_a}{w_a}, \quad t_y = \frac{y - y_a}{h_a}, \quad t_w = \log\frac{w}{w_a}, \quad t_h = \log\frac{h}{h_a}$

where $x_i$, $x_{a_i}$ and $x_i^*$ represent the positions of the refined anchor, the anchor before refining, and the pseudo label respectively.
The criterion for positive-sample selection is an intersection-over-union of IoU > 0.5 with a target pseudo label of any class:

$\mathrm{IoU} = \frac{TP}{TP + FP + FN}$

where TP is the region that is target in both the prediction and the annotation, FP represents the region that is target in the prediction but background in the annotation, and FN represents the region that is background in the prediction but target in the annotation.
The criterion for negative-sample selection is to select the highest-scoring among all negative samples, with their number capped at 3 times the number of positive samples.
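A minimal sketch of this hard-negative selection rule:

```python
import torch

def select_hard_negatives(neg_scores: torch.Tensor,
                          num_positives: int) -> torch.Tensor:
    """neg_scores: objectness scores of all negative anchors.
    Returns indices of the kept (highest-scoring, i.e. hardest) negatives,
    capped at three times the number of positives."""
    k = min(3 * num_positives, neg_scores.numel())
    return torch.topk(neg_scores, k).indices
```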
And 5: the scale of the deep characteristic diagram is increased in a deconvolution mode, the deep characteristic diagram is fused with the shallow characteristic diagram to obtain 4 fused characteristic diagrams, the characteristic vector of a refining anchor point is extracted from the fused characteristic diagrams, and full supervision learning is carried out by using pseudo labeling; stopping training when the number of learning rounds reaches a set threshold value, and performing step 6, otherwise, continuing training and performing step 4;
deconvolution module and fusion feature map:
for 4 feature maps extracted by Conv4_3, Conv5_3 and Conv7 and the additionally added convolutional layers, increasing the scale of one with smaller scale by deconvolution, and linearly adding the feature map with the previous adjacent scale to obtain a fused feature map; 4 fused feature maps are obtained according to the method, dense sampling of anchor points is not carried out on the fused feature maps, and the anchor points refined in the step 4 are used as anchor point sampling points in the step;
Multi-class classification and regression of the anchors:
On the basis of the anchors after the preliminary binary classification and regression, a finer multi-class classification and regression is performed; a detection loss function is defined:

$L_{detect} = \frac{1}{N_o}\left(\sum_i L_c(c_i, l_i^*) + \sum_i [l_i^* \ge 1] \, L_r(t_i, x_i^*)\right)$

where $L_c$ and $L_r$ are the classification loss function and the regression loss function respectively; $l_i^*$ is the class of the object within the anchor, $l_i^* = 0$ indicating background and $l_i^* \ge 1$ an object of some class; $c_i$ is the probability that the target belongs to class C; $t_i$ represents the coordinates of the predicted position and $x_i^*$ the position of the target in the pseudo label; $N_o$ represents the number of anchors in the detection module.
The classification uses an $L_2$ loss, and the regression uses the smooth $L_1$ function, as in step 4.
Steps 4 and 5 jointly construct a multitask loss function:

$L = L_{refine} + L_{detect}$

where $p_i$, $x_i$, $c_i$ and $t_i$ represent the probability of whether the anchor is a target, the position of the anchor, the probability that the target belongs to class C, and the coordinates of the predicted position respectively. Cascade training is performed with the pseudo labels; a total number of training rounds E is set, and training of the network finishes when the number of rounds reaches E.
Step 6: after training is finished, the significance detection of the test image is not needed, the image is input into the detection network constructed in the steps 4 to 5, and a target detection result can be directly obtained.
The image to be detected is input into the network, which scales it to the specified size, a square of side length H = 320, to meet the model's requirements; 4 feature maps are extracted by convolution and densely sampled to obtain anchors; the densely sampled anchors are screened by the anchor refining module to remove easy negative samples, and the anchor positions are fine-tuned; the feature maps are fused by deconvolution to obtain 4 feature maps fused with deep semantic information, and classification and regression with the refined anchors yield the final detection positions; finally, the detection results are scaled back to the original image size, i.e. the detected target positions.
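A minimal sketch of this test-time flow, assuming a network callable that returns boxes in the 320 × 320 input frame (the return signature is an assumption for illustration):

```python
import cv2
import numpy as np

def detect(network, image: np.ndarray, input_size: int = 320):
    """network(resized) is assumed to return (boxes, classes, scores) with
    boxes as an (N, 4) array of [x1, y1, x2, y2] in the input frame."""
    h, w = image.shape[:2]
    resized = cv2.resize(image, (input_size, input_size))
    boxes, classes, scores = network(resized)
    # scale boxes back to the original image proportions
    scale = np.array([w, h, w, h], dtype=np.float32) / input_size
    return boxes * scale, classes, scores
```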
FIG. 2 is a flow chart of saliency-guided pseudo-label generation: the pseudo label is obtained from the image-level label of the input image together with the saliency map produced by saliency detection, after threshold segmentation and morphological processing. First, saliency detection yields a saliency map of the input image, as shown in FIG. 2(b). The saliency map is a grayscale image in which the more salient parts have larger gray values. The target to be detected is usually among the more salient parts of the map, so the regions of larger gray value must be extracted from the saliency map. Threshold segmentation converts the grayscale image into a binary image and separates the regions with large gray values. As shown in FIG. 2(c), the binary image obtained by threshold segmentation often contains some noise from non-target regions, because saliency detection only judges whether an object in the image is salient, and regions close to the target in position and color are easily judged salient as well. Since this would affect pseudo-label generation, morphological operations are used to remove the fine noise: first the isolated points in the binary image are removed, then an opening operation, i.e. erosion followed by dilation, is applied to the remaining image. This removes fine noise without affecting the size of the target region. After the morphological operations, as shown in FIG. 2(d), the noise in the image is removed, and the rough position of the target is obtained by taking the outer edge of the part of the binary image with value 1; at this point the category of the target is still unknown. The image-level label, i.e. the category information of the image, is input together with the image, and assigning this category information yields the pseudo-label information with a class label, as shown in FIG. 2(f).
FIG. 3 shows the deep feature extraction network based on multi-scale feature fusion. The network is divided into two parts and performs classification and regression twice. The first part is the anchor refining module, which, as the name implies, refines the large number of anchors proposed by dense sampling. With 6375 anchors proposed on the feature maps of 4 scales, predicting directly among so many anchors is difficult, and most of them are negative samples, a severe imbalance between positive and negative samples that makes it hard for the network to learn the target's features effectively. The anchors therefore need to be selected and refined: the anchor refining module uses only the original feature maps, not yet fused with the deep feature maps, does not distinguish target categories, and performs only the binary classification of target versus background. This reduces the difficulty of the classification and regression, eliminates a certain number of negative samples, and alleviates the positive-negative imbalance.
The second part is the target detection module. This part of the network uses the feature maps that combine deep and shallow layers, strengthening both feature extraction and target localization. With a large number of background anchors already eliminated by the anchor refining module, the positive-negative imbalance is relieved to a certain extent and the network can extract the targets' features more easily; it performs multi-class classification and regression within the refined anchors to obtain the final prediction result.
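A minimal PyTorch sketch of the deconvolution fusion feeding this second part; the channel counts and the 2 × 2 deconvolution kernel are illustrative assumptions, not taken from the patent:

```python
import torch
import torch.nn as nn

class FuseBlock(nn.Module):
    """Deconvolution doubles the spatial size of the deeper map, which is
    then added elementwise (linearly) to the adjacent shallower map."""
    def __init__(self, deep_ch: int, shallow_ch: int):
        super().__init__()
        self.deconv = nn.ConvTranspose2d(deep_ch, shallow_ch,
                                         kernel_size=2, stride=2)

    def forward(self, shallow, deep):
        return shallow + self.deconv(deep)

# e.g. fuse a 10x10 deep map into a 20x20 shallow map
fuse = FuseBlock(deep_ch=256, shallow_ch=256)
out = fuse(torch.randn(1, 256, 20, 20), torch.randn(1, 256, 10, 10))
assert out.shape == (1, 256, 20, 20)
```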
FIG. 4 shows detection results of the saliency-guided weakly supervised target detection algorithm under different background and illumination conditions. FIG. 4(a) shows detection results on images with a simple background; as can be seen, the detection accuracy of the invention is high on images with a monotonous background. FIG. 4(b) shows results on images with light-and-shadow interference, a common scene in which the color and texture of the target may be disturbed and changed, affecting the detection effect; the results show that the invention still maintains high detection accuracy under such interference. FIG. 4(c) shows targets that are occluded or incompletely captured; such targets lose many of their features and details, which easily causes missed detections or inaccurate localization.
FIG. 5 shows detection results of the saliency-guided weakly supervised target detection algorithm at different scales and degrees of blur. FIG. 5(a) shows results on images artificially reduced to different scales; although small targets have fewer texture features and are harder to detect, the detection model of the invention can still detect them accurately. FIG. 5(b) shows results after blur is added to the original images, a common situation in engineering practice; blur makes the textures and edges of a target difficult to extract and degrades detection quality.
The embodiments in the present description are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. The device disclosed by the embodiment corresponds to the method disclosed by the embodiment, so that the description is simple, and the relevant points can be referred to the method part for description.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (7)

1. A saliency-guided weakly supervised target detection method, characterized by comprising the following specific steps:
Step 1: performing saliency detection on the input training image with a visual saliency model to obtain the visual saliency detection result of the input training image;
Step 2: segmenting the visual saliency detection result with an adaptive threshold, converting the grayscale saliency map into a binary saliency map, and removing isolated noise points with morphological operations to optimize the saliency map;
Step 3: taking the boundary of the binary saliency map generated in step 2 as the estimated position of the target in the training image, constructing, in combination with the target category information provided by the image-level label, a pseudo label containing both the target category and position information, and storing it matched with the training image; judging whether the generation of pseudo labels for all training images is finished; if so, entering the detection network training stage and executing step 4, otherwise continuing to generate pseudo labels and executing step 1;
Step 4: inputting the training image with its pseudo label into a convolutional neural network for feature extraction to obtain multi-scale feature maps, densely sampling the multi-scale feature maps, and preliminarily refining the prediction boxes with the pseudo labels;
Step 5: fusing the multi-scale feature maps by deconvolution, performing fully supervised training on the fused feature maps with the pseudo labels, classifying and regressing the refined detection boxes, and executing step 6 when the number of training rounds reaches the set number, otherwise executing step 4;
Step 6: inputting the image to be detected into the trained convolutional neural network to obtain the detection result.
2. The saliency-guided weakly supervised target detection method according to claim 1, characterized in that step 1 obtains the visual saliency detection result of the input training image; the specific steps include:
S11: for a training image $X_m$, where m denotes the m-th image, each pixel is denoted $x_{mn}$, where n denotes the n-th pixel in the image; the training image $X_m$ is divided into two regions $C_{m1}$ and $C_{m2}$, representing the salient region and the background region respectively, with $X_m = C_{m1} \cup C_{m2}$;
S12: constructing an embedding function model with a deep neural network $\phi$ with parameters $\theta$, which maps each pixel of the input training image to a D-dimensional vector:

$\phi_{mn} = \phi(x_{mn}; \theta)$ (1)

and mapping the salient and background regions to D-dimensional vectors $\mu_{mk}$ with a deep neural network $\psi$ with parameters $\eta$:

$\mu_{mk} = \psi(C_{mk}; \eta), \quad k = 1, 2$ (2)

where $C_{mk}$ is a region of the training image $X_m$, k = 1 denoting the salient region and k = 2 the background region;
S13: expressing the probability that pixel $x_{mn}$ falls in region $C_{mk}$ with a Softmax function:

$P(x_{mn} \in C_{mk}) = \frac{\exp(-d(\phi_{mn}, \mu_{mk}))}{\sum_{k'=1}^{2} \exp(-d(\phi_{mn}, \mu_{mk'}))}$ (3)

where $\phi_{mn}$ and $\mu_{mk}$ are the projections computed by formulas (1) and (2), and $d(\cdot)$ denotes the Euclidean distance;
S14: defining a loss function:

$L(\theta, \eta) = -\sum_{n} \left[ t_{mn} \log P(x_{mn} \in C_{m1}) + (1 - t_{mn}) \log P(x_{mn} \in C_{m2}) \right]$

where $t_{mn}$ is an indicator variable: $t_{mn} = 1$ indicates that the pixel belongs to the salient region, $x_{mn} \in C_{m1}$; $t_{mn} = 0$ indicates that the pixel belongs to the background region, $x_{mn} \in C_{m2}$; and optimizing the loss function with gradient descent to obtain the saliency detection result.
3. The saliency-guided weakly supervised target detection method according to claim 1, characterized in that step 2 converts the grayscale saliency map into a binary saliency map by threshold segmentation and removes noise points by morphological operations; the specific steps include:
S21: for the grayscale saliency map $I_g$, obtaining an adaptive threshold T by the maximum between-class variance method, and obtaining the binary saliency map $I_b$ by threshold segmentation;
S22: removing noise points by an opening operation, i.e. erosion followed by dilation.
4. The saliency-guided weakly supervised target detection method according to claim 1, characterized in that in step 3 the binary saliency map provides the target position information, which is combined with the image-level label to generate a pseudo label, and it is judged whether the labeling of all training images is complete; the specific steps include:
S31: the morphologically processed binary saliency map $I'_b$ is divided into a salient region $S_1$ and a non-salient region $S_2$, with $I'_b = S_1 \cup S_2$; find the smallest rectangular region $R_{m_0}$ such that $S_1 \subseteq R_{m_0}$, i.e. no other rectangular region $R_{m_k}$ containing $S_1$ is smaller, where $m_0$ is the index of the minimal region and $m_k$ the indices of the other regions; the corner coordinates $\{x_1, y_1, x_2, y_2\}$ of $R_{m_0}$ and the image-level label C of the training image form the pseudo label of the image, $L_m = \{x_1, y_1, x_2, y_2, C\}$;
S32: when all training images have been pseudo-labeled, entering the next training stage; otherwise returning to step 1 to finish generating the pseudo labels.
5. The saliency-guided weakly supervised target detection method according to claim 1, characterized in that step 4 performs feature extraction with a convolutional neural network, proposes anchors for prediction, and refines the anchors with the pseudo labels; the specific steps include:
S41: feature extraction: VGG16 is used as the base network for feature extraction, with several additional convolutional layers appended; the input image is scaled to H × H; in total 4 feature maps are extracted, from Conv4_3, Conv5_3, Conv7 and the additional convolutional layer, with resolutions $\frac{H}{8} \times \frac{H}{8}$, $\frac{H}{16} \times \frac{H}{16}$, $\frac{H}{32} \times \frac{H}{32}$ and $\frac{H}{64} \times \frac{H}{64}$ respectively;
S42: multi-scale dense sampling of anchor points: anchors are sampled simultaneously on the feature maps of different scales; on the 4 extracted feature maps a total of $N = 3\left[(\tfrac{H}{8})^2 + (\tfrac{H}{16})^2 + (\tfrac{H}{32})^2 + (\tfrac{H}{64})^2\right]$ anchor points is obtained;
S43: binary classification and preliminary regression of the obtained anchors: classifying each anchor as target or background in the anchor refining module and fine-tuning its position; defining a refining loss function:

$L_{refine} = \frac{1}{N_a}\left(\sum_i L_b(p_i, l_i) + \sum_i l_i \, L_r(x_i, x_i^*)\right)$

where $L_b$ and $L_r$ are the binary classification loss function and the regression loss function respectively; $l_i^*$ is the class of the object within the anchor, $l_i^* = 0$ indicating that the anchor belongs to the background and $l_i^* \ge 1$ that it belongs to some object class; $p_i$ is the probability that the anchor is a target; $x_i$ represents the coordinates of the anchor and $x_i^*$ the position of the target in the pseudo label; $N_a$ represents the number of anchors in the anchor refining module;
the binary classification uses an $L_2$ loss:

$L_b(p_i, l_i) = (p_i - l_i)^2$

with $l_i = 1$ if $l_i^* \ge 1$ and $l_i = 0$ otherwise;
the regression uses the smooth $L_1$ function:

$L_r = \sum_{k \in \{x, y, w, h\}} \operatorname{smooth}_{L_1}(t_k - t_k^*), \qquad \operatorname{smooth}_{L_1}(z) = \begin{cases} 0.5\,z^2, & |z| < 1 \\ |z| - 0.5, & \text{otherwise} \end{cases}$

where $t_k$ and $t_k^*$ represent the offsets of the refined anchor and of the pseudo label relative to the anchor before refining, i denotes the anchor index, and k ranges over x, y, w and h, the abscissa, ordinate, width and height of the anchor and pseudo label; they are defined as:

$t_x = \frac{x - x_a}{w_a}, \quad t_y = \frac{y - y_a}{h_a}, \quad t_w = \log\frac{w}{w_a}, \quad t_h = \log\frac{h}{h_a}$

where $x_i$, $x_{a_i}$ and $x_i^*$ represent the positions of the refined anchor, the anchor before refining, and the pseudo label respectively;
the criterion for positive-sample selection is an intersection-over-union of IoU > 0.5 with a target pseudo label of any class:

$\mathrm{IoU} = \frac{TP}{TP + FP + FN}$

where TP is the region that is target in both the prediction and the annotation, FP represents the region that is target in the prediction but background in the annotation, and FN represents the region that is background in the prediction but target in the annotation;
the criterion for negative-sample selection is to select the highest-scoring among all negative samples, their number being 3 times the number of positive samples.
6. The saliency-guided weakly supervised target detection method according to claim 5, characterized in that step 5 fuses the multi-scale feature maps, enlarging the deep feature maps by deconvolution and fusing them with the shallow feature maps; the specific steps include:
S51: feature map fusion: for the 4 feature maps extracted from Conv4_3, Conv5_3, Conv7 and the additional convolutional layer, the smaller-scale map is enlarged by deconvolution and added linearly to the feature map of the adjacent preceding scale to obtain a fused feature map; 4 fused feature maps are obtained in this way; no dense anchor sampling is performed on the fused feature maps, and the anchors refined in step 4 serve as the anchor sampling points of this step;
S52: multi-class classification and regression of the anchors: on the basis of the anchors after the preliminary classification and regression, performing a finer multi-class classification and regression; defining a detection loss function:

$L_{detect} = \frac{1}{N_o}\left(\sum_i L_c(c_i, l_i^*) + \sum_i [l_i^* \ge 1] \, L_r(t_i, x_i^*)\right)$

where $L_c$ and $L_r$ are the classification loss function and the regression loss function respectively; $l_i^*$ is the class of the object within the anchor, $l_i^* = 0$ indicating that the anchor belongs to the background and $l_i^* \ge 1$ that it belongs to an object of some class; $c_i$ is the probability that the target belongs to class C; $t_i$ represents the coordinates of the predicted position and $x_i^*$ the position of the target in the pseudo label; $N_o$ represents the number of anchors in the detection module;
the classification uses an $L_2$ loss and the regression uses the smooth $L_1$ function, as in step 4;
S53: jointly constructing a multitask loss function:

$L = L_{refine} + L_{detect}$

where $p_i$, $x_i$, $c_i$ and $t_i$ represent the probability of whether the anchor is a target, the position of the anchor, the probability that the target belongs to class C, and the coordinates of the predicted position respectively; cascade training is performed with the pseudo labels, a total number of training rounds E is set, and training of the network finishes when the number of rounds reaches E.
7. The saliency-guided weakly supervised target detection method according to claim 6, characterized in that the image to be detected is input into the trained convolutional neural network; the specific steps include: inputting the image to be detected into the network, which scales it to the specified size; extracting 4 feature maps by convolution and densely sampling them to obtain anchors; screening the densely sampled anchors with the anchor refining module to remove easy negative samples and fine-tuning the anchor positions; fusing the feature maps by deconvolution to obtain 4 feature maps fused with deep semantic information, and classifying and regressing with the refined anchors to obtain the final detection positions; finally, scaling the detection results back to the original image size to obtain the detected target positions.
CN201910824612.5A 2019-09-02 2019-09-02 Weak supervision target detection method based on significance guidance Active CN110598609B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910824612.5A CN110598609B (en) 2019-09-02 2019-09-02 Weak supervision target detection method based on significance guidance

Publications (2)

Publication Number Publication Date
CN110598609A true CN110598609A (en) 2019-12-20
CN110598609B CN110598609B (en) 2022-05-03

Family

ID=68857057

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910824612.5A Active CN110598609B (en) 2019-09-02 2019-09-02 Weak supervision target detection method based on significance guidance

Country Status (1)

Country Link
CN (1) CN110598609B (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150310303A1 (en) * 2014-04-29 2015-10-29 International Business Machines Corporation Extracting salient features from video using a neurosynaptic system
CN103996209A (en) * 2014-05-21 2014-08-20 北京航空航天大学 Infrared vessel object segmentation method based on salient region detection
CN105184293A (en) * 2015-08-29 2015-12-23 电子科技大学 Automobile logo positioning method based on significance area detection
US20170200065A1 (en) * 2016-01-13 2017-07-13 Adobe Systems Incorporated Image Captioning with Weak Supervision
CN107203781A (en) * 2017-05-22 2017-09-26 浙江大学 A kind of object detection method Weakly supervised end to end instructed based on conspicuousness
CN108898145A (en) * 2018-06-15 2018-11-27 西南交通大学 A kind of image well-marked target detection method of combination deep learning
CN109919013A (en) * 2019-01-28 2019-06-21 浙江英索人工智能科技有限公司 Method for detecting human face and device in video image based on deep learning
CN109919059A (en) * 2019-02-26 2019-06-21 四川大学 Conspicuousness object detecting method based on depth network layerization and multitask training

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
J. Zhu et al.: "Unsupervised Object Class Discovery via Saliency-Guided Multiple Class Learning", IEEE Transactions on Pattern Analysis and Machine Intelligence *
Xu Dan et al.: "Salient object detection fusing color attributes and spatial information", Journal of Image and Graphics *
Zhao Danpei et al.: "Recognition method for airport and oil-depot targets based on a salient semantic model", Journal of Computer-Aided Design & Computer Graphics *

Cited By (32)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111191732B (en) * 2020-01-03 2021-05-14 天津大学 Target detection method based on full-automatic learning
CN111191732A (en) * 2020-01-03 2020-05-22 天津大学 Target detection method based on full-automatic learning
CN113096132B (en) * 2020-01-08 2022-02-08 东华医为科技有限公司 Image processing method, image processing device, storage medium and electronic equipment
CN113096132A (en) * 2020-01-08 2021-07-09 东华医为科技有限公司 Image processing method, image processing device, storage medium and electronic equipment
CN111680702A (en) * 2020-05-28 2020-09-18 杭州电子科技大学 Method for realizing weak supervision image significance detection by using detection frame
CN111950545A (en) * 2020-07-23 2020-11-17 南京大学 Scene text detection method based on MSNDET and space division
CN111950545B (en) * 2020-07-23 2024-02-09 南京大学 Scene text detection method based on MSDNet and space division
CN113762455A (en) * 2020-08-07 2021-12-07 北京沃东天骏信息技术有限公司 Detection model training method, single character detection method, device, equipment and medium
CN112102250A (en) * 2020-08-20 2020-12-18 西北大学 Method for establishing and detecting pathological image detection model with training data as missing label
CN112102250B (en) * 2020-08-20 2022-11-04 西北大学 Method for establishing and detecting pathological image detection model with training data as missing label
CN111968105A (en) * 2020-08-28 2020-11-20 南京诺源医疗器械有限公司 Method for detecting salient region in medical fluorescence imaging
CN112037216A (en) * 2020-09-09 2020-12-04 南京诺源医疗器械有限公司 Image fusion method for medical fluorescence imaging system
CN112037216B (en) * 2020-09-09 2022-02-15 南京诺源医疗器械有限公司 Image fusion method for medical fluorescence imaging system
CN112115723B (en) * 2020-09-14 2022-08-12 中国船舶重工集团公司第七0九研究所 Weak supervision semantic analysis method based on false positive sample detection
CN112115723A (en) * 2020-09-14 2020-12-22 中国船舶重工集团公司第七0九研究所 Weak supervision semantic analysis method based on false positive sample detection
CN112052907A (en) * 2020-09-15 2020-12-08 浙江智慧视频安防创新中心有限公司 Target detection method and device based on image edge information and storage medium
CN112560853A (en) * 2020-12-14 2021-03-26 中科云谷科技有限公司 Image processing method, device and storage medium
CN112598053B (en) * 2020-12-21 2024-01-09 西北工业大学 Active significance target detection method based on semi-supervised learning
CN112598053A (en) * 2020-12-21 2021-04-02 西北工业大学 Active significance target detection method based on semi-supervised learning
CN112766285B (en) * 2021-01-26 2024-03-19 北京有竹居网络技术有限公司 Image sample generation method and device and electronic equipment
CN112766285A (en) * 2021-01-26 2021-05-07 北京有竹居网络技术有限公司 Image sample generation method and device and electronic equipment
US20220254136A1 (en) * 2021-02-10 2022-08-11 Nec Corporation Data generation apparatus, data generation method, and non-transitory computer readable medium
CN112861880A (en) * 2021-03-05 2021-05-28 江苏实达迪美数据处理有限公司 Weak supervision RGBD image saliency detection method and system based on image classification
CN113139431B (en) * 2021-03-24 2024-05-03 杭州电子科技大学 Image saliency target detection method based on deep supervised learning
CN113139431A (en) * 2021-03-24 2021-07-20 杭州电子科技大学 Image saliency target detection method based on deep supervised learning
CN113240659B (en) * 2021-05-26 2022-02-25 广州天鹏计算机科技有限公司 Heart nuclear magnetic resonance image lesion structure extraction method based on deep learning
CN113240659A (en) * 2021-05-26 2021-08-10 广州天鹏计算机科技有限公司 Image feature extraction method based on deep learning
CN113610807B (en) * 2021-08-09 2024-02-09 西安电子科技大学 New coronary pneumonia segmentation method based on weak supervision multitask learning
CN113610807A (en) * 2021-08-09 2021-11-05 西安电子科技大学 New coronary pneumonia segmentation method based on weak supervision multitask learning
CN114627437A (en) * 2022-05-16 2022-06-14 科大天工智能装备技术(天津)有限公司 Traffic target identification method and system
CN116189058B (en) * 2023-03-03 2023-10-03 北京信息科技大学 Video saliency target detection method and system based on unsupervised deep learning
CN116189058A (en) * 2023-03-03 2023-05-30 北京信息科技大学 Video saliency target detection method and system based on unsupervised deep learning

Also Published As

Publication number Publication date
CN110598609B (en) 2022-05-03

Similar Documents

Publication Publication Date Title
CN110598609B (en) Weak supervision target detection method based on significance guidance
CN108898610B (en) Object contour extraction method based on mask-RCNN
CN108376244B (en) Method for identifying text font in natural scene picture
CN108121991B (en) Deep learning ship target detection method based on edge candidate region extraction
CN112508975A (en) Image identification method, device, equipment and storage medium
Kim et al. Multi-task convolutional neural network system for license plate recognition
CN107480585B (en) Target detection method based on DPM algorithm
CN110751154B (en) Complex environment multi-shape text detection method based on pixel-level segmentation
CN109858438B (en) Lane line detection method based on model fitting
CN101770583B (en) Template matching method based on global features of scene
CN110633727A (en) Deep neural network ship target fine-grained identification method based on selective search
CN112784757B (en) Marine SAR ship target significance detection and identification method
CN113487610B (en) Herpes image recognition method and device, computer equipment and storage medium
CN112101108B (en) Left-right pass-through sign recognition method based on graph pole position characteristics
Jayasinghe et al. Ceymo: See more on roads-a novel benchmark dataset for road marking detection
CN111274964B (en) Detection method for analyzing water surface pollutants based on visual saliency of unmanned aerial vehicle
Zhou et al. Building segmentation from airborne VHR images using Mask R-CNN
CN112308040A (en) River sewage outlet detection method and system based on high-definition images
CN115830359A (en) Workpiece identification and counting method based on target detection and template matching in complex scene
CN113326734B (en) Rotational target detection method based on YOLOv5
CN114550134A (en) Deep learning-based traffic sign detection and identification method
CN116912184B (en) Weak supervision depth restoration image tampering positioning method and system based on tampering area separation and area constraint loss
CN111353459A (en) Ship target detection method under resource-limited condition
CN107704864A (en) Well-marked target detection method based on image object Semantic detection
Oluchi et al. Development of a Nigeria vehicle license plate detection system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB03 Change of inventor or designer information

Inventor after: Zhao Danpei
Inventor after: Yuan Zhichao
Inventor after: Shi Zhenwei
Inventor after: Jiang Zhiguo
Inventor after: Xie Fengying
Inventor after: Zhang Haopeng
Inventor before: Zhao Danpei
Inventor before: Yuan Zhichao
Inventor before: Shi Zhenwei
Inventor before: Jiang Zhiguo
Inventor before: Xie Fengying
Inventor before: Zhang Haopeng

GR01 Patent grant