CN109583517A - Enhanced fully convolutional instance-aware semantic segmentation algorithm suitable for small target detection - Google Patents
- Publication number
- CN109583517A (application CN201811601302.9A)
- Authority
- CN
- China
- Prior art keywords
- algorithm
- roi
- enhancing
- map
- rpn
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Abstract
The invention belongs to the technical field of image processing. Building on the fully convolutional instance-aware semantic segmentation (FCIS) algorithm, it discloses an enhanced fully convolutional instance-aware semantic segmentation (EFCIS) algorithm suitable for small target detection, comprising shared feature map extraction, pre-selection box (proposal) extraction, generation of position-sensitive score maps, and classification and regression. In shared feature map extraction, the conv1, conv3 and conv5 feature maps are fused so that the shared feature map retains both high-level semantic information and fine detail. Because the existing pre-selection box extraction network performs poorly, a dual RPN algorithm is proposed for pre-selection box extraction; its average recall is 7% higher than that of RPN. The mAP of the EFCIS algorithm is 3.5% higher than that of FCIS, and for small targets it is 2.9% higher. Experiments show that the invention is very beneficial for improving the ability to capture small targets.
Description
Technical field
The invention belongs to the technical field of image processing, and in particular relates to an enhanced fully convolutional instance-aware semantic segmentation algorithm suitable for small target detection.
Background art
At present, the prior art commonly used in the industry is as follows:
Scene understanding is a core difficulty in computer vision, and instance semantic segmentation is a necessary step toward it. In the image domain, instance semantic segmentation is a comprehensive task that combines image classification, object detection and image segmentation; it is widely used in geographic information systems, autonomous driving, medical image analysis, robotics and other fields.
With the rapid development of deep learning based on convolutional neural networks, more and more instance semantic segmentation subtasks can be completed by convolutional neural networks. In recent years, algorithms based on fully convolutional networks (FCN) have dominated image semantic segmentation. Among the algorithms that use fully convolutional networks, fully convolutional instance-aware semantic segmentation (FCIS) has achieved excellent results. FCIS combines many outstanding earlier results, is the first end-to-end fully convolutional instance semantic segmentation algorithm, and leads other algorithms in the instance semantic segmentation field by a clear margin: it won first place in the 2016 MS COCO segmentation challenge, far ahead of the runner-up.
In summary, the problems of the prior art are:
(1) Because of an inherent limitation of convolutional neural networks, namely down-sampling, small targets in an image may vanish during shared feature extraction and therefore cannot take part in the later tasks of the algorithm.
(2) FCIS is a typical "two-stage" algorithm. Its first stage must extract a large number of pre-selection boxes, and the recall of the pre-selection boxes extracted by the existing RPN network is still inadequate, which degrades the performance of the later tasks of the algorithm.
Difficulty and significance of solving the above technical problems:
Difficulty:
(1) To counter the loss of small-target feature maps caused by down-sampling in convolutional neural networks, industry and academia currently fuse the feature maps of the shallow, middle and deep layers of the network. This sharply increases the number of parameters of the whole model, which makes the network very difficult to train and highly prone to over-fitting, demands far more computing resources, and greatly increases the execution time of the algorithm. Beyond these difficulties, while improving the algorithm we also found that up-sampling and down-sampling lose a large amount of information and that these operations are not learnable, which is a fatal blow to the accuracy of the whole algorithm.
(2) At present, both industry and academia use the RPN network to extract pre-selection boxes in "two-stage" algorithms. To reach a higher recall, however, the RPN network must greatly increase the number of pre-selection boxes, which places a heavy burden on speeding up the later stages of the algorithm; and even when a large number of pre-selection boxes are extracted, targets are still missed. Schemes such as Selective Search and Edge Boxes have been proposed for this purpose, but they still struggle to solve these problems.
Significance:
(1) For problem 1, the invention proposes a feature fusion scheme that fuses conv1, conv3 and conv5, combining the fine detail of shallow features with the high-level semantic information of deep features. A 1 × 1 convolution compresses the fused features into a unified space, which keeps the shared feature maps used by the later tasks mutually consistent. During fusion, conv1 is down-sampled with a dilated convolution of stride 2 and conv5 is up-sampled with a transposed convolution. Unlike traditional bilinear sampling and interpolation, these sampling operations are learnable, which helps considerably in preventing feature map distortion. The invention thus offers a new perspective on feature map sampling during feature fusion.
(2) For problem 2, the invention proposes training one RPN network on each of conv3 and conv4 and merging the pre-selection boxes generated by the two with the NMS algorithm. In this scheme a semantic enhancement network is added after conv3 to address the weak semantics of the shallow network; only then can the pre-selection boxes extracted from conv3 and conv4 be merged. The dual RPN is proposed here for the first time, and its realization solves the problem of weak shallow-feature semantics. The dual RPN also outperforms the currently popular RPN, Selective Search and Edge Boxes.
Summary of the invention
In view of the problems of the prior art, the present invention provides an enhanced fully convolutional instance-aware semantic segmentation (EFCIS) algorithm suitable for small target detection, which addresses the poor performance of the FCIS algorithm in segmenting small targets.
The invention is realized as an enhanced fully convolutional instance-aware semantic segmentation algorithm suitable for small target detection, comprising:
Step 1: shared feature map extraction. The feature maps of different layers are up-sampled or down-sampled to a common resolution and then fused.
Step 2: pre-selection box extraction. Two RPN networks are trained on conv3 and conv4 respectively, and the pre-selection boxes generated by the two RPN networks are combined using non-maximum suppression (NMS).
Step 3: generation of position-sensitive score maps. The shared feature map is generated with a fully convolutional network (FCN), and a 1 × 1 convolution with $2k^2(C+1)$ output channels then generates the position-sensitive score maps, where C+1 denotes C target categories plus one background category. The position-sensitive feature map of each ROI is assembled from the corresponding $k^2$ position-sensitive score maps.
Step 4: classification and regression. The ROIs obtained in step 3 are regressed and their corresponding score maps are classified. For classification, a pixel-level score map is obtained by assembling the position-sensitive score maps corresponding to the ROI; for each pixel in the ROI, two 1 × 1 convolutions decide respectively whether the pixel lies inside the ROI and whether it lies within the boundary of the object.
Further, step 1 specifically includes:
For shared feature map extraction, the input image is resized so that its short side is 600 pixels while its aspect ratio is kept unchanged. The shared feature map is extracted by fusing the feature maps of the conv1, conv3 and conv5 layers of a ResNet-101 backbone, and different up-sampling or down-sampling is applied to the feature maps of different layers so that their resolutions agree;
For conv1, a dilated convolution with dilation 2, stride 2, kernel size 3 and padding 0 reduces the resolution of the conv1 feature map by a factor of 2; the resolution of conv3 is kept unchanged;
The conv3 feature map and the down-sampled conv1 feature map are fused, and a 1 × 1 × 512 convolution compresses the feature maps from conv1 and conv3 into a unified space;
For conv5, a deconvolution (Deconv) with dilation 2, stride 2, kernel size 3 and padding 0 up-samples the resolution of the conv5 feature map by a factor of 2;
A 1 × 1 × 512 convolution fuses the feature maps from the conv1, conv3 and conv5 layers, finally achieving feature fusion.
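The sampling parameters in step 1 can be checked with the standard output-size formulas for dilated and transposed convolutions. The sketch below is only arithmetic on one spatial axis; the input widths 300 and 75 are hypothetical values chosen for illustration and do not come from the patent.

```python
def conv_out_size(size, kernel, stride, padding, dilation):
    """Output length of a (dilated) convolution along one spatial axis."""
    effective_kernel = dilation * (kernel - 1) + 1
    return (size + 2 * padding - effective_kernel) // stride + 1


def deconv_out_size(size, kernel, stride, padding, dilation, output_padding=0):
    """Output length of a transposed convolution along one spatial axis."""
    effective_kernel = dilation * (kernel - 1) + 1
    return (size - 1) * stride - 2 * padding + effective_kernel + output_padding


# Hypothetical feature-map widths, used only to illustrate the arithmetic.
down = conv_out_size(300, kernel=3, stride=2, padding=0, dilation=2)
up = deconv_out_size(75, kernel=3, stride=2, padding=0, dilation=2)
print(down, up)  # 148 153
```

With these settings the sizes change by roughly, not exactly, a factor of 2 (148 rather than 150, 153 rather than 150), so some cropping or padding would be needed before the maps can be fused element by element. How that alignment is done is not stated in the text.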
Further, step 2 specifically includes:
1) training a deep RPN network on the conv4 features;
2) adding, on top of conv3, a semantic enhancement network consisting of three 3 × 3 × 512 convolutional layers;
3) training a shallow RPN network on the feature map output by step 2);
4) generating 300 pre-selection boxes from each of the RPN networks of step 1) and step 3), merging the pre-selection boxes of the two RPN networks with the soft-NMS algorithm with its IoU parameter set to 0.7, and retaining the 300 pre-selection boxes with the highest prediction scores.
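The merging in sub-step 4) can be sketched as follows. This is a minimal pure-Python illustration, not the patented implementation: it assumes boxes in (x1, y1, x2, y2) form and the linear score-decay variant of soft-NMS, neither of which is specified in the text, and the names `iou` and `merge_proposals` are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def merge_proposals(deep, shallow, iou_thresh=0.7, keep=300):
    """Merge two lists of (box, score) with linear soft-NMS and keep the top `keep`."""
    pool = list(deep) + list(shallow)
    kept = []
    while pool:
        best = max(range(len(pool)), key=lambda i: pool[i][1])
        best_box, best_score = pool.pop(best)
        kept.append((best_box, best_score))
        decayed = []
        for box, score in pool:
            overlap = iou(best_box, box)
            if overlap > iou_thresh:
                # Soft-NMS lowers the score of a heavily overlapping box
                # instead of discarding the box outright.
                score *= 1.0 - overlap
            decayed.append((box, score))
        pool = decayed
    kept.sort(key=lambda item: item[1], reverse=True)
    return kept[:keep]


# Toy proposals standing in for the outputs of the deep and shallow RPNs.
deep = [((0, 0, 10, 10), 0.9), ((20, 20, 30, 30), 0.8)]
shallow = [((0, 0, 10, 9), 0.85), ((50, 50, 60, 60), 0.6)]
top3 = merge_proposals(deep, shallow, keep=3)
```

In the toy input, the shallow box (0, 0, 10, 9) overlaps the best deep box with IoU 0.9, so its score is decayed and it drops out of the top three.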
Further, step 3 specifically includes:
A 1 × 1 convolution with $2k^2(C+1)$ output channels generates the position-sensitive score maps, where C+1 denotes C target categories plus one background category, and (k, C) is set to (7, 80). Each score map is down-sampled by a factor of 16 relative to the resolution of the input image, and each ROI is formed by combining the corresponding $k^2$ position-sensitive score maps.
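As a quick check of the channel count implied by these settings, the expression $2k^2(C+1)$ can be evaluated directly. The function name below is hypothetical. Reading the factor 2 as the two per-pixel decisions described in step 4 follows the original FCIS design and is an assumption here.

```python
def score_map_channels(k, num_classes):
    """Number of position-sensitive score maps: 2 * k^2 * (C + 1)."""
    return 2 * k * k * (num_classes + 1)


channels = score_map_channels(7, 80)
print(channels)  # 7938
```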
Further, step 4 specifically includes:
The generated ROIs are regressed and their score maps are classified. If the IoU between an ROI and its corresponding ground-truth box exceeds 0.5, the ROI is set as a positive sample; otherwise the ROI is marked as a negative sample. A multi-task loss function is used during classification and regression:
$L(k, k^*, m, m^*, t, t^*) = L_{cls}(k, k^*) + L_{mask}(m, m^*) + L_{reg}(t, t^*)$
where $L_{cls}$ is the softmax loss over positive and negative sample boxes, $L_{reg}$ is the regression loss for positive sample boxes, and $L_{mask}$ is the softmax loss for the masks of positive samples; $k$ and $k^*$ denote the predicted result and the true label of the ROI, and $m$ and $m^*$ denote the predicted result and the annotation of each pixel;
For regression, a positive sample box has the label vector $t = (t_x, t_y, t_h, t_w)$ and the predicted vector $t^* = (t_x^*, t_y^*, t_h^*, t_w^*)$, and $t$ is computed as
$t_x = (G_x - P_x)/P_w$, $t_y = (G_y - P_y)/P_h$, $t_w = \log(G_w/P_w)$, $t_h = \log(G_h/P_h)$
where $P = (P_x, P_y, P_h, P_w)$ is the predicted box, $P_w$ and $P_h$ are its width and height, and $P_x$, $P_y$ are the coordinates of its centre point; the ground-truth box $G$ is defined in the same way;
For classification, given an ROI, a pixel-level score map is obtained by assembling the position-sensitive score maps corresponding to the ROI, and classification is performed by a 1 × 1 × 512 convolution.
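The regression targets above can be illustrated with a short sketch. The `Box` type and the `encode` and `decode` helpers are hypothetical names introduced here; the sketch returns the targets in the order (t_x, t_y, t_w, t_h), which differs from the (t_x, t_y, t_h, t_w) ordering used in the text, and `decode` is simply the algebraic inverse of the four formulas.

```python
import math
from collections import namedtuple

# Centre coordinates (x, y), width w and height h of a box.
Box = namedtuple("Box", "x y w h")


def encode(P, G):
    """Targets (t_x, t_y, t_w, t_h) of ground-truth box G relative to box P."""
    return (
        (G.x - P.x) / P.w,
        (G.y - P.y) / P.h,
        math.log(G.w / P.w),
        math.log(G.h / P.h),
    )


def decode(P, t):
    """Inverse of `encode`: recover the box that targets t describe relative to P."""
    tx, ty, tw, th = t
    return Box(P.x + tx * P.w, P.y + ty * P.h, P.w * math.exp(tw), P.h * math.exp(th))


P = Box(50.0, 50.0, 20.0, 40.0)   # illustrative predicted box
G = Box(60.0, 40.0, 40.0, 40.0)   # illustrative ground-truth box
t = encode(P, G)                  # (0.5, -0.25, log 2, 0.0)
```

Normalising the offsets by the box size and taking the logarithm of the size ratios makes the targets independent of the absolute scale of the box, which matters when small and large targets share one regressor.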
Another object of the present invention is to provide a computer program that implements the enhanced fully convolutional instance-aware semantic segmentation algorithm suitable for small target detection.
Another object of the present invention is to provide a terminal carrying at least a controller that implements the enhanced fully convolutional instance-aware semantic segmentation algorithm suitable for small target detection.
Another object of the present invention is to provide a computer-readable storage medium comprising instructions which, when run on a computer, cause the computer to execute the enhanced fully convolutional instance-aware semantic segmentation algorithm suitable for small target detection.
Another object of the present invention is to provide an autonomous vehicle that implements the enhanced fully convolutional instance-aware semantic segmentation algorithm suitable for small target detection.
Another object of the present invention is to provide a medical image analysis device that implements the enhanced fully convolutional instance-aware semantic segmentation algorithm suitable for small target detection.
Another object of the present invention is to provide an intelligent robot that implements the enhanced fully convolutional instance-aware semantic segmentation algorithm suitable for small target detection.
In summary, the advantages and positive effects of the present invention are as follows:
(1) Feature fusion gives the shared feature map both high-level semantic information (for classification) and fine detail (for localization), and extracting pre-selection boxes tests the classification and localization abilities of the network at the same time. For this comparison the invention uses ResNet-50, trains and validates on the PASCAL VOC 2007 data set, and sets IoU = 0.5. Fig. 5 shows that fusing conv1, conv3 and conv5 gives the best result, that multi-layer networks outperform single-layer networks, and that deep networks outperform shallow networks.
(2) On the PASCAL VOC 2007 data set, the dual RPN algorithm is compared with the currently popular pre-selection box extraction algorithms Edge Boxes, Selective Search and RPN; the results are shown in Fig. 3.
When the 50, 100 and 200 pre-selection boxes with the highest prediction scores are taken, the dual RPN proposed by the invention far outperforms the other three pre-selection box extraction methods. In particular, for the top 50 pre-selection boxes with IoU set to 0.5, the dual RPN algorithm achieves a recall of 94.4%, which is 10.4%, 41.4% and 38.4% higher than RPN, Selective Search and Edge Boxes respectively.
When IoU is set to 0.5, 0.6 and 0.7, the dual RPN proposed by the invention again far outperforms the other three methods. In particular, for the top 1000 pre-selection boxes, the recall of the dual RPN algorithm is 3.2%, 8% and 10.5% higher than that of RPN; 9.3%, 18% and 21.5% higher than that of Selective Search; and 7.2%, 10.5% and 12% higher than that of Edge Boxes. This shows that the proposed algorithm has outstanding localization performance, which is very beneficial for improving the capture of small targets. The results are shown in Fig. 4.
Overall, taking the top 300 pre-selection boxes with an IoU of 0.5, the dual RPN algorithm proposed by the invention achieves a recall of 99.7%, which is 7% higher than the RPN algorithm.
(3) On the MS COCO data set, as shown in Table 1, the two currently popular schemes of horizontal flipping and multi-scale training raise the mAP of the EFCIS algorithm by 0.6% and 0.8% respectively, whereas the data cropping scheme proposed by the invention raises it by 2.5%. For small targets in particular, horizontal flipping and multi-scale training give gains of 0.4% and 0.5% respectively, whereas the data cropping scheme of the invention raises the mAP of the EFCIS algorithm by 2%.
(4) On the MS COCO data set, as shown in Table 2, the mAP of the EFCIS algorithm is 3.5% higher than that of the FCIS algorithm. For small targets in particular it is 2.9% higher, which demonstrates that the EFCIS algorithm of the invention has a stronger ability to capture small targets. In addition, for medium-sized and large targets, the mAP of the EFCIS algorithm is 3.6% and 4.1% higher than that of FCIS respectively. A visual comparison of the segmentation results is shown in Fig. 7: (a) original image; (b) FCIS segmentation result; (c) EFCIS segmentation result.
Brief description of the drawings
Fig. 1 is a flow chart of the enhanced fully convolutional instance-aware semantic segmentation algorithm suitable for small target detection provided by an embodiment of the present invention.
Fig. 2 is a diagram of the feature fusion visualization and the feature sampling process provided by an embodiment of the present invention.
Fig. 3 is a line chart comparing recall against IoU on the PASCAL VOC 2007 data set, provided by an embodiment of the present invention.
In the figure: (a) the top-50 pre-selection boxes by prediction score; (b) the top-100 pre-selection boxes by prediction score; (c) the top-200 pre-selection boxes by prediction score.
Fig. 4 is a line chart comparing recall against the number of proposals on the PASCAL VOC 2007 data set, provided by an embodiment of the present invention.
In the figure: (a) IoU = 0.5; (b) IoU = 0.6; (c) IoU = 0.7.
Fig. 5 is a line chart comparing recall against the number of proposals for the feature fusion proposed by an embodiment of the present invention, on the PASCAL VOC 2007 data set.
Fig. 6 is a diagram of the image cropping scheme provided by an embodiment of the present invention.
Fig. 7 is a comparison of demonstration results of the EFCIS algorithm and the FCIS algorithm provided by an embodiment. In the figure: (a) input image; (b) FCIS segmentation result; (c) EFCIS segmentation result.
Fig. 8 is a schematic diagram of the dual RPN algorithm provided by an embodiment of the present invention.
Detailed description of the embodiments
To make the objects, technical solutions and advantages of the present invention clearer, the invention is described in further detail below with reference to embodiments. It should be understood that the specific embodiments described here merely illustrate the invention and are not intended to limit it.
In the present invention, RPN stands for region proposal network, a network for extracting pre-selection boxes that was first proposed in the Faster R-CNN algorithm.
Recall: also called the recall ratio, the ratio of the number of relevant items retrieved to the total number of relevant items.
ResNet-101: a network proposed in 2015 by Kaiming He et al., which became well known after winning first place in the ImageNet image classification competition.
ROI: region of interest, that is, a region in which a target may exist in an object detection algorithm.
FCN: fully convolutional network, proposed in 2015 by Jonathan Long et al. of UC Berkeley.
NMS: non-maximum suppression, put forward in the paper "Efficient Non-Maximum Suppression" and used in object detection to select the highest-scoring window.
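Applied to pre-selection boxes, recall is usually measured as the fraction of ground-truth boxes that are covered by at least one pre-selection box at a given IoU threshold. The sketch below illustrates that reading; the function names and the (x1, y1, x2, y2) box format are assumptions made for the example and are not taken from the patent.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def proposal_recall(gt_boxes, proposals, iou_thresh=0.5):
    """Fraction of ground-truth boxes matched by at least one proposal."""
    if not gt_boxes:
        return 0.0
    hits = sum(
        1 for g in gt_boxes if any(iou(g, p) >= iou_thresh for p in proposals)
    )
    return hits / len(gt_boxes)


# Two ground-truth boxes; only the first is covered by a proposal.
gt = [(0, 0, 10, 10), (20, 20, 30, 30)]
props = [(1, 1, 10, 10), (100, 100, 110, 110)]
recall = proposal_recall(gt, props, iou_thresh=0.5)
print(recall)  # 0.5
```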
The application principle of the invention is described in detail below with reference to a concrete scheme.
As shown in Fig. 1, the enhanced fully convolutional instance-aware semantic segmentation (EFCIS) algorithm suitable for small target detection provided by an embodiment of the present invention builds on the FCIS algorithm and is proposed to improve the performance of FCIS in segmenting small targets. It relies mainly on three technical schemes (feature fusion, dual RPN and image cropping) to address the loss of small-target features caused by down-sampling in convolutional neural networks and the poor recall of the pre-selection boxes extracted by the RPN network, thereby improving the small-target segmentation performance of the FCIS algorithm.
The specific steps are as follows:
Step 1: shared feature map extraction. The aspect ratio of the input image is kept unchanged and its short side is resized to 600 pixels. The invention extracts the shared feature map with ResNet-101 and, on that basis, fuses the feature maps from the conv1, conv3 and conv5 layers, applying different up-sampling or down-sampling to the feature maps of different layers so that their resolutions agree. For conv1, a dilated convolution with dilation 2, stride 2, kernel size 3 and padding 0 reduces the resolution of the conv1 feature map by a factor of 2; the down-sampled conv1 feature map is fused with the conv3 feature map, and a 1 × 1 × 512 convolution unifies the result in one space. For conv5, a deconvolution with dilation 2, stride 2, kernel size 3 and padding 0 up-samples the conv5 feature map by a factor of 2; the same fusion scheme is then applied, and the feature maps from the conv1, conv3 and conv5 layers are finally fused to achieve feature fusion, as shown in Fig. 2.
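The resizing rule at the start of step 1 amounts to a single scale factor. The helper below is a hypothetical sketch: rounding to the nearest pixel is an assumption, and no limit is placed on the long side because the text does not mention one.

```python
def resize_keep_aspect(width, height, short_side=600):
    """Scale (width, height) so that the shorter side equals `short_side`."""
    scale = short_side / min(width, height)
    return round(width * scale), round(height * scale)


print(resize_keep_aspect(640, 480))  # (800, 600)
```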
Step 2: pre-selection box extraction. To improve the recall of the pre-selection boxes extracted by a conventional RPN network, the invention trains two RPN networks on conv3 and conv4 respectively and merges the pre-selection boxes they generate using non-maximum suppression (NMS). The details are as follows:
(1) An RPN network is trained on the conv4 features; the invention calls it the deep RPN network.
(2) On top of conv3, the invention adds a semantic enhancement network consisting of three 3 × 3 × 512 convolutional layers (each convolution is followed by a ReLU activation), with the parameters (kernel, padding, stride, number of filters) set to (3, 1, 1, 512).
(3) An RPN network is trained on the feature map output by (2); the invention calls it the shallow RPN network.
(4) The RPN networks of (1) and (3) each produce 300 pre-selection boxes. The invention merges the pre-selection boxes from the two RPN networks with the soft-NMS algorithm, with its IoU parameter set to 0.7, so that the 300 pre-selection boxes with the highest prediction scores are retained.
Step 3: generation of position-sensitive score maps. To address the translation invariance of the shared feature map generated in step 1, the invention uses a fully convolutional network (FCN) scheme to generate position-sensitive score maps. A 1 × 1 × 1024 convolution is first applied to the shared feature map, and a 1 × 1 convolution with $2k^2(C+1)$ output channels then generates the position-sensitive score maps, where C+1 denotes C target categories plus one background category. For the MS COCO data set the invention sets (k, C) to (7, 80). Each score map is down-sampled by a factor of 16 relative to the resolution of the input image, and each ROI is composed of the corresponding $k^2$ position-sensitive score maps.
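How an ROI is composed from $k^2$ position-sensitive score maps can be shown on a toy example. The sketch below follows the assembling idea of the original FCIS design as an assumption: the ROI is divided into a k × k grid, and each grid cell copies its pixels from the score map assigned to that cell. The function name, the ROI format (x0, y0, w, h) in feature-map cells and the integer binning are all illustrative choices.

```python
def assemble_roi(score_maps, roi, k):
    """Assemble an ROI score map from k*k position-sensitive score maps.

    score_maps: list of k*k 2-D lists, all of the same size.
    roi: (x0, y0, w, h) in feature-map cells.
    """
    x0, y0, w, h = roi
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            row = y * k // h          # vertical bin of this pixel, 0..k-1
            col = x * k // w          # horizontal bin of this pixel, 0..k-1
            out[y][x] = score_maps[row * k + col][y0 + y][x0 + x]
    return out


# Toy setting: k = 2, and score map i is filled with the constant i.
k = 2
maps = [[[float(i)] * 4 for _ in range(4)] for i in range(k * k)]
roi_scores = assemble_roi(maps, (0, 0, 4, 4), k)
# roi_scores == [[0, 0, 1, 1], [0, 0, 1, 1], [2, 2, 3, 3], [2, 2, 3, 3]]
```

Because each cell of the ROI reads from a different score map, the same pixel can receive different scores in different ROIs, which is what makes the representation sensitive to position.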
Step 4: classification and regression. The ROIs generated in step 3 are regressed and their corresponding score maps are classified. If the IoU between an ROI and its corresponding ground-truth box exceeds 0.5, the invention sets the ROI as a positive sample; otherwise the ROI is marked as a negative sample. During training the invention uses a multi-task loss function:
$L(k, k^*, m, m^*, t, t^*) = L_{cls}(k, k^*) + L_{mask}(m, m^*) + L_{reg}(t, t^*)$
where $L_{cls}$ is the softmax loss over positive and negative sample boxes, $L_{reg}$ is the regression loss for positive sample boxes, and $L_{mask}$ is the softmax loss for the masks of positive samples; $k$ and $k^*$ denote the predicted result and the true label of the ROI, and $m$ and $m^*$ denote the predicted result and the annotation of each pixel.
For the regression problem, a positive sample box has the label vector $t = (t_x, t_y, t_h, t_w)$ and the predicted vector $t^* = (t_x^*, t_y^*, t_h^*, t_w^*)$, and the invention computes $t$ as
$t_x = (G_x - P_x)/P_w$, $t_y = (G_y - P_y)/P_h$, $t_w = \log(G_w/P_w)$, $t_h = \log(G_h/P_h)$
where $P = (P_x, P_y, P_h, P_w)$ is the predicted box, $P_w$ and $P_h$ are its width and height, and $P_x$, $P_y$ are the coordinates of its centre point; the ground-truth box $G$ is defined in the same way.
For the classification problem, given an ROI, a pixel-level score map is obtained by assembling the position-sensitive score maps corresponding to the ROI. Each pixel in the ROI then poses two tasks: first, whether the pixel lies inside the ROI; second, whether the pixel lies within the boundary of the target object. The invention performs these two tasks with two 1 × 1 convolution operations.
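The structure of the multi-task loss can be sketched in a few lines. This is an illustration under stated assumptions rather than the patented training code: the mask term is averaged over pixels, the regression term is passed in as a precomputed number because its exact form is not given in the text, and the mask and regression terms are applied to positive samples only, as the definitions above indicate. All function and parameter names are hypothetical.

```python
import math


def softmax_cross_entropy(logits, label):
    """Softmax loss of one prediction: -log softmax(logits)[label]."""
    m = max(logits)  # subtract the maximum for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]


def multitask_loss(cls_logits, cls_label, pixel_logits, pixel_labels, l_reg, is_positive):
    """L = L_cls + L_mask + L_reg; mask and regression terms for positives only."""
    l_cls = softmax_cross_entropy(cls_logits, cls_label)
    if not is_positive:
        return l_cls
    # Assumption: the per-pixel softmax losses are averaged over the ROI.
    l_mask = sum(
        softmax_cross_entropy(z, y) for z, y in zip(pixel_logits, pixel_labels)
    ) / len(pixel_labels)
    return l_cls + l_mask + l_reg


# Uninformative logits give a loss of log 2 for each two-way decision.
negative = multitask_loss([0.0, 0.0], 0, [], [], 0.0, is_positive=False)
positive = multitask_loss(
    [0.0, 0.0], 1, [[0.0, 0.0], [0.0, 0.0]], [0, 1], 0.5, is_positive=True
)
```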
The application of the invention is further described below with reference to the accompanying drawings.
The dual RPN algorithm of the invention is compared with the currently popular pre-selection box extraction algorithms Edge Boxes, Selective Search and RPN. For the top 50 pre-selection boxes by prediction score with IoU set to 0.5, the dual RPN algorithm of the invention achieves a recall of 94.4%, which is 10.4%, 41.4% and 38.4% higher than RPN, Selective Search and Edge Boxes respectively. With IoU set to 0.5, 0.6 and 0.7 and the top 1000 pre-selection boxes by prediction score, the recall of the dual RPN algorithm is 3.2%, 8% and 10.5% higher than that of the RPN algorithm, 9.3%, 18% and 21.5% higher than that of the Selective Search algorithm, and 7.2%, 10.5% and 12% higher than that of the Edge Boxes algorithm. This demonstrates the excellent localization performance of the dual RPN algorithm of the invention, which is very beneficial for improving small-target localization. In particular, for the top 300 pre-selection boxes by prediction score with an IoU of 0.5, the dual RPN algorithm achieves a recall of 99.7%, which is 7% higher than the RPN algorithm. The results are shown in Fig. 3 and Fig. 4.
Under the MS COCO evaluation standard, the image cropping scheme of the invention improves the mAP of the algorithm by 2.5%, whereas the two currently popular schemes, horizontal flipping and multi-scale training, improve the mAP by 0.6% and 0.8%, respectively. For small targets in particular, the image cropping scheme of the invention improves the mAP by 2%, whereas horizontal flipping and multi-scale training improve it by 0.4% and 0.5%, respectively. The comparison is given in Table 1 below.
By fusing conv1, conv3, and conv5, the shared feature map extracted by the invention carries both high-level semantic classification information and highly detailed location information. At the same time, the resolution of the shared feature map is doubled, which gives a stronger ability to capture small targets.
Experiments on the MS COCO 2014 validation set show that the mAP of the EFCIS algorithm of the invention is 3.5% higher than that of the FCIS algorithm. For small targets in particular, the mAP of EFCIS is 2.9% higher than that of FCIS, which shows that the algorithm of the invention has a stronger ability to capture small targets. The comparison is given in Table 2 below.
Table 1: Comparison of different data augmentation schemes on the MS COCO dataset. The EFCIS baseline includes feature fusion and the dual RPN but excludes the image cropping of the invention.
Table 2: Comparison of EFCIS and FCIS results on the MS COCO dataset.
The application principle of the invention is further described below with reference to a specific embodiment.
Embodiment
As shown in Fig. 1, the embodiment of the invention proposes an enhanced fully convolutional instance semantic segmentation algorithm (EFCIS) suitable for small target detection. First, a feature fusion scheme and an image cropping scheme are proposed to solve the problem that the feature maps of small targets are lost because of down-sampling in convolutional neural networks. Second, building on the RPN network, a dual RPN is proposed that greatly improves the recall of the extracted proposals. On the MS COCO dataset, the mAP of the final EFCIS algorithm is 3.5% higher than that of FCIS; for small targets in particular, the mAP of EFCIS is 2.9% higher than that of FCIS. The specific steps are as follows:
Step 1: build the hardware environment. The core hardware requirements of the invention are: storage: 256 GB SSD plus 2 TB mechanical hard disk; memory: 32 GB DDR4; CPU: Intel Core i7-7700K; GPU: NVIDIA GTX 1070 Ti (8 GB).
Step 2: build the software environment. First install the Ubuntu 16.04 operating system. On Ubuntu, install the Linux x64 Display Driver (version: 390.87) graphics driver together with the CUDA 9.0 + cuDNN 5.1 GPU acceleration libraries; install and compile the MXNet (commit 998378a) deep learning framework; and install Cython, OpenCV 3.2, easydict 1.6, and hickle, which are scientific computing libraries with Python interfaces.
Step 3: shared feature map extraction, as shown in Fig. 2. The aspect ratio of the input image is kept constant and its short side is resized to 600. On the basis of ResNet-101, the feature maps from the conv1, conv3, and conv5 layers are fused. For conv1, a dilated convolution with dilation 2, stride 2, kernel 3, and padding 0 is used for down-sampling; the down-sampled conv1 feature map is fused with the conv3 feature map, and a 1 × 1 × 512 convolution unifies the fused feature map into one space. For conv5, a deconvolution with dilation 2, stride 2, kernel 3, and padding 0 is used, and the same fusion scheme is applied. Finally, the feature maps from conv1, conv3, and conv5 are fused to realize feature fusion.
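The resizing and resampling arithmetic in this step can be checked with a small sketch. It assumes the output-size conventions commonly used by deep learning frameworks for dilated and transposed convolutions (no output padding); the function names are hypothetical, and the rounding used when resizing is an assumption.

```python
def resize_short_side(height, width, short=600):
    """New (height, width) with the short side set to `short`, aspect ratio kept."""
    scale = short / min(height, width)
    return round(height * scale), round(width * scale)

def conv_out_size(size, kernel=3, stride=2, padding=0, dilation=2):
    """Spatial size after a (dilated) convolution along one dimension."""
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1

def deconv_out_size(size, kernel=3, stride=2, padding=0, dilation=2):
    """Spatial size after a transposed convolution along one dimension."""
    return (size - 1) * stride - 2 * padding + dilation * (kernel - 1) + 1

# Example: a 480 x 640 image is resized to 600 x 800.
# With the stated hyperparameters, a 300-wide map becomes 148 wide after the
# dilated convolution and a 75-wide map becomes 153 wide after the deconvolution.
```

Under these conventions the stated hyperparameters give roughly, but not exactly, a factor of 2 in each direction, so an implementation would presumably crop or pad the maps before fusing them; that alignment step is not described in the text.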
Step 4: the dual RPN algorithm, as shown in Fig. 8:
1) train a deep RPN network on the conv4 layer features;
2) add a semantic enhancement network consisting of three 3 × 3 × 512 convolutional layers on top of conv3;
3) train a shallow RPN network on the feature map output by step 2);
4) generate 300 proposals from each of the RPN networks of step 1) and step 3), fuse the proposals from the two RPN networks with the soft-NMS algorithm, with the IoU parameter of soft-NMS set to 0.7, and finally retain the proposals with the highest predicted scores.
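The fusion in item 4) can be sketched as follows. The text only names soft-NMS and an IoU parameter of 0.7; the linear score decay, the `min_score` cutoff, the (x1, y1, x2, y2) box layout, and the function names below are assumptions made for illustration.

```python
import numpy as np

def iou_one_to_many(box, boxes):
    """IoU between one (x1, y1, x2, y2) box and an (n, 4) array of boxes."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def soft_nms_merge(boxes_deep, scores_deep, boxes_shallow, scores_shallow,
                   iou_thresh=0.7, keep=300, min_score=1e-3):
    """Merge proposals from two RPNs with linear soft-NMS and keep the best."""
    boxes = np.concatenate([np.asarray(boxes_deep, dtype=float),
                            np.asarray(boxes_shallow, dtype=float)])
    scores = np.concatenate([np.asarray(scores_deep, dtype=float),
                             np.asarray(scores_shallow, dtype=float)])
    kept_boxes, kept_scores = [], []
    while len(boxes) > 0 and len(kept_boxes) < keep:
        i = int(np.argmax(scores))
        if scores[i] < min_score:          # everything left is suppressed
            break
        best = boxes[i]
        kept_boxes.append(best)
        kept_scores.append(scores[i])
        boxes = np.delete(boxes, i, axis=0)
        scores = np.delete(scores, i)
        overlap = iou_one_to_many(best, boxes)
        # Linear decay: heavily overlapping boxes lose score instead of being dropped.
        scores = scores * np.where(overlap > iou_thresh, 1.0 - overlap, 1.0)
    return np.array(kept_boxes), np.array(kept_scores)
```

Because scores only ever decay, the boxes are selected in non-increasing score order, so stopping after `keep` selections retains the highest-scoring proposals of the merged set.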
Step 5: based on the shared feature map generated in step 3, a 1 × 1 convolution with $2k^2(C+1)$ output channels generates the position-sensitive score maps, where C+1 denotes the C target categories plus one background category, and (k, C) is (7, 80). Each score map is down-sampled 16 times relative to the resolution of the input image. The feature map of each ROI from step 4 is assembled from its corresponding $k^2$ position-sensitive score maps.
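The channel count and the assembly of an ROI from its k² position-sensitive score maps can be sketched as below. This assumes the ROI is given in integer feature-map coordinates with exclusive end indices, that it is split evenly into a k × k grid, and that the maps are ordered row by row; these details and the function names are illustrative assumptions.

```python
import numpy as np

def num_score_channels(k, num_classes):
    """Output channels of the 1 x 1 convolution: 2 * k^2 * (C + 1)."""
    return 2 * k * k * (num_classes + 1)

def assemble_roi(score_maps, roi, k):
    """Assemble one ROI score map from k*k position-sensitive maps.

    score_maps: array of shape (k*k, H, W) for one category and one mask role.
    roi: (x1, y1, x2, y2) in feature-map pixels, end-exclusive.
    """
    x1, y1, x2, y2 = roi
    h, w = y2 - y1, x2 - x1
    out = np.zeros((h, w), dtype=score_maps.dtype)
    ys = np.linspace(0, h, k + 1).round().astype(int)   # row boundaries of the grid
    xs = np.linspace(0, w, k + 1).round().astype(int)   # column boundaries of the grid
    for i in range(k):
        for j in range(k):
            r0, r1, c0, c1 = ys[i], ys[i + 1], xs[j], xs[j + 1]
            # Grid cell (i, j) copies its pixels from the map dedicated to that cell.
            out[r0:r1, c0:c1] = score_maps[i * k + j, y1 + r0:y1 + r1, x1 + c0:x1 + c1]
    return out
```

With (k, C) = (7, 80), `num_score_channels(7, 80)` gives 7938 channels. Because every grid cell reads from its own map, the same pixel can score differently depending on where it falls relative to the ROI, which is what makes the maps position-sensitive.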
Step 6: the generated ROIs are regressed and their corresponding score maps are classified. If the IoU between an ROI and its corresponding ground-truth box exceeds 0.5, the ROI is set as a positive sample; otherwise, the ROI is marked as a negative sample. A multi-task loss function is used during training: $L(k,k^*,m,m^*,t,t^*)=L_{cls}(k,k^*)+L_{mask}(m,m^*)+L_{reg}(t,t^*)$;
where $L_{cls}$ is the softmax loss for positive and negative sample boxes, $L_{reg}$ is the regression loss for positive sample boxes, $L_{mask}$ is the softmax loss for the masks of positive samples, $k$ and $k^*$ denote the ROI prediction and the true label, and $m$ and $m^*$ denote the prediction and the annotation of each pixel;
For regression, a positive sample box has a label vector $t=(t_x,t_y,t_h,t_w)$ and a predicted vector $t^*=(t_x^*,t_y^*,t_h^*,t_w^*)$, and t is computed by the following formulas: $t_x=(G_x-P_x)/P_w$, $t_y=(G_y-P_y)/P_h$, $t_w=\log(G_w/P_w)$, $t_h=\log(G_h/P_h)$;
where $P=(P_x,P_y,P_h,P_w)$ is the predicted box, $P_h$ and $P_w$ are its height and width, and $P_x$, $P_y$ are the coordinates of its center point; the ground-truth box G is parameterized in the same way. Finally, the complete instance semantic segmentation result of an image is obtained, as shown in Fig. 7.
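The multi-task loss of step 6 can be sketched for a single ROI as follows. The text does not specify the form of the regression loss or how the per-pixel mask loss is aggregated; the smooth L1 regression loss and the mean over pixels used here are assumptions, as are the function names.

```python
import numpy as np

def softmax_cross_entropy(logits, label):
    """Softmax loss for one sample: negative log-probability of the true label."""
    z = np.asarray(logits, dtype=float)
    z = z - z.max()                                  # numerical stability
    log_probs = z - np.log(np.exp(z).sum())
    return float(-log_probs[label])

def smooth_l1(pred, target):
    """Smooth L1 distance (assumed form of the regression loss)."""
    d = np.abs(np.asarray(pred, dtype=float) - np.asarray(target, dtype=float))
    return float(np.where(d < 1.0, 0.5 * d ** 2, d - 0.5).sum())

def multitask_loss(cls_logits, cls_label, mask_logits, mask_labels,
                   t_pred, t_target, roi_iou, pos_iou=0.5):
    """L = L_cls + L_mask + L_reg, with mask and regression terms on positives only."""
    loss = softmax_cross_entropy(cls_logits, cls_label)
    if roi_iou > pos_iou:                            # positive sample: IoU above 0.5
        per_pixel = [softmax_cross_entropy(l, int(y))
                     for l, y in zip(mask_logits, mask_labels)]
        loss += float(np.mean(per_pixel))            # L_mask, averaged over pixels
        loss += smooth_l1(t_pred, t_target)          # L_reg
    return loss
```

For a negative sample only the classification term contributes, which matches the statement that the mask and regression losses apply to positive samples.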
Fig. 6 shows the image cropping scheme provided by the embodiment of the invention.
The comparison of the feature fusion schemes proposed by the invention is shown in Fig. 5. Fusing the conv1, conv3, and conv5 feature maps gives the best result. The comparison also shows that fusing more layers works better than fusing fewer layers, and that deep feature maps are more effective than shallow feature maps.
As shown in Fig. 3, for the 50 proposals with the highest predicted scores at IoU = 0.5, the dual RPN algorithm achieves a recall of 94.4%, which is 10.4%, 41.4%, and 38.4% higher than the recall of RPN, Selective Search, and Edge Boxes, respectively.
As shown in Fig. 4, for the 1000 proposals with the highest predicted scores, with IoU set to 0.5, 0.6, and 0.7 respectively, the recall of the dual RPN algorithm is 3.2%, 8%, and 10.5% higher than that of RPN; 9.3%, 18%, and 21.5% higher than that of Selective Search; and 7.2%, 10.5%, and 12% higher than that of Edge Boxes.
As shown in Fig. 7, some images from the MS COCO test set are visualized: the first column is the input image, the second column is the segmentation result of the FCIS algorithm, and the third column is the segmentation result of the EFCIS algorithm. The visualization shows that the EFCIS algorithm can capture small objects in the image, which the FCIS algorithm almost never does, and that for medium and large objects the segmentation of the EFCIS algorithm is finer.
As shown in Table 1, on the MS COCO dataset the image cropping scheme of the invention improves the mAP of the algorithm by 2.5%, whereas horizontal flipping and multi-scale training improve the mAP by 0.6% and 0.8%, respectively.
As shown in Table 2, on the MS COCO dataset the mAP of the EFCIS algorithm is 3.5% higher than that of the FCIS algorithm; for small targets in particular, the mAP of EFCIS is 2.9% higher than that of FCIS.
The above simulation results show that the EFCIS algorithm, which the invention proposes on the basis of the FCIS algorithm, effectively improves instance semantic segmentation. At IoU = 0.5 it reaches 66.8% mAP, a significant improvement over the 61.7% mAP of the FCIS algorithm; for small targets in particular, the mAP of the EFCIS algorithm is 2.9% higher than that of FCIS.
In the embodiment of the invention, Fig. 8 is a schematic diagram of the dual RPN algorithm provided by the invention.
The application principle of the invention is further described below with reference to its effects.
The algorithm provided by the invention comprises shared feature map extraction, proposal extraction, generation of position-sensitive score maps, and classification and regression. In shared feature map extraction, to address the sparsity of the shared features, the conv1, conv3, and conv5 feature maps are fused so that the shared feature map retains both high-level semantic information and fine detail. To address the loss of small-target features caused by down-sampling in convolutional neural networks, an image cropping scheme is proposed to suppress that loss. In proposal extraction, to address the poor performance of the proposal extraction network, the dual RPN algorithm is proposed; its average recall is 7% higher than that of RPN. Overall, the mAP of the EFCIS algorithm is 3.5% higher than that of FCIS, and for small targets in particular it is 2.9% higher. The experiments show that the invention greatly benefits the ability to capture small targets.
The above embodiments may be implemented wholly or partly in software, hardware, firmware, or any combination thereof. When implemented wholly or partly as a computer program product, the computer program product comprises one or more computer instructions. When the computer program instructions are loaded or executed on a computer, the processes or functions described in the embodiments of the invention are produced wholly or partly. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, they may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired means (such as coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless means (such as infrared, radio, or microwave). The computer-readable storage medium may be any usable medium that a computer can access, or a data storage device such as a server or data center that integrates one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, hard disk, or magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid state disk (SSD)).
The foregoing describes only preferred embodiments of the invention and is not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principle of the invention shall fall within its scope of protection.
Claims (11)
1. An enhanced fully convolutional instance semantic segmentation algorithm suitable for small target detection, characterized in that the algorithm comprises:
Step 1: shared feature map extraction: up-sampling or down-sampling the resolution of the feature maps of different layers and performing feature fusion;
Step 2: proposal extraction: training two RPN networks on conv3 and conv4 respectively, and jointly extracting from the proposals generated by the two RPN networks using non-maximum suppression (NMS);
Step 3: generating position-sensitive score maps: generating the shared feature map based on a fully convolutional network (FCN), then using a 1 × 1 convolution with $2k^2(C+1)$ output dimensions to generate the position-sensitive score maps, where C+1 denotes C target categories and 1 background category, and the position-sensitive feature map of each ROI is assembled from its corresponding $k^2$ position-sensitive score maps;
Step 4: classification and regression: regressing the ROIs generated in step 3 and classifying their corresponding score maps; in classification, a pixel-level score map is obtained by assembling the position-sensitive score maps corresponding to the ROI, and for each pixel in the ROI, two 1 × 1 convolutions respectively determine whether the pixel is inside the ROI and whether the pixel is within the boundary of the target object.
2. The enhanced fully convolutional instance semantic segmentation algorithm suitable for small target detection according to claim 1, characterized in that step 1 specifically comprises:
shared feature map extraction: with the aspect ratio of the input image kept constant, its short side is resized to 600; a ResNet-101 base network model is used, and the feature maps of the conv1, conv3, and conv5 layers are fused to extract the shared feature map; different up-sampling or down-sampling is applied to the feature maps of different layers so that their resolutions are consistent;
for conv1, a dilated convolution with dilation 2, stride 2, kernel 3, and padding 0 reduces the resolution of the conv1 feature map by a factor of 2; the resolution of conv3 is kept unchanged;
the conv3 feature map and the down-sampled conv1 feature map are fused, and a 1 × 1 × 512 convolution compresses the feature maps from conv1 and conv3 into a unified space;
for conv5, a deconvolution (Deconv) with dilation 2, stride 2, kernel 3, and padding 0 up-samples the resolution of the conv5 feature map by a factor of 2;
a 1 × 1 × 512 convolution fuses the feature maps from the conv1, conv3, and conv5 layers, finally realizing feature fusion.
3. The enhanced fully convolutional instance semantic segmentation algorithm suitable for small target detection according to claim 1, characterized in that step 2 specifically comprises:
1) training a deep RPN network on the conv4 layer features;
2) adding a semantic enhancement network consisting of three 3 × 3 × 512 convolutional layers on top of conv3;
3) training a shallow RPN network on the feature map output by step 2);
4) generating 300 proposals from each of the RPN networks of step 1) and step 3), fusing the proposals of the two RPN networks with the soft-NMS algorithm, with the IoU parameter of soft-NMS set to 0.7, and retaining the 300 proposals with the highest predicted scores.
4. The enhanced fully convolutional instance semantic segmentation algorithm suitable for small target detection according to claim 1, characterized in that step 3 specifically comprises:
using a 1 × 1 convolution with $2k^2(C+1)$ output dimensions to generate the position-sensitive score maps, where C+1 denotes the target categories and one background category, with (k, C) set to (7, 80); each score map is down-sampled 16 times relative to the resolution of the input image, and each ROI is formed by combining its corresponding $k^2$ position-sensitive score maps.
5. The enhanced fully convolutional instance semantic segmentation algorithm suitable for small target detection according to claim 1, characterized in that step 4 specifically comprises:
regressing the generated ROIs and classifying their score maps; if the IoU between an ROI and its corresponding ground-truth box exceeds 0.5, the ROI is set as a positive sample, and otherwise the ROI is marked as a negative sample; a multi-task loss function is used in the classification and regression process,
$L(k,k^*,m,m^*,t,t^*)=L_{cls}(k,k^*)+L_{mask}(m,m^*)+L_{reg}(t,t^*)$;
where $L_{cls}$ is the softmax loss for positive and negative sample boxes, $L_{reg}$ is the regression loss for positive sample boxes, $L_{mask}$ is the softmax loss for the masks of positive samples, $k$ and $k^*$ denote the ROI prediction and the true label, and $m$ and $m^*$ denote the prediction and the annotation of each pixel;
in regression, a positive sample box has a vector $t=(t_x,t_y,t_h,t_w)$ and a predicted vector $t^*=(t_x^*,t_y^*,t_h^*,t_w^*)$, and t is computed by the following formulas: $t_x=(G_x-P_x)/P_w$, $t_y=(G_y-P_y)/P_h$, $t_w=\log(G_w/P_w)$, $t_h=\log(G_h/P_h)$;
where $P=(P_x,P_y,P_h,P_w)$ is the predicted box, $P_h$ and $P_w$ are its height and width, and $P_x$, $P_y$ are the coordinates of its center point; the ground-truth box G is parameterized in the same way;
in classification, given an ROI, a pixel-level score map is obtained by assembling the position-sensitive score maps corresponding to the ROI; classification is performed by one 1 × 1 × 512 convolution operation.
6. A computer program implementing the enhanced fully convolutional instance semantic segmentation algorithm suitable for small target detection according to any one of claims 1 to 5.
7. A terminal, characterized in that the terminal carries at least a controller implementing the enhanced fully convolutional instance semantic segmentation algorithm suitable for small target detection according to any one of claims 1 to 4.
8. A computer-readable storage medium comprising instructions which, when run on a computer, cause the computer to execute the enhanced fully convolutional instance semantic segmentation algorithm suitable for small target detection according to any one of claims 1 to 5.
9. An autonomous vehicle implementing the enhanced fully convolutional instance semantic segmentation algorithm suitable for small target detection according to any one of claims 1 to 5.
10. A medical image analysis device implementing the enhanced fully convolutional instance semantic segmentation algorithm suitable for small target detection according to any one of claims 1 to 5.
11. An intelligent robot implementing the enhanced fully convolutional instance semantic segmentation algorithm suitable for small target detection according to any one of claims 1 to 5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811601302.9A CN109583517A (en) | 2018-12-26 | 2018-12-26 | A kind of full convolution example semantic partitioning algorithm of the enhancing suitable for small target deteection |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109583517A true CN109583517A (en) | 2019-04-05 |
Family
ID=65931945
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109583517A (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107169421A (en) * | 2017-04-20 | 2017-09-15 | 华南理工大学 | A kind of car steering scene objects detection method based on depth convolutional neural networks |
CN107463892A (en) * | 2017-07-27 | 2017-12-12 | 北京大学深圳研究生院 | Pedestrian detection method in a kind of image of combination contextual information and multi-stage characteristics |
CN107886117A (en) * | 2017-10-30 | 2018-04-06 | 国家新闻出版广电总局广播科学研究院 | The algorithm of target detection merged based on multi-feature extraction and multitask |
US20180253622A1 (en) * | 2017-03-06 | 2018-09-06 | Honda Motor Co., Ltd. | Systems for performing semantic segmentation and methods thereof |
CN108509978A (en) * | 2018-02-28 | 2018-09-07 | 中南大学 | The multi-class targets detection method and model of multi-stage characteristics fusion based on CNN |
CN108664994A (en) * | 2018-04-17 | 2018-10-16 | 哈尔滨工业大学深圳研究生院 | A kind of remote sensing image processing model construction system and method |
CN108830205A (en) * | 2018-06-04 | 2018-11-16 | 江南大学 | Based on the multiple dimensioned perception pedestrian detection method for improving full convolutional network |
Non-Patent Citations (2)
Title |
---|
YI LI 等: "Fully Convolutional Instance-Aware Semantic Segmentation", 《2017 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR)》 * |
李雯 等: "基于卷积神经网络的特定细小军事标识检测", 《信息工程大学学报》 * |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190405 |