CN111666988A - Target detection algorithm based on multi-layer information fusion - Google Patents
Target detection algorithm based on multi-layer information fusion
- Publication number
- Publication number: CN111666988A (application CN202010444366.3A)
- Authority
- CN
- China
- Prior art keywords
- network
- features
- information
- region
- detection algorithm
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06F18/241 — Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2415 — Classification techniques based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
- G06F18/253 — Fusion techniques of extracted features
- G06N3/045 — Combinations of networks
- G06N3/047 — Probabilistic or stochastic networks
- G06N3/08 — Learning methods
- G06V2201/07 — Target detection
Abstract
The invention provides a target detection algorithm based on multi-layer information fusion, which comprises the following steps: S1, preprocessing the data set images and adjusting the image data to the size set by the network; S2, extracting information of the image at different levels with DenseNet, obtaining features at four stages; S3, normalizing the number of channels of the extracted four-stage features; and S4, vertically fusing the extracted multi-level information and enhancing the propagation of information across levels, so that the feature maps have rich deep semantic information and shallow position information. The invention uses DenseNet as the feature extraction network; compared with a conventional ResNet network, the number of parameters required is less than half that of ResNet. For industrial use, the smaller model noticeably saves bandwidth, reduces storage overhead and improves the computational efficiency of the network model, while information of different levels is extracted according to the structure of the network.
Description
Technical Field
The invention belongs to the field of computer vision detection, and particularly relates to a target detection algorithm based on multi-layer information fusion.
Background
Object detection and recognition is one of the basic tasks in the field of computer vision. In industry, object detection receives wide attention and has many practical applications in various fields, for example target tracking, driver assistance, biometric recognition, smart homes, smart agriculture, medical image analysis and identification of flying objects. Reducing human labor costs through computer vision is of great practical significance. In the automobile industry, car manufacturers and tier-one suppliers are competing in the field of driving assistance, and the camera-based route to driver assistance has been opened up. For driver assistance on comprehensive urban roads, the road conditions are complex, obstacles such as motor vehicles, non-motor vehicles and pedestrians are numerous, and small targets such as children, pets and scooters may appear. Accurate camera-based detection of vehicles and pedestrians is required by the driving system and is a very important basic link in driver-assistance technology, so improving the accuracy and efficiency of camera-based detection algorithms is of great significance for vehicle safety.
With the rapid development of deep learning in recent years, target detection algorithms have shifted from traditional algorithms based on hand-crafted features to detection techniques based on deep neural networks. In deep-learning-based target detection, detection precision and detection speed trade off against each other: improving precision generally requires reducing speed. Moreover, detection network structures are increasingly complicated, the number of parameters is excessive, training takes a long time and training efficiency is low, so the overall algorithm still has large room for improvement. Among existing target detection algorithms, Faster R-CNN is a comparatively advanced one; its authors proposed a candidate-region extraction network with shared features, and the application of this network further improved the performance of the algorithm. However, the backbone network VGGNet is an image classification network pre-trained on ImageNet with a position-insensitive characteristic, and the repeated down-sampling in VGGNet filters out the information of smaller targets, so the feature information input to the region candidate network is incomplete.
A single-stage detection algorithm only needs one pass of the network to obtain the position and category information of the target. Compared with target detection algorithms based on region proposals, the detection speed is greatly improved, which makes such methods better suited to mobile devices. However, they suffer from less accurate localization and lower recall than region-proposal-based methods, detect closely spaced objects and very small objects poorly, and have relatively weak generalization capability.
Disclosure of Invention
In view of the above, the present invention provides a target detection algorithm based on multi-layer information fusion, which aims to overcome the defects in the prior art.
In order to achieve the purpose, the technical scheme of the invention is realized as follows:
a target detection algorithm based on multi-layer information fusion comprises the following steps:
s1, preprocessing the data set image; adjusting the image data to the size set by the network;
s2, extracting information of the image at different levels by using DenseNet, and obtaining features at four stages;
s3, normalizing the channel number of the extracted features of the four stages;
s4, vertically fusing the extracted multi-level information, and enhancing the transmission of different levels of information, so that the feature map has rich deep semantic information and shallow position information;
s5, extracting the region of interest from the fused multi-level information by using a region suggestion algorithm;
s6, predicting the accurate category of the region of interest and regressing the position coordinates;
s7, calculating the multi-task loss function of the classification network and the regression network, training and optimizing the network so that the classification and regression loss functions converge, and saving the weight parameters of the network;
And S8, deploying the optimized parameters and detecting the target.
Further, the specific steps of step S1 are as follows:
s101, performing color enhancement, translation, and horizontal and vertical flipping on the images;
s102, scaling all image data to 448 x 448 size using linear interpolation.
Further, the specific method for extracting the features in step S2 is as follows: performing convolution and pooling on the image by using the constructed 98-layer DenseNet network, and taking the output of each transition layer to obtain four stage feature maps with resolutions of 56 × 56, 28 × 28, 17 × 17 and 17 × 17.
Further, the specific method for normalizing the number of channels of the four-stage features in step S3 is as follows: a convolution operation is performed on the four stage features using 1 × 1 convolutions with 256 channels, unifying the dimension of all stage features to 256.
Further, the multi-stage feature fusion in step S4 specifically includes:
s401, performing element-wise addition on features of two adjacent stages when their sizes are the same; if the sizes differ, up-sampling the smaller feature so that the two features to be fused have the same size;
and S402, convolving the fused result by using a convolution kernel of 3x3 to eliminate the aliasing effect after fusion.
Further, the specific method of step S5 is as follows: extracting regions of interest from the multi-stage features fused in step S4 by using a region candidate network, and performing foreground/background binary prediction and rough fitting of the bounding-box position on the regions of interest by means of an anchor mechanism.
Further, the specific method of step S6 is as follows:
s601, performing pooling operation on the region of interest extracted in the step S5;
s602, inputting the pooled region of interest into a fully-connected network, and classifying by using a Softmax classifier;
and S603, outputting predicted target position coordinates x, y, w and h, wherein x, y, w and h respectively represent the center coordinate, the width and the height of the box.
Further, the specific method of step S7 is as follows:
s701, firstly calculating the loss function of the classification part:

$$L_{cls}(p_i, p_i^*) = -\left[\, p_i^* \log p_i + (1 - p_i^*)\log(1 - p_i) \,\right]$$

wherein: $p_i$ is the probability that anchor point $i$ is predicted to be a target, and $p_i^*$ is the corresponding ground-truth label of the data set;

s702, calculating the loss function of the position regression part, using the Smooth$_{L1}$ ($\sigma = 3$) smoothing loss function:

$$L_{reg}(t_n, t_n^*) = \sum_{j \in \{x, y, w, h\}} \mathrm{smooth}_{L1}\!\left(t_{n,j} - t_{n,j}^*\right), \qquad \mathrm{smooth}_{L1}(x) = \begin{cases} 0.5\,(\sigma x)^2, & |x| < 1/\sigma^2 \\ |x| - 0.5/\sigma^2, & \text{otherwise} \end{cases}$$

wherein: $t_n$ denotes the 4 parameterized coordinates of the prediction bounding box and $t_n^*$ is the vector of the real box matched with a positive anchor point, with

$$t_x = \frac{x - x_a}{w_a}, \quad t_y = \frac{y - y_a}{h_a}, \quad t_w = \log\frac{w}{w_a}, \quad t_h = \log\frac{h}{h_a}$$

wherein: $x$, $x_a$ and $x^*$ correspond to the prediction box, the anchor point and the real box respectively (and likewise for $y$, $w$ and $h$);

s703, finally calculating the sum of the two loss functions:

$$L(\{p_i\}, \{t_n\}) = \frac{1}{N_{cls}} \sum_i L_{cls}(p_i, p_i^*) + \lambda\, \frac{1}{N_{reg}} \sum_n p_n^* L_{reg}(t_n, t_n^*)$$

wherein: $N_{cls}$ is the number of images per training input, $N_{reg}$ is the number of anchor points, and $\lambda$ is a balance parameter for the two partial losses;

s704, training the fully-connected network so that the loss function converges.
Compared with the prior art, the invention has the following advantages:
the invention uses Densenet as a feature extraction network, compared with the traditional ResNet network, the parameter quantity required by the network is less than half of the ResNet; for the industry, the small model can obviously save bandwidth, reduce storage overhead, improve the calculation efficiency of the network model, and extract information of different levels according to network characteristics.
The invention addresses the insensitivity of two-stage target detection algorithms to small target objects: it strengthens the information-extraction capability of the backbone network and builds a multi-layer information fusion network to fuse information of different layers, so that neither the semantic information of the deep features nor the position information of the shallow features is lost; the detection capability of the algorithm for targets of different sizes is improved without reducing the detection speed.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and, together with the description, serve to explain the invention without limitation. In the drawings:
FIG. 1 is a flowchart of a target detection algorithm based on multi-layer information fusion according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of a feature extraction network Densenet in the embodiment of the present invention;
FIG. 3 is a flow chart of a multi-information fusion network in an embodiment of the present invention;
FIG. 4 is a flowchart of the regression and classification of candidate regions according to an embodiment of the present invention.
Detailed Description
It should be noted that the embodiments and features of the embodiments of the present invention may be combined with each other without conflict.
In the description of the present invention, it is to be understood that the terms "central," "longitudinal," "lateral," "upper," "lower," "front," "rear," "left," "right," "vertical," "horizontal," "top," "bottom," "inner," "outer," and the like are used in the orientation or positional relationship indicated in the drawings, which are merely for convenience in describing the invention and to simplify the description, and are not intended to indicate or imply that the referenced device or element must have a particular orientation, be constructed and operated in a particular orientation, and are therefore not to be construed as limiting the invention. Furthermore, the terms "first", "second", etc. are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first," "second," etc. may explicitly or implicitly include one or more of that feature. In the description of the invention, the meaning of "a plurality" is two or more unless otherwise specified.
In the description of the invention, it is to be noted that, unless otherwise explicitly specified or limited, the terms "mounted", "connected" and "connected" are to be construed broadly, e.g. as being fixed or detachable or integrally connected; can be mechanically or electrically connected; they may be connected directly or indirectly through intervening media, or they may be interconnected between two elements. The specific meaning of the above terms in the creation of the present invention can be understood by those of ordinary skill in the art through specific situations.
The invention will be described in detail with reference to the following embodiments with reference to the attached drawings.
An object detection algorithm based on multi-layer information fusion, as shown in fig. 1 to 4, includes:
s1, preprocessing the data set images and adjusting the image data to the size set by the network; S2, extracting information of the image at different levels by using DenseNet and obtaining features at four stages; S3, normalizing the number of channels of the extracted four-stage features; S4, vertically fusing the extracted multi-level information and enhancing the propagation of information across levels, so that the feature maps have rich deep semantic information and shallow position information; S5, extracting the regions of interest from the fused multi-level information by using a region proposal algorithm; S6, predicting the accurate category of the regions of interest and regressing the position coordinates; S7, calculating the multi-task loss function of the classification network and the regression network, training and optimizing the network so that the classification and regression loss functions converge, and saving the weight parameters of the network; S8, deploying the optimized parameters and detecting the target;
Specifically, in S8 the network weight parameters saved in step S7 are deployed into the network, image data containing targets are input, features are extracted and fused by the trained parameter network, the regions of interest are roughly predicted, the category and position of the target are accurately predicted and regressed, and the target category and position information are finally output.
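As an illustration of the deployment flow in S8 only, the following sketch loads saved weights and runs inference; torchvision's fasterrcnn_resnet50_fpn is used purely as a stand-in for the patented detector, and the weight-file path and class count are assumptions.

```python
import torch
from PIL import Image
from torchvision import transforms
from torchvision.models.detection import fasterrcnn_resnet50_fpn

# Stand-in detector: torchvision's Faster R-CNN is used here only to illustrate the
# deployment flow of S8; the patented network would replace it with the DenseNet
# backbone plus the multi-layer fusion described below.
detector = fasterrcnn_resnet50_fpn(weights=None, num_classes=21)
detector.load_state_dict(torch.load("weights.pth", map_location="cpu"))  # weights saved in S7 (path assumed)
detector.eval()

preprocess = transforms.Compose([
    transforms.Resize((448, 448)),   # S1: resize to the network input size (bilinear by default)
    transforms.ToTensor(),
])

image = preprocess(Image.open("test.jpg").convert("RGB"))
with torch.no_grad():
    prediction = detector([image])[0]          # dict with boxes, labels, scores for one image
print(prediction["boxes"], prediction["labels"], prediction["scores"])
```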
The specific steps of step S1 are as follows: S101, performing color enhancement, translation, and horizontal and vertical flipping on the images; S102, scaling all image data to 448 x 448 using linear interpolation.
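A minimal sketch of the preprocessing in S101/S102, assuming torchvision; the jitter and translation magnitudes are illustrative, since the text does not specify them.

```python
from torchvision import transforms
from torchvision.transforms import InterpolationMode

train_transform = transforms.Compose([
    transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2),     # S101: color enhancement (values assumed)
    transforms.RandomAffine(degrees=0, translate=(0.1, 0.1)),                 # S101: translation (magnitude assumed)
    transforms.RandomHorizontalFlip(p=0.5),                                   # S101: horizontal flip
    transforms.RandomVerticalFlip(p=0.5),                                     # S101: vertical flip
    transforms.Resize((448, 448), interpolation=InterpolationMode.BILINEAR),  # S102: linear interpolation to 448x448
    transforms.ToTensor(),
])
```

In a real detection pipeline the geometric transforms would also need to be applied to the bounding-box annotations; only the image side is shown here.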
The specific method for extracting the features in step S2 is as follows: performing convolution and pooling on the image by using the constructed 98-layer DenseNet network, and taking the output of each transition layer to obtain four stage feature maps with resolutions of 56 × 56, 28 × 28, 17 × 17 and 17 × 17.
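The 98-layer DenseNet built here is not publicly available, so the sketch below collects the transition-layer outputs of torchvision's densenet121 as a stand-in to illustrate how four stage feature maps C1-C4 could be tapped; the layer names follow torchvision, and the resulting resolutions and channel counts differ from the values stated above.

```python
import torch
from torchvision.models import densenet121

class DenseNetStages(torch.nn.Module):
    """Collect intermediate feature maps after each DenseNet transition layer."""

    def __init__(self):
        super().__init__()
        self.features = densenet121(weights=None).features  # stand-in for the 98-layer DenseNet
        # torchvision layer names: transition1..3 plus the final norm5 output.
        self.taps = ("transition1", "transition2", "transition3", "norm5")

    def forward(self, x):
        stages = []
        for name, layer in self.features.named_children():
            x = layer(x)
            if name in self.taps:
                stages.append(x)             # C1, C2, C3, C4
        return stages

stages = DenseNetStages()(torch.randn(1, 3, 448, 448))
print([tuple(s.shape) for s in stages])      # four feature maps at decreasing resolution
```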
The specific method for normalizing the number of channels of the four stage features in step S3 is as follows: a convolution operation is performed on the four stage features using 1 × 1 convolutions with 256 channels, unifying the dimension of all stage features to 256.
The specific method of multi-stage feature fusion in step S4 is as follows: S401, performing element-wise addition on features of two adjacent stages when their sizes are the same; if the sizes differ, up-sampling the smaller feature so that the two features to be fused have the same size; S402, convolving the fused result with a 3x3 convolution kernel to eliminate the aliasing effect after fusion.
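A compact sketch of S3 and S4 under stated assumptions: each stage is first mapped to 256 channels by a 1x1 convolution, adjacent stages are then added element-wise (resizing by bilinear interpolation when sizes differ, weighted 0.5/0.5 as described further below), and each result is smoothed with a 3x3 convolution; the input channel counts are illustrative and match the densenet121 stand-in above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FusionNeck(nn.Module):
    """Multi-layer information fusion: 1x1 channel normalization + top-down weighted add + 3x3 smoothing."""

    def __init__(self, in_channels=(128, 256, 512, 1024), out_channels=256):
        super().__init__()
        self.lateral = nn.ModuleList([nn.Conv2d(c, out_channels, 1) for c in in_channels])   # S3
        self.smooth = nn.ModuleList([nn.Conv2d(out_channels, out_channels, 3, padding=1)
                                     for _ in in_channels])                                  # S402

    @staticmethod
    def f_add(a, b, beta1=0.5, beta2=0.5):
        # S401: element-wise weighted addition; resize `a` to `b`'s size if they differ.
        if a.shape[-2:] != b.shape[-2:]:
            a = F.interpolate(a, size=b.shape[-2:], mode="bilinear", align_corners=False)
        return beta1 * a + beta2 * b

    def forward(self, c1, c2, c3, c4):
        c1, c2, c3, c4 = (lat(c) for lat, c in zip(self.lateral, (c1, c2, c3, c4)))  # C1'..C4'
        p4 = c4
        p3 = self.f_add(p4, c3)          # fusion1
        p2 = self.f_add(p3, c2)          # fusion2
        p1 = self.f_add(p2, c1)          # fusion3
        return [sm(p) for sm, p in zip(self.smooth, (p1, p2, p3, p4))]               # P1..P4
```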
The specific method of step S5 is as follows: extracting regions of interest from the multi-stage features fused in step S4 by using a region candidate network, and performing foreground/background binary prediction and rough fitting of the bounding-box position on the regions of interest by means of an anchor mechanism.
The specific method of step S6 is as follows: s601, performing pooling operation on the region of interest extracted in the step S5; s602, inputting the pooled region of interest into a fully-connected network, and classifying by using a Softmax classifier; and S603, outputting predicted target position coordinates x, y, w and h, wherein x, y, w and h respectively represent the center coordinate, the width and the height of the box.
The specific method of step S7 is as follows: s701, firstly calculating the loss function of the classification part:

$$L_{cls}(p_i, p_i^*) = -\left[\, p_i^* \log p_i + (1 - p_i^*)\log(1 - p_i) \,\right]$$

wherein: $p_i$ is the probability that anchor point $i$ is predicted to be a target, and $p_i^*$ is the corresponding ground-truth label of the data set;

s702, calculating the loss function of the position regression part, using the Smooth$_{L1}$ ($\sigma = 3$) smoothing loss function:

$$L_{reg}(t_n, t_n^*) = \sum_{j \in \{x, y, w, h\}} \mathrm{smooth}_{L1}\!\left(t_{n,j} - t_{n,j}^*\right), \qquad \mathrm{smooth}_{L1}(x) = \begin{cases} 0.5\,(\sigma x)^2, & |x| < 1/\sigma^2 \\ |x| - 0.5/\sigma^2, & \text{otherwise} \end{cases}$$

wherein: $t_n$ denotes the 4 parameterized coordinates of the prediction bounding box and $t_n^*$ is the vector of the real box matched with a positive anchor point, with

$$t_x = \frac{x - x_a}{w_a}, \quad t_y = \frac{y - y_a}{h_a}, \quad t_w = \log\frac{w}{w_a}, \quad t_h = \log\frac{h}{h_a}$$

wherein: $x$, $x_a$ and $x^*$ correspond to the prediction box, the anchor point and the real box respectively (and likewise for $y$, $w$ and $h$);

s703, finally calculating the sum of the two loss functions:

$$L(\{p_i\}, \{t_n\}) = \frac{1}{N_{cls}} \sum_i L_{cls}(p_i, p_i^*) + \lambda\, \frac{1}{N_{reg}} \sum_n p_n^* L_{reg}(t_n, t_n^*)$$

wherein: $N_{cls}$ is the number of images per training input, $N_{reg}$ is the number of anchor points, and $\lambda$ is a balance parameter for the two partial losses;

s704, training the fully-connected network so that the loss function converges.
Specifically, the structure of the feature fusion network created by the invention is shown in fig. 3. The invention uses the dense feature extraction network to extract feature maps of different scales (C1, C2, C3 and C4). To achieve feature sharing and more accurate detection, the feature maps of the different stages are fused in a pyramid manner and then each fed into the region candidate network for prediction. The multi-level feature information is fused by means of this structure: low-resolution, high-level features are laterally connected, top-down, with high-resolution, low-level features, so that the features at every scale contain feature information of objects of different sizes, which increases the detector's perception of information to a certain extent. Compared with the Faster R-CNN algorithm, which only uses the last-layer features of the feature extraction network, the present algorithm performs proposal-region extraction on the fused multi-stage features rather than only on the features of the last stage; because the subsequent region proposal network is a sliding-window detector with a fixed window size, sliding over the different layers of the fused network increases its robustness to changes in target scale. In addition, if only the last stage were used, there would be more anchor points, and simply increasing the number of mapped anchor points cannot effectively improve the accuracy.
The structure on the left side of FIG. 3 is the dimension-normalization process for the features of different levels; the invention takes the output of each transition layer (C1, C2, C3, C4) as the input of the feature fusion network. The feature maps extracted by the backbone DenseNet have different dimensions and resolutions, so the dimensions of the features of different levels are normalized before fusion. All extracted features undergo a 1x1 convolution, which linearly combines the information of different channels and performs dimension reduction or expansion without harming the expressive power of the model; it also adds a non-linearity while keeping the size of the feature map unchanged. The features after dimension unification are denoted C1', C2', C3' and C4', with resolutions 28x28, 28x28, 56x56 and 112x112 respectively. FIG. 3 shows the fusion process of the multi-level features; the specific fusion processes fusion1, fusion2 and fusion3 are shown in the dashed box on the right. In the fusion of C4'_k and C3'_k, the feature map sizes are the same, so C4'_k and C3'_k are added directly without an up-sampling step; the fusion of P3'_k and C2'_k proceeds in the same way. In the fusion of P2'_k and C1'_k, the sizes of the two groups of feature maps differ, so bilinear interpolation is used to resize P2'_k to the size of C1'_k. The subscript k denotes the k-th dimension of a feature; the Add operation for two features in the k-th dimension is calculated as shown in the following equation.
$$Z_k(x, y) = f_{add}(A_k, B_k) = \beta_1 A_k(x, y) + \beta_2 B_k(x, y)$$
The above formula means that the elements of features $A_k$ and $B_k$ at the corresponding position $(x, y)$ are added, and the results over all positions form the feature after the Add operation; $\beta_1$ and $\beta_2$ weight-balance the features $A_k$ and $B_k$, with $\beta_1 = \beta_2 = 0.5$.
To eliminate aliasing effects after fusion, each fusion result is convolved again with a 3 × 3 convolution kernel. The fusion network outputs P1 (28x28, d=256), P2 (28x28, d=256), P3 (28x28, d=256) and P4 (56x56, d=256), and these fused features are then fed into the subsequent region candidate network.
Region-of-interest extraction and the classification and regression network: the algorithm uses a region proposal network to extract regions of interest from the features. The RPN is essentially a class-agnostic, sliding-window-based target detector; its input is the feature maps of different sizes returned by the backbone network, and its output is the regions of interest. The structure of the region candidate network is shown in fig. 4. To generate candidate regions, a 3 × 3 window is slid over the feature maps of multiple sizes, and the anchor mapping mechanism plays a central role in the network. Anchors are placed as fixed boxes on pictures of different sizes and scales and are then used as reference boxes when predicting the target location. The candidate-region network produces two outputs for each anchor: the first is the probability that the anchor is a target, and the second is the box regression used to adjust the anchor to better fit the predicted target.
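The two per-anchor outputs described above (objectness probability and box regression) can be sketched as a small convolutional head; the 3x3 sliding window becomes a 3x3 convolution, and the number of anchors per location is an assumed value.

```python
import torch
import torch.nn as nn

class RPNHead(nn.Module):
    """Sliding 3x3 window with two sibling 1x1 heads: objectness score and box deltas per anchor."""

    def __init__(self, in_channels=256, num_anchors=9):   # num_anchors is an assumption
        super().__init__()
        self.conv = nn.Conv2d(in_channels, in_channels, 3, padding=1)   # the 3x3 sliding window
        self.objectness = nn.Conv2d(in_channels, num_anchors, 1)        # target / not-target score
        self.bbox_deltas = nn.Conv2d(in_channels, num_anchors * 4, 1)   # rough box regression

    def forward(self, fused_maps):
        scores, deltas = [], []
        for p in fused_maps:                      # applied to each fused feature map P1..P4
            t = torch.relu(self.conv(p))
            scores.append(self.objectness(t))
            deltas.append(self.bbox_deltas(t))
        return scores, deltas
```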
The RoI Pooling layer uses the proposal regions generated by the region candidate network together with the features extracted by the feature network to obtain fixed-size feature maps of the proposal regions; the fixed-size feature maps produced by the RoI Pooling layer are then passed through fully connected layers, a normalization (Softmax) function classifies the specific categories, and at the same time a Smooth$_{L1}$ loss function performs the regression operation to obtain the precise position of the object.
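A sketch of the RoI pooling and the classification/regression head, assuming torchvision's roi_pool with a 7x7 output; the 20 object classes plus one background class and the hidden layer width are assumptions.

```python
import torch
import torch.nn as nn
from torchvision.ops import roi_pool

class DetectionHead(nn.Module):
    """RoI Pooling followed by fully connected layers, Softmax classification and box regression."""

    def __init__(self, in_channels=256, pool_size=7, num_classes=21, hidden=1024):
        super().__init__()
        self.pool_size = pool_size
        self.fc = nn.Sequential(
            nn.Flatten(),
            nn.Linear(in_channels * pool_size * pool_size, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.cls_score = nn.Linear(hidden, num_classes)        # Softmax over specific classes
        self.bbox_pred = nn.Linear(hidden, num_classes * 4)    # fine regression of x, y, w, h

    def forward(self, feature_map, rois, spatial_scale):
        # rois: (K, 5) tensor of [batch_index, x1, y1, x2, y2] boxes proposed by the RPN.
        pooled = roi_pool(feature_map, rois, output_size=self.pool_size, spatial_scale=spatial_scale)
        x = self.fc(pooled)
        return torch.softmax(self.cls_score(x), dim=1), self.bbox_pred(x)
```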
Multi-task loss function: in the region candidate network, two types of labels are set for each anchor point, positive and negative. A positive sample is the anchor with the highest intersection-over-union with the real box, and an anchor whose intersection-over-union with the real box is lower than 0.3 is set as a negative sample. The loss function of the RPN network is defined as:

$$L(\{p_i\}, \{t_n\}) = \frac{1}{N_{cls}} \sum_i L_{cls}(p_i, p_i^*) + \lambda\, \frac{1}{N_{reg}} \sum_n p_n^* L_{reg}(t_n, t_n^*) \qquad (5)$$

Equation 5 is divided into two parts: the first part is the classification loss and the second part is the regression loss of the target box, where $p_i$ is the probability that anchor $i$ is predicted to be a target and $p_i^*$ is the ground-truth label from the data set; $t_n$ denotes the 4 parameterized coordinates of the prediction bounding box and $t_n^*$ is the vector of the real box matched with a positive anchor.
The parameterized coordinates of equation (6) are

$$t_x = \frac{x - x_a}{w_a}, \quad t_y = \frac{y - y_a}{h_a}, \quad t_w = \log\frac{w}{w_a}, \quad t_h = \log\frac{h}{h_a}, \qquad t_x^* = \frac{x^* - x_a}{w_a}, \ \ldots \qquad (6)$$

where $x$, $y$, $w$, $h$ represent the box center coordinates, width and height respectively, and $x$, $x_a$, $x^*$ correspond to the prediction box, the anchor point and the real box respectively.
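The parameterization of equation (6) can be written as a small helper; boxes are assumed to be given as center coordinates and sizes (x, y, w, h).

```python
import torch

def encode_boxes(boxes, anchors):
    """Equation (6): encode (x, y, w, h) boxes relative to anchors as (tx, ty, tw, th)."""
    x, y, w, h = boxes.unbind(dim=1)
    xa, ya, wa, ha = anchors.unbind(dim=1)
    tx = (x - xa) / wa
    ty = (y - ya) / ha
    tw = torch.log(w / wa)
    th = torch.log(h / ha)
    return torch.stack((tx, ty, tw, th), dim=1)

# Example: one ground-truth box encoded against one anchor.
t = encode_boxes(torch.tensor([[50., 60., 40., 80.]]), torch.tensor([[48., 64., 32., 64.]]))
print(t)
```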
In the second part, the regression prediction of the target box, a least-squares (L2) loss function is generally used, but the L2 loss penalizes relatively large errors heavily. The Smooth$_{L1}$ ($\sigma = 3$) smoothing loss function is therefore used here:

$$\mathrm{smooth}_{L1}(x) = \begin{cases} 0.5\,(\sigma x)^2, & |x| < 1/\sigma^2 \\ |x| - 0.5/\sigma^2, & \text{otherwise} \end{cases}$$
The two partial losses are normalized by $N_{cls}$ (the mini-batch size, 32) and $N_{reg}$ (the number of anchor locations, here 5488) respectively, and weighted by a balance parameter $\lambda$ so as to balance the classification and regression terms.
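Under these definitions, the RPN loss of equation 5 might be computed as in the sketch below, with binary cross-entropy for the classification part and Smooth L1 with sigma = 3 (equivalently beta = 1/9) for the regression part; N_cls = 32 and N_reg = 5488 follow the text, while the value of lambda is an assumption since the text does not give it.

```python
import torch
import torch.nn.functional as F

def rpn_loss(obj_logits, obj_labels, box_deltas, box_targets, positive_mask,
             n_cls=32, n_reg=5488, lam=10.0):   # lam (lambda) is assumed, not stated in the text
    """Equation 5: classification loss normalized by N_cls plus lambda-weighted regression loss normalized by N_reg."""
    cls_loss = F.binary_cross_entropy_with_logits(obj_logits, obj_labels, reduction="sum") / n_cls
    # Smooth L1 with sigma = 3  <=>  beta = 1 / sigma**2
    reg_loss = F.smooth_l1_loss(box_deltas[positive_mask], box_targets[positive_mask],
                                beta=1.0 / 9.0, reduction="sum") / n_reg
    return cls_loss + lam * reg_loss

# Toy usage with random tensors standing in for the network outputs.
logits = torch.randn(256)
labels = (torch.rand(256) > 0.5).float()
deltas, targets = torch.randn(256, 4), torch.randn(256, 4)
loss = rpn_loss(logits, labels, deltas, targets, labels.bool())
```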
The regions of interest extracted by the RPN network still need specific-class prediction and fine-tuning of the prediction box. This is again a classification-and-regression process: the regression loss for fine-tuning the candidate box follows the same principle as the regression loss for the region of interest, while the class prediction is expanded from the two (positive/negative) classes to 20 specific classes. The classification and regression loss for the 20 specific classes is shown in equation 8.
$$L(\{p\}, \{t_n'\}) = L_{cls}(p, u) + \mu\, L_{reg}(t_n', t_n'^*) \qquad (8)$$

where $p = (p_0, \ldots, p_k)$ is the discrete probability distribution predicted by the classification network for each region of interest, $u$ is the real class of each region of interest, $\mu$ is the balance coefficient of the two loss functions ($\mu = 0.5$), and $t_n'$ denotes the 4 parameterized coordinates of the final fine-tuned bounding box. In order to accurately extract the regions of interest and perform the subsequent specific classification and boundary fine-tuning, the total loss function of the algorithm is set as the sum of the above two losses (formula 12), so that the weights of both stages can be updated simultaneously when the network is trained.
$$L_{total} = L(\{p_i\}, \{t_n\}) + L(\{p\}, \{t_n'\}) \qquad (12)$$
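Continuing the sketch, the detection-head loss of equation 8 and the total loss of equation 12 might be combined as follows; mu = 0.5 follows the text, and the cross-entropy / Smooth L1 pairing mirrors the definitions above.

```python
import torch
import torch.nn.functional as F

def head_loss(class_logits, class_labels, box_refine, box_refine_targets, mu=0.5):
    """Equation 8: specific-class classification loss plus mu-weighted fine-tuning regression loss."""
    cls_loss = F.cross_entropy(class_logits, class_labels)
    reg_loss = F.smooth_l1_loss(box_refine, box_refine_targets, beta=1.0 / 9.0)
    return cls_loss + mu * reg_loss

def total_loss(rpn_part, head_part):
    """Equation 12: the total loss is the sum of the RPN loss and the detection-head loss."""
    return rpn_part + head_part
```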
The specific steps of the proposed algorithm are as follows: in order to acquire features containing information at different levels, a DenseNet network is used to perform multi-stage feature extraction on the image, and the outputs of the transition layers in the DenseNet are used as the extracted features of different stages $C_n$ ($n = 1, 2, 3, 4$).
Because the feature dimensions extracted at different stages are inconsistent, dimension normalization is performed on the features of the different stages: their channel dimensions are normalized to 256 using 1x1 convolutions, and then the fusion operation shown in fig. 3 is performed; the specific fusion calculation is given in formulas 13-16.
$$C_n' = \mathrm{conv}_{1\times 1,\, d}(C_n), \quad n = 1, 2, 3, 4,\ d = 256 \qquad (13)$$

$$P_4 = C_4'$$

$$P_3 = f_{add}(C_4',\, C_3') \qquad (14)$$

$$P_2 = f_{add}(P_3,\, C_2') \qquad (15)$$

$$P_1 = f_{add}(\mathrm{resize}(P_2),\, C_1') \qquad (16)$$

In equation 13, $C_n'$ is the result of dimension normalization of the features at the different stages and $d = 256$ is the dimension of the normalized features. In formulas 14 to 16, $f_{add}(\cdot)$ denotes the Add operation of two features set forth in 2.2, $\mathrm{resize}(\cdot)$ denotes bilinear interpolation to the size of the other input, and $\beta_1, \beta_2$ are the weights used to balance the two features, $\beta_1 = \beta_2 = 0.5$.
The regions of interest are then extracted from the fused features using the region proposal network. At this stage the target is only roughly predicted: the network judges whether a region contains an object without predicting the specific category, and roughly regresses the target position. After the extraction of the regions of interest is finished, the prediction of the specific category and the fine regression of the target position are performed.
In order to accurately acquire the category and position of the target, the sum of the region proposal loss function and the final classification-and-regression loss function is computed using the multi-task loss function designed in 2.4. The total loss function is then differentiated with the back-propagation algorithm, the weights and bias parameters are updated, and multiple iterations are performed to minimize the loss function.
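A training-step sketch under the same stand-in assumptions as the deployment sketch: the optimizer type, learning rate and epoch count are illustrative, while the loop itself follows the text (compute the total loss, back-propagate, update the weights and biases, iterate, then save the parameters as in S7).

```python
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn

# torchvision's Faster R-CNN is again used only as a stand-in; in training mode it returns the
# RPN and detection-head losses, whose sum corresponds to the total loss of equation 12.
detector = fasterrcnn_resnet50_fpn(weights=None, num_classes=21)
optimizer = torch.optim.SGD(detector.parameters(), lr=0.001, momentum=0.9)  # values assumed

detector.train()
for epoch in range(20):                                   # number of epochs assumed
    for images, targets in train_loader:                  # train_loader assumed to yield detection batches
        loss_dict = detector(images, targets)             # classification + regression losses of both stages
        loss = sum(loss_dict.values())                     # total multi-task loss (equation 12)
        optimizer.zero_grad()
        loss.backward()                                    # back-propagate the total loss
        optimizer.step()                                   # update weight and bias parameters

torch.save(detector.state_dict(), "weights.pth")           # S7: save the network weight parameters
```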
The invention uses DenseNet as the feature extraction network; compared with a conventional ResNet network, the number of parameters required is less than half that of ResNet. For industrial use, the smaller model noticeably saves bandwidth, reduces storage overhead and improves the computational efficiency of the network model, while information of different levels is extracted according to the structure of the network.
The invention addresses the insensitivity of two-stage target detection algorithms to small target objects: it strengthens the information-extraction capability of the backbone network and builds a multi-layer information fusion network to fuse information of different layers, so that neither the semantic information of the deep features nor the position information of the shallow features is lost; the detection capability of the algorithm for targets of different sizes is improved without reducing the detection speed.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and should not be taken as limiting the invention, so that any modifications, equivalents, improvements and the like, which are within the spirit and principle of the present invention, should be included in the scope of the present invention.
Claims (8)
1. A target detection algorithm based on multi-layer information fusion is characterized by comprising the following steps:
s1, preprocessing the data set image; adjusting the image data to the size set by the network;
s2, extracting information of the image at different levels by using DenseNet, and obtaining features at four stages;
s3, normalizing the channel number of the extracted features of the four stages;
s4, vertically fusing the extracted multi-level information, and enhancing the transmission of different levels of information, so that the feature map has rich deep semantic information and shallow position information;
s5, extracting the region of interest from the fused multi-level information by using a region suggestion algorithm;
s6, predicting the accurate category of the region of interest and regressing the position coordinates;
s7, calculating the multi-task loss function of the classification network and the regression network, training and optimizing the network so that the classification and regression loss functions converge, and saving the weight parameters of the network;
And S8, deploying the optimized parameters and detecting the target.
2. The multi-layer information fusion-based target detection algorithm as claimed in claim 1, wherein the specific steps of step S1 are as follows:
s101, performing color enhancement, translation, and horizontal and vertical flipping on the images;
s102, scaling all image data to 448 x 448 size using linear interpolation.
3. The multi-layer information fusion-based target detection algorithm as claimed in claim 1, wherein the specific method for feature extraction in step S2 is as follows: performing convolution and pooling on the image by using the constructed 98-layer DenseNet network, and taking the output of each transition layer to obtain four stage feature maps with resolutions of 56 × 56, 28 × 28, 17 × 17 and 17 × 17.
4. The multi-layer information fusion-based target detection algorithm as claimed in claim 1, wherein the specific method for normalizing the number of channels of the four-stage features in step S3 is as follows: a convolution operation is performed on the four stage features using 1 × 1 convolutions with 256 channels, unifying the dimension of all stage features to 256.
5. The multi-layer information fusion-based target detection algorithm according to claim 1, wherein the multi-stage feature fusion in step S4 is performed by:
s401, performing element-wise addition on features of two adjacent stages when their sizes are the same; if the sizes differ, up-sampling the smaller feature so that the two features to be fused have the same size;
and S402, convolving the fused result by using a convolution kernel of 3x3 to eliminate the aliasing effect after fusion.
6. The multi-layer information fusion-based target detection algorithm as claimed in claim 1, wherein the specific method of step S5 is as follows: extracting regions of interest from the multi-stage features fused in step S4 by using a region candidate network, and performing foreground/background binary prediction and rough fitting of the bounding-box position on the regions of interest by means of an anchor mechanism.
7. The multi-layer information fusion-based target detection algorithm as claimed in claim 1, wherein the specific method of step S6 is as follows:
s601, performing pooling operation on the region of interest extracted in the step S5;
s602, inputting the pooled region of interest into a fully-connected network, and classifying by using a Softmax classifier;
and S603, outputting predicted target position coordinates x, y, w and h, wherein x, y, w and h respectively represent the center coordinate, the width and the height of the box.
8. The multi-layer information fusion-based target detection algorithm as claimed in claim 1, wherein the specific method of step S7 is as follows:
s701, firstly calculating the loss function of the classification part:

$$L_{cls}(p_i, p_i^*) = -\left[\, p_i^* \log p_i + (1 - p_i^*)\log(1 - p_i) \,\right]$$

wherein: $p_i$ is the probability that anchor point $i$ is predicted to be a target, and $p_i^*$ is the corresponding ground-truth label of the data set;

s702, calculating the loss function of the position regression part, using the Smooth$_{L1}$ ($\sigma = 3$) smoothing loss function:

$$L_{reg}(t_n, t_n^*) = \sum_{j \in \{x, y, w, h\}} \mathrm{smooth}_{L1}\!\left(t_{n,j} - t_{n,j}^*\right), \qquad \mathrm{smooth}_{L1}(x) = \begin{cases} 0.5\,(\sigma x)^2, & |x| < 1/\sigma^2 \\ |x| - 0.5/\sigma^2, & \text{otherwise} \end{cases}$$

wherein: $t_n$ denotes the 4 parameterized coordinates of the prediction bounding box and $t_n^*$ is the vector of the real box matched with a positive anchor point, with

$$t_x = \frac{x - x_a}{w_a}, \quad t_y = \frac{y - y_a}{h_a}, \quad t_w = \log\frac{w}{w_a}, \quad t_h = \log\frac{h}{h_a}$$

wherein: $x$, $x_a$ and $x^*$ correspond to the prediction box, the anchor point and the real box respectively (and likewise for $y$, $w$ and $h$);

s703, finally calculating the sum of the two loss functions:

$$L(\{p_i\}, \{t_n\}) = \frac{1}{N_{cls}} \sum_i L_{cls}(p_i, p_i^*) + \lambda\, \frac{1}{N_{reg}} \sum_n p_n^* L_{reg}(t_n, t_n^*)$$

wherein: $N_{cls}$ is the number of images per training input, $N_{reg}$ is the number of anchor points, and $\lambda$ is a balance parameter for the two partial losses;

s704, training the fully-connected network so that the loss function converges.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010444366.3A CN111666988A (en) | 2020-05-22 | 2020-05-22 | Target detection algorithm based on multi-layer information fusion |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010444366.3A CN111666988A (en) | 2020-05-22 | 2020-05-22 | Target detection algorithm based on multi-layer information fusion |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111666988A true CN111666988A (en) | 2020-09-15 |
Family
ID=72384416
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010444366.3A Pending CN111666988A (en) | 2020-05-22 | 2020-05-22 | Target detection algorithm based on multi-layer information fusion |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111666988A (en) |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20190139257A1 (en) * | 2017-08-31 | 2019-05-09 | Nec Laboratories America, Inc. | Online flow guided memory networks for object detection in video |
CN109117876A (en) * | 2018-07-26 | 2019-01-01 | 成都快眼科技有限公司 | A kind of dense small target deteection model building method, model and detection method |
CN109614985A (en) * | 2018-11-06 | 2019-04-12 | 华南理工大学 | A kind of object detection method based on intensive connection features pyramid network |
CN110046572A (en) * | 2019-04-15 | 2019-07-23 | 重庆邮电大学 | A kind of identification of landmark object and detection method based on deep learning |
CN110378880A (en) * | 2019-07-01 | 2019-10-25 | 南京国科软件有限公司 | The Cremation Machine burning time calculation method of view-based access control model |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112329610A (en) * | 2020-11-03 | 2021-02-05 | 中科九度(北京)空间信息技术有限责任公司 | High-voltage line detection method based on edge attention mechanism fusion network |
CN112990327A (en) * | 2021-03-25 | 2021-06-18 | 北京百度网讯科技有限公司 | Feature fusion method, device, apparatus, storage medium, and program product |
CN113610822A (en) * | 2021-08-13 | 2021-11-05 | 湖南大学 | Surface defect detection method based on multi-scale information fusion |
CN115017540A (en) * | 2022-05-24 | 2022-09-06 | 贵州大学 | Lightweight privacy protection target detection method and system |
CN115017540B (en) * | 2022-05-24 | 2024-07-02 | 贵州大学 | Lightweight privacy protection target detection method and system |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20200915 |