CN111666988A - Target detection algorithm based on multi-layer information fusion - Google Patents

Target detection algorithm based on multi-layer information fusion Download PDF

Info

Publication number
CN111666988A
Authority
CN
China
Prior art keywords
network
features
information
region
detection algorithm
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010444366.3A
Other languages
Chinese (zh)
Inventor
陈宝远
申宇琨
历博
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Harbin University of Science and Technology
Original Assignee
Harbin University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Harbin University of Science and Technology filed Critical Harbin University of Science and Technology
Priority to CN202010444366.3A priority Critical patent/CN111666988A/en
Publication of CN111666988A publication Critical patent/CN111666988A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/047Probabilistic or stochastic networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a target detection algorithm based on multi-layer information fusion, which comprises the following steps: S1, preprocessing the data-set images and resizing the image data to the size required by the network; S2, extracting information of the image at different levels with DenseNet and obtaining features at four stages; S3, normalizing the channel number of the four-stage features; and S4, vertically fusing the extracted multi-level information and enhancing the propagation of information across levels, so that the feature maps carry both rich deep semantic information and shallow position information. The invention uses DenseNet as the feature extraction network; compared with the conventional ResNet network, the number of parameters required is less than half that of ResNet. For industry, the smaller model noticeably saves bandwidth, reduces storage overhead and improves the computational efficiency of the network model, and information at different levels is extracted according to the characteristics of the network.

Description

Target detection algorithm based on multi-layer information fusion
Technical Field
The invention belongs to the field of computer vision detection, and particularly relates to a target detection algorithm based on multi-layer information fusion.
Background
Object detection and recognition is one of the basic tasks in the field of computer vision. In industry, object detection receives wide attention and has many practical applications in various fields, for example: target tracking, driver assistance, biometric recognition, smart home, smart agriculture, medical image analysis and identification of flying objects. Reducing human labor through computer vision is of great practical significance. In the automotive industry, car makers and tier-1 suppliers compete fiercely in the driver-assistance field, and camera-based driver assistance has become an established route. For driver assistance on complex urban roads, the road conditions are complicated, obstacles such as motor vehicles, non-motor vehicles and pedestrians are numerous, and small targets such as children, pets and scooters may appear. Accurate detection of vehicles and pedestrians through the camera is required by the driving system and is a very important basic link in driver-assistance technology, so improving the accuracy and efficiency of camera-based detection algorithms is of great significance to vehicle safety.
With the rapid development of deep learning in recent years, target detection algorithms have shifted from traditional algorithms based on hand-crafted features to detection techniques based on deep neural networks. In deep-learning-based target detection algorithms, detection accuracy and detection speed are in tension: improving accuracy usually requires sacrificing speed. Moreover, detection network structures are becoming more complicated, the number of parameters is excessive, training takes a long time, and training efficiency is low, so the overall algorithms still leave considerable room for improvement. Among existing target detection algorithms, Faster R-CNN is one of the more advanced: its authors proposed a candidate-region extraction network with shared features, and the application of this network further improved the performance of the algorithm. However, its backbone network VGGNet is an image classification network pre-trained on ImageNet and is therefore position-insensitive, and the successive down-sampling in VGGNet filters out the information of some smaller targets, so the feature information fed into the region candidate network is incomplete.
The single-stage detection algorithm obtains the position and category information of the target through a single pass of the network. Compared with proposal-based target detection algorithms, the detection speed is greatly improved, making such methods better suited to mobile devices. However, compared with region-proposal-based methods they suffer from less accurate localization and lower recall, detect closely spaced objects and very small objects poorly, and have relatively weak generalization capability.
Disclosure of Invention
In view of the above, the present invention provides a target detection algorithm based on multi-layer information fusion, which aims to overcome the defects in the prior art.
In order to achieve the purpose, the technical scheme of the invention is realized as follows:
a target detection algorithm based on multi-layer information fusion comprises the following steps:
s1, preprocessing the data set image; adjusting the image data to the size set by the network;
s2, extracting different levels of information of the image by using Densenet, and extracting characteristic features of four stages;
s3, normalizing the channel number of the extracted features of the four stages;
s4, vertically fusing the extracted multi-level information, and enhancing the transmission of different levels of information, so that the feature map has rich deep semantic information and shallow position information;
s5, extracting the region of interest from the fused multi-level information by using a region suggestion algorithm;
s6, predicting the accurate category of the region of interest and regressing the position coordinates;
s7, calculating the multi-task loss function of the classification network and the regression network, training and optimizing the network so that the classification and regression loss functions converge, and saving the weight parameters of the network;
And S8, deploying the optimized parameters and detecting the target.
Further, the specific steps of step S1 are as follows:
s101, performing color enhancement, translation, and horizontal and vertical flipping on the image;
s102, scaling all image data to 448 x 448 size using linear interpolation.
Further, the specific method for extracting the features in step S2 is as follows: performing convolution and pooling on the image using the built 98-layer DenseNet network, and taking the output of each transition layer to obtain feature maps of four stages with resolutions of 56×56, 28×28, 17×17 and 17×17.
Further, the specific method for normalizing the number of channels by the four-stage features in step S3 is as follows: the convolution operation is performed on the four stage features by using convolution of 1 × 1 with the channel number of 256, and the dimension of all the stage features is specified to be 256.
Further, the multi-stage feature fusion in step S4 specifically includes:
s401, performing corresponding element addition operation on two adjacent stage features with the same size, and performing up-sampling operation on a smaller-size feature if the two stage features are different in size to ensure that the two fused features are the same in size;
and S402, convolving the fused result by using a convolution kernel of 3x3 to eliminate the aliasing effect after fusion.
Further, the specific method of step S5 is as follows: extracting the region of interest from the multiple stage features fused in step S4 using a region candidate network, and performing foreground/background binary prediction and rough fitting of the border position on the region of interest using an anchor point mechanism.
Further, the specific method of step S6 is as follows:
s601, performing pooling operation on the region of interest extracted in the step S5;
s602, inputting the pooled region of interest into a fully-connected network, and classifying by using a Softmax classifier;
and S603, outputting predicted target position coordinates x, y, w and h, wherein x, y, w and h respectively represent the center coordinate, the width and the height of the box.
Further, the specific method of step S7 is as follows:
s701, firstly, calculating a loss function of a classification part:
L_cls(p_i, p_i^*) = -log[ p_i^* p_i + (1 - p_i^*)(1 - p_i) ]
wherein: p_i is the probability that anchor i is predicted as a target, and p_i^* is the corresponding ground-truth label of the data set;
S702, calculating a loss function of the position regression part, using the Smooth L1 (σ = 3) smoothing loss function:
L_reg(t_i, t_i^*) = smooth_L1(t_i - t_i^*)
smooth_L1(x) = 0.5(σx)^2 if |x| < 1/σ^2, and |x| - 0.5/σ^2 otherwise
wherein: t_i is the 4-parameter coordinate vector of the prediction bounding box, and t_i^* is the parameterized vector of the real box matched to the positive anchor;
t_x = (x - x_a)/w_a,  t_y = (y - y_a)/h_a,  t_w = log(w/w_a),  t_h = log(h/h_a)
t_x^* = (x^* - x_a)/w_a,  t_y^* = (y^* - y_a)/h_a,  t_w^* = log(w^*/w_a),  t_h^* = log(h^*/h_a)
wherein: x, x_a and x^* correspond to the prediction box, the anchor point and the real box respectively (and likewise for y, w and h);
S703, finally calculating the sum of the two loss functions:
L({p_i}, {t_i}) = (1/N_cls) Σ_i L_cls(p_i, p_i^*) + λ (1/N_reg) Σ_i p_i^* L_reg(t_i, t_i^*)
wherein: N_cls is the number of images per input when training the network, N_reg is the number of anchor points, and λ is a balance parameter between the two parts of the loss;
and S704, training the fully-connected network to make the loss function converge.
Compared with the prior art, the invention has the following advantages:
the invention uses Densenet as a feature extraction network, compared with the traditional ResNet network, the parameter quantity required by the network is less than half of the ResNet; for the industry, the small model can obviously save bandwidth, reduce storage overhead, improve the calculation efficiency of the network model, and extract information of different levels according to network characteristics.
The invention addresses the insensitivity of two-stage target detection algorithms to small target objects: it strengthens the information-extraction capability of the base network and builds a multi-layer information fusion network to fuse information of different layers, ensuring that neither the position information of the high-level features nor the semantic information of the low-level features is lost; the detection capability for targets of different sizes is improved without reducing the detection speed.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and, together with the description, serve to explain the invention without limitation. In the drawings:
FIG. 1 is a flowchart of a target detection algorithm based on multi-layer information fusion according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of a feature extraction network Densenet in the embodiment of the present invention;
FIG. 3 is a flow chart of a multi-information fusion network in an embodiment of the present invention;
FIG. 4 is a flowchart of the regression and classification of candidate regions according to an embodiment of the present invention.
Detailed Description
It should be noted that the embodiments and features of the embodiments of the present invention may be combined with each other without conflict.
In the description of the present invention, it is to be understood that the terms "central," "longitudinal," "lateral," "upper," "lower," "front," "rear," "left," "right," "vertical," "horizontal," "top," "bottom," "inner," "outer," and the like are used in the orientation or positional relationship indicated in the drawings, which are merely for convenience in describing the invention and to simplify the description, and are not intended to indicate or imply that the referenced device or element must have a particular orientation, be constructed and operated in a particular orientation, and are therefore not to be construed as limiting the invention. Furthermore, the terms "first", "second", etc. are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first," "second," etc. may explicitly or implicitly include one or more of that feature. In the description of the invention, the meaning of "a plurality" is two or more unless otherwise specified.
In the description of the invention, it is to be noted that, unless otherwise explicitly specified or limited, the terms "mounted", "connected" and "coupled" are to be construed broadly, e.g. as fixed, detachable or integral connections; as mechanical or electrical connections; as direct connections or indirect connections through intervening media, or as internal communication between two elements. The specific meanings of the above terms in the present invention can be understood by those of ordinary skill in the art according to the specific situation.
The invention will be described in detail with reference to the following embodiments with reference to the attached drawings.
An object detection algorithm based on multi-layer information fusion, as shown in fig. 1 to 4, includes:
s1, preprocessing the data set image; adjusting the image data to the size set by the network; s2, extracting different levels of information of the image by using Densenet, and extracting characteristic features of four stages; s3, normalizing the channel number of the extracted features of the four stages; s4, vertically fusing the extracted multi-level information, and enhancing the transmission of different levels of information, so that the feature map has rich deep semantic information and shallow position information; s5, extracting the region of interest from the fused multi-level information by using a region suggestion algorithm; s6, predicting the accurate category of the region of interest and regressing the position coordinates; s7, calculating a multi-task loss function of the classification network and the regression network, training and optimizing the network to make the classification and regression loss function converge and save the weight parameters of the network; s8, deploying the optimized parameters, and detecting the target;
specifically, S8, deploying the network weight parameters stored in step 7 into a network, inputting image data including the target, performing feature extraction and fusion on the image through a trained parameter network, performing rough prediction on the region of interest, performing accurate prediction and regression on the type and position of the target, and finally outputting the target type and position information. .
The specific steps of step S1 are as follows: S101, performing color enhancement, translation, and horizontal and vertical flipping on the image; S102, scaling all image data to 448 x 448 using linear interpolation.
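As an illustration of step S1, the following is a minimal preprocessing sketch assuming a PyTorch/torchvision pipeline (the patent does not name a framework); the jitter and translation magnitudes are illustrative assumptions, and for detection the same geometric transforms would also have to be applied to the ground-truth boxes, which is omitted here.

    import torchvision.transforms as T
    from torchvision.transforms import InterpolationMode

    # Sketch of step S1: color enhancement, translation, horizontal/vertical flips,
    # then resizing every image to 448 x 448 with linear (bilinear) interpolation.
    preprocess = T.Compose([
        T.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2),   # color enhancement
        T.RandomAffine(degrees=0, translate=(0.1, 0.1)),               # translation change
        T.RandomHorizontalFlip(p=0.5),                                 # horizontal flip
        T.RandomVerticalFlip(p=0.5),                                   # vertical flip
        T.Resize((448, 448), interpolation=InterpolationMode.BILINEAR),
        T.ToTensor(),
    ])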
The specific method for extracting the features in step S2 is as follows: convolution and pooling are performed on the image using the built 98-layer DenseNet network, and the output of each transition layer is taken to obtain feature maps of four stages with resolutions of 56×56, 28×28, 17×17 and 17×17.
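As an illustration of step S2, the sketch below collects four stage features from a DenseNet backbone, using torchvision's densenet121 only as a stand-in for the 98-layer DenseNet described here (whose exact configuration is not given); the stage outputs are taken at the transition layers and the final dense block.

    import torch.nn as nn
    import torchvision

    class DenseNetBackbone(nn.Module):
        """Collect four stage features C1..C4 from a DenseNet feature extractor (sketch)."""
        def __init__(self):
            super().__init__()
            f = torchvision.models.densenet121(weights=None).features  # stand-in backbone
            self.stem = nn.Sequential(f.conv0, f.norm0, f.relu0, f.pool0)
            self.block1, self.trans1 = f.denseblock1, f.transition1
            self.block2, self.trans2 = f.denseblock2, f.transition2
            self.block3, self.trans3 = f.denseblock3, f.transition3
            self.block4, self.norm5 = f.denseblock4, f.norm5

        def forward(self, x):
            x = self.stem(x)
            c1 = self.trans1(self.block1(x))    # stage-1 features
            c2 = self.trans2(self.block2(c1))   # stage-2 features
            c3 = self.trans3(self.block3(c2))   # stage-3 features
            c4 = self.norm5(self.block4(c3))    # stage-4 features
            return c1, c2, c3, c4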
The specific method for normalizing the number of channels by the four-stage features in the step S3 is as follows: the convolution operation is performed on the four stage features by using convolution of 1 × 1 with the channel number of 256, and the dimension of all the stage features is specified to be 256.
The specific method of the multi-stage feature fusion in step S4 is as follows: s401, performing corresponding element addition operation on two adjacent stage features with the same size, and performing up-sampling operation on a smaller-size feature if the two stage features are different in size to ensure that the two fused features are the same in size; and S402, convolving the fused result by using a convolution kernel of 3x3 to eliminate the aliasing effect after fusion.
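A sketch combining steps S3 and S4 (again assuming PyTorch): 1 × 1 convolutions normalize every stage to 256 channels, adjacent stages are fused by weighted element-wise addition (upsampling the deeper map by bilinear interpolation when the sizes differ), and a 3 × 3 convolution smooths each fused result to suppress the aliasing effect.

    import torch.nn as nn
    import torch.nn.functional as F

    class MultiLevelFusion(nn.Module):
        """Channel normalization (S3) plus top-down multi-level fusion (S4) - sketch."""
        def __init__(self, in_channels, out_channels=256, beta1=0.5, beta2=0.5):
            super().__init__()
            # Step S3: 1x1 convolutions bring every stage feature to 256 channels.
            self.laterals = nn.ModuleList(
                nn.Conv2d(c, out_channels, kernel_size=1) for c in in_channels)
            # Step S402: 3x3 convolutions remove the aliasing effect after fusion.
            self.smooth = nn.ModuleList(
                nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1)
                for _ in in_channels)
            self.beta1, self.beta2 = beta1, beta2

        def _fuse(self, shallow, deep):
            # Upsample the smaller map if the sizes differ, then add corresponding
            # elements with weights beta1 = beta2 = 0.5 (step S401).
            if deep.shape[-2:] != shallow.shape[-2:]:
                deep = F.interpolate(deep, size=shallow.shape[-2:],
                                     mode='bilinear', align_corners=False)
            return self.beta1 * shallow + self.beta2 * deep

        def forward(self, c1, c2, c3, c4):
            c1p, c2p, c3p, c4p = [l(c) for l, c in zip(self.laterals, (c1, c2, c3, c4))]
            p4 = c4p
            p3 = self._fuse(c3p, p4)   # fusion 1
            p2 = self._fuse(c2p, p3)   # fusion 2
            p1 = self._fuse(c1p, p2)   # fusion 3
            return [s(p) for s, p in zip(self.smooth, (p1, p2, p3, p4))]

With the densenet121 stand-in above, the stage channel counts are (128, 256, 512, 1024), so the module would be constructed as MultiLevelFusion((128, 256, 512, 1024)).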
The specific method of step S5 is as follows: the region of interest is extracted from the multiple stage features fused in step S4 using a region candidate network, and foreground/background binary prediction and rough fitting of the border position are performed on the region of interest using an anchor point mechanism.
The specific method of step S6 is as follows: s601, performing pooling operation on the region of interest extracted in the step S5; s602, inputting the pooled region of interest into a fully-connected network, and classifying by using a Softmax classifier; and S603, outputting predicted target position coordinates x, y, w and h, wherein x, y, w and h respectively represent the center coordinate, the width and the height of the box.
The specific method of step S7 is as follows: s701, firstly, calculating a loss function of a classification part:
L_cls(p_i, p_i^*) = -log[ p_i^* p_i + (1 - p_i^*)(1 - p_i) ]
wherein: p_i is the probability that anchor i is predicted as a target, and p_i^* is the corresponding ground-truth label of the data set;
S702, calculating a loss function of the position regression part, using the Smooth L1 (σ = 3) smoothing loss function:
L_reg(t_i, t_i^*) = smooth_L1(t_i - t_i^*)
smooth_L1(x) = 0.5(σx)^2 if |x| < 1/σ^2, and |x| - 0.5/σ^2 otherwise
wherein: t_i is the 4-parameter coordinate vector of the prediction bounding box, and t_i^* is the parameterized vector of the real box matched to the positive anchor;
t_x = (x - x_a)/w_a,  t_y = (y - y_a)/h_a,  t_w = log(w/w_a),  t_h = log(h/h_a)
t_x^* = (x^* - x_a)/w_a,  t_y^* = (y^* - y_a)/h_a,  t_w^* = log(w^*/w_a),  t_h^* = log(h^*/h_a)
wherein: x, x_a and x^* correspond to the prediction box, the anchor point and the real box respectively (and likewise for y, w and h);
S703, finally calculating the sum of the two loss functions:
L({p_i}, {t_i}) = (1/N_cls) Σ_i L_cls(p_i, p_i^*) + λ (1/N_reg) Σ_i p_i^* L_reg(t_i, t_i^*)
wherein: N_cls is the number of images per input when training the network, N_reg is the number of anchor points, and λ is a balance parameter between the two parts of the loss;
and S704, training the fully-connected network to make the loss function converge.
Specifically, the structure of the feature fusion network created by the present invention is shown in FIG. 3. The invention uses the dense feature extraction network to extract feature maps (C1, C2, C3 and C4) of different scales. To achieve feature sharing and more accurate detection, the feature maps of the different stages undergo pyramid fusion and are then each input into the region candidate network for prediction. The multi-level feature information is fused by means of this structure: the low-resolution, high-level features are connected from top to bottom, through lateral connections, with the high-resolution, low-level features, so that the features at every scale contain feature information of objects of different sizes, which increases the detector's perception of information to some extent. Compared with the Faster R-CNN algorithm, which only uses the last layer of features from the feature extraction network, the present algorithm performs proposal-region extraction on the fused multi-stage features rather than only on the last (P1) stage features; because the subsequent region proposal network is a sliding-window detector with a fixed window size, sliding over the different layers of the fused network increases its robustness to changes in target scale. Moreover, if only the last stage were used, more anchor points would be needed, and simply increasing the number of mapped anchor points cannot effectively improve accuracy.
The structure on the left side of FIG. 3 is the dimension normalization process for the features of different levels: the invention takes the output of each transition layer (C1, C2, C3, C4) as the input of the feature fusion network. The feature maps extracted by the backbone network DenseNet have different dimensions and resolutions, so the dimensions of the features at the different levels are normalized before fusion. All the extracted features undergo a 1x1 convolution, which linearly combines the information of the different channels and raises or lowers the dimensionality without harming the expressive capability of the model; it also adds a non-linearity while keeping the size of the feature map unchanged. The dimension-unified features are denoted C1', C2', C3' and C4', with resolutions of 28x28, 28x28, 56x56 and 112x112 respectively. FIG. 3 shows the fusion process of the multi-level features; the specific flow of fusion1, fusion2 and fusion3 is shown in the dashed box on the right. In the fusion of C4'_k and C3'_k, the two feature maps have the same size, so C4'_k is added to C3'_k directly without an up-sampling step. The fusion of P3'_k and C2'_k (fusion 2) proceeds in the same way. In the fusion of P2'_k and C1'_k, the two groups of feature maps differ in size, so bilinear interpolation is used to restore P2'_k to the size of C1'_k. The subscript k denotes the k-th dimension (channel) of a feature.
The Add operation calculation for the two features of the k-th dimension is shown in the following equation.
Z_k(x, y) = f_add(A_k, B_k) = β_1 A_k(x, y) + β_2 B_k(x, y)
The above formula adds the elements of features A_k and B_k at position (x, y); performing this addition at all positions gives the feature after the add operation. β_1 and β_2 weight-balance the features A_k and B_k, with β_1 = β_2 = 0.5.
To eliminate the aliasing effect after fusion, each fusion result is convolved again with a 3 × 3 convolution kernel. The fusion network outputs P1 (28x28, d = 256), P2 (28x28, d = 256), P3 (28x28, d = 256) and P4 (56x56, d = 256), and these fused features are then input to the subsequent region candidate network.
Region-of-interest extraction and the classification and regression network: the algorithm uses a region proposal network to extract regions of interest from the features. The RPN is essentially a class-agnostic, sliding-window target detector; its input is the feature maps of different sizes returned by the base network, and its output is the regions of interest. The structure of the region candidate network is shown in FIG. 4. To generate candidate regions, a 3 × 3 window is slid over feature maps of multiple sizes, and the anchor point mapping mechanism plays a central role in the network. Anchors of fixed shape are placed on the pictures at different sizes and scales and then serve as reference boxes when predicting the target location. The candidate region network provides two fully connected outputs for each anchor point: the first is the probability that the anchor point is a target, and the second is a box regression used to adjust the anchor so that it better fits the predicted target.
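A minimal sketch of such a proposal head, assuming PyTorch: a 3 × 3 convolution slides over each fused feature map and two sibling 1 × 1 convolutions produce, per anchor, the object probability logits and the four box-regression offsets (the choice of nine anchors per location is an assumption, not stated in the patent).

    import torch
    import torch.nn as nn

    class RPNHead(nn.Module):
        """Sliding-window proposal head: objectness + box deltas per anchor (sketch)."""
        def __init__(self, in_channels=256, num_anchors=9):  # nine anchors is an assumption
            super().__init__()
            self.conv = nn.Conv2d(in_channels, in_channels, kernel_size=3, padding=1)
            self.cls_logits = nn.Conv2d(in_channels, num_anchors, kernel_size=1)       # target vs. background
            self.bbox_deltas = nn.Conv2d(in_channels, num_anchors * 4, kernel_size=1)  # t_x, t_y, t_w, t_h

        def forward(self, feature_maps):
            # The same head slides over every fused level P1..P4.
            scores, deltas = [], []
            for p in feature_maps:
                t = torch.relu(self.conv(p))
                scores.append(self.cls_logits(t))
                deltas.append(self.bbox_deltas(t))
            return scores, deltas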
The RoI Pooling layer uses the proposed regions generated by the region candidate network together with the features extracted by the feature network to obtain a fixed-size feature map for each proposed region. These fixed-size feature maps then pass through fully connected layers: a softmax (normalization) function classifies the specific categories, while a Smooth L1 loss function drives the regression that yields the precise position of the object.
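A sketch of this second-stage head, using torchvision's roi_pool operator as a stand-in for the RoI Pooling layer; the pooled size, hidden width and class count (20 classes plus background) are assumptions for illustration.

    import torch
    import torch.nn as nn
    from torchvision.ops import roi_pool

    class DetectionHead(nn.Module):
        """RoI pooling + fully connected classification/regression head (sketch)."""
        def __init__(self, in_channels=256, pool_size=7, num_classes=21):
            super().__init__()
            self.pool_size = pool_size
            self.fc = nn.Sequential(
                nn.Linear(in_channels * pool_size * pool_size, 1024), nn.ReLU(),
                nn.Linear(1024, 1024), nn.ReLU())
            self.cls_score = nn.Linear(1024, num_classes)       # softmax over specific classes
            self.bbox_pred = nn.Linear(1024, num_classes * 4)   # per-class box refinement

        def forward(self, feature_map, rois, spatial_scale):
            # rois: list with one (L, 4) tensor of (x1, y1, x2, y2) boxes per image,
            # given in input-image coordinates.
            pooled = roi_pool(feature_map, rois, output_size=self.pool_size,
                              spatial_scale=spatial_scale)
            x = self.fc(pooled.flatten(start_dim=1))
            return torch.softmax(self.cls_score(x), dim=1), self.bbox_pred(x)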
Multi-task loss function: in the region candidate network, we set two types of labels for each anchor point, positive and negative. A positive sample is the anchor with the highest intersection-over-union with a real (ground-truth) box; if the intersection-over-union of an anchor with the real boxes is lower than 0.3, the anchor is set as a negative sample. The loss function of the RPN network is defined as:
L({p_i}, {t_i}) = (1/N_cls) Σ_i L_cls(p_i, p_i^*) + λ (1/N_reg) Σ_i p_i^* L_reg(t_i, t_i^*)    (5)
Equation 5 is divided into two parts: the first part is the classification loss and the second part is the regression loss of the target box, where p_i is the probability that anchor i is predicted as a target, p_i^* is the ground-truth label from the data set, t_i is the 4-parameter coordinate vector of the prediction bounding box, and t_i^* is the vector of the real box matched to the positive anchor.
t_x = (x - x_a)/w_a,  t_y = (y - y_a)/h_a,  t_w = log(w/w_a),  t_h = log(h/h_a)
t_x^* = (x^* - x_a)/w_a,  t_y^* = (y^* - y_a)/h_a,  t_w^* = log(w^*/w_a),  t_h^* = log(h^*/h_a)    (6)
In equation (6), x, y, w and h represent the box center coordinates, width and height respectively; x, x_a and x^* correspond to the prediction box, the anchor point and the real box (and likewise for y, w and h).
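A small sketch of the parameterization in equation (6): it encodes a box (x, y, w, h) relative to its anchor and inverts that encoding for prediction.

    import math

    def encode_box(x, y, w, h, xa, ya, wa, ha):
        """Parameterize a box (x, y, w, h) against an anchor (xa, ya, wa, ha), equation (6)."""
        return (x - xa) / wa, (y - ya) / ha, math.log(w / wa), math.log(h / ha)

    def decode_box(tx, ty, tw, th, xa, ya, wa, ha):
        """Invert the parameterization to recover the box centre and size."""
        return tx * wa + xa, ty * ha + ya, math.exp(tw) * wa, math.exp(th) * ha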
In the classification loss of the first part, L_cls is the log loss over the two classes (target, non-target):
L_cls(p_i, p_i^*) = -log[ p_i^* p_i + (1 - p_i^*)(1 - p_i) ]
in the second part of the target block regression prediction, a least squares loss function is generally used. But the penalty of the L2 loss for a relatively large error is high. Smooth is used hereinl1(═ 3) smoothing loss function
Figure BDA0002505183930000122
Figure BDA0002505183930000123
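A sketch of the Smooth L1 term with σ = 3, assuming PyTorch tensors.

    import torch

    def smooth_l1(diff, sigma=3.0):
        """Smooth L1: 0.5*(sigma*x)^2 when |x| < 1/sigma^2, else |x| - 0.5/sigma^2."""
        beta = 1.0 / (sigma ** 2)
        abs_diff = diff.abs()
        return torch.where(abs_diff < beta,
                           0.5 * (sigma * diff) ** 2,
                           abs_diff - 0.5 * beta)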
The two loss terms are normalized by N_cls (the mini-batch size, 32) and N_reg (the number of anchor locations, here 5488) and weighted by a balance parameter λ, which balances the classification and regression parts.
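Putting the two parts together, a sketch of the normalized, λ-weighted RPN loss; it reuses the smooth_l1 helper above, only positive anchors (label 1) contribute to the regression term, and the value of λ shown is an assumption.

    import torch
    import torch.nn.functional as F

    def rpn_loss(obj_logits, box_deltas, labels, target_deltas,
                 n_cls=32, n_reg=5488, lam=10.0):
        """Multi-task RPN loss (sketch): binary log loss + lambda-weighted Smooth L1.

        obj_logits:    (N,) objectness logits for the sampled anchors
        box_deltas:    (N, 4) predicted t_x, t_y, t_w, t_h
        labels:        (N,) 1 for positive anchors, 0 for negative anchors
        target_deltas: (N, 4) ground-truth parameterized coordinates t*
        """
        cls_loss = F.binary_cross_entropy_with_logits(
            obj_logits, labels.float(), reduction='sum') / n_cls
        pos = labels == 1
        reg_loss = smooth_l1(box_deltas[pos] - target_deltas[pos]).sum() / n_reg
        return cls_loss + lam * reg_loss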
The regions of interest extracted by the RPN network still require prediction of the specific class and fine adjustment of the prediction box. This is again a classification and regression process: the regression loss for fine-tuning the candidate box follows the same principle as the regression loss of the region of interest, while the class prediction is extended from the positive/negative two classes to 20 specific classes. The classification and regression loss for the 20 specific classes is shown in equation 8.
L({p}, {t_n'}) = L_cls(p, u) + μ L_reg(t_n', t^*)    (8)
Here p = (p_0, ..., p_k) is the output of the classification network, i.e. the predicted discrete probability distribution over classes for each region of interest, u is the real class of each region of interest, μ = 0.5 is the balance coefficient between the two loss functions, and t_n' denotes the 4 parameterized coordinate vectors of the final fine-tuned bounding box. In order to accurately extract the regions of interest and perform the subsequent specific classification and boundary fine-tuning, the total loss function of the algorithm is set as the sum of the above two losses (formula 12), so that the weights of the two stages can be updated simultaneously when the network is trained.
I_total = L({p_i}, {t_i}) + L({p}, {t_n'})    (12)
The specific steps of the proposed algorithm are as follows: in order to acquire features containing different levels of information, a DenseNet network is used to perform multi-stage feature extraction on the image, and the outputs of the transition layers in the DenseNet are taken as the extracted features of the different stages C_n (n = 1, 2, 3, 4).
Because the feature dimensions extracted at the different stages are inconsistent, dimension normalization is performed on the features of the different stages: the feature dimensions are normalized to 256 using 1x1 convolutions, and then the fusion operation shown in FIG. 3 is performed. The specific fusion calculation is given in formulas 13-16.
C_n' = conv_{1x1, d}(C_n)   (n = 1, 2, 3, 4; d = 256)    (13)
P_4 = C_4'
P_3 = f_add(C_3', P_4)    (14)
P_2 = f_add(C_2', P_3)    (15)
P_1 = f_add(C_1', up(P_2))    (16)
where up(·) denotes the bilinear interpolation used to match the sizes of the two feature maps before addition.
In formula 13, C_n' is the result of dimension normalization of the features of the different stages, and d = 256 is the dimension of the normalized features. In formulas 14 to 16, f_add(·) is the add operation on two features described above, and β_1, β_2 weight-balance the features, with β_1 = β_2 = 0.5.
The region of interest is then extracted from the fused features using the region proposal network. This makes a rough prediction of the target: it judges whether a region contains an object, without predicting the specific category, and coarsely regresses the target position. After the region-of-interest extraction is finished, the specific-category prediction and the fine regression of the target position are performed.
In order to acquire the category and the position of the target accurately, the sum of the loss function of the proposal region and the final classification-regression loss function is computed using the multi-task loss function designed in 2.4. The total loss function is then differentiated using the back-propagation algorithm, the weights and bias parameters are updated, and multiple iterations are performed to minimize the loss function.
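A sketch of one optimization step under these assumptions: a hypothetical model object that returns the proposal-stage and classification-stage losses, summed as in formula (12) and minimized by back-propagation (SGD is an assumed optimizer; the patent does not name one).

    import torch

    def train_step(model, optimizer, images, targets):
        """One training iteration (sketch): forward pass, total loss, back-propagation, update."""
        optimizer.zero_grad()
        rpn_losses, head_losses = model(images, targets)  # hypothetical two-stage loss interface
        total_loss = rpn_losses + head_losses             # I_total = L({p_i},{t_i}) + L({p},{t_n'})
        total_loss.backward()                             # differentiate the total loss
        optimizer.step()                                  # update weights and biases
        return total_loss.item()

    # Example usage with assumed interfaces:
    # optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
    # for images, targets in data_loader:
    #     loss = train_step(model, optimizer, images, targets)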
The invention uses DenseNet as the feature extraction network; compared with the conventional ResNet network, the number of parameters required is less than half that of ResNet. For industry, the smaller model noticeably saves bandwidth, reduces storage overhead and improves the computational efficiency of the network model, and information at different levels is extracted according to the characteristics of the network.
The invention addresses the insensitivity of two-stage target detection algorithms to small target objects: it strengthens the information-extraction capability of the base network and builds a multi-layer information fusion network to fuse information of different layers, ensuring that neither the position information of the high-level features nor the semantic information of the low-level features is lost; the detection capability for targets of different sizes is improved without reducing the detection speed.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and should not be taken as limiting the invention, so that any modifications, equivalents, improvements and the like, which are within the spirit and principle of the present invention, should be included in the scope of the present invention.

Claims (8)

1. A target detection algorithm based on multi-layer information fusion is characterized by comprising the following steps:
s1, preprocessing the data set image; adjusting the image data to the size set by the network;
s2, extracting different levels of information of the image by using Densenet, and extracting characteristic features of four stages;
s3, normalizing the channel number of the extracted features of the four stages;
s4, vertically fusing the extracted multi-level information, and enhancing the transmission of different levels of information, so that the feature map has rich deep semantic information and shallow position information;
s5, extracting the region of interest from the fused multi-level information by using a region suggestion algorithm;
s6, predicting the accurate category of the region of interest and regressing the position coordinates;
s7, calculating the multi-task loss function of the classification network and the regression network, training and optimizing the network so that the classification and regression loss functions converge, and saving the weight parameters of the network;
And S8, deploying the optimized parameters and detecting the target.
2. The multi-layer information fusion-based target detection algorithm as claimed in claim 1, wherein the specific steps of step S1 are as follows:
s101, performing color enhancement, translation, and horizontal and vertical flipping on the image;
s102, scaling all image data to 448 x 448 size using linear interpolation.
3. The multi-layer information fusion-based target detection algorithm as claimed in claim 1, wherein the specific method for feature extraction in step S2 is as follows: performing convolution and pooling on the image using the built 98-layer DenseNet network, and taking the output of each transition layer to obtain feature maps of four stages with resolutions of 56×56, 28×28, 17×17 and 17×17.
4. The multi-layer information fusion-based target detection algorithm as claimed in claim 1, wherein the specific method for performing channel number normalization on the four-stage features in step S3 is as follows: the convolution operation is performed on the four stage features by using convolution of 1 × 1 with the channel number of 256, and the dimension of all the stage features is specified to be 256.
5. The multi-layer information fusion-based target detection algorithm according to claim 1, wherein the multi-stage feature fusion in step S4 is performed by:
s401, performing corresponding element addition operation on two adjacent stage features with the same size, and performing up-sampling operation on a smaller-size feature if the two stage features are different in size to ensure that the two fused features are the same in size;
and S402, convolving the fused result by using a convolution kernel of 3x3 to eliminate the aliasing effect after fusion.
6. The multi-layer information fusion-based target detection algorithm as claimed in claim 1, wherein the specific method of step S5 is as follows: extracting the region of interest from the multiple stage features fused in step S4 using a region candidate network, and performing foreground/background binary prediction and rough fitting of the border position on the region of interest using an anchor point mechanism.
7. The multi-layer information fusion-based target detection algorithm as claimed in claim 1, wherein the specific method of step S6 is as follows:
s601, performing pooling operation on the region of interest extracted in the step S5;
s602, inputting the pooled region of interest into a fully-connected network, and classifying by using a Softmax classifier;
and S603, outputting predicted target position coordinates x, y, w and h, wherein x, y, w and h respectively represent the center coordinate, the width and the height of the box.
8. The multi-layer information fusion-based target detection algorithm as claimed in claim 1, wherein the specific method of step S7 is as follows:
s701, firstly, calculating a loss function of a classification part:
L_cls(p_i, p_i^*) = -log[ p_i^* p_i + (1 - p_i^*)(1 - p_i) ]
wherein: p_i is the probability that anchor i is predicted as a target, and p_i^* is the corresponding ground-truth label of the data set;
S702, calculating a loss function of the position regression part, using the Smooth L1 (σ = 3) smoothing loss function:
L_reg(t_i, t_i^*) = smooth_L1(t_i - t_i^*)
smooth_L1(x) = 0.5(σx)^2 if |x| < 1/σ^2, and |x| - 0.5/σ^2 otherwise
wherein: t_i is the 4-parameter coordinate vector of the prediction bounding box, and t_i^* is the parameterized vector of the real box matched to the positive anchor;
t_x = (x - x_a)/w_a,  t_y = (y - y_a)/h_a,  t_w = log(w/w_a),  t_h = log(h/h_a)
t_x^* = (x^* - x_a)/w_a,  t_y^* = (y^* - y_a)/h_a,  t_w^* = log(w^*/w_a),  t_h^* = log(h^*/h_a)
wherein: x, x_a and x^* correspond to the prediction box, the anchor point and the real box respectively (and likewise for y, w and h);
S703, finally calculating the sum of the two loss functions:
L({p_i}, {t_i}) = (1/N_cls) Σ_i L_cls(p_i, p_i^*) + λ (1/N_reg) Σ_i p_i^* L_reg(t_i, t_i^*)
wherein: N_cls is the number of images per input when training the network, N_reg is the number of anchor points, and λ is a balance parameter between the two parts of the loss;
and S704, training the fully-connected network to make the loss function converge.
CN202010444366.3A 2020-05-22 2020-05-22 Target detection algorithm based on multi-layer information fusion Pending CN111666988A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010444366.3A CN111666988A (en) 2020-05-22 2020-05-22 Target detection algorithm based on multi-layer information fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010444366.3A CN111666988A (en) 2020-05-22 2020-05-22 Target detection algorithm based on multi-layer information fusion

Publications (1)

Publication Number Publication Date
CN111666988A true CN111666988A (en) 2020-09-15

Family

ID=72384416

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010444366.3A Pending CN111666988A (en) 2020-05-22 2020-05-22 Target detection algorithm based on multi-layer information fusion

Country Status (1)

Country Link
CN (1) CN111666988A (en)


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190139257A1 (en) * 2017-08-31 2019-05-09 Nec Laboratories America, Inc. Online flow guided memory networks for object detection in video
CN109117876A (en) * 2018-07-26 2019-01-01 成都快眼科技有限公司 A kind of dense small target deteection model building method, model and detection method
CN109614985A (en) * 2018-11-06 2019-04-12 华南理工大学 A kind of object detection method based on intensive connection features pyramid network
CN110046572A (en) * 2019-04-15 2019-07-23 重庆邮电大学 A kind of identification of landmark object and detection method based on deep learning
CN110378880A (en) * 2019-07-01 2019-10-25 南京国科软件有限公司 The Cremation Machine burning time calculation method of view-based access control model

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112329610A (en) * 2020-11-03 2021-02-05 中科九度(北京)空间信息技术有限责任公司 High-voltage line detection method based on edge attention mechanism fusion network
CN112990327A (en) * 2021-03-25 2021-06-18 北京百度网讯科技有限公司 Feature fusion method, device, apparatus, storage medium, and program product
CN113610822A (en) * 2021-08-13 2021-11-05 湖南大学 Surface defect detection method based on multi-scale information fusion
CN115017540A (en) * 2022-05-24 2022-09-06 贵州大学 Lightweight privacy protection target detection method and system
CN115017540B (en) * 2022-05-24 2024-07-02 贵州大学 Lightweight privacy protection target detection method and system


Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20200915

WD01 Invention patent application deemed withdrawn after publication