CN112270366B - Micro target detection method based on self-adaptive multi-feature fusion


Info

Publication number: CN112270366B
Application number: CN202011204130.9A
Authority: CN (China)
Prior art keywords: layer, feature, network, fusion, convolution
Legal status: Active (granted)
Other languages: Chinese (zh)
Other versions: CN112270366A
Inventors: 朱智勤, 张源川, 李嫄源, 冒睿睿, 李鹏华
Current Assignee: Chongqing University of Post and Telecommunications
Original Assignee: Chongqing University of Post and Telecommunications
Priority/Filing date: 2020-11-02
Application filed by Chongqing University of Post and Telecommunications
Publication of CN112270366A: 2021-01-26
Application granted; publication of CN112270366B: 2022-08-26

Classifications

    • G06F 18/253 — Pattern recognition; fusion techniques of extracted features
    • G06F 18/23213 — Non-hierarchical clustering using statistics or function optimisation with a fixed number of clusters, e.g. k-means clustering
    • G06N 3/045 — Computing arrangements based on biological models; neural networks; combinations of networks
    • G06T 3/4007 — Geometric image transformation; interpolation-based scaling, e.g. bilinear interpolation
    • G06V 2201/07 — Indexing scheme for image or video recognition or understanding; target detection

Abstract

The invention relates to a tiny target detection method based on self-adaptive multi-feature fusion, and belongs to the field of target detection. The extracted features are first fused through a conventional feature pyramid structure; on this basis, an additional path is designed for further feature fusion, and multi-scale fusion is then performed with an adaptive multi-feature fusion algorithm, so that the semantic information of tiny targets propagates through the multi-scale feature layers and both the semantic and texture information of tiny targets are enriched. Meanwhile, more reasonable prior-box parameters are obtained with a k-means algorithm, which accelerates network convergence and improves model accuracy. Finally, non-maximum suppression is applied to the detection results to screen out overlapping object boxes. The whole network updates its weights end to end until convergence. The resulting adaptive multi-feature fusion detection algorithm can effectively detect tiny targets.

Description

Micro target detection method based on self-adaptive multi-feature fusion
Technical Field
The invention belongs to the field of target detection, and relates to a micro target detection method based on self-adaptive multi-feature fusion.
Background
Although many deep-learning object detection algorithms have appeared in recent years and have greatly advanced the field, detecting tiny objects (smaller than 15 × 15 pixels) in images still leaves much room for improvement. Before deep learning became prevalent, targets of different scales were handled by building image pyramids of different resolutions from the original image and running a detector with a fixed input resolution on each pyramid level, so that small targets were detected at the bottom levels. However, for high-resolution images with complex backgrounds and small targets, the image pyramid incurs excessive computation and memory consumption. In recent years, deep-learning detection methods have achieved many results; for small-target detection, the main approaches are feature pyramids, super-resolution, and GAN-based enhancement. Super-resolution and GAN-based enhancement greatly increase computation and memory consumption when the input image is large, while the main shortcoming of the feature pyramid is the inconsistency among features of different scales. To address these problems, the invention provides a detection method aimed specifically at tiny targets: it adds a new path on top of the traditional feature pyramid to enhance the semantic information of small targets and fuses the multi-scale features along this new path, and is therefore called an Adaptive Multi-Feature Fusion Network. The invention also designs a Lightweight Multi-level Feature Extraction Network, which extracts multi-scale features simply and effectively. The whole model consists of a multi-level feature extraction network, a multi-feature fusion network, and a detection network. The multi-level feature extraction network extracts features from the input image to obtain high-level and low-level semantic features; top-down and bottom-up semantic information transfer is performed on the resulting multi-scale features, and an adaptive multi-feature fusion method adaptively fuses features of different scales, enriching the semantic information of tiny targets; multi-scale prediction is then performed by the detection network, with a k-means method generating objective proposal boxes for the classification and regression tasks. The disclosed feature fusion method can be applied directly to any detector that uses a feature pyramid structure, and yields better accuracy and robustness for detecting tiny targets in images.
Disclosure of Invention
In view of this, the present invention provides a method for detecting a small target based on adaptive multi-feature fusion.
In order to achieve the purpose, the invention provides the following technical scheme:
the method for detecting the tiny target based on the self-adaptive multi-feature fusion comprises the following steps:
1) extracting the high-level and low-level semantic information of tiny targets with the proposed lightweight multi-level feature extraction network, wherein the whole feature extraction network consists of five feature extraction modules, each composed of a [3 × 3,2] convolution network and three convolution blocks, and residual connections are used to increase the depth and feature extraction capability of the network;
2) passing the feature layers with downsampling rates of 8, 16, and 32 through a feature pyramid structure, using [1 × 1,1] convolution networks to handle dimensionality and a bilinear interpolation algorithm to handle scale, with concatenation along the channel dimension as the fusion mode, which increases the feature dimensionality;
3) adding an extra path on top of the feature pyramid structure to enrich the semantic and texture information of tiny targets, using [3 × 3,2] convolution networks to further extract features and adjust dimensionality, the fusion mode still being channel concatenation;
4) passing the twice-fused features through an adaptive multi-feature fusion network, wherein upsampling uses a bilinear interpolation algorithm and downsampling is done with [3 × 3,2] convolution networks and max pooling; meanwhile, [1 × 1,1] convolution networks perform dimension matching, a [1 × 1,1] convolution network with 3 output channels generates the required weight parameters, and the weights are finally multiplied onto the corresponding feature layers for fusion;
5) obtaining prior boxes with a k-means algorithm, clustering the target-box scales of the objects in the dataset to obtain k prior-box scales, which accelerates convergence of the model;
6) finally passing the fused features through separate [3 × 3,1] convolution networks to meet the detection output requirements, and screening the results with a non-maximum suppression algorithm; the whole network is trained in an end-to-end manner until the model converges.
In the step 1), a lightweight multi-level feature extraction network extracts the high-level and low-level semantic information of the input image. The network is composed of several feature extraction modules with the following structure:
a) each feature extraction module consists of a [3 × 3,2] convolution network and three convolution blocks, where 3 × 3 is the convolution kernel size and 2 is the stride, completing a downsampling step with rate 2;
b) each convolution block in the feature extraction module consists of a [1 × 1,1] convolution network and a [3 × 3,1] convolution network, with a residual connection (element-wise addition) to increase the nonlinear capacity and depth of the model;
c) the feature extraction network has five feature extraction modules in total, giving an overall downsampling rate of 32 (2^5); the feature maps output at downsampling rates 8, 16, and 32, i.e., by the third, fourth, and fifth feature extraction modules, are used for adaptive multi-feature fusion.
Optionally, in step 2), the feature layers with downsampling rates of 8, 16, and 32 are denoted p3, p4, and p5, respectively, and are passed through a feature pyramid structure to obtain multi-scale features. The specific steps are:
a) the p5 layer passes through a [1 × 1,1] convolution network, mainly for dimensionality reduction; the output dimensionality is adjusted to that of the p4 layer, and the output feature layer is denoted c5;
b) an upsampling layer using a bilinear interpolation algorithm (in this invention, upsampling defaults to bilinear interpolation unless stated otherwise) doubles the resolution, i.e., the downsampling rate of p5 after upsampling is 16; after the 1 × 1 convolution and the upsampling layer, the output dimensionality and downsampling rate match the p4 layer, so the feature maps of the c5 and p4 layers can be concatenated along the channel dimension to obtain a fused feature map, which then passes through a feature extraction module with 1 × 1 convolution to obtain feature layer c4;
c) similarly, c4 passes through an upsampling layer and is concatenated with the feature map of the p3 layer to obtain a fused feature map, which passes through a feature extraction module to obtain feature layer c3.
Optionally, in step 3), a bottom-up path is added on top of the traditional feature pyramid to enrich the semantic information of tiny targets. The specific steps are:
a) the c3 layer passes through a [3 × 3,2] convolution network, which further extracts features and adjusts the output dimensionality to match the c4 layer; it is then concatenated with the feature map of the c4 layer to obtain a fused feature map, which passes through a feature extraction module to obtain feature layer c4';
b) similarly, c4' passes through a [3 × 3,2] convolution network and is fused with the c5 layer, then passes through a feature extraction module to obtain feature layer c5'.
Optionally, in the step 4), the feature layers c3, c4', and c5' are obtained for subsequent detection. The specific steps are:
a) with the c5' layer as the fusion layer, the c4' layer requires 2× downsampling, implemented with a [3 × 3,2] convolution network, and the c3 layer requires 4× downsampling, implemented by first downsampling 2× with max pooling and then applying a [3 × 3,2] convolution network; the c5' layer and the processed c4' and c3 layers then pass through the adaptive fusion network to obtain the fusion result F5 of the c5' layer;
b) with the c4' layer as the fusion layer, the c5' layer requires 2× upsampling and the c3 layer requires 2× downsampling, implemented with a [3 × 3,2] convolution network; similarly, the c4' layer and the processed c5' and c3 layers pass through the adaptive fusion network to obtain the fusion result F4 of the c4' layer;
c) with the c3 layer as the fusion layer, the c5' layer requires 4× upsampling and the c4' layer requires 2× upsampling; similarly, the fusion result F3 of the c3 layer is obtained after the adaptive fusion network.
Optionally, in the step 5), the adaptive fusion network is built from several [1 × 1,1] convolution networks. With the c5' layer as the fusion layer, the c5' layer and the processed c4' and c3 layers each pass through a [1 × 1,1] convolution network for dimensionality reduction; the three convolved feature maps are concatenated along the channel dimension and passed through a [1 × 1,1] convolution network with 3 output channels; finally the c5' layer and the processed c4' and c3 layers are multiplied by the weight parameters produced by the adaptive fusion network and summed to obtain the fusion result F5. The same applies when the c4' or c3 layer is the fusion layer. This is expressed by equation (1):

$$F_{level} = \alpha_{level} \cdot x_{3 \to level} + \beta_{level} \cdot x_{4 \to level} + \gamma_{level} \cdot x_{5 \to level} \tag{1}$$

where level denotes the current fusion layer, $x_{n \to level}$ denotes the feature layer of downsampling level n after adjustment to the resolution of the fusion layer (the layer corresponding to level itself needs no adjustment), and $\alpha_{level}$, $\beta_{level}$, and $\gamma_{level}$ are weight parameters, with $\alpha_{level}$ given by equation (2):

$$\alpha_{level} = \frac{e^{\lambda_{\alpha,level}}}{e^{\lambda_{\alpha,level}} + e^{\lambda_{\beta,level}} + e^{\lambda_{\gamma,level}}} \tag{2}$$

where $\lambda_{\alpha,level}$, $\lambda_{\beta,level}$, and $\lambda_{\gamma,level}$ are the weights of the corresponding channels output by the [1 × 1,1] convolution network with 3 output channels; $\beta_{level}$ and $\gamma_{level}$ are defined in the same way, so the three weights sum to 1.
Optionally, in the step 6), after the adaptive multi-feature fusion network, the three fused feature layers F5, F4, and F3 are obtained for the subsequent detection network; before that, the prior-box parameters required by the detection network must be computed from the dataset. Prior-box parameters computed with the k-means algorithm are more reasonable than empirically set ones, which accelerates network convergence and gives the model better performance. The k-means assignment step is:

$$c^{(i)} = \arg\min_{j} \left\lVert x^{(i)} - \mu_j \right\rVert^2 \tag{3}$$

where $x^{(i)}$ is the scale of a target box in the dataset, i = 1, 2, 3, ..., m; j indexes the k prior-box scales to be obtained, with k = 9 by default and j = 1, 2, 3, ..., k; and $\mu_j$ is the cluster center, defined as:

$$\mu_j = \frac{\sum_{i=1}^{m} \mathbf{1}\{c^{(i)} = j\}\, x^{(i)}}{\sum_{i=1}^{m} \mathbf{1}\{c^{(i)} = j\}} \tag{4}$$

Equations (3) and (4) are computed repeatedly until the algorithm converges.
Optionally, after the step 6), the method further comprises a step 7): after the prior boxes are obtained, the feature layers F5, F4, and F3 are input into the detection network, which consists of three [3 × 3,1] convolution networks performing dimension matching and dimensionality reduction to meet the detection output requirements; finally, non-maximum suppression is applied to the recognition results of the detection network to obtain the final detection result.
The invention has the following beneficial effects:
The invention provides a tiny target detection method based on adaptive multi-feature fusion. Traditional tiny target detection methods are generally based on image pyramids; with the development of deep learning, methods such as super-resolution and GAN-based enhancement have gradually made progress in the tiny-target field, but when the input image is very large with a complex background, or the number of objects to detect is large, these methods suffer from increased computation, memory overflow, and similar problems. The disclosed adaptive multi-feature fusion method improves recognition of tiny targets with almost no extra memory or time consumption, and the lightweight multi-level feature extraction network designed here extracts image features efficiently while reducing the parameter count and computation of the model.
The method first extracts features from the input image with the lightweight multi-level feature extraction network, so that the resulting feature maps contain both high-level and low-level semantic information; second, it performs path enhancement with a traditional feature pyramid structure and then adds a further path for enhancement, making the target feature information richer; next, adaptive multi-feature fusion makes the semantic information of tiny targets richer, greatly improving the recall and precision of the network model; then, a k-means algorithm computes the prior-box parameters required by the detection network from the target sizes in the dataset, accelerating convergence and improving the generalization of the model; finally, results are recognized by the detection network, and the network weights are updated end to end until convergence.
Additional advantages, objects, and features of the invention will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the invention. The objectives and other advantages of the invention may be realized and attained by the means of the instrumentalities and combinations particularly pointed out hereinafter.
Drawings
For a better understanding of the objects, aspects and advantages of the present invention, reference will now be made to the following detailed description taken in conjunction with the accompanying drawings in which:
FIG. 1 is a diagram of a lightweight multi-level feature extraction network architecture;
FIG. 2 is a diagram of an adaptive multi-feature fusion network architecture;
fig. 3 is an overall structure diagram of a network model.
Detailed Description
The embodiments of the present invention are described below with reference to specific embodiments, and other advantages and effects of the present invention will be easily understood by those skilled in the art from the disclosure of the present specification. The invention is capable of other and different embodiments and of being practiced or of being carried out in various ways, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present invention. It should be noted that the drawings provided in the following embodiments are only for illustrating the basic idea of the present invention in a schematic way, and the features in the following embodiments and examples may be combined with each other without conflict.
The drawings are for illustrating the invention only and are not intended to limit it; to better illustrate the embodiments, some parts of the drawings may be omitted, enlarged, or reduced, and they do not represent the size of an actual product; it will be understood by those skilled in the art that certain well-known structures and their descriptions may be omitted from the drawings.
The same or similar reference numerals in the drawings of the embodiments of the present invention correspond to the same or similar components; in the description of the present invention, it should be understood that if there is an orientation or positional relationship indicated by terms such as "upper", "lower", "left", "right", "front", "rear", etc., based on the orientation or positional relationship shown in the drawings, it is only for convenience of description and simplification of description, but it is not an indication or suggestion that the referred device or element must have a specific orientation, be constructed in a specific orientation, and be operated, and therefore, the terms describing the positional relationship in the drawings are only used for illustrative purposes, and are not to be construed as limiting the present invention, and the specific meaning of the terms may be understood by those skilled in the art according to specific situations.
Referring to fig. 1 to 3, a method for detecting a small target based on adaptive multi-feature fusion includes the following steps:
1) Extract the high-level and low-level semantic information of the input image with a lightweight multi-level feature extraction network. The network is composed of several feature extraction modules with the following structure (a code sketch of the module follows this list):
a) each feature extraction module consists of a [3 × 3,2] convolution network and three convolution blocks, where 3 × 3 is the convolution kernel size and 2 is the stride, completing a downsampling step with rate 2;
b) each convolution block in the feature extraction module consists of a [1 × 1,1] convolution network and a [3 × 3,1] convolution network, with a residual connection (element-wise addition) to increase the nonlinear capacity and depth of the model;
c) the feature extraction network has five feature extraction modules in total, giving an overall downsampling rate of 32 (2^5); the feature maps output at downsampling rates 8, 16, and 32 (corresponding to the third, fourth, and fifth feature extraction modules) are used for adaptive multi-feature fusion.
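The following is a minimal PyTorch sketch of one such feature extraction module and the five-module backbone; the channel progression and activation choices are illustrative assumptions not specified in the text above.

```python
import torch
import torch.nn as nn

class ConvBlock(nn.Module):
    """Convolution block: [1x1,1] conv then [3x3,1] conv, with an
    element-wise residual addition back to the block input."""
    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels // 2, kernel_size=1, stride=1),
            nn.LeakyReLU(0.1),
            nn.Conv2d(channels // 2, channels, kernel_size=3, stride=1, padding=1),
            nn.LeakyReLU(0.1),
        )

    def forward(self, x):
        return x + self.body(x)  # residual connection

class FeatureExtractionModule(nn.Module):
    """[3x3, stride 2] downsampling convolution followed by three conv blocks."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.down = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=2, padding=1),
            nn.LeakyReLU(0.1),
        )
        self.blocks = nn.Sequential(*(ConvBlock(out_ch) for _ in range(3)))

    def forward(self, x):
        return self.blocks(self.down(x))

class Backbone(nn.Module):
    """Five stacked modules, overall downsampling rate 2^5 = 32; returns the
    outputs of the third, fourth, and fifth modules as p3, p4, p5."""
    def __init__(self):
        super().__init__()
        chs = [3, 32, 64, 128, 256, 512]  # assumed channel progression
        self.stages = nn.ModuleList(
            FeatureExtractionModule(chs[i], chs[i + 1]) for i in range(5)
        )

    def forward(self, x):
        outs = []
        for stage in self.stages:
            x = stage(x)
            outs.append(x)
        return outs[2], outs[3], outs[4]  # downsampling rates 8, 16, 32
```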
2) Denote the feature layers with downsampling rates of 8, 16, and 32 as p3, p4, and p5, respectively, and pass them through a feature pyramid structure to obtain multi-scale features (a sketch of this top-down path follows this list):
a) the p5 layer passes through a [1 × 1,1] convolution network, mainly for dimensionality reduction; the output dimensionality is adjusted to that of the p4 layer, and the output feature layer is denoted c5;
b) an upsampling layer using a bilinear interpolation algorithm (in this invention, upsampling defaults to bilinear interpolation unless stated otherwise) doubles the resolution, i.e., the downsampling rate of p5 after upsampling is 16; after the 1 × 1 convolution and upsampling, the output dimensionality and downsampling rate match the p4 layer, so the feature maps of the c5 and p4 layers can be concatenated along the channel dimension to obtain a fused feature map, which then passes through a feature extraction module with 1 × 1 convolution to obtain feature layer c4;
c) similarly, c4 passes through an upsampling layer and is concatenated with the feature map of the p3 layer to obtain a fused feature map, which passes through a feature extraction module to obtain feature layer c3.
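A minimal sketch of this top-down path, assuming channel concatenation as the fusion operation and plain 1×1 convolutions standing in for the post-fusion feature extraction; channel counts follow the assumed backbone above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopDownPath(nn.Module):
    """Feature pyramid top-down path: p3/p4/p5 -> c3/c4/c5."""
    def __init__(self, ch3: int = 128, ch4: int = 256, ch5: int = 512):
        super().__init__()
        self.reduce5 = nn.Conv2d(ch5, ch4, kernel_size=1, stride=1)  # p5 -> c5
        self.fuse4 = nn.Conv2d(ch4 + ch4, ch4, kernel_size=1, stride=1)
        self.reduce4 = nn.Conv2d(ch4, ch3, kernel_size=1, stride=1)
        self.fuse3 = nn.Conv2d(ch3 + ch3, ch3, kernel_size=1, stride=1)

    def forward(self, p3, p4, p5):
        c5 = self.reduce5(p5)
        up5 = F.interpolate(c5, scale_factor=2, mode="bilinear", align_corners=False)
        c4 = self.fuse4(torch.cat([up5, p4], dim=1))  # channel concatenation
        up4 = F.interpolate(self.reduce4(c4), scale_factor=2,
                            mode="bilinear", align_corners=False)
        c3 = self.fuse3(torch.cat([up4, p3], dim=1))
        return c3, c4, c5
```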
3) On the basis of the traditional feature pyramid, add a bottom-up path to enrich the semantic information of tiny targets (a sketch follows this list):
a) the c3 layer passes through a [3 × 3,2] convolution network, which further extracts features and adjusts the output dimensionality to match the c4 layer; it is then concatenated with the feature map of the c4 layer to obtain a fused feature map, which passes through a feature extraction module to obtain feature layer c4';
b) similarly, c4' passes through a [3 × 3,2] convolution network and is fused with the c5 layer, then passes through a feature extraction module to obtain feature layer c5'.
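A sketch of the added bottom-up path under the same assumptions (channel concatenation as the fusion mode, 1×1 convolutions as the post-fusion refinement); note that c5 here already has the reduced channel count produced by the top-down path above.

```python
import torch
import torch.nn as nn

class BottomUpPath(nn.Module):
    """Extra bottom-up path: c3/c4/c5 -> c3/c4'/c5'."""
    def __init__(self, ch3: int = 128, ch4: int = 256, ch5: int = 256):
        super().__init__()
        # [3x3, stride 2] convs downsample and adjust dimensionality
        self.down3 = nn.Conv2d(ch3, ch4, kernel_size=3, stride=2, padding=1)
        self.fuse4 = nn.Conv2d(ch4 + ch4, ch4, kernel_size=1, stride=1)
        self.down4 = nn.Conv2d(ch4, ch5, kernel_size=3, stride=2, padding=1)
        self.fuse5 = nn.Conv2d(ch5 + ch5, ch5, kernel_size=1, stride=1)

    def forward(self, c3, c4, c5):
        c4p = self.fuse4(torch.cat([self.down3(c3), c4], dim=1))   # c4'
        c5p = self.fuse5(torch.cat([self.down4(c4p), c5], dim=1))  # c5'
        return c3, c4p, c5p
```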
4) After the above operations, the feature layers c3, c4', and c5' are obtained for subsequent detection. Although the semantic information of the c3 and c4' layers has been enhanced by two passes along the pyramid paths, it is still not as rich as that of the c5' layer. Texture information provides accurate position information for a target, while the strength of semantic information helps decide whether a region is foreground or background and which class an object belongs to: a feature layer with a low downsampling rate has strong texture information but insufficient semantic information, whereas a feature layer with a high downsampling rate has rich semantic information but insufficient texture information. The invention therefore discloses an adaptive multi-feature fusion method that effectively enriches the missing information in feature layers of different downsampling rates. The specific steps are:
a) with the c5' layer as the fusion layer, the c4' layer requires 2× downsampling, implemented with a [3 × 3,2] convolution network, and the c3 layer requires 4× downsampling, implemented by first downsampling 2× with max pooling and then applying a [3 × 3,2] convolution network; the c5' layer and the processed c4' and c3 layers then pass through the adaptive fusion network to obtain the fusion result F5 of the c5' layer;
b) with the c4' layer as the fusion layer, the c5' layer requires 2× upsampling and the c3 layer requires 2× downsampling, implemented with a [3 × 3,2] convolution network; similarly, the c4' layer and the processed c5' and c3 layers pass through the adaptive fusion network to obtain the fusion result F4 of the c4' layer;
c) with the c3 layer as the fusion layer, the c5' layer requires 4× upsampling and the c4' layer requires 2× upsampling; similarly, the fusion result F3 of the c3 layer is obtained after the adaptive fusion network.
5) The adaptive fusion network is built from several [1 × 1,1] convolution networks. Taking the c5' layer as the fusion layer as an example, the c5' layer and the processed c4' and c3 layers each pass through a [1 × 1,1] convolution network for dimensionality reduction; the three convolved feature maps are concatenated along the channel dimension and passed through a [1 × 1,1] convolution network with 3 output channels; finally the c5' layer and the processed c4' and c3 layers are multiplied by the weight parameters produced by the adaptive fusion network and summed to obtain the fusion result F5. The same applies when the c4' or c3 layer is the fusion layer. This process is expressed by equation (1):

$$F_{level} = \alpha_{level} \cdot x_{3 \to level} + \beta_{level} \cdot x_{4 \to level} + \gamma_{level} \cdot x_{5 \to level} \tag{1}$$

where level denotes the current fusion layer, $x_{n \to level}$ denotes the feature layer of downsampling level n after adjustment to the resolution of the fusion layer (the layer corresponding to level itself needs no adjustment), and $\alpha_{level}$, $\beta_{level}$, and $\gamma_{level}$ are weight parameters, with $\alpha_{level}$ given by equation (2):

$$\alpha_{level} = \frac{e^{\lambda_{\alpha,level}}}{e^{\lambda_{\alpha,level}} + e^{\lambda_{\beta,level}} + e^{\lambda_{\gamma,level}}} \tag{2}$$

where $\lambda_{\alpha,level}$, $\lambda_{\beta,level}$, and $\lambda_{\gamma,level}$ are the weights of the corresponding channels output by the [1 × 1,1] convolution network with 3 output channels; $\beta_{level}$ and $\gamma_{level}$ are defined in the same way, so the three weights sum to 1.
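A minimal sketch of the adaptive fusion network of equations (1) and (2): each input passes a dimensionality-reducing [1×1,1] convolution, the results are concatenated, a [1×1,1] convolution with 3 output channels produces the logits, and a channel softmax (the reconstruction of equation (2) assumed here) yields α, β, γ for the weighted sum. The hidden width and wiring are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveFusion(nn.Module):
    """Adaptive fusion for one fusion layer (equations (1) and (2))."""
    def __init__(self, channels: int, hidden: int = 16):
        super().__init__()
        # one dimensionality-reducing [1x1,1] conv per input feature layer
        self.compress = nn.ModuleList(
            nn.Conv2d(channels, hidden, kernel_size=1, stride=1) for _ in range(3)
        )
        # [1x1,1] conv with 3 output channels -> one weight logit per layer
        self.weight_logits = nn.Conv2d(3 * hidden, 3, kernel_size=1, stride=1)

    def forward(self, x3, x4, x5):
        """x3, x4, x5: the three feature layers already resampled to the
        fusion layer's resolution (the x_{n->level} of equation (1))."""
        w = torch.cat([m(x) for m, x in zip(self.compress, (x3, x4, x5))], dim=1)
        w = F.softmax(self.weight_logits(w), dim=1)    # equation (2)
        alpha, beta, gamma = w[:, 0:1], w[:, 1:2], w[:, 2:3]
        return alpha * x3 + beta * x4 + gamma * x5     # equation (1)
```

For F5 (step a) above), the resampling would be wired, for example, as max pooling plus a [3 × 3,2] convolution (also matching channels) for c3 and a single [3 × 3,2] convolution for c4' before calling `AdaptiveFusion`; for F4 and F3 the bilinear upsampling of steps b) and c) applies.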
6) After the adaptive multi-feature fusion network, the three fused feature layers F5, F4, and F3 are obtained for the subsequent detection network; before they are used, the prior-box parameters required by the detection network are computed from the dataset. Prior-box parameters computed with the k-means algorithm are more reasonable than empirically set ones, which accelerates network convergence and gives the model better performance. The k-means assignment step is:

$$c^{(i)} = \arg\min_{j} \left\lVert x^{(i)} - \mu_j \right\rVert^2 \tag{3}$$

where $x^{(i)}$ is the scale of a target box in the dataset, i = 1, 2, 3, ..., m; j indexes the k prior-box scales to be obtained (k = 9 by default), j = 1, 2, 3, ..., k; and $\mu_j$ is the cluster center, defined as:

$$\mu_j = \frac{\sum_{i=1}^{m} \mathbf{1}\{c^{(i)} = j\}\, x^{(i)}}{\sum_{i=1}^{m} \mathbf{1}\{c^{(i)} = j\}} \tag{4}$$

Equations (3) and (4) are computed repeatedly until the algorithm converges.
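A sketch of the prior-box clustering of equations (3) and (4) as plain Euclidean k-means over (width, height) pairs; the (m, 2) input layout and the initialization scheme are assumptions.

```python
import numpy as np

def kmeans_anchors(boxes: np.ndarray, k: int = 9, iters: int = 100, seed: int = 0):
    """boxes: (m, 2) array of target-box (width, height) scales.
    Returns k cluster centers used as prior-box scales."""
    rng = np.random.default_rng(seed)
    centers = boxes[rng.choice(len(boxes), size=k, replace=False)]
    for _ in range(iters):
        # equation (3): assign each box to its nearest cluster center
        d = ((boxes[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        assign = d.argmin(axis=1)
        # equation (4): recompute each center as the mean of its members
        new_centers = np.array([
            boxes[assign == j].mean(axis=0) if (assign == j).any() else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break  # converged
        centers = new_centers
    return centers
```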
7) After the prior boxes are obtained, the feature layers F5, F4, and F3 can be input into the detection network. The detection network consists of three [3 × 3,1] convolution networks performing dimension matching and dimensionality reduction to meet the detection output requirements; finally, non-maximum suppression is applied to the recognition results of the detection network to obtain the final detection result.
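A sketch of the detection stage under common anchor-based assumptions (3 anchors per cell, box + objectness + class outputs — a layout the text does not specify), with non-maximum suppression from torchvision:

```python
import torch
import torch.nn as nn
from torchvision.ops import nms

class DetectionHead(nn.Module):
    """One [3x3,1] conv mapping a fused feature layer to detection outputs."""
    def __init__(self, in_ch: int, num_anchors: int = 3, num_classes: int = 20):
        super().__init__()
        out_ch = num_anchors * (5 + num_classes)  # 4 box coords + objectness + classes
        self.pred = nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=1, padding=1)

    def forward(self, x):
        return self.pred(x)

def filter_detections(boxes, scores, iou_thresh=0.5, score_thresh=0.25):
    """boxes: (N, 4) xyxy tensor; scores: (N,) tensor.
    Returns the indices kept after score filtering and NMS."""
    idx = (scores > score_thresh).nonzero(as_tuple=True)[0]
    kept = nms(boxes[idx], scores[idx], iou_thresh)
    return idx[kept]
```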
In summary:
1. A lightweight multi-level feature extraction network extracts the high-level and low-level semantic features of the input image, and the intermediate results of the last three feature layers are stored for subsequent feature fusion;
2. the traditional feature pyramid path is enhanced once, a bottom-up path is added on top of the traditional feature pyramid to enrich the feature information of tiny targets, and finally the adaptive multi-feature fusion method performs multi-layer feature fusion, further improving the semantic information of tiny targets;
3. the prior-box parameters are obtained with a k-means algorithm, the recognition results of the image are obtained through the detection network and non-maximum suppression, and the whole network is trained end to end, continuously updating the weight parameters until the network converges.
Finally, although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that various changes and modifications may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (3)

1. A method for detecting tiny targets based on self-adaptive multi-feature fusion, characterized by comprising the following steps:
1) extracting the high-level and low-level semantic information of tiny targets with the proposed lightweight multi-level feature extraction network, wherein the whole feature extraction network consists of five feature extraction modules, each composed of a [3 × 3,2] convolution network and three convolution blocks, and residual connections are used to increase the depth and feature extraction capability of the network;
2) passing the feature layers with downsampling rates of 8, 16, and 32 through a feature pyramid structure, using [1 × 1,1] convolution networks to handle dimensionality and a bilinear interpolation algorithm to handle scale, with concatenation along the channel dimension as the fusion mode, which increases the feature dimensionality;
3) adding an extra path on top of the feature pyramid structure to enrich the semantic and texture information of tiny targets, using [3 × 3,2] convolution networks to further extract features and adjust dimensionality, the fusion mode still being channel concatenation;
4) passing the twice-fused features through an adaptive multi-feature fusion network, wherein upsampling uses a bilinear interpolation algorithm and downsampling is done with [3 × 3,2] convolution networks and max pooling; meanwhile, [1 × 1,1] convolution networks perform dimension matching, a [1 × 1,1] convolution network with 3 output channels generates the required weight parameters, and the weights are finally multiplied onto the corresponding feature layers for fusion;
5) obtaining prior boxes with a k-means algorithm, clustering the target-box scales of the objects in the dataset to obtain k prior-box scales, which accelerates convergence of the model;
6) finally passing the fused features through separate [3 × 3,1] convolution networks to meet the detection output requirements, and screening the results with a non-maximum suppression algorithm; the whole network is trained in an end-to-end manner until the model converges;
in the step 1), a lightweight multi-level feature extraction network extracts the high-level and low-level semantic information of the input image, the network being composed of several feature extraction modules with the following structure:
a) each feature extraction module consists of a [3 × 3,2] convolution network and three convolution blocks, where 3 × 3 is the convolution kernel size and 2 is the stride, completing a downsampling step with rate 2;
b) each convolution block in the feature extraction module consists of a [1 × 1,1] convolution network and a [3 × 3,1] convolution network, with a residual connection (element-wise addition) to increase the nonlinear capacity and depth of the model;
c) the feature extraction network has five feature extraction modules in total, and the feature maps output at downsampling rates 8, 16, and 32, i.e., by the third, fourth, and fifth feature extraction modules, are used for adaptive multi-feature fusion;
in the step 2), the feature layers with downsampling rates of 8, 16, and 32 are denoted p3, p4, and p5, respectively, and are passed through a feature pyramid structure to obtain multi-scale features, specifically:
a) the p5 layer passes through a [1 × 1,1] convolution network, mainly for dimensionality reduction; the output dimensionality is adjusted to that of the p4 layer, and the output feature layer is denoted c5;
b) an upsampling layer using a bilinear interpolation algorithm doubles the resolution, i.e., the downsampling rate of p5 after upsampling is 16; after the 1 × 1 convolution and upsampling, the output dimensionality and downsampling rate match the p4 layer, so the feature maps of the c5 and p4 layers can be concatenated along the channel dimension to obtain a fused feature map, which then passes through a feature extraction module with 1 × 1 convolution to obtain feature layer c4;
c) similarly, c4 passes through an upsampling layer and is concatenated with the feature map of the p3 layer to obtain a fused feature map, which passes through a feature extraction module to obtain feature layer c3;
in 3), on the basis of the traditional feature pyramid, a bottom-up path is added to enrich the semantic information of tiny targets, specifically:
a) the c3 layer passes through a [3 × 3,2] convolution network, which further extracts features and adjusts the output dimensionality to match the c4 layer; it is then concatenated with the feature map of the c4 layer to obtain a fused feature map, which passes through a feature extraction module to obtain feature layer c4';
b) similarly, c4' passes through a [3 × 3,2] convolution network and is fused with the c5 layer, then passes through a feature extraction module to obtain feature layer c5';
in the step 4), the feature layers c3, c4', and c5' are obtained for subsequent detection, specifically:
a) with the c5' layer as the fusion layer, the c4' layer requires 2× downsampling, implemented with a [3 × 3,2] convolution network, and the c3 layer requires 4× downsampling, implemented by first downsampling 2× with max pooling and then applying a [3 × 3,2] convolution network; the c5' layer and the processed c4' and c3 layers then pass through the adaptive fusion network to obtain the fusion result F5 of the c5' layer;
b) with the c4' layer as the fusion layer, the c5' layer requires 2× upsampling and the c3 layer requires 2× downsampling, implemented with a [3 × 3,2] convolution network; similarly, the c4' layer and the processed c5' and c3 layers pass through the adaptive fusion network to obtain the fusion result F4 of the c4' layer;
c) with the c3 layer as the fusion layer, the c5' layer requires 4× upsampling and the c4' layer requires 2× upsampling; similarly, the fusion result F3 of the c3 layer is obtained after the adaptive fusion network;
in the step 5), the adaptive fusion network is built from several [1 × 1,1] convolution networks; with the c5' layer as the fusion layer, the c5' layer and the processed c4' and c3 layers each pass through a [1 × 1,1] convolution network for dimensionality reduction, the three convolved feature maps are concatenated along the channel dimension and passed through a [1 × 1,1] convolution network with 3 output channels, and finally the c5' layer and the processed c4' and c3 layers are multiplied by the weight parameters obtained from the adaptive fusion network and summed to obtain the fusion result F5; the same applies when the c4' or c3 layer is the fusion layer, as expressed by equation (1):

$$F_{level} = \alpha_{level} \cdot x_{3 \to level} + \beta_{level} \cdot x_{4 \to level} + \gamma_{level} \cdot x_{5 \to level} \tag{1}$$

where level denotes the current fusion layer, $x_{n \to level}$ denotes the feature layer of downsampling level n after adjustment to the resolution of the fusion layer, and $\alpha_{level}$, $\beta_{level}$, and $\gamma_{level}$ are weight parameters, with $\alpha_{level}$ given by equation (2):

$$\alpha_{level} = \frac{e^{\lambda_{\alpha,level}}}{e^{\lambda_{\alpha,level}} + e^{\lambda_{\beta,level}} + e^{\lambda_{\gamma,level}}} \tag{2}$$

where $\lambda_{\alpha,level}$, $\lambda_{\beta,level}$, and $\lambda_{\gamma,level}$ are the weights of the corresponding channels output by the [1 × 1,1] convolution network with 3 output channels, and $\beta_{level}$ and $\gamma_{level}$ are defined in the same way.
2. The method for detecting tiny targets based on adaptive multi-feature fusion according to claim 1, characterized in that: in the step 6), after the adaptive multi-feature fusion network, three fused feature layers F5, F4, and F3 are obtained for the subsequent detection network; before that, the prior-box parameters required by the detection network are computed from the dataset; prior-box parameters computed with the k-means algorithm are more reasonable than empirically set ones, which accelerates network convergence and gives the model better performance, the k-means assignment being:

$$c^{(i)} = \arg\min_{j} \left\lVert x^{(i)} - \mu_j \right\rVert^2 \tag{3}$$

where $x^{(i)}$ is the scale of a target box in the dataset, i = 1, 2, 3, ..., m; j indexes the k prior-box scales to be obtained, with k = 9 by default and j = 1, 2, 3, ..., k; and $\mu_j$ is the cluster center, defined as:

$$\mu_j = \frac{\sum_{i=1}^{m} \mathbf{1}\{c^{(i)} = j\}\, x^{(i)}}{\sum_{i=1}^{m} \mathbf{1}\{c^{(i)} = j\}} \tag{4}$$

equations (3) and (4) being computed repeatedly until the algorithm converges.
3. The method for detecting tiny targets based on adaptive multi-feature fusion according to claim 2, characterized in that: after the step 6), the method further comprises a step 7): after the prior boxes are obtained, the feature layers F5, F4, and F3 are input into the detection network, which consists of three [3 × 3,1] convolution networks performing dimension matching and dimensionality reduction to meet the detection output requirements; finally, non-maximum suppression is applied to the recognition results of the detection network to obtain the final detection result.
CN202011204130.9A (filed 2020-11-02) — Micro target detection method based on self-adaptive multi-feature fusion — Active — granted as CN112270366B

Priority Applications (1)

Application Number: CN202011204130.9A
Priority Date / Filing Date: 2020-11-02 / 2020-11-02
Title: Micro target detection method based on self-adaptive multi-feature fusion

Publications (2)

CN112270366A — published 2021-01-26
CN112270366B — published 2022-08-26 (grant)

Family

ID=74345879

Family Applications (1)

CN202011204130.9A — filed 2020-11-02 — Micro target detection method based on self-adaptive multi-feature fusion — granted as CN112270366B

Country Status (1)

CN — CN112270366B

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112950703B (en) * 2021-03-11 2024-01-19 无锡禹空间智能科技有限公司 Small target detection method, device, storage medium and equipment
CN113011442A (en) * 2021-03-26 2021-06-22 山东大学 Target detection method and system based on bidirectional adaptive feature pyramid
CN114022682A (en) * 2021-11-05 2022-02-08 天津大学 Weak and small target detection method based on attention secondary feature fusion mechanism


Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107992854A (en) * 2017-12-22 2018-05-04 重庆邮电大学 Forest Ecology man-machine interaction method based on machine vision
CN109344821A (en) * 2018-08-30 2019-02-15 西安电子科技大学 Small target detecting method based on Fusion Features and deep learning
CN109658412B (en) * 2018-11-30 2021-03-30 湖南视比特机器人有限公司 Rapid packaging box identification and segmentation method for unstacking and sorting
CN110555475A (en) * 2019-08-29 2019-12-10 华南理工大学 few-sample target detection method based on semantic information fusion
CN111199255A (en) * 2019-12-31 2020-05-26 上海悠络客电子科技股份有限公司 Small target detection network model and detection method based on dark net53 network
CN111860637B (en) * 2020-07-17 2023-11-21 河南科技大学 Single-shot multi-frame infrared target detection method

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110097129A (en) * 2019-05-05 2019-08-06 西安电子科技大学 Remote sensing target detection method based on profile wave grouping feature pyramid convolution

Also Published As

Publication number Publication date
CN112270366A (en) 2021-01-26


Legal Events

Code | Title
PB01 — Publication
SE01 — Entry into force of request for substantive examination
GR01 — Patent grant