CN111126385A - Deep learning intelligent identification method for deformable living body small target - Google Patents

Deep learning intelligent identification method for deformable living body small target

Info

Publication number
CN111126385A
CN111126385A (application CN201911284570.7A)
Authority
CN
China
Prior art keywords
deformable
convolution
offset
dimensional
pooling
Prior art date
Legal status
Pending
Application number
CN201911284570.7A
Other languages
Chinese (zh)
Inventor
黄海
靳佰达
万兆亮
周浩
石晓婷
吴晗
梅洋
Current Assignee
Harbin Engineering University
Original Assignee
Harbin Engineering University
Priority date
Filing date
Publication date
Application filed by Harbin Engineering University filed Critical Harbin Engineering University
Priority to CN201911284570.7A priority Critical patent/CN111126385A/en
Publication of CN111126385A publication Critical patent/CN111126385A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/20: Image preprocessing
    • G06V 10/25: Determination of region of interest [ROI] or a volume of interest [VOI]
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/25: Fusion techniques
    • G06F 18/253: Fusion techniques of extracted features
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks


Abstract

The invention relates to a deep learning intelligent identification method for deformable living small targets, belonging to the technical field of robot vision and intelligent identification. A deformable convolution module and a deformable ROI pooling module are combined with Faster R-CNN: two-dimensional (or higher-dimensional) offsets are added to the spatial sampling points of the standard convolution and to ordinary ROI pooling, so that the shape of the convolution sampling grid can change. This improves the deformation-handling capability of the improved model and thus its detection and identification of deformable targets. Fusion of feature maps from different layers is also considered: the bottom-layer feature map is pooled to reduce its resolution, the high-layer features are deconvolved to raise theirs, and the low-, middle-, and high-layer feature maps are then fused. In addition, a group of small-scale preselection boxes is added to the anchors, increasing the number of preselection boxes generated for small targets and improving the detection and identification of small targets by the improved model.

Description

Deep learning intelligent identification method for deformable living body small target
Technical Field
The invention relates to a deep learning intelligent identification method for deformable living small targets, belonging to the technical field of robot vision and intelligent identification.
Background
Robot vision and intelligent recognition technology is one of the main means by which a robot acquires external information, and is widely used in detection, target tracking, manipulation, and other areas of robotics. However, with technological change and the need to improve system performance, visual intelligence requires the robot to detect and identify not only small-scale targets in different scenes but also deformable living targets. Two main classes of solution currently exist for the difficulty of detecting deformable living targets. The first builds a training set containing sufficient shape variation of the target, mainly by augmenting existing data; it achieves robust detection of deformable targets at the cost of heavy training and complex model parameters. The second uses features and algorithms with transformation invariance; this class contains many classical algorithms, such as SIFT (scale-invariant feature transform) and the sliding-window-based object detection paradigm.
However, the above methods suffer from two disadvantages. First, they assume the geometric transformation is fixed and known, and use this prior knowledge to design the augmented data, the features, and the algorithms. For living targets, however, the shape transformation takes many forms and the augmented target morphologies are limited, so this approach cannot handle unknown geometric transformations of morphologies that were not augmented. Second, for overly complex transformations, hand-designing invariant features and algorithms is difficult or infeasible even when the transformation is known.
Disclosure of Invention
The invention aims to provide a deep learning intelligent identification method for deformable living small targets that improves the detection of deformable targets.
The object of the invention is achieved as follows; the deep learning intelligent identification method for deformable living small targets specifically comprises the following steps:
step 1, a deformable convolution module replaces the basic convolution unit: two-dimensional (or higher-dimensional) offsets are added to the spatial sampling points of the standard convolution so that the shape of the convolution sampling grid can change;
step 2, a deformable ROI pooling module replaces the ROI pooling layer: a two-dimensional (or higher-dimensional) offset is added to the position of each grid cell of ordinary ROI (region of interest) pooling, improving the deformation capability of the convolutional neural network, yielding a deformable convolutional network, and improving its detection and identification of deformable targets;
step 3, for small-target detection and identification, the Faster R-CNN model is improved using a structure based on deconvolution and multi-layer feature fusion, so that the small-target preselection boxes obtain a richer amount of information;
step 4, in the Faster R-CNN network the RPN is used to generate preselection boxes, which an algorithm then classifies and regresses; the anchor mechanism is improved by adding a group of small-scale preselection boxes to the anchors, so that the RPN generates more small-target preselection boxes, improving small-target detection and identification.
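The sampling rule of step 1 can be sketched numerically. The following is a minimal illustration (numpy; `bilinear_sample` and `deformable_conv_point` are illustrative names, not code from the patent), computing one output value of a 3 × 3 deformable convolution whose nine sampling points are shifted by learned, generally fractional offsets:

```python
import numpy as np

def bilinear_sample(img, y, x):
    """Sample img at a fractional location (y, x) with bilinear interpolation."""
    h, w = img.shape
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    val = 0.0
    for qy in (y0, y0 + 1):
        for qx in (x0, x0 + 1):
            if 0 <= qy < h and 0 <= qx < w:
                # G(q, p) = g(q_x, p_x) * g(q_y, p_y), g(a, b) = max(0, 1 - |a - b|)
                val += max(0, 1 - abs(qy - y)) * max(0, 1 - abs(qx - x)) * img[qy, qx]
    return val

def deformable_conv_point(img, weight, p0, offsets):
    """One output value of a 3x3 deformable convolution at location p0.

    offsets has shape (3, 3, 2): a learned (dy, dx) shift for each of the
    nine regular sampling points p_n of the kernel grid.
    """
    out = 0.0
    for i, dy in enumerate((-1, 0, 1)):      # p_n enumerates the kernel grid R
        for j, dx in enumerate((-1, 0, 1)):
            oy, ox = offsets[i, j]           # delta p_n, generally fractional
            out += weight[i, j] * bilinear_sample(
                img, p0[0] + dy + oy, p0[1] + dx + ox)
    return out
```

With all offsets set to zero this reduces to a standard 3 × 3 convolution, which is the degenerate case of the deformable module.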
The invention also includes such structural features:
1. the deformable convolution network comprises a deformable convolution module, a deformable ROI pooling module and a deformable position-sensitive ROI pooling module; the convolution and the feature map in a convolutional neural network are both three-dimensional, the deformable convolution operates in a two-dimensional spatial domain, and the deformable convolution operation is the same between different channel dimensions.
2. Step 1 is the two-dimensional description of the deformable convolution operation, which specifically comprises adding two-dimensional (or higher-dimensional) offsets to the spatial sampling points of the standard convolution so that the shape of the convolution sampling grid can change. The offsets are obtained by a convolution operation over the same input feature map, whose kernel keeps the same resolution and dilation value as the preceding convolution layer. The output offset field has the same spatial resolution as the input feature map, and the number of channels of the offset field is twice that of the input feature map, corresponding to the two-dimensional offset of each convolution sampling position. In training, the convolution kernel that generates the output feature map and the convolution kernel that generates the offset field are learned simultaneously. To learn the offset field, the gradient is obtained by back-propagating through the bilinear interpolation of the following two formulas:

x(p) = \sum_{q} G(q, p) \cdot x(q)

G(q, p) = g(q_x, p_x) \cdot g(q_y, p_y), \qquad g(a, b) = \max(0, 1 - |a - b|)

where p denotes an arbitrary (generally fractional) sampling position (for the deformable convolution, p = p_0 + p_n + \Delta p_n), q enumerates all integer spatial locations of the input feature map x, and G(\cdot, \cdot) is the bilinear interpolation kernel. In the deformable convolution formula, the gradient with respect to the offset \Delta p_n is computed as:

\frac{\partial y(p_0)}{\partial \Delta p_n} = \sum_{p_n \in R} w(p_n) \cdot \sum_{q} \frac{\partial G(q, p_0 + p_n + \Delta p_n)}{\partial \Delta p_n} \cdot x(q)

where R is the regular sampling grid of the kernel and w(p_n) its weights; the term \partial G / \partial \Delta p_n can be derived from the product form of G above. Note that \Delta p_n is a two-dimensional quantity; for simplicity we write \partial \Delta p_n in place of \partial \Delta p_n^x and \partial \Delta p_n^y.
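The gradient of the bilinear interpolation with respect to an offset component can be checked numerically. The sketch below (numpy; helper names are illustrative) compares the analytic derivative of a bilinear sample along x against a central finite difference:

```python
import numpy as np

def bilinear(img, y, x):
    """Bilinear sample of img at a fractional (y, x)."""
    h, w = img.shape
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    v = 0.0
    for qy in (y0, y0 + 1):
        for qx in (x0, x0 + 1):
            if 0 <= qy < h and 0 <= qx < w:
                v += max(0, 1 - abs(qy - y)) * max(0, 1 - abs(qx - x)) * img[qy, qx]
    return v

def grad_x(img, y, x):
    """Analytic d/dx of the bilinear sample: g(a, b) = max(0, 1 - |a - b|) is
    piecewise linear, so dg(q_x, x)/dx = +1 if q_x > x, else -1, within the
    unit window |q_x - x| < 1."""
    h, w = img.shape
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    g = 0.0
    for qy in (y0, y0 + 1):
        for qx in (x0, x0 + 1):
            if 0 <= qy < h and 0 <= qx < w and abs(qx - x) < 1:
                g += max(0, 1 - abs(qy - y)) * (1 if qx > x else -1) * img[qy, qx]
    return g
```

This is the quantity that back-propagation accumulates into the offset field during training of the offset-generating kernel.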
3. In step 2, the deformable ROI pooling operation of the ROI pooling layer also operates in the two-dimensional spatial domain and is the same across channel dimensions. It specifically comprises adding a two-dimensional (or higher-dimensional) offset to the position of each grid cell of ordinary ROI pooling, improving the deformation capability of the convolutional neural network and its detection and identification of deformable targets. First, a pooled feature map is obtained with the ROI pooling operation; then a fully connected layer is appended to this feature map to obtain normalized offsets; finally, the normalized offsets are multiplied elementwise by the width and height of the region of interest. Normalizing the offsets is essential for the learned offsets to be invariant to the size of the region of interest, and the parameters of the fully connected layer are obtained by the back-propagation algorithm. In the deformable ROI pooling module, the gradient with respect to the offset \Delta p_{ij} is computed as:

\frac{\partial y(i, j)}{\partial \Delta p_{ij}} = \frac{1}{n_{ij}} \sum_{p \in \mathrm{bin}(i, j)} \frac{\partial x(p_0 + p + \Delta p_{ij})}{\partial \Delta p_{ij}}
4. The deformable convolution network improves the Faster R-CNN network in two stages. In the first stage a fully convolutional network generates a feature map for the input picture; a modified VGG16 network, used to extract features, removes the max pooling layer, the two 4096-unit fully connected layers, and the 1000-unit fully connected layer that follow the convolution units. Deformable convolution is applied to the last convolution unit, i.e. the three convolution layers conv5_1, conv5_2, and conv5_3. In the second stage a lightweight task network generates results from the input feature map. The classification-regression part of the Faster R-CNN network mainly uses an RPN to generate preselection boxes; the boxes and the feature map are then input to the Fast R-CNN head, where the ROI pooling layer first pools each box to obtain features, two 1024-dimensional fully connected layers are added, and finally two parallel branches perform target regression and classification respectively to obtain the final result.
5. In step 3, improving the Faster R-CNN model with a deconvolution-based structure specifically comprises inserting an unpooling layer into the convolutional neural network. To apply the unpooling layer, the position of the maximum activation value is first recorded during the pooling operation; during unpooling, each activation value is returned to its recorded position and all remaining positions are set to zero; finally, the output feature map of the deconvolution must be cropped so that its resolution matches that of the unpooling output feature map.
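The record-then-restore behaviour of the unpooling layer can be sketched in a few lines (numpy; function names are illustrative, not code from the patent):

```python
import numpy as np

def max_pool_with_indices(x, k=2):
    """k x k max pooling that also records the argmax position of each window."""
    h, w = x.shape
    out = np.zeros((h // k, w // k))
    idx = np.zeros((h // k, w // k, 2), dtype=int)
    for i in range(h // k):
        for j in range(w // k):
            win = x[i * k:(i + 1) * k, j * k:(j + 1) * k]
            r, c = np.unravel_index(np.argmax(win), win.shape)
            out[i, j] = win[r, c]
            idx[i, j] = (i * k + r, j * k + c)
    return out, idx

def unpool(pooled, idx, shape):
    """Return each activation to its recorded position; all else stays zero."""
    x = np.zeros(shape)
    for i in range(pooled.shape[0]):
        for j in range(pooled.shape[1]):
            r, c = idx[i, j]
            x[r, c] = pooled[i, j]
    return x
```

Unpooling produces a sparse, enlarged map; in the described structure a subsequent deconvolution densifies it, and its output is cropped to the unpooled resolution.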
6. In step 3, improving the Faster R-CNN model with the multi-layer feature fusion structure specifically comprises, first, for the case of insufficient feature information, fusing the features once and then performing ROI pooling for the multiple regions of interest, so that only one feature fusion and one normalization are needed and repeated computation is avoided; second, for the case of small regions of interest, the last layer of features is deconvolved, the third layer of features is max-pooled, and the three feature maps are finally fused.
Compared with the prior art, the invention has the following beneficial effects. According to the characteristics of deformable living small targets, a deformable convolution module and a deformable ROI pooling module are combined with Faster R-CNN: the deformable convolution module replaces the basic convolution unit and the deformable ROI pooling module replaces the ROI pooling layer, so that the sampling of the detection model can change with the shape of the detected target, improving the detection of deformable targets. The Faster R-CNN model is further improved with deconvolution and multi-layer feature fusion, which make the amount of information obtained by the small-target preselection boxes richer, and the improvement of the anchor mechanism lets the RPN generate more small-target preselection boxes. At the same time, the method based on deconvolution and multi-layer feature fusion combines the strong semantic information of high-layer features with the high resolution of low-layer features.
Drawings
FIG. 1 is a schematic diagram of a 3 × 3 deformable convolution;
FIG. 2 is a schematic diagram of 3 × 3 deformable ROI pooling;
FIG. 3 is a schematic diagram of a modification of the deformable convolution, deformable ROI pooling to Faster R-CNN;
FIG. 4 is a schematic diagram of the deconvolution and unpooling operations;
FIG. 5 is a schematic of multi-layer feature fusion;
FIG. 6 is a schematic diagram of improved multi-layer feature fusion;
fig. 7 is a schematic diagram of the structure of an RPN network;
FIG. 8 is a result of deformable convolution, deformable ROI pooling real-time online identification of video frames;
FIG. 9 is a visualization of the original Faster R-CNN (left) and improved Faster R-CNN (right) sea creature target detection.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and specific embodiments.
The invention designs a deep learning intelligent identification method for deformable living small targets. According to the characteristics of such targets, a deformable convolution module and a deformable ROI pooling module are combined with Faster R-CNN: the deformable convolution module replaces the basic convolution unit and the deformable ROI pooling module replaces the ROI pooling layer. At the same time, deconvolution and multi-layer feature fusion are used to improve the Faster R-CNN model, making the amount of information obtained by the small-target preselection boxes richer, and the improvement of the anchor mechanism lets the RPN generate more small-target preselection boxes.
The method builds on the inventors' prior work in artificial intelligence research and identifies deformable living small targets accurately. Introducing the deformable convolution and deformable ROI pooling modules lets the sampling of the detection model change with the shape of the detected target, improving the detection of deformable targets. The method based on deconvolution and multi-layer feature fusion combines the strong semantic information of high-layer features with the high resolution of low-layer features.
The invention is realized as follows:
a. The deep learning intelligent identification method for deformable living small targets mainly comprises the following. First, two-dimensional (or higher-dimensional) offsets are added to the spatial sampling points of the standard convolution so that the shape of the convolution sampling grid can change. Second, a two-dimensional (or higher-dimensional) offset is added to the position of each grid cell of ordinary ROI (region of interest) pooling, improving the deformation capability of the convolutional neural network and thus its detection and identification of deformable targets. For small-target detection and identification, the Faster R-CNN model is first improved with a structure based on deconvolution and multi-layer feature fusion, so that the small-target preselection boxes obtain a richer amount of information; second, an improvement to the anchor mechanism lets the RPN generate more small-target preselection boxes. Small-target detection and identification are thereby improved.
b. The deformable convolution network comprises a deformable convolution module, a deformable ROI pooling module, and a deformable position-sensitive ROI pooling module. The convolution and the feature map in a convolutional neural network are both three-dimensional; the deformable convolution operates in the two-dimensional spatial domain and is the same across channel dimensions. Without loss of generality, and to simplify the exposition, we next describe the two-dimensional operation of the model; the extension to three dimensions is exactly the same.
Adding two-dimensional or even high-dimensional offset to the spatial sampling points of the standard convolution to enable the shape of the sampling points of the standard convolution to change; the offset is obtained by performing a convolution operation on the same input feature map, the convolution kernel of the convolution operation maintaining the same resolution and dilation values as the previous convolution layer. The output offset field has the same spatial resolution as the input signature, and the number of channels in the offset field is twice the number of channels in the input signature, which corresponds to the two-dimensional offset of convolving each sample location. In training, the convolution kernel that generates the output feature map and the convolution kernel that generates the offset field are learned simultaneously. To learn the offset domain, the gradient is obtained by inverse operation of bilinear operations in equations (1) and (2).
x(p) = \sum_{q} G(q, p) \cdot x(q)    (1)

G(q, p) = g(q_x, p_x) \cdot g(q_y, p_y), \qquad g(a, b) = \max(0, 1 - |a - b|)    (2)

In the formulas, p represents an arbitrary (generally fractional) sampling point position (for the deformable convolution, p = p_0 + p_n + \Delta p_n), q represents all integer spatial traversal points of the input feature map x, and G(\cdot, \cdot) represents the bilinear interpolation kernel.
In the deformable convolution formula, the gradient with respect to the offset \Delta p_n is computed as follows:

\frac{\partial y(p_0)}{\partial \Delta p_n} = \sum_{p_n \in R} w(p_n) \cdot \sum_{q} \frac{\partial G(q, p_0 + p_n + \Delta p_n)}{\partial \Delta p_n} \cdot x(q)    (3)

where \partial G / \partial \Delta p_n can be derived from formula (2). Note that \Delta p_n is a two-dimensional quantity; for simplicity we write \partial \Delta p_n in place of \partial \Delta p_n^x and \partial \Delta p_n^y.
c. Likewise, the deformable ROI pooling operation operates in the two-dimensional spatial domain and is the same across channel dimensions. Without loss of generality, and to simplify the exposition, we next describe the two-dimensional operation of the model; the extension to three dimensions is exactly the same.
A two-dimensional (or higher-dimensional) offset is added to the position of each grid cell of ordinary ROI (region of interest) pooling, improving the deformation capability of the convolutional neural network and its detection and identification of deformable targets. First, a pooled feature map is obtained with the ROI pooling operation. Then a fully connected layer is appended to the feature map to obtain the normalized offsets. Finally, the normalized offsets are multiplied elementwise by the width and height of the region of interest. Normalizing the offsets is essential for the learned offsets to be invariant to the size of the region of interest, and the parameters of the fully connected layer are obtained by the back-propagation algorithm.
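The forward pass of deformable ROI pooling can be sketched minimally (numpy; one bilinear sample per bin for brevity; `deformable_roi_pool` is an illustrative name, not code from the patent):

```python
import numpy as np

def bilinear(img, y, x):
    """Bilinear sample of img at a fractional (y, x)."""
    h, w = img.shape
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    v = 0.0
    for qy in (y0, y0 + 1):
        for qx in (x0, x0 + 1):
            if 0 <= qy < h and 0 <= qx < w:
                v += max(0, 1 - abs(qy - y)) * max(0, 1 - abs(qx - x)) * img[qy, qx]
    return v

def deformable_roi_pool(img, roi, offsets, k=3):
    """Pool an ROI (y0, x0, y1, x1) into a k x k grid, shifting bin (i, j)
    by the learned offset delta p_ij = offsets[i, j] before sampling."""
    y0, x0, y1, x1 = roi
    bh, bw = (y1 - y0) / k, (x1 - x0) / k
    out = np.zeros((k, k))
    for i in range(k):
        for j in range(k):
            cy = y0 + (i + 0.5) * bh + offsets[i, j, 0]
            cx = x0 + (j + 0.5) * bw + offsets[i, j, 1]
            out[i, j] = bilinear(img, cy, cx)
    return out
```

With all offsets zero this degenerates to ordinary ROI pooling on the regular k × k bin grid.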
In the deformable ROI pooling module, the gradient with respect to the offset \Delta p_{ij} is computed as follows:

\frac{\partial y(i, j)}{\partial \Delta p_{ij}} = \frac{1}{n_{ij}} \sum_{p \in \mathrm{bin}(i, j)} \frac{\partial x(p_0 + p + \Delta p_{ij})}{\partial \Delta p_{ij}}    (4)
d. For Faster R-CNN, the network is divided into two stages. In the first stage, a fully convolutional network generates a feature map for the input picture. In the second stage, a lightweight task network generates results from the input feature map. We mainly refine these two parts with deformable convolution and deformable ROI pooling.
In the first stage of the improvement of the Faster R-CNN network by the deformable network, a fully convolutional network generates a feature map for the input picture. To extract features, a modified VGG16 network removes the max pooling layer, the two 4096-unit fully connected layers, and the 1000-unit fully connected layer that follow the convolution units. The deformable convolution is applied to the last convolution unit, namely the three convolution layers conv5_1, conv5_2, and conv5_3.
In the second stage, a lightweight task network generates results from the input feature map. The classification-regression part of the Faster R-CNN network mainly uses an RPN to generate preselection boxes; the boxes and the feature map are input to the Fast R-CNN head, where the ROI pooling layer first pools each box to obtain features, two 1024-dimensional fully connected layers are added, and finally two parallel branches perform target regression and classification respectively to obtain the final result.
e. For small-target detection and identification, a structure based on deconvolution and multi-layer feature fusion is designed. The Faster R-CNN model is first improved by inserting an unpooling layer into the convolutional neural network. To apply the unpooling layer, the position of the maximum activation value is first recorded during the pooling operation. Then, during unpooling, each activation value is returned to its recorded position and the remaining positions are set to zero. Finally, we need to crop the deconvolved output feature map so that its resolution matches that of the unpooling output feature map.
For multi-layer feature fusion, first, for the case of insufficient feature information, the features are fused once and ROI pooling is then performed for the multiple regions of interest, so that only one feature fusion and one normalization are needed and repeated computation is avoided. Second, for the case of small regions of interest, the last layer of features is deconvolved, the third layer of features is max-pooled, and the three feature maps are finally fused. The resolution of the feature map finally used is thereby improved.
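The resolution matching before fusion can be sketched as follows (numpy; nearest-neighbour upsampling stands in for the learned deconvolution, and the 1 × 1 channel-reduction step is omitted; all names are illustrative):

```python
import numpy as np

def max_pool2(x):
    """Stride-2 max pool: halves the spatial resolution of (C, H, W) features."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))

def upsample2(x):
    """Nearest-neighbour 2x upsampling, standing in for the learned deconvolution."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def fuse(conv3, conv4, conv5):
    """Bring the low (conv3) and high (conv5) feature maps to the resolution of
    the middle layer (conv4), then fuse the three along the channel axis."""
    return np.concatenate([max_pool2(conv3), conv4, upsample2(conv5)], axis=0)
```

After this one fusion, ROI pooling for every region of interest reads from the single fused map, which is what saves the repeated computation described above.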
f. The anchor mechanism in the RPN is modified, and a group of small-scale preselection boxes is added to the anchors, so that the preselection boxes finally extracted by the RPN contain more small-target boxes, which benefits small-target detection and identification.
In the Faster R-CNN network, the RPN is used to generate preselection boxes, which are then classified and regressed by an algorithm. Therefore, if the RPN can generate more appropriate preselection boxes, the detection and identification results will also improve.
The present invention will be described in detail with reference to the drawings:
The first embodiment: FIG. 1 is a schematic diagram of deformable convolution. By applying offsets to the regular sampling grid, the sampling points become irregular, offset points; the offsets are generally fractional, so sampling on the input feature map is performed by bilinear interpolation. The offsets are obtained by a convolution operation on the same input feature map, whose kernel keeps the same resolution and dilation value as the preceding convolution layer. The output offset field has the same spatial resolution as the input feature map, and the number of channels of the offset field is twice that of the input feature map, corresponding to the two-dimensional offset (offset along the x axis and offset along the y axis) of each convolution sampling position.
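The stated channel layout of the offset field (two channels per sampling position, at the full spatial resolution) can be made concrete. The helper below is illustrative only and assumes the channels are ordered as (dy, dx) pairs per kernel point:

```python
import numpy as np

def split_offset_field(off):
    """Reshape a (2*N, H, W) offset field into per-sampling-point (dy, dx)
    pairs of shape (N, 2, H, W): every output location gets N two-dimensional
    shifts, one per sampling point of the kernel grid (N = 9 for a 3x3 kernel)."""
    c, h, w = off.shape
    assert c % 2 == 0, "offset field must carry an even number of channels"
    return off.reshape(c // 2, 2, h, w)
```

For a 3 × 3 deformable convolution the offset branch therefore outputs 18 channels at the resolution of the input feature map.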
The second embodiment: FIG. 2 is a schematic diagram of deformable ROI pooling. First, a pooled feature map is obtained with the ROI pooling operation. Then a fully connected layer is appended to the feature map to obtain the normalized offsets \widehat{\Delta p}_{ij}. Finally, each normalized offset is multiplied elementwise by the width and height (w, h) of the region of interest to give the offset \Delta p_{ij} used in the pooling formula below:

\Delta p_{ij} = \gamma \cdot \widehat{\Delta p}_{ij} \circ (w, h)

Empirically, \gamma is set to 0.1. Normalizing the offsets is essential for the learned offsets to be invariant to the size of the region of interest. The parameters of the fully connected layer are obtained by the back-propagation algorithm.

y(i, j) = \frac{1}{n_{ij}} \sum_{p \in \mathrm{bin}(i, j)} x(p_0 + p + \Delta p_{ij})
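The denormalization of the learned offsets by the ROI size, with the empirical scale factor of 0.1, can be sketched in a few lines (numpy; the helper name is illustrative):

```python
import numpy as np

GAMMA = 0.1  # empirical scale factor from the description

def denormalize_offsets(norm_off, roi_w, roi_h):
    """delta p_ij = gamma * normalized offset, scaled elementwise by the ROI
    width and height, so the learned offsets are invariant to ROI size.
    norm_off has shape (k, k, 2) ordered as (dx, dy) here by assumption."""
    return GAMMA * norm_off * np.array([roi_w, roi_h], dtype=float)
```

A unit normalized offset on a 20 × 10 ROI thus becomes a shift of (2, 1) feature-map units, while the same normalized value on a smaller ROI yields a proportionally smaller shift.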
The third embodiment: FIG. 3 is a schematic diagram of the improvement of Faster R-CNN by deformable convolution and deformable ROI pooling. The feature extraction part of the Faster R-CNN network uses a modified VGG16 network as the base network; to extract features, the modified VGG16 removes the max pooling layer, the two 4096-unit fully connected layers, and the 1000-unit fully connected layer that follow the convolution units. Experiments show that better results are obtained when deformable convolution is used in the last convolution unit; therefore, deformable convolution is applied to the last convolution unit, i.e. the three convolution layers conv5_1, conv5_2, and conv5_3.
The classification-regression part of the Faster R-CNN network mainly uses an RPN to generate preselection boxes; the boxes and the feature map are then input to the Fast R-CNN head, where the ROI pooling layer first pools each box to obtain features, two 1024-dimensional fully connected layers are added, and finally two parallel branches perform target regression and classification respectively to obtain the final result. In the Fast R-CNN part, we replace the ROI pooling layer with a deformable ROI pooling layer.
The fourth embodiment: FIG. 4 is a schematic diagram of the deconvolution and unpooling operations. First, during the pooling operation, the position of the maximum activation value is recorded. Then, during unpooling, each activation value is returned to its recorded position and the remaining positions are set to zero. The deconvolution operation densifies the sparse output of the unpooling operation using a multi-layer convolution-like operation to generate a dense feature map. In contrast to a convolution, which combines multiple inputs into one output, a deconvolution maps one input to multiple outputs. Finally, we need to crop the deconvolved output feature map so that its resolution matches that of the unpooling output feature map.
The fifth embodiment: FIG. 5 shows multi-layer feature fusion. The combination of global and local features, e.g. multi-scale features, is used to enhance the Faster R-CNN network's acquisition of global texture and local information and thereby improve the robustness of target detection. To strengthen the detection capability of the network, a shallow feature map such as conv3 or conv4 is considered before ROI pooling, so that the network can detect features containing more low-level information within the region of interest, as shown in the figure.
Embodiment six: the high-level feature maps are deconvolved to the same resolution as the low-level feature maps, and the multi-layer features of identical resolution are then fused. FIG. 6 is a schematic diagram of the improved multi-layer feature fusion. First, the output feature maps of the three layers conv3, conv4 and conv5 are taken. Then, ROI pooling is performed on the regions of conv3, conv4 and conv5 corresponding to the region of interest; the pooled features are L2-normalized and concatenated in one layer, and the number of channels of the concatenated features is reduced to match the output of conv5. Finally, a target classification layer and a target regression layer are attached. Since three feature maps from different layers must be combined, the features of each layer are first normalized, e.g. by L2 normalization, before concatenation.
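The L2 normalization and channel-wise concatenation described above can be illustrated with a hedged NumPy sketch. The function names and the (C, H, W) layout are assumptions, and the subsequent channel-reduction convolution that matches conv5's channel count is omitted:

```python
import numpy as np

def l2_normalize(feat, eps=1e-12):
    """L2-normalize a (C, H, W) feature map across channels at each spatial position."""
    norm = np.sqrt((feat ** 2).sum(axis=0, keepdims=True)) + eps
    return feat / norm

def fuse(feats):
    """L2-normalize each pooled feature map, then concatenate along channels.
    A 1x1 convolution (not shown) would then reduce channels to match conv5."""
    return np.concatenate([l2_normalize(f) for f in feats], axis=0)
```

Normalizing first matters because conv3, conv4 and conv5 activations live on very different scales; without it, one layer's features would dominate the concatenated vector.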
Embodiment seven: fig. 7 is a network configuration diagram of the RPN. The original RPN network generates nine preselected boxes at each sliding window, which are the combinations of the scales [128², 256², 512²] and the aspect ratios [1:1, 1:2, 2:1]. This choice of scales and aspect ratios gave the best test results on the PASCAL VOC data set. We add a set of 64² anchors for small target objects, i.e. the preselected box scales become [64², 128², 256², 512²]. Thus, 12 preselected boxes are generated at each sliding window, and the preselected boxes lean toward small target detection, ultimately improving the detection efficiency of small targets.
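The 4-scale, 3-ratio anchor enumeration above can be written out directly. A small sketch under stated assumptions: the (w, h) size convention and rounding are our own, and only anchor sizes (not their placement on the feature map) are shown:

```python
import math

def make_anchors(scales=(64, 128, 256, 512), ratios=(1.0, 0.5, 2.0)):
    """Generate (w, h) anchor sizes for one sliding-window position.
    Each scale s defines an area s*s; each ratio r = w/h reshapes that area."""
    anchors = []
    for s in scales:
        area = s * s
        for r in ratios:
            w = math.sqrt(area * r)  # from w/h = r and w*h = area
            h = w / r
            anchors.append((round(w), round(h)))
    return anchors
```

With the extra 64² scale the function yields 4 x 3 = 12 anchors per position instead of the original 9, a quarter of which now target small objects.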
The eighth embodiment: FIG. 8 shows the results of real-time online identification of video frames using deformable convolution and deformable ROI pooling. We performed online identification experiments using the improved Faster R-CNN model; in these experiments the detection rate of the improved algorithm was 12 frames per second.
Table 1 shows the test results of our online identification. The predicted value is the per-class count obtained by algorithm prediction, and the true value is the count obtained by manually labeling the real-time detection video. As can be seen from Table 1, the predicted values of the algorithm are close to the true values, indicating that the improved algorithm is robust to the deformation of marine organisms encountered in real-time detection. FIG. 8 shows identification results for selected frames of the online video. As can be seen from fig. 8, the detection results are stable, indicating that the improved algorithm performs well on disturbed, deformable targets in an unstable imaging environment.
TABLE 1 Online identification test results
The ninth embodiment: detection results of the original Faster R-CNN algorithm and the improved Faster R-CNN algorithm on marine biological data targets of different scales are given in Table 2.
TABLE 2
As can be seen from Table 2, the improved Faster R-CNN improves the detection results for targets of all scales, and the improvement for small targets is particularly marked. With an IOU threshold of 0.5, the small-target detection results (mAP) of the original and improved Faster R-CNN algorithms are 35.45 and 42.95 respectively, an improvement of 21.16%; the improved algorithm thus markedly outperforms the original on small target detection. Under the stricter evaluation setting of an IOU threshold of 0.7, the small-target mAP of the original and improved algorithms is 22.40 and 29.78 respectively, an improvement of 32.94%, which further confirms the improvement in small target detection.
In summary, the invention introduces a deformable network: the model is improved with its deformable convolution module and deformable ROI pooling module, adding two-dimensional or even higher-dimensional offsets to the spatial sampling points of standard convolution and of ordinary ROI (region of interest) pooling, so that the convolution sampling grid can change shape; this improves the deformability of the model and thereby its detection and identification of deformable targets. Fusion of feature maps from different layers is also considered: the low-level feature maps are pooled to reduce their resolution, the high-level features are deconvolved to increase their resolution, and the low-, middle- and high-level feature maps are then fused. Meanwhile, a group of small-scale preselected frames is added, increasing the number of preselected frames generated for small targets and improving small target detection and identification through the improved model.

Claims (7)

1. A deep learning intelligent identification method for a deformable living body small target is characterized by specifically comprising the following steps:
step 1, replacing a basic convolution unit by a deformable convolution module: adding two-dimensional or even high-dimensional offset to the spatial sampling points of the standard convolution to change the shapes of the sampling points of the standard convolution;
step 2. the deformable ROI pooling module replaces the ROI pooling layer: adding a two-dimensional or even high-dimensional offset to the position of each square grid of the common ROI (Region of Interest) pooling so as to improve the deformable capability of the convolutional neural network, obtain the deformable convolutional network and improve the detection and identification capability of the convolutional neural network on a deformable target;
step 3, aiming at the detection and identification of small targets, a structure based on deconvolution and multi-layer feature fusion is used to improve the Faster R-CNN model, so that the information obtained by a small target preselected frame is richer;
and 4, in the fast R-CNN network, the RPN network is used for generating preselected frames, then an algorithm classifies and regresses the preselected frames, the mechanism of the anchor point is improved, and a group of small-scale preselected frames are added in the anchor point, so that the RPN can generate more small target preselected frames, and the detection and identification effects on small targets are improved.
2. The intelligent deep learning identification method for the deformable living small target as claimed in claim 1, characterized in that: the deformable convolution network comprises a deformable convolution module, a deformable ROI pooling module and a deformable position-sensitive ROI pooling module; although the convolution kernels and feature maps in a convolutional neural network are three-dimensional, the deformable convolution operates in the two-dimensional spatial domain and its operation is identical across channel dimensions.
3. The intelligent deep learning identification method for the deformable living small target as claimed in claim 1, characterized in that: step 1 is a two-dimensional operation description of the deformable convolution, and specifically comprises adding a two-dimensional or even higher-dimensional offset to the spatial sampling points of the standard convolution so that the shape of the convolution's sampling grid changes; the offset is obtained by a convolution operation over the same input feature map, whose kernel keeps the same resolution and dilation value as the preceding convolution layer; the output offset field has the same spatial resolution as the input feature map, and its number of channels is twice the number of sampling positions, corresponding to the two-dimensional offset of each sampling position of the convolution; in training, the convolution kernels that generate the output feature map and the kernels that generate the offset field are learned simultaneously; to learn the offset field, the gradient is back-propagated through the bilinear interpolation given by the following two formulas:

x(p) = Σ_q G(q, p) · x(q)

G(q, p) = g(q_x, p_x) · g(q_y, p_y)

where p denotes an arbitrary (fractional) sampling position, which for the deformable convolution is p = p_0 + p_n + Δp_n; q enumerates all integral spatial locations of the input feature map x; G(·, ·) is the two-dimensional bilinear interpolation kernel, separable into two one-dimensional kernels g(a, b) = max(0, 1 − |a − b|);

in the deformable convolution, the gradient with respect to the offset Δp_n is computed as:

∂y(p_0)/∂Δp_n = Σ_{p_n ∈ R} w(p_n) · ∂x(p_0 + p_n + Δp_n)/∂Δp_n
             = Σ_{p_n ∈ R} [ w(p_n) · Σ_q (∂G(q, p_0 + p_n + Δp_n)/∂Δp_n) · x(q) ]

where the term ∂G(q, p)/∂Δp_n can be derived from G(q, p) = g(q_x, p_x) · g(q_y, p_y); note that Δp_n is a two-dimensional quantity, and for simplicity we write ∂/∂Δp_n in place of ∂/∂Δp_n^x and ∂/∂Δp_n^y.
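The bilinear interpolation of claim 3, x(p) = Σ_q G(q, p)·x(q) with g(a, b) = max(0, 1 − |a − b|), can be demonstrated in a few lines. This is an illustrative sketch with assumed names and an (x, y)-indexed array, not the patented implementation:

```python
import numpy as np

def g(a, b):
    """One-dimensional bilinear kernel: g(a, b) = max(0, 1 - |a - b|)."""
    return max(0.0, 1.0 - abs(a - b))

def bilinear_sample(x, p):
    """Sample feature map x at a fractional position p = (px, py) using
    x(p) = sum_q G(q, p) * x(q), where G(q, p) = g(qx, px) * g(qy, py).
    Only the four integer neighbours of p contribute a nonzero weight."""
    px, py = p
    val = 0.0
    for qx in range(int(np.floor(px)), int(np.floor(px)) + 2):
        for qy in range(int(np.floor(py)), int(np.floor(py)) + 2):
            if 0 <= qx < x.shape[0] and 0 <= qy < x.shape[1]:
                val += g(qx, px) * g(qy, py) * x[qx, qy]
    return val
```

Because g is piecewise linear in p, this sampling is differentiable with respect to the fractional position, which is exactly what allows the offset field Δp_n to be learned by back-propagation.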
4. The intelligent deep learning identification method for the deformable living small target as claimed in claim 1, characterized in that: in step 2, the deformable ROI pooling that replaces the ROI pooling layer operates in the two-dimensional spatial domain, and the operation is identical across channel dimensions; it specifically comprises adding a two-dimensional or even higher-dimensional offset to the position of each bin of ordinary ROI pooling, thereby improving the deformable capability of the convolutional neural network and its ability to detect and identify deformable targets; first, a pooled feature map is obtained by an ordinary ROI pooling operation; then, a fully connected layer is attached to this feature map to produce normalized offsets; finally, the normalized offsets are scaled element-wise by the width and height of the region of interest; the normalization of the offsets is essential to make the learned offsets invariant to the size of the region of interest, and the parameters of the fully connected layer are learned through back-propagation; in the deformable ROI pooling module, the gradient with respect to the offset Δp_ij is computed as:

∂y(i, j)/∂Δp_ij = (1/n_ij) · Σ_{p ∈ bin(i, j)} ∂x(p_0 + p + Δp_ij)/∂Δp_ij

where n_ij is the number of sampling positions in bin(i, j).
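The element-wise scaling of the normalized per-bin offsets by the ROI's width and height can be sketched as follows. The function name is our own, and the moderating scalar gamma = 0.1 is an assumed value (a common choice in deformable ROI pooling), not one stated in this document:

```python
def denormalize_offsets(norm_offsets, roi_w, roi_h, gamma=0.1):
    """Scale the normalized per-bin offsets (dx, dy) predicted by the fc layer
    by the ROI width/height, so the learned offsets are invariant to ROI size.
    gamma moderates the offset magnitude (assumed value, not from the patent)."""
    return [(gamma * dx * roi_w, gamma * dy * roi_h) for dx, dy in norm_offsets]
```

The same normalized offset therefore shifts a bin of a large ROI further (in pixels) than a bin of a small ROI, which is the size-invariance the claim describes.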
5. The intelligent deep learning identification method for the deformable living small target as claimed in claim 1, characterized in that: the deformable convolution network can be built on a Faster R-CNN network; the improvement has two stages. In the first stage, a fully convolutional network generates a feature map for the input picture; to extract features, the modified VGG16 network removes the last maximum pooling layer, the two 4096-unit fully connected layers and the 1000-unit fully connected layer that follow the convolution units, and the deformable convolution is applied to the last convolution unit, i.e. the three convolution layers conv5_1, conv5_2 and conv5_3. In the second stage, a lightweight task-specific network generates results from the input feature map; the classification and regression part of the Faster R-CNN network uses an RPN to generate preselected frames, which are input together with the feature map into the Fast R-CNN sub-network; first, an ROI pooling layer pools each preselected frame to obtain features, then two 1024-dimensional fully connected layers are added, and finally two parallel branches perform target regression and classification respectively to obtain the final result.
6. The intelligent deep learning identification method for the deformable living small target as claimed in claim 1, characterized in that: in step 3, improving the Faster R-CNN model with a deconvolution-based structure specifically comprises inserting an unpooling layer in the convolutional neural network; to apply the unpooling layer, the position of each maximum activation value is first recorded during the pooling operation; then, during unpooling, each activation value is returned to its recorded position, and all other positions are set to zero; finally, the deconvolved output feature map is cropped so that its resolution matches the resolution of the unpooled output feature map.
7. The intelligent deep learning identification method for the deformable living small target as claimed in claim 1, characterized in that: in step 3, improving the Faster R-CNN model with the multi-layer feature fusion structure specifically comprises, firstly, fusing the features to remedy insufficient feature information and only then ROI-pooling the multiple regions of interest, so that feature fusion and normalization are each performed only once, saving repeated computation; and secondly, for small regions of interest, deconvolving the last layer of features, max-pooling the third layer of features, and finally fusing the three feature maps.
CN201911284570.7A 2019-12-13 2019-12-13 Deep learning intelligent identification method for deformable living body small target Pending CN111126385A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911284570.7A CN111126385A (en) 2019-12-13 2019-12-13 Deep learning intelligent identification method for deformable living body small target


Publications (1)

Publication Number Publication Date
CN111126385A true CN111126385A (en) 2020-05-08

Family

ID=70498812

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911284570.7A Pending CN111126385A (en) 2019-12-13 2019-12-13 Deep learning intelligent identification method for deformable living body small target

Country Status (1)

Country Link
CN (1) CN111126385A (en)


Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107229904A (en) * 2017-04-24 2017-10-03 东北大学 A kind of object detection and recognition method based on deep learning
CN107316001A (en) * 2017-05-31 2017-11-03 天津大学 Small and intensive method for traffic sign detection in a kind of automatic Pilot scene
CN107766818A (en) * 2017-10-18 2018-03-06 哈尔滨工程大学 A kind of didactic submerged structure environment line feature extraction method
US20180137642A1 (en) * 2016-11-15 2018-05-17 Magic Leap, Inc. Deep learning system for cuboid detection
CN109766873A (en) * 2019-02-01 2019-05-17 中国人民解放军陆军工程大学 A kind of pedestrian mixing deformable convolution recognition methods again
CN110163275A (en) * 2019-05-16 2019-08-23 西安电子科技大学 SAR image objective classification method based on depth convolutional neural networks


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Zhou Hao (周浩): "样本不足条件下水下机器人小目标检测识别研究" [Research on small-target detection and recognition for underwater robots under insufficient sample conditions], 《网页在线公开:HTTPS://D.WANFANGDATA.COM.CN/THESIS/CHJUAGVZAXNOZXDTMJAYMTEYMDESCFKZNTUXMDU0GGHYAG10EWXWNW%3D%3D》 *

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111709307A (en) * 2020-05-22 2020-09-25 哈尔滨工业大学 Resolution enhancement-based remote sensing image small target detection method
CN111709307B (en) * 2020-05-22 2022-08-30 哈尔滨工业大学 Resolution enhancement-based remote sensing image small target detection method
CN111860171A (en) * 2020-06-19 2020-10-30 中国科学院空天信息创新研究院 Method and system for detecting irregular-shaped target in large-scale remote sensing image
CN111815510A (en) * 2020-09-11 2020-10-23 平安国际智慧城市科技股份有限公司 Image processing method based on improved convolutional neural network model and related equipment
CN111815510B (en) * 2020-09-11 2020-12-22 平安国际智慧城市科技股份有限公司 Image processing method based on improved convolutional neural network model and related equipment
CN112651346A (en) * 2020-12-29 2021-04-13 青海三新农电有限责任公司 Streaming media video identification and detection method based on deep learning
CN112733672A (en) * 2020-12-31 2021-04-30 深圳一清创新科技有限公司 Monocular camera-based three-dimensional target detection method and device and computer equipment
CN113177486A (en) * 2021-04-30 2021-07-27 重庆师范大学 Dragonfly order insect identification method based on regional suggestion network
CN113177486B (en) * 2021-04-30 2022-06-03 重庆师范大学 Dragonfly order insect identification method based on regional suggestion network
CN114155246A (en) * 2022-02-10 2022-03-08 国网江西省电力有限公司电力科学研究院 Deformable convolution-based power transmission tower pin defect detection method
CN114155246B (en) * 2022-02-10 2022-06-14 国网江西省电力有限公司电力科学研究院 Deformable convolution-based power transmission tower pin defect detection method
CN116205967A (en) * 2023-04-27 2023-06-02 中国科学院长春光学精密机械与物理研究所 Medical image semantic segmentation method, device, equipment and medium

Similar Documents

Publication Publication Date Title
CN111126385A (en) Deep learning intelligent identification method for deformable living body small target
Zhang et al. SCSTCF: spatial-channel selection and temporal regularized correlation filters for visual tracking
Garg et al. Unsupervised cnn for single view depth estimation: Geometry to the rescue
US9418458B2 (en) Graph image representation from convolutional neural networks
CN111291809B (en) Processing device, method and storage medium
CN112750148B (en) Multi-scale target perception tracking method based on twin network
CN108846404B (en) Image significance detection method and device based on related constraint graph sorting
CN110796686A (en) Target tracking method and device and storage device
CN111797841B (en) Visual saliency detection method based on depth residual error network
CN116188999B (en) Small target detection method based on visible light and infrared image data fusion
CN111768415A (en) Image instance segmentation method without quantization pooling
CN112183675B (en) Tracking method for low-resolution target based on twin network
CN113052755A (en) High-resolution image intelligent matting method based on deep learning
CN111566675A (en) Vehicle positioning
CN114641790A (en) Super-resolution processing method and system for infrared image
Leng et al. Context-aware attention network for image recognition
Zhang et al. A new deep spatial transformer convolutional neural network for image saliency detection
Vaquero et al. Tracking more than 100 arbitrary objects at 25 FPS through deep learning
Lee et al. Connectivity-based convolutional neural network for classifying point clouds
Wu et al. Depth dynamic center difference convolutions for monocular 3D object detection
CN111274901B (en) Gesture depth image continuous detection method based on depth gating recursion unit
Zhang et al. Planeseg: Building a plug-in for boosting planar region segmentation
Hallek et al. Real-time stereo matching on CUDA using Fourier descriptors and dynamic programming
CN117011655A (en) Adaptive region selection feature fusion based method, target tracking method and system
CN115496859A (en) Three-dimensional scene motion trend estimation method based on scattered point cloud cross attention learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200508