CN113160050A - Small target identification method and system based on space-time neural network - Google Patents
- Publication number
- CN113160050A (application number CN202110319609.5A)
- Authority
- CN
- China
- Prior art keywords
- gate
- time sequence
- neural network
- lstm
- sequence
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G06T3/4053 — Scaling of whole images or parts thereof based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
- G06F18/214 — Generating training patterns; bootstrap methods, e.g. bagging or boosting
- G06F18/25 — Fusion techniques
- G06N3/045 — Combinations of networks
- G06N3/048 — Activation functions
- G06N3/049 — Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
- G06N3/08 — Learning methods
- G06T5/90 — Dynamic range modification of images or parts thereof
- G06V10/25 — Determination of region of interest [ROI] or a volume of interest [VOI]
- G06T2207/20081 — Training; Learning
- G06T2207/20084 — Artificial neural networks [ANN]
- Y02T10/40 — Engine management systems
Abstract
The invention discloses a small target identification method and system based on a space-time neural network. The method comprises the following steps: preprocessing an original blurred image with a super-resolution algorithm to obtain a high-quality image sequence; performing a logical subtraction operation between adjacent frames of the high-quality image sequence with a spatio-temporal attention mechanism to capture and highlight a suspicious region; extracting depth features in the suspicious region to obtain a feature map time series; feeding the feature map time series into a confidence-output mapper implemented by an LSTM state-transition subnet to obtain a transition state; and classifying the transition state with a classifier to obtain a final recognition result, namely a target category and a confidence rate. As the frame sequence is read in continuously, the model self-corrects: early misclassifications are gradually revised to the correct category while the confidence rate keeps rising.
Description
Technical Field
The invention relates to the technical field of computer vision, in particular to a small target identification method and system based on a space-time neural network.
Background
With the development of computer vision, target recognition has become a research hotspot and is widely applied in fields such as intelligent security, autonomous driving, and computer-aided medical diagnosis. In practical applications it is often unrealistic to require that targets be clear and easily identifiable, which has drawn increasing attention to small-target recognition in recent years. Several difficulties are common in real-world scenarios, such as small target size, long target distance, and low image-source resolution; these pose a serious challenge to traditional algorithms that use a single frame as the basis for identification.
Current general-purpose target recognition algorithms based on deep networks mostly adopt a mainstream deep network model as the backbone and automatic feature extractor, with a classifier producing the final recognition result. Because they are trained on datasets containing large numbers of images, these universal algorithms usually achieve good results on clearly identifiable targets. However, since convolution and similar operations are applied at various depths of the backbone, the feature resolution on the convolution channels inevitably decreases, so the performance of such algorithms degrades severely on small-target problems.
In recent years, related work on the small-target recognition problem has followed two lines. One idea is to improve the model itself: operations such as multi-scale feature fusion, receptive-field expansion, and the introduction of image context are used to strengthen the model's ability to recognize small targets. The other idea is to restore the image source: data enhancement, super-resolution processing, and similar techniques are used to recover a small target into a clearly recognizable signal as far as possible. Although both approaches have some effect, they operate only on single-frame images and therefore cannot meet the real-time and accuracy requirements of real scenes.
Disclosure of Invention
The present invention is directed to solving, at least to some extent, one of the technical problems in the related art.
To this end, an object of the present invention is to provide a small target identification method based on a spatio-temporal neural network.
Another objective of the present invention is to provide a small target recognition system based on spatiotemporal neural network.
In order to achieve the above object, an embodiment of the invention provides a small target identification method based on a spatiotemporal neural network, which includes the following steps: step S1, acquiring an original blurred image at the current moment; step S2, preprocessing the original blurred image with a super-resolution algorithm to obtain a high-quality image sequence; step S3, performing a logical subtraction operation between adjacent frames of the high-quality image sequence with a spatio-temporal attention mechanism to capture and highlight the suspicious region; step S4, extracting the depth features in the suspicious region to obtain a feature map time series; step S5, feeding the feature map time series into the confidence-output mapper implemented by an LSTM state-transition subnet to obtain a corrected feature map time series; and step S6, classifying the corrected feature map time series with a classifier to obtain a final recognition result, namely a target category and a confidence rate.
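Steps S1–S6 above can be sketched as a generic pipeline. The component functions below (`super_resolve`, `attention_diff`, `extract_features`, `lstm_step`, `classify`) are hypothetical stand-ins for the patent's modules, not implementations of them; the sketch only shows how the stages chain and how per-frame (category, confidence) results accumulate so that self-correction over time is visible.

```python
from typing import Callable


def recognize_small_target(
    frames,                        # raw (possibly blurred) frame sequence
    super_resolve: Callable,       # step S2: super-resolution preprocessing
    attention_diff: Callable,      # step S3: inter-frame logical subtraction
    extract_features: Callable,    # step S4: backbone feature extraction
    lstm_step: Callable,           # step S5: LSTM state-transition subnet
    classify: Callable,            # step S6: classifier head
):
    """Run the S1-S6 pipeline over a frame sequence, returning a
    (category, confidence) result per frame."""
    state = None      # recurrent state carried across time steps
    prev = None       # previous high-quality frame for differencing
    results = []
    for frame in frames:                                   # S1: read frame
        hq = super_resolve(frame)                          # S2
        roi = attention_diff(prev, hq) if prev is not None else hq  # S3
        feats = extract_features(roi)                      # S4
        state = lstm_step(feats, state)                    # S5
        results.append(classify(state))                    # S6
        prev = hq
    return results
```

A usage sketch with trivial numeric stubs shows the per-frame results accumulating as later frames refine the recurrent state.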
The small target identification method based on the space-time neural network solves the performance degradation caused by single-frame image target recognition. After the region containing the target is approximately locked, the visual capturer and computing resources are continuously concentrated on the suspicious target in that region, and the recognition confidence rate is gradually improved through continuous time-series image capture over a certain period. Meanwhile, as the model runs, some erroneous early conclusions are corrected, giving the model a degree of error-correction capability.
In addition, the small target identification method based on the spatiotemporal neural network according to the above embodiment of the present invention may further have the following additional technical features:
further, in one embodiment of the present invention, the LSTM state transition subnet section employs a significant variant LSTM of the RNN recurrent neural network as a main component, wherein a complete significant variant LSTM cell structure includes an input gate, an output gate, a gate, and a forgetting gate.
Further, in one embodiment of the present invention, the structure of a complete LSTM cell is:

i = σ(W_i · [h_(t−1), x_t] + b_i)
f = σ(W_f · [h_(t−1), x_t] + b_f)
o = σ(W_o · [h_(t−1), x_t] + b_o)
g = φ(W_g · [h_(t−1), x_t] + b_g)

where i is the input gate, f is the forget gate, o is the output gate, g is the cell (candidate) gate, the sigmoid function is σ(x) = 1/(1 + e^(−x)), φ(x) = (e^x − e^(−x))/(e^x + e^(−x)), W is a weight matrix, b is a bias vector, and x_t and h_(t−1) constitute the current input.
Further, in an embodiment of the present invention, the transition state is:

c_t = f ⊙ c_(t−1) + i ⊙ g
h_t = o ⊙ φ(c_t)

where h_t is the output state at the current time step, t is the time index, o is the output gate, c_t is the hidden state at the current time step, f is the forget gate, c_(t−1) is the hidden state at the previous time step, i is the input gate, and g is the cell (candidate) gate.
Further, in an embodiment of the present invention, any mainstream deep convolution model may be adopted as the backbone network in step S4 and step S6.
In order to achieve the above object, another embodiment of the present invention provides a small target recognition system based on a spatiotemporal neural network, including: an acquisition module, a super-resolution module, a spatio-temporal attention module, a feature extraction module, an LSTM state-transition subnet, and a classification module. The acquisition module acquires an original blurred image at the current moment; the super-resolution module preprocesses the original blurred image to obtain a high-quality image sequence; the spatio-temporal attention module performs a logical subtraction operation between adjacent frames of the high-quality image sequence to capture and highlight a suspicious region; the feature extraction module extracts the depth features in the suspicious region to obtain a feature map time series; the LSTM state-transition subnet feeds the feature map time series into the confidence-output mapper to obtain a corrected feature map time series; and the classification module classifies the corrected feature map time series to obtain a final recognition result, namely a category and a confidence rate.
The small target recognition system based on the spatiotemporal neural network solves the performance degradation caused by single-frame image target recognition. After the region containing the target is approximately locked, the visual capturer and computing resources are continuously concentrated on the suspicious target in that region, and the recognition confidence rate is gradually improved through continuous time-series image capture over a certain period. Meanwhile, as the model runs, some erroneous early conclusions are corrected, giving the system a degree of error-correction capability.
In addition, the small target recognition system based on the spatiotemporal neural network according to the above embodiment of the present invention may further have the following additional technical features:
further, in one embodiment of the present invention, the LSTM state transition subnet section employs a significant variant LSTM of the RNN recurrent neural network as a main component, wherein a complete significant variant LSTM cell structure includes an input gate, an output gate, a gate, and a forgetting gate.
Further, in one embodiment of the present invention, the structure of a complete LSTM cell is:

i = σ(W_i · [h_(t−1), x_t] + b_i)
f = σ(W_f · [h_(t−1), x_t] + b_f)
o = σ(W_o · [h_(t−1), x_t] + b_o)
g = φ(W_g · [h_(t−1), x_t] + b_g)

where i is the input gate, f is the forget gate, o is the output gate, g is the cell (candidate) gate, the sigmoid function is σ(x) = 1/(1 + e^(−x)), φ(x) = (e^x − e^(−x))/(e^x + e^(−x)), W is a weight matrix, b is a bias vector, and x_t and h_(t−1) constitute the current input.
Further, in an embodiment of the present invention, the transition state is:

c_t = f ⊙ c_(t−1) + i ⊙ g
h_t = o ⊙ φ(c_t)

where h_t is the output state at the current time step, t is the time index, o is the output gate, c_t is the hidden state at the current time step, f is the forget gate, c_(t−1) is the hidden state at the previous time step, i is the input gate, and g is the cell (candidate) gate.
Further, in an embodiment of the present invention, any mainstream deep convolution model may be used as the backbone network in the feature extraction module and the classification module.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The foregoing and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a flow chart of a method for identifying small targets based on a spatiotemporal neural network according to an embodiment of the present invention;
FIG. 2 is a schematic diagram illustrating the relationship between attention distribution and recognition accuracy for a specific target category according to an embodiment of the present invention, wherein (a) attention is dispersed and recognition is incorrect; (b) attention is concentrated and recognition is correct;
FIG. 3 is a schematic diagram of the construction of an LSTM cell unit according to one embodiment of the present invention;
FIG. 4 is a sample pictorial illustration of ATSETC4 of an embodiment of the present invention;
FIG. 5 is a schematic diagram of model self-correction capability according to one embodiment of the invention;
FIG. 6 is a diagram illustrating the processing of different size images by SRGAN in accordance with an embodiment of the present invention;
FIG. 7 is a structural diagram of a small target recognition system based on a spatiotemporal neural network according to an embodiment of the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are illustrative and intended to be illustrative of the invention and are not to be construed as limiting the invention.
The method and system for identifying small targets based on the spatio-temporal neural network proposed by the embodiments of the present invention will be described below with reference to the accompanying drawings, and first, the method for identifying small targets based on the spatio-temporal neural network proposed by the embodiments of the present invention will be described with reference to the accompanying drawings.
FIG. 1 is a flow chart of a small target identification method based on a spatiotemporal neural network according to an embodiment of the present invention.
As shown in fig. 1, the small target identification method based on the spatiotemporal neural network comprises the following steps:
in step S1, the original blurred image at the current time is acquired.
In step S2, the original blurred image is preprocessed by using a super-resolution algorithm to obtain a high-quality image sequence.
Specifically, a fully trained super-resolution algorithm is used to perform an initial enhancement of the original blurred image, yielding a data source with better image quality; any effective super-resolution method may be used here.
In step S3, a logical subtraction is performed between adjacent frames of the high-quality image sequence using a spatiotemporal attention mechanism to capture and highlight the suspicious region, so that the subsequent computing resources are more accurately allocated to the actual target.
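As an illustrative sketch (not the patent's exact operator), the inter-frame "logical subtraction" can be approximated by a thresholded absolute difference between adjacent frames; the `threshold` value is an assumed parameter not given in the text.

```python
import numpy as np


def suspicious_region_mask(prev_frame: np.ndarray, curr_frame: np.ndarray,
                           threshold: float = 0.1) -> np.ndarray:
    """Flag pixels that changed between adjacent frames.

    The absolute inter-frame difference stands in for the patent's
    'logical subtraction'; pixels whose change exceeds `threshold`
    are marked 1 as candidate suspicious-region locations.
    """
    diff = np.abs(curr_frame.astype(np.float32) - prev_frame.astype(np.float32))
    return (diff > threshold).astype(np.uint8)  # 1 where motion is suspected
```

In practice the binary mask would be dilated or box-fitted to delimit the suspicious region handed to the feature extractor; that post-processing is omitted here.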
Formally, the attention score Y of the model for a given target class is the relu of the inner product of the weights w and the feature maps A:

Y = relu( Σ_k w_k · A^k )    (1)

where A^k is the k-th feature map, w is the neural network model weight, Y is the attention distribution score of the model's inference process, and relu is the linear rectification function. The weight w_k is obtained from the model's gradients, specifically:

w_k = (1/Z) Σ_i Σ_j ∂Y/∂A^k_(i,j)    (2)

In equation (2), w_k is the gradient-weighted sum over each feature element, with Z the number of spatial positions. Combining equations (1) and (2) gives equation (3), the final form of the model's attention score for a fixed category:

Y = relu( Σ_k [ (1/Z) Σ_i Σ_j ∂Y/∂A^k_(i,j) ] · A^k )    (3)
in fact, the distribution of attention is closely related to the recognition accuracy, and as shown in fig. 2, when a false recognition occurs, the attention of the model becomes abnormally dispersed, whereas when the recognition is correct, the attention almost completely fits the target contour.
In step S4, the depth features in the suspicious region are extracted to obtain a feature map time series sequence.
That is, the suspicious region output by the spatio-temporal attention mechanism is accepted and its depth features are extracted as inputs to the LSTM state transition sub-network.
In step S5, the LSTM state-transition subnet feeds the feature map time series into the confidence-output mapper to obtain a corrected feature map time series.
Further, the LSTM state-transition subnet adopts LSTM, an important variant of the RNN recurrent neural network, as its main component. A traditional RNN unit is limited in how long it can store content, owing to the vanishing-gradient problem, and is not easy to train. As shown in fig. 3, LSTM is a variant of the recurrent neural network designed specifically to solve such problems: a complete LSTM cell structure includes an input gate, a forget gate, an output gate, and a cell (candidate) gate, which transmit the current hidden state to the next time step to participate in the fusion calculation while avoiding the memory-duration limit that the vanishing-gradient problem imposes on ordinary recurrent networks. The specific formulas are as follows:
i = σ(W_i · [h_(t−1), x_t] + b_i)
f = σ(W_f · [h_(t−1), x_t] + b_f)
o = σ(W_o · [h_(t−1), x_t] + b_o)
g = φ(W_g · [h_(t−1), x_t] + b_g)

where i is the input gate, f is the forget gate, o is the output gate, g is the cell (candidate) gate, the sigmoid function is σ(x) = 1/(1 + e^(−x)), φ(x) = (e^x − e^(−x))/(e^x + e^(−x)), W is a weight matrix, b is a bias vector, and x_t and h_(t−1) constitute the current input.
The output state and hidden state are computed as follows:

c_t = f ⊙ c_(t−1) + i ⊙ g
h_t = o ⊙ φ(c_t)

where h_t is the output state at the current time step, t is the time index, o is the output gate, c_t is the hidden state at the current time step, f is the forget gate, c_(t−1) is the hidden state at the previous time step, i is the input gate, and g is the cell (candidate) gate;
and then, correcting the characteristic diagram time sequence according to the output state and the hidden state.
In step S6, the classifier is used to classify the corrected feature map time series to obtain a final recognition result, namely a target category and a confidence rate.
Specifically, the time-series output of the LSTM cells (the corrected feature map time series) is fed into the classifier to obtain the final classification result.
It should be noted that, in step S4 and step S6, any one of the mainstream deep convolution models may be used as the backbone network, and similarly, any effective feature extractor and classifier may be substituted.
In addition, in order to carry out quantitative experiments, the invention constructs an aerial small-target sequence-image challenge dataset, ATSETC4, as a training basis for the deep network, addressing the prior lack of a dedicated dataset for training serialized neural networks. ATSETC4 contains 2400 video clips from real captures and network resources, covering numerous scenarios including wilderness, urban, virtual, and complex weather environments. As shown in fig. 4, four flying target categories are set in ATSETC4: birds, hot-air balloons, fixed-wing drones, and rotor drones. Six standard image-size subsets are also provided: 224 × 224, 112 × 112, 56 × 56, 28 × 28, 14 × 14, and 7 × 7, for multi-scale comparison testing (this arrangement accommodates the parameter requirements of networks with fully connected layers). In particular, each small-scale subset is sampled down from the large-scale subset, and original targets of different sizes are blended into the large-scale subset at initialization to increase the dataset's difficulty. The final ATSETC4 contains 2400 sequences, each 25 frames long, for a total of 60000 images. In the usual sense, a target of size 28 × 28 or smaller may be considered a small target.
Therefore, the specific working process of the small target identification method proposed by the present invention can be shown in table 1 below.
Furthermore, in practical tests the method shows a significant effect in identifying continuous target frames and a strong self-correction capability. As shown in fig. 5, transient misidentifications occur in the model's early recognition process owing to signal blur, small target size, complex backgrounds, and similar causes; as the frame sequence is read in continuously, the model self-corrects, gradually settling on the correct category while the confidence rate keeps improving.
The small target identification method based on the spatiotemporal neural network proposed by the present invention is further explained by a specific embodiment.
Firstly, the SRGAN super-resolution model with the trained model is adopted to directly process the image in the embodiment of the invention.
Next, a cross-entropy loss function is used as the optimization objective, and the minimum batch size is set to 16; specifically, one 25-frame sequence is taken as the minimum batch unit. The initial learning rate is 10^(−4), and the learning rate is decayed by a factor of 100 once the validation-set accuracy stops rising significantly. In particular, the fully connected layers of the model are regularized with dropout during training, with a dropout factor of 0.5 (i.e., part of the fully connected layer parameters are randomly frozen to prevent overfitting). In addition, the method mainly uses a VGG11 deep convolutional network pre-trained on ImageNet as the feature extractor, and most parameters of the convolutional backbone are kept frozen during training. ATSETC4 is partitioned into a training set and a test set at an 8:2 ratio. Finally, 55–65 training epochs are needed on average for subsets of different sizes, and training the model on one size subset takes 90 minutes on average. The experimental device is a single NVIDIA GTX 1080Ti GPU, and the machine-learning framework is PyTorch. The other models in the comparative experiments use their default parameters, and the test phase is still based on the ATSETC4 dataset provided by the invention.
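The learning-rate rule described above (start at 10^(−4), divide by 100 once validation accuracy stops rising significantly) can be sketched as a plain plateau check; the `patience` window used to decide that accuracy has stopped rising is an assumed detail the text does not specify.

```python
def next_learning_rate(lr: float, val_acc_history, patience: int = 3,
                       factor: float = 100.0) -> float:
    """Return the learning rate for the next epoch.

    If the best validation accuracy in the last `patience` epochs does
    not exceed the best accuracy seen before that window, the accuracy
    is considered to have plateaued and lr is divided by `factor`.
    """
    if len(val_acc_history) <= patience:
        return lr                          # not enough history to judge
    recent = val_acc_history[-patience:]
    best_before = max(val_acc_history[:-patience])
    if max(recent) <= best_before:         # no significant rise -> decay
        return lr / factor
    return lr
```

A real training loop would call this once per epoch after evaluating on the validation split, stopping training when lr underflows a minimum.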
In the experiments, the embodiment of the invention denotes the simplified spatio-temporal network (without the super-resolution module) as Simple_STNet, and the full model as STNet. Their performance is compared with several leading-edge target recognition networks in table 2 below.
Table 2: performance comparison of Simple_STNet, STNet, and various advanced recognition algorithms
From table 2, both the simplified Simple_STNet and the full STNet achieve the best performance on almost all size subsets of ATSETC4. The degradation of the full STNet at the 7-pixel scale occurs because, under the 32-fold down-sampling condition, super-resolution has exceeded its theoretical limit: the image restoration process introduces errors and performance drops. As shown in fig. 6, original images of different sizes and the SRGAN results are presented in three panels, from left to right: the 224-size high-definition image, the 7-size low-definition image, and the SRGAN result for the 7-size image.
Therefore, the small target identification method based on the spatiotemporal neural network provided by the embodiment of the invention solves the performance degradation caused by single-frame image target recognition. After the region containing the target is approximately locked, the visual capturer and computing resources are continuously concentrated on the suspicious target in that region, and the recognition confidence rate is gradually improved through continuous time-series image capture over a certain period. Meanwhile, as the model runs, some erroneous early conclusions are corrected, giving the model a degree of error-correction capability.
Next, a small object recognition system based on a spatiotemporal neural network proposed according to an embodiment of the present invention will be described with reference to the accompanying drawings.
FIG. 7 is a schematic structural diagram of a small target recognition system based on a spatiotemporal neural network according to an embodiment of the present invention.
As shown in fig. 7, the system 10 includes: the system comprises an acquisition module 100, a super-resolution module 200, a spatiotemporal attention module 300, a feature extraction module 400, an LSTM state transition subnet 500 and a classification module 600.
The obtaining module 100 is configured to obtain an original blurred image at the current time. The super-resolution module 200 is configured to pre-process the original blurred image to obtain a high-quality image sequence. The spatiotemporal attention module 300 is used to perform a logical subtraction operation between adjacent frames of the high-quality image sequence to capture and highlight suspicious regions. The feature extraction module 400 is configured to extract depth features in the suspicious region to obtain a feature-map time sequence. The LSTM state transition subnet 500 is used to feed the feature-map time sequence into a mapper with confidence output, yielding the transition state. The classification module 600 is configured to classify the transition state to obtain a final recognition result, where the final recognition result is a category and a confidence rate.
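Read end to end, the six modules form a straight pipeline. The following sketch illustrates only the data flow between modules 100-600; each module body is a deliberately simplified placeholder (the real SRGAN super-resolution network, convolutional backbone and LSTM subnet are not reproduced here, and all function names are illustrative assumptions):

```python
import numpy as np

def super_resolve(frame):
    # stand-in for the SRGAN-style super-resolution preprocessing
    return frame

def spatiotemporal_attention(prev_frame, frame):
    # "logical subtraction" between adjacent frames, highlighting
    # the suspicious (changing) region
    return np.abs(frame - prev_frame)

def extract_features(region):
    # stand-in for the deep-convolution backbone of module 400
    return region.mean(axis=(0, 1))

def recognize_sequence(frames):
    """Sketch of modules 100-600: acquire -> super-resolve -> attend ->
    extract features -> accumulate over time -> classify."""
    feature_seq = []
    prev = super_resolve(frames[0])                    # modules 100/200
    for raw in frames[1:]:
        hq = super_resolve(raw)                        # module 200
        region = spatiotemporal_attention(prev, hq)    # module 300
        feature_seq.append(extract_features(region))   # module 400
        prev = hq
    # module 500: the LSTM state-transfer subnet; a running mean over the
    # feature-map time sequence stands in for the recurrent state here
    state = np.mean(feature_seq, axis=0)
    # module 600: classification into (category, confidence rate)
    category = int(np.argmax(state))
    confidence = float(state.max() / (np.abs(state).sum() + 1e-9))
    return category, confidence
```

The point of the sketch is the ordering: attention operates on adjacent super-resolved frames, and only the attended regions are passed to feature extraction and the recurrent stage.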
Further, in one embodiment of the invention, the LSTM state transition subnet employs LSTM, an important variant of the RNN (recurrent neural network), as its main component, wherein a complete LSTM cell structure includes an input gate, an output gate, a cell gate and a forget gate.
Further, in one embodiment of the present invention, the structure of a complete LSTM cell is:

i = σ(W_i · [x_t, h_{t-1}] + b_i)
f = σ(W_f · [x_t, h_{t-1}] + b_f)
o = σ(W_o · [x_t, h_{t-1}] + b_o)
g = φ(W_g · [x_t, h_{t-1}] + b_g)

where i is the input gate, f is the forget gate, o is the output gate, g is the cell gate, the sigmoid function σ(x) = 1/(1 + e^{-x}), φ(x) = (e^x - e^{-x})/(e^x + e^{-x}), W is the weight matrix, b is the bias vector, and x_t and h_{t-1} together form the current input.
Further, in one embodiment of the present invention, the transition state is:
c_t = f ⊙ c_{t-1} + i ⊙ g
h_t = o ⊙ φ(c_t)

where h_t is the output state of the current time step, t is the time-step index, o is the output gate, c_t is the hidden state of the current time step, f is the forget gate, c_{t-1} is the hidden state of the previous time step, i is the input gate, and g is the cell gate.
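The four gate computations and the state transition can be sketched numerically. Below is a minimal NumPy illustration; the concatenated-input weight layout and the tensor shapes are demonstration assumptions, not details taken from the patent:

```python
import numpy as np

def sigmoid(x):
    # sigmoid(x) = 1 / (1 + e^{-x})
    return 1.0 / (1.0 + np.exp(-x))

def phi(x):
    # phi(x) = (e^x - e^{-x}) / (e^x + e^{-x}), i.e. tanh
    return np.tanh(x)

def lstm_cell(x_t, h_prev, c_prev, W, b):
    """One LSTM step: gates i, f, o, g, then the state transition
    c_t = f * c_{t-1} + i * g and h_t = o * phi(c_t)."""
    z = np.concatenate([x_t, h_prev])   # current input [x_t, h_{t-1}]
    i = sigmoid(W["i"] @ z + b["i"])    # input gate
    f = sigmoid(W["f"] @ z + b["f"])    # forget gate
    o = sigmoid(W["o"] @ z + b["o"])    # output gate
    g = phi(W["g"] @ z + b["g"])        # cell gate
    c_t = f * c_prev + i * g            # hidden state of current step
    h_t = o * phi(c_t)                  # output state of current step
    return h_t, c_t
```

Iterating lstm_cell over the feature-map time sequence produces the transition state that the classification module consumes.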
Optionally, in an embodiment of the present invention, any deep convolution model may be adopted as the backbone network in the feature extraction module and the classification module.
It should be noted that the foregoing explanation of the embodiment of the small target identification method based on the spatio-temporal neural network is also applicable to the system, and is not repeated here.
According to the small target recognition system based on the space-time neural network provided by the embodiment of the invention, the recognition-performance degradation inherent in existing single-frame image target recognition is alleviated. After the region where the target is located is roughly locked, the visual capture device and the computing resources are concentrated continuously on the suspicious target in that region, and the recognition confidence rate is gradually improved through continuous time-sequence image capture over a period of time. Meanwhile, as the model keeps running, some erroneous conclusions reached in the early stage are corrected, so that the system has a certain error-correction capability.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction.
Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.
Claims (10)
1. A small target identification method based on a space-time neural network is characterized by comprising the following steps:
step S1, acquiring an original blurred image at the current moment;
step S2, preprocessing the original blurred image by using a super-resolution algorithm to obtain a high-quality image sequence;
step S3, performing a logical subtraction operation between adjacent frames of the high-quality image sequence by using a space-time attention mechanism, so as to capture and highlight the suspicious region;
step S4, extracting the depth features in the suspicious region to obtain a feature map time sequence;
step S5, inputting the feature-map time sequence into a mapper with confidence output by adopting an LSTM state transfer subnet, to obtain a corrected feature-map time sequence;
and step S6, classifying the corrected feature-map time sequence by using a classifier to obtain a final recognition result, wherein the final recognition result is a target category and a confidence rate.
2. The spatiotemporal neural network-based small target identification method according to claim 1, wherein the LSTM state transition subnet employs LSTM, an important variant of the RNN (recurrent neural network), as its main component, wherein a complete LSTM cell structure comprises an input gate, an output gate, a cell gate and a forget gate.
3. The spatiotemporal neural network-based small target identification method of claim 2, wherein the structure of the complete LSTM cell is:

i = σ(W_i · [x_t, h_{t-1}] + b_i)
f = σ(W_f · [x_t, h_{t-1}] + b_f)
o = σ(W_o · [x_t, h_{t-1}] + b_o)
g = φ(W_g · [x_t, h_{t-1}] + b_g)

wherein i is the input gate, f is the forget gate, o is the output gate, g is the cell gate, the sigmoid function σ(x) = 1/(1 + e^{-x}), φ(x) = (e^x - e^{-x})/(e^x + e^{-x}), W is the weight matrix, b is the bias vector, and x_t and h_{t-1} together form the current input.
4. The spatiotemporal neural network-based small target identification method according to claim 1, wherein the transition state is:
c_t = f ⊙ c_{t-1} + i ⊙ g
h_t = o ⊙ φ(c_t)

wherein h_t is the output state of the current time step, t is the time-step index, o is the output gate, c_t is the hidden state of the current time step, f is the forget gate, c_{t-1} is the hidden state of the previous time step, i is the input gate, and g is the cell gate.
5. The spatio-temporal neural network-based small object recognition method according to claim 1, wherein any deep convolution model is adopted as the backbone network in the steps S4 and S6.
6. A small target recognition system based on a spatiotemporal neural network, comprising: an acquisition module, a super-resolution module, a time-space attention module, a feature extraction module, an LSTM state transfer subnet and a classification module, wherein,
the acquisition module is used for acquiring an original blurred image at the current moment;
the super-resolution module is used for preprocessing the original blurred image to obtain a high-quality image sequence;
the space-time attention module is used for carrying out logic subtraction operation between adjacent frames of the high-quality image sequence, capturing and highlighting a suspicious region;
the feature extraction module is used for extracting the depth features in the suspicious region to obtain a feature map time sequence;
the LSTM state transfer subnet is used for inputting the feature-map time sequence into a mapper with confidence output, to obtain a corrected feature-map time sequence;
and the classification module is used for classifying the corrected feature-map time sequence to obtain a final recognition result, wherein the final recognition result is a category and a confidence rate.
7. The spatiotemporal neural network-based small target recognition system of claim 6, wherein the LSTM state transition subnet employs LSTM, an important variant of the RNN (recurrent neural network), as its main component, wherein a complete LSTM cell structure comprises an input gate, an output gate, a cell gate and a forget gate.
8. The spatiotemporal neural network-based small object recognition system of claim 7, wherein the structure of the complete LSTM cell is:

i = σ(W_i · [x_t, h_{t-1}] + b_i)
f = σ(W_f · [x_t, h_{t-1}] + b_f)
o = σ(W_o · [x_t, h_{t-1}] + b_o)
g = φ(W_g · [x_t, h_{t-1}] + b_g)

wherein i is the input gate, f is the forget gate, o is the output gate, g is the cell gate, the sigmoid function σ(x) = 1/(1 + e^{-x}), φ(x) = (e^x - e^{-x})/(e^x + e^{-x}), W is the weight matrix, b is the bias vector, and x_t and h_{t-1} together form the current input.
9. The spatiotemporal neural network-based small object recognition system of claim 6, wherein the transition state is:
c_t = f ⊙ c_{t-1} + i ⊙ g
h_t = o ⊙ φ(c_t)

wherein h_t is the output state of the current time step, t is the time-step index, o is the output gate, c_t is the hidden state of the current time step, f is the forget gate, c_{t-1} is the hidden state of the previous time step, i is the input gate, and g is the cell gate.
10. The spatiotemporal neural network-based small target recognition system of claim 6, wherein any deep convolution model is adopted as the backbone network in the feature extraction module and the classification module.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110319609.5A CN113160050B (en) | 2021-03-25 | 2021-03-25 | Small target identification method and system based on space-time neural network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113160050A true CN113160050A (en) | 2021-07-23 |
CN113160050B CN113160050B (en) | 2023-08-25 |
Family
ID=76884634
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110319609.5A Active CN113160050B (en) | 2021-03-25 | 2021-03-25 | Small target identification method and system based on space-time neural network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113160050B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116091428A (en) * | 2022-12-29 | 2023-05-09 | 国网电力空间技术有限公司 | High-precision intelligent power transmission line inspection image tower dividing method and system |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20170262995A1 (en) * | 2016-03-11 | 2017-09-14 | Qualcomm Incorporated | Video analysis with convolutional attention recurrent neural networks |
CN108765394A (en) * | 2018-05-21 | 2018-11-06 | 上海交通大学 | Target identification method based on quality evaluation |
WO2019039157A1 (en) * | 2017-08-24 | 2019-02-28 | 日立オートモティブシステムズ株式会社 | Device and method for identifying region including small object around vehicle |
CN111402131A (en) * | 2020-03-10 | 2020-07-10 | 北京师范大学 | Method for acquiring super-resolution land cover classification map based on deep learning |
CN111524135A (en) * | 2020-05-11 | 2020-08-11 | 安徽继远软件有限公司 | Image enhancement-based method and system for detecting defects of small hardware fittings of power transmission line |
US20200279157A1 (en) * | 2017-10-16 | 2020-09-03 | Illumina, Inc. | Deep Learning-Based Techniques for Training Deep Convolutional Neural Networks |
CN111832509A (en) * | 2020-07-21 | 2020-10-27 | 中国人民解放军国防科技大学 | Unmanned aerial vehicle weak and small target detection method based on space-time attention mechanism |
CN112215119A (en) * | 2020-10-08 | 2021-01-12 | 华中科技大学 | Small target identification method, device and medium based on super-resolution reconstruction |
CN112288778A (en) * | 2020-10-29 | 2021-01-29 | 电子科技大学 | Infrared small target detection method based on multi-frame regression depth network |
Non-Patent Citations (7)
Title |
---|
ADITYA CHATTOPADHAY ET AL.: "Grad-CAM++: Generalized Gradient-based Visual Explanations for Deep Convolutional Networks", 2018 IEEE Winter Conference on Applications of Computer Vision, pages 841-843 *
DEZHONG XU ET AL.: "Visual Tracking by Combining the Structure-Aware Network and Spatial-Temporal Regression", 2018 24th International Conference on Pattern Recognition (ICPR) *
YUXIN PENG ET AL.: "Two-Stream Collaborative Learning With Spatial-Temporal Attention for Video Classification", IEEE Transactions on Circuits and Systems for Video Technology, vol. 29, no. 3, XP011714235, DOI: 10.1109/TCSVT.2018.2808685 *
Anonymous: "Long-term memory model based on deep learning", Shanghai Energy Conservation, 31 December 2020, page 304 *
JIANG Feng et al.: "A water surface moving target detection based on information fusion using deep learning", Journal of Physics: Conference Series, 31 August 2020 *
DU Shengdong et al.: "A traffic flow prediction model based on sequence-to-sequence spatiotemporal attention learning", Journal of Computer Research and Development *
YANG Jie: "Fundamentals of Artificial Intelligence", 30 April 2020, pages 132-135 *
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116091428A (en) * | 2022-12-29 | 2023-05-09 | 国网电力空间技术有限公司 | High-precision intelligent power transmission line inspection image tower dividing method and system |
CN116091428B (en) * | 2022-12-29 | 2023-09-01 | 国网电力空间技术有限公司 | High-precision intelligent power transmission line inspection image tower dividing method and system |
Also Published As
Publication number | Publication date |
---|---|
CN113160050B (en) | 2023-08-25 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |