CN113160050A - Small target identification method and system based on space-time neural network - Google Patents


Info

Publication number
CN113160050A
CN113160050A (application CN202110319609.5A)
Authority
CN
China
Prior art keywords
gate
time sequence
neural network
lstm
sequence
Prior art date
Legal status
Granted
Application number
CN202110319609.5A
Other languages
Chinese (zh)
Other versions
CN113160050B (en)
Inventor
刘绍辉 (Liu Shaohui)
梁智博 (Liang Zhibo)
姜峰 (Jiang Feng)
付森 (Fu Sen)
Current Assignee
Harbin Institute of Technology
Original Assignee
Harbin Institute of Technology
Priority date: 2021-03-25
Filing date: 2021-03-25
Publication date: 2021-07-23
Application filed by Harbin Institute of Technology
Priority to CN202110319609.5A
Publication of CN113160050A
Application granted
Publication of CN113160050B
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 3/00: Geometric image transformation in the plane of the image
    • G06T 3/40: Scaling the whole image or part thereof
    • G06T 3/4053: Super resolution, i.e. output image resolution higher than sensor resolution
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/25: Fusion techniques
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/048: Activation functions
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/049: Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods
    • G06T 5/90
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/20: Image preprocessing
    • G06V 10/25: Determination of region of interest [ROI] or a volume of interest [VOI]
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00: Indexing scheme for image analysis or image enhancement
    • G06T 2207/20: Special algorithmic details
    • G06T 2207/20081: Training; Learning
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00: Indexing scheme for image analysis or image enhancement
    • G06T 2207/20: Special algorithmic details
    • G06T 2207/20084: Artificial neural networks [ANN]
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T: CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00: Road transport of goods or passengers
    • Y02T 10/10: Internal combustion engine [ICE] based vehicles
    • Y02T 10/40: Engine management systems

Abstract

The invention discloses a small target identification method and system based on a spatio-temporal neural network. The method comprises the following steps: preprocessing the original blurred image with a super-resolution algorithm to obtain a high-quality image sequence; performing a logical subtraction operation between adjacent frames of the high-quality image sequence with a spatio-temporal attention mechanism to capture and highlight suspicious regions; extracting depth features from the suspicious regions to obtain a feature map time sequence; feeding the feature map time sequence into a mapper with confidence output realized by an LSTM state transition subnet to obtain a transition state; and classifying the transition state with a classifier to obtain the final recognition result, namely the target category and a confidence rate. A distinguishing property of the method is that the model corrects itself as the frame sequence is continuously read in, gradually converging to the correct category while its confidence rate keeps rising.

Description

Small target identification method and system based on space-time neural network
Technical Field
The invention relates to the technical field of computer vision, and in particular to a small target identification method and system based on a spatio-temporal neural network.
Background
With the development of computer vision, target recognition has become a research hotspot and is widely applied in fields such as intelligent security, autonomous driving, and computer-aided medical diagnosis. In practical applications it is often unrealistic to require that the target be clear and easily identifiable, so small target recognition has attracted increasing attention in recent years. Several difficulties are prevalent in real-world scenarios, such as small target size, long target distance, and low image source resolution; these pose a serious challenge to traditional algorithms that use a single-frame image as the basis for recognition.
Current general-purpose target recognition algorithms based on deep networks basically adopt a mainstream deep network model as the backbone and automatic feature extractor, with a classifier giving the final recognition result. Because they are trained on datasets containing large numbers of images, these general-purpose algorithms usually perform well on clearly identifiable targets. However, since convolution and similar operations are applied to different degrees in the backbone, the feature resolution along the convolution channels inevitably decreases, so the performance of these algorithms degrades severely on small target problems.
In recent years, related work on the small target recognition problem has developed along two lines. One idea improves the model itself: operations such as multi-scale feature fusion, receptive field expansion, and introduction of image context are used to strengthen the model's ability to recognize small targets. The other idea starts from image source restoration: data enhancement, super-resolution processing, and similar means restore the small target to a clearly recognizable signal as far as possible. Although both approaches have some effect, they operate only on single-frame images and therefore cannot meet the real-time and accuracy requirements of real scenes.
Disclosure of Invention
The present invention is directed to solving, at least to some extent, one of the technical problems in the related art.
To this end, an object of the present invention is to provide a small target identification method based on a spatio-temporal neural network.
Another objective of the present invention is to provide a small target recognition system based on spatiotemporal neural network.
In order to achieve the above object, an embodiment of the invention provides a small target identification method based on a spatio-temporal neural network, which includes the following steps: step S1, acquiring the original blurred image at the current moment; step S2, preprocessing the original blurred image with a super-resolution algorithm to obtain a high-quality image sequence; step S3, performing a logical subtraction operation between adjacent frames of the high-quality image sequence with a spatio-temporal attention mechanism, and capturing and highlighting suspicious regions; step S4, extracting the depth features in the suspicious regions to obtain a feature map time sequence; step S5, feeding the feature map time sequence into a mapper with confidence output realized by an LSTM state transition subnet to obtain a corrected feature map time sequence; and step S6, classifying the corrected feature map time sequence with a classifier to obtain the final recognition result, namely the target category and a confidence rate.
The small target identification method based on the spatio-temporal neural network according to the embodiment of the invention solves the degradation in recognition performance caused by existing single-frame image target recognition. After the region where the target is located is roughly locked, the visual capture and computing resources are concentrated and continuously allocated to the suspicious target in that region, and the recognition confidence rate is gradually improved through continuous time-sequence image capture over a certain period. Meanwhile, as the model runs on, some erroneous conclusions drawn at an early stage are corrected, so the model has a certain error-correction capability.
In addition, the small target identification method based on the spatiotemporal neural network according to the above embodiment of the present invention may further have the following additional technical features:
further, in one embodiment of the present invention, the LSTM state transition subnet section employs a significant variant LSTM of the RNN recurrent neural network as a main component, wherein a complete significant variant LSTM cell structure includes an input gate, an output gate, a gate, and a forgetting gate.
Further, in one embodiment of the present invention, the structure of the LSTM cell of the one of the intact significant variants is:
Figure BDA0002992587730000021
wherein i is an input gate, f is a forgetting gate, o is an output gate, g is a gate, and sigmod function σ (x) is 1/(1+ e)-x),φ(x)=(ex-e-x)/(ex+e-x) W is the weight matrix, B is the offset vector, xtAnd ht-1Is the current input.
Further, in an embodiment of the present invention, the transition state is:
Figure BDA0002992587730000022
wherein h istIs the output state of the current time sequence, t is the time sequence, o is the output gate, ctIs a hidden state of the current time sequence, f is a forgetting gate, ct-1In the hidden state of the previous time sequence, i is the input gate and g is the gate.
Further, in an embodiment of the present invention, any mainstream deep convolution model may be adopted as the backbone network in step S4 and step S6.
In order to achieve the above object, another embodiment of the present invention provides a small target recognition system based on a spatio-temporal neural network, comprising: an acquisition module, a super-resolution module, a spatio-temporal attention module, a feature extraction module, an LSTM state transition subnet and a classification module, wherein the acquisition module is used for acquiring the original blurred image at the current moment; the super-resolution module is used for preprocessing the original blurred image to obtain a high-quality image sequence; the spatio-temporal attention module is used for performing a logical subtraction operation between adjacent frames of the high-quality image sequence, and capturing and highlighting suspicious regions; the feature extraction module is used for extracting the depth features in the suspicious regions to obtain a feature map time sequence; the LSTM state transition subnet is used for feeding the feature map time sequence into a mapper with confidence output to obtain a corrected feature map time sequence; and the classification module is used for classifying the corrected feature map time sequence to obtain the final recognition result, namely the target category and a confidence rate.
The small target recognition system based on the spatio-temporal neural network according to the embodiment of the invention solves the degradation in recognition performance caused by existing single-frame image target recognition. After the region where the target is located is roughly locked, the visual capture and computing resources are concentrated and continuously allocated to the suspicious target in that region, and the recognition confidence rate is gradually improved through continuous time-sequence image capture over a certain period. Meanwhile, as the model runs on, some erroneous conclusions drawn at an early stage are corrected, so the model has a certain error-correction capability.
In addition, the small target recognition system based on the spatiotemporal neural network according to the above embodiment of the present invention may further have the following additional technical features:
further, in one embodiment of the present invention, the LSTM state transition subnet section employs a significant variant LSTM of the RNN recurrent neural network as a main component, wherein a complete significant variant LSTM cell structure includes an input gate, an output gate, a gate, and a forgetting gate.
Further, in one embodiment of the present invention, the structure of the LSTM cell of the one of the intact significant variants is:
Figure BDA0002992587730000031
wherein i is an input gate, f is a forgetting gate, o is an output gate, g is a gate, and sigmod function σ (x) is 1/(1+ e)-x),φ(x)=(ex-e-x)/(ex+e-x) W is the weight matrix, B is the offset vector, xtAnd ht-1Is the current input.
Further, in an embodiment of the present invention, the transition state is:
Figure BDA0002992587730000032
wherein h istIs the output state of the current time sequence, t is the time sequence, o is the output gate, ctIs a hidden state of the current time sequence, f is a forgetting gate, ct-1In the hidden state of the previous time sequence, i is the input gate and g is the gate.
Further, in an embodiment of the present invention, any one of the deep convolution models is used as a backbone network in the feature extraction module and the classification module.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The foregoing and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a flow chart of a method for identifying small targets based on a spatiotemporal neural network according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of the relationship between attention distribution and recognition accuracy for a specific target category according to an embodiment of the present invention, wherein in (a) attention is dispersed and the recognition is incorrect, and in (b) attention is concentrated and the recognition is correct;
FIG. 3 is a schematic diagram of the construction of an LSTM cell unit according to one embodiment of the present invention;
FIG. 4 is a sample pictorial illustration of ATSETC4 of an embodiment of the present invention;
FIG. 5 is a schematic diagram of model self-correction capability according to one embodiment of the invention;
FIG. 6 is a diagram illustrating the processing of different size images by SRGAN in accordance with an embodiment of the present invention;
FIG. 7 is a structural diagram of a small target recognition system based on a spatiotemporal neural network according to an embodiment of the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to like or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are exemplary, are intended to illustrate the present invention, and are not to be construed as limiting the present invention.
The method and system for identifying small targets based on the spatio-temporal neural network proposed by the embodiments of the present invention will be described below with reference to the accompanying drawings, and first, the method for identifying small targets based on the spatio-temporal neural network proposed by the embodiments of the present invention will be described with reference to the accompanying drawings.
FIG. 1 is a flow chart of a small target identification method based on a spatiotemporal neural network according to an embodiment of the present invention.
As shown in fig. 1, the small target identification method based on the spatiotemporal neural network comprises the following steps:
in step S1, the original blurred image at the current time is acquired.
In step S2, the original blurred image is preprocessed by using a super-resolution algorithm to obtain a high-quality image sequence.
Specifically, a super-resolution algorithm with a fully trained model is used to perform initial enhancement on the original blurred image, yielding a data source with better image quality; any effective super-resolution method can be introduced here.
In step S3, a logical subtraction is performed between adjacent frames of the high-quality image sequence using the spatio-temporal attention mechanism to capture and highlight suspicious regions, so that subsequent computing resources are allocated more precisely to the actual target.
Formally, the attention score Y of the model for a certain target category is taken as the inner product of the weight w and the feature map A:
$Y = \mathrm{relu}\left(\sum_k w_k A^k\right)$   (1)
wherein A is the feature map, w is the neural network model weight, Y is the attention distribution score of the model inference process, and relu is the linear rectification function. The weight is obtained from the gradient of the model, specifically:
$w_k = \frac{1}{Z}\sum_i \sum_j \frac{\partial Y}{\partial A_{ij}^k}$   (2)
In formula (2), $w_k$ is a gradient-weighted sum over the elements of the k-th feature map, and Z is the number of elements in the feature map. Combining formula (1) and formula (2) yields formula (3), the final form of the attention score of the model for a fixed category:
$Y = \mathrm{relu}\left(\sum_k \left(\frac{1}{Z}\sum_i \sum_j \frac{\partial Y}{\partial A_{ij}^k}\right) A^k\right)$   (3)
in fact, the distribution of attention is closely related to the recognition accuracy, and as shown in fig. 2, when a false recognition occurs, the attention of the model becomes abnormally dispersed, whereas when the recognition is correct, the attention almost completely fits the target contour.
In step S4, the depth features in the suspicious region are extracted to obtain a feature map time series sequence.
That is, the suspicious region output by the spatio-temporal attention mechanism is accepted and its depth features are extracted as inputs to the LSTM state transition sub-network.
In step S5, the feature map time sequence is fed, via the LSTM state transition subnet, into the mapper with confidence output, and a corrected feature map time sequence is obtained.
Further, the LSTM state transition subnet adopts LSTM, an important variant of the RNN recurrent neural network, as its main component. The traditional RNN unit is limited in how long it can store content because of the vanishing gradient problem, and it is not easy to train. As shown in FIG. 3, LSTM is a variant of the recurrent neural network designed specifically to solve such problems. A complete LSTM cell structure includes an input gate, an output gate, a candidate gate and a forget gate; it can transmit the current hidden state to the next time step to participate in the fusion calculation, while avoiding the limitation on memory duration that the vanishing gradient problem imposes on ordinary recurrent networks. The specific formulas are as follows:
$i = \sigma(W_i [x_t, h_{t-1}] + B_i)$
$f = \sigma(W_f [x_t, h_{t-1}] + B_f)$
$o = \sigma(W_o [x_t, h_{t-1}] + B_o)$
$g = \phi(W_g [x_t, h_{t-1}] + B_g)$
wherein i is the input gate, f is the forget gate, o is the output gate, g is the candidate gate, the sigmoid function is $\sigma(x) = 1/(1 + e^{-x})$, $\phi(x) = (e^x - e^{-x})/(e^x + e^{-x})$ is the hyperbolic tangent, W is the weight matrix, B is the bias vector, and $x_t$ and $h_{t-1}$ are the current input and the previous output state.
The output state and hidden state are calculated as follows:
$c_t = f \odot c_{t-1} + i \odot g$
$h_t = o \odot \phi(c_t)$
wherein $h_t$ is the output state at the current time step, t is the time step index, o is the output gate, $c_t$ is the hidden state at the current time step, f is the forget gate, $c_{t-1}$ is the hidden state at the previous time step, i is the input gate, and g is the candidate gate.
The feature map time sequence is then corrected according to the output state and the hidden state.
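As a concrete illustration, here is a minimal PyTorch sketch of one step of the LSTM cell defined by the formulas above; packing the four gates into a single weight matrix over the concatenated input $[x_t, h_{t-1}]$ is a common convention and an assumption here, not a detail fixed by the patent:

```python
import torch

def lstm_cell_step(x_t, h_prev, c_prev, W, B):
    """One LSTM step. x_t: (D,), h_prev/c_prev: (H,), W: (4H, D+H), B: (4H,)."""
    z = W @ torch.cat([x_t, h_prev]) + B          # joint pre-activation for all gates
    i, f, o, g = z.chunk(4)                       # split into the four gates
    i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)
    g = torch.tanh(g)                             # candidate gate, phi = tanh
    c_t = f * c_prev + i * g                      # hidden state: c_t = f*c_{t-1} + i*g
    h_t = o * torch.tanh(c_t)                     # output state: h_t = o*phi(c_t)
    return h_t, c_t
```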
In step S6, the classifier is used to classify the corrected feature map time sequence to obtain the final recognition result, where the final recognition result is the target category and a confidence rate.
Specifically, the time-sequence output of the LSTM cells (the corrected feature map time sequence) is input into the classifier to obtain the final classification result.
It should be noted that, in step S4 and step S6, any mainstream deep convolution model may be used as the backbone network; similarly, any effective feature extractor and classifier may be substituted.
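Putting steps S1-S6 together, the following hedged sketch shows how the per-frame recognition loop can be wired up in PyTorch; every component passed in (super-resolution model, backbone, LSTM cell, classifier) is a placeholder for whatever concrete module an implementation chooses, the frame-difference form of the logical subtraction is one interpretation, and the zero-initialized LSTM state assumes the hidden size equals the feature dimension:

```python
import torch

def recognize_sequence(frames, sr_model, backbone, lstm_cell, classifier):
    """Frame-by-frame recognition (steps S1-S6); returns (category, confidence)
    per frame, which is expected to self-correct as more frames are read in."""
    h = c = prev = None
    results = []
    for frame in frames:                              # S1: current blurred frame
        hq = sr_model(frame)                          # S2: super-resolution enhancement
        # S3: logical subtraction between adjacent frames highlights moving regions
        roi = torch.clamp(hq - prev, min=0) if prev is not None else hq
        prev = hq
        feat = backbone(roi).flatten(1)               # S4: depth feature extraction
        if h is None:                                 # initial LSTM state (assumption)
            h, c = feat.new_zeros(feat.shape), feat.new_zeros(feat.shape)
        h, c = lstm_cell(feat, (h, c))                # S5: LSTM state transition
        probs = classifier(h).softmax(dim=-1)         # S6: classification
        conf, cat = probs.max(dim=-1)
        results.append((cat.item(), conf.item()))
    return results
```

With, for example, lstm_cell = torch.nn.LSTMCell(512, 512) and a feature extractor producing 512-dimensional vectors, this loop emits a category and confidence for every frame, which tend to stabilize as frames accumulate.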
In addition, in order to conduct quantitative experiments, the invention constructs an aerial small target sequence image challenge dataset, ATSETC4, as a training basis for the deep network, remedying the lack of a dedicated dataset for training serialized neural networks. ATSETC4 contains 2400 video clips from real capture and network resources, covering a variety of scenarios including wilderness, urban, virtual, and complex weather environments. As shown in FIG. 4, the invention sets four flying target categories in ATSETC4: birds, hot air balloons, fixed-wing drones and rotor drones. Six standard image-size subsets are also set, 224×224, 112×112, 56×56, 28×28, 14×14 and 7×7, for multi-scale comparison tests (this arrangement accommodates the parameter requirements of deep networks with fully connected layers). In particular, the small-scale subsets are sampled from the large-scale subsets, and original targets of different sizes are fused appropriately during the initialization of the large-scale subsets to increase the challenge of the dataset. The final ATSETC4 contains 2400 sequences, each 25 frames long, 60000 images in total. In a general sense, a target of size 28×28 or smaller may be considered a small target.
Therefore, the specific working process of the small target identification method proposed by the invention is summarized in Table 1 below.
Table 1: overall working process of the small target identification method (the table is rendered as an image in the original publication)
Furthermore, in practical tests the method shows a significant effect in identifying continuous target frames and has a strong self-correction capability. As shown in FIG. 5, transient erroneous recognitions occur in the model's early-stage recognition process for reasons such as signal blur, small target size and complex background; as the frame sequence is continuously read in, the model corrects itself, gradually converging to the correct category while the confidence rate keeps rising.
The small target identification method based on the spatiotemporal neural network proposed by the present invention is further explained by a specific embodiment.
First, the embodiment of the invention directly processes the images with an SRGAN super-resolution model whose weights have already been trained.
Next, a cross-entropy loss function is used as the optimization objective, and the minimum batch size is set to 16; specifically, one 25-frame sequence is treated as the minimum batch unit. The initial learning rate is 10^-4, and the learning rate is scaled down by a factor of 100 once the validation accuracy stops rising significantly. In particular, the fully connected layers in the model are regularized with dropout during training, with a dropout factor of 0.5 (i.e., part of the fully connected parameters are randomly frozen to prevent overfitting). In addition, the method is mainly based on a VGG11 deep convolutional network pre-trained on ImageNet as the feature extractor, and most parameters of the convolutional backbone are kept frozen during training. Accordingly, ATSETC4 is split into a training set and a test set at a ratio of 8:2. Finally, 55 to 65 training epochs are needed on average per size subset, and training the model on one size subset takes about 90 minutes on average. The experimental device is a single NVIDIA GTX 1080Ti GPU, and the machine learning framework is PyTorch. The other models in the comparative experiment use default parameters, and the test phase is still based on the ATSETC4 dataset provided by the invention.
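The training configuration just described can be sketched in PyTorch as follows; the classifier head dimensions, the optimizer choice (Adam), and the use of ReduceLROnPlateau to realize the "100-fold decay once validation accuracy plateaus" rule are assumptions, since the patent does not fix these details:

```python
import torch
from torch import nn, optim
from torchvision import models

# VGG11 pre-trained on ImageNet as the feature extractor, mostly frozen
backbone = models.vgg11(weights=models.VGG11_Weights.IMAGENET1K_V1).features
for p in backbone.parameters():
    p.requires_grad = False

head = nn.Sequential(                       # trainable head (dimensions assumed)
    nn.Flatten(),
    nn.Linear(512 * 7 * 7, 512), nn.ReLU(),
    nn.Dropout(p=0.5),                      # dropout factor 0.5 on FC layers
    nn.Linear(512, 4),                      # four flying target categories
)

criterion = nn.CrossEntropyLoss()           # cross-entropy optimization objective
optimizer = optim.Adam(head.parameters(), lr=1e-4)   # initial learning rate 1e-4
scheduler = optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="max", factor=0.01, patience=3   # ~100-fold decay on plateau
)
```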
In the experiments, the embodiment of the invention denotes the simplified spatio-temporal neural network (without the super-resolution module) as Simple_STNet, and the full version of the model is named STNet. Their performance is compared with several leading-edge target recognition networks, as shown in Table 2 below.
Table 2: performance comparison of Simple_STNet, STNet and various advanced recognition algorithms (the table is rendered as an image in the original publication)
As can be seen from Table 2, both the simplified Simple_STNet and the full STNet achieve the best performance on almost all size subsets of ATSETC4. The degradation of the full STNet at the 7-pixel scale occurs because super-resolution exceeds its theoretical limit under the 32-fold down-sampling condition: the image restoration process goes wrong and performance drops. FIG. 6 shows original images of different sizes and the corresponding SRGAN processing results in three columns, from left to right: the 224-size high-definition image, the 7-size low-definition image, and the 7-size SRGAN result.
Therefore, the small target identification method based on the spatio-temporal neural network according to the embodiment of the invention solves the degradation in recognition performance caused by existing single-frame image target recognition. After the region where the target is located is roughly locked, the visual capture and computing resources are concentrated and continuously allocated to the suspicious target in that region, and the recognition confidence rate is gradually improved through continuous time-sequence image capture over a certain period. Meanwhile, as the model runs on, some erroneous conclusions drawn at an early stage are corrected, so the model has a certain error-correction capability.
Next, a small object recognition system based on a spatiotemporal neural network proposed according to an embodiment of the present invention will be described with reference to the accompanying drawings.
FIG. 7 is a schematic structural diagram of a small target recognition system based on a spatiotemporal neural network according to an embodiment of the present invention.
As shown in fig. 7, the system 10 includes: the system comprises an acquisition module 100, a super-resolution module 200, a spatiotemporal attention module 300, a feature extraction module 400, an LSTM state transition subnet 500 and a classification module 600.
The acquisition module 100 is configured to acquire the original blurred image at the current moment. The super-resolution module 200 is configured to preprocess the original blurred image to obtain a high-quality image sequence. The spatio-temporal attention module 300 is used to perform a logical subtraction operation between adjacent frames of the high-quality image sequence to capture and highlight suspicious regions. The feature extraction module 400 is configured to extract the depth features in the suspicious regions to obtain a feature map time sequence. The LSTM state transition subnet 500 is used to feed the feature map time sequence into the mapper with confidence output, obtaining a transition state. The classification module 600 is configured to classify the transition state to obtain the final recognition result, where the final recognition result is the target category and a confidence rate.
Further, in one embodiment of the invention, the LSTM state transition subnet adopts LSTM, an important variant of the RNN recurrent neural network, as its main component, wherein a complete LSTM cell structure includes an input gate, an output gate, a candidate gate and a forget gate.
Further, in one embodiment of the present invention, the structure of one complete LSTM cell is:
$i = \sigma(W_i [x_t, h_{t-1}] + B_i)$
$f = \sigma(W_f [x_t, h_{t-1}] + B_f)$
$o = \sigma(W_o [x_t, h_{t-1}] + B_o)$
$g = \phi(W_g [x_t, h_{t-1}] + B_g)$
wherein i is the input gate, f is the forget gate, o is the output gate, g is the candidate gate, the sigmoid function is $\sigma(x) = 1/(1 + e^{-x})$, $\phi(x) = (e^x - e^{-x})/(e^x + e^{-x})$ is the hyperbolic tangent, W is the weight matrix, B is the bias vector, and $x_t$ and $h_{t-1}$ are the current input and the previous output state.
Further, in one embodiment of the present invention, the transition state is:
$c_t = f \odot c_{t-1} + i \odot g$
$h_t = o \odot \phi(c_t)$
wherein $h_t$ is the output state at the current time step, t is the time step index, o is the output gate, $c_t$ is the hidden state at the current time step, f is the forget gate, $c_{t-1}$ is the hidden state at the previous time step, i is the input gate, and g is the candidate gate.
Optionally, in an embodiment of the present invention, any mainstream deep convolution model may be adopted as the backbone network in the feature extraction module and the classification module.
It should be noted that the foregoing explanation of the embodiment of the small target identification method based on the spatio-temporal neural network is also applicable to the system, and is not repeated here.
According to the small target recognition system based on the spatio-temporal neural network provided by the embodiment of the invention, the degradation in recognition performance caused by existing single-frame image target recognition is solved. After the region where the target is located is roughly locked, the visual capture and computing resources are concentrated and continuously allocated to the suspicious target in that region, and the recognition confidence rate is gradually improved through continuous time-sequence image capture over a certain period. Meanwhile, as the model runs on, some erroneous conclusions drawn at an early stage are corrected, so the model has a certain error-correction capability.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction.
Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.

Claims (10)

1. A small target identification method based on a space-time neural network is characterized by comprising the following steps:
step S1, acquiring an original blurred image at the current moment;
step S2, preprocessing the original blurred image by using a super-resolution algorithm to obtain a high-quality image sequence;
step S3, performing a logical subtraction operation between adjacent frames of the high-quality image sequence by using a spatio-temporal attention mechanism, and capturing and highlighting suspicious regions;
step S4, extracting the depth features in the suspicious regions to obtain a feature map time sequence;
step S5, inputting the feature map time sequence into a mapper with confidence output by adopting an LSTM state transition subnet to obtain a corrected feature map time sequence;
and step S6, classifying the corrected feature map time sequence by using a classifier to obtain a final recognition result, wherein the final recognition result is a target category and a confidence rate.
2. The spatiotemporal neural network-based small target identification method according to claim 1, wherein the LSTM state transition subnet adopts LSTM, an important variant of the RNN recurrent neural network, as its main component, wherein a complete LSTM cell structure comprises an input gate, an output gate, a candidate gate and a forget gate.
3. The spatiotemporal neural network-based small target identification method of claim 2, wherein the structure of the one complete LSTM cell is:
$i = \sigma(W_i [x_t, h_{t-1}] + B_i)$
$f = \sigma(W_f [x_t, h_{t-1}] + B_f)$
$o = \sigma(W_o [x_t, h_{t-1}] + B_o)$
$g = \phi(W_g [x_t, h_{t-1}] + B_g)$
wherein i is the input gate, f is the forget gate, o is the output gate, g is the candidate gate, the sigmoid function is $\sigma(x) = 1/(1 + e^{-x})$, $\phi(x) = (e^x - e^{-x})/(e^x + e^{-x})$ is the hyperbolic tangent, W is the weight matrix, B is the bias vector, and $x_t$ and $h_{t-1}$ are the current input and the previous output state.
4. The spatiotemporal neural network-based small target identification method according to claim 1, wherein the transition state is:
$c_t = f \odot c_{t-1} + i \odot g$
$h_t = o \odot \phi(c_t)$
wherein $h_t$ is the output state at the current time step, t is the time step index, o is the output gate, $c_t$ is the hidden state at the current time step, f is the forget gate, $c_{t-1}$ is the hidden state at the previous time step, i is the input gate, and g is the candidate gate.
5. The spatiotemporal neural network-based small target identification method according to claim 1, wherein any mainstream deep convolution model is adopted as the backbone network in the steps S4 and S6.
6. A small target recognition system based on a spatiotemporal neural network, comprising: an acquisition module, a super-resolution module, a spatio-temporal attention module, a feature extraction module, an LSTM state transition subnet and a classification module, wherein,
the acquisition module is used for acquiring an original blurred image at the current moment;
the super-resolution module is used for preprocessing the original blurred image to obtain a high-quality image sequence;
the spatio-temporal attention module is used for performing a logical subtraction operation between adjacent frames of the high-quality image sequence, and capturing and highlighting suspicious regions;
the feature extraction module is used for extracting the depth features in the suspicious regions to obtain a feature map time sequence;
the LSTM state transition subnet is used for inputting the feature map time sequence into a mapper with confidence output to obtain a corrected feature map time sequence;
and the classification module is used for classifying the corrected feature map time sequence to obtain a final recognition result, wherein the final recognition result is a target category and a confidence rate.
7. The spatiotemporal neural network-based small target recognition system of claim 6, wherein the LSTM state transition subnet adopts LSTM, an important variant of the RNN recurrent neural network, as its main component, wherein a complete LSTM cell structure comprises an input gate, an output gate, a candidate gate and a forget gate.
8. The spatiotemporal neural network-based small target recognition system of claim 7, wherein the structure of the one complete LSTM cell is:
$i = \sigma(W_i [x_t, h_{t-1}] + B_i)$
$f = \sigma(W_f [x_t, h_{t-1}] + B_f)$
$o = \sigma(W_o [x_t, h_{t-1}] + B_o)$
$g = \phi(W_g [x_t, h_{t-1}] + B_g)$
wherein i is the input gate, f is the forget gate, o is the output gate, g is the candidate gate, the sigmoid function is $\sigma(x) = 1/(1 + e^{-x})$, $\phi(x) = (e^x - e^{-x})/(e^x + e^{-x})$ is the hyperbolic tangent, W is the weight matrix, B is the bias vector, and $x_t$ and $h_{t-1}$ are the current input and the previous output state.
9. The spatiotemporal neural network-based small target recognition system of claim 6, wherein the transition state is:
$c_t = f \odot c_{t-1} + i \odot g$
$h_t = o \odot \phi(c_t)$
wherein $h_t$ is the output state at the current time step, t is the time step index, o is the output gate, $c_t$ is the hidden state at the current time step, f is the forget gate, $c_{t-1}$ is the hidden state at the previous time step, i is the input gate, and g is the candidate gate.
10. The spatiotemporal neural network-based small target recognition system of claim 6, wherein any mainstream deep convolution model is adopted as the backbone network in the feature extraction module and the classification module.
CN202110319609.5A, priority date 2021-03-25, filed 2021-03-25: Small target identification method and system based on space-time neural network. Status: Active. Granted as CN113160050B.

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202110319609.5A (granted as CN113160050B) | 2021-03-25 | 2021-03-25 | Small target identification method and system based on space-time neural network

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202110319609.5A (granted as CN113160050B) | 2021-03-25 | 2021-03-25 | Small target identification method and system based on space-time neural network

Publications (2)

Publication Number | Publication Date
CN113160050A | 2021-07-23
CN113160050B | 2023-08-25

Family ID: 76884634

Family Applications (1)

Application Number | Title | Priority Date | Filing Date | Status
CN202110319609.5A | Small target identification method and system based on space-time neural network | 2021-03-25 | 2021-03-25 | Active (granted as CN113160050B)

Country Status (1)

Country Link
CN (1) CN113160050B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116091428A (en) * 2022-12-29 2023-05-09 国网电力空间技术有限公司 High-precision intelligent power transmission line inspection image tower dividing method and system


Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170262995A1 (en) * 2016-03-11 2017-09-14 Qualcomm Incorporated Video analysis with convolutional attention recurrent neural networks
WO2019039157A1 (en) * 2017-08-24 2019-02-28 日立オートモティブシステムズ株式会社 Device and method for identifying region including small object around vehicle
US20200279157A1 (en) * 2017-10-16 2020-09-03 Illumina, Inc. Deep Learning-Based Techniques for Training Deep Convolutional Neural Networks
CN108765394A (en) * 2018-05-21 2018-11-06 上海交通大学 Target identification method based on quality evaluation
CN111402131A (en) * 2020-03-10 2020-07-10 北京师范大学 Method for acquiring super-resolution land cover classification map based on deep learning
CN111524135A (en) * 2020-05-11 2020-08-11 安徽继远软件有限公司 Image enhancement-based method and system for detecting defects of small hardware fittings of power transmission line
CN111832509A (en) * 2020-07-21 2020-10-27 中国人民解放军国防科技大学 Unmanned aerial vehicle weak and small target detection method based on space-time attention mechanism
CN112215119A (en) * 2020-10-08 2021-01-12 华中科技大学 Small target identification method, device and medium based on super-resolution reconstruction
CN112288778A (en) * 2020-10-29 2021-01-29 电子科技大学 Infrared small target detection method based on multi-frame regression depth network

Non-Patent Citations (7)

* Cited by examiner, † Cited by third party
Title
Aditya Chattopadhay et al.: "Grad-CAM++: Generalized Gradient-based Visual Explanations for Deep Convolutional Networks", 2018 IEEE Winter Conference on Applications of Computer Vision, pages 841-843
Dezhong Xu et al.: "Visual Tracking by Combining the Structure-Aware Network and Spatial-Temporal Regression", 2018 24th International Conference on Pattern Recognition (ICPR)
Yuxin Peng et al.: "Two-Stream Collaborative Learning With Spatial-Temporal Attention for Video Classification", IEEE Transactions on Circuits and Systems for Video Technology, vol. 29, no. 3, XP011714235, DOI: 10.1109/TCSVT.2018.2808685
Anonymous (佚名): "A long-term memory model based on deep learning" (基于深度学习的长期记忆模型), Shanghai Energy Conservation (《上海节能》), 31 December 2020, page 304
Jiang Feng (姜峰) et al.: "A water surface moving target detection based on information fusion using deep learning", Journal of Physics: Conference Series, 31 August 2020
Du Shengdong (杜圣东) et al.: "A traffic flow prediction model based on sequence-to-sequence spatio-temporal attention learning" (一种基于序列到序列时空注意力学习的交通流预测模型), Journal of Computer Research and Development (《计算机研究与发展》)
Yang Jie (杨杰): "Fundamentals of Artificial Intelligence" (《人工智能基础》), 30 April 2020, pages 132-135

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116091428A (en) * 2022-12-29 2023-05-09 国网电力空间技术有限公司 High-precision intelligent power transmission line inspection image tower dividing method and system
CN116091428B (en) * 2022-12-29 2023-09-01 国网电力空间技术有限公司 High-precision intelligent power transmission line inspection image tower dividing method and system

Also Published As

Publication number Publication date
CN113160050B (en) 2023-08-25

Similar Documents

Publication Publication Date Title
Wang et al. Detect globally, refine locally: A novel approach to saliency detection
CN110443143B (en) Multi-branch convolutional neural network fused remote sensing image scene classification method
CN110135366B (en) Shielded pedestrian re-identification method based on multi-scale generation countermeasure network
Chen et al. Reverse attention for salient object detection
Arbelle et al. Microscopy cell segmentation via adversarial neural networks
CN112507777A (en) Optical remote sensing image ship detection and segmentation method based on deep learning
CN113378600B (en) Behavior recognition method and system
Jain et al. Pixel objectness
US20210326638A1 (en) Video panoptic segmentation
CN111612008A (en) Image segmentation method based on convolution network
Fang et al. Efficient and robust fragments-based multiple kernels tracking
CN113591968A (en) Infrared weak and small target detection method based on asymmetric attention feature fusion
CN112950561B (en) Optical fiber end face defect detection method, device and storage medium
CN112215119B (en) Small target identification method, device and medium based on super-resolution reconstruction
CN107609571B (en) Adaptive target tracking method based on LARK features
Wan et al. Generative adversarial multi-task learning for face sketch synthesis and recognition
CN114782298B (en) Infrared and visible light image fusion method with regional attention
CN113628297A (en) COVID-19 deep learning diagnosis system based on attention mechanism and transfer learning
CN113378675A (en) Face recognition method for simultaneous detection and feature extraction
CN110135435B (en) Saliency detection method and device based on breadth learning system
CN108921872B (en) Robust visual target tracking method suitable for long-range tracking
Nair et al. T2V-DDPM: Thermal to visible face translation using denoising diffusion probabilistic models
CN113160050A (en) Small target identification method and system based on space-time neural network
CN114049531A (en) Pedestrian re-identification method based on weak supervision human body collaborative segmentation
CN111881803B (en) Face recognition method based on improved YOLOv3

Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant
GR01 Patent grant