CN114022759A - Airspace finite pixel target detection system and method fusing neural network space-time characteristics


Publication number: CN114022759A
Authority: CN (China)
Prior art keywords: image, module, target, information, network
Prior art date: 2021-09-30
Legal status: Pending
Application number: CN202111160863.1A
Other languages: Chinese (zh)
Inventors: 薛晗庆, 王晓天, 于雪洁, 窦小明, 陈超, 潘红九, 胡煜, 尹琼, 张运
Current Assignee: Beijing Institute of Near Space Vehicles System Engineering
Original Assignee: Beijing Institute of Near Space Vehicles System Engineering
Priority date: 2021-09-30
Filing date: 2021-09-30
Application filed by Beijing Institute of Near Space Vehicles System Engineering
Priority to CN202111160863.1A
Publication of CN114022759A
Legal status: Pending

Classifications

    • G06F 18/241: Pattern recognition; classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06N 3/044: Neural networks; recurrent networks, e.g. Hopfield networks
    • G06N 3/045: Neural networks; combinations of networks
    • G06N 3/08: Neural networks; learning methods
    • G06T 5/40: Image enhancement or restoration by the use of histogram techniques
    • G06T 5/70
    • G06T 5/90

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

A space domain finite pixel target detection system and method fusing neural network space-time characteristics. The system comprises a data enhancement module, a deep learning target detection module, a time sequence signal processing module, an optical flow detection module and a result fusion module. The data enhancement module removes random noise in the image with a filtering algorithm, removes noise in the frequency domain with frequency-domain processing, and increases image contrast with histogram equalization. The deep learning detection module completes the model training process to obtain a stable deep learning target detector and provides single-frame target detection results. The time sequence signal processing module judges whether a target detection result in the image is a false alarm caused by background or noise interference. The optical flow detection module computes the optical flow between two adjacent frames, uses the optical flow characteristics of image motion to judge whether a target detection result is a false alarm caused by background or noise interference, and provides time-domain information of the finite pixel target across the two frames. The result fusion module fuses this information and comprehensively judges the validity of the target detections by cluster analysis.

Description

Airspace finite pixel target detection system and method fusing neural network space-time characteristics
Technical Field
The invention relates to the fields of aerospace, target detection, time-frequency signal analysis, sea surface ship identification, surface feature target identification, remote sensing image interpretation and the like, in particular to an airspace finite pixel target detection system and method fusing the spatial-temporal characteristics of a neural network.
Background
With the development of deep learning, target detection has become one of the most important and central directions of artificial intelligence and has been applied in many fields. In classical target detection tasks, where targets are large, their semantic information is clear and training samples are abundant, classical neural network algorithms perform well. However, in some application scenarios, such as the aerospace field, targets exhibit more complex characteristics: small size, easy submersion in the whole image, weak optical signatures, few pixels and varied morphological changes. Traditional computer vision techniques struggle to find such targets, and the missed and false detections caused by the weak characteristics of these finite pixel blocks limit the practical efficiency of recognition and detection algorithms. Classical constant-false-alarm detection models only the statistics of the background, does not model the target information completely, detects with only partial features of the target imaging region, fails to make full use of global features, and suffers a high false alarm rate. Traditional feature extraction methods require information modeling or rule matching for a specific environment, and such statistical modeling and rule matching are valid only for the environment they were designed for. Meanwhile, changes in the target's contour, orientation, size and background interference significantly affect actual detection, and manually designed background modeling and rule matching are difficult to generalize and highly limited. In summary, existing detection methods are severely limited on the finite pixel target detection task; to solve the high-accuracy detection problem for such special targets, improve detection accuracy and reduce the missed-detection and false-alarm rates, a new target detection technology is urgently needed.
Most of the finite pixel target recognition algorithms currently applied in engineering are based on background modeling and feature matching; because they must be configured for a specific usage scenario, their applicability is greatly limited.
Disclosure of Invention
The technical problem solved by the invention is as follows: to overcome the defects of the prior art, a space domain finite pixel target detection system and method fusing neural network space-time characteristics is provided. Through a target detection network, a time sequence signal processing network and an optical flow detection network, the characteristics of the finite pixel target are considered in both the time domain and the space domain, reducing the influence of background interference, system noise and typical environmental interference on the detection process. Compared with methods such as background modeling, the method is more practical, and the problem of low accuracy caused by the insufficient learning capacity of a single model is alleviated through multi-model fusion.
The technical scheme of the invention is as follows: a spatial domain finite pixel target detection system fusing neural network space-time characteristics comprises a data enhancement module, a deep learning target detection module, a time sequence signal processing module, an optical flow detection module and a result fusion module;
the data enhancement module receives a data set image as input, removes random noise in the image with a filtering algorithm, removes noise in the frequency domain with frequency-domain processing, and increases image contrast with histogram equalization; after the image preprocessing is finished, the image and its labeling information are input into the data enhancement module, which decides whether to rotate and crop the image according to the rotation field and the cropping field in the labeling information, and in addition flips, mirrors and scales the image and adds random noise to generate new images and annotations;
the deep learning detection module takes the training data as input and completes the model training process to obtain a stable deep learning target detector that gives single-frame target detection results; the detection information of n frames is sent to the time sequence signal processing module;
the time sequence signal processing module takes the detection coordinates of n frames of images as input, judges through the LSTM model prediction whether a target detection result in the nth frame image is a false alarm caused by background or noise interference, and mines the time-domain information of the target through the LSTM;
the optical flow detection module reads the images at times T and T-1, calculates their optical flow information, judges from the optical flow characteristics of image motion whether a target detection result is a false alarm caused by background or noise interference, and provides time-domain information of the finite pixel target across the two adjacent frames;
and the result fusion module fuses the information of the target detection module, the time sequence signal processing module and the optical flow detection module, and comprehensively judges the validity of the target detections by cluster analysis, constructing a sample set D and a clustering target set K.
The data enhancement module comprises a data enhancement sub-module based on a generative adversarial network (GAN) and a traditional data enhancement sub-module. The GAN-based sub-module uses the adversarial training technique from deep learning theory: the network consists of a generator and a discriminator, where the generator is responsible for generating sample data and the discriminator for correcting the generator's errors; the two are iterated against each other until convergence, yielding a stable model for producing training samples. The traditional data enhancement sub-module uses classical image enhancement techniques, including image copying, rotation, cropping and noise addition, and takes a certain amount of image data as input to achieve sample enhancement and amplification.
The specific work flow inside the traditional image enhancement submodule is as follows:
11) reading image data in the data set;
12) removing random noise in the image by using a Gaussian filter, removing salt-and-pepper noise by using a median filter, and removing the remaining noise by using a bilateral filter to obtain a filtered image blurImg;
13) performing fast Fourier transform on the image blurImg, transforming the image into a frequency domain, removing noise in the frequency domain by using a frequency domain median filter, and obtaining an image fftImg through inverse transformation after the processing is finished;
14) counting pixels of the image fftImg to obtain a gray distribution histogram, normalizing the histogram to a 0-1 interval, calculating cumulative distribution by using the normalized gray histogram, calculating an image gray conversion relation f after the original image and the histogram are equalized by using a quantization formula, calculating and replacing the gray value of the original image according to the f to obtain an image histImg;
15) defining an image contrast enhancement function F, inputting the image histImg into the function F to obtain an image Fimg after contrast enhancement, and finishing the image preprocessing work after the contrast enhancement is finished;
16) reading in the preprocessed image and image annotation information, and reading an image name and annotation information corresponding to the image name from the annotation information;
17) judging whether the rotation information field in the annotation information is True, if not, jumping to the step 14, and if so, turning to the step 18);
18) generating a random number R between open intervals (0, 360), rotating by R degrees by taking the center of a target circumscribed rectangle as a rotation center, and correspondingly calculating the image labeling information after rotation according to a transformation relation, wherein the image labeling information comprises a rectangle position frame coordinate and image pixel level position information;
19) judging whether the cutting information field in the labeling information is True, if not, jumping to the step 16), and if yes, jumping to the step 110);
110) acquiring the target's upper-left corner (x, y), height ht and width wt, and the image height h and width w; generating random numbers Δw ∈ (0, w-wt) and Δh ∈ (0, h-ht); taking the cropping region's upper-left corner as (x-Δw, y-Δh), its width as w+2Δw and its height as h+2Δh; calculating the corresponding image labeling information after cropping according to the transformation relation, wherein the image labeling information comprises the rectangular position frame coordinates and image pixel-level position information; and acquiring the newly generated image and annotation after target cropping;
111) turning and mirroring the image to generate a new image and an annotation file;
112) zooming the image to form a new picture, and generating the new picture and an annotation file according to the zooming conversion relation;
113) random noise is added into the image to generate a new picture and an annotation file.
The data enhancement submodule based on the generation countermeasure network has the working process that:
21) reading a training image I, setting Pr as the distribution of a real training image and Pz as the data distribution of a hidden variable, generating a network G, judging a network D, and generating a network iteration number k;
22) judging whether the generated network G is converged, and if so, jumping to the step 211);
23) the iteration counter n is set to 0;
24) sampling m samples {z(1), z(2), ..., z(m)} from Pz ~ N(0, I);
25) sampling m real samples {x(1), x(2), ..., x(m)} from Pr;
26) calculating a loss function value;
27) updating the network D by gradient ascent, and updating the iteration count n;
28) judging whether n is smaller than k; if n is smaller than k, jumping to step 24);
29) sampling m samples {z(1), z(2), ..., z(m)} from Pz ~ N(0, I);
210) updating the network G by using a gradient descent method, and jumping to the step 22);
211) reading K images K;
212) sending the image to a generation network G;
213) forward reasoning of the model;
214) and acquiring a generated sample K'.
The deep learning target module comprises the following working processes:
31) reading in preprocessed de-noised image data and unfolding the pre-processed de-noised image data into a neural network tensor;
32) reading in the image annotation file, and parsing the coordinates of the target circumscribed rectangle;
33) reading the mask marking information of the image, creating a mask image with the same size as the image to be processed, filling the target area with white, and filling the background with black;
34) reading the neural network hyperparameters including the learning rate, the iteration times and the convergence error interval, and simultaneously reading the image scale parameters;
35) zooming the image according to the scale zooming parameter;
36) constructing a batch of training data according to the image batch data in the hyper-parameter, and sending the batch of training data into a deep neural network;
37) seeking values of all hidden layers and output layers of the neural network;
38) solving for an error e between the output layer of the neural network and the actual value of the labeling information;
39) if the error is beyond the specified range, ending the training and jumping to the step 313);
310) calculating neuron error quantities of all layers;
311) solving an error gradient;
312) updating the weights using the error gradients, jumping to step 36);
313) integrating with deep neural network models at other scales, ending the training process, and entering an image inference stage step 314);
314) inputting the image and annotation file enhanced by the preprocessed data into a feature extraction network;
315) the feature extraction network obtains feature Map of the image through convolution operation;
316) establishing an interested image area ROI for each point in the feature Map;
317) obtaining the number Num of the ROIs;
318) selecting an ROI to be sent into an RPN network, and generating a candidate target area by the RPN network;
319) carrying out binarization classification on the ROI candidate target area to distinguish a foreground from a background;
320) performing regression on the frame of the candidate region;
321) judging whether the number of ROI processing is NUM or not, namely whether all candidate ROIs are processed or not, and jumping to the step 35 if the condition is not met);
322) according to the ROI classification and frame regression results, carrying out ROI region screening and filtering by using a threshold value set in the super parameter;
323) acquiring the number of the screened ROI;
324) selecting an ROI area for multi-classification;
325) regressing the frame of the ROI area again;
326) generating an image mask for the target in the ROI area, and obtaining accurate pixel-level position information of the target;
327) and judging whether the ROI areas are processed completely, if so, ending the workflow of the deep learning target module, otherwise, skipping to the step 324).
The specific work flow inside the time sequence signal processing module is as follows:
41) acquiring the static target detection coordinate information output for n frames of images, including the coordinate information Cn of the nth frame image;
42) constructing the target detection coordinate result of the n frames of images into an LSTM model input sequence S;
43) inputting the sequence S into an LSTM model, and obtaining coordinate prediction Sn of the LSTM model for the corresponding moment of the nth frame image;
44) judging whether the Cn result is consistent with Sn, if so, jumping to a step 46), and if not, jumping to a step 45);
45) updating the prediction result of the nth frame image into the prediction result of the LSTM;
46) and taking the prediction result of the LSTM as the prediction result of the image of the nth frame.
The specific work flow inside the optical flow detection module is as follows:
51) reading in two images ImgT and ImgT-1;
52) calculating the optical flow information of the two images in the X direction pixel by pixel to obtain an optical flow characteristic diagram X;
53) calculating the optical flow information of the two images in the Y direction pixel by pixel to obtain an optical flow characteristic diagram Y;
54) constructing an optical flow detection network input I;
55) inputting I into the convolutional layers and the fully connected layer to perform feature extraction, obtaining a feature map M;
56) inputting M into a coordinate regression model, and calculating target coordinate information;
57) and outputting the target coordinates.
The result fusion module has the working process that:
61) setting the number of the targets identified by the detection network as a clustering center number k, and constructing a static detection result, a time sequence signal detection result and an optical flow detection result into a sample space D;
62) randomly selecting k samples from D as centers;
63) aggregating the other | D | -k samples onto k center samples according to a distance function;
64) updating the center coordinates;
65) if the central coordinate is changed, turning to step 63);
66) taking an element E in K clusters;
67) judging whether the size of E is equal to 3, and if not, turning to the step 69);
68) adding the result to the reserved prediction result set used in the neural network training step, i.e., the image and annotation information corresponding to the sample;
69) and judging whether the traversal of the k clusters is finished; if so, ending the result fusion workflow, otherwise turning to step 66).
A space domain finite pixel target detection method fusing neural network space-time characteristics comprises the following steps:
1) the data enhancement module receives a data set image as input, removes random noise in the image with a filtering algorithm, removes noise in the frequency domain with frequency-domain processing, and increases image contrast with histogram equalization; after the image preprocessing is finished, the image and its labeling information are input into the data enhancement module, which decides whether to rotate and crop the image according to the rotation field and the cropping field in the labeling information, and in addition flips, mirrors and scales the image and adds random noise to generate new images and annotations;
2) the deep learning detection module takes the training data as input and completes the model training process to obtain a stable deep learning target detector that gives single-frame target detection results; the detection information of n frames is sent to the time sequence signal processing module;
3) the time sequence signal processing module takes the detection coordinates of n frames of images as input, judges through the LSTM model prediction whether a target detection result in the nth frame image is a false alarm caused by background or noise interference, and mines the time-domain information of the target through the LSTM;
4) the optical flow detection module reads the images at times T and T-1, calculates their optical flow information, judges from the optical flow characteristics of image motion whether a target detection result is a false alarm caused by background or noise interference, and provides time-domain information of the finite pixel target across the two adjacent frames;
5) and the result fusion module fuses the information of the target detection module, the time sequence signal processing module and the optical flow detection module, and comprehensively judges the validity of the target detections by cluster analysis, constructing a sample set D and a clustering target set K.
The data enhancement module comprises a data enhancement sub-module based on a generative adversarial network (GAN) and a traditional data enhancement sub-module. The GAN-based sub-module uses the adversarial training technique from deep learning theory: the network consists of a generator and a discriminator, where the generator is responsible for generating sample data and the discriminator for correcting the generator's errors; the two are iterated against each other until convergence, yielding a stable model for producing training samples. The traditional data enhancement sub-module uses classical image enhancement techniques, including image copying, rotation, cropping and noise addition, and takes a certain amount of image data as input to achieve sample enhancement and amplification.
Compared with the prior art, the invention has the advantages that:
(1) the finite pixel target recognition algorithm provided by the invention considers the image characteristics of the finite pixel target; by constructing a deep learning detection network it improves the accuracy of target detection and realizes data-driven learning, and it has better generalization capability than traditional background modeling and pattern matching methods;
(2) the finite pixel target recognition algorithm provided by the invention also considers the temporal characteristics of such pixel targets, namely the detection task of the finite pixel target is considered from the time domain, and target detection accuracy is improved by fusing time-domain and space-domain information;
(3) the multi-model fusion method provided by the invention addresses the insufficient accuracy of a single model on the finite pixel target detection task, and avoids the overall accuracy and capability of the detection task being limited by a single model;
(4) The method can significantly improve the accuracy of target detection, increases the number and richness of samples by using a generative adversarial network together with classical data enhancement, better exploits the temporal and spatial characteristics of the finite pixel target to effectively reduce the missed-detection rate of the algorithm, and further improves the efficiency of target detection. Therefore, the technology is of great significance in engineering practice.
Drawings
Fig. 1 is a system configuration diagram.
Fig. 2 is a flowchart of a conventional image enhancement module.
Fig. 3 is a work flow diagram of a data enhancement sub-module based on a generation countermeasure network.
Fig. 4 is a flowchart of the deep learning target detection module.
Fig. 5 is a flowchart of the operation of the timing signal processing module.
FIG. 6 is a flow chart of the optical flow detection module operation.
FIG. 7 is a flowchart of the results fusion module operation.
Detailed Description
With the development of deep learning, target detection has become one of the most important and central directions of artificial intelligence and has been applied in many fields. In classical target detection tasks, where targets are large, their semantic information is clear and training samples are abundant, classical neural network algorithms perform well. However, in some application scenarios, such as the aerospace field, targets exhibit more complex characteristics: small size, easy submersion in the whole image, weak optical signatures, few pixels and varied morphological changes. Traditional computer vision techniques struggle to find such targets, and the missed and false detections caused by the weak characteristics of these finite pixel blocks limit the practical efficiency of recognition and detection algorithms. By using a deep learning algorithm to fuse the spatio-temporal characteristics of the image sequence, the invention effectively avoids the false and missed detections of traditional algorithms, effectively improves the recognition accuracy for finite pixel targets, and has high engineering application value.
As shown in fig. 1, the space domain finite pixel target detection system fusing neural network space-time characteristics of the present invention comprises 5 modules: a data enhancement module, a deep learning target detection module, a time sequence signal processing module, an optical flow detection module and a result fusion module; the data enhancement module in turn contains a data enhancement sub-module based on a generative adversarial network (GAN) and a traditional data enhancement sub-module.
The data enhancement module receives a data set image as input, removes random noise in the image with a filtering algorithm, removes noise in the frequency domain with frequency-domain processing, and increases image contrast with histogram equalization. After the image is preprocessed, the image and its labeling information are input into the data enhancement module, which decides whether to rotate and crop the image according to the rotation field and the cropping field in the labeling information, and can also flip, mirror and scale the image and add random noise to generate new images and annotations, with the aim of increasing the number of images and the diversity of targets. On top of classical data enhancement, generative adversarial network technology is introduced into the data enhancement module: a training sample generator is constructed with a GAN to increase the number of training samples.
The deep learning detection module finishes the model training process by taking training data as input to obtain a stable deep learning target detector, and can give a single-frame target detection result. And the detection information of the n frames is sent to a time sequence signal processing module.
The time sequence signal processing module takes n frame image detection coordinates as input, judges whether a target detection result in an nth frame image is a false alarm result caused by interference signals such as background interference, noise interference and the like through an LSTM model prediction result, and excavates information of a target on a time domain through the LSTM.
And the optical flow detection module is used for reading the images at times T and T-1, calculating their optical flow information, judging from the optical flow characteristics of image motion whether a target detection result is a false alarm result caused by interference such as background interference and noise interference, and providing time domain information of the finite pixel target on two adjacent frames of images.
And the result fusion module is used for fusing the information of the target detection module, the time sequence signal processing module and the optical flow detection module, comprehensively judging the target detection effectiveness by constructing a sample set D and a clustering target set K and in a clustering analysis mode, considering the problem that a single model is insufficient in the accuracy of a limited pixel target detection task, and avoiding the influence of the single model on the overall precision and the overall capacity of the detection task.
As shown in fig. 2, the workflow of the conventional image enhancement sub-module is as follows:
1. reading image data in the data set;
2. removing random noise in the image by using a Gaussian filter, removing salt-and-pepper noise by using a median filter, and removing the remaining noise by using a bilateral filter to obtain a filtered image blurImg;
3. performing fast Fourier transform on blurImg, transforming the image into the frequency domain, removing noise in the frequency domain by using a frequency-domain median filter, and performing the inverse transform after the processing to obtain an image fftImg;
4. counting pixels of the fftImg to obtain a gray distribution histogram, normalizing the histogram to a 0-1 interval, calculating cumulative distribution by using the normalized gray histogram, calculating an image gray conversion relation f after the original image and the histogram are equalized by using a quantization formula, calculating and replacing the gray value of the original image according to the f to obtain an image histImg;
5. and defining an image contrast enhancement function F, inputting the histImg into the function F to obtain an image Fimg after contrast enhancement, and finishing the image preprocessing work after the contrast enhancement is finished.
6. Reading in the preprocessed image and image annotation information, and reading an image name and annotation information corresponding to the image name from the annotation information;
7. judging whether the rotation information field in the annotation information is True, if not, skipping to the step 4, and judging whether the image can be cut;
8. generating a random number R between open intervals (0, 360), rotating by R degrees by taking the center of a target circumscribed rectangle as a rotation center, and correspondingly calculating the image labeling information after rotation according to a transformation relation, such as the coordinates of a rectangular position frame and the position information of an image pixel level;
9. judging whether the cutting information field in the labeling information is True, if not, jumping to step 6, and judging whether the image can be cut;
10. acquiring the target's upper-left corner (x, y), height ht and width wt, and the image height h and width w; generating random numbers Δw ∈ (0, w-wt) and Δh ∈ (0, h-ht); taking the cropping region's upper-left corner as (x-Δw, y-Δh), its width as w+2Δw and its height as h+2Δh; calculating the corresponding image labeling information after cropping according to the transformation relation, such as the rectangular position frame coordinates and image pixel-level position information; and acquiring the newly generated image and annotation after target cropping;
11. turning and mirroring the image to generate a new image and an annotation file;
12. zooming the image to form a new picture, and generating the new picture and an annotation file according to the zooming conversion relation;
13. and adding random noise into the image to generate a new picture and an annotation file, and jumping out of the image enhancement module after the step is finished.
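The following sketch illustrates steps 1-5 of this workflow, assuming OpenCV and NumPy are available; the kernel sizes, the frequency-domain filtering strategy and the contrast gain g are illustrative assumptions rather than values fixed by the patent.

```python
import cv2
import numpy as np

def enhance_image(img_gray):
    """Denoise, frequency-domain filter, equalize and contrast-stretch a grayscale image."""
    # Step 2: spatial filtering (Gaussian -> median -> bilateral)
    blur_img = cv2.GaussianBlur(img_gray, (3, 3), 0)
    blur_img = cv2.medianBlur(blur_img, 3)
    blur_img = cv2.bilateralFilter(blur_img, 5, 50, 50)

    # Step 3: FFT, median filtering of the magnitude spectrum, inverse FFT
    spec = np.fft.fftshift(np.fft.fft2(blur_img.astype(np.float32)))
    mag = np.abs(spec).astype(np.float32)
    med = cv2.medianBlur(mag, 3)                         # frequency-domain median filter
    spec = spec * (med / (mag + 1e-8))                   # keep phase, smooth magnitude
    fft_img = np.real(np.fft.ifft2(np.fft.ifftshift(spec)))
    fft_img = np.clip(fft_img, 0, 255).astype(np.uint8)

    # Step 4: histogram equalization via the normalized cumulative distribution
    hist = np.bincount(fft_img.ravel(), minlength=256) / fft_img.size
    cdf = np.cumsum(hist)
    lut = np.round(255 * cdf).astype(np.uint8)           # gray-level mapping f
    hist_img = lut[fft_img]

    # Step 5: simple contrast enhancement function F (linear stretch about the mean, assumed gain)
    g = 1.2
    f_img = np.clip((hist_img.astype(np.float32) - hist_img.mean()) * g
                    + hist_img.mean(), 0, 255).astype(np.uint8)
    return f_img
```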
As shown in fig. 3, the work flow of the data enhancement submodule based on the generation of the countermeasure network is as follows:
1. reading a training image I, setting Pr as the distribution of a real training image and Pz as the data distribution of a hidden variable, generating a network G, judging a network D, and generating a network iteration number k;
2. judging whether the generated network G is converged, and if so, turning to the step 11;
3. the iteration counter n is set to 0;
4. sampling m samples {z(1), z(2), ..., z(m)} from Pz ~ N(0, I);
5. sampling m real samples {x(1), x(2), ..., x(m)} from Pr;
6. calculating a loss function value;
7. updating the network D by gradient ascent, and updating the iteration count n;
8. judging whether n is smaller than k; if n is smaller than k, returning to step 4;
9. sampling m samples {z(1), z(2), ..., z(m)} from Pz ~ N(0, I);
10. updating the network G by using a gradient descent method, and turning to the step 2;
11. reading K images K;
12. sending the image to a generation network G;
13. forward reasoning of the model;
14. and acquiring a generated sample K'.
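The training loop of fig. 3 can be sketched as follows in PyTorch; the generator and discriminator architectures, latent dimension, batch handling and learning rates are placeholder assumptions chosen for illustration, not details specified by the patent.

```python
import torch
import torch.nn as nn

latent_dim, img_dim = 100, 64 * 64   # assumed latent size and flattened image size
G = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(), nn.Linear(256, img_dim), nn.Tanh())
D = nn.Sequential(nn.Linear(img_dim, 256), nn.LeakyReLU(0.2), nn.Linear(256, 1), nn.Sigmoid())
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCELoss()

def train_gan(real_loader, epochs=50, k=1):
    """real_loader yields batches of real images flattened to (m, img_dim), drawn from Pr."""
    for _ in range(epochs):
        for real in real_loader:
            m = real.size(0)
            for _ in range(k):                         # steps 3-8: k discriminator updates
                z = torch.randn(m, latent_dim)         # step 4: sample z ~ Pz = N(0, I)
                fake = G(z).detach()
                # steps 6-7: discriminator loss (minimizing BCE plays the role of the gradient-ascent step on D)
                loss_d = bce(D(real), torch.ones(m, 1)) + bce(D(fake), torch.zeros(m, 1))
                opt_d.zero_grad(); loss_d.backward(); opt_d.step()
            z = torch.randn(m, latent_dim)             # step 9: fresh latent samples
            loss_g = bce(D(G(z)), torch.ones(m, 1))    # step 10: generator update by gradient descent
            opt_g.zero_grad(); loss_g.backward(); opt_g.step()

def generate_samples(num=16):
    """Steps 11-14: forward inference of the trained generator to obtain new samples K'."""
    with torch.no_grad():
        return G(torch.randn(num, latent_dim))
```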
As shown in fig. 4, the deep learning objective module workflow is:
1. reading in preprocessed de-noised image data and unfolding the pre-processed de-noised image data into a neural network tensor;
2. reading in the image annotation file, and parsing the coordinates of the target circumscribed rectangle;
3. reading the mask marking information of the image, creating a mask image with the same size as the image to be processed, filling the target area with white, and filling the background with black;
4. reading the neural network hyperparameters including learning rate, iteration times, convergence error interval and the like, and simultaneously reading image scale parameters (large L, medium M and small S);
5. zooming the image according to the scale zooming parameter;
6. constructing a batch of training data according to the image batch data in the hyper-parameter, and sending the batch of training data into a deep neural network;
7. seeking values of all hidden layers and output layers of the neural network;
8. solving for an error e between the output layer of the neural network and the actual value of the labeling information;
9. if the error exceeds the range specified in the hyperparameters, ending the training and jumping to step 13;
10. calculating neuron error quantities of all layers;
11. solving an error gradient;
12. updating the weights using the error gradients, jumping to step 6
13. Integrating with deep neural network models at other scales, finishing the training process, and entering an image reasoning stage;
14. inputting the image and annotation file enhanced by the preprocessed data into a feature extraction network;
15. the feature extraction network obtains feature Map of the image through convolution operation;
16. establishing an interested image area ROI for each point in the feature Map;
17. obtaining the number Num of the ROIs;
18. selecting an ROI to be sent into an RPN network, and generating a candidate target area by the RPN network;
19. carrying out binary classification on the ROI candidate target region to distinguish the foreground from the background;
20. performing regression on the frame of the candidate region;
21. judging whether the number of ROI processing is NUM or not, namely whether all candidate ROIs are processed or not, and jumping to the step 5 if the condition is not met;
22. according to the ROI classification and frame regression results, carrying out ROI region screening and filtering by using a threshold value set in the super parameter;
23. acquiring the number of the screened ROI;
24. selecting an ROI area for multi-classification;
25. regressing the frame of the ROI area again;
26. generating an image mask for the target in the ROI area, and obtaining accurate pixel-level position information of the target;
27. judging whether the ROI is processed completely or not, and otherwise, skipping to the step 24;
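Steps 14-27 describe a Mask R-CNN-style pipeline (feature extraction, RPN proposals, foreground/background classification, box regression, multi-class heads and mask generation). One possible, non-authoritative instantiation using torchvision is sketched below; the class count and the score threshold standing in for "the threshold set in the hyperparameters" are assumptions.

```python
import torch
from torchvision.models.detection import maskrcnn_resnet50_fpn

# Backbone + RPN + ROI classification/box-regression heads + mask head,
# mirroring steps 14-27 above (feature map, ROIs, candidate regions, screening, masks).
model = maskrcnn_resnet50_fpn(weights=None, num_classes=2)   # assumed classes: background + finite pixel target
model.eval()

def detect_single_frame(img_tensor, score_thr=0.5):
    """img_tensor: float image (C, H, W) scaled to [0, 1]; returns kept boxes, scores and masks."""
    with torch.no_grad():
        out = model([img_tensor])[0]
    keep = out["scores"] >= score_thr          # step 22: screen ROIs with the assumed threshold
    return out["boxes"][keep], out["scores"][keep], out["masks"][keep]
```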
as shown in fig. 5, the timing signal processing module has a work flow of:
1. acquiring the static target detection coordinate information output for n frames of images, including the coordinate information Cn of the nth frame image;
2. constructing the target detection coordinate result of the n frames of images into an LSTM model input sequence S;
3. inputting the sequence S into an LSTM model, and obtaining coordinate prediction Sn of the LSTM model for the corresponding moment of the nth frame image;
4. judging whether the Cn result is consistent with Sn, if so, turning to step 6;
5. updating the prediction result of the nth frame image into the prediction result of the LSTM;
6. and taking the prediction result of the LSTM as the prediction result of the image of the nth frame.
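A minimal sketch of the consistency check in fig. 5, assuming a single-layer PyTorch LSTM regressing the next bounding-box center; the hidden size and the distance tolerance tol are illustrative assumptions, and the model is presumed to have been trained on detection coordinate sequences beforehand.

```python
import torch
import torch.nn as nn

class CoordLSTM(nn.Module):
    """Predicts the target coordinate of frame n from the detections of frames 1..n-1."""
    def __init__(self, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(input_size=2, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 2)

    def forward(self, seq):                    # seq: (1, n-1, 2) detection coordinates
        out, _ = self.lstm(seq)
        return self.head(out[:, -1])           # Sn: predicted coordinate for frame n

def check_frame_n(model, coords, tol=5.0):
    """coords: list of n (x, y) detections C1..Cn; returns (coordinate kept for frame n, false-alarm flag)."""
    seq = torch.tensor(coords[:-1], dtype=torch.float32).unsqueeze(0)   # step 2: build input sequence S
    with torch.no_grad():
        sn = model(seq).squeeze(0)                                      # step 3: LSTM prediction Sn
    cn = torch.tensor(coords[-1], dtype=torch.float32)                  # step 1: detector output Cn
    inconsistent = torch.dist(cn, sn).item() > tol                      # step 4: compare Cn with Sn
    return sn.tolist(), inconsistent                                    # steps 5-6: LSTM result used for frame n
```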
As shown in fig. 6, the optical flow detection module has the following workflow:
1. reading in two images ImgT and ImgT-1;
2. calculating the optical flow information of the two images in the X direction pixel by pixel to obtain an optical flow characteristic diagram X;
3. calculating the optical flow information of the two images in the Y direction pixel by pixel to obtain an optical flow characteristic diagram Y;
4. constructing an optical flow detection network input I;
5. inputting I into the convolutional layers and the fully connected layer to perform feature extraction, obtaining a feature map M;
6. inputting M into a coordinate regression model, and calculating target coordinate information;
7. outputting the target coordinates;
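A sketch of the fig. 6 workflow, assuming OpenCV's Farneback dense optical flow for the pixel-wise X/Y flow maps and a small convolutional regressor for the coordinate head; the patent does not fix a specific flow algorithm or network layout, so these are placeholder choices.

```python
import cv2
import numpy as np
import torch
import torch.nn as nn

class FlowRegressor(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(                       # step 5: convolutional + fully connected layers
            nn.Conv2d(2, 8, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(8, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(8), nn.Flatten(),
            nn.Linear(16 * 8 * 8, 64), nn.ReLU())
        self.coord_head = nn.Linear(64, 2)                   # step 6: coordinate regression

    def forward(self, flow):
        return self.coord_head(self.features(flow))

def detect_from_flow(img_t_prev, img_t, model):
    """img_t_prev, img_t: grayscale uint8 frames at times T-1 and T."""
    flow = cv2.calcOpticalFlowFarneback(img_t_prev, img_t, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)       # steps 2-3: per-pixel flow
    flow_x, flow_y = flow[..., 0], flow[..., 1]                          # feature maps X and Y
    inp = torch.from_numpy(np.stack([flow_x, flow_y])[None]).float()     # step 4: network input I
    with torch.no_grad():
        return model(inp).squeeze(0).tolist()                            # step 7: target coordinates
```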
as shown in fig. 7, the workflow of the result fusion module is as follows:
1. setting the number of the targets identified by the detection network as a clustering center number k, and constructing a static detection result, a time sequence signal detection result and an optical flow detection result into a sample space D;
2. randomly selecting k samples from D as centers;
3. aggregating the other | D | -k samples onto k center samples according to a distance function;
4. updating the center coordinates;
5. if the central coordinate changes, turning to the step 3;
6. taking an element E in K clusters;
7. judging whether the size of E is equal to 3, if not, turning to the step 9;
8. adding the result to the result prediction set;
9. and judging whether the traversal of the K clusters is completed, and if not, turning to the step 6.
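A sketch of the fusion rule in fig. 7, assuming each of the three branches (static detection, time sequence signal, optical flow) contributes one (x, y) candidate per target and that scikit-learn's KMeans stands in for steps 2-5; a cluster is kept only when it contains exactly three samples, i.e., all three branches agree.

```python
import numpy as np
from sklearn.cluster import KMeans

def fuse_detections(static_pts, lstm_pts, flow_pts, k):
    """Each *_pts is a list of (x, y) candidates; k = number of targets reported by the detection network."""
    D = np.array(list(static_pts) + list(lstm_pts) + list(flow_pts), dtype=float)  # sample space D
    km = KMeans(n_clusters=k, n_init=10).fit(D)           # steps 2-5: assign samples, update centers until stable
    confirmed = []
    for c in range(k):                                     # steps 6-9: traverse the k clusters
        members = np.flatnonzero(km.labels_ == c)
        if len(members) == 3:                              # step 7: one sample from each of the three branches
            confirmed.append(km.cluster_centers_[c].tolist())   # step 8: keep as a valid detection
    return confirmed
```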
Although the present invention has been described with reference to the preferred embodiments, it is not intended to limit the present invention, and those skilled in the art can make modifications and variations of the present invention without departing from the spirit and scope of the present invention.

Claims (10)

1. A spatial domain finite pixel target detection system fusing spatial and temporal characteristics of a neural network is characterized in that: the system comprises a data enhancement module, a deep learning target detection module, a time sequence signal processing module, an optical flow detection module and a result fusion module;
the data enhancement module receives a data set image as input, removes random noise in the image by using a filtering algorithm, removes noise in a frequency domain by using a frequency domain processing technology, and increases the image contrast by using histogram equalization; after the image preprocessing is finished, the image and the image labeling information are input into a data enhancement module, the data enhancement module determines whether the image is rotated and cut according to a rotation field and a cutting field in the labeling information, and in addition, the image is turned over, mirrored, zoomed and added with random noise to generate a new image and an annotation;
the deep learning detection module finishes a model training process by taking training data as input to obtain a stable deep learning target detector and give a single-frame target detection result; sending the detection information of the n frames to a time sequence signal processing module;
the time sequence signal processing module takes n frame image detection coordinates as input, judges whether a target detection result in an nth frame image is a false alarm result caused by background interference and noise interference through an LSTM model prediction result, and excavates information of a target on a time domain through the LSTM;
the optical flow detection module reads the images at times T and T-1, calculates their optical flow information, judges from the optical flow characteristics of image motion whether the target detection result is a false alarm result caused by background interference and noise interference, and provides time domain information of the finite pixel target on two adjacent frames of images;
and the result fusion module is used for fusing the information of the target detection module, the time sequence signal processing module and the optical flow detection module, and comprehensively judging the target detection effectiveness through a cluster analysis mode by constructing a sample set D and a cluster target set K.
2. The spatial domain finite pixel target detection system fusing the spatial and temporal characteristics of the neural network as claimed in claim 1, wherein: the data enhancement module comprises a data enhancement sub-module based on a generative adversarial network (GAN) and a traditional data enhancement sub-module; the GAN-based sub-module uses the adversarial training technique from deep learning theory, the network consisting of a generator and a discriminator, where the generator is responsible for generating sample data and the discriminator for correcting the generator's errors; the two are iterated against each other until convergence, yielding a stable model for producing training samples; the traditional data enhancement sub-module uses classical image enhancement techniques, including image copying, rotation, cropping and noise addition, and takes a certain amount of image data as input to achieve sample enhancement and amplification.
3. The spatial domain finite pixel target detection system fusing the spatial and temporal characteristics of the neural network as claimed in claim 2, wherein: the specific work flow inside the traditional image enhancement submodule is as follows:
11) reading image data in the data set;
12) removing random noise in the image by using a Gaussian filter, removing salt-and-pepper noise by using a median filter, and removing the remaining noise by using a bilateral filter to obtain a filtered image blurImg;
13) performing fast Fourier transform on the image blurImg, transforming the image into a frequency domain, removing noise in the frequency domain by using a frequency domain median filter, and obtaining an image fftImg through inverse transformation after the processing is finished;
14) counting pixels of the image fftImg to obtain a gray distribution histogram, normalizing the histogram to a 0-1 interval, calculating cumulative distribution by using the normalized gray histogram, calculating an image gray conversion relation f after the original image and the histogram are equalized by using a quantization formula, calculating and replacing the gray value of the original image according to the f to obtain an image histImg;
15) defining an image contrast enhancement function F, inputting the image histImg into the function F to obtain an image Fimg after contrast enhancement, and finishing the image preprocessing work after the contrast enhancement is finished;
16) reading in the preprocessed image and image annotation information, and reading an image name and annotation information corresponding to the image name from the annotation information;
17) judging whether the rotation information field in the annotation information is True, if not, jumping to the step 14, and if so, turning to the step 18);
18) generating a random number R between open intervals (0, 360), rotating by R degrees by taking the center of a target circumscribed rectangle as a rotation center, and correspondingly calculating the image labeling information after rotation according to a transformation relation, wherein the image labeling information comprises a rectangle position frame coordinate and image pixel level position information;
19) judging whether the cutting information field in the labeling information is True, if not, jumping to the step 16), and if yes, jumping to the step 110);
110) acquiring the target's upper-left corner (x, y), height ht and width wt, and the image height h and width w; generating random numbers Δw ∈ (0, w-wt) and Δh ∈ (0, h-ht); taking the cropping region's upper-left corner as (x-Δw, y-Δh), its width as w+2Δw and its height as h+2Δh; calculating the corresponding image labeling information after cropping according to the transformation relation, wherein the image labeling information comprises the rectangular position frame coordinates and image pixel-level position information; and acquiring the newly generated image and annotation after target cropping;
111) turning and mirroring the image to generate a new image and an annotation file;
112) zooming the image to form a new picture, and generating the new picture and an annotation file according to the zooming conversion relation;
113) random noise is added into the image to generate a new picture and an annotation file.
4. The spatial domain finite pixel target detection system fusing the spatial and temporal characteristics of the neural network as claimed in claim 3, wherein: the data enhancement submodule based on the generation countermeasure network has the working process that:
21) reading a training image I, setting Pr as the distribution of a real training image and Pz as the data distribution of a hidden variable, generating a network G, judging a network D, and generating a network iteration number k;
22) judging whether the generated network G is converged, and if so, jumping to the step 211);
23) the iteration counter n is set to 0;
24) sampling m samples {z(1), z(2), ..., z(m)} from Pz ~ N(0, I);
25) sampling m real samples {x(1), x(2), ..., x(m)} from Pr;
26) calculating a loss function value;
27) updating the network D by gradient ascent, and updating the iteration count n;
28) judging whether n is smaller than k; if n is smaller than k, jumping to step 24);
29) sampling m samples {z(1), z(2), ..., z(m)} from Pz ~ N(0, I);
210) updating the network G by using a gradient descent method, and jumping to the step 22);
211) reading K images K;
212) sending the image to a generation network G;
213) forward reasoning of the model;
214) and acquiring a generated sample K'.
5. The spatial domain finite pixel target detection system fusing the spatial and temporal characteristics of the neural network as claimed in claim 1, wherein: the deep learning target module comprises the following working processes:
31) reading in preprocessed de-noised image data and unfolding the pre-processed de-noised image data into a neural network tensor;
32) reading in the image annotation file, and parsing the coordinates of the target circumscribed rectangle;
33) reading the mask marking information of the image, creating a mask image with the same size as the image to be processed, filling the target area with white, and filling the background with black;
34) reading the neural network hyperparameters including the learning rate, the iteration times and the convergence error interval, and simultaneously reading the image scale parameters;
35) zooming the image according to the scale zooming parameter;
36) constructing a batch of training data according to the image batch data in the hyper-parameter, and sending the batch of training data into a deep neural network;
37) seeking values of all hidden layers and output layers of the neural network;
38) solving for an error e between the output layer of the neural network and the actual value of the labeling information;
39) if the error is beyond the specified range, ending the training and jumping to the step 313);
310) calculating neuron error quantities of all layers;
311) solving an error gradient;
312) updating the weights using the error gradients, jumping to step 36);
313) integrating with deep neural network models at other scales, ending the training process, and entering an image inference stage step 314);
314) inputting the image and annotation file enhanced by the preprocessed data into a feature extraction network;
315) the feature extraction network obtains feature Map of the image through convolution operation;
316) establishing an interested image area ROI for each point in the feature Map;
317) obtaining the number Num of the ROIs;
318) selecting an ROI to be sent into an RPN network, and generating a candidate target area by the RPN network;
319) carrying out binarization classification on the ROI candidate target area to distinguish a foreground from a background;
320) performing regression on the frame of the candidate region;
321) judging whether the number of ROI processing is NUM or not, namely whether all candidate ROIs are processed or not, and jumping to the step 35 if the condition is not met);
322) according to the ROI classification and frame regression results, carrying out ROI region screening and filtering by using a threshold value set in the super parameter;
323) acquiring the number of the screened ROI;
324) selecting an ROI area for multi-classification;
325) regressing the frame of the ROI area again;
326) generating an image mask for the target in the ROI area, and obtaining accurate pixel-level position information of the target;
327) and judging whether the ROI areas are processed completely, if so, ending the workflow of the deep learning target module, otherwise, skipping to the step 324).
6. The spatial domain finite pixel target detection system fusing the spatial and temporal characteristics of the neural network as claimed in claim 1, wherein: the specific work flow inside the time sequence signal processing module is as follows:
41) acquiring n frames of image output static target detection coordinate information including n frame of image coordinate information Cn;
42) constructing the target detection coordinate result of the n frames of images into an LSTM model input sequence S;
43) inputting the sequence S into an LSTM model, and obtaining coordinate prediction Sn of the LSTM model for the corresponding moment of the nth frame image;
44) judging whether the Cn result is consistent with Sn, if so, jumping to a step 46), and if not, jumping to a step 45);
45) updating the prediction result of the nth frame image into the prediction result of the LSTM;
46) and taking the prediction result of the LSTM as the prediction result of the image of the nth frame.
7. The spatial domain finite pixel target detection system fusing the spatial and temporal characteristics of the neural network as claimed in claim 1, wherein: the specific work flow inside the optical flow detection module is as follows:
51) reading in two images ImgT and ImgT-1;
52) calculating the optical flow information of the two images in the X direction pixel by pixel to obtain an optical flow characteristic diagram X;
53) calculating the optical flow information of the two images in the Y direction pixel by pixel to obtain an optical flow characteristic diagram Y;
54) constructing an optical flow detection network input I;
55) inputting I into the convolutional layers and the fully connected layer to perform feature extraction, obtaining a feature map M;
56) inputting M into a coordinate regression model, and calculating target coordinate information;
57) and outputting the target coordinates.
8. The spatial domain finite pixel target detection system fusing the spatial and temporal characteristics of the neural network as claimed in claim 1, wherein: the result fusion module has the working process that:
61) setting the number of the targets identified by the detection network as a clustering center number k, and constructing a static detection result, a time sequence signal detection result and an optical flow detection result into a sample space D;
62) randomly screening K samples from the D as centers;
63) aggregating the other | D | -k samples onto k center samples according to a distance function;
64) updating the center coordinates;
65) if the central coordinate is changed, turning to step 63);
66) taking an element E in K clusters;
67) judging whether the size of E is equal to 3, and if not, turning to the step 69);
68) adding the result to the reserved prediction result set used in the neural network training step, i.e., the image and annotation information corresponding to the sample;
69) and judging whether the traversal of the K clusters is finished, if so, ending the time sequence signal working module, and otherwise, turning to the step 66.
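The clustering-based fusion of steps 61) to 69) can be sketched in pure numpy as below. The Euclidean distance function, the fixed iteration cap and the way a valid cluster is reported (its mean coordinate) are assumptions for illustration; the "size == 3" test corresponds to one sample from each of the three detectors falling into the same cluster.

```python
import numpy as np

def fuse_detections(static_pts, lstm_pts, flow_pts, n_targets, iters=50, seed=0):
    """Steps 61)-69): k-means style clustering of the three detectors' coordinates with
    k = number of targets; a cluster counts as a valid target only if it contains one
    point from each of the three sources (cluster size == 3)."""
    rng = np.random.default_rng(seed)
    D = np.vstack([static_pts, lstm_pts, flow_pts])             # 61) sample space D
    centers = D[rng.choice(len(D), n_targets, replace=False)]   # 62) k random centers
    for _ in range(iters):                                      # 63)-65) iterate until centers stop moving
        labels = np.argmin(np.linalg.norm(D[:, None] - centers[None], axis=2), axis=1)
        new_centers = np.array([D[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(n_targets)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    valid = []
    for j in range(n_targets):                                  # 66)-69) traverse the k clusters
        members = D[labels == j]
        if len(members) == 3:                                   # 67) all three sources agree
            valid.append(members.mean(axis=0))                  # 68) keep this target's fused coordinate
    return np.array(valid)

fused = fuse_detections(np.array([[10., 10.]]), np.array([[11., 9.]]),
                        np.array([[10.5, 10.5]]), n_targets=1)
```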
9. An airspace finite pixel target detection method fusing neural network space-time characteristics, characterized by comprising the following steps:
1) the data enhancement module receives a data set image as input, removes random noise from the image with a filtering algorithm, removes noise in the frequency domain with frequency-domain processing techniques, and increases image contrast with histogram equalization; after the image preprocessing is finished, the image and its annotation information are input into the data enhancement module, which decides whether to rotate and crop the image according to the rotation and cropping fields in the annotation information, and additionally flips, mirrors, scales and adds random noise to the image to generate new images and annotations;
2) the deep learning detection module completes the model training process with the training data as input, obtains a stable deep learning target detector and gives single-frame target detection results; the detection information of n frames is sent to the time sequence signal processing module;
3) the time sequence signal processing module takes the detection coordinates of n frames of images as input, uses the LSTM model prediction result to judge whether the target detection result in the nth frame image is a false alarm caused by background interference and noise interference, and mines the information of the target in the time domain through the LSTM;
4) the optical flow detection module reads the images at time T and time T-1, calculates their optical flow information, judges from the optical flow characteristics of image motion whether the target detection result is a false alarm caused by background interference and noise interference, and provides time domain information of the finite pixel target across two adjacent frame images;
5) and the result fusion module fuses the information from the target detection module, the time sequence signal processing module and the optical flow detection module, and comprehensively judges the validity of target detection through cluster analysis by constructing a sample set D and a cluster target set K.
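The preprocessing chain of step 1) (spatial filtering, frequency-domain denoising, histogram equalization) can be sketched with OpenCV and numpy as below. The median filter, the ideal low-pass cut-off ratio and the single flip augmentation at the end are illustrative assumptions, not the specific filters claimed.

```python
import cv2
import numpy as np

def preprocess(gray, cutoff_ratio=0.25):
    """Step 1) preprocessing: remove random noise with a spatial filter, suppress noise
    in the frequency domain, then raise contrast with histogram equalization."""
    denoised = cv2.medianBlur(gray, 3)                       # remove random (impulse-like) noise

    # Frequency-domain denoising: keep only low frequencies (ideal low-pass, illustrative).
    f = np.fft.fftshift(np.fft.fft2(denoised.astype(np.float32)))
    h, w = gray.shape
    cy, cx = h // 2, w // 2
    ry, rx = int(h * cutoff_ratio), int(w * cutoff_ratio)
    mask = np.zeros_like(f)
    mask[cy - ry:cy + ry, cx - rx:cx + rx] = 1
    filtered = np.abs(np.fft.ifft2(np.fft.ifftshift(f * mask)))
    filtered = np.clip(filtered, 0, 255).astype(np.uint8)

    return cv2.equalizeHist(filtered)                        # increase image contrast

gray = np.random.randint(0, 255, (256, 256), dtype=np.uint8)
enhanced = preprocess(gray)
mirrored = cv2.flip(enhanced, 1)                             # one simple augmentation (mirroring)
```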
10. The airspace finite pixel target detection method fusing neural network space-time characteristics as claimed in claim 9, wherein: the data enhancement module comprises a data enhancement sub-module based on a generative adversarial network and a traditional data enhancement sub-module; the generative adversarial network based data enhancement sub-module uses the adversarial technique from deep learning theory, in which the network consists of a generator and a discriminator: the generator is responsible for generating sample data, the discriminator is responsible for correcting the errors of the generator, and the two are iterated against each other until convergence, after which a stable model for producing training samples is obtained; the traditional data enhancement sub-module uses classical image enhancement techniques including image copying, rotation, cropping and noise addition, and takes a certain amount of image data as input to realize sample enhancement and amplification.
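A minimal generator/discriminator training loop is sketched below to illustrate the adversarial iteration described in claim 10. The fully connected architectures, 32x32 flattened images, learning rates and the placeholder real batch are all assumptions for illustration; the patented sub-module's actual network is not specified here.

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    """Maps a noise vector to a flattened grayscale sample image."""
    def __init__(self, z_dim=64, img_pixels=32 * 32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(z_dim, 256), nn.ReLU(),
                                 nn.Linear(256, img_pixels), nn.Tanh())

    def forward(self, z):
        return self.net(z)

class Discriminator(nn.Module):
    """Scores how real a flattened sample looks; plays the error-correcting adversary role."""
    def __init__(self, img_pixels=32 * 32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(img_pixels, 256), nn.LeakyReLU(0.2),
                                 nn.Linear(256, 1), nn.Sigmoid())

    def forward(self, x):
        return self.net(x)

def train_step(gen, disc, real, opt_g, opt_d, z_dim=64):
    bce = nn.BCELoss()
    b = real.size(0)
    # Discriminator step: real samples labelled 1, generated samples labelled 0.
    fake = gen(torch.randn(b, z_dim)).detach()
    loss_d = bce(disc(real), torch.ones(b, 1)) + bce(disc(fake), torch.zeros(b, 1))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()
    # Generator step: try to make the discriminator label generated samples as real.
    loss_g = bce(disc(gen(torch.randn(b, z_dim))), torch.ones(b, 1))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
    return loss_d.item(), loss_g.item()

gen, disc = Generator(), Discriminator()
opt_g = torch.optim.Adam(gen.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(disc.parameters(), lr=2e-4)
real_batch = torch.rand(8, 32 * 32) * 2 - 1        # placeholder for real training images
losses = train_step(gen, disc, real_batch, opt_g, opt_d)
```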
CN202111160863.1A 2021-09-30 2021-09-30 Airspace finite pixel target detection system and method fusing neural network space-time characteristics Pending CN114022759A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111160863.1A CN114022759A (en) 2021-09-30 2021-09-30 Airspace finite pixel target detection system and method fusing neural network space-time characteristics

Publications (1)

Publication Number Publication Date
CN114022759A true CN114022759A (en) 2022-02-08

Family

ID=80055327

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111160863.1A Pending CN114022759A (en) 2021-09-30 2021-09-30 Airspace finite pixel target detection system and method fusing neural network space-time characteristics

Country Status (1)

Country Link
CN (1) CN114022759A (en)

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023160448A1 (en) * 2022-02-24 2023-08-31 北京字跳网络技术有限公司 Image processing method and apparatus, and device and storage medium
CN114842308A (en) * 2022-03-16 2022-08-02 电子科技大学 Method for establishing target prejudgment optimization model based on full-feature fusion
CN115359240A (en) * 2022-07-15 2022-11-18 北京中科思创云智能科技有限公司 Small target detection method, device and equipment based on multi-frame image motion characteristics
CN115359240B (en) * 2022-07-15 2024-03-15 北京中科思创云智能科技有限公司 Small target detection method, device and equipment based on multi-frame image motion characteristics
CN115187950A (en) * 2022-09-13 2022-10-14 安徽中科星驰自动驾驶技术有限责任公司 Novel balance mask secondary sampling method for deep learning training data enhancement
CN115187950B (en) * 2022-09-13 2022-11-22 安徽中科星驰自动驾驶技术有限责任公司 Novel balance mask secondary sampling method for deep learning image data enhancement
CN116012283A (en) * 2022-09-28 2023-04-25 逸超医疗科技(北京)有限公司 Full-automatic ultrasonic image measurement method, equipment and storage medium
CN116012283B (en) * 2022-09-28 2023-10-13 逸超医疗科技(北京)有限公司 Full-automatic ultrasonic image measurement method, equipment and storage medium
CN116030247A (en) * 2023-03-20 2023-04-28 之江实验室 Medical image sample generation method and device, storage medium and electronic equipment
CN116310356A (en) * 2023-03-23 2023-06-23 昆仑芯(北京)科技有限公司 Training method, target detection method, device and equipment of deep learning model
CN116310356B (en) * 2023-03-23 2024-03-29 昆仑芯(北京)科技有限公司 Training method, target detection method, device and equipment of deep learning model
CN116449332B (en) * 2023-06-14 2023-08-25 西安晟昕科技股份有限公司 Airspace target detection method based on MIMO radar
CN116449332A (en) * 2023-06-14 2023-07-18 西安晟昕科技股份有限公司 Airspace target detection method based on MIMO radar
CN116506261B (en) * 2023-06-27 2023-09-08 南昌大学 Visible light communication sensing method and system
CN116506261A (en) * 2023-06-27 2023-07-28 南昌大学 Visible light communication sensing method and system

Similar Documents

Publication Publication Date Title
CN114022759A (en) Airspace finite pixel target detection system and method fusing neural network space-time characteristics
CN110097568B (en) Video object detection and segmentation method based on space-time dual-branch network
CN108764085B (en) Crowd counting method based on generation of confrontation network
CN107977661B (en) Region-of-interest detection method based on FCN and low-rank sparse decomposition
CN107590427B (en) Method for detecting abnormal events of surveillance video based on space-time interest point noise reduction
Long et al. Object detection in aerial images using feature fusion deep networks
CN110929593A (en) Real-time significance pedestrian detection method based on detail distinguishing and distinguishing
CN111310622A (en) Fish swarm target identification method for intelligent operation of underwater robot
Roy et al. Foreground segmentation using adaptive 3 phase background model
JP2024513596A (en) Image processing method and apparatus and computer readable storage medium
CN108537825B (en) Target tracking method based on transfer learning regression network
Subudhi et al. Moving object detection using spatio-temporal multilayer compound Markov Random Field and histogram thresholding based change detection
Lei et al. Image Segmentation: Principles, Techniques, and Applications
Khoshboresh-Masouleh et al. Robust building footprint extraction from big multi-sensor data using deep competition network
Chavan et al. Mathematical analysis behind occlusion handling in image with deep learning
CN112149612A (en) Marine organism recognition system and recognition method based on deep neural network
CN113962968A (en) Multi-source mixed interference radar image target detection system oriented to complex electromagnetic environment
Al-Shammri et al. A Combined Method for Object Detection under Rain Conditions Using Deep Learning
Thangaraj et al. A competent frame work for efficient object detection, tracking and classification
CN114511740A (en) Vehicle image classification method, vehicle track restoration method, device and equipment
Wang et al. Sonar image detection based on multi-scale multi-column convolution neural networks
Balachandran et al. Machine learning based video segmentation of moving scene by motion index using IO detector and shot segmentation
KR101460517B1 (en) Method or providing visual tracking from video by learning and unlearning with dual modeling, and computer-readable recording medium for the same
Saikrishnan et al. Automated object detection and classification using metaheuristics with deep learning on surveillance videos
Kalirajan et al. Deep Learning for Moving Object Detection and Tracking

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination