CN112907626A - Moving object extraction method based on multi-source information from satellite hyper-temporal data - Google Patents


Info

Publication number
CN112907626A
CN112907626A (application CN202110172481.4A)
Authority
CN
China
Prior art keywords
image
target
frame
moving
images
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110172481.4A
Other languages
Chinese (zh)
Inventor
鹿明 (Lu Ming)
李峰 (Li Feng)
辛蕾 (Xin Lei)
杨雪 (Yang Xue)
鲁啸天 (Lu Xiaotian)
张南 (Zhang Nan)
任志聪 (Ren Zhicong)
肖化超 (Xiao Huachao)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Academy of Space Technology CAST
Original Assignee
China Academy of Space Technology CAST
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Academy of Space Technology CAST filed Critical China Academy of Space Technology CAST
Priority claimed from application CN202110172481.4A
Publication of CN112907626A
Legal status: Pending


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/20 - Analysis of motion
    • G06T7/246 - Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/20 - Analysis of motion
    • G06T7/207 - Analysis of motion for motion estimation over a hierarchy of resolutions
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/10 - Image acquisition modality
    • G06T2207/10016 - Video; Image sequence

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a moving object extraction method based on multi-source information from satellite hyper-temporal data, comprising the following steps: a. collecting hyper-temporal images of an area and preprocessing them; b. preliminarily extracting the moving objects in the images, and extracting the road objects on which the moving objects depend; c. completing further extraction of the moving objects according to the preliminarily extracted moving objects and the road objects; d. performing morphological processing on the object result image extracted in step c to obtain the final result. On the basis of the spectral, textural and time-series characteristics of the video satellite, the method extracts the spatial geometry, motion speed and road environment of the moving objects and uses this information jointly for moving-object detection, so that false moving objects caused by parallax, registration errors and random noise are avoided, the accuracy of moving-object detection is effectively improved, and the false detection rate is reduced.

Description

Moving object extraction method based on multi-source information from satellite hyper-temporal data
Technical Field
The invention relates to a moving object extraction method based on multi-source information from satellite hyper-temporal data.
Background
Moving object detection is a technology formed by the cross-fusion of computer vision, remote sensing image processing, artificial intelligence and related fields, and is an important research topic in situation awareness. It can perceive not only the presence of objects but also their dynamic trends, and plays an extremely important role in real-time reconnaissance, monitoring and control in both the military and civil fields. The video satellite is a novel Earth-observation technology that captures continuous images from a moving satellite platform and provides a reliable data source for moving-object detection and situation awareness. Owing to the video satellite's large-area coverage, high spatial resolution and continuous video imaging, real-time dynamic information about the Earth's surface can be acquired rapidly.
Owing to the relative motion between the satellite platform and the Earth's surface, different video frames exhibit translation, rotation, distortion and stretching caused by factors such as terrain height. Tall objects such as high-rise buildings and steel towers therefore present obvious pseudo-motion characteristics. If moving objects in satellite video are detected directly with traditional algorithms such as the inter-frame difference method, background modelling or the optical flow method, such tall objects may be misjudged as moving objects. Moreover, the pseudo motion of tall objects, caused by ground-object edges and the parallax changes produced by image translation, is difficult to identify through simple improvements of the traditional methods.
Disclosure of Invention
The invention aims to provide a moving object extraction method based on multi-source information from satellite hyper-temporal data.
To achieve the above aim, the present invention provides a moving object extraction method based on multi-source information from satellite hyper-temporal data, comprising the following steps:
a. collecting hyper-temporal images of an area and preprocessing them;
b. preliminarily extracting the moving objects in the images, and extracting the road objects on which the moving objects depend;
c. completing further extraction of the moving objects according to the preliminarily extracted moving objects and the road objects;
d. performing morphological processing on the object result image extracted in step (c) to obtain the final result.
According to an aspect of the present invention, in step (a), a current frame is collected together with one frame within a period before it and one frame within a period after it, forming a three-frame sequence of previous frame, current frame and next frame;
the acquisition interval of the three frames is determined by the speed and length of the moving objects and by the frame rate of the video, such that the displacement of a moving object between adjacent extracted frames is between 10 m and 100 m.
According to one aspect of the invention, the preprocessing in step (a) performs inter-frame registration of the frames extracted before and after the current frame, taking the current frame as the reference;
the inter-frame registration comprises reading the current frame and the frame before or after it into arrays, and performing keypoint detection and feature description on both frames using the SIFT (Scale-Invariant Feature Transform), SURF (Speeded-Up Robust Features), ORB (Oriented FAST and Rotated BRIEF) or AKAZE algorithm;
feature matching is performed on the keypoints of the two frames with a matcher, the matching method being to compute the descriptor distance between each pair of keypoints and return the k best matches of minimum distance for each keypoint;
a homography matrix of the transformation between the two images is then computed from the matched point pairs and used to warp the frame before or after the current frame, the RANSAC algorithm being applied during warping to remove outlier point pairs.
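The homography-from-correspondences step can be sketched in plain numpy via the Direct Linear Transform. This is a minimal least-squares version that assumes the outlier pairs have already been removed (the patent uses RANSAC for that); in practice a library routine with built-in robust estimation would be used instead:

```python
import numpy as np

def homography_dlt(src, dst):
    """Estimate the 3x3 homography H with dst ~ H @ src from >= 4 point
    correspondences via the Direct Linear Transform (least squares).
    src, dst: (N, 2) arrays of matched keypoint coordinates.
    No RANSAC here; clean matches are assumed."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # the null-space direction of A (last right-singular vector) is h
    _, _, Vt = np.linalg.svd(np.asarray(A, float))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]              # normalise so H[2, 2] == 1

# sanity check: points related by a pure translation (+5, -3)
src = np.array([[0, 0], [10, 0], [10, 10], [0, 10], [3, 7]], float)
dst = src + np.array([5.0, -3.0])
H = homography_dlt(src, dst)
```

The recovered matrix is then applied to warp the neighbouring frame onto the current frame's geometry.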
According to an aspect of the present invention, in step (b), the moving objects in the image are preliminarily extracted in two ways, based respectively on the velocity attribute and the time-series attribute of the moving objects;
when extracting moving objects based on their velocity attribute, dense optical flow is computed between the current frame and the frame before it using an optical flow method to obtain the flow state of every pixel: a pixel whose speed and direction are unchanged is taken as background, and otherwise as a foreground object;
when extracting moving objects based on their time-series attribute, the foreground and background are preliminarily separated with a three-frame difference method, exploiting the motion characteristics of the moving objects along the time series;
the road objects on which the moving objects depend are extracted from the image using a deep-learning-based D-LinkNet network.
According to one aspect of the invention, when extracting moving objects based on their velocity attribute, the previous frame and the current frame requiring optical flow computation are input in sequence, and an image scale is specified for building a pyramid from each image;
the number of pyramid levels, the averaging window size, the number of algorithm iterations at each pyramid level, the number of neighbouring pixels used for the polynomial expansion at each pixel, the Gaussian standard deviation for smoothing the derivatives, and the initial flow approximation are then determined;
the computed optical flow is converted from Cartesian coordinates to polar coordinates to obtain the speed and direction of every pixel;
according to the computed speed and direction, a pixel whose speed and direction values are both 0 is marked as background, and otherwise as a foreground object.
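The Cartesian-to-polar conversion and the background rule can be sketched in numpy; this stands in for the library call a real pipeline would use, and the flow-array layout (H, W, 2) is an assumption:

```python
import numpy as np

def flow_to_foreground(flow):
    """Convert a dense flow field of shape (H, W, 2) from Cartesian (dx, dy)
    to polar (speed, direction) and mark foreground pixels: a pixel whose
    speed and direction are both 0 is background, as in the claims."""
    dx, dy = flow[..., 0], flow[..., 1]
    speed = np.hypot(dx, dy)           # magnitude of motion per pixel
    direction = np.arctan2(dy, dx)     # direction of motion in radians
    foreground = (speed > 0) | (direction != 0)
    return speed, direction, foreground

# toy 2x2 flow field: only the bottom-right pixel moves
flow = np.zeros((2, 2, 2))
flow[1, 1] = (3.0, 4.0)                # dx = 3, dy = 4, so speed = 5
speed, direction, fg = flow_to_foreground(flow)
```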
According to an aspect of the present invention, when extracting the moving objects based on their time-series attribute in step (b), all three collected frames are read and converted from RGB images to greyscale images;
inter-frame differences are taken between the greyscale current frame and the greyscale frames before and after it, yielding two difference images;
a threshold is set and each difference image is binarised, giving two binary images that separate the foreground objects from the background;
an AND operation is applied to the two binary images, and the moving objects are extracted from the intersection image.
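The three-frame difference can be sketched directly in numpy; the threshold of 40 is the value the embodiment uses later, and the helper name is illustrative:

```python
import numpy as np

def three_frame_difference(prev, cur, nxt, thresh=40):
    """Three-frame difference on greyscale images (2-D uint8 arrays):
    binarise |cur - prev| and |cur - nxt| at `thresh` and AND the two
    masks, keeping only pixels that move relative to both neighbours."""
    d1 = np.abs(cur.astype(np.int16) - prev.astype(np.int16))
    d2 = np.abs(cur.astype(np.int16) - nxt.astype(np.int16))
    b1 = d1 > thresh                   # moving w.r.t. previous frame
    b2 = d2 > thresh                   # moving w.r.t. next frame
    return b1 & b2                     # present in both -> moving object

# toy 1x3 strip: a bright blob sits at pixel 1 only in the current frame
prev = np.array([[200, 10, 10]], np.uint8)
cur = np.array([[10, 200, 10]], np.uint8)
nxt = np.array([[10, 10, 200]], np.uint8)
mask = three_frame_difference(prev, cur, nxt)
```

The cast to int16 before subtraction avoids uint8 wrap-around in the difference.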
According to one aspect of the invention, when extracting the road objects on which the moving objects depend in step (b), a sample data set for network training and testing is first constructed to generate the road extraction network;
remote sensing images from target satellites are selected, and the road objects and other objects in the images are labelled;
the data set is divided into a training set, a validation set and a test set in a certain proportion, the training set being used for iterative training of the network parameters and the validation set for verifying whether the trained model reaches the expected accuracy;
the current frame is then input as test data into the trained road extraction network to obtain the final road segmentation result.
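The patent only says the data set is divided "in a certain proportion"; a common choice such as 70/15/15 could be applied with a helper like the following (the split ratios and function are illustrative assumptions):

```python
import random

def split_dataset(samples, ratios=(0.7, 0.15, 0.15), seed=0):
    """Shuffle labelled samples and split them into train/val/test lists.
    `ratios` gives the train and validation fractions; the remainder
    becomes the test set."""
    rng = random.Random(seed)          # fixed seed for a reproducible split
    items = list(samples)
    rng.shuffle(items)
    n = len(items)
    n_train = int(ratios[0] * n)
    n_val = int(ratios[1] * n)
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]
    return train, val, test

train, val, test = split_dataset(range(100))
```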
According to an aspect of the present invention, the road extraction network is a U-shaped network based on an encoder-bridge-decoder structure;
the encoder part of the network is a ResNet-34, and the bridge part consists of five convolution blocks;
the decoder part performs the inverse operation of the encoder: it upsamples and is fused with the encoder features at the same level.
According to an aspect of the present invention, in step (c), the three types of results extracted in step (b) are each stored in binary form as 0 and 1, where 0 represents the background and 1 a foreground object;
the targets whose value is 1 in the binarised results are then extracted as the accurate extraction result of the moving objects.
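Reading step (c) as an overlay analysis, the three binary results can be fused with a per-pixel AND; this sketch assumes the fusion is a strict intersection (a pixel counts as a moving object only when all three masks agree), and the mask names are illustrative:

```python
import numpy as np

def fuse_masks(flow_mask, diff_mask, road_mask):
    """Fuse the three binary results (optical flow, three-frame difference,
    road segmentation): keep a pixel as a moving object only when it is 1
    in all three masks."""
    return (flow_mask.astype(bool)
            & diff_mask.astype(bool)
            & road_mask.astype(bool))

flow_mask = np.array([[1, 1, 0, 0]], np.uint8)   # moving per optical flow
diff_mask = np.array([[1, 0, 1, 0]], np.uint8)   # moving per frame difference
road_mask = np.array([[1, 1, 1, 0]], np.uint8)   # lies on a road
fused = fuse_masks(flow_mask, diff_mask, road_mask)
```

Only the first pixel, flagged by both motion cues and lying on a road, survives; this is how a tall building with pseudo-motion off the road network would be screened out.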
According to an aspect of the present invention, the morphological processing in step (d) applies a morphological opening with a 3 × 3 circular structuring element to the final extracted image to eliminate spots;
a morphological closing with a 3 × 3 circular structuring element is then applied to the opened image to eliminate holes;
the method further comprises performing connectivity analysis on the morphologically processed image and extracting the final moving objects according to the following rule:
a target is judged to be a vehicle when its size is between 4 and 2000 pixels, its aspect ratio is at most 8, the ratio of its area to the area of its minimum bounding rectangle is greater than 0.2, and its average pixel value is between 10 and 250.
According to the invention, on the basis of the spectral, textural and time-series characteristics of the video satellite, the spatial geometry, motion speed and road environment of the moving objects are extracted and used jointly for moving-object detection, so that false moving objects caused by parallax and those caused by registration errors and random noise are avoided, effectively improving the accuracy of moving-object detection and reducing the false detection rate.
According to one scheme of the invention, a deep-learning algorithm is used to extract the road objects on which the moving objects depend. Objects with pseudo-motion characteristics that were mistaken for moving objects in the preliminary extraction can thereby be screened out, the moving objects in satellite video can be extracted more accurately, and the shortcomings of the prior art are overcome.
According to one scheme of the invention, the moving objects in the image are extracted separately with an optical flow method and an improved three-frame difference method, and the results of the two methods are then overlaid with the road extraction result, so that the two methods correct each other and the accuracy of the final moving-object extraction is further improved.
According to one scheme of the invention, the acquired images are preprocessed before the moving objects and road objects are extracted. The preprocessing step mainly comprises inter-frame registration of the frames before and after the current frame against the current frame, which prevents image distortion caused by satellite jitter and similar factors from affecting the subsequent object extraction.
Drawings
FIG. 1 is a flow diagram schematically illustrating a moving object extraction method based on multi-source information from satellite hyper-temporal data, according to one embodiment of the present invention;
FIG. 2 is a schematic diagram showing the best-matching keypoint pairs between the current frame and the previous frame during image registration;
FIG. 3 is a schematic diagram illustrating the velocity of a moving object obtained by an optical flow method;
FIG. 4 is a schematic diagram of a moving object obtained by an improved three-frame difference method;
FIG. 5 is a schematic diagram illustrating a road object (road) extracted based on a deep learning method;
FIG. 6 schematically shows a current frame (left) and the same frame after final object extraction (right).
Detailed Description
To illustrate the embodiments of the present invention and the technical solutions of the prior art more clearly, the drawings used in the embodiments are briefly described below. The drawings described below show only some embodiments of the invention, and a person skilled in the art can derive other drawings from them without inventive effort.
The present invention is described in detail below with reference to the drawings and specific embodiments; content already covered above is not repeated here, and the embodiments of the invention are not limited to the following examples.
Referring to fig. 1, in the moving object extraction method based on multi-source information from satellite hyper-temporal data, hyper-temporal images of a certain region are collected first; a hyper-temporal image (also called a multi-temporal image) is image data with a continuous time series, and can also be understood as video data. The acquired images are then preprocessed by a data preprocessing module (image reading may also be completed by this module), after which the moving objects in the images are preliminarily extracted by a target information extraction module. Considering that tall objects may be misjudged as moving objects, according to the concept of the present invention the road objects on which the moving objects depend are additionally extracted, based on the background-environment dependence of the moving objects. Taking a vehicle and a road as an example, the vehicle is the moving object and the road is the road object on which the vehicle depends. The preliminarily extracted moving objects may include both vehicles and high-rise buildings, yet only those that appear on a road are moving objects that should be extracted. The invention therefore additionally considers the environment dependence of the moving objects and refines the preliminary extraction, eliminating the prior-art defect that objects with pseudo-motion characteristics, such as high-rise buildings or steel towers, are misjudged as moving objects.
The method of the present invention is described in detail below, taking Jilin-1 Video-03 satellite footage of the airport area of Atlanta as the embodiment shown in FIG. 1.
When collecting images, the collected hyper-temporal images comprise the current frame together with one frame collected within a period before it and one after it, forming a three-frame sequence of previous frame, current frame and next frame. The data-reading interval is determined by the speed and length of the objects to be extracted and by the frame rate of the video, so that, as far as possible, moving objects do not overlap between adjacent frames. Specifically, for the three frames, the acquisition interval between adjacent frames should ensure that the displacement of a moving object is between 10 m and 100 m, to avoid overlap of the moving object between frames. The previous frame, the current frame and the next frame may be read in sequence: the first two frames are initialised, and the next image read becomes the frame after the current frame. In this embodiment, the frame rate of the Jilin-1 video satellite is 10 frames/s, the ground target speed is typically between 20 and 160 m/s, and the ground vehicle target size is typically between 3 and 15 m. Therefore, 0.5 s is set as the acquisition (or reading) interval, so that most vehicle targets can be detected. Only a very small number of very long trucks, and vehicles moving very slowly, may produce holes in the difference images, and these can be eliminated in post-processing. The three frames, whose interval is determined jointly by the speed of the moving objects and the frame rate of the hyper-temporal data, are thus not consecutive in time.
The idea of the image preprocessing step is to take the current frame as the reference and register (transform) the two frames extracted before and after it in turn, so that image distortion caused by factors such as satellite jitter does not affect the subsequent object extraction. Specifically, referring to fig. 2, the current frame and the previous frame are read into arrays, and keypoint detection and feature description are performed on both frames with an algorithm such as SIFT, SURF, ORB or AKAZE. In this embodiment the AKAZE algorithm is used, and 644 keypoints are identified and screened. Once the keypoints of both images are identified, feature matching is performed with a matcher: the descriptor distance between each pair of keypoints is computed and the k best matches of minimum distance are returned for each keypoint. As shown in fig. 2, a total of 515 best-matching keypoint pairs are returned in this embodiment. A homography matrix of the transformation between the two images is then computed from the matched pairs, and the frame before the current frame is warped accordingly. To ensure the best warping result, the RANSAC algorithm is used in this embodiment to remove outlier pairs during warping. After the warping is finished, the registered previous frame is obtained.
The current frame and the frame after it are then read into arrays, and the registered next frame is obtained by the same steps.
The above steps complete the inter-frame registration (i.e. preprocessing) of the images and ensure that the objects in the images are not distorted in a way that would affect the subsequent detection and recognition. Next, the moving objects are preliminarily extracted from the images using two methods: extraction based on the velocity attribute and extraction based on the time-series attribute of the moving objects. When extracting based on the velocity attribute, dense optical flow is computed between the current frame and the previous frame with an optical flow method to obtain the flow state of every pixel; a pixel whose speed and direction are unchanged is regarded as background, and otherwise as a foreground object. In this way the pixel-level speed of the targets is obtained, realising speed measurement of the targets. When extracting based on the time-series attribute, the foreground and background are preliminarily separated with an improved three-frame difference method, exploiting the motion characteristics of the targets along the time series.
Extracting moving objects with the optical flow method rests on the brightness constancy and local flow constancy assumptions: the brightness of a point does not change as it moves over time. This is the basic assumption of optical flow, which every optical flow variant must satisfy to derive the optical flow equation. In addition, the change over time must not cause a drastic change in position, so that the grey level can be differentiated with respect to position; the partial derivative of grey level with respect to position can then be approximated by the grey-level change caused by a unit position change between consecutive frames. In this embodiment the Gunnar Farneback algorithm is used to compute dense optical flow: all points in the image are matched point by point, the offset of every point is computed to obtain the flow field, and registration is then performed. The previous frame and the current frame requiring flow computation are input in sequence, the image scale is set to 0.5, and a pyramid is built for each image. The number of pyramid levels is set to 3, the averaging window size to 12, the number of iterations at each pyramid level to 3, and the number of neighbouring pixels used for the polynomial expansion at each pixel to 7. The Gaussian standard deviation used to smooth the derivatives serving as the basis for the polynomial expansion is set to 1.5. In addition, this embodiment uses the input flow as the initial flow approximation, although in other embodiments a Gaussian filter may be used.
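The embodiment's parameter values map naturally onto the arguments of OpenCV's `cv2.calcOpticalFlowFarneback`; the patent does not name a library, so this mapping is an assumption:

```python
# Parameter set from the embodiment, keyed by the argument names of
# cv2.calcOpticalFlowFarneback (library choice is an assumption).
farneback_params = dict(
    pyr_scale=0.5,   # image scale between pyramid levels
    levels=3,        # number of pyramid levels
    winsize=12,      # averaging window size
    iterations=3,    # iterations at each pyramid level
    poly_n=7,        # neighbourhood size for the polynomial expansion
    poly_sigma=1.5,  # Gaussian std used to smooth the derivatives
    flags=4,         # cv2.OPTFLOW_USE_INITIAL_FLOW: reuse the input flow
)
# Usage (requires OpenCV; init_flow is a previously computed flow field):
# flow = cv2.calcOpticalFlowFarneback(prev_gray, cur_gray, init_flow,
#                                     **farneback_params)
```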
After this setup, the computed optical flow is converted from Cartesian to polar coordinates, and the speed and direction of the motion at every pixel are obtained. As shown in fig. 3, the brighter a point in the flow map, the higher its speed, so a threshold can be set to determine the moving objects in the flow map. For example, if the speed values are 0, 1, 2, 3, ..., points with a value of 1 or more can be regarded as moving objects, i.e. the threshold can be set to 1. The moving objects, i.e. the targets with speed of at least 1, can thus be preliminarily extracted according to the velocity attribute. The direction of motion may additionally be characterised by colour. According to the computed speed and direction, a pixel whose speed and direction values are both 0 is marked as background, and otherwise as a foreground object.
When extracting moving objects with the three-frame difference method, the three frames acquired in the image collection step are read and converted from RGB to greyscale, and inter-frame differences are taken between the greyscale current frame and the greyscale frames before and after it, yielding two difference images. A threshold is then set and the two difference images are binarised into two binary maps. In this embodiment the threshold is set to 40 on the image grey scale (the higher the value, the brighter the pixel): during binarisation, a pixel above the threshold becomes 1 and otherwise 0. The foreground (bright-spot regions) and background (black regions) can thus be distinguished, giving the moving objects extracted between the current frame and each neighbouring frame. The two binary maps are then combined with an AND operation, i.e. an intersection (also called overlay analysis): a point that is 1 in both maps is a moving object (bright point), so the moving objects in the image can be preliminarily extracted from the intersected binary map, as shown in fig. 4.
Through the above steps, the moving objects in the image are preliminarily extracted with the optical flow method and the three-frame difference method, according to their velocity and time-series attributes respectively. According to the concept of the invention, the road objects on which the moving objects depend must also be extracted. For this, the invention uses a deep-learning-based target (road) recognition method to extract the environment element on which the targets depend (i.e. the road objects); extracting the environment of ground moving targets here mainly means extracting road objects from high-resolution satellite remote sensing images. The D-LinkNet network is adopted for road extraction, and a road extraction network is first constructed. In this embodiment the vehicle is the moving object and the road is the road object. The road extraction network is a U-shaped network based on an encoder-bridge-decoder structure: the encoder is a ResNet-34, the bridge consists of five convolution blocks, and the decoder performs the inverse operation of the encoder, upsampling and fusing with the encoder features at the same level, thereby achieving feature fusion across different spatial scales. To build the network, a sample data set for training and testing is used to generate the road extraction network. Specifically, representative remote sensing images of the target satellites must be selected, for example by grey-level attributes such as the typical grey values of road objects at different times in the historical data.
A data set constructed from such images ensures that the trained model can recognise various road objects at all times; other attributes of the target object, such as spatial or spectral characteristics, can of course also be used to reflect its properties and aid model recognition. After representative images are selected, the road objects and other objects in them are labelled, and the data set is divided into a training set, a validation set and a test set in a certain proportion; the training set is used for iterative training of the network parameters and the validation set for verifying whether the trained model reaches the expected accuracy. The training set is input into the road extraction network and the parameters are trained iteratively until the network achieves the desired loss and accuracy on the training and validation sets. Finally, the video satellite image frame (i.e. the current frame) is input into the trained network as test data, and the final road segmentation result, i.e. the road map corresponding to the current frame, is output, as shown in fig. 5.
At this point, the moving targets extracted in the two ways and the road target obtained by the deep learning algorithm are all available. These three extraction results are then subjected to superposition analysis; that is, the final extraction of the moving target is completed by using them jointly. Specifically, the three results (the moving targets extracted by the optical flow method and by the three-frame difference method, and the road map extracted by deep learning) are each stored as a binary image of 0 and 1, with the target stored as 1 (a bright spot): 0 represents a background object and 1 represents foreground. Targets whose value is 1 in all binarized results are then extracted from the current frame image, yielding the accurate extraction result of the moving target fused from multi-source information attributes.
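The superposition analysis described above can be sketched as a per-pixel AND of the three binary masks. This is one plausible reading of the fusion rule (only pixels that all three sources mark as 1 survive); the mask names are assumed for illustration.

```python
import numpy as np

def fuse_masks(flow_mask, diff_mask, road_mask):
    """Fuse three binary masks (1 = foreground/road, 0 = background).

    A pixel is kept as a moving target only when all three sources
    agree: flagged by optical flow, flagged by three-frame
    differencing, and lying on the extracted road.
    """
    return (flow_mask & diff_mask & road_mask).astype(np.uint8)

flow_mask = np.array([[1, 1, 0],
                      [0, 1, 1],
                      [0, 0, 1]], dtype=np.uint8)
diff_mask = np.array([[1, 0, 0],
                      [0, 1, 1],
                      [0, 1, 1]], dtype=np.uint8)
road_mask = np.array([[1, 1, 1],
                      [1, 1, 0],
                      [0, 0, 1]], dtype=np.uint8)

fused = fuse_masks(flow_mask, diff_mask, road_mask)
# only pixels where all three masks are 1 survive
```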
In essence, therefore, the present invention extracts the moving target from the image with the optical flow method and the three-frame difference method respectively, that is, it integrates the target velocity attribute obtained by the optical flow method with the foreground target obtained by the three-frame difference method. Each method has its own strengths and weaknesses; combining them lets the two correct each other, and together with the environment attribute on which the moving target depends, the most accurate extraction result can be obtained. Although the method has at this point essentially completed the extraction of the moving target, the result is still preliminary and must be finalized by a post-processing step.
Specifically, as shown in fig. 6, the data post-processing module performs morphological processing on the target extraction image, eliminating the influence of small spots and small holes (i.e. random noise and image registration errors) on the integrity of the moving target (vehicle) and producing the final result. A 3x3 circular structuring element is first used to apply a morphological opening to the image, removing small isolated points (spots). A 3x3 circular structuring element is then used to apply a morphological closing to the opened image, removing the influence of small holes on the integrity of the vehicle target. Connectivity analysis is then performed on the result, and the desired moving targets are extracted according to a set of rules, which may be formulated from the target's size, its aspect ratio, the ratio of the target area to the minimum bounding rectangle area, and so on. For example, a component is accepted as a vehicle target only when its size is >4 pixels and at the same time <2000 pixels, its aspect ratio is <=8, the ratio of its area to that of its minimum bounding rectangle is >0.2, and its average pixel value is >10 and <250. After the morphological analysis and the connectivity analysis, the result image is stored, giving the current frame labeled with the final moving target extraction result.
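The connectivity-analysis rule quoted above can be written directly as a predicate over the measured properties of a connected component. The function and parameter names are assumptions for illustration; the thresholds are those stated in the text.

```python
def is_vehicle_candidate(n_pixels, aspect_ratio, rect_fill_ratio, mean_value):
    """Vehicle-acceptance rule from the connectivity analysis.

    n_pixels        : size of the connected component, in pixels
    aspect_ratio    : length/width of the component's bounding box
    rect_fill_ratio : component area / minimum bounding-rectangle area
    mean_value      : mean pixel value of the component
    """
    return (4 < n_pixels < 2000          # large enough to be a car, small enough not to be a building
            and aspect_ratio <= 8        # rejects elongated structures such as lane markings
            and rect_fill_ratio > 0.2    # rejects sparse, scattered noise components
            and 10 < mean_value < 250)   # rejects near-black and saturated blobs

ok = is_vehicle_candidate(50, 2.5, 0.6, 120)        # plausible vehicle
too_small = is_vehicle_candidate(3, 2.5, 0.6, 120)  # isolated noise point
too_long = is_vehicle_candidate(50, 12.0, 0.6, 120) # likely a road marking
```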
In summary, on the basis of the traditional inter-frame difference and optical flow methods, this moving target detection technique for hyper-temporal data integrates inter-frame registration, optical flow computation, three-frame differencing, deep-learning-based extraction of the target's dependent environment, and post-processing. It thus exploits the velocity, time-series, spectral and spatial-position attributes of the target, perceiving the target's multi-dimensional attributes from different angles, to form a complete moving target detection method for satellite hyper-temporal data. Compared with traditional moving target detection methods such as inter-frame differencing, the optical flow method and background modeling, this method is more systematic, achieves higher extraction accuracy, enables more precise detection of moving targets, and is more practical for applications on satellite hyper-temporal data. It therefore solves the low detection accuracy and high false-detection rate of current moving target detection algorithms when applied to satellite video data, and realizes more accurate detection of moving targets.
In conclusion, the method can support research on the detection and tracking of moving targets (ground moving vehicle targets) in space-based satellite hyper-temporal data. It shows very good universality and high extraction accuracy in moving target extraction from such data, is of significance to research on vehicle-flow analysis from video satellites and on the positioning and tracking of ground moving targets, and can be widely applied in fields such as intelligent transportation, smart cities and emergency rescue.
The above description is only one embodiment of the present invention and is not intended to limit it; various modifications and variations will be apparent to those skilled in the art. Any modification, equivalent replacement or improvement made within the spirit and principle of the present invention shall fall within its protection scope.

Claims (10)

1. A moving object extraction method based on satellite hyper-temporal data multi-source information comprises the following steps:
a. collecting hyper-temporal images of an area, and preprocessing the images;
b. preliminarily extracting a moving target in the image, and extracting a road target on which the moving target depends in the image;
c. completing further extraction of the moving target in the image according to the preliminarily extracted moving target and the road target;
d. performing morphological processing on the target result image extracted in step (c) to obtain the final result.
2. The method of claim 1, wherein in step (a), the current frame image is collected together with one frame within a period before it and one frame within a period after it, forming a three-frame sequence of the previous frame, the current frame and the next frame;
the time interval of the three-frame acquisition is determined by the speed and length of the moving target and the frame rate of the video, such that the displacement of the moving target between the extracted adjacent frames is between 10 m and 100 m.
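One way to realize the interval selection of claim 2 is to pick the smallest integer frame gap whose implied displacement reaches the lower bound. The 10-100 m window comes from the claim; the selection formula and function name are illustrative assumptions.

```python
import math

def frame_gap(speed_mps, frame_rate_hz, min_disp_m=10.0, max_disp_m=100.0):
    """Choose a frame gap so that a target moving at speed_mps covers
    between min_disp_m and max_disp_m between the extracted frames."""
    per_frame = speed_mps / frame_rate_hz              # metres moved per video frame
    gap = max(1, math.ceil(min_disp_m / per_frame))    # smallest gap reaching min_disp_m
    if gap * per_frame > max_disp_m:
        raise ValueError("speed/frame-rate combination cannot satisfy the window")
    return gap

gap = frame_gap(20.0, 25.0)  # a 20 m/s vehicle at 25 fps moves 0.8 m per frame
```

With these numbers a gap of 13 frames gives about 10.4 m of motion, just inside the claimed window.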
3. The method according to claim 2, wherein the preprocessing in step (a) is to perform inter-frame registration on a frame of image extracted before and after the current frame by using the current frame as a reference;
the inter-frame registration comprises reading the current frame image and the frame before or after it into arrays, and performing keypoint detection and feature description on the two frames with the SIFT (scale-invariant feature transform), SURF (speeded-up robust features), ORB (oriented FAST and rotated BRIEF) or AKAZE algorithm;
performing feature matching on the key points on the two frames of images by using a matcher, wherein the matching method comprises the steps of calculating the distance of descriptors between each pair of key points and returning the minimum distance of k optimal matches with each key point;
and calculating the homography matrix of the transformation between the two images from the matched point pairs, warping the frame before or after the current frame accordingly, and removing abnormal point pairs with the RANSAC algorithm during the warping.
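Once the homography matrix has been estimated from the matched keypoint pairs (in practice, e.g., with OpenCV's findHomography using RANSAC), applying it to pixel coordinates is a small linear-algebra step. The sketch below shows only that application, with an assumed pure-translation homography as the example.

```python
import numpy as np

def warp_points(H, pts):
    """Apply a 3x3 homography H to an (N, 2) array of pixel coordinates."""
    pts = np.asarray(pts, dtype=float)
    homo = np.hstack([pts, np.ones((len(pts), 1))])   # lift to homogeneous coords
    mapped = homo @ H.T                               # apply the projective transform
    return mapped[:, :2] / mapped[:, 2:3]             # divide out the scale factor

# Assumed example: a pure translation shifting every pixel by (+5, -3)
H = np.array([[1.0, 0.0,  5.0],
              [0.0, 1.0, -3.0],
              [0.0, 0.0,  1.0]])
shifted = warp_points(H, [[0, 0], [10, 20]])
```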
4. The method according to claim 2, wherein in step (b), the moving target in the image is preliminarily extracted based on its velocity attribute and its time-series attribute, respectively;
when the moving target is extracted based on its velocity attribute, dense optical flow is computed between the current frame and the preceding frame with the optical flow method to obtain the optical flow state of each pixel; a pixel whose speed and direction values are both zero is taken as background, and otherwise as a foreground target;
when the moving target is extracted based on its time-series attribute, the foreground and background targets are preliminarily separated with the three-frame difference method by exploiting the motion characteristics of the moving target in the time series;
and the road target on which the moving target depends is extracted from the image with the deep-learning-based D_LinkNet network.
5. The method according to claim 4, wherein when extracting the moving object based on the velocity attribute of the moving object, a previous frame image and a current frame image which need optical flow calculation are sequentially input, and an image proportion is specified to construct a pyramid for each image;
determining the number of layers of a pyramid, the size of an average window, the iteration times of an algorithm in each layer of an image pyramid, the number of adjacent pixel points expanded by a calculation polynomial at each pixel point, a Gaussian standard deviation for smoothing a derivative and initial flow approximation;
converting the calculated optical flow from a Cartesian coordinate system to a polar coordinate system, and acquiring the speed and direction of each pixel point;
and classifying the pixels according to the computed optical flow: a pixel whose speed and direction values are both 0 is taken as a background target, and otherwise as a foreground target.
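The Cartesian-to-polar conversion and background test of claim 5 can be sketched with NumPy as follows (OpenCV's cartToPolar performs the same conversion on dense flow fields); the epsilon tolerance is an assumption, added to absorb floating-point noise around zero.

```python
import numpy as np

def classify_flow(u, v, eps=1e-6):
    """Convert per-pixel flow (u, v) to polar form and flag the foreground.

    magnitude : per-pixel speed
    angle     : per-pixel direction, in radians
    A pixel whose speed (and hence direction) is effectively zero is
    treated as background (0); anything else is foreground (1).
    """
    magnitude = np.hypot(u, v)
    angle = np.arctan2(v, u)
    foreground = (magnitude > eps).astype(np.uint8)
    return magnitude, angle, foreground

u = np.array([[0.0, 2.0], [0.0, 0.0]])
v = np.array([[0.0, 0.0], [1.0, 0.0]])
mag, ang, fg = classify_flow(u, v)
```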
6. The method according to claim 4, wherein when the moving object is extracted based on the time series property of the moving object in the step (b), all the three collected frames of images are read, and the images are converted from RGB (red, green and blue) images into gray-scale images;
performing inter-frame difference on the gray-scale image of the current frame and the gray-scale images of the frames before and after the current frame to obtain two difference value images;
setting a threshold value to carry out binarization on the two difference images respectively to obtain two binary images which distinguish foreground objects from background objects;
and (4) performing AND operation on the two binary images, and extracting a moving object in the intersected image.
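The three-frame difference of claim 6 reduces to two thresholded absolute differences combined with a logical AND. A minimal NumPy sketch, with an assumed threshold value:

```python
import numpy as np

def three_frame_difference(prev_gray, curr_gray, next_gray, thresh=15):
    """Three-frame difference on grayscale images (uint8 arrays).

    The two absolute difference images are thresholded into binary
    masks and then ANDed: a pixel is foreground only if it differs
    both from the previous frame and from the next frame."""
    d1 = np.abs(curr_gray.astype(int) - prev_gray.astype(int))
    d2 = np.abs(next_gray.astype(int) - curr_gray.astype(int))
    b1 = (d1 > thresh).astype(np.uint8)
    b2 = (d2 > thresh).astype(np.uint8)
    return b1 & b2

prev_gray = np.array([[10, 10], [10, 10]], dtype=np.uint8)
curr_gray = np.array([[10, 200], [10, 10]], dtype=np.uint8)  # object present here
next_gray = np.array([[10, 10], [10, 10]], dtype=np.uint8)   # object has moved on
mask = three_frame_difference(prev_gray, curr_gray, next_gray)
```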
7. The method according to claim 5 or 6, wherein when extracting the road target on which the moving object depends in step (b), firstly constructing a sample data set of network training and testing for generating a road extraction network;
selecting remote sensing images of some target satellites, and labeling road targets and other targets in the images;
dividing a data set into a training set, a verification set and a test set according to a certain proportion, wherein the training set is used for carrying out iterative training on network parameters, and the verification set is used for verifying whether a trained model can reach the expected precision;
and (4) inputting the current frame image as a test data set into the trained road extraction network to obtain a final road segmentation result.
8. The method as claimed in claim 7, wherein the road extraction network is a U-type network based on an encoder-bridge-decoder structure;
the encoder part of the network is a ResNet34, and the bridge part consists of five convolution blocks;
the decoder part is the inverse operation of the encoder part, and adopts up-sampling and is overlapped with the encoder part at the same level.
9. The method according to claim 8, wherein in the step (c), the three types of results extracted in the step (b) are respectively subjected to binary storage of 0 and 1, wherein 0 represents a background object and 1 represents a foreground;
and extracting the targets with the numerical values of 1 after binarization storage to serve as an accurate extraction result of the moving targets.
10. The method according to claim 1, wherein the morphological processing in step (d) comprises performing a morphological opening operation on the final target extraction image with a 3x3 circular template structure to eliminate spots in the image;
and performing a morphological closing operation on the opened image with a 3x3 circular template structure to eliminate holes in the image;
the method also comprises the steps of carrying out connectivity analysis on the images subjected to morphological processing, and extracting a final moving target according to the following rules:
the vehicle target is determined when the target size is between 4 pixels and 2000 pixels, the target aspect ratio is below 8, the ratio of the target area to the minimum circumscribed rectangle area is greater than 0.2, and the average pixel value of the target is between 10 and 250.
CN202110172481.4A 2021-02-08 2021-02-08 Moving object extraction method based on satellite time-exceeding phase data multi-source information Pending CN112907626A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110172481.4A CN112907626A (en) 2021-02-08 2021-02-08 Moving object extraction method based on satellite time-exceeding phase data multi-source information


Publications (1)

Publication Number Publication Date
CN112907626A true CN112907626A (en) 2021-06-04

Family

ID=76122744

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110172481.4A Pending CN112907626A (en) 2021-02-08 2021-02-08 Moving object extraction method based on satellite time-exceeding phase data multi-source information

Country Status (1)

Country Link
CN (1) CN112907626A (en)


Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2005235104A (en) * 2004-02-23 2005-09-02 Jr Higashi Nippon Consultants Kk Mobile object detecting system, mobile object detecting device, mobile object detecting method, and mobile object detecting program
CN1885346A (en) * 2006-06-01 2006-12-27 电子科技大学 Detection method for moving target in infrared image sequence under complex background
US20110142283A1 (en) * 2009-12-10 2011-06-16 Chung-Hsien Huang Apparatus and method for moving object detection
CN102184550A (en) * 2011-05-04 2011-09-14 华中科技大学 Mobile platform ground movement object detection method
CN106683119A (en) * 2017-01-09 2017-05-17 河北工业大学 Moving vehicle detecting method based on aerially photographed video images
JP2018036848A (en) * 2016-08-31 2018-03-08 株式会社デンソーアイティーラボラトリ Object state estimation system, object state estimation device, object state estimation method and object state estimation program
CN109102523A (en) * 2018-07-13 2018-12-28 南京理工大学 A kind of moving object detection and tracking


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
BO DU et al., "Object Tracking in Satellite Videos Based on a Multiframe Optical Flow Tracker", IEEE, 31 December 2019 *
LIU Xin et al., "Target detection and tracking combining four-frame difference and the optical flow method", Opto-Electronic Engineering, 31 December 2018 *
LI Qianqian, "Research on moving vehicle recognition technology in video surveillance", Information Science and Technology series, pages 3-5 *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114866500A (en) * 2022-04-01 2022-08-05 中国卫星海上测控部 Real-time optimization method for multi-source telemetering position control instruction
CN114866500B (en) * 2022-04-01 2024-04-23 中国卫星海上测控部 Real-time optimization method for multisource remote control commands
CN117236566A (en) * 2023-11-10 2023-12-15 山东顺发重工有限公司 Whole-process visual flange plate package management system
CN117236566B (en) * 2023-11-10 2024-02-06 山东顺发重工有限公司 Whole-process visual flange plate package management system
CN117648889A (en) * 2024-01-30 2024-03-05 中国石油集团川庆钻探工程有限公司 Method for measuring velocity of blowout fluid based on interframe difference method
CN117648889B (en) * 2024-01-30 2024-04-26 中国石油集团川庆钻探工程有限公司 Method for measuring velocity of blowout fluid based on interframe difference method
CN118155005A (en) * 2024-05-13 2024-06-07 成都坤舆空间科技有限公司 Ecological restoration map spot matching classification method based on RAFT-Stereo algorithm
CN118155005B (en) * 2024-05-13 2024-08-02 成都坤舆空间科技有限公司 Ecological restoration map spot matching classification method based on RAFT-Stereo algorithm


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination