CN109255317B - Aerial image difference detection method based on double networks - Google Patents

Aerial image difference detection method based on double networks

Info

Publication number
CN109255317B
Authority
CN
China
Prior art keywords
detection
difference
candidate
convolution
layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811015421.6A
Other languages
Chinese (zh)
Other versions
CN109255317A (en)
Inventor
布树辉
李清
韩鹏程
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Northwestern Polytechnical University
Original Assignee
Northwestern Polytechnical University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Northwestern Polytechnical University filed Critical Northwestern Polytechnical University
Priority to CN201811015421.6A priority Critical patent/CN109255317B/en
Publication of CN109255317A publication Critical patent/CN109255317A/en
Application granted granted Critical
Publication of CN109255317B publication Critical patent/CN109255317B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/10 Terrestrial scenes
    • G06V20/13 Satellite images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06K GRAPHICAL DATA READING; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00 Methods or arrangements for recognising patterns
    • G06K9/62 Methods or arrangements for pattern recognition using electronic means
    • G06K9/6217 Design or setup of recognition systems and techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
    • G06K9/6256 Obtaining sets of training patterns; Bootstrap methods, e.g. bagging, boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Computing arrangements based on biological models using neural network models
    • G06N3/04 Architectures, e.g. interconnection topology
    • G06N3/0454 Architectures, e.g. interconnection topology using a combination of multiple neural nets
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/25 Determination of region of interest [ROI] or a volume of interest [VOI]

Abstract

The invention provides an aerial image difference detection method based on dual networks. By introducing deep learning into difference detection, the method avoids the problem of selecting effective feature descriptors that arises in image segmentation and difference detection, because deep learning requires no manual feature design. The deep learning approach also overcomes the poor robustness to illumination of RGB-image difference detection. In addition, the invention replaces the traditional segmentation approach with object detection, which distinguishes individual objects better and yields object position coordinates that are more accurate and easier to express. Determining the related objects of the two images by computing the ROI of the predicted boxes after detection also lowers the requirement on registration accuracy. Most importantly, semantic information about the objects is added: object category information is available, the anti-interference capability is stronger, and the type of difference can be analyzed better.

Description

Aerial image difference detection method based on double networks
Technical Field
The invention belongs to the field of image processing and machine vision, and relates to a difference detection method for aerial images based on double networks.
Background
In recent years, with the rapid development of unmanned aerial vehicles, UAVs have been widely used in fields such as agriculture, geology, forestry, oceanography, geographical mapping, military reconnaissance and environmental protection. Never before have we been able to measure the earth so accurately in both time and space, or to collect such varied data so quickly. In the past, airborne data were typically acquired by satellites or aircraft, but UAVs are superior "air sensors" to both. Since cloud cover shields roughly two thirds of the earth's surface from satellite view, a UAV can collect data more accurately and more frequently; compared with an aircraft, a UAV is cheaper, easier to operate and safer. A UAV can provide high-precision overhead RGB images anytime and anywhere.
Difference detection analyzes aerial images of the same area taken at different moments using image processing and mathematical models, and detects how the imaged scene changes over time. Change detection draws on several disciplines, including machine vision, image processing and computer science; it is an important direction in current UAV aerial image analysis and can be applied in many fields, such as monitoring forest coverage and the dynamics of rivers and lakes for resource protection, monitoring illegal buildings in urban planning, dynamic monitoring of suspicious targets in military reconnaissance, monitoring oil pipeline leakage, and natural disaster assessment.
Currently, difference detection is mainly classified into pixel-level, feature-level and object-level difference detection. Pixel-level difference detection considers only the characteristics of individual pixels; its precision is low, it is sensitive to noise, and its results are too fragmented to analyze and describe easily. Feature-level difference detection analyzes features extracted from the original images; it is mainly used to detect differences in ground features with distinctive edge or regional characteristics and is largely qualitative. Object-level detection, typically based on segmentation, can integrate feature information and spatial information, suppresses noise and achieves high detection precision. At present, two object-level approaches dominate. The first clusters and segments the pictures and then judges changes by comparing manually designed features of the segmented regions in the two pictures, but finding a suitable feature descriptor requires continual trial and error. The second compares the relation between two pictures using a hybrid Markov model, but this model places high demands on the registration of the two pictures. Both common object-level methods require clustering segmentation, yet traditional segmentation is hard to perform accurately in real time and is sensitive to illumination. Moreover, clustering segmentation only groups related pixels into clusters without adding semantic information, so the position of an object in the picture cannot be determined accurately and, more importantly, without object category information the type of difference cannot be analyzed well.
Disclosure of Invention
With the rapid development of deep learning, deep learning methods have unmatched advantages in feature extraction and classification based on big data. By introducing deep learning into difference detection, the invention avoids the problem of selecting effective feature descriptors in image segmentation and difference detection, because deep learning requires no manual feature design. In addition, the deep learning approach overcomes the poor robustness to illumination of RGB-image difference detection. Meanwhile, the traditional segmentation approach is replaced by object detection, which distinguishes individual objects better than segmentation and yields object position coordinates that are more accurate and easier to express. Determining the related objects of the two images by computing the ROI of the predicted boxes after detection also lowers the requirement on registration accuracy. Most importantly, semantic information about the objects is added: object category information is available, the anti-interference capability is stronger, and the type of difference can be analyzed better.
On this basis, the invention provides an aerial image difference detection method based on dual networks that addresses three tasks: determining whether a change has occurred in the image, locating the changed region precisely, and identifying the type of the changed region. This semantics-based, object-level difference detection method can segment objects in real time under different scenes, has strong anti-interference capability, and allows the difference results and the types of the changed regions to be analyzed better.
The technical scheme of the invention is as follows:
The aerial image difference detection method based on dual networks is characterized by comprising the following steps:
step 1: the unmanned aerial vehicle flies along the same planned path at different moments, and its onboard camera collects aerial images along the route, yielding a series of aerial images shot at different moments; after scaling and normalization, the aerial images are matched to obtain multiple pairs of images T_past, T_current of the same place at different moments;
Step 2: building a double-network model:
the dual-network model consists of a feature extraction part, an object detection part and a difference detection part;
the two branches of the feature extraction part have the same structure and share weights; each branch consists of 10 layers, namely five convolution layers alternating with five pooling layers (convolution, pooling, convolution, pooling, ..., convolution, pooling);
the two branch structures of the object detection part are the same and respectively receive the output of each branch structure of the feature extraction part, and each branch consists of a detection module; the detection module is a convolution layer;
the difference detection part receives the output of the feature extraction part and the output of the object detection part and consists of a CRP layer, a convolution layer aiming at each branch, two fully-connected layers for performing information fusion on the two branches and a Softmax layer;
and step 3: and (3) dual-network training:
step 3.1: making a label: labeling each pair of difference pictures obtained in the step 1 as follows:
respectively marking the position and the category of each object in each picture: the position of an object is recorded by the center (p_x*, p_y*) of the object's circumscribed rectangle and the rectangle's width and height (p_w*, p_h*); each object is described as a vector of the form (p_x*, p_y*, p_w*, p_h*, class), where class is the object category;
marking the areas that differ between the two pictures: the position of a difference region is recorded by the center (q_x, q_y) of the region's circumscribed rectangle and the rectangle's width and height (q_w, q_h); each difference region is described as a vector of the form (q_x, q_y, q_w, q_h);
step 3.2: initializing the part of the model established in the step 2, which needs the autonomous learning parameters;
step 3.3: network training:
step 3.3.1: inputting two aerial pictures which are shot at the same place at different time and have the size of M x M after scaling processing and normalization processing into a network, inputting one picture into one branch, and inputting the other picture into the other branch;
step 3.3.2: extracting the deep semantic features of the pictures with the feature extraction part; the output of the feature extraction part is the feature map F_map, a tensor of size K × K × 512, where K is the spatial size of the feature map;
step 3.3.3: inputting the feature map F_map into the convolution layer of the detection module; the convolved output is a tensor of size K × K × (5+C), in which each 1 × 1 × (5+C) vector records the position, width and height of a candidate box, whether the box contains an object, and the class of that object;
step 3.3.4: discarding candidate boxes that extend beyond the picture boundary, then taking every candidate box whose overlap IoU with an annotated object exceeds 0.7 as a detection positive sample; if no candidate box has IoU > 0.7 with an annotated object, selecting the candidate box with the largest overlap as the positive sample; building the corresponding ground-truth label, which contains the position of the annotated object (p_x*, p_y*, p_w*, p_h*), the confidence p(confidence)*, and the category of the annotated object; the category uses one-hot coding, so the code of the annotated class is p(Class|confidence)* = 1, the codes of the remaining classes are p(Class|confidence)* = 0, and the confidence p(confidence)* is 1;
step 3.3.5: taking every candidate box whose overlap IoU with each annotated object is below 0.3 as a detection negative sample and building the corresponding ground-truth label, in which p(confidence)* is 0;
step 3.3.6: for every detection positive and negative sample, computing the squared-error loss between the predictions p_x, p_y, p_w, p_h, p(Class|confidence), p(confidence) and the corresponding ground-truth labels p_x*, p_y*, p_w*, p_h*, p(Class|confidence)*, p(confidence)*:

L(w) = λ_1((p_x - p_x*)² + (p_y - p_y*)² + (p_w - p_w*)² + (p_h - p_h*)²) + λ_2(p(Class|confidence) - p(Class|confidence)*)² + λ_3(p(confidence) - p(confidence)*)²
where λ_1, λ_2, λ_3 are weighting coefficients that balance the localization, classification and confidence terms;
updating parameters of the feature extraction and object detection part by a gradient descent method according to the square error loss function;
step 3.3.7: calculating an mAP value of a detection result output by the detection module;
step 3.3.8: only the candidate frames with the confidence degrees larger than a set threshold value in the detection result are reserved, and then non-maximum value suppression is carried out on the candidate frames reserved in each image;
step 3.3.9: matching the candidate frames after the non-maximum value inhibition in the two branches of the object detection part to obtain a pair of candidate frames corresponding to the same labeled object, only keeping one candidate frame with high confidence in the pair of candidate frames, and then combining the candidate frame prediction results of the two branches to finally obtain a group of candidate frames, wherein the number of the candidate frames is n;
step 3.3.10: selecting one candidate box from the set of candidate boxes obtained in step 3.3.9; according to the position of the candidate box, computing the corresponding position on the feature map F_map and cropping the corresponding region from the feature maps of the two branches;
step 3.3.11: convolving the intercepted areas by a convolution layer respectively;
step 3.3.12: connecting the features after the convolution of the two branches with a full connection layer at the same time for information fusion;
step 3.3.13: passing the result through a fully-connected layer and a Softmax layer to obtain the probability Y_true that the candidate box is a difference region and the probability Y_false that it is not;
Step 3.3.14: if the candidate box's IoU with an annotated difference region exceeds 0.8, the candidate box is a difference positive sample and the ground-truth label Y_label is 1; if its IoU with every annotated difference region is below 0.3, the candidate box is a difference negative sample and the ground-truth label Y_label is 0;
step 3.3.15: computing the cross-entropy loss between the outputs Y_true and Y_false of the difference detection part for the current candidate box and the ground-truth label Y_label:

L(w) = Y_label · log Y_true + (1 - Y_label) · log Y_false
step 3.3.16: repeating steps 3.3.10-3.3.15 until the cross-entropy loss between every candidate box obtained in step 3.3.9 and its ground-truth label has been computed; updating the parameters of the whole network by gradient descent;
step 3.3.17: calculating the accuracy of the difference judgment: if Y_true > Y_false then y = 1, otherwise y = 0; the accuracy P is then computed as the fraction of candidate boxes whose prediction y agrees with the ground-truth label Y_label;
step 3.3.18: repeating steps 3.3.1-3.3.17 until mAP exceeds 70% and P exceeds 95%, or exiting the loop when the number of iterations reaches the set limit;
step 4: scaling and normalizing a newly acquired pair of aerial images T_past, T_current and inputting them into the model trained in step 3 to obtain the coordinates of the boxed regions, the categories of the boxed objects, and the difference results; drawing the selected results on the aerial image T_current and annotating the object categories to obtain a difference detection map; the boxed parts are the differences between the current moment and the earlier shot of the same place.
Further preferably, the method for detecting difference of aerial images based on dual networks is characterized in that: the specific process in the step 1 is as follows:
step 1.1: data acquisition:
the unmanned aerial vehicle flies along the same planned path at different moments, and simultaneously, an airborne camera of the unmanned aerial vehicle is used for collecting aerial images along the line to obtain a series of aerial images shot at different moments;
step 1.2, zooming the images collected by the unmanned aerial vehicle-mounted camera:
uniformly scaling the acquired images to the same size according to the input size of the neural network; when needing to carry out amplification processing, carrying out image interpolation processing on the image in the amplification process;
step 1.3: normalizing the scaled image according to the formula

X_norm = (X_k - X_min) / (X_max - X_min)

where X_k is the pixel value at each position of the picture, X_min is the smallest pixel value in the picture, and X_max is the largest pixel value in the picture;
step 1.4, image matching:
coordinate matching is carried out using the GPS information of the pictures to obtain multiple pairs of images T_past, T_current of the same place at different moments; if a picture records no GPS information, a feature point matching method is used to obtain the pairs of pictures of the same place at different moments.
Further preferably, the method for detecting difference of aerial images based on dual networks is characterized in that: the specific 10-layer structure of each branch of the feature extraction part in the step 2 is a first convolution layer, a pooling layer, a second convolution layer, a pooling layer, a third convolution layer, a pooling layer, a fourth convolution layer, a pooling layer, a fifth convolution layer and a pooling layer; the first convolution layer is 32 convolution kernels, the size of the convolution kernels is 3 x 3, the second convolution layer is 64 convolution kernels, the size of the convolution kernels is 3 x 3, the third convolution layer is 128 convolution kernels, the size of the convolution kernels is 3 x 3, the fourth convolution layer is 256 convolution kernels, the size of the convolution kernels is 3 x 3, the fifth convolution layer is 512 convolution kernels, and the size of the convolution kernels is 3 x 3.
Further preferably, the method for detecting difference of aerial images based on dual networks is characterized in that: in the step 2, the convolution layer of the detection module is 5+ C convolution kernels, the convolution kernel is 3 x 3, and C represents the set detection variety number.
Further preferably, the method for detecting difference of aerial images based on dual networks is characterized in that: in step 3.3.1, illumination noise is manually added to the two pictures for data enhancement, and then the two pictures are input into the network.
Advantageous effects
Pixel-level difference detection is easily affected by noise and feature-level difference detection allows only qualitative analysis; by adopting object-level difference detection, the invention suppresses noise better and achieves high detection precision.
Traditional object-level difference detection methods suffer from continual trial and error, manual feature design and poor robustness to illumination. By adopting a deep learning method, the invention avoids the problem that effective feature descriptors are difficult to select in image segmentation and difference detection. Meanwhile, data enhancement during network training gives the network illumination invariance, which removes the interference of illumination changes at different times on difference detection, strengthens the robustness of the method and improves the accuracy of difference detection.
Difference detection based on segmentation suffers from segmentation positions that are hard to describe and from high requirements on registration accuracy. The invention adopts a detection approach instead: detection distinguishes individual objects better than segmentation, and the position coordinates of the objects are more accurate and easier to express. Determining the related objects of the two images by computing the ROI of the predicted boxes after detection also lowers the requirement on registration accuracy.
More importantly, the semantic information of the object is introduced, so that the difference result can be better analyzed, the change type of the change area is judged, the classification result is coded and recorded, and data reference can be made for projects such as navigation and map construction.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The above and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a schematic diagram of a dual network model
FIG. 2 is a schematic diagram of a part of feature extraction
FIG. 3 is a schematic diagram of a feature diagram
FIG. 4 is a schematic diagram of the detection process
FIG. 5 is a schematic diagram of a process for detecting differences
Detailed Description
The following detailed description of embodiments of the invention is intended to be illustrative, and not to be construed as limiting the invention.
In this embodiment, an aerial image difference detection method based on a dual network includes the following steps:
data acquisition and processing
The neural network is sensitive to the input data, so that the processing of the original data is particularly important in the field of deep learning. The correctly processed data can not only accelerate the convergence speed of the network training, but also obtain better training results. The data processing procedure in the present invention is described below:
1. data acquisition
The unmanned aerial vehicle flies along the same planning path at different moments, and simultaneously, an aerial image along the line is acquired by utilizing an airborne camera of the unmanned aerial vehicle. Thus, a series of aerial images taken at different times are obtained.
2. Zooming processing is carried out on images acquired by unmanned aerial vehicle-mounted camera
The acquired images are uniformly scaled to the same size to match the input size of the neural network. In this embodiment, the collected images are uniformly scaled to 418 × 418 pixels. When an image needs to be enlarged, image interpolation is applied to the lower-resolution images during enlargement; commonly used interpolation methods include nearest-neighbour, bilinear, pixel-area-based, bicubic and Lanczos interpolation. Considering both interpolation quality and algorithmic time complexity, bilinear interpolation is used in this embodiment.
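As an illustrative aside (not part of the patented method; OpenCV and the function name are assumptions), the scaling step of this embodiment could be sketched in Python as:

```python
import cv2

def scale_to_network_input(image, size=418):
    """Scale an aerial image to the network input size used in this embodiment.
    cv2.INTER_LINEAR performs bilinear interpolation, the method chosen above
    for its balance of quality and speed."""
    return cv2.resize(image, (size, size), interpolation=cv2.INTER_LINEAR)
```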
3. The scaled image is normalized (pixel values mapped to 0-1) according to the formula

X_norm = (X_k - X_min) / (X_max - X_min)

to reduce the interference caused by non-uniform light, where X_k is the pixel value at each position of the picture, X_min is the smallest pixel value in the whole picture, and X_max is the largest pixel value in the whole picture.
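A minimal Python sketch of this min-max normalization, assuming NumPy arrays as the image representation:

```python
import numpy as np

def min_max_normalize(image):
    """Min-max normalization of pixel values to the range 0-1 to reduce the
    influence of non-uniform illumination."""
    image = image.astype(np.float32)
    x_min, x_max = image.min(), image.max()
    # Guard against a constant image (x_max == x_min) to avoid division by zero.
    return (image - x_min) / max(float(x_max - x_min), 1e-6)
```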
4. Image matching
The procedure above yields aerial images along the route at different moments, but it is still necessary to find out which two pictures were taken at the same location. Coordinate matching using the GPS information of the pictures yields pairs of images T_past, T_current of the same place at different moments.
If the picture does not record GPS information, a pair of pictures at the same place and different moments can be obtained by using a feature point matching method, and the most common method is to match the pictures by using SIFT features.
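The GPS-based pairing can be sketched as follows; the haversine helper, the data layout and the 15 m distance threshold are illustrative assumptions rather than values given in the text:

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two GPS fixes (illustrative helper)."""
    r = 6371000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = math.radians(lat2 - lat1), math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def match_by_gps(past_images, current_images, max_dist_m=15.0):
    """Pair pictures of the same place taken at different moments.

    past_images / current_images: lists of (image, (lat, lon)) tuples.
    Returns a list of (T_past, T_current) image pairs; max_dist_m is an assumed threshold."""
    pairs = []
    for img_p, gps_p in past_images:
        # Nearest current-flight image by ground distance.
        img_c, gps_c = min(current_images, key=lambda ic: haversine_m(*gps_p, *ic[1]))
        if haversine_m(*gps_p, *gps_c) <= max_dist_m:
            pairs.append((img_p, img_c))
    return pairs
```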
Second, building the dual-network model
The invention provides a dual-network model capable of detecting differences, which mainly comprises a feature extraction part, an object detection part and a difference detection part; the concrete network model is shown in fig. 1.
the feature extraction part extracts deep semantic features of the picture, learns more universal and robust feature descriptors through a neural network and prepares for object detection and difference judgment later; the two branches of the feature extraction part are identical in structure and share weight values, and each branch is composed of 10 layers (as shown in FIG. 2). The convolution layers (32 convolution kernels, convolution kernel size 3 × 3), pooling layer, convolution layer (64 convolution kernels, convolution kernel size 3 × 3), pooling layer, convolution layer (128 convolution kernels, convolution kernel size 3 × 3), pooling layer, convolution layer (256 convolution kernels, convolution kernel size 3 × 3), pooling layer, convolution layer (512 convolution kernels, convolution kernel size 3 × 3), and pooling layer were sequentially described. The 10 th layer of output image after being pooled is a feature map FmapThe semantic information of the picture depth is recorded.
The object detection part finds the target objects to be detected using the extracted features. The two branches of the object detection part have the same structure, each receiving the output of the corresponding branch of the feature extraction part, and each branch consists of one detection module. The detection module is a convolution layer (5 + C convolution kernels, kernel size 3 × 3), where C is the set number of detection classes. The feature map F_map is fed into the detection module, and the detection output is a tensor of size K × K × (5+C) that contains the object positions, categories and confidences predicted from F_map; each 1 × 1 × (5+C) vector records the position, width and height of a candidate box, whether the box contains an object, and the object's class. K is the spatial size of the feature map. The factor 5 arises because, for each candidate box, the center coordinates (p_x, p_y), the width and height (p_w, p_h) and the confidence p(confidence) that the box contains an object must be predicted; the remaining C values record the probability p(Class|confidence) that the detected object belongs to each class. The position of each predicted box and the category of the boxed object are computed from this output tensor.
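A corresponding sketch of the detection module, under the same PyTorch assumption; decoding boxes from the raw tensor is omitted because the text does not specify the exact coordinate encoding:

```python
import torch.nn as nn

class DetectionModule(nn.Module):
    """Detection module of one branch: a single 3x3 convolution with 5 + C kernels that
    maps the K x K x 512 feature map to the K x K x (5 + C) prediction tensor above."""
    def __init__(self, num_classes):
        super().__init__()
        self.conv = nn.Conv2d(512, 5 + num_classes, kernel_size=3, padding=1)

    def forward(self, f_map):
        # Per grid cell: (p_x, p_y, p_w, p_h, p(confidence), p(Class|confidence) for C classes).
        out = self.conv(f_map)          # N x (5+C) x K x K
        return out.permute(0, 2, 3, 1)  # N x K x K x (5+C), matching the layout in the text
```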
The difference detection part receives the outputs of the feature extraction part and the object detection part and consists of a CRP layer, a convolution layer (32 convolution kernels, the convolution kernel is 3 x 3) aiming at each branch, two fully-connected layers for performing information fusion on the two branches and a Softmax layer.
CRP stands for Change Region Proposal; the CRP layer generates candidate difference regions and extracts their region features. In the CRP layer, only detection results whose confidence exceeds a set threshold are kept, and non-maximum suppression is then applied to the remaining candidate boxes of each aerial image, which further reduces the number of unnecessary candidate boxes and prevents different candidate boxes from detecting the same object. The candidate boxes of the two branches remaining after non-maximum suppression are then matched against each other: for every cross-image pair of boxes whose IoU exceeds a set threshold only one box is kept, and the candidate predictions of the two branches are merged. Finally, for every retained candidate box, the part of the feature map F_map corresponding to its position is cropped, giving the features of the two pictures at the boxed location.
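The CRP processing chain described above (confidence filtering, per-image non-maximum suppression, cross-branch matching) can be sketched as follows; the 0.6 confidence threshold matches the 60% value used during training below, while the 0.5 IoU thresholds are assumptions:

```python
def iou(box_a, box_b):
    """IoU of two boxes given as (cx, cy, w, h)."""
    ax1, ay1 = box_a[0] - box_a[2] / 2, box_a[1] - box_a[3] / 2
    ax2, ay2 = box_a[0] + box_a[2] / 2, box_a[1] + box_a[3] / 2
    bx1, by1 = box_b[0] - box_b[2] / 2, box_b[1] - box_b[3] / 2
    bx2, by2 = box_b[0] + box_b[2] / 2, box_b[1] + box_b[3] / 2
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_thresh=0.5):
    """Plain non-maximum suppression; the 0.5 IoU threshold is an assumed value."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) < iou_thresh for j in keep):
            keep.append(i)
    return keep

def crp_candidates(det_past, det_current, conf_thresh=0.6, match_iou=0.5):
    """CRP stage: filter by confidence, run NMS per image, then merge the two branches,
    keeping only the higher-confidence box of each cross-image pair with IoU > match_iou."""
    kept = []
    for boxes, scores in (det_past, det_current):
        idx = [i for i in range(len(boxes)) if scores[i] > conf_thresh]
        keep_local = nms([boxes[i] for i in idx], [scores[i] for i in idx])
        kept.append([(boxes[idx[k]], scores[idx[k]]) for k in keep_local])
    merged = list(kept[0])
    for box_c, score_c in kept[1]:
        dup = next(((b, s) for b, s in merged if iou(b, box_c) > match_iou), None)
        if dup is None:
            merged.append((box_c, score_c))               # unmatched box from the second image
        elif score_c > dup[1]:
            merged[merged.index(dup)] = (box_c, score_c)  # keep the higher-confidence box
    return [b for b, _ in merged]
```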
The region features of the two images produced by the CRP layer are each convolved by one convolution layer: this yields more expressive semantics and reduces dimensionality, avoiding an excessive number of parameters in the fully-connected layers. The convolved features of the two branches are then connected to a shared fully-connected layer for information fusion, followed by another fully-connected layer and a Softmax layer that output the probabilities Y_true and Y_false of whether the boxed part is a difference region.
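A sketch of this difference detection head under the same PyTorch assumption; the ROI size and the width of the fusion layer are assumed, as the text does not state them:

```python
import torch
import torch.nn as nn

class DifferenceHead(nn.Module):
    """Difference detection part after the CRP layer: one 3x3 convolution (32 kernels) per
    branch, a fully-connected fusion layer over both branches, a second fully-connected
    layer and Softmax producing (Y_true, Y_false)."""
    def __init__(self, roi_size=7, hidden=256):   # roi_size and hidden width are assumptions
        super().__init__()
        self.conv_past = nn.Conv2d(512, 32, kernel_size=3, padding=1)
        self.conv_current = nn.Conv2d(512, 32, kernel_size=3, padding=1)
        flat = 32 * roi_size * roi_size
        self.fuse = nn.Linear(2 * flat, hidden)   # information fusion of the two branches
        self.classify = nn.Linear(hidden, 2)      # followed by Softmax -> (Y_true, Y_false)

    def forward(self, roi_past, roi_current):
        a = torch.flatten(self.conv_past(roi_past), start_dim=1)
        b = torch.flatten(self.conv_current(roi_current), start_dim=1)
        fused = torch.relu(self.fuse(torch.cat([a, b], dim=1)))
        return torch.softmax(self.classify(fused), dim=1)  # columns: [Y_true, Y_false]
```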
Third, dual-network training
1. Making a label:
for each pair of difference pictures obtained in the first step, the following labels are carried out:
and 1.1, respectively marking the position and the category of the object in each picture. The position of the object is defined by the center position (p) of the circumscribed rectangle of the objectx *,py *) And the length and width (p) of the rectanglew *,ph *) And (6) recording. Each object can be described as being of the form (p)x *,py *,pw *,ph *Class), class being the object class.
1.2. Mark the areas that differ between the two pictures. The position of a difference region is recorded by the center (q_x, q_y) of the region's circumscribed rectangle and the rectangle's width and height (q_w, q_h). Each difference region can be described as a vector of the form (q_x, q_y, q_w, q_h).
2. Initializing the model established in the second step: the part (convolution layer, full connection layer) of the model which needs the self-learning parameter is initialized. In order to make the output variance of each layer in the network as equal as possible, the method adopts an Xavier method to carry out parameter initialization.
3. Network training
3.1. Illumination noise is added manually, as data enhancement, to the two aerial photographs of size M × M taken at the same place at different times that are input to the network, so that the trained network acquires a degree of illumination invariance.
3.2. A pair of data-enhanced pictures is input into the network, one picture into each branch, and the feature extraction part extracts the deep semantic features of the pictures. The output of the feature extraction part is the feature map F_map, a tensor of size K × K × 512 (where K = M/32).
3.3. The feature map F_map is input to the convolution layer of the detection module, and the convolved output is a tensor of size K × K × (5+C). Each 1 × 1 × (5+C) vector records the position, width and height of a candidate box, whether the box contains an object, and the object's class. K is the spatial size of the feature map; the factor 5 arises because the center coordinates (p_x, p_y) of each candidate box, its width and height (p_w, p_h) and the confidence p(confidence) that the box contains an object must be predicted; C is the set number of detection classes, recording the probability p(Class|confidence) that the detected object belongs to each class.
3.4. Candidate boxes that extend beyond the picture boundary are removed. Every candidate box whose overlap IoU with an annotated object exceeds 0.7 is taken as a detection positive sample; if no candidate box has IoU > 0.7 with an annotated object, the candidate box with the largest overlap is selected as the positive sample, and the corresponding ground-truth label is built. The ground-truth label contains the position of the annotated object (p_x*, p_y*, p_w*, p_h*), the confidence p(confidence)* and the category of the annotated object; the category uses one-hot coding, so the code of the annotated class is p(Class|confidence)* = 1, the codes of the remaining classes are p(Class|confidence)* = 0, and the confidence p(confidence)* is 1.
3.5. Every candidate box whose overlap IoU with each annotated object is below 0.3 is taken as a detection negative sample, and the corresponding ground-truth label is built, in which p(confidence)* is 0.
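Steps 3.4-3.5 can be sketched as the following sample-assignment routine, which reuses the iou() helper from the CRP sketch above and assumes out-of-boundary boxes have already been removed:

```python
def assign_detection_samples(candidates, annotations, pos_iou=0.7, neg_iou=0.3):
    """Label candidate boxes as detection positive / negative samples (steps 3.4-3.5).

    candidates: list of (cx, cy, w, h) boxes already clipped to the image;
    annotations: list of (cx, cy, w, h, class_id) ground-truth objects.
    Boxes with IoU between neg_iou and pos_iou are left unlabeled, since the text
    only defines the two extremes."""
    positives, negatives = [], []
    for obj in annotations:
        overlaps = [iou(c, obj[:4]) for c in candidates]
        matched = [i for i, o in enumerate(overlaps) if o > pos_iou]
        if not matched and overlaps:
            # No box exceeds the threshold: fall back to the best-overlapping box.
            matched = [max(range(len(candidates)), key=lambda i: overlaps[i])]
        positives += [(i, obj) for i in matched]     # ground truth: p(confidence)* = 1
    pos_idx = {i for i, _ in positives}
    for i, cand in enumerate(candidates):
        if i not in pos_idx and all(iou(cand, obj[:4]) < neg_iou for obj in annotations):
            negatives.append(i)                      # ground truth: p(confidence)* = 0
    return positives, negatives
```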
3.6. For every positive and negative sample, compute the squared-error loss between the predictions p_x, p_y, p_w, p_h, p(Class|confidence), p(confidence) and the corresponding ground-truth labels p_x*, p_y*, p_w*, p_h*, p(Class|confidence)*, p(confidence)*:

L(w) = λ_1((p_x - p_x*)² + (p_y - p_y*)² + (p_w - p_w*)² + (p_h - p_h*)²) + λ_2(p(Class|confidence) - p(Class|confidence)*)² + λ_3(p(confidence) - p(confidence)*)²
where the values of λ_1, λ_2, λ_3 are chosen as follows:
Because localization and classification have different dimensionalities, and the numbers of positive and negative samples differ, using different weights for the three terms ensures that the network converges stably during training. The parameters of the feature extraction and object detection parts are updated by gradient descent according to the squared-error loss.
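A sketch of this squared-error loss for a single sample; the λ values themselves are not given in the text (only their purpose is explained), so the ones below are purely illustrative:

```python
def detection_loss(pred, target, lambdas=(5.0, 1.0, 1.0)):
    """Squared-error detection loss for a single sample (step 3.6).

    pred / target: dicts with keys 'box' = (p_x, p_y, p_w, p_h), 'cls' = per-class
    probabilities and 'conf' = confidence. The lambda weights are illustrative assumptions."""
    l1, l2, l3 = lambdas
    loc = sum((p - t) ** 2 for p, t in zip(pred['box'], target['box']))
    cls = sum((p - t) ** 2 for p, t in zip(pred['cls'], target['cls']))
    conf = (pred['conf'] - target['conf']) ** 2
    return l1 * loc + l2 * cls + l3 * conf
```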
3.7. Compute the mAP (mean average precision) of the detection result (the K × K × (5+C) tensor output by the detection module).
3.8, only the candidate frames with the confidence level greater than a certain threshold (60%) in the detection result are reserved, and then the candidate frames reserved in each aerial photography image in the two frames are subjected to non-maximum value suppression, so that the number of unnecessary candidate frames is further reduced, and the condition that different candidate frames detect the same object is avoided.
And 3.9, matching the candidate frames of the two branches of the object detection part after the non-maximum value is inhibited to obtain a pair of candidate frames corresponding to the same marked object in the candidate frames of the two branches, only keeping one candidate frame with high confidence in the two candidate frames, then merging the candidate frame prediction results of the two branches to finally obtain a group of candidate frames, wherein the number of the candidate frames is n.
3.10. Select one candidate box from the set of candidate boxes obtained in 3.9. According to the position of the candidate box, compute the corresponding position on the feature map F_map and crop the corresponding region from the feature maps of the two branches.
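A sketch of the feature-map cropping in step 3.10; resizing the crop to a fixed grid is an assumption, made so that the fully-connected layers that follow see a constant input size:

```python
import torch.nn.functional as F

def crop_roi(f_map, box, image_size, roi_size=7):
    """Crop the feature-map region corresponding to a candidate box (step 3.10).

    f_map: tensor of shape (512, K, K) with K = image_size / 32; box: (cx, cy, w, h)
    in image pixels. The fixed roi_size x roi_size output grid is an assumption."""
    k = f_map.shape[-1]
    stride = image_size / k                       # 32 for the backbone described above
    cx, cy, w, h = (v / stride for v in box)      # box mapped into feature-map coordinates
    x1, y1 = max(int(cx - w / 2), 0), max(int(cy - h / 2), 0)
    x2, y2 = min(int(cx + w / 2) + 1, k), min(int(cy + h / 2) + 1, k)
    region = f_map[:, y1:y2, x1:x2].unsqueeze(0)  # 1 x 512 x h' x w'
    return F.interpolate(region, size=(roi_size, roi_size),
                         mode='bilinear', align_corners=False)
```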
3.11. The cropped region features of the pair of aerial images are each convolved by a convolution layer (32 convolution kernels, kernel size 3 × 3): this yields a more expressive semantic representation while reducing dimensionality, avoiding an excessive number of parameters in the fully-connected layers.
And 3.12, connecting the features after the convolution of the two branches with a full connection layer at the same time for information fusion.
3.13. The fused features then pass through a fully-connected layer and a Softmax layer to obtain the probability Y_true that the candidate box is a difference region and the probability Y_false that it is not.
3.14. If the candidate box's IoU with an annotated difference region exceeds 0.8, the candidate box is a difference positive sample and the ground-truth label Y_label is 1; if its IoU with every annotated difference region is below 0.3, the candidate box is a difference negative sample and the ground-truth label Y_label is 0.
3.15. Compute the cross-entropy loss between the outputs Y_true and Y_false of the difference detection part for the current candidate box and the ground-truth label Y_label:

L(w) = Y_label · log Y_true + (1 - Y_label) · log Y_false
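The cross-entropy term can be written directly from the formula above; as written it is a log-likelihood, so an optimizer that minimizes would negate it:

```python
import math

def difference_loss(y_true, y_false, y_label, eps=1e-7):
    """Cross-entropy between the difference head output (Y_true, Y_false) and the label
    Y_label, following the formula above; eps only guards the logarithm."""
    return y_label * math.log(max(y_true, eps)) + (1 - y_label) * math.log(max(y_false, eps))
```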
and 3.16, circularly executing 3.10-3.15 until the cross entropy loss function of each candidate frame and the real label obtained in 3.9 is calculated. And updating parameters of the whole network by using a gradient descent method. So that the network can learn how to judge whether the objects of the two pictures in the same area are the same.
3.17, calculating the accuracy of difference judgment
If Y_true > Y_false, then y = 1; otherwise y = 0. The accuracy P is then computed as the fraction of candidate boxes whose prediction y agrees with the ground-truth label Y_label.
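Since the accuracy formula itself did not survive in the text, the sketch below implements the assumed reading (fraction of candidate boxes judged correctly):

```python
def difference_accuracy(outputs, labels):
    """Accuracy of the difference judgment (step 3.17).

    outputs: list of (Y_true, Y_false) pairs; labels: the corresponding Y_label values."""
    preds = [1 if y_true > y_false else 0 for y_true, y_false in outputs]
    correct = sum(1 for y, y_label in zip(preds, labels) if y == y_label)
    return correct / len(labels) if labels else 0.0
```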
and 3.18, circularly executing the steps 3.1-3.17 until the mAP is more than 70% and the P is more than 95% or the circulation is exited when the iteration times reach the set times.
Fourth, use of the model
With the trained dual-network difference detection model, a semantics-based object difference result can be obtained, which conveniently serves as a data reference for tasks such as navigation and map construction; at the same time, a difference annotation map can be produced that visualizes the changes, making it easy to inspect the differences between different moments. As shown in fig. 5, the specific application procedure is as follows:
1. Input the processed pair of aerial images T_past, T_current into the dual network.
2. By the double-network model for detecting the difference, the coordinate of the framing area, the category of the framing object and the difference result are obtained.
2.1 retaining the frame selection result judged as difference by the difference detection part in the candidate areas obtained by the CRP layer.
2.2 in the results of the object detection part, finding the classification results of the frame selection results corresponding to the two branches.
And 2.3, coding and recording the classification result, and making data reference for projects such as navigation, map construction and the like.
3. Draw the selected results on the aerial image T_current and annotate the object categories to obtain a difference detection map. The boxed parts are the differences between the current moment and the earlier shot of the same place.
Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made in the above embodiments by those of ordinary skill in the art without departing from the principle and spirit of the present invention.

Claims (5)

1. An aerial image difference detection method based on dual networks is characterized in that: the method comprises the following steps:
step 1: the unmanned aerial vehicle flies along the same planned path at different moments, and its onboard camera collects aerial images along the route, yielding a series of aerial images shot at different moments; after scaling and normalization, the aerial images are matched to obtain multiple pairs of images T_past, T_current of the same place at different moments;
Step 2: building a double-network model:
the dual-network model consists of a feature extraction part, an object detection part and a difference detection part;
the two branches of the feature extraction part have the same structure and share weights; each branch consists of 10 layers, namely five convolution layers alternating with five pooling layers (convolution, pooling, convolution, pooling, ..., convolution, pooling);
the two branch structures of the object detection part are the same and respectively receive the output of each branch structure of the feature extraction part, and each branch consists of a detection module; the detection module is a convolution layer;
the difference detection part receives the output of the feature extraction part and the output of the object detection part and consists of a Change Region Proposal layer, a convolution layer for each branch, two full-connection layers for performing information fusion on the two branches and a Softmax layer;
and step 3: and (3) dual-network training:
step 3.1: making a label: labeling each pair of difference pictures obtained in the step 1 as follows:
respectively marking the position and the category of each object in each picture: the position of an object is recorded by the center (p_x*, p_y*) of the object's circumscribed rectangle and the rectangle's width and height (p_w*, p_h*); each object is described as a vector of the form (p_x*, p_y*, p_w*, p_h*, class), where class is the object category;
marking the areas that differ between the two pictures: the position of a difference region is recorded by the center (q_x, q_y) of the region's circumscribed rectangle and the rectangle's width and height (q_w, q_h); each difference region is described as a vector of the form (q_x, q_y, q_w, q_h);
step 3.2: initializing the part of the model established in the step 2, which needs the autonomous learning parameters;
step 3.3: network training:
step 3.3.1: inputting two aerial pictures which are shot at the same place at different time and have the size of M x M after scaling processing and normalization processing into a network, inputting one picture into one branch, and inputting the other picture into the other branch;
step 3.3.2: extracting the deep semantic features of the pictures with the feature extraction part; the output of the feature extraction part is the feature map F_map, a tensor of size K × K × 512, where K is the spatial size of the feature map;
step 3.3.3: inputting the feature map F_map into the convolution layer of the detection module; the convolved output is a tensor of size K × K × (5+C), in which each 1 × 1 × (5+C) vector records the position, width and height of a candidate box, whether the box contains an object, and the class of that object, where C is the set number of detection classes;
step 3.3.4: discarding candidate boxes that extend beyond the picture boundary, then taking every candidate box whose overlap IoU with an annotated object exceeds 0.7 as a detection positive sample; if no candidate box has IoU > 0.7 with an annotated object, selecting the candidate box with the largest overlap as the positive sample; building the corresponding ground-truth label, which contains the position of the annotated object (p_x*, p_y*, p_w*, p_h*), the confidence p(confidence)*, and the category of the annotated object; the category uses one-hot coding, so the code of the annotated class is p(Class|confidence)* = 1, the codes of the remaining classes are p(Class|confidence)* = 0, and the confidence p(confidence)* is 1;
step 3.3.5: taking every candidate box whose overlap IoU with each annotated object is below 0.3 as a detection negative sample and building the corresponding ground-truth label, in which p(confidence)* is 0;
step 3.3.6: for every detection positive and negative sample, computing the squared-error loss between the predictions p_x, p_y, p_w, p_h, p(Class|confidence), p(confidence) and the corresponding ground-truth labels p_x*, p_y*, p_w*, p_h*, p(Class|confidence)*, p(confidence)*:

L(w) = λ_1((p_x - p_x*)² + (p_y - p_y*)² + (p_w - p_w*)² + (p_h - p_h*)²) + λ_2(p(Class|confidence) - p(Class|confidence)*)² + λ_3(p(confidence) - p(confidence)*)²
where (p_x, p_y) are the center coordinates of the candidate box, (p_w, p_h) are its width and height, p(confidence) is the confidence that the candidate box contains an object, and p(Class|confidence) is the probability that the detected object belongs to each class; λ_1, λ_2, λ_3 are weighting coefficients;
updating parameters of the feature extraction and object detection part by a gradient descent method according to the square error loss function;
step 3.3.7: calculating an mAP value of a detection result output by the detection module;
step 3.3.8: only the candidate frames with the confidence degrees larger than a set threshold value in the detection result are reserved, and then non-maximum value suppression is carried out on the candidate frames reserved in each image;
step 3.3.9: matching the candidate frames after the non-maximum value inhibition in the two branches of the object detection part to obtain a pair of candidate frames corresponding to the same labeled object, only keeping one candidate frame with high confidence in the pair of candidate frames, and then combining the candidate frame prediction results of the two branches to finally obtain a group of candidate frames, wherein the number of the candidate frames is n;
step 3.3.10: selecting one candidate box from the set of candidate boxes obtained in step 3.3.9; according to the position of the candidate box, computing the corresponding position on the feature map F_map and cropping the corresponding region from the feature maps of the two branches;
step 3.3.11: convolving the intercepted areas by a convolution layer respectively;
step 3.3.12: connecting the features after the convolution of the two branches with a full connection layer at the same time for information fusion;
step 3.3.13: passing the result through a fully-connected layer and a Softmax layer to obtain the probability Y_true that the candidate box is a difference region and the probability Y_false that it is not;
Step 3.3.14: if the candidate box's IoU with an annotated difference region exceeds 0.8, the candidate box is a difference positive sample and the ground-truth label Y_label is 1; if its IoU with every annotated difference region is below 0.3, the candidate box is a difference negative sample and the ground-truth label Y_label is 0;
step 3.3.15: computing the cross-entropy loss between the outputs Y_true and Y_false of the difference detection part for the current candidate box and the ground-truth label Y_label:

L(w) = Y_label · log Y_true + (1 - Y_label) · log Y_false
step 3.3.16: repeating steps 3.3.10-3.3.15 until the cross-entropy loss between every candidate box obtained in step 3.3.9 and its ground-truth label has been computed; updating the parameters of the whole network by gradient descent;
step 3.3.17: calculating the accuracy of difference judgment:
if Y istrue>YfalseIf y is 1; otherwise, y is 0; further calculation of
Wherein the content of the first and second substances,
step 3.3.18: repeating steps 3.3.1-3.3.17 until mAP exceeds 70% and P exceeds 95%, or exiting the loop when the number of iterations reaches the set limit;
step 4: scaling and normalizing a newly acquired pair of aerial images T_past, T_current and inputting them into the model trained in step 3 to obtain the coordinates of the boxed regions, the categories of the boxed objects, and the difference results; drawing the selected results on the aerial image T_current and annotating the object categories to obtain a difference detection map; the boxed parts are the differences between the current moment and the earlier shot of the same place.
2. The aerial image difference detection method based on the dual networks as claimed in claim 1, wherein: the specific process in the step 1 is as follows:
step 1.1: data acquisition:
the unmanned aerial vehicle flies along the same planned path at different moments, and simultaneously, an airborne camera of the unmanned aerial vehicle is used for collecting aerial images along the line to obtain a series of aerial images shot at different moments;
step 1.2, zooming the images collected by the unmanned aerial vehicle-mounted camera:
uniformly scaling the acquired images to the same size according to the input size of the neural network; when needing to carry out amplification processing, carrying out image interpolation processing on the image in the amplification process;
step 1.3: normalizing the scaled image according to the formula

X_norm = (X_k - X_min) / (X_max - X_min)

where X_k is the pixel value at each position of the picture, X_min is the smallest pixel value in the picture, and X_max is the largest pixel value in the picture;
step 1.4, image matching:
coordinate matching is carried out using the GPS information of the pictures to obtain multiple pairs of images T_past, T_current of the same place at different moments; if a picture records no GPS information, a feature point matching method is used to obtain the pairs of pictures of the same place at different moments.
3. The aerial image difference detection method based on the dual networks as claimed in claim 2, wherein: the specific 10-layer structure of each branch of the feature extraction part in the step 2 is a first convolution layer, a pooling layer, a second convolution layer, a pooling layer, a third convolution layer, a pooling layer, a fourth convolution layer, a pooling layer, a fifth convolution layer and a pooling layer; the first convolution layer is 32 convolution kernels, the size of the convolution kernels is 3 x 3, the second convolution layer is 64 convolution kernels, the size of the convolution kernels is 3 x 3, the third convolution layer is 128 convolution kernels, the size of the convolution kernels is 3 x 3, the fourth convolution layer is 256 convolution kernels, the size of the convolution kernels is 3 x 3, the fifth convolution layer is 512 convolution kernels, and the size of the convolution kernels is 3 x 3.
4. The aerial image difference detection method based on the dual networks as claimed in claim 3, wherein: in the step 2, the convolution layer of the detection module is 5+ C convolution kernels, the convolution kernel is 3 x 3, and C represents the set detection variety number.
5. The aerial image difference detection method based on the dual networks as claimed in claim 4, wherein: in step 3.3.1, illumination noise is manually added to the two pictures for data enhancement, and then the two pictures are input into the network.
CN201811015421.6A 2018-08-31 2018-08-31 Aerial image difference detection method based on double networks Active CN109255317B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811015421.6A CN109255317B (en) 2018-08-31 2018-08-31 Aerial image difference detection method based on double networks

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811015421.6A CN109255317B (en) 2018-08-31 2018-08-31 Aerial image difference detection method based on double networks

Publications (2)

Publication Number Publication Date
CN109255317A CN109255317A (en) 2019-01-22
CN109255317B true CN109255317B (en) 2021-06-11

Family

ID=65050484

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811015421.6A Active CN109255317B (en) 2018-08-31 2018-08-31 Aerial image difference detection method based on double networks

Country Status (1)

Country Link
CN (1) CN109255317B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110135424B (en) * 2019-05-23 2021-06-11 阳光保险集团股份有限公司 Inclined text detection model training method and ticket image text detection method
CN110188776A (en) * 2019-05-30 2019-08-30 京东方科技集团股份有限公司 Image processing method and device, the training method of neural network, storage medium
CN112188217B (en) * 2019-07-01 2022-03-04 四川大学 JPEG compressed image decompression effect removing method combining DCT domain and pixel domain learning
CN111582043B (en) * 2020-04-15 2022-03-15 电子科技大学 High-resolution remote sensing image ground object change detection method based on multitask learning
CN111751380B (en) * 2020-07-08 2021-08-31 中国水利水电科学研究院 Concrete dam crack inspection method based on light and small unmanned aerial vehicle
CN112818966B (en) * 2021-04-16 2021-07-30 武汉光谷信息技术股份有限公司 Multi-mode remote sensing image data detection method and system

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7596242B2 (en) * 1995-06-07 2009-09-29 Automotive Technologies International, Inc. Image processing for vehicular applications
CN106778867A (en) * 2016-12-15 2017-05-31 北京旷视科技有限公司 Object detection method and device, neural network training method and device
CN107871119A (en) * 2017-11-01 2018-04-03 西安电子科技大学 A kind of object detection method learnt based on object space knowledge and two-stage forecasting
CN108062753A (en) * 2017-12-29 2018-05-22 重庆理工大学 The adaptive brain tumor semantic segmentation method in unsupervised domain based on depth confrontation study
CN108319949A (en) * 2018-01-26 2018-07-24 中国电子科技集团公司第十五研究所 Mostly towards Ship Target Detection and recognition methods in a kind of high-resolution remote sensing image

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Dual-channel convolutional neural network for change detection of multitemporal SAR images; Tao Liu et al.; 2016 International Conference on Orange Technologies (ICOT); 2018-02-05; pp. 60-63 *
Pedestrian detection algorithm based on a dual-mode fully convolutional network (invited); Luo Haibo (罗海波) et al.; Infrared and Laser Engineering (红外与激光工程); 2018-02-25; Vol. 47, No. 2; pp. 10-17 *

Also Published As

Publication number Publication date
CN109255317A (en) 2019-01-22

Similar Documents

Publication Publication Date Title
CN109255317B (en) Aerial image difference detection method based on double networks
Chen et al. Vehicle detection in high-resolution aerial images via sparse representation and superpixels
US10217236B2 (en) Remote determination of containers in geographical region
CN107133569B (en) Monitoring video multi-granularity labeling method based on generalized multi-label learning
CN108734143A (en) A kind of transmission line of electricity online test method based on binocular vision of crusing robot
CN108319964B (en) Fire image recognition method based on mixed features and manifold learning
CN111259809B (en) Unmanned aerial vehicle coastline floating garbage inspection system based on DANet
Shahab et al. How salient is scene text?
Stankov et al. Building detection in very high spatial resolution multispectral images using the hit-or-miss transform
Hormese et al. Automated road extraction from high resolution satellite images
Seo et al. Exploiting publicly available cartographic resources for aerial image analysis
Han et al. Aerial image change detection using dual regions of interest networks
CN113111727A (en) Method for detecting rotating target in remote sensing scene based on feature alignment
Han et al. Research on remote sensing image target recognition based on deep convolution neural network
Khoshboresh-Masouleh et al. Robust building footprint extraction from big multi-sensor data using deep competition network
Balaska et al. Enhancing satellite semantic maps with ground-level imagery
CN110688512A (en) Pedestrian image search algorithm based on PTGAN region gap and depth neural network
CN113284144B (en) Tunnel detection method and device based on unmanned aerial vehicle
Persson et al. Automatic building detection from aerial images for mobile robot mapping
Persson et al. Fusion of aerial images and sensor data from a ground vehicle for improved semantic mapping
Albalooshi et al. Deep belief active contours (DBAC) with its application to oil spill segmentation from remotely sensed sea surface imagery
Guili et al. A man-made object detection algorithm based on contour complexity evaluation
CN112329559A (en) Method for detecting homestead target based on deep convolutional neural network
CN110245566B (en) Infrared target remote tracking method based on background features
Rui et al. Real-Time obstacle detection based on monocular vision for unmanned surface vehicles

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant