Disclosure of Invention
In view of the above problems of the prior art, an object of the present invention is to provide a water system information extraction algorithm based on Faster R-CNN.
In order to solve the problems, the technical scheme adopted by the invention is as follows:
A water system information extraction algorithm based on Faster R-CNN, characterized by comprising the following steps:
step 1: preprocessing remote sensing image data for training and establishing a sample database;
step 2: training the Faster R-CNN network by deep learning on the data in the sample database to complete algorithm training;
step 3: measuring the missed-detection rate with the trained Faster R-CNN network and calculating an F1 score index;
step 4: processing the actually acquired data with Faster R-CNN to obtain a training sample set, performing transfer learning with AlexNet, and locating the water system skeleton with AlexNet;
step 5: calculating the shape and width of the water system with a morphological algorithm, based on the water system classification result obtained in step 4;
step 6: reducing the false-detection rate of the final result by means of the F1 score index, and outputting the final result.
The preprocessing in step 1 comprises the following steps:
step 1: extracting image features through a ResNet residual network and generating a convolutional feature map;
step 2: classifying the water system through the region proposal network (RPN) using the convolutional feature map.
The F1 score index is obtained by the following formula:
$$F_1 = \frac{2TP}{2TP + FP + FN}$$
in the formula: TP: true positive, FP: false positive, FN: false negative.
The steps of the morphological algorithm are as follows:
step 1: within the bounding-box area produced by Faster R-CNN, extracting the minimum-gray-value pixel of each row of the water system skeleton produced by the CNN, to form an initial basic skeleton of the water system;
step 2: traversing the image obtained in step 1 with a 3 × 3 sliding window, and filtering out a pixel if it is the only pixel in the sliding window;
step 3: filtering out connected regions with small areas through an area filter, and connecting adjacent connected regions among the remaining connected regions;
step 4: skeletonizing the water system form to obtain a complete, continuous skeleton;
step 5: combining step 3 and step 4 to obtain the width of the water system.
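As a non-limiting illustration, steps 1 to 3 of the morphological algorithm above may be sketched as follows. The function names, the list-of-lists image representation, the leftmost tie-break for row minima, the use of 8-connectivity and the `min_area` parameter are assumptions made for this sketch only; the connection of adjacent connected regions in step 3 is not shown.

```python
from collections import deque


def row_minima(gray):
    """Step 1: mark the darkest pixel of each row (leftmost on ties)."""
    mask = [[0] * len(row) for row in gray]
    for r, row in enumerate(gray):
        mask[r][row.index(min(row))] = 1
    return mask


def remove_isolated(mask):
    """Step 2: clear a pixel that is the only set pixel in its 3x3 window."""
    h, w = len(mask), len(mask[0])
    out = [row[:] for row in mask]
    for r in range(h):
        for c in range(w):
            if mask[r][c]:
                total = sum(mask[rr][cc]
                            for rr in range(max(0, r - 1), min(h, r + 2))
                            for cc in range(max(0, c - 1), min(w, c + 2)))
                if total == 1:
                    out[r][c] = 0
    return out


def area_filter(mask, min_area):
    """Step 3 (filtering part): drop 8-connected regions below min_area pixels."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            if mask[r][c] and not seen[r][c]:
                # Breadth-first search collects one connected region.
                region, queue = [], deque([(r, c)])
                seen[r][c] = True
                while queue:
                    y, x = queue.popleft()
                    region.append((y, x))
                    for yy in range(max(0, y - 1), min(h, y + 2)):
                        for xx in range(max(0, x - 1), min(w, x + 2)):
                            if mask[yy][xx] and not seen[yy][xx]:
                                seen[yy][xx] = True
                                queue.append((yy, xx))
                if len(region) >= min_area:
                    for y, x in region:
                        out[y][x] = 1
    return out
```

In practice the three functions would be applied in sequence to the gray-level image cropped to each bounding box, with `min_area` chosen for the image resolution at hand.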
The AlexNet neural network consists of five convolutional layers and three fully connected layers, eight layers deep in total, and the output of the last fully connected layer is sent to the Softmax layer.
Compared with the prior art, the method adopts the Faster R-CNN algorithm to quickly identify the type, position and width of the water system, uses AlexNet transfer learning to locate the water system skeleton within the extracted water system area, and finally extracts the river form with a morphological algorithm and calculates the length and width of the skeleton. To address the problem that Faster R-CNN has a low missed-detection rate but a high false-detection rate, an F1 score index is introduced to reduce the false-detection rate, so that the method is suitable for real application scenes with more diverse water systems.
Detailed Description
The invention is further described with reference to specific examples.
Example 1
The present embodiment demonstrates the algorithm on the terrain of the Chongli area.
The topographic range of the data acquisition object is shown in fig. 1. The research area is located in the Chongli area, which administratively belongs to Zhangjiakou, Hebei Province, at 40°47′ to 41°17′ N, 114°17′ to 115°34′ E. The area is 2334 square kilometers and the elevation is 813-2174 m, giving a maximum elevation difference of 1361 m. Field observation data show that the daily average highest temperature of the region is 12 °C, the daily average lowest temperature is -2 °C, the annual average precipitation is 488 millimeters, the total precipitation is 1.13 billion cubic meters, the average annual surface-water runoff depth of the whole region is 42.9 millimeters, and the total annual runoff is 100.69 million cubic meters. Mountains cover 80% of the area, the forest coverage rate is as high as 52.38%, and the area lies in northwestern Hebei (Ji), with terrain running mostly north-west and east-west.
In this embodiment, elevation data provided by ASTER GDEM are used. The downloaded data are preprocessed by mosaicking, coordinate conversion, clipping, etc. to obtain an original DEM image matching the extent of the research area. ASTER GDEM is one of the most commonly used surface digital elevation data sets and is widely used in water system information extraction research.
Due to the influence of spatial resolution and systematic errors, some spurious depressions ("sinks") are produced in the DEM data during generation, so that when the river flow direction is extracted, the flow direction at these depressions is incorrect, which affects the accuracy of river network extraction. It is therefore necessary to fill the depressions in the original DEM to eliminate their influence. However, not all depressions are caused by data errors; some reflect the real terrain. On this basis, a reasonable depression-filling threshold is set when filling the original DEM to obtain the final initial data.
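As a non-limiting illustration, depression filling with a threshold may be sketched as follows. The sketch uses a priority-flood fill, which is one common technique and is not stated in the present description; the function name, the per-cell interpretation of the threshold and the parameter `max_fill_depth` are assumptions made for this sketch only.

```python
import heapq


def fill_depressions(dem, max_fill_depth):
    """Priority-flood fill of a DEM given as a list of rows.

    Cells that would have to be raised by more than max_fill_depth are
    treated as real terrain and keep their original elevation (assumption).
    """
    h, w = len(dem), len(dem[0])
    filled = [row[:] for row in dem]
    visited = [[False] * w for _ in range(h)]
    heap = []
    # Border cells can always drain off the grid, so they seed the queue.
    for r in range(h):
        for c in range(w):
            if r in (0, h - 1) or c in (0, w - 1):
                heapq.heappush(heap, (dem[r][c], r, c))
                visited[r][c] = True
    while heap:
        level, r, c = heapq.heappop(heap)
        for rr in range(max(0, r - 1), min(h, r + 2)):
            for cc in range(max(0, c - 1), min(w, c + 2)):
                if not visited[rr][cc]:
                    visited[rr][cc] = True
                    # Raise the neighbour to the spill level if it lies below it.
                    filled[rr][cc] = max(dem[rr][cc], level)
                    heapq.heappush(heap, (filled[rr][cc], rr, cc))
    # Apply the threshold cell by cell; the input DEM is not modified.
    return [[filled[r][c] if filled[r][c] - dem[r][c] <= max_fill_depth
             else dem[r][c]
             for c in range(w)] for r in range(h)]
```

A shallow single-cell sink is raised to the level of its surroundings, whereas a sink deeper than the threshold is left untouched and continues to be treated as genuine terrain.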
The general flow chart of the algorithm is shown in fig. 2. In the prior art, there are various means of acquiring apparent images of a water system: unmanned aerial vehicles, professionally configured cameras and remote sensing image data can all achieve rapid, non-contact, high-quality data acquisition. First, the collected data are preprocessed: the water system images are cut and divided, the image sizes are unified, and the image data are augmented. The images are then classified by water system type and labeled with dedicated software, and a corresponding sample database is established. After the algorithm starts to run, the training sample data are input into the constructed Faster R-CNN network, and the whole network is trained by deep learning while its parameters are adjusted. Finally, the trained network is tested.
After the network is trained, the Faster R-CNN model is tested and its false detection rate on the images is measured. Precision P and recall R are introduced for algorithm evaluation, and the indexes are defined as follows:
$$P = \frac{TP}{TP + FP}, \qquad R = \frac{TP}{TP + FN}$$
in the formula: TP: true positive, FP: false positive, FN: false negative. From the above formulas, the missed-detection (false negative) rate 1 - R and the false-detection (false positive) rate 1 - P can be obtained.
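As a non-limiting illustration, the evaluation indexes may be computed from the detection counts as sketched below, with the missed-detection and false-detection rates taken as the complements of recall and precision. The function name and the convention of returning 0.0 for an empty denominator are assumptions made for this sketch only.

```python
def detection_rates(tp, fp, fn):
    """Precision, recall and their complements from detection counts."""
    # Returning 0.0 for an empty denominator is a convention of this sketch.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "missed_detection_rate": 1.0 - recall,    # share of real targets missed
        "false_detection_rate": 1.0 - precision,  # share of detections that are wrong
    }
```

For example, with the hypothetical counts TP = 80, FP = 20 and FN = 10, the precision is 0.8 and the recall is 80/90.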
Finally, an F1 score index is introduced to reduce the false detection rate. The F1 score index is calculated as follows:
$$F_1 = \frac{2PR}{P + R}$$
After the algorithm determines the bounding box of an identified water system, AlexNet transfer learning (a CNN) is further introduced and combined with the morphological algorithm for subsequent processing, so that the form of the water system is extracted and the length and width of the water system in the image are obtained, making the water system information more accurate and richer.
The Faster R-CNN algorithm is derived from CNN and is an advanced version of the object detection algorithms R-CNN and Fast R-CNN; its structure is shown in FIG. 3. First, an image is input and its features are extracted through ResNet. The advantage of ResNet is that its residual structure allows recognition accuracy to be improved by increasing depth: the more layers the network has, the richer the features extracted at different levels. Then, on the generated convolutional feature map, candidate windows are generated by the region proposal network (RPN), and it is determined whether the current region is foreground (target/water system). If the region is classified as foreground, the water system is classified through the fully connected layers and the window coordinates are fine-tuned by bounding-box regression. Finally, the specific water system category is marked with a rectangular box on the output image, together with the classification confidence.
The region proposal network (RPN) is a fully convolutional network (FCN). Its purpose is to obtain high-quality water system candidate boxes on the image and to replace the earlier selective search method, so that the whole network can be trained end to end. In practical use, as shown in fig. 4, a 3 × 3 convolution is first applied to the feature map (size 60 × 40, depth 256) output by the feature extraction network (ResNet, VGG, etc.), leaving size and depth unchanged, so as to incorporate surrounding information and obtain a convolutional feature map. Each pixel of the feature map is then mapped back to the original image to generate 9 kinds of anchor boxes (anchors). At the same time, two 1 × 1 convolutions are applied to the convolutional feature map in parallel. One performs classification (positive sample: target; negative sample: background), outputting an 18-dimensional vector per pixel (9 anchor boxes × 2 classification scores); the other performs regression, likewise outputting a 36-dimensional vector per pixel (9 anchor boxes × 4 coordinate values). The total number of anchor boxes is therefore 60 × 40 × 9 = 21600. Suitable positive and negative samples are then screened out using the following measure:
$$IoU = \frac{|A \cap B|}{|A \cup B|}$$
where IoU (Intersection over Union) is the ratio of the intersection area to the union area of a predicted anchor box A and the corresponding ground-truth box B. Anchor boxes with IoU > 0.6 are taken as positive samples, those with IoU < 0.4 as negative samples, and the rest are discarded.
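As a non-limiting illustration, anchor generation on a 60 × 40 feature map and the labeling of anchors by overlap ratio may be sketched as follows. The stride of 16 pixels, the three scales and the three aspect ratios are assumptions borrowed from common practice and are not stated in the present description; only the 9 anchors per position and the 0.6 and 0.4 thresholds come from the text. Boxes are written as (x1, y1, x2, y2).

```python
def make_anchors(feat_w, feat_h, stride, scales, ratios):
    """One anchor per (position, scale, ratio), centered on each feature cell."""
    anchors = []
    for fy in range(feat_h):
        for fx in range(feat_w):
            cx, cy = (fx + 0.5) * stride, (fy + 0.5) * stride
            for s in scales:
                for ratio in ratios:          # ratio = width / height
                    w = s * ratio ** 0.5      # area stays s * s
                    h = s / ratio ** 0.5
                    anchors.append((cx - w / 2, cy - h / 2,
                                    cx + w / 2, cy + h / 2))
    return anchors


def iou(a, b):
    """Intersection area divided by union area of two boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def label_anchor(anchor, gt_boxes, pos_thr=0.6, neg_thr=0.4):
    """Return 1 (positive), 0 (negative) or -1 (discarded)."""
    best = max((iou(anchor, g) for g in gt_boxes), default=0.0)
    if best > pos_thr:
        return 1
    if best < neg_thr:
        return 0
    return -1
```

Calling `make_anchors(60, 40, 16, (128, 256, 512), (0.5, 1.0, 2.0))` yields 21600 anchors, matching the count given above.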
Non-maximum suppression (NMS) is then applied to find the best detection position: among anchor boxes that overlap heavily with one another, only the candidate box with the highest predicted probability is kept as the final prediction, and the others are removed.
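As a non-limiting illustration, greedy non-maximum suppression may be sketched as follows. The function names and the overlap threshold `iou_thr=0.7` are assumptions made for this sketch only; the present description does not state the threshold used. Boxes are written as (x1, y1, x2, y2).

```python
def iou(a, b):
    """Intersection area divided by union area of two boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_thr=0.7):
    """Return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # A box survives only if it does not overlap too much with any
        # higher-scoring box that has already been kept.
        if all(iou(boxes[i], boxes[k]) <= iou_thr for k in keep):
            keep.append(i)
    return keep
```

Two nearly coincident boxes thus collapse to the one with the higher score, while a distant box is unaffected.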
Finally, the remaining anchor boxes are ranked from high to low by confidence, and at most the first 128 are taken as positive samples for final training, together with 128 negative samples. The candidate boxes obtained by this screening are then fed into the network for training.
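As a non-limiting illustration, the ranking and selection of training samples may be sketched as follows. The function name and the label encoding (1 positive, 0 negative, -1 discarded) are assumptions made for this sketch only; in particular, the present description does not state how the 128 negative samples are chosen, and selecting them by the same confidence ranking is an assumption.

```python
def select_training_anchors(labels, scores, max_pos=128, max_neg=128):
    """Rank anchors by confidence and cap positives and negatives."""
    ranked = sorted(range(len(labels)), key=lambda i: scores[i], reverse=True)
    # Positives: the highest-confidence anchors labeled 1, at most max_pos.
    positives = [i for i in ranked if labels[i] == 1][:max_pos]
    # Negatives: taken in the same ranking order (assumption of this sketch).
    negatives = [i for i in ranked if labels[i] == 0][:max_neg]
    return positives, negatives
```

Anchors labeled -1 are never returned, so discarded anchors do not take part in training.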
Training is then carried out. The water system samples for training are derived from actually acquired water system images and downloaded digital elevation data. The original image size is 1024 × 1000 pixels; after slicing and division, each small sample image is 32 × 32 pixels. Data augmentation (rotation, etc.) is applied to obtain the training sample set, which contains 10118 water system samples and 9742 non-water-system samples. Transfer learning is then performed with the pre-trained AlexNet network. The training process is shown in fig. 5.
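As a non-limiting illustration, slicing an image into 32 × 32 samples and augmenting them by rotation may be sketched as follows. The function names, the use of non-overlapping tiles, the discarding of incomplete edge tiles and the choice of 90-degree rotations are assumptions made for this sketch only.

```python
def slice_tiles(image, tile=32):
    """Cut a list-of-rows image into non-overlapping tile x tile samples."""
    h, w = len(image), len(image[0])
    tiles = []
    # Incomplete tiles at the right and bottom edges are dropped (assumption).
    for top in range(0, h - tile + 1, tile):
        for left in range(0, w - tile + 1, tile):
            tiles.append([row[left:left + tile]
                          for row in image[top:top + tile]])
    return tiles


def rotate90(tile):
    """Rotate a tile clockwise by 90 degrees."""
    return [list(col) for col in zip(*tile[::-1])]


def augment(tile):
    """Return the tile together with its 90, 180 and 270 degree rotations."""
    out = [tile]
    for _ in range(3):
        out.append(rotate90(out[-1]))
    return out
```

Under these assumptions, an image with 1000 rows and 1024 columns gives 31 × 32 = 992 complete tiles, and each tile yields four samples after augmentation.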
The AlexNet neural network model is shown in FIG. 6. It consists of five convolutional layers and three fully connected layers, eight layers deep in total, with the output of the last fully connected layer sent to the Softmax layer, as follows:
$$\mathrm{softmax}(z_i) = \frac{e^{z_i}}{\sum_{j} e^{z_j}}$$
where z_i is the i-th output of the last fully connected layer. Softmax outputs values bounded in the range (0, 1), thereby ensuring that neurons are activated and producing a distribution over the class labels. Compared with a traditional neural network, the AlexNet model has the following advantages:
1. using the ReLU activation function (ReLU(x) = max(x, 0)); if the input is greater than 0, the gradient of ReLU is always 1;
2. augmenting the data set to suppress overfitting;
3. suppressing overfitting with the Dropout method;
4. enhancing generalization with a local response normalization layer.
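As a non-limiting illustration, the Softmax output and the ReLU activation referred to above may be sketched as follows. The function names are assumptions made for this sketch only, and subtracting the maximum logit before exponentiation is a common numerical-stability measure that is not stated in the present description.

```python
import math


def softmax(logits):
    """Map raw scores to probabilities that sum to 1."""
    m = max(logits)                        # shift for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def relu(x):
    """ReLU(x) = max(x, 0)."""
    return max(x, 0.0)


def relu_grad(x):
    """Gradient of ReLU: 1 for positive input, 0 otherwise."""
    return 1.0 if x > 0 else 0.0
```

For a two-class output (water system and non-water system), `softmax` returns two probabilities, and the larger one gives the predicted class.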
The input of the AlexNet model is 32 × 32 × 3; the images used in the method are three-channel RGB images, whose dimensions meet the requirement of the model. The resized image is sent into the network and first reaches a convolutional layer; after the first-layer convolution, the result is pooled, normalized, and finally input into the second layer. The operations of the second through fifth layers are similar to those of the first layer and are not described again. The output of the fifth layer is sent to the subsequent fully connected layers. The outputs of the sixth and seventh layers are both vectors of length 1000. Finally, the network obtains the final classification result with a Softmax classifier.
After classification is finished, a morphological algorithm is used to extract the water system morphology. Faster R-CNN yields the basic skeleton of the Chongli area water system in the form of anchor boxes. However, the remote sensing data are affected by the mountainous terrain, with heavy noise and large errors, and some water system features are poorly resolved against the gray levels of background pixels along the boundary. A morphological algorithm is therefore introduced to further process the coarse water system skeleton produced by the CNN, restoring the water system form from the gray-level minima within the bounding-box area. The main steps are as follows:
step 1: within the bounding-box area (from Faster R-CNN), extracting the minimum-gray-value pixel of each row of the water system skeleton (from the CNN) to form an initial basic skeleton of the water system;
step 2: because of the surrounding geographic environment, the result of step 1 is prone to salt-and-pepper noise; the image is traversed with a 3 × 3 sliding window, and if the window contains only one pixel, that pixel is filtered out;
step 3: filtering out connected regions with small areas through an area filter, and connecting adjacent connected regions among the remaining connected regions;
step 4: skeletonizing the water system form so that its width can be extracted later; the resulting skeleton is complete and continuous, unlike the discontinuous skeleton obtained in the previous steps;
step 5: combining steps 3 and 4 to extract the water system width: the skeleton is traversed, and at each skeleton point the length along the normal to the tangent is taken as the water system width at that point. As shown in FIG. 7, the water system width at 30 points along the water system direction was selected for a practical test, and the results show that the calculated width and the actual width differ by less than 1 m.
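As a non-limiting illustration, the width measurement of step 5 may be sketched as follows: at each skeleton point the tangent is estimated from the neighbouring skeleton points, and the water system mask is traversed along the normal in both directions. The function names, the ordered list of (row, column) skeleton points, the one-pixel sampling step and the `pixel_size` parameter (ground distance per pixel) are assumptions made for this sketch only; at least two skeleton points are required.

```python
import math


def width_at(mask, point, tangent, pixel_size=1.0):
    """Length of the mask chord through `point` along the normal to `tangent`."""
    h, w = len(mask), len(mask[0])
    ty, tx = tangent
    norm = math.hypot(ty, tx)
    ny, nx = -tx / norm, ty / norm      # unit normal: tangent rotated 90 degrees

    def reach(sign):
        # Count consecutive mask pixels on one side of the skeleton point.
        n = 0
        while True:
            y = int(round(point[0] + sign * (n + 1) * ny))
            x = int(round(point[1] + sign * (n + 1) * nx))
            if not (0 <= y < h and 0 <= x < w) or not mask[y][x]:
                return n
            n += 1

    # Both sides plus the skeleton pixel itself, converted to ground units.
    return (reach(1) + reach(-1) + 1) * pixel_size


def skeleton_widths(mask, skeleton, pixel_size=1.0):
    """Width at every point of an ordered skeleton (list of (row, col))."""
    widths = []
    last = len(skeleton) - 1
    for i, p in enumerate(skeleton):
        a, b = skeleton[max(i - 1, 0)], skeleton[min(i + 1, last)]
        tangent = (b[0] - a[0], b[1] - a[1])
        widths.append(width_at(mask, p, tangent, pixel_size))
    return widths
```

The widths at selected skeleton points can then be compared with field measurements in the manner of the 30-point test described in step 5.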
Finally, the F1 score index defined above is introduced to reduce the false detection rate of the algorithm and further improve its identification precision. The water system pixel area and the confidence threshold are chosen at the maximum of the F1 score, which reduces the false detection rate and adapts to real scenes with diverse water systems. The F1 score combines precision and recall into a single value, treating the two as equally important. F1 takes values in [0, 1], and the larger the value, the higher the corresponding identification precision.
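As a non-limiting illustration, selecting the confidence threshold at the maximum of the F1 score may be sketched as follows. The function names, the candidate threshold list and the rule of keeping the first threshold on ties are assumptions made for this sketch only; the pixel-area threshold can be searched in the same manner and is not shown.

```python
def f1_score(tp, fp, fn):
    """F1 = 2TP / (2TP + FP + FN), equal to 2PR / (P + R)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_threshold(scores, is_water, thresholds):
    """Return (threshold, F1) for the candidate with the highest F1."""
    best_thr, best_f1 = None, -1.0
    for thr in thresholds:
        pairs = list(zip(scores, is_water))
        tp = sum(1 for s, y in pairs if s >= thr and y)
        fp = sum(1 for s, y in pairs if s >= thr and not y)
        fn = sum(1 for s, y in pairs if s < thr and y)
        f1 = f1_score(tp, fp, fn)
        if f1 > best_f1:                 # the first threshold wins on ties
            best_thr, best_f1 = thr, f1
    return best_thr, best_f1
```

Here `scores` are the detection confidences and `is_water` the corresponding ground-truth flags of a validation set.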