CN112215189A - Accurate detecting system for illegal building - Google Patents

Accurate detecting system for illegal building

Info

Publication number
CN112215189A
CN112215189A (application CN202011133537.7A)
Authority
CN
China
Prior art keywords
picture
module
training
frame
video
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011133537.7A
Other languages
Chinese (zh)
Inventor
王也
周龙
汤淼
葛家明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Smart Aviation Research Institute Co ltd
Original Assignee
Nanjing Smart Aviation Research Institute Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Smart Aviation Research Institute Co ltd filed Critical Nanjing Smart Aviation Research Institute Co ltd
Priority to CN202011133537.7A priority Critical patent/CN112215189A/en
Publication of CN112215189A publication Critical patent/CN112215189A/en
Pending legal-status Critical Current

Classifications

    • G06V20/176 Terrestrial scenes: urban or other man-made structures
    • G06F18/214 Pattern recognition: generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F18/22 Pattern recognition: matching criteria, e.g. proximity measures
    • G06F18/23213 Pattern recognition: non-hierarchical clustering with a fixed number of clusters, e.g. K-means clustering
    • G06N3/045 Neural networks: combinations of networks
    • G06N3/084 Neural network learning methods: backpropagation, e.g. using gradient descent
    • G06Q50/08 ICT adapted to specific business sectors: construction
    • G06Q50/26 ICT adapted to specific business sectors: government or public services
    • G06V20/41 Video scenes: higher-level, semantic clustering, classification or understanding
    • G06V2201/07 Indexing scheme: target detection


Abstract

The invention belongs to the technical field of image-recognition target detection, and particularly relates to an accurate detection system for illegal buildings, comprising: a picture acquisition module for acquiring pictures of illegal buildings from the video of a preselected area; a marking module that marks the pictures and obtains prior frames from the marked pictures; a dividing module for dividing the data set composed of the pictures into a training set and a test set; a preprocessing module for preprocessing the training set; a training module for training a YOLOV4 model on the preprocessed training set; an adjusting module for adjusting the trained YOLOV4 model according to the test set; and a detection module that inputs video into the adjusted YOLOV4 model to detect illegal buildings in the video, so that illegal buildings can be identified quickly and accurately.

Description

Accurate detecting system for illegal building
Technical Field
The invention belongs to the technical field of image recognition target detection, and particularly relates to an accurate detection system for illegal buildings.
Background
With the continuous advance of urbanization, housing values have risen accordingly, and illegal buildings have appeared in large numbers, whether newly added on occupied land or created by altering an original building design. Because additions on building roofs are well concealed, such illegal buildings are hard to discover, which has long made demolition work difficult; limits on manpower and information channels further prevent illegal buildings from being found and demolished quickly.
At present, the excellent performance of deep learning in target detection and the spread of unmanned aerial vehicles offer a powerful way to find and remove illegal buildings on rooftops. However, most existing algorithms are optimized on public data sets such as ImageNet, COCO and VOC, and no standardized method has yet formed for identifying specific target objects in specific scenes, such as illegal buildings in video shot by low-altitude unmanned aerial vehicles. Because the UAV flies at a relatively high altitude, each image has a large pixel size, which complicates the recognition of illegal buildings, and samples of illegal buildings shot by UAVs are difficult to obtain.
In illegal-building identification, two methods are mainly used: one based on image comparison and one based on image recognition, as follows:
The image-recognition-based method relies mainly on the strong performance of deep-learning models: a model is trained on a labeled data set, and information about illegal buildings is obtained by feeding it new aerial data.
With the image-comparison-based method, human judgment may still be needed after a difference is identified, and some current image-recognition methods are too inefficient to be suitable for video. Therefore, an efficient and feasible scheme for detecting target objects in video shot by low-altitude UAVs is urgently needed.
Therefore, a new illegal building accurate detection system needs to be designed based on the technical problems.
Disclosure of Invention
The invention aims to provide an accurate detection system for illegal buildings.
In order to solve the technical problem, the invention provides a system for detecting illegal buildings, which comprises:
the picture acquisition module is used for acquiring a picture of the illegal building according to the video of the preselected area;
the marking module marks the picture and acquires a prior frame according to the marked picture;
the dividing module is used for dividing a data set consisting of pictures into a training set and a test set;
the preprocessing module is used for preprocessing the training set;
the training module is used for training the Yolov4 model according to the preprocessed training set;
the adjusting module is used for adjusting the trained Yolov4 model according to the test set; and
and the detection module inputs the video into the adjusted YOLOV4 model to detect the illegal buildings in the video.
Further, the picture acquisition module is adapted to acquire pictures of the illegal building according to the video of the preselected area, that is:
a video of the preselected area is shot, video segments containing illegal buildings are selected to obtain pictures of the illegal buildings, and the resolution of the pictures is adjusted to a preset resolution.
Further, the labeling module is adapted to label the picture and obtain the prior frames according to the labeled picture, that is:
Marking the position of the illegal building in the picture after resolution adjustment, acquiring a target frame, and acquiring the length, width and position of the target frame in the marking data;
taking the marked object type, the length and width of the target frame and the position as labels of the picture, and normalizing the length and width of the target frame:
w_r = w / W;
h_r = h / H;
wherein w_r is the width of the target frame after normalization; h_r is the height of the target frame after normalization; w is the width of the target frame; h is the height of the target frame; W is the width of the picture; H is the height of the picture;
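As a minimal sketch (the function name is illustrative, not from the patent), the normalization of a target frame's width and height by the picture's dimensions can be written as:

```python
def normalize_box(w, h, img_w, img_h):
    """Normalize a target frame's width/height by the picture's
    width W and height H: w_r = w / W, h_r = h / H."""
    return w / img_w, h / img_h
```

For example, a 256 x 128 target frame in a 1024 x 1024 picture normalizes to (0.25, 0.125).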
initializing the number of categories and the cluster centers of the prior frames, and calculating the IOU-based distance d between each target frame and all the cluster centers:
d = 1 - IOU, with IOU = in / (un - in);
in = min(h_1, h_2) · min(w_1, w_2);
un = h_1·w_1 + h_2·w_2;
wherein in is the intersection of the two target frames; un is the sum of the areas of the two target frames (so that un - in is their union); h_1 and w_1 are the height and width of one target frame; h_2 and w_2 are the height and width of the other target frame;
selecting the nearest clustering center as the category of the target frame;
the mean value of each cluster is then taken as the cluster center for the next iteration, until the change of each category's center position between two successive iterations falls within the error tolerance, and the final cluster centers are taken as the prior frames.
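A hedged sketch of the IOU computed from the in/un quantities defined above (the intersection formula assumes the two frames share a common origin, as is conventional in anchor clustering; the function name is illustrative):

```python
def iou_wh(w1, h1, w2, h2):
    """IOU of two target frames given only widths and heights,
    following in = min(h1, h2) * min(w1, w2) and un = h1*w1 + h2*w2,
    so that IOU = in / (un - in)."""
    inter = min(h1, h2) * min(w1, w2)   # overlap of frames aligned at a common origin
    un = h1 * w1 + h2 * w2              # sum of the two frame areas
    return inter / (un - inter)         # un - inter is the union of the frames
```

The clustering distance is then 1 - IOU, so identical frames sit at distance 0 and dissimilar shapes sit farther apart.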
Further, the partitioning module is adapted to partition the data set consisting of the pictures into a training set and a test set, that is:
the data set consisting of the resolution-adjusted pictures is divided into a training set and a test set at a preset ratio.
Further, the preprocessing module is adapted to preprocess the training set, that is, to perform data enhancement on it:
a preset number of pictures are selected from the training set each time; the pictures are flipped, scaled and color-gamut-shifted, and placed at preset positions so that the pictures and their frames are combined;
the class numbers of the pictures are encoded as one-hot codes, and the labels are processed accordingly.
Further, the training module is adapted to train the YOLOV4 model according to the preprocessed training set, that is:
parameters of a YOLOV4 model pre-trained on the COCO data set are obtained, and the YOLOV4 network is initialized with those parameters;
inputting data and prior frames in a training set, carrying out forward propagation of a YOLOV4 network, and calculating a loss value between a prediction result and a real label according to a loss function of a YOLOV4 model, namely
Obtaining a regression optimization loss value CIOU of a YOLOV4 model:
CIOU = IOU - ρ²(b, b^gt) / c² - α·v;
v = (4 / π²) · (arctan(w^gt / h^gt) - arctan(w^p / h^p))²;
α = v / ((1 - IOU) + v);
where ρ²(b, b^gt) is the squared Euclidean distance between the center points of the prediction frame and the real frame; c is the diagonal length of the smallest enclosing region containing both the prediction frame and the real frame; w^p and h^p are the width and height of the prediction frame; w^gt and h^gt are the width and height of the real frame;
the loss value Loss_CIOU is:
Loss_CIOU = 1 - CIOU;
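A self-contained sketch of the CIOU regression loss for boxes given as (center x, center y, width, height) tuples. This is pure Python for illustration only; a real YOLOV4 implementation computes the same quantity over tensors, and the epsilon guard is an assumption added for numerical safety:

```python
import math

def ciou_loss(pred, gt):
    """CIOU regression loss, Loss = 1 - CIOU, for boxes (cx, cy, w, h)."""
    (px, py, pw, ph), (gx, gy, gw, gh) = pred, gt
    # Corner coordinates of prediction and ground-truth boxes.
    p_x1, p_y1, p_x2, p_y2 = px - pw/2, py - ph/2, px + pw/2, py + ph/2
    g_x1, g_y1, g_x2, g_y2 = gx - gw/2, gy - gh/2, gx + gw/2, gy + gh/2
    # Intersection and union areas.
    iw = max(0.0, min(p_x2, g_x2) - max(p_x1, g_x1))
    ih = max(0.0, min(p_y2, g_y2) - max(p_y1, g_y1))
    inter = iw * ih
    union = pw * ph + gw * gh - inter
    iou = inter / union
    # Squared center distance and squared diagonal of the smallest enclosing box.
    rho2 = (px - gx) ** 2 + (py - gy) ** 2
    cw = max(p_x2, g_x2) - min(p_x1, g_x1)
    ch = max(p_y2, g_y2) - min(p_y1, g_y1)
    c2 = cw ** 2 + ch ** 2
    # Aspect-ratio consistency term v and its trade-off weight alpha.
    v = (4 / math.pi ** 2) * (math.atan(gw / gh) - math.atan(pw / ph)) ** 2
    alpha = v / ((1 - iou) + v + 1e-9)  # epsilon avoids division by zero
    ciou = iou - rho2 / c2 - alpha * v
    return 1 - ciou
```

For a perfect prediction the loss is zero, and any center offset or shape mismatch raises it, which is what the network's weight and bias updates minimize.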
and adjusting the weight and the bias of the YOLOV4 network according to the loss value, completing one iteration of the YOLOV4 network, and circulating until an early stop condition or the maximum iteration number is reached so as to complete the training of the YOLOV4 model.
Further, the adjusting module is adapted to adjust the trained YOLOV4 model according to the test set, that is:
the data in the test set are input into the trained YOLOV4 model to obtain test results, and the trained YOLOV4 model is adjusted according to those results.
Further, the detection module is adapted to input the video into the adjusted YOLOV4 model to detect illegal buildings in the video, that is:
the video is input into the adjusted YOLOV4 model, the picture of each frame is analyzed predictively, the illegal buildings in the picture are detected, and the picture is marked.
The invention has the advantages that the picture acquisition module obtains pictures of illegal buildings from the video of a preselected area; the marking module marks the pictures and obtains prior frames from the marked pictures; the dividing module divides the data set composed of the pictures into a training set and a test set; the preprocessing module preprocesses the training set; the training module trains the YOLOV4 model on the preprocessed training set; the adjusting module adjusts the trained YOLOV4 model according to the test set; and the detection module inputs video into the adjusted YOLOV4 model to detect illegal buildings in the video, so that illegal buildings are identified quickly and accurately.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and drawings.
In order to make the aforementioned and other objects, features and advantages of the present invention comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.
FIG. 1 is a flow chart of an accurate detection system for illegal buildings according to the present invention;
fig. 2 is a schematic diagram of the results of data enhancement in accordance with the present invention.
Detailed Description
To make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 is a flow chart of the accurate detection system for illegal buildings according to the present invention.
As shown in fig. 1, the present embodiment provides a violation building detection system, including: the picture acquisition module is used for acquiring a picture of the illegal building according to the video of the preselected area; the marking module marks the picture and acquires a prior frame according to the marked picture; the dividing module is used for dividing a data set consisting of pictures into a training set and a test set; the preprocessing module is used for preprocessing the training set; the training module is used for training the Yolov4 model according to the preprocessed training set; the adjusting module is used for adjusting the trained Yolov4 model according to the test set; and the detection module inputs the video into the adjusted YOLOV4 model to detect the illegal buildings in the video, so that the illegal buildings are quickly and accurately identified.
In this embodiment, the picture acquisition module is adapted to acquire pictures of illegal buildings according to the video of the preselected area: an unmanned aerial vehicle shoots the selected (preselected) area, and video segments containing illegal buildings are selected for processing to obtain pictures of the illegal buildings, whose resolution is then adjusted to a preset value. Because the high-definition pictures shot by the UAV have too many pixels for efficient subsequent processing, the picture resolution is reduced to a preset resolution (for example, 1024 × 1024).
In this embodiment, the labeling module is adapted to label the picture and obtain the prior frames according to the labeled picture; that is, LabelImg is used to
mark the position of the illegal building in the resolution-adjusted picture, a target frame is acquired, and the length, width and position of the target frame are recorded in the labeling data;
the labeled object category and the target frame's length, width and position are taken as the picture's label, and the target frame's length and width are normalized (since target frames differ in size across scenes of different scales, they are normalized by the picture's height and width according to the following formulas):
w_r = w / W;
h_r = h / H;
wherein w_r is the width of the target frame after normalization; h_r is the height of the target frame after normalization; w is the width of the target frame; h is the height of the target frame; W is the width of the picture; H is the height of the picture;
initializing the number of categories and the cluster centers of the prior frames, and calculating the IOU-based distance d between each target frame and all the cluster centers:
d = 1 - IOU, with IOU = in / (un - in);
in = min(h_1, h_2) · min(w_1, w_2);
un = h_1·w_1 + h_2·w_2;
wherein in is the intersection of the two target frames; un is the sum of the areas of the two target frames (so that un - in is their union); h_1 and w_1 are the height and width of one target frame; h_2 and w_2 are the height and width of the other target frame;
selecting the nearest clustering center as the category of the target frame;
taking the mean value of each cluster as a clustering center of the next iteration;
the distance between each target frame and all the cluster centers is computed repeatedly, and the mean of each cluster is taken as the cluster center for the next iteration, until the change of each category's center position between two successive iterations is within ε; the final cluster centers are the prior frames, and 9 prior frames are obtained through K-Means clustering.
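The K-Means anchor clustering described above can be sketched as follows. This is a pure-Python illustration under stated assumptions: the function and parameter names are not from the patent, boxes are normalized (w, h) pairs, and a production pipeline would typically operate on arrays rather than Python lists:

```python
import random

def kmeans_anchors(boxes, k=9, eps=1e-6, seed=0, max_iter=100):
    """K-Means over (w, h) target frames with 1 - IOU as the distance,
    iterating until every center moves less than eps."""
    def iou(a, b):
        # Intersection assumes frames aligned at a common origin.
        inter = min(a[0], b[0]) * min(a[1], b[1])
        return inter / (a[0] * a[1] + b[0] * b[1] - inter)

    rng = random.Random(seed)
    centers = rng.sample(boxes, k)  # initialize cluster centers from the data
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for b in boxes:
            # Assign each target frame to the nearest center (max IOU = min distance).
            i = max(range(k), key=lambda j: iou(b, centers[j]))
            clusters[i].append(b)
        # The mean of each cluster becomes the next iteration's center;
        # an empty cluster keeps its previous center.
        new = [
            (sum(b[0] for b in c) / len(c), sum(b[1] for b in c) / len(c)) if c
            else centers[i]
            for i, c in enumerate(clusters)
        ]
        if all(abs(n[0] - o[0]) < eps and abs(n[1] - o[1]) < eps
               for n, o in zip(new, centers)):
            break
        centers = new
    return centers
```

With k = 9 this yields the nine prior frames mentioned above.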
In this embodiment, the dividing module is adapted to divide the data set formed by the resolution-adjusted pictures into a training set and a test set at a preset ratio; according to the object categories in the labels of the original data, the data set is divided into a training set and a test set at a ratio of 8:2.
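A minimal sketch of the 8:2 split (the function name and the fixed shuffle seed are illustrative assumptions, not specified by the patent):

```python
import random

def split_dataset(samples, train_ratio=0.8, seed=42):
    """Shuffle the picture data set and split it into a training set
    and a test set at the preset ratio (8:2 by default)."""
    items = list(samples)
    random.Random(seed).shuffle(items)  # deterministic shuffle for reproducibility
    cut = int(len(items) * train_ratio)
    return items[:cut], items[cut:]
```

In practice the split would be stratified per object category, as the embodiment describes, so that each class appears in both sets.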
Fig. 2 is a schematic diagram of the results of data enhancement in accordance with the present invention.
In this embodiment, the preprocessing module is adapted to preprocess the training set, that is, to perform Mosaic data enhancement on it: a preset number of pictures (for example, four) are read from the training set each time; each picture is flipped, scaled and color-gamut-shifted, and the pictures are placed at preset positions (for example, the four quadrants) so that the pictures and their frames are combined (as shown in fig. 2); the class numbers of the pictures are encoded as one-hot codes, and the labels are smoothed with the Label-smoothing method, thereby improving the generalization ability of the YOLOV4 model.
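The one-hot encoding and Label-smoothing steps can be sketched in a few lines (the helper names are illustrative, and eps = 0.1 is a common smoothing value, not one stated in the patent):

```python
def one_hot(index, num_classes):
    """Encode a class number as a one-hot vector."""
    return [1.0 if i == index else 0.0 for i in range(num_classes)]

def label_smooth(onehot, eps=0.1):
    """Label smoothing: soften hard 0/1 targets toward a uniform
    distribution, which helps the model generalize."""
    n = len(onehot)
    return [v * (1 - eps) + eps / n for v in onehot]
```

The smoothed vector still sums to 1 and keeps its maximum at the true class, but no longer asks the network for absolute certainty.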
In this embodiment, the training module is adapted to train the YOLOV4 model according to a pre-processed training set, i.e. the training module is adapted to train the YOLOV4 model according to a pre-processed training set
Obtaining parameters of a YOLOV4 model pre-trained on a COCO data set (e.g., downloaded from a related website), and initializing a YOLOV4 network according to the parameters;
inputting data and prior frames in a training set, carrying out forward propagation of a YOLOV4 network, and calculating a loss value between a prediction result and a real label according to a loss function of a YOLOV4 model, namely
Obtaining a regression optimization loss value CIOU of a YOLOV4 model:
CIOU = IOU - ρ²(b, b^gt) / c² - α·v;
v = (4 / π²) · (arctan(w^gt / h^gt) - arctan(w^p / h^p))²;
α = v / ((1 - IOU) + v);
where ρ²(b, b^gt) is the squared Euclidean distance between the center points of the prediction frame (the predicted output of the YOLOV4 model) and the real frame (the true target frame); c is the diagonal length of the smallest enclosing region containing both frames; w^p and h^p are the width and height of the prediction frame; w^gt and h^gt are the width and height of the real frame;
the loss value Loss_CIOU is:
Loss_CIOU = 1 - CIOU;
and adjusting the weight and the bias of the YOLOV4 network by using a back propagation algorithm according to the loss value, completing one iteration of the YOLOV4 network, and circulating until an early stop condition or the maximum iteration number is reached so as to complete the training of the YOLOV4 model.
In this embodiment, the adjusting module is adapted to adjust the trained YOLOV4 model according to the test set, that is, the data in the test set are input into the trained YOLOV4 model (a trained mixed model) to obtain test results, and the trained YOLOV4 model is adjusted according to those results.
In this embodiment, the detection module is adapted to input the video into the adjusted YOLOV4 model to detect illegal buildings in the video; that is, the video is input into the adjusted YOLOV4 model, the picture of each frame is analyzed predictively, the illegal buildings in the picture are detected and marked, and the marked frames are then assembled into a video for display.
In this embodiment, a set of targeted training data is acquired by UAV aerial photography; the data-processing techniques described above improve the recognition accuracy of the YOLOV4 model, and the YOLOV4 target recognition model performs well in both speed and accuracy.
In conclusion, the picture of the illegal building is obtained through the picture obtaining module according to the video of the preselected area; the marking module marks the picture and acquires a prior frame according to the marked picture; the dividing module is used for dividing a data set consisting of pictures into a training set and a test set; the preprocessing module is used for preprocessing the training set; the training module is used for training the Yolov4 model according to the preprocessed training set; the adjusting module is used for adjusting the trained Yolov4 model according to the test set; and the detection module inputs the video into the adjusted YOLOV4 model to detect the illegal buildings in the video, so that the illegal buildings are quickly and accurately identified.
In the embodiments provided in the present application, it should be understood that the disclosed system may be implemented in other ways. The apparatus embodiments described above are merely illustrative and, for example, the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, the functional modules in the embodiments of the present invention may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to perform all or part of the steps of the functions described in the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
In light of the foregoing description of the preferred embodiment of the present invention, many modifications and variations will be apparent to those skilled in the art without departing from the spirit and scope of the invention. The technical scope of the present invention is not limited to the content of the specification, and must be determined according to the scope of the claims.

Claims (8)

1. A violation building detection system, comprising:
the picture acquisition module is used for acquiring a picture of the illegal building according to the video of the preselected area;
the marking module marks the picture and acquires a prior frame according to the marked picture;
the dividing module is used for dividing a data set consisting of pictures into a training set and a test set;
the preprocessing module is used for preprocessing the training set;
the training module is used for training the Yolov4 model according to the preprocessed training set;
the adjusting module is used for adjusting the trained Yolov4 model according to the test set; and
and the detection module inputs the video into the adjusted YOLOV4 model to detect the illegal buildings in the video.
2. The illegal building detection system of claim 1, wherein
the picture acquisition module acquires pictures of illegal buildings from video of the preselected area by:
capturing video of the preselected area, selecting the video segments in which illegal buildings appear to obtain pictures containing the illegal buildings, and adjusting the resolution of the pictures to a preset resolution.
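The resolution adjustment in claim 2 can be illustrated with a minimal nearest-neighbour resize; this sketch is not from the patent, and the 608x608 target size is a hypothetical choice since the claim only specifies "a preset resolution":

```python
import numpy as np

def resize_nearest(img, out_h, out_w):
    """Resize an image array to a preset resolution via nearest-neighbour sampling."""
    h, w = img.shape[:2]
    rows = np.arange(out_h) * h // out_h  # source row index for each output row
    cols = np.arange(out_w) * w // out_w  # source column index for each output column
    return img[rows][:, cols]

# e.g. bringing an extracted 1080p video frame to a hypothetical preset 608x608
frame = np.zeros((1080, 1920, 3), dtype=np.uint8)
resized = resize_nearest(frame, 608, 608)
```

In practice a library resampler with interpolation would be used; the point is only that every extracted frame is mapped to one fixed resolution before labeling.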
3. The illegal building detection system of claim 2, wherein
the labeling module labels the pictures and obtains prior frames from the labeled pictures by:
marking the positions of the illegal buildings in the resolution-adjusted pictures to obtain target frames, and recording the width, height and position of each target frame in the labeling data;
taking the labeled object class and the width, height and position of the target frame as the label of the picture, and normalizing the width and height of the target frame:
w_r = w / W;
h_r = h / H;
where w_r is the normalized target-frame width; h_r is the normalized target-frame height; w is the target-frame width; h is the target-frame height; W is the picture width; H is the picture height;
initializing the number of categories and the cluster centers of the prior frames, and computing the IOU-based distance between each target frame and every cluster center:
d = 1 - IOU, where IOU = in / un;
in = min(h1, h2) × min(w1, w2);
un = h1 × w1 + h2 × w2 - in;
where in is the intersection of the two target frames; un is their union; h1 and w1 are the height and width of one target frame; h2 and w2 are the height and width of the other;
assigning each target frame to the nearest cluster center as its category; and
taking the mean of each cluster as the cluster center for the next iteration, until the center position of each category changes by less than a preset error between two successive iterations, and taking the final cluster centers as the prior frames.
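The prior-frame clustering in claim 3 is the familiar k-means-on-boxes procedure with 1 - IOU as the distance. A minimal sketch under that reading (function names and the k=9 default are illustrative assumptions, not from the patent):

```python
import numpy as np

def iou_wh(box, centers):
    """IOU between one (w, h) box and k cluster centers, all anchored at the origin."""
    inter = np.minimum(box[0], centers[:, 0]) * np.minimum(box[1], centers[:, 1])
    union = box[0] * box[1] + centers[:, 0] * centers[:, 1] - inter
    return inter / union

def kmeans_priors(boxes, k=9, tol=1e-6, seed=0):
    """Cluster normalized (w, h) target frames into k prior frames."""
    rng = np.random.default_rng(seed)
    centers = boxes[rng.choice(len(boxes), size=k, replace=False)]
    while True:
        # assign each box to the center with the smallest distance d = 1 - IOU,
        # i.e. the largest IOU
        assign = np.array([np.argmax(iou_wh(b, centers)) for b in boxes])
        new_centers = np.array([
            boxes[assign == i].mean(axis=0) if np.any(assign == i) else centers[i]
            for i in range(k)
        ])
        if np.max(np.abs(new_centers - centers)) < tol:  # centers within tolerance
            return new_centers
        centers = new_centers
```

Because only widths and heights are clustered, both boxes are treated as sharing a corner, which is why the intersection reduces to min(h1, h2) × min(w1, w2) as in the claim.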
4. The illegal building detection system of claim 3, wherein
the dividing module divides the data set composed of the pictures into a training set and a test set by:
dividing the data set composed of the resolution-adjusted pictures into a training set and a test set at a preset ratio.
5. The illegal building detection system of claim 4, wherein
the preprocessing module preprocesses the training set by:
performing data enhancement on the training set, namely
selecting a preset number of pictures from the training set each time, flipping, scaling and shifting the color gamut of the pictures, and placing them at preset positions to combine the pictures and their frames; and
encoding the class numbers of the pictures as one-hot codes to process the labels.
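The one-hot label encoding in claim 5, together with a horizontal flip as one concrete instance of the enhancement step, might look as follows; the [xc, yc, w, h] box format and function names are illustrative assumptions:

```python
import numpy as np

def hflip(image, boxes):
    """Flip an image left-right and mirror its normalized [xc, yc, w, h] boxes."""
    flipped = image[:, ::-1].copy()
    boxes = boxes.copy()
    boxes[:, 0] = 1.0 - boxes[:, 0]  # mirror the x-center; widths and heights unchanged
    return flipped, boxes

def one_hot(class_ids, num_classes):
    """Encode integer class numbers as one-hot label vectors."""
    out = np.zeros((len(class_ids), num_classes), dtype=np.float32)
    out[np.arange(len(class_ids)), class_ids] = 1.0
    return out
```

Scaling, color-gamut shifts and the four-picture placement would transform the frame coordinates in the same label-preserving way.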
6. The illegal building detection system of claim 5, wherein
the training module trains the YOLOv4 model on the preprocessed training set by:
obtaining the parameters of a YOLOv4 model pre-trained on the COCO data set, and initializing the YOLOv4 network with these parameters;
inputting the data in the training set together with the prior frames, performing forward propagation through the YOLOv4 network, and computing the loss value between the prediction result and the true label according to the loss function of the YOLOv4 model, namely
obtaining the regression optimization loss value CIOU of the YOLOv4 model:
CIOU = IOU - ρ²(b, b_gt) / c² - α·v;
v = (4 / π²) × (arctan(w_gt / h_gt) - arctan(w_p / h_p))²;
α = v / (1 - IOU + v);
where ρ(b, b_gt) is the Euclidean distance between the center points of the prediction frame and the real frame; c is the diagonal length of the smallest enclosing region containing both the prediction frame and the real frame; w_p is the width of the prediction frame; h_p is the height of the prediction frame; w_gt is the width of the real frame; h_gt is the height of the real frame;
the loss value Loss_CIOU is:
Loss_CIOU = 1 - CIOU; and
adjusting the weights and biases of the YOLOv4 network according to the loss value to complete one iteration of the YOLOv4 network, repeating until an early-stopping condition or the maximum number of iterations is reached, so as to complete the training of the YOLOv4 model.
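The CIOU loss of claim 6 can be computed per box pair as below; the (x1, y1, x2, y2) corner-coordinate box format is an assumption for illustration:

```python
import math

def ciou_loss(pred, truth):
    """CIOU loss for one prediction/ground-truth pair, boxes as (x1, y1, x2, y2)."""
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = truth
    # plain IOU
    iw = max(0.0, min(px2, tx2) - max(px1, tx1))
    ih = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = iw * ih
    area_p = (px2 - px1) * (py2 - py1)
    area_t = (tx2 - tx1) * (ty2 - ty1)
    iou = inter / (area_p + area_t - inter)
    # squared center distance rho^2 and enclosing-box diagonal c^2
    rho2 = ((px1 + px2) / 2 - (tx1 + tx2) / 2) ** 2 \
         + ((py1 + py2) / 2 - (ty1 + ty2) / 2) ** 2
    cw = max(px2, tx2) - min(px1, tx1)
    ch = max(py2, ty2) - min(py1, ty1)
    c2 = cw ** 2 + ch ** 2
    # aspect-ratio consistency term v and its weight alpha
    v = (4 / math.pi ** 2) * (math.atan((tx2 - tx1) / (ty2 - ty1))
                              - math.atan((px2 - px1) / (py2 - py1))) ** 2
    alpha = v / (1 - iou + v) if (1 - iou + v) > 0 else 0.0
    return 1 - (iou - rho2 / c2 - alpha * v)
```

For identical boxes the loss is 0; for disjoint boxes IOU vanishes and only the center-distance and aspect-ratio penalties drive the gradient, which is the motivation for CIOU over plain IOU loss.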
7. The illegal building detection system of claim 6, wherein
the adjusting module adjusts the trained YOLOv4 model according to the test set by:
inputting the data in the test set into the trained YOLOv4 model to obtain test results, and adjusting the trained YOLOv4 model according to the test results.
8. The illegal building detection system of claim 7, wherein
the detection module inputs the video into the adjusted YOLOv4 model to detect illegal buildings in the video by:
inputting the video into the adjusted YOLOv4 model, performing predictive analysis on each frame, detecting the illegal buildings in the frame, and marking them on the picture.
CN202011133537.7A 2020-10-21 2020-10-21 Accurate detecting system for illegal building Pending CN112215189A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011133537.7A CN112215189A (en) 2020-10-21 2020-10-21 Accurate detecting system for illegal building


Publications (1)

Publication Number Publication Date
CN112215189A true CN112215189A (en) 2021-01-12

Family

ID=74056344

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011133537.7A Pending CN112215189A (en) 2020-10-21 2020-10-21 Accurate detecting system for illegal building

Country Status (1)

Country Link
CN (1) CN112215189A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115049935A (en) * 2022-08-12 2022-09-13 松立控股集团股份有限公司 Urban illegal building division detection method

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110070074A (en) * 2019-05-07 2019-07-30 安徽工业大学 A method of building pedestrian detection model
CN110782708A (en) * 2019-11-01 2020-02-11 南京智慧航空研究院有限公司 Unmanned aerial vehicle flight network modeling method based on low-altitude airspace limiting conditions
CN110852164A (en) * 2019-10-10 2020-02-28 安徽磐众信息科技有限公司 YOLOv 3-based method and system for automatically detecting illegal building
CN111507296A (en) * 2020-04-23 2020-08-07 嘉兴河图遥感技术有限公司 Intelligent illegal building extraction method based on unmanned aerial vehicle remote sensing and deep learning
CN111783665A (en) * 2020-06-30 2020-10-16 创新奇智(西安)科技有限公司 Action recognition method and device, storage medium and electronic equipment


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
徐子睿 (Xu Zirui): "Research on vehicle detection and traffic flow statistics based on YOLOv4", 《现代信息科技》 (Modern Information Technology), vol. 04, no. 15, pages 98-100 *


Similar Documents

Publication Publication Date Title
CN107729801B (en) Vehicle color recognition system based on multitask deep convolution neural network
CN112215190A (en) Illegal building detection method based on YOLOV4 model
Chen et al. Vehicle detection in high-resolution aerial images via sparse representation and superpixels
CN111814623A (en) Vehicle lane departure visual detection method based on deep neural network
CN112016605B (en) Target detection method based on corner alignment and boundary matching of bounding box
CN112287941B (en) License plate recognition method based on automatic character region perception
US20230358533A1 (en) Instance segmentation imaging system
CN113052170B (en) Small target license plate recognition method under unconstrained scene
Fan et al. Improving robustness of license plates automatic recognition in natural scenes
CN111160205A (en) Embedded multi-class target end-to-end unified detection method for traffic scene
CN114782770A (en) License plate detection and recognition method and system based on deep learning
CN113361467A (en) License plate recognition method based on field adaptation
CN110889388A (en) Violation identification method, device, equipment and storage medium
Bu et al. A UAV photography–based detection method for defective road marking
US20220335572A1 (en) Semantically accurate super-resolution generative adversarial networks
CN114882204A (en) Automatic ship name recognition method
CN112215189A (en) Accurate detecting system for illegal building
CN112348011B (en) Vehicle damage assessment method and device and storage medium
CN114550016B (en) Unmanned aerial vehicle positioning method and system based on context information perception
CN115731477A (en) Image recognition method, illicit detection method, terminal device, and storage medium
CN113065559B (en) Image comparison method and device, electronic equipment and storage medium
CN115690770A (en) License plate recognition method based on space attention characteristics in non-limited scene
Chang et al. Semi-supervised learning for YOLOv4 object detection in license plate recognition system
CN114937248A (en) Vehicle tracking method and device for cross-camera, electronic equipment and storage medium
Zhou et al. LEDet: localization estimation detector with data augmentation for ship detection based on unmanned surface vehicle

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination