CN110852164A - YOLOv3-based method and system for automatically detecting illegal buildings - Google Patents

YOLOv3-based method and system for automatically detecting illegal buildings

Info

Publication number
CN110852164A
CN110852164A
Authority
CN
China
Prior art keywords
image
grid
kinect
error
value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910956709.1A
Other languages
Chinese (zh)
Inventor
江寅
朱传瑞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Anhui Pan Public Mdt Infotech Ltd
Original Assignee
Anhui Pan Public Mdt Infotech Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Anhui Pan Public Mdt Infotech Ltd filed Critical Anhui Pan Public Mdt Infotech Ltd
Priority to CN201910956709.1A priority Critical patent/CN110852164A/en
Publication of CN110852164A publication Critical patent/CN110852164A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/10Terrestrial scenes
    • G06V20/176Urban or other man-made structures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • G06F18/232Non-hierarchical techniques
    • G06F18/2321Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/25Determination of region of interest [ROI] or a volume of interest [VOI]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a method and a system for automatically detecting illegal buildings based on YOLOv3. An image acquisition device shoots four types of Kinect images of the building area with a Kinect device, and the detection system performs image preprocessing and labeling and makes its judgment with a YOLOv3 neural network. Finally, the result produced by the detection system is sent to a system terminal, and the fused recognition results of the four types of Kinect images are delivered to the user as an auxiliary diagnosis result for final judgment, so as to realize recognition of the building area (the illegal building). Based on intelligent image processing technology and a neural network, the invention realizes a system for automatically detecting illegal buildings, which can reduce the workload of manual identification to a certain extent and has economic and social significance.

Description

YOLOv3-based method and system for automatically detecting illegal buildings
Technical Field
The invention relates to the technical field of illegal building image identification, in particular to a method and a system for automatically detecting illegal buildings based on YOLOv3.
Background
Building change detection is one of the important tasks of geographic national-condition monitoring, and it is significant for illegal building identification, dynamic urban monitoring, geographic information updating and the like. Taking urban illegal building detection as an example: with the continuous development of China's economy and society, urbanization keeps accelerating, urban buildings keep increasing, and the number and scale of illegal buildings grow with them. This phenomenon not only destroys urban planning and the urban landscape, but also affects the city's image and residents' lives; it is a hot issue of public concern, a difficult problem for city management, and one of the negative factors affecting social harmony. At present, the fact that the cost of violation is low while the cost of enforcement is high is one of the main reasons illegal buildings persist despite repeated prohibition. Besides gaps in the relevant laws, the detection of illegal buildings is a weak link: lacking automatic monitoring means, manual inspection has many disadvantages, namely a long discovery cycle and high cost for large-scale monitoring. In recent years, cities such as Beijing have attempted illegal building detection with satellite image data, but automatic analysis of image information is still not mature enough, and manual identification and verification account for a large proportion of the process. Land law enforcement and city management authorities nationwide invest billions in manpower and material resources in this task every year. The market urgently needs a highly automated, robust and reliable method for detecting urban illegal buildings, so as to promote their renovation.
Disclosure of Invention
The invention aims to provide a method and a system for automatically detecting illegal buildings based on YOLOv3, so as to solve the problems raised in the background art above.
In order to achieve the purpose, the invention provides the following technical scheme:
an automatic illegal building detection method based on YOLOv3 comprises the following steps:
Step 1: first obtain pictures of a building area through an image acquisition device and perform image-sharpness preprocessing on the pictures through an image scanning device; then make a training set: select x sample groups from the data set, each sample group containing y samples and each sample consisting of one RGB image picture and one depth image picture, giving 2×x×y sample pictures in total;
Step 2: copy each sample picture and scale the copies proportionally to resolutions of 300×225, 400×300, 500×375 and 600×450, obtaining four times the number of sample pictures;
Step 3: pre-train the four-fold amplified sample pictures through Darknet-53, transfer the network parameters obtained from pre-training to the basic network, and initialize it to obtain the migrated Darknet-53 model;
Step 4: cluster the manually marked building-area boxes in the training set with a K-means clustering algorithm, set different k values, and count the corresponding values of the sum of squared errors (SSE);
Step 6: draw a plot of SSE against k; find the optimal k value by the elbow method from the SSE–k plot, obtain the corresponding k cluster centers, and write them into the configuration file as the initial candidate box parameters of YOLOv3;
Step 7: train on the training set obtained in step 1 with the improved YOLOv3 to obtain a trained parameter model, and realize identification of illegal buildings by fusing the recognition results of the four types of Kinect images.
Preferably, the training set in step 1 is prepared as follows:
1.1: the image acquisition device uses a Kinect device to shoot four types of Kinect images; each image is cropped to a picture with a fixed position, a consistent angle and a known field of view, at a resolution of 640×480;
1.2: copy each captured picture and scale the copies proportionally to resolutions of 300×225, 400×300, 500×375 and 600×450, obtaining a four-fold amplified Kinect image data set;
1.3: manually mark a building-area box on each picture in the four-fold amplified Kinect image data set to generate a label file;
1.4: combine the Kinect image data set and the label files to form the training set.
Preferably, after the trained parameter model is obtained in step 6, the method further includes: calling the Kinect camera to output the four types of Kinect images simultaneously, and performing recognition with the parameter model to obtain recognition results for the four types of Kinect images; the four types of Kinect images are: the IR image, the Registration of RGB image, the RGB image and the Depth image.
Preferably, the value of the sum of squared errors SSE in step 3 is obtained as follows: during training, YOLOv3 divides the image into S × S grids and, for each grid, predicts B detection boxes and their confidence Conf(Object) according to formulas (1), (2) and (3):

Conf(Object) = Pr(Object) × IOU (1)

$$\Pr(Object)=\begin{cases}1, & \text{if an object falls into the grid}\\0, & \text{otherwise}\end{cases}\qquad(2)$$

$$IOU=\frac{area\left(box(Pred)\cap box(Truth)\right)}{area\left(box(Pred)\cup box(Truth)\right)}\qquad(3)$$

wherein:
Pr(Object) indicates whether an object falls into the grid corresponding to the candidate box: 1 if an object falls into the grid and 0 otherwise, as shown in formula (2);
IOU represents the ratio of the intersection area to the union area of the prediction box and the real box; box(Pred) denotes a prediction box; box(Truth) denotes a real box; area(·) denotes an area;
the confidence Conf(Object) represents the confidence level of the detected object;
each detection box contains 5 parameters: x, y, w, h and Conf(Object), where (x, y) is the offset of the detection box center from the grid position and (w, h) are the width and height of the detection box;
each grid also predicts C class probabilities Pr(Class_i|Object), the conditional probability that the object falling into the grid belongs to class i; the network finally outputs a tensor of dimension S × S × [B × (4+1+C)].
The loss function of YOLOv3 is characterized by formula (4):

loss = Error_coord + Error_IOU + Error_class (4)

wherein Error_coord is the coordinate error, Error_IOU is the IOU error and Error_class is the classification error, with:

$$Error_{coord}=\lambda_{coord}\sum_{i=0}^{S^{2}}\sum_{j=0}^{B}\mathbb{1}_{ij}^{obj}\left[(x_{i}-\hat{x}_{i})^{2}+(y_{i}-\hat{y}_{i})^{2}\right]+\lambda_{coord}\sum_{i=0}^{S^{2}}\sum_{j=0}^{B}\mathbb{1}_{ij}^{obj}\left[\left(\sqrt{w_{i}}-\sqrt{\hat{w}_{i}}\right)^{2}+\left(\sqrt{h_{i}}-\sqrt{\hat{h}_{i}}\right)^{2}\right]\qquad(5)$$

$$Error_{IOU}=\sum_{i=0}^{S^{2}}\sum_{j=0}^{B}\mathbb{1}_{ij}^{obj}\left(C_{i}-\hat{C}_{i}\right)^{2}+\lambda_{noobj}\sum_{i=0}^{S^{2}}\sum_{j=0}^{B}\mathbb{1}_{ij}^{noobj}\left(C_{i}-\hat{C}_{i}\right)^{2}\qquad(6)$$

$$Error_{class}=\sum_{i=0}^{S^{2}}\mathbb{1}_{i}^{obj}\sum_{c\in classes}\left(p_{i}(c)-\hat{p}_{i}(c)\right)^{2}\qquad(7)$$

wherein:
λ_coord is the weight parameter of the coordinate error, λ_coord = 5; λ_noobj is the correction parameter of the no-object confidence term, λ_noobj = 0.5;
x̂_i, ŷ_i, ŵ_i and ĥ_i are the values of the x, y, w and h parameters of the real box corresponding to grid i, so that (x_i − x̂_i), (y_i − ŷ_i), (w_i − ŵ_i) and (h_i − ĥ_i) are the errors of the x, y, w and h parameters of grid i;
C_i is the predicted confidence Conf(Object) of grid i, Ĉ_i is the true confidence Conf(Object) of grid i, and (C_i − Ĉ_i) is the confidence error of grid i;
p_i(c) is the predicted probability Pr(Class_i|Object) for the target falling into grid i, p̂_i(c) is the corresponding true probability, and (p_i(c) − p̂_i(c)) is the probability error of the target falling into grid i;
1_i^obj indicates whether a target falls into grid i: 1 if a target falls into grid i and 0 otherwise;
1_ij^obj indicates whether an object falls into the j-th prediction box of grid i: 1 if it does and 0 otherwise.
Preferably, in step 4, YOLOv3 introduces a group of initial candidate boxes with fixed sizes and aspect ratios into the target detection process, and a K-Means clustering algorithm performs cluster analysis on the manually marked target boxes in the training set obtained in step 1 to find the optimal k value, representing the number of initial candidate boxes, and the width-height dimensions of the k cluster centers, which are used as the candidate box parameters in the network configuration file;
the k value is determined from the sum of squared errors SSE and the elbow method according to formula (8):

$$SSE=\sum_{i=1}^{k}\sum_{p\in Cl_{i}}\left|p-m_{i}\right|^{2}\qquad(8)$$

wherein Cl_i is the i-th cluster, p is a sample point in Cl_i, m_i is the centroid of Cl_i, i.e. the mean of all samples in Cl_i, and SSE is the clustering error of all samples, which characterizes how good the clustering is. The core idea of the elbow method is: as k increases, the samples are partitioned more finely and SSE gradually decreases; once k reaches the optimal number of clusters, the return from further increasing k drops rapidly, shown by a sudden decrease in the rate at which SSE falls, so the plot of SSE against k takes the shape of an elbow, and the k value at the elbow is the required optimal number of clusters.
Preferably, in the K-means clustering of step 5, the Euclidean distance would represent the error between a sample point and the sample mean; since here the sample point is a prediction box and the sample mean is a real box, the IOU is used to reflect the error between the prediction box and the real box, a larger IOU meaning a smaller error. The clustering error of the samples is calculated with formula (9):

$$SSE=\sum_{i=1}^{k}\sum_{p\in Cl_{i}}\left(1-IOU_{p}\right)^{2}\qquad(9)$$

wherein IOU_p is the IOU of sample point p and 1 − IOU_p represents the error of sample point p; from this the SSE and k values are obtained.
Preferably, in step 6 the recognition result is sent to the system terminal and delivered to the user as an auxiliary diagnosis result for final identification.
The invention also provides an automatic illegal building detection system based on YOLOv3, which comprises:
the image acquisition device, used for shooting the four types of Kinect images and uploading them to the detection system;
the image scanning device, used for performing image-sharpness preprocessing on the images shot by the image acquisition device and then sending them into the detection system;
the detection system, used for acquiring the Kinect images, preprocessing and labeling them, and judging with a YOLOv3 neural network whether they show an illegal building;
and the system terminal, which receives the judgment produced by the detection system and displays it as an auxiliary judgment result for the user.
Preferably, the image acquisition device is a Kinect device, and the four types of Kinect images comprise an IR image, a Registration of RGB image, an RGB image and a Depth image, each at a resolution of 640×480. The image preprocessing copies each captured picture and scales the copies proportionally to resolutions of 300×225, 400×300, 500×375 and 600×450, obtaining a four-fold amplified Kinect image data set; a building-area box is then manually marked on each picture in the four-fold amplified Kinect image data set to generate a label file.
Preferably, the image scanning device is based on a PC in which Matlab710 with a Retinex-based image enhancement algorithm is installed.
Compared with the prior art, the invention has the beneficial effects that:
the invention can provide effective auxiliary diagnosis information, and the invention completes an automatic detection violation building system based on intelligent image processing technology and neural network, can reduce the workload of manual identification to a certain extent, and has economic and social significance.
Drawings
FIG. 1 is a schematic diagram of the system of the present invention;
fig. 2 is a schematic diagram of the network structure of YOLOv3 in the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1-2, the present invention provides a technical solution:
referring to fig. 1, an automatic illegal building detection method based on YOLOv3 includes an image acquisition device, an image scanning device, a detection system and a system terminal. The image acquisition of the building area is to use a Kinect device to shoot four types of Kinect images, which are respectively as follows: one each of the IR image, Registration of RGB image, RGB image and Depth image; the resolution of the picture is 640 × 480. The detection system comprises image preprocessing, labeling and discrimination by using a YOLOv3 neural network.
According to the invention, an unmanned aerial vehicle carrying the Kinect can be used as the photographing equipment; low-altitude photographing by the unmanned aerial vehicle reduces the difficulty and workload of manual photographing. The unmanned aerial vehicle shoots buildings (including illegal buildings) in a city or another region along a preset route, and the obtained picture images are input into the image scanning device for preprocessing. The preprocessing is performed on a computer (PC) running Windows XP or above, in which Matlab710 with a Retinex-based image enhancement algorithm is installed, so that the pictures or photos shot by the unmanned aerial vehicle can be sharpened based on the Retinex algorithm. During aerial photographing and evidence collection, weather and other factors, for example haze or wind-gust disturbance, can greatly reduce the overall contrast and brightness of the photographed images and distort their colors. Retinex post-processing of the restored image enhances the image as a whole: detail information is enhanced, edges become clear, color information is strengthened so that image colors are better recovered, and highlight areas are enhanced, as sketched below. Matlab710 with the Retinex image enhancement algorithm is prior art; see, for example, "image enhancement algorithm based on the Retinex principle", article number 1009-3044(2018)11-0185-02, and "a method for improving the definition of a foggy image", article number 100325060(2011)0120083204.
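The patent relies on Matlab710 for this step and gives no listing; purely as an illustration, a minimal single-scale Retinex sketch in Python follows (the use of OpenCV, the sigma value and the file name are assumptions, not the patent's implementation):

```python
import cv2
import numpy as np

def single_scale_retinex(image_bgr, sigma=80):
    """Minimal single-scale Retinex: log(image) - log(estimated illumination).

    The illumination is estimated with a Gaussian blur; sigma is an assumed scale.
    """
    img = image_bgr.astype(np.float64) + 1.0             # +1 avoids log(0)
    illumination = cv2.GaussianBlur(img, (0, 0), sigma)  # kernel size derived from sigma
    retinex = np.log(img) - np.log(illumination)
    # Stretch the reflectance back to a displayable 0-255 range
    retinex = (retinex - retinex.min()) / (retinex.max() - retinex.min())
    return (retinex * 255).astype(np.uint8)

# enhanced = single_scale_retinex(cv2.imread("uav_photo.jpg"))
```

Multi-scale Retinex variants average such reflectance maps over several sigma values; either way the goal is the one stated above: restoring contrast and edge detail in hazy or under-exposed aerial photos.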
FIG. 2 shows a schematic diagram of the structure of YOLOv3; the workflow covers making the training set, generating the migrated Darknet-53 model, improving the candidate box parameters, and identifying illegal buildings. The method comprises the following steps:
step 1, making a training set according to the following process
1.1, using a Kinect device to shoot four types of Kinect images, namely: one each of the IR image, the registration of the RGB image, the RGB image and the Depth image; the resolution of the picture obtained by shooting was 640 × 480.
And 1.2, copying each picture obtained by shooting, and respectively adjusting the resolution to 300 × 225, 400 × 300, 500 × 375 and 600 × 450 in proportion to obtain a four-time-multiplied Kinect image data set.
And 1.3, manually marking a building area frame aiming at each picture in the four times of amplified Kinect image data set, and generating a label file.
And 1.4, combining the Kinect image data set and the label file to form a training set.
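The following sketch illustrates the four-resolution amplification of steps 1.1–1.4 in Python with Pillow; the directory layout and file naming are assumptions for illustration only:

```python
from pathlib import Path
from PIL import Image

SCALES = [(300, 225), (400, 300), (500, 375), (600, 450)]  # the four target resolutions

def amplify_dataset(src_dir, dst_dir):
    """Save every 640x480 Kinect picture at the four reduced resolutions (step 1.2)."""
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    for path in sorted(Path(src_dir).glob("*.png")):
        img = Image.open(path)
        for w, h in SCALES:
            img.resize((w, h), Image.BILINEAR).save(dst / f"{path.stem}_{w}x{h}.png")

# amplify_dataset("kinect_raw/ir", "kinect_4x/ir")   # one call per Kinect image type
```

The manually marked building-area boxes of step 1.3 would have to be rescaled by the same width and height factors when the label files for the amplified copies are generated.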
Step 2, generate the migrated Darknet-53 model as follows.
2.1: select x sample groups from the data set, each sample group containing y samples and each sample consisting of one RGB image picture and one depth image picture, giving 2×x×y sample pictures.
2.2: copy each sample picture and scale the copies to resolutions of 300×225, 400×300, 500×375 and 600×450, obtaining four times the number of sample pictures.
2.3: pre-train the four-fold amplified sample pictures through Darknet-53, transfer the network parameters obtained from pre-training to the basic network, and initialize it to obtain the migrated Darknet-53 model; a sketch of this parameter transfer is given below.
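Step 2.3 only states that the pre-trained Darknet-53 parameters are transferred into the basic network; a hedged PyTorch-style sketch of such a transfer follows (the checkpoint path is an assumption, and the checkpoint is assumed to store a plain state_dict):

```python
import torch

def transfer_backbone(model, checkpoint_path="darknet53_pretrained.pth"):
    """Initialize a detection model with pre-trained Darknet-53 backbone weights.

    Copies every parameter whose name and shape match the checkpoint;
    detection-head layers absent from the checkpoint keep their fresh
    initialization, which is the usual transfer-learning recipe.
    """
    pretrained = torch.load(checkpoint_path, map_location="cpu")
    own_state = model.state_dict()
    matched = {
        name: tensor
        for name, tensor in pretrained.items()
        if name in own_state and own_state[name].shape == tensor.shape
    }
    own_state.update(matched)
    model.load_state_dict(own_state)
    return model
```

Matching by both name and shape keeps the sketch robust when the detection head differs from the network the backbone was pre-trained with.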
Step 3, set the initial candidate box parameters of YOLOv3 as follows.
3.1: cluster the manually marked building-area boxes in the training set with a K-means clustering algorithm, set different k values, and count the corresponding values of the sum of squared errors (SSE).
3.2: find the optimal k value with the elbow method, obtain the corresponding k cluster centers, and write them into the configuration file as the initial candidate box parameters of YOLOv3.
Step 4, identify the illegal buildings as follows.
4.1: train on the training set obtained in step 1 with the improved YOLOv3 to obtain a trained parameter model.
4.2: call the Kinect camera to output the four types of Kinect images simultaneously, and perform recognition with the parameter model obtained in step 4.1 to obtain recognition results for the four types of Kinect images (the IR image, the Registration of RGB image, the RGB image and the Depth image).
4.3: identify the building area (illegal building) by fusing the recognition results of the four types of Kinect images, as sketched below.
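The patent leaves the fusion rule of step 4.3 unspecified; one plausible reading is a cross-type vote, sketched below (the IoU threshold and the minimum vote count are assumptions):

```python
def iou(a, b):
    """IOU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def fuse_detections(per_type_boxes, iou_thr=0.5, min_votes=2):
    """Fuse detections from the four Kinect image types.

    per_type_boxes: four lists of boxes, one list per image type. A box is
    kept when detections from at least `min_votes` image types overlap it.
    """
    fused = []
    for t, boxes in enumerate(per_type_boxes):
        for box in boxes:
            votes = 1 + sum(
                any(iou(box, other) >= iou_thr for other in per_type_boxes[u])
                for u in range(len(per_type_boxes)) if u != t
            )
            already = any(iou(box, kept) >= iou_thr for kept in fused)
            if votes >= min_votes and not already:
                fused.append(box)
    return fused
```

Requiring agreement between image types suppresses detections that appear in only one modality, which is consistent with the patent's idea of fusing the four recognition results.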
In a specific implementation, step 3.1 is to obtain the values of the sum of squared errors SSE as follows:
YOLOv3 divides the image into S × S grids during training and, for each grid, predicts B detection boxes and their confidence Conf(Object) according to formulas (1), (2) and (3):

Conf(Object) = Pr(Object) × IOU (1)

$$\Pr(Object)=\begin{cases}1, & \text{if an object falls into the grid}\\0, & \text{otherwise}\end{cases}\qquad(2)$$

$$IOU=\frac{area\left(box(Pred)\cap box(Truth)\right)}{area\left(box(Pred)\cup box(Truth)\right)}\qquad(3)$$

wherein:
Pr(Object) indicates whether an object falls into the grid corresponding to the candidate box: 1 if an object falls into the grid and 0 otherwise, as shown in formula (2);
IOU represents the ratio of the intersection area to the union area of the prediction box and the real box; box(Pred) denotes a prediction box; box(Truth) denotes a real box; area(·) denotes an area;
the confidence Conf(Object) represents the confidence level of the detected object;
each detection box contains 5 parameters: x, y, w, h and Conf(Object), where (x, y) is the offset of the detection box center from the grid position and (w, h) are the width and height of the detection box;
each grid also predicts C class probabilities Pr(Class_i|Object), the conditional probability that the object falling into the grid belongs to class i; the network finally outputs a tensor of dimension S × S × [B × (4+1+C)].
The loss function of YOLOv3 is characterized by formula (4):

loss = Error_coord + Error_IOU + Error_class (4)

wherein Error_coord is the coordinate error, Error_IOU is the IOU error and Error_class is the classification error, with:

$$Error_{coord}=\lambda_{coord}\sum_{i=0}^{S^{2}}\sum_{j=0}^{B}\mathbb{1}_{ij}^{obj}\left[(x_{i}-\hat{x}_{i})^{2}+(y_{i}-\hat{y}_{i})^{2}\right]+\lambda_{coord}\sum_{i=0}^{S^{2}}\sum_{j=0}^{B}\mathbb{1}_{ij}^{obj}\left[\left(\sqrt{w_{i}}-\sqrt{\hat{w}_{i}}\right)^{2}+\left(\sqrt{h_{i}}-\sqrt{\hat{h}_{i}}\right)^{2}\right]\qquad(5)$$

$$Error_{IOU}=\sum_{i=0}^{S^{2}}\sum_{j=0}^{B}\mathbb{1}_{ij}^{obj}\left(C_{i}-\hat{C}_{i}\right)^{2}+\lambda_{noobj}\sum_{i=0}^{S^{2}}\sum_{j=0}^{B}\mathbb{1}_{ij}^{noobj}\left(C_{i}-\hat{C}_{i}\right)^{2}\qquad(6)$$

$$Error_{class}=\sum_{i=0}^{S^{2}}\mathbb{1}_{i}^{obj}\sum_{c\in classes}\left(p_{i}(c)-\hat{p}_{i}(c)\right)^{2}\qquad(7)$$

wherein:
λ_coord is the weight parameter of the coordinate error, λ_coord = 5; λ_noobj is the correction parameter of the no-object confidence term, λ_noobj = 0.5;
x̂_i, ŷ_i, ŵ_i and ĥ_i are the values of the x, y, w and h parameters of the real box corresponding to grid i, so that (x_i − x̂_i), (y_i − ŷ_i), (w_i − ŵ_i) and (h_i − ĥ_i) are the errors of the x, y, w and h parameters of grid i;
C_i is the predicted confidence Conf(Object) of grid i, Ĉ_i is the true confidence Conf(Object) of grid i, and (C_i − Ĉ_i) is the confidence error of grid i;
p_i(c) is the predicted probability Pr(Class_i|Object) for the target falling into grid i, p̂_i(c) is the corresponding true probability, and (p_i(c) − p̂_i(c)) is the probability error of the target falling into grid i;
1_i^obj indicates whether a target falls into grid i: 1 if a target falls into grid i and 0 otherwise;
1_ij^obj indicates whether an object falls into the j-th prediction box of grid i: 1 if it does and 0 otherwise. A numerical sketch of these loss terms is given below.
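A NumPy sketch of formulas (4)–(7) under the definitions above follows; the flattened (S², B) tensor layout and the dictionary packaging are assumptions made for readability, not the network's real output format:

```python
import numpy as np

LAMBDA_COORD, LAMBDA_NOOBJ = 5.0, 0.5   # values given in the patent

def yolo_loss(pred, truth, obj_ij, obj_i):
    """pred/truth: dicts of arrays x, y, w, h, C of shape (S*S, B) and p of shape (S*S, C_cls).

    obj_ij: (S*S, B) indicator that box j of grid i is responsible for a target (1_ij^obj);
    obj_i:  (S*S,)   indicator that a target falls into grid i (1_i^obj).
    """
    coord = LAMBDA_COORD * np.sum(
        obj_ij * ((pred["x"] - truth["x"]) ** 2 + (pred["y"] - truth["y"]) ** 2)
    ) + LAMBDA_COORD * np.sum(
        obj_ij * ((np.sqrt(pred["w"]) - np.sqrt(truth["w"])) ** 2
                  + (np.sqrt(pred["h"]) - np.sqrt(truth["h"])) ** 2)
    )                                                            # formula (5)
    conf_sq = (pred["C"] - truth["C"]) ** 2
    iou_err = (np.sum(obj_ij * conf_sq)
               + LAMBDA_NOOBJ * np.sum((1 - obj_ij) * conf_sq))  # formula (6)
    cls_err = np.sum(obj_i[:, None] * (pred["p"] - truth["p"]) ** 2)  # formula (7)
    return coord + iou_err + cls_err                             # formula (4)
```

The (1 − obj_ij) factor plays the role of the 1_ij^noobj indicator, so boxes not responsible for any target contribute only the down-weighted confidence term.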
A group of initial candidate boxes with fixed sizes and aspect ratios is introduced into the target detection process of YOLOv3; a K-Means clustering algorithm performs cluster analysis on the manually marked target boxes in the training set obtained in step 1 to find the optimal k value, representing the number of initial candidate boxes, and the width-height dimensions of the k cluster centers, which are used as the candidate box parameters in the network configuration file.
The k value is determined from the sum of squared errors SSE and the elbow method according to formula (8):

$$SSE=\sum_{i=1}^{k}\sum_{p\in Cl_{i}}\left|p-m_{i}\right|^{2}\qquad(8)$$

wherein Cl_i is the i-th cluster, p is a sample point in Cl_i, m_i is the centroid of Cl_i, i.e. the mean of all samples in Cl_i, and SSE is the clustering error of all samples, which characterizes how good the clustering is. The core idea of the elbow method is: as k increases, the samples are partitioned more finely and SSE gradually decreases; once k reaches the optimal number of clusters, the return from further increasing k drops rapidly, shown by a sudden decrease in the rate at which SSE falls, so the plot of SSE against k takes the shape of an elbow, and the k value at the elbow is the required optimal number of clusters.
In the K-means clustering, the Euclidean distance would represent the error between a sample point and the sample mean; since here the sample point is a prediction box and the sample mean is a real box, the IOU is used to reflect the error between the prediction box and the real box, a larger IOU meaning a smaller error. The clustering error of the samples is calculated with formula (9):

$$SSE=\sum_{i=1}^{k}\sum_{p\in Cl_{i}}\left(1-IOU_{p}\right)^{2}\qquad(9)$$

wherein IOU_p is the IOU of sample point p and 1 − IOU_p represents the error of sample point p; from this the SSE and k values are obtained. A sketch of this IOU-based clustering and the elbow curve is given below.
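A sketch of this clustering follows; the width-height IOU assumes boxes aligned at a common corner, as is usual for candidate-box clustering, and the iteration count and seed are assumptions:

```python
import numpy as np

def wh_iou(boxes, centers):
    """IOU between (w, h) pairs assuming a shared corner: (N, 2) x (K, 2) -> (N, K)."""
    inter = (np.minimum(boxes[:, None, 0], centers[None, :, 0])
             * np.minimum(boxes[:, None, 1], centers[None, :, 1]))
    union = boxes[:, 0:1] * boxes[:, 1:2] + centers[:, 0] * centers[:, 1] - inter
    return inter / union

def kmeans_iou(boxes, k, iters=100, seed=0):
    """K-means on box sizes with distance 1 - IOU; returns centers and the SSE of formula (9)."""
    rng = np.random.default_rng(seed)
    centers = boxes[rng.choice(len(boxes), size=k, replace=False)]
    for _ in range(iters):
        assign = np.argmax(wh_iou(boxes, centers), axis=1)  # nearest center = largest IOU
        centers = np.array([
            boxes[assign == i].mean(axis=0) if np.any(assign == i) else centers[i]
            for i in range(k)
        ])
    best_iou = wh_iou(boxes, centers).max(axis=1)
    return centers, float(np.sum((1.0 - best_iou) ** 2))

# Elbow method: compute SSE for a range of k and pick the k where the curve flattens.
# sse_by_k = {k: kmeans_iou(all_wh, k)[1] for k in range(2, 13)}
```

Plotting sse_by_k against k reproduces the SSE–k relation of step 3.2; the elbow of that curve gives the number of initial candidate boxes written into the YOLOv3 configuration file.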
Finally, the recognition result is sent to the system terminal and delivered to the user as an auxiliary diagnosis result, realizing identification of the illegal building.
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (10)

1. An automatic illegal building detection method based on YOLOv3 is characterized by comprising the following steps:
Step 1: first obtain pictures of a building area through an image acquisition device and perform image-sharpness preprocessing on the pictures through an image scanning device; then make a training set: select x sample groups from the data set, each sample group containing y samples and each sample consisting of one RGB image picture and one depth image picture, giving 2×x×y sample pictures in total;
Step 2: copy each sample picture and scale the copies proportionally to resolutions of 300×225, 400×300, 500×375 and 600×450, obtaining four times the number of sample pictures;
Step 3: pre-train the four-fold amplified sample pictures through Darknet-53, transfer the network parameters obtained from pre-training to the basic network, and initialize it to obtain the migrated Darknet-53 model;
Step 4: cluster the manually marked building-area boxes in the training set with a K-means clustering algorithm, set different k values, and count the corresponding values of the sum of squared errors (SSE);
Step 6: draw a plot of SSE against k; find the optimal k value by the elbow method from the SSE–k plot, obtain the corresponding k cluster centers, and write them into the configuration file as the initial candidate box parameters of YOLOv3;
Step 7: train on the training set obtained in step 1 with the improved YOLOv3 to obtain a trained parameter model, and realize identification of illegal buildings by fusing the recognition results of the four types of Kinect images.
2. The method for automatically detecting illegal buildings based on YOLOv3 as claimed in claim 1, wherein the training set in step 1 is made as follows:
1.1: the image acquisition device uses a Kinect device to shoot four types of Kinect images; each image is cropped to a picture with a fixed position, a consistent angle and a known field of view, at a resolution of 640×480;
1.2: copy each captured picture and scale the copies proportionally to resolutions of 300×225, 400×300, 500×375 and 600×450, obtaining a four-fold amplified Kinect image data set;
1.3: manually mark a building-area box on each picture in the four-fold amplified Kinect image data set to generate a label file;
1.4: combine the Kinect image data set and the label files to form the training set.
3. The method for automatically detecting illegal buildings based on YOLOv3 as claimed in claim 1, wherein after the trained parameter model is obtained in step 6, the method further comprises: calling the Kinect camera to output the four types of Kinect images simultaneously, and performing recognition with the parameter model to obtain recognition results for the four types of Kinect images; the four types of Kinect images are: the IR image, the Registration of RGB image, the RGB image and the Depth image.
4. The method for automatically detecting illegal buildings based on YOLOv3 as claimed in claim 1, wherein the value of the sum of squared errors SSE in step 3 is obtained as follows: during training, YOLOv3 divides the image into S × S grids and, for each grid, predicts B detection boxes and their confidence Conf(Object) according to formulas (1), (2) and (3):

Conf(Object) = Pr(Object) × IOU (1)

$$\Pr(Object)=\begin{cases}1, & \text{if an object falls into the grid}\\0, & \text{otherwise}\end{cases}\qquad(2)$$

$$IOU=\frac{area\left(box(Pred)\cap box(Truth)\right)}{area\left(box(Pred)\cup box(Truth)\right)}\qquad(3)$$

wherein:
Pr(Object) indicates whether an object falls into the grid corresponding to the candidate box: 1 if an object falls into the grid and 0 otherwise, as shown in formula (2);
IOU represents the ratio of the intersection area to the union area of the prediction box and the real box; box(Pred) denotes a prediction box; box(Truth) denotes a real box; area(·) denotes an area;
the confidence Conf(Object) represents the confidence level of the detected object;
each detection box contains 5 parameters: x, y, w, h and Conf(Object), where (x, y) is the offset of the detection box center from the grid position and (w, h) are the width and height of the detection box;
each grid also predicts C class probabilities Pr(Class_i|Object), the conditional probability that the object falling into the grid belongs to class i; the network finally outputs a tensor of dimension S × S × [B × (4+1+C)];
the loss function of YOLOv3 is characterized by formula (4):

loss = Error_coord + Error_IOU + Error_class (4)

wherein Error_coord is the coordinate error, Error_IOU is the IOU error and Error_class is the classification error, with:

$$Error_{coord}=\lambda_{coord}\sum_{i=0}^{S^{2}}\sum_{j=0}^{B}\mathbb{1}_{ij}^{obj}\left[(x_{i}-\hat{x}_{i})^{2}+(y_{i}-\hat{y}_{i})^{2}\right]+\lambda_{coord}\sum_{i=0}^{S^{2}}\sum_{j=0}^{B}\mathbb{1}_{ij}^{obj}\left[\left(\sqrt{w_{i}}-\sqrt{\hat{w}_{i}}\right)^{2}+\left(\sqrt{h_{i}}-\sqrt{\hat{h}_{i}}\right)^{2}\right]\qquad(5)$$

$$Error_{IOU}=\sum_{i=0}^{S^{2}}\sum_{j=0}^{B}\mathbb{1}_{ij}^{obj}\left(C_{i}-\hat{C}_{i}\right)^{2}+\lambda_{noobj}\sum_{i=0}^{S^{2}}\sum_{j=0}^{B}\mathbb{1}_{ij}^{noobj}\left(C_{i}-\hat{C}_{i}\right)^{2}\qquad(6)$$

$$Error_{class}=\sum_{i=0}^{S^{2}}\mathbb{1}_{i}^{obj}\sum_{c\in classes}\left(p_{i}(c)-\hat{p}_{i}(c)\right)^{2}\qquad(7)$$

wherein:
λ_coord is the weight parameter of the coordinate error, λ_coord = 5; λ_noobj is the correction parameter of the no-object confidence term, λ_noobj = 0.5;
x̂_i, ŷ_i, ŵ_i and ĥ_i are the values of the x, y, w and h parameters of the real box corresponding to grid i, so that (x_i − x̂_i), (y_i − ŷ_i), (w_i − ŵ_i) and (h_i − ĥ_i) are the errors of the x, y, w and h parameters of grid i;
C_i is the predicted confidence Conf(Object) of grid i, Ĉ_i is the true confidence Conf(Object) of grid i, and (C_i − Ĉ_i) is the confidence error of grid i;
p_i(c) is the predicted probability Pr(Class_i|Object) for the target falling into grid i, p̂_i(c) is the corresponding true probability, and (p_i(c) − p̂_i(c)) is the probability error of the target falling into grid i;
1_i^obj indicates whether a target falls into grid i: 1 if a target falls into grid i and 0 otherwise;
1_ij^obj indicates whether an object falls into the j-th prediction box of grid i: 1 if it does and 0 otherwise.
5. The method for automatically detecting illegal buildings based on YOLOv3 as claimed in claim 1, wherein in step 4 YOLOv3 introduces a group of initial candidate boxes with fixed sizes and aspect ratios into the target detection process, and a K-Means clustering algorithm performs cluster analysis on the manually marked target boxes in the training set obtained in step 1 to find the optimal k value, representing the number of initial candidate boxes, and the width-height dimensions of the k cluster centers, which are used as the candidate box parameters in the network configuration file;
the k value is determined from the sum of squared errors SSE and the elbow method according to formula (8):

$$SSE=\sum_{i=1}^{k}\sum_{p\in Cl_{i}}\left|p-m_{i}\right|^{2}\qquad(8)$$

wherein Cl_i is the i-th cluster, p is a sample point in Cl_i, m_i is the centroid of Cl_i, i.e. the mean of all samples in Cl_i, and SSE is the clustering error of all samples, which characterizes how good the clustering is; the core idea of the elbow method is: as k increases, the samples are partitioned more finely and SSE gradually decreases; once k reaches the optimal number of clusters, the return from further increasing k drops rapidly, shown by a sudden decrease in the rate at which SSE falls, so the plot of SSE against k takes the shape of an elbow, and the k value at the elbow is the required optimal number of clusters.
6. The method for automatically detecting illegal buildings based on YOLOv3 according to claim 1, wherein in the K-means clustering of step 5, the Euclidean distance would represent the error between a sample point and the sample mean; since here the sample point is a prediction box and the sample mean is a real box, the IOU is used to reflect the error between the prediction box and the real box, a larger IOU meaning a smaller error; the clustering error of the samples is calculated with formula (9):

$$SSE=\sum_{i=1}^{k}\sum_{p\in Cl_{i}}\left(1-IOU_{p}\right)^{2}\qquad(9)$$

wherein IOU_p is the IOU of sample point p and 1 − IOU_p represents the error of sample point p; from this the SSE and k values are obtained.
7. The method for automatically detecting illegal buildings according to claim 1, wherein in step 6 the recognition result is sent to the system terminal and delivered to the user as an auxiliary diagnosis result for final identification.
8. An automatic illegal building detection system based on YOLOv3, characterized by comprising:
the image acquisition device, used for shooting the four types of Kinect images and uploading them to the detection system;
the image scanning device, used for performing image-sharpness preprocessing on the images shot by the image acquisition device and then sending them into the detection system;
the detection system, used for acquiring the Kinect images, preprocessing and labeling them, and judging with a YOLOv3 neural network whether they show an illegal building;
and the system terminal, which receives the judgment produced by the detection system and displays it as an auxiliary judgment result for the user.
9. The automatic illegal building detection system based on YOLOv3 as claimed in claim 8, wherein the image acquisition device is a Kinect device and the four types of Kinect images comprise an IR image, a Registration of RGB image, an RGB image and a Depth image, each at a resolution of 640×480; the image preprocessing copies each captured picture and scales the copies proportionally to resolutions of 300×225, 400×300, 500×375 and 600×450, obtaining a four-fold amplified Kinect image data set; a building-area box is manually marked on each picture in the four-fold amplified Kinect image data set to generate a label file.
10. The automatic illegal building detection system based on YOLOv3 according to claim 8, wherein the image scanning device is based on a PC in which Matlab710 with a Retinex-based image enhancement algorithm is installed.
CN201910956709.1A 2019-10-10 2019-10-10 YOLOv3-based method and system for automatically detecting illegal buildings Pending CN110852164A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910956709.1A CN110852164A (en) 2019-10-10 2019-10-10 YOLOv3-based method and system for automatically detecting illegal buildings

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910956709.1A CN110852164A (en) 2019-10-10 2019-10-10 YOLOv3-based method and system for automatically detecting illegal buildings

Publications (1)

Publication Number Publication Date
CN110852164A true CN110852164A (en) 2020-02-28

Family

ID=69596513

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910956709.1A Pending CN110852164A (en) 2019-10-10 2019-10-10 YOLOv3-based method and system for automatically detecting illegal buildings

Country Status (1)

Country Link
CN (1) CN110852164A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111507296A (en) * 2020-04-23 2020-08-07 嘉兴河图遥感技术有限公司 Intelligent illegal building extraction method based on unmanned aerial vehicle remote sensing and deep learning
CN111709310A (en) * 2020-05-26 2020-09-25 重庆大学 Gesture tracking and recognition method based on deep learning
CN112215190A (en) * 2020-10-21 2021-01-12 南京智慧航空研究院有限公司 Illegal building detection method based on YOLOV4 model
CN112215189A (en) * 2020-10-21 2021-01-12 南京智慧航空研究院有限公司 Accurate detecting system for illegal building
CN113011405A (en) * 2021-05-25 2021-06-22 南京柠瑛智能科技有限公司 Method for solving multi-frame overlapping error of ground object target identification of unmanned aerial vehicle
CN113420716A (en) * 2021-07-16 2021-09-21 南威软件股份有限公司 Improved Yolov3 algorithm-based violation behavior recognition and early warning method

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109325454A (en) * 2018-09-28 2019-02-12 合肥工业大学 A kind of static gesture real-time identification method based on YOLOv3

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109325454A (en) * 2018-09-28 2019-02-12 合肥工业大学 A kind of static gesture real-time identification method based on YOLOv3

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111507296A (en) * 2020-04-23 2020-08-07 嘉兴河图遥感技术有限公司 Intelligent illegal building extraction method based on unmanned aerial vehicle remote sensing and deep learning
CN111709310A (en) * 2020-05-26 2020-09-25 重庆大学 Gesture tracking and recognition method based on deep learning
CN111709310B (en) * 2020-05-26 2024-02-02 重庆大学 Gesture tracking and recognition method based on deep learning
CN112215190A (en) * 2020-10-21 2021-01-12 南京智慧航空研究院有限公司 Illegal building detection method based on YOLOV4 model
CN112215189A (en) * 2020-10-21 2021-01-12 南京智慧航空研究院有限公司 Accurate detecting system for illegal building
CN113011405A (en) * 2021-05-25 2021-06-22 南京柠瑛智能科技有限公司 Method for solving multi-frame overlapping error of ground object target identification of unmanned aerial vehicle
CN113420716A (en) * 2021-07-16 2021-09-21 南威软件股份有限公司 Improved Yolov3 algorithm-based violation behavior recognition and early warning method
CN113420716B (en) * 2021-07-16 2023-07-28 南威软件股份有限公司 Illegal behavior identification and early warning method based on improved Yolov3 algorithm

Similar Documents

Publication Publication Date Title
CN110852164A (en) YOLOv3-based method and system for automatically detecting illegal buildings
CN111222574B (en) Ship and civil ship target detection and classification method based on multi-model decision-level fusion
CN113822247B (en) Method and system for identifying illegal building based on aerial image
CN104978567B (en) Vehicle checking method based on scene classification
CN108804992B (en) Crowd counting method based on deep learning
CN113449632B (en) Vision and radar perception algorithm optimization method and system based on fusion perception and automobile
CN112801227B (en) Typhoon identification model generation method, device, equipment and storage medium
CN115512247A (en) Regional building damage grade assessment method based on image multi-parameter extraction
CN115272876A (en) Remote sensing image ship target detection method based on deep learning
CN113313107A (en) Intelligent detection and identification method for multiple types of diseases on cable surface of cable-stayed bridge
CN115880260A (en) Method, device and equipment for detecting base station construction and computer readable storage medium
CN115409814A (en) Photovoltaic module hot spot detection method and system based on fusion image
CN110826364B (en) Library position identification method and device
Li et al. Hybrid cloud detection algorithm based on intelligent scene recognition
CN110765900B (en) Automatic detection illegal building method and system based on DSSD
CN111881833B (en) Vehicle detection method, device, equipment and storage medium
CN117437470A (en) Fire hazard level assessment method and system based on artificial intelligence
CN114463628A (en) Deep learning remote sensing image ship target identification method based on threshold value constraint
CN114494850A (en) Village unmanned courtyard intelligent identification method and system
CN113963230A (en) Parking space detection method based on deep learning
CN113793069A (en) Urban waterlogging intelligent identification method of deep residual error network
CN113239962A (en) Traffic participant identification method based on single fixed camera
CN115359346B (en) Small micro-space identification method and device based on street view picture and electronic equipment
CN112767469B (en) Highly intelligent acquisition method for urban mass buildings
CN115880644B (en) Method and system for identifying coal quantity based on artificial intelligence

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200228