CN110852164A - YOLOv3-based method and system for automatically detecting illegal buildings - Google Patents

YOLOv3-based method and system for automatically detecting illegal buildings

Info

Publication number
CN110852164A
CN110852164A
Authority
CN
China
Prior art keywords
image
grid
kinect
error
value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910956709.1A
Other languages
Chinese (zh)
Inventor
江寅
朱传瑞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Anhui Pan Public Mdt Infotech Ltd
Original Assignee
Anhui Pan Public Mdt Infotech Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Anhui Pan Public Mdt Infotech Ltd filed Critical Anhui Pan Public Mdt Infotech Ltd
Priority to CN201910956709.1A priority Critical patent/CN110852164A/en
Publication of CN110852164A publication Critical patent/CN110852164A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/10Terrestrial scenes
    • G06V20/176Urban or other man-made structures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • G06F18/232Non-hierarchical techniques
    • G06F18/2321Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/25Determination of region of interest [ROI] or a volume of interest [VOI]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a method and a system for automatically detecting illegal buildings based on YOLOv3. An image acquisition device shoots four types of Kinect images of the building area with a Kinect device, and the detection system performs image preprocessing and labeling and makes its judgment with a YOLOv3 neural network. Finally, the result produced by the detection system is sent to a system terminal, and the fused recognition results of the four types of Kinect images are delivered to the user as an auxiliary diagnosis result for final judgment, so as to realize recognition of the building area (the illegal building). Based on intelligent image processing technology and a neural network, the invention realizes a system for automatically detecting illegal buildings, which can reduce the workload of manual identification to a certain extent and has economic and social significance.

Description

YOLOv3-based method and system for automatically detecting illegal buildings
Technical Field
The invention relates to the technical field of illegal building image identification, in particular to a method and a system for automatically detecting illegal buildings based on YOLOv3.
Background
Building change detection is one of the important tasks of geographic national-condition monitoring, and it is significant for illegal building identification, dynamic urban monitoring, geographic information updating and the like. Taking urban illegal building detection as an example: with the continuous development of China's economy and society, urbanization keeps accelerating, urban buildings keep increasing, and the number and scale of illegal buildings grow with them. This phenomenon not only destroys urban planning and the urban landscape, but also affects the city's image and residents' lives; it is a hot issue of public concern, a difficult problem for city management, and one of the negative factors affecting social harmony. At present, the fact that the cost of violation is low while the cost of enforcement is high is one of the main reasons illegal buildings persist despite repeated prohibition. Besides gaps in the relevant laws, the detection of illegal buildings is a weak link: lacking automatic monitoring means, manual inspection has many disadvantages, namely a long discovery cycle and high cost for large-scale monitoring. In recent years, cities such as Beijing have attempted illegal building detection with satellite image data, but automatic analysis of image information is still not mature enough, and manual identification and verification account for a large proportion of the process. Land law enforcement and city management authorities nationwide invest billions in manpower and material resources in this task every year. The market urgently needs a highly automated, robust and reliable method for detecting urban illegal buildings, so as to promote their renovation.
Disclosure of Invention
The invention aims to provide a method and a system for automatically detecting illegal buildings based on YOLOv3, so as to solve the problems raised in the background art above.
In order to achieve the purpose, the invention provides the following technical scheme:
an automatic illegal building detection method based on YOLOv3 comprises the following steps:
Step 1: first obtain pictures of a building area through an image acquisition device and perform image-sharpness preprocessing on the pictures through an image scanning device; then make a training set: select x sample groups from the data set, each sample group containing y samples and each sample consisting of one RGB image picture and one depth image picture, giving 2×x×y sample pictures in total;
Step 2: copy each sample picture and scale the copies proportionally to resolutions of 300×225, 400×300, 500×375 and 600×450, obtaining four times the number of sample pictures;
Step 3: pre-train the four-fold amplified sample pictures through Darknet-53, transfer the network parameters obtained from pre-training to the basic network, and initialize it to obtain the migrated Darknet-53 model;
Step 4: cluster the manually marked building-area boxes in the training set with a K-means clustering algorithm, set different k values, and count the corresponding values of the sum of squared errors (SSE);
Step 6: draw a plot of SSE against k; find the optimal k value by the elbow method from the SSE–k plot, obtain the corresponding k cluster centers, and write them into the configuration file as the initial candidate box parameters of YOLOv3;
Step 7: train on the training set obtained in step 1 with the improved YOLOv3 to obtain a trained parameter model, and realize identification of illegal buildings by fusing the recognition results of the four types of Kinect images.
Preferably, the training set in step 1 is prepared as follows:
1.1: the image acquisition device uses a Kinect device to shoot four types of Kinect images; each image is cropped to a picture with a fixed position, a consistent angle and a known field of view, at a resolution of 640×480;
1.2: copy each captured picture and scale the copies proportionally to resolutions of 300×225, 400×300, 500×375 and 600×450, obtaining a four-fold amplified Kinect image data set;
1.3: manually mark a building-area box on each picture in the four-fold amplified Kinect image data set to generate a label file;
1.4: combine the Kinect image data set and the label files to form the training set.
Preferably, after the trained parameter model is obtained in step 6, the method further includes: calling the Kinect camera to output the four types of Kinect images simultaneously, and performing recognition with the parameter model to obtain recognition results for the four types of Kinect images; the four types of Kinect images are: the IR image, the Registration of RGB image, the RGB image and the Depth image.
Preferably, the value of the sum of squared errors SSE in step 3 is obtained as follows: during training, YOLOv3 divides the image into S × S grids and, for each grid, predicts B detection boxes and their confidence Conf(Object) according to formulas (1), (2) and (3):

Conf(Object) = Pr(Object) × IOU (1)

$$\Pr(Object)=\begin{cases}1, & \text{if an object falls into the grid}\\0, & \text{otherwise}\end{cases}\qquad(2)$$

$$IOU=\frac{area\left(box(Pred)\cap box(Truth)\right)}{area\left(box(Pred)\cup box(Truth)\right)}\qquad(3)$$

wherein:
Pr(Object) indicates whether an object falls into the grid corresponding to the candidate box: 1 if an object falls into the grid and 0 otherwise, as shown in formula (2);
IOU represents the ratio of the intersection area to the union area of the prediction box and the real box; box(Pred) denotes a prediction box; box(Truth) denotes a real box; area(·) denotes an area;
the confidence Conf(Object) represents the confidence level of the detected object;
each detection box contains 5 parameters: x, y, w, h and Conf(Object), where (x, y) is the offset of the detection box center from the grid position and (w, h) are the width and height of the detection box;
each grid also predicts C class probabilities Pr(Class_i|Object), the conditional probability that the object falling into the grid belongs to class i; the network finally outputs a tensor of dimension S × S × [B × (4+1+C)].
The loss function of YOLOv3 is characterized by formula (4):

loss = Error_coord + Error_IOU + Error_class (4)

wherein Error_coord is the coordinate error, Error_IOU is the IOU error and Error_class is the classification error, with:

$$Error_{coord}=\lambda_{coord}\sum_{i=0}^{S^{2}}\sum_{j=0}^{B}\mathbb{1}_{ij}^{obj}\left[(x_{i}-\hat{x}_{i})^{2}+(y_{i}-\hat{y}_{i})^{2}\right]+\lambda_{coord}\sum_{i=0}^{S^{2}}\sum_{j=0}^{B}\mathbb{1}_{ij}^{obj}\left[\left(\sqrt{w_{i}}-\sqrt{\hat{w}_{i}}\right)^{2}+\left(\sqrt{h_{i}}-\sqrt{\hat{h}_{i}}\right)^{2}\right]\qquad(5)$$

$$Error_{IOU}=\sum_{i=0}^{S^{2}}\sum_{j=0}^{B}\mathbb{1}_{ij}^{obj}\left(C_{i}-\hat{C}_{i}\right)^{2}+\lambda_{noobj}\sum_{i=0}^{S^{2}}\sum_{j=0}^{B}\mathbb{1}_{ij}^{noobj}\left(C_{i}-\hat{C}_{i}\right)^{2}\qquad(6)$$

$$Error_{class}=\sum_{i=0}^{S^{2}}\mathbb{1}_{i}^{obj}\sum_{c\in classes}\left(p_{i}(c)-\hat{p}_{i}(c)\right)^{2}\qquad(7)$$

wherein:
λ_coord is the weight parameter of the coordinate error, λ_coord = 5; λ_noobj is the correction parameter of the no-object confidence term, λ_noobj = 0.5;
x̂_i, ŷ_i, ŵ_i and ĥ_i are the values of the x, y, w and h parameters of the real box corresponding to grid i, so that (x_i − x̂_i), (y_i − ŷ_i), (w_i − ŵ_i) and (h_i − ĥ_i) are the errors of the x, y, w and h parameters of grid i;
C_i is the predicted confidence Conf(Object) of grid i, Ĉ_i is the true confidence Conf(Object) of grid i, and (C_i − Ĉ_i) is the confidence error of grid i;
p_i(c) is the predicted probability Pr(Class_i|Object) for the target falling into grid i, p̂_i(c) is the corresponding true probability, and (p_i(c) − p̂_i(c)) is the probability error of the target falling into grid i;
1_i^obj indicates whether a target falls into grid i: 1 if a target falls into grid i and 0 otherwise;
1_ij^obj indicates whether an object falls into the j-th prediction box of grid i: 1 if it does and 0 otherwise.
Preferably, in step 4, YOLOv3 introduces a group of initial candidate boxes with fixed sizes and aspect ratios into the target detection process, and a K-Means clustering algorithm performs cluster analysis on the manually marked target boxes in the training set obtained in step 1 to find the optimal k value, representing the number of initial candidate boxes, and the width-height dimensions of the k cluster centers, which are used as the candidate box parameters in the network configuration file;
the k value is determined from the sum of squared errors SSE and the elbow method according to formula (8):

$$SSE=\sum_{i=1}^{k}\sum_{p\in Cl_{i}}\left|p-m_{i}\right|^{2}\qquad(8)$$

wherein Cl_i is the i-th cluster, p is a sample point in Cl_i, m_i is the centroid of Cl_i, i.e. the mean of all samples in Cl_i, and SSE is the clustering error of all samples, which characterizes how good the clustering is. The core idea of the elbow method is: as k increases, the samples are partitioned more finely and SSE gradually decreases; once k reaches the optimal number of clusters, the return from further increasing k drops rapidly, shown by a sudden decrease in the rate at which SSE falls, so the plot of SSE against k takes the shape of an elbow, and the k value at the elbow is the required optimal number of clusters.
Preferably, in the K-means clustering of step 5, the Euclidean distance would represent the error between a sample point and the sample mean; since here the sample point is a prediction box and the sample mean is a real box, the IOU is used to reflect the error between the prediction box and the real box, a larger IOU meaning a smaller error. The clustering error of the samples is calculated with formula (9):

$$SSE=\sum_{i=1}^{k}\sum_{p\in Cl_{i}}\left(1-IOU_{p}\right)^{2}\qquad(9)$$

wherein IOU_p is the IOU of sample point p and 1 − IOU_p represents the error of sample point p; from this the SSE and k values are obtained.
Preferably, in step 6 the recognition result is sent to the system terminal and delivered to the user as an auxiliary diagnosis result for final identification.
The invention also provides an automatic illegal building detection system based on YOLOv3, which comprises:
the image acquisition device, used for shooting the four types of Kinect images and uploading them to the detection system;
the image scanning device, used for performing image-sharpness preprocessing on the images shot by the image acquisition device and then sending them into the detection system;
the detection system, used for acquiring the Kinect images, preprocessing and labeling them, and judging with a YOLOv3 neural network whether they show an illegal building;
and the system terminal, which receives the judgment produced by the detection system and displays it as an auxiliary judgment result for the user.
Preferably, the image acquisition device is a Kinect device, and the four types of Kinect images comprise an IR image, a Registration of RGB image, an RGB image and a Depth image, each at a resolution of 640×480. The image preprocessing copies each captured picture and scales the copies proportionally to resolutions of 300×225, 400×300, 500×375 and 600×450, obtaining a four-fold amplified Kinect image data set; a building-area box is then manually marked on each picture in the four-fold amplified Kinect image data set to generate a label file.
Preferably, the image scanning device is based on a PC in which Matlab710 with a Retinex-based image enhancement algorithm is installed.
Compared with the prior art, the invention has the beneficial effects that:
the invention can provide effective auxiliary diagnosis information, and the invention completes an automatic detection violation building system based on intelligent image processing technology and neural network, can reduce the workload of manual identification to a certain extent, and has economic and social significance.
Drawings
FIG. 1 is a schematic diagram of the system of the present invention;
fig. 2 is a schematic diagram of the network structure of YOLOv3 in the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1-2, the present invention provides a technical solution:
referring to fig. 1, an automatic illegal building detection method based on YOLOv3 includes an image acquisition device, an image scanning device, a detection system and a system terminal. The image acquisition of the building area is to use a Kinect device to shoot four types of Kinect images, which are respectively as follows: one each of the IR image, Registration of RGB image, RGB image and Depth image; the resolution of the picture is 640 × 480. The detection system comprises image preprocessing, labeling and discrimination by using a YOLOv3 neural network.
According to the invention, an unmanned aerial vehicle carrying the Kinect can be used as the photographing equipment; low-altitude photographing by the unmanned aerial vehicle reduces the difficulty and workload of manual photographing. The unmanned aerial vehicle shoots buildings (including illegal buildings) in a city or another region along a preset route, and the obtained picture images are input into the image scanning device for preprocessing. The preprocessing is performed on a computer (PC) running Windows XP or above, in which Matlab710 with a Retinex-based image enhancement algorithm is installed, so that the pictures or photos shot by the unmanned aerial vehicle can be sharpened based on the Retinex algorithm. During aerial photographing and evidence collection, weather and other factors, for example haze or wind-gust disturbance, can greatly reduce the overall contrast and brightness of the photographed images and distort their colors. Retinex post-processing of the restored image enhances the image as a whole: detail information is enhanced, edges become clear, color information is strengthened so that image colors are better recovered, and highlight areas are enhanced, as sketched below. Matlab710 with the Retinex image enhancement algorithm is prior art; see, for example, "image enhancement algorithm based on the Retinex principle", article number 1009-3044(2018)11-0185-02, and "a method for improving the definition of a foggy image", article number 100325060(2011)0120083204.
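The patent relies on Matlab710 for this step and gives no listing; purely as an illustration, a minimal single-scale Retinex sketch in Python follows (the use of OpenCV, the sigma value and the file name are assumptions, not the patent's implementation):

```python
import cv2
import numpy as np

def single_scale_retinex(image_bgr, sigma=80):
    """Minimal single-scale Retinex: log(image) - log(estimated illumination).

    The illumination is estimated with a Gaussian blur; sigma is an assumed scale.
    """
    img = image_bgr.astype(np.float64) + 1.0             # +1 avoids log(0)
    illumination = cv2.GaussianBlur(img, (0, 0), sigma)  # kernel size derived from sigma
    retinex = np.log(img) - np.log(illumination)
    # Stretch the reflectance back to a displayable 0-255 range
    retinex = (retinex - retinex.min()) / (retinex.max() - retinex.min())
    return (retinex * 255).astype(np.uint8)

# enhanced = single_scale_retinex(cv2.imread("uav_photo.jpg"))
```

Multi-scale Retinex variants average such reflectance maps over several sigma values; either way the goal is the one stated above: restoring contrast and edge detail in hazy or under-exposed aerial photos.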
FIG. 2 shows a schematic diagram of the structure of YOLOv3; the workflow covers making the training set, generating the migrated Darknet-53 model, improving the candidate box parameters, and identifying illegal buildings. The method comprises the following steps:
step 1, making a training set according to the following process
1.1, using a Kinect device to shoot four types of Kinect images, namely: one each of the IR image, the registration of the RGB image, the RGB image and the Depth image; the resolution of the picture obtained by shooting was 640 × 480.
And 1.2, copying each picture obtained by shooting, and respectively adjusting the resolution to 300 × 225, 400 × 300, 500 × 375 and 600 × 450 in proportion to obtain a four-time-multiplied Kinect image data set.
And 1.3, manually marking a building area frame aiming at each picture in the four times of amplified Kinect image data set, and generating a label file.
And 1.4, combining the Kinect image data set and the label file to form a training set.
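The following sketch illustrates the four-resolution amplification of steps 1.1–1.4 in Python with Pillow; the directory layout and file naming are assumptions for illustration only:

```python
from pathlib import Path
from PIL import Image

SCALES = [(300, 225), (400, 300), (500, 375), (600, 450)]  # the four target resolutions

def amplify_dataset(src_dir, dst_dir):
    """Save every 640x480 Kinect picture at the four reduced resolutions (step 1.2)."""
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    for path in sorted(Path(src_dir).glob("*.png")):
        img = Image.open(path)
        for w, h in SCALES:
            img.resize((w, h), Image.BILINEAR).save(dst / f"{path.stem}_{w}x{h}.png")

# amplify_dataset("kinect_raw/ir", "kinect_4x/ir")   # one call per Kinect image type
```

The manually marked building-area boxes of step 1.3 would have to be rescaled by the same width and height factors when the label files for the amplified copies are generated.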
Step 2, generate the migrated Darknet-53 model as follows.
2.1: select x sample groups from the data set, each sample group containing y samples and each sample consisting of one RGB image picture and one depth image picture, giving 2×x×y sample pictures.
2.2: copy each sample picture and scale the copies to resolutions of 300×225, 400×300, 500×375 and 600×450, obtaining four times the number of sample pictures.
2.3: pre-train the four-fold amplified sample pictures through Darknet-53, transfer the network parameters obtained from pre-training to the basic network, and initialize it to obtain the migrated Darknet-53 model; a sketch of this parameter transfer is given below.
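Step 2.3 only states that the pre-trained Darknet-53 parameters are transferred into the basic network; a hedged PyTorch-style sketch of such a transfer follows (the checkpoint path is an assumption, and the checkpoint is assumed to store a plain state_dict):

```python
import torch

def transfer_backbone(model, checkpoint_path="darknet53_pretrained.pth"):
    """Initialize a detection model with pre-trained Darknet-53 backbone weights.

    Copies every parameter whose name and shape match the checkpoint;
    detection-head layers absent from the checkpoint keep their fresh
    initialization, which is the usual transfer-learning recipe.
    """
    pretrained = torch.load(checkpoint_path, map_location="cpu")
    own_state = model.state_dict()
    matched = {
        name: tensor
        for name, tensor in pretrained.items()
        if name in own_state and own_state[name].shape == tensor.shape
    }
    own_state.update(matched)
    model.load_state_dict(own_state)
    return model
```

Matching by both name and shape keeps the sketch robust when the detection head differs from the network the backbone was pre-trained with.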
Step 3, set the initial candidate box parameters of YOLOv3 as follows.
3.1: cluster the manually marked building-area boxes in the training set with a K-means clustering algorithm, set different k values, and count the corresponding values of the sum of squared errors (SSE).
3.2: find the optimal k value with the elbow method, obtain the corresponding k cluster centers, and write them into the configuration file as the initial candidate box parameters of YOLOv3.
Step 4, identify the illegal buildings as follows.
4.1: train on the training set obtained in step 1 with the improved YOLOv3 to obtain a trained parameter model.
4.2: call the Kinect camera to output the four types of Kinect images simultaneously, and perform recognition with the parameter model obtained in step 4.1 to obtain recognition results for the four types of Kinect images (the IR image, the Registration of RGB image, the RGB image and the Depth image).
4.3: identify the building area (illegal building) by fusing the recognition results of the four types of Kinect images, as sketched below.
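The patent leaves the fusion rule of step 4.3 unspecified; one plausible reading is a cross-type vote, sketched below (the IoU threshold and the minimum vote count are assumptions):

```python
def iou(a, b):
    """IOU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def fuse_detections(per_type_boxes, iou_thr=0.5, min_votes=2):
    """Fuse detections from the four Kinect image types.

    per_type_boxes: four lists of boxes, one list per image type. A box is
    kept when detections from at least `min_votes` image types overlap it.
    """
    fused = []
    for t, boxes in enumerate(per_type_boxes):
        for box in boxes:
            votes = 1 + sum(
                any(iou(box, other) >= iou_thr for other in per_type_boxes[u])
                for u in range(len(per_type_boxes)) if u != t
            )
            already = any(iou(box, kept) >= iou_thr for kept in fused)
            if votes >= min_votes and not already:
                fused.append(box)
    return fused
```

Requiring agreement between image types suppresses detections that appear in only one modality, which is consistent with the patent's idea of fusing the four recognition results.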
In a specific implementation, step 3.1 is to obtain the values of the sum of squared errors SSE as follows:
YOLOv3 divides the image into S × S grids during training and, for each grid, predicts B detection boxes and their confidence Conf(Object) according to formulas (1), (2) and (3):

Conf(Object) = Pr(Object) × IOU (1)

$$\Pr(Object)=\begin{cases}1, & \text{if an object falls into the grid}\\0, & \text{otherwise}\end{cases}\qquad(2)$$

$$IOU=\frac{area\left(box(Pred)\cap box(Truth)\right)}{area\left(box(Pred)\cup box(Truth)\right)}\qquad(3)$$

wherein:
Pr(Object) indicates whether an object falls into the grid corresponding to the candidate box: 1 if an object falls into the grid and 0 otherwise, as shown in formula (2);
IOU represents the ratio of the intersection area to the union area of the prediction box and the real box; box(Pred) denotes a prediction box; box(Truth) denotes a real box; area(·) denotes an area;
the confidence Conf(Object) represents the confidence level of the detected object;
each detection box contains 5 parameters: x, y, w, h and Conf(Object), where (x, y) is the offset of the detection box center from the grid position and (w, h) are the width and height of the detection box;
each grid also predicts C class probabilities Pr(Class_i|Object), the conditional probability that the object falling into the grid belongs to class i; the network finally outputs a tensor of dimension S × S × [B × (4+1+C)].
The loss function of YOLOv3 is characterized by formula (4):

loss = Error_coord + Error_IOU + Error_class (4)

wherein Error_coord is the coordinate error, Error_IOU is the IOU error and Error_class is the classification error, with:

$$Error_{coord}=\lambda_{coord}\sum_{i=0}^{S^{2}}\sum_{j=0}^{B}\mathbb{1}_{ij}^{obj}\left[(x_{i}-\hat{x}_{i})^{2}+(y_{i}-\hat{y}_{i})^{2}\right]+\lambda_{coord}\sum_{i=0}^{S^{2}}\sum_{j=0}^{B}\mathbb{1}_{ij}^{obj}\left[\left(\sqrt{w_{i}}-\sqrt{\hat{w}_{i}}\right)^{2}+\left(\sqrt{h_{i}}-\sqrt{\hat{h}_{i}}\right)^{2}\right]\qquad(5)$$

$$Error_{IOU}=\sum_{i=0}^{S^{2}}\sum_{j=0}^{B}\mathbb{1}_{ij}^{obj}\left(C_{i}-\hat{C}_{i}\right)^{2}+\lambda_{noobj}\sum_{i=0}^{S^{2}}\sum_{j=0}^{B}\mathbb{1}_{ij}^{noobj}\left(C_{i}-\hat{C}_{i}\right)^{2}\qquad(6)$$

$$Error_{class}=\sum_{i=0}^{S^{2}}\mathbb{1}_{i}^{obj}\sum_{c\in classes}\left(p_{i}(c)-\hat{p}_{i}(c)\right)^{2}\qquad(7)$$

wherein:
λ_coord is the weight parameter of the coordinate error, λ_coord = 5; λ_noobj is the correction parameter of the no-object confidence term, λ_noobj = 0.5;
x̂_i, ŷ_i, ŵ_i and ĥ_i are the values of the x, y, w and h parameters of the real box corresponding to grid i, so that (x_i − x̂_i), (y_i − ŷ_i), (w_i − ŵ_i) and (h_i − ĥ_i) are the errors of the x, y, w and h parameters of grid i;
C_i is the predicted confidence Conf(Object) of grid i, Ĉ_i is the true confidence Conf(Object) of grid i, and (C_i − Ĉ_i) is the confidence error of grid i;
p_i(c) is the predicted probability Pr(Class_i|Object) for the target falling into grid i, p̂_i(c) is the corresponding true probability, and (p_i(c) − p̂_i(c)) is the probability error of the target falling into grid i;
1_i^obj indicates whether a target falls into grid i: 1 if a target falls into grid i and 0 otherwise;
1_ij^obj indicates whether an object falls into the j-th prediction box of grid i: 1 if it does and 0 otherwise. A numerical sketch of these loss terms is given below.
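A NumPy sketch of formulas (4)–(7) under the definitions above follows; the flattened (S², B) tensor layout and the dictionary packaging are assumptions made for readability, not the network's real output format:

```python
import numpy as np

LAMBDA_COORD, LAMBDA_NOOBJ = 5.0, 0.5   # values given in the patent

def yolo_loss(pred, truth, obj_ij, obj_i):
    """pred/truth: dicts of arrays x, y, w, h, C of shape (S*S, B) and p of shape (S*S, C_cls).

    obj_ij: (S*S, B) indicator that box j of grid i is responsible for a target (1_ij^obj);
    obj_i:  (S*S,)   indicator that a target falls into grid i (1_i^obj).
    """
    coord = LAMBDA_COORD * np.sum(
        obj_ij * ((pred["x"] - truth["x"]) ** 2 + (pred["y"] - truth["y"]) ** 2)
    ) + LAMBDA_COORD * np.sum(
        obj_ij * ((np.sqrt(pred["w"]) - np.sqrt(truth["w"])) ** 2
                  + (np.sqrt(pred["h"]) - np.sqrt(truth["h"])) ** 2)
    )                                                            # formula (5)
    conf_sq = (pred["C"] - truth["C"]) ** 2
    iou_err = (np.sum(obj_ij * conf_sq)
               + LAMBDA_NOOBJ * np.sum((1 - obj_ij) * conf_sq))  # formula (6)
    cls_err = np.sum(obj_i[:, None] * (pred["p"] - truth["p"]) ** 2)  # formula (7)
    return coord + iou_err + cls_err                             # formula (4)
```

The (1 − obj_ij) factor plays the role of the 1_ij^noobj indicator, so boxes not responsible for any target contribute only the down-weighted confidence term.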
A group of initial candidate boxes with fixed sizes and aspect ratios is introduced into the target detection process of YOLOv3; a K-Means clustering algorithm performs cluster analysis on the manually marked target boxes in the training set obtained in step 1 to find the optimal k value, representing the number of initial candidate boxes, and the width-height dimensions of the k cluster centers, which are used as the candidate box parameters in the network configuration file.
The k value is determined from the sum of squared errors SSE and the elbow method according to formula (8):

$$SSE=\sum_{i=1}^{k}\sum_{p\in Cl_{i}}\left|p-m_{i}\right|^{2}\qquad(8)$$

wherein Cl_i is the i-th cluster, p is a sample point in Cl_i, m_i is the centroid of Cl_i, i.e. the mean of all samples in Cl_i, and SSE is the clustering error of all samples, which characterizes how good the clustering is. The core idea of the elbow method is: as k increases, the samples are partitioned more finely and SSE gradually decreases; once k reaches the optimal number of clusters, the return from further increasing k drops rapidly, shown by a sudden decrease in the rate at which SSE falls, so the plot of SSE against k takes the shape of an elbow, and the k value at the elbow is the required optimal number of clusters.
In the K-means clustering, the Euclidean distance would represent the error between a sample point and the sample mean; since here the sample point is a prediction box and the sample mean is a real box, the IOU is used to reflect the error between the prediction box and the real box, a larger IOU meaning a smaller error. The clustering error of the samples is calculated with formula (9):

$$SSE=\sum_{i=1}^{k}\sum_{p\in Cl_{i}}\left(1-IOU_{p}\right)^{2}\qquad(9)$$

wherein IOU_p is the IOU of sample point p and 1 − IOU_p represents the error of sample point p; from this the SSE and k values are obtained. A sketch of this IOU-based clustering and the elbow curve is given below.
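A sketch of this clustering follows; the width-height IOU assumes boxes aligned at a common corner, as is usual for candidate-box clustering, and the iteration count and seed are assumptions:

```python
import numpy as np

def wh_iou(boxes, centers):
    """IOU between (w, h) pairs assuming a shared corner: (N, 2) x (K, 2) -> (N, K)."""
    inter = (np.minimum(boxes[:, None, 0], centers[None, :, 0])
             * np.minimum(boxes[:, None, 1], centers[None, :, 1]))
    union = boxes[:, 0:1] * boxes[:, 1:2] + centers[:, 0] * centers[:, 1] - inter
    return inter / union

def kmeans_iou(boxes, k, iters=100, seed=0):
    """K-means on box sizes with distance 1 - IOU; returns centers and the SSE of formula (9)."""
    rng = np.random.default_rng(seed)
    centers = boxes[rng.choice(len(boxes), size=k, replace=False)]
    for _ in range(iters):
        assign = np.argmax(wh_iou(boxes, centers), axis=1)  # nearest center = largest IOU
        centers = np.array([
            boxes[assign == i].mean(axis=0) if np.any(assign == i) else centers[i]
            for i in range(k)
        ])
    best_iou = wh_iou(boxes, centers).max(axis=1)
    return centers, float(np.sum((1.0 - best_iou) ** 2))

# Elbow method: compute SSE for a range of k and pick the k where the curve flattens.
# sse_by_k = {k: kmeans_iou(all_wh, k)[1] for k in range(2, 13)}
```

Plotting sse_by_k against k reproduces the SSE–k relation of step 3.2; the elbow of that curve gives the number of initial candidate boxes written into the YOLOv3 configuration file.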
Finally, the recognition result is sent to the system terminal and delivered to the user as an auxiliary diagnosis result, realizing identification of the illegal building.
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (10)

1. An automatic illegal building detection method based on YOLOv3 is characterized by comprising the following steps:
Step 1: first obtain pictures of a building area through an image acquisition device and perform image-sharpness preprocessing on the pictures through an image scanning device; then make a training set: select x sample groups from the data set, each sample group containing y samples and each sample consisting of one RGB image picture and one depth image picture, giving 2×x×y sample pictures in total;
Step 2: copy each sample picture and scale the copies proportionally to resolutions of 300×225, 400×300, 500×375 and 600×450, obtaining four times the number of sample pictures;
Step 3: pre-train the four-fold amplified sample pictures through Darknet-53, transfer the network parameters obtained from pre-training to the basic network, and initialize it to obtain the migrated Darknet-53 model;
Step 4: cluster the manually marked building-area boxes in the training set with a K-means clustering algorithm, set different k values, and count the corresponding values of the sum of squared errors (SSE);
Step 6: draw a plot of SSE against k; find the optimal k value by the elbow method from the SSE–k plot, obtain the corresponding k cluster centers, and write them into the configuration file as the initial candidate box parameters of YOLOv3;
Step 7: train on the training set obtained in step 1 with the improved YOLOv3 to obtain a trained parameter model, and realize identification of illegal buildings by fusing the recognition results of the four types of Kinect images.
2. The method for automatically detecting illegal buildings based on YOLOv3 as claimed in claim 1, wherein the training set in step 1 is made as follows:
1.1: the image acquisition device uses a Kinect device to shoot four types of Kinect images; each image is cropped to a picture with a fixed position, a consistent angle and a known field of view, at a resolution of 640×480;
1.2: copy each captured picture and scale the copies proportionally to resolutions of 300×225, 400×300, 500×375 and 600×450, obtaining a four-fold amplified Kinect image data set;
1.3: manually mark a building-area box on each picture in the four-fold amplified Kinect image data set to generate a label file;
1.4: combine the Kinect image data set and the label files to form the training set.
3. The method for automatically detecting illegal buildings based on YOLOv3 as claimed in claim 1, wherein after the trained parameter model is obtained in step 6, the method further comprises: calling the Kinect camera to output the four types of Kinect images simultaneously, and performing recognition with the parameter model to obtain recognition results for the four types of Kinect images; the four types of Kinect images are: the IR image, the Registration of RGB image, the RGB image and the Depth image.
4. The method for automatically detecting illegal buildings based on YOLOv3 as claimed in claim 1, wherein the value of the sum of squared errors SSE in step 3 is obtained as follows: during training, YOLOv3 divides the image into S × S grids and, for each grid, predicts B detection boxes and their confidence Conf(Object) according to formulas (1), (2) and (3):

Conf(Object) = Pr(Object) × IOU (1)

$$\Pr(Object)=\begin{cases}1, & \text{if an object falls into the grid}\\0, & \text{otherwise}\end{cases}\qquad(2)$$

$$IOU=\frac{area\left(box(Pred)\cap box(Truth)\right)}{area\left(box(Pred)\cup box(Truth)\right)}\qquad(3)$$

wherein:
Pr(Object) indicates whether an object falls into the grid corresponding to the candidate box: 1 if an object falls into the grid and 0 otherwise, as shown in formula (2);
IOU represents the ratio of the intersection area to the union area of the prediction box and the real box; box(Pred) denotes a prediction box; box(Truth) denotes a real box; area(·) denotes an area;
the confidence Conf(Object) represents the confidence level of the detected object;
each detection box contains 5 parameters: x, y, w, h and Conf(Object), where (x, y) is the offset of the detection box center from the grid position and (w, h) are the width and height of the detection box;
each grid also predicts C class probabilities Pr(Class_i|Object), the conditional probability that the object falling into the grid belongs to class i; the network finally outputs a tensor of dimension S × S × [B × (4+1+C)];
the loss function of YOLOv3 is characterized by formula (4):

loss = Error_coord + Error_IOU + Error_class (4)

wherein Error_coord is the coordinate error, Error_IOU is the IOU error and Error_class is the classification error, with:

$$Error_{coord}=\lambda_{coord}\sum_{i=0}^{S^{2}}\sum_{j=0}^{B}\mathbb{1}_{ij}^{obj}\left[(x_{i}-\hat{x}_{i})^{2}+(y_{i}-\hat{y}_{i})^{2}\right]+\lambda_{coord}\sum_{i=0}^{S^{2}}\sum_{j=0}^{B}\mathbb{1}_{ij}^{obj}\left[\left(\sqrt{w_{i}}-\sqrt{\hat{w}_{i}}\right)^{2}+\left(\sqrt{h_{i}}-\sqrt{\hat{h}_{i}}\right)^{2}\right]\qquad(5)$$

$$Error_{IOU}=\sum_{i=0}^{S^{2}}\sum_{j=0}^{B}\mathbb{1}_{ij}^{obj}\left(C_{i}-\hat{C}_{i}\right)^{2}+\lambda_{noobj}\sum_{i=0}^{S^{2}}\sum_{j=0}^{B}\mathbb{1}_{ij}^{noobj}\left(C_{i}-\hat{C}_{i}\right)^{2}\qquad(6)$$

$$Error_{class}=\sum_{i=0}^{S^{2}}\mathbb{1}_{i}^{obj}\sum_{c\in classes}\left(p_{i}(c)-\hat{p}_{i}(c)\right)^{2}\qquad(7)$$

wherein:
λ_coord is the weight parameter of the coordinate error, λ_coord = 5; λ_noobj is the correction parameter of the no-object confidence term, λ_noobj = 0.5;
x̂_i, ŷ_i, ŵ_i and ĥ_i are the values of the x, y, w and h parameters of the real box corresponding to grid i, so that (x_i − x̂_i), (y_i − ŷ_i), (w_i − ŵ_i) and (h_i − ĥ_i) are the errors of the x, y, w and h parameters of grid i;
C_i is the predicted confidence Conf(Object) of grid i, Ĉ_i is the true confidence Conf(Object) of grid i, and (C_i − Ĉ_i) is the confidence error of grid i;
p_i(c) is the predicted probability Pr(Class_i|Object) for the target falling into grid i, p̂_i(c) is the corresponding true probability, and (p_i(c) − p̂_i(c)) is the probability error of the target falling into grid i;
1_i^obj indicates whether a target falls into grid i: 1 if a target falls into grid i and 0 otherwise;
1_ij^obj indicates whether an object falls into the j-th prediction box of grid i: 1 if it does and 0 otherwise.
5. The method for automatically detecting illegal buildings based on YOLOv3 as claimed in claim 1, wherein in step 4 YOLOv3 introduces a group of initial candidate boxes with fixed sizes and aspect ratios into the target detection process, and a K-Means clustering algorithm performs cluster analysis on the manually marked target boxes in the training set obtained in step 1 to find the optimal k value, representing the number of initial candidate boxes, and the width-height dimensions of the k cluster centers, which are used as the candidate box parameters in the network configuration file;
the k value is determined from the sum of squared errors SSE and the elbow method according to formula (8):

$$SSE=\sum_{i=1}^{k}\sum_{p\in Cl_{i}}\left|p-m_{i}\right|^{2}\qquad(8)$$

wherein Cl_i is the i-th cluster, p is a sample point in Cl_i, m_i is the centroid of Cl_i, i.e. the mean of all samples in Cl_i, and SSE is the clustering error of all samples, which characterizes how good the clustering is; the core idea of the elbow method is: as k increases, the samples are partitioned more finely and SSE gradually decreases; once k reaches the optimal number of clusters, the return from further increasing k drops rapidly, shown by a sudden decrease in the rate at which SSE falls, so the plot of SSE against k takes the shape of an elbow, and the k value at the elbow is the required optimal number of clusters.
6. The method for automatically detecting illegal buildings based on YOLOv3 according to claim 1, wherein in the K-means clustering of step 5, the Euclidean distance would represent the error between a sample point and the sample mean; since here the sample point is a prediction box and the sample mean is a real box, the IOU is used to reflect the error between the prediction box and the real box, a larger IOU meaning a smaller error; the clustering error of the samples is calculated with formula (9):

$$SSE=\sum_{i=1}^{k}\sum_{p\in Cl_{i}}\left(1-IOU_{p}\right)^{2}\qquad(9)$$

wherein IOU_p is the IOU of sample point p and 1 − IOU_p represents the error of sample point p; from this the SSE and k values are obtained.
7. The method for automatically detecting illegal buildings according to claim 1, wherein in step 6 the recognition result is sent to the system terminal and delivered to the user as an auxiliary diagnosis result for final identification.
8. An automatic illegal building detection system based on YOLOv3, characterized by comprising:
the image acquisition device, used for shooting the four types of Kinect images and uploading them to the detection system;
the image scanning device, used for performing image-sharpness preprocessing on the images shot by the image acquisition device and then sending them into the detection system;
the detection system, used for acquiring the Kinect images, preprocessing and labeling them, and judging with a YOLOv3 neural network whether they show an illegal building;
and the system terminal, which receives the judgment produced by the detection system and displays it as an auxiliary judgment result for the user.
9. The automatic illegal building detection system based on YOLOv3 as claimed in claim 8, wherein the image acquisition device is a Kinect device and the four types of Kinect images comprise an IR image, a Registration of RGB image, an RGB image and a Depth image, each at a resolution of 640×480; the image preprocessing copies each captured picture and scales the copies proportionally to resolutions of 300×225, 400×300, 500×375 and 600×450, obtaining a four-fold amplified Kinect image data set; a building-area box is manually marked on each picture in the four-fold amplified Kinect image data set to generate a label file.
10. The automatic illegal building detection system based on YOLOv3 according to claim 8, wherein the image scanning device is based on a PC in which Matlab710 with a Retinex-based image enhancement algorithm is installed.
CN201910956709.1A 2019-10-10 2019-10-10 YOLOv3-based method and system for automatically detecting illegal buildings Pending CN110852164A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910956709.1A CN110852164A (en) 2019-10-10 2019-10-10 YOLOv3-based method and system for automatically detecting illegal buildings

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910956709.1A CN110852164A (en) 2019-10-10 2019-10-10 YOLOv3-based method and system for automatically detecting illegal buildings

Publications (1)

Publication Number Publication Date
CN110852164A true CN110852164A (en) 2020-02-28

Family

ID=69596513

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910956709.1A Pending CN110852164A (en) 2019-10-10 2019-10-10 YOLOv3-based method and system for automatically detecting illegal buildings

Country Status (1)

Country Link
CN (1) CN110852164A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111507296A (en) * 2020-04-23 2020-08-07 嘉兴河图遥感技术有限公司 Intelligent illegal building extraction method based on unmanned aerial vehicle remote sensing and deep learning
CN111709310A (en) * 2020-05-26 2020-09-25 重庆大学 Gesture tracking and recognition method based on deep learning
CN112215190A (en) * 2020-10-21 2021-01-12 南京智慧航空研究院有限公司 Illegal building detection method based on YOLOV4 model
CN112215189A (en) * 2020-10-21 2021-01-12 南京智慧航空研究院有限公司 Accurate detecting system for illegal building
CN113011405A (en) * 2021-05-25 2021-06-22 南京柠瑛智能科技有限公司 Method for solving multi-frame overlapping error of ground object target identification of unmanned aerial vehicle
CN113420716A (en) * 2021-07-16 2021-09-21 南威软件股份有限公司 Improved Yolov3 algorithm-based violation behavior recognition and early warning method

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109325454A (en) * 2018-09-28 2019-02-12 合肥工业大学 A kind of static gesture real-time identification method based on YOLOv3

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109325454A (en) * 2018-09-28 2019-02-12 合肥工业大学 A kind of static gesture real-time identification method based on YOLOv3

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111507296A (en) * 2020-04-23 2020-08-07 嘉兴河图遥感技术有限公司 Intelligent illegal building extraction method based on unmanned aerial vehicle remote sensing and deep learning
CN111709310A (en) * 2020-05-26 2020-09-25 重庆大学 Gesture tracking and recognition method based on deep learning
CN111709310B (en) * 2020-05-26 2024-02-02 重庆大学 Gesture tracking and recognition method based on deep learning
CN112215190A (en) * 2020-10-21 2021-01-12 南京智慧航空研究院有限公司 Illegal building detection method based on YOLOV4 model
CN112215189A (en) * 2020-10-21 2021-01-12 南京智慧航空研究院有限公司 Accurate detecting system for illegal building
CN113011405A (en) * 2021-05-25 2021-06-22 南京柠瑛智能科技有限公司 Method for solving multi-frame overlapping error of ground object target identification of unmanned aerial vehicle
CN113420716A (en) * 2021-07-16 2021-09-21 南威软件股份有限公司 Improved Yolov3 algorithm-based violation behavior recognition and early warning method
CN113420716B (en) * 2021-07-16 2023-07-28 南威软件股份有限公司 Illegal behavior identification and early warning method based on improved Yolov3 algorithm

Similar Documents

Publication Publication Date Title
CN110852164A (en) YOLOv3-based method and system for automatically detecting illegal buildings
CN111222574B (en) Ship and civil ship target detection and classification method based on multi-model decision-level fusion
CN113822247B (en) Method and system for identifying illegal building based on aerial image
CN104978567B (en) Vehicle checking method based on scene classification
CN108804992B (en) Crowd counting method based on deep learning
CN113449632B (en) Vision and radar perception algorithm optimization method and system based on fusion perception and automobile
CN112801227B (en) Typhoon identification model generation method, device, equipment and storage medium
CN115512247A (en) Regional building damage grade assessment method based on image multi-parameter extraction
CN115272876A (en) Remote sensing image ship target detection method based on deep learning
CN113313107A (en) Intelligent detection and identification method for multiple types of diseases on cable surface of cable-stayed bridge
CN115880260A (en) Method, device and equipment for detecting base station construction and computer readable storage medium
CN115409814A (en) Photovoltaic module hot spot detection method and system based on fusion image
CN110826364B (en) Library position identification method and device
Li et al. Hybrid cloud detection algorithm based on intelligent scene recognition
CN110765900B (en) Automatic detection illegal building method and system based on DSSD
CN111881833B (en) Vehicle detection method, device, equipment and storage medium
CN117437470A (en) Fire hazard level assessment method and system based on artificial intelligence
CN114463628A (en) Deep learning remote sensing image ship target identification method based on threshold value constraint
CN114494850A (en) Village unmanned courtyard intelligent identification method and system
CN113963230A (en) Parking space detection method based on deep learning
CN113793069A (en) Urban waterlogging intelligent identification method of deep residual error network
CN113239962A (en) Traffic participant identification method based on single fixed camera
CN115359346B (en) Small micro-space identification method and device based on street view picture and electronic equipment
CN112767469B (en) Highly intelligent acquisition method for urban mass buildings
CN115880644B (en) Method and system for identifying coal quantity based on artificial intelligence

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200228