CN114581769A - Method for identifying houses under construction based on unsupervised clustering

Method for identifying houses under construction based on unsupervised clustering

Info

Publication number
CN114581769A
CN114581769A
Authority
CN
China
Prior art keywords
data
category
building
under construction
sample
Prior art date
Legal status
Pending
Application number
CN202210063160.5A
Other languages
Chinese (zh)
Inventor
胡华浪
韩旭
黄进
李剑波
申克建
Current Assignee
Big Data Development Center Of Ministry Of Agriculture And Rural Areas
Southwest Jiaotong University
Original Assignee
Big Data Development Center Of Ministry Of Agriculture And Rural Areas
Southwest Jiaotong University
Priority date
Filing date
Publication date
Application filed by Big Data Development Center Of Ministry Of Agriculture And Rural Areas and Southwest Jiaotong University
Priority to CN202210063160.5A
Publication of CN114581769A
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Abstract

The invention discloses a method for identifying houses under construction based on unsupervised clustering, which relates to the technical field of computer vision and comprises the following steps: acquiring image data of houses under construction through an image acquisition device; manually marking the positions of the houses under construction in the collected image data, and then cropping the marked houses to obtain a house image data set; setting the number of cluster categories and dividing the house image data into categories in an unsupervised contrastive clustering manner; after clustering, assigning each house its cluster category and writing that category back as the label on the original image; training a detection model on the labelled house image data set with the Yolox target detection algorithm; and adjusting the confidence threshold of each category according to the prediction effect of each category of house data on the test set. The method makes model training and convergence easier, maintains a high recall rate and improves detection precision.

Description

Method for identifying houses under construction based on unsupervised clustering
Technical Field
The invention relates to the technical field of computer vision, in particular to a house under construction identification method based on unsupervised clustering.
Background
Picture or video stream data are collected in real time by cameras and then processed and analysed with artificial intelligence technology, so that rural houses under construction can be monitored in real time; the house information is pushed to an artificial intelligence platform, which judges whether a house is under construction, achieving the purpose of intelligent supervision. The technology mainly relies on target detection. At present, target detection generally adopts a supervised learning mode: a training data set is manually annotated, mainly with the position and category of each target, a model is then trained on the annotated data set, and the trained model is finally used to predict new data.

Target detection is one of the most common artificial intelligence technologies and is mainly divided into traditional algorithms using hand-designed features and detection algorithms based on deep learning. With the development of deep networks and hardware such as GPUs (graphics processing units), the performance of deep-learning-based target detection has far surpassed the traditional algorithms and has become the mainstream in the current detection field. Deep-learning-based methods are further divided into two types, One-Stage and Two-Stage. The Two-Stage algorithms are mainly represented by the RCNN series (RCNN, Fast RCNN and Faster RCNN); as the name implies, they proceed in two steps, first predicting candidate regions, then classifying the candidate regions and regressing the positions of the candidate boxes, so they achieve high precision but poor real-time performance. In contrast, One-Stage algorithms directly predict the positions and categories of targets in an image and realise end-to-end detection, so they offer high real-time performance and are easy to optimise, although their accuracy is lower than that of Two-Stage algorithms. For industrial applications, prediction precision is not the only consideration; real-time performance is also one of the important indexes when selecting an algorithm, so One-Stage algorithms are the detection algorithms preferred by industry. They mainly include the SSD, RetinaNet and Yolo series algorithms.

The Yolo series, which balances precision and real-time performance, is widely applied in industry. From Yolov1 to Yolov5 and on to the recently proposed Yolox detection algorithm, the series has been continuously improved and its performance keeps getting better. By introducing techniques such as the Decoupled Head, anchor-free design and SimOTA, Yolox improves both detection precision and inference speed, making it one of the most excellent detection algorithms at present.
The main problems and defects of the prior art are as follows:
Most existing target detection work annotates images manually, but for data on houses under construction the state of a house differs from period to period during the construction process; because the cameras shoot from different angles, the houses under construction look different, and the construction types of the houses also differ. If a single category (such as "house under construction") is used to annotate the data, the large differences between houses make the model hard to converge and the detection precision low. Summarising all states into one category also makes the model difficult to optimise: it is hard to judge which states the model recognises poorly, so the poorly recognised states cannot be specially optimised. If the data are instead classified manually, the diversity of house data collected in real scenes, including the different shooting angles of the cameras, mutual occlusion between houses, and the many states and house types during the construction period, makes manually defined categories difficult to establish; moreover, with manually defined categories, newly added training data must also be classified by hand, data are easily assigned to the wrong class, and the model training effect suffers.
Disclosure of Invention
The invention aims to solve the above technical problems; to this end, the invention provides a method for identifying houses under construction based on unsupervised clustering.
The invention specifically adopts the following technical scheme for realizing the purpose:
the invention relates to a house under construction identification method based on unsupervised clustering, which comprises the following steps:
S1, data set production: acquiring image data of houses under construction through an image acquisition device;
S2, preprocessing of the data set: manually marking the positions of the houses under construction in the collected image data, and then cropping the marked houses to obtain a house image data set;
S3, unsupervised clustering of data: setting the number of cluster categories, and then dividing the house under construction image data obtained in step S2 into categories in an unsupervised contrastive clustering manner;
S4, label redefinition: after clustering, each house has a corresponding category, and this category is written back as the label of the house on the original image;
S5, training a detection model: training a model on the labelled house image data set using the Yolox target detection algorithm;
S6, setting a model confidence threshold: adjusting the confidence threshold of each category according to the prediction effect of each category of house data on the test set.
Further, in step S1, image data of the houses under construction at different times, angles and focal lengths are collected by the image acquisition device.
Further, in step S2, the LabelImage tool is used to manually label the positions of the houses under construction in the images.
Further, the step S3 of dividing the house under construction image data into categories in an unsupervised contrastive clustering manner comprises the following steps:

For an under-construction house image data sample $x_i$, two data enhancement modes $T^a$ and $T^b$ are applied, and the enhanced images $x_i^a$ and $x_i^b$ are:

$$x_i^a = T^a(x_i), \qquad x_i^b = T^b(x_i)$$

Resnet34 is adopted as the feature extraction network for the enhanced images, and the extracted feature vectors are expressed as:

$$h_i^a = f(x_i^a), \qquad h_i^b = f(x_i^b)$$

where $h_i^a$ and $h_i^b$ denote the feature vectors and $f(\cdot)$ denotes the feature extraction network.

The feature vectors $h_i^a$ and $h_i^b$ are then mapped by a two-layer nonlinear perceptron (MLP) $g_I(\cdot)$:

$$z_i^a = g_I(h_i^a), \qquad z_i^b = g_I(h_i^b)$$

where $z_i^a$ and $z_i^b$ denote the mapped feature vectors.

The instance-level loss $\ell_i^a$ of the sample $x_i^a$ is defined as:

$$\ell_i^a = -\log \frac{\exp\left(s(z_i^a, z_i^b)/\tau_I\right)}{\sum_{j=1}^{N}\left[\exp\left(s(z_i^a, z_j^a)/\tau_I\right) + \exp\left(s(z_i^a, z_j^b)/\tau_I\right)\right]}$$

where $\tau_I$ is the instance-level temperature parameter, $s(\cdot,\cdot)$ denotes the cosine similarity, and $N$ is the set batch size, so that after data enhancement there are $2N$ samples in a batch and each sample forms 1 positive pair and $2N-2$ negative pairs; $z_j^a$ and $z_j^b$ denote the feature vectors of the $j$-th sample in the batch after enhancement by $T^a$ and $T^b$ respectively.

The total instance-level loss $\ell_{ins}$ over the samples is:

$$\ell_{ins} = \frac{1}{2N}\sum_{i=1}^{N}\left(\ell_i^a + \ell_i^b\right)$$

When a data sample is mapped into a space whose dimension equals the number of clusters, each dimension of its features can be regarded as the probability that the sample belongs to that class. The feature vector extracted by the feature extraction network is therefore mapped by a two-layer nonlinear perceptron $g_C(\cdot)$ into an $M$-dimensional vector, where $M$ denotes the number of cluster categories:

$$y_i^a = g_C(h_i^a), \qquad y_i^b = g_C(h_i^b)$$

where $y_i^a$ and $y_i^b$ denote the $M$-dimensional vectors after mapping by the perceptron; the samples in one batch under the $T^a$ and $T^b$ data enhancements form the matrices $Y^a \in \mathbb{R}^{N \times M}$ and $Y^b \in \mathbb{R}^{N \times M}$.

Define $\hat{y}_i^a$ as the $i$-th column of $Y^a$ and, similarly, $\hat{y}_i^b$ as the $i$-th column of $Y^b$; the cluster-level loss $\hat{\ell}_i^a$ of $\hat{y}_i^a$ and $\hat{y}_i^b$ is therefore defined as:

$$\hat{\ell}_i^a = -\log \frac{\exp\left(s(\hat{y}_i^a, \hat{y}_i^b)/\tau_C\right)}{\sum_{j=1}^{M}\left[\exp\left(s(\hat{y}_i^a, \hat{y}_j^a)/\tau_C\right) + \exp\left(s(\hat{y}_i^a, \hat{y}_j^b)/\tau_C\right)\right]}$$

where $\tau_C$ is the cluster-level temperature parameter, $M$ denotes the number of cluster categories, $\hat{y}_j^a$ is the cluster assignment of the $j$-th category under the $T^a$ data enhancement and, similarly, $\hat{y}_j^b$ is the cluster assignment of the $j$-th category under the $T^b$ data enhancement.

The total cluster-level loss $\ell_{clu}$ over the samples is:

$$\ell_{clu} = \frac{1}{2M}\sum_{i=1}^{M}\left(\hat{\ell}_i^a + \hat{\ell}_i^b\right) - H(Y)$$

where $H(Y)$ is the information entropy of the cluster assignment probabilities, included to prevent the network from assigning all instances to a single cluster; it is specifically expressed as:

$$H(Y) = -\sum_{i=1}^{M}\left[P(\hat{y}_i^a)\log P(\hat{y}_i^a) + P(\hat{y}_i^b)\log P(\hat{y}_i^b)\right], \qquad P(\hat{y}_i^k) = \frac{\|\hat{y}_i^k\|_1}{\|Y^k\|_1},\ k \in \{a, b\}$$

where $P(\hat{y}_i^k)$ denotes the probability that a sample is assigned to category $i$ under the $T^k$ data enhancement, $\|\cdot\|_1$ denotes the L1 norm, and $Y^k$ is the output matrix of the batch under the $T^k$ data enhancement.

The final model loss function is therefore $\ell = \ell_{ins} + \ell_{clu}$.

The loss function is minimized by error back-propagation to obtain the clustering model, and the data set is then divided into categories by the clustering model.
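For readers who prefer code, the following is a minimal PyTorch sketch of the two loss terms described above. It illustrates the contrastive clustering losses rather than the patented implementation; the function names, the batch layout and the default temperature values are assumptions made for illustration only.

```python
import torch
import torch.nn.functional as F

def instance_loss(z_a, z_b, tau_i=0.5):
    """Instance-level NT-Xent loss; z_a, z_b are (N, D) projections of the two views."""
    n = z_a.size(0)
    z = F.normalize(torch.cat([z_a, z_b], dim=0), dim=1)       # (2N, D), unit-norm rows
    sim = z @ z.t() / tau_i                                     # cosine similarity / temperature
    mask = torch.eye(2 * n, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(mask, float('-inf'))                  # remove self-similarity
    # positive pairs: row i pairs with row i+N, and row i+N pairs with row i
    pos = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)]).to(z.device)
    return F.cross_entropy(sim, pos)

def cluster_loss(y_a, y_b, tau_c=0.9):
    """Cluster-level contrastive loss over the columns of the (N, M) soft assignments,
    plus an entropy term that discourages assigning every sample to one cluster."""
    m = y_a.size(1)
    # entropy regulariser: maximise the entropy of the cluster-size distribution
    p_a = y_a.sum(dim=0) / y_a.sum()
    p_b = y_b.sum(dim=0) / y_b.sum()
    neg_entropy = (p_a * (p_a + 1e-8).log()).sum() + (p_b * (p_b + 1e-8).log()).sum()
    # contrast the M cluster "prototypes" (matrix columns) across the two views
    c = F.normalize(torch.cat([y_a.t(), y_b.t()], dim=0), dim=1)   # (2M, N)
    sim = c @ c.t() / tau_c
    mask = torch.eye(2 * m, dtype=torch.bool, device=c.device)
    sim = sim.masked_fill(mask, float('-inf'))
    pos = torch.cat([torch.arange(m, 2 * m), torch.arange(0, m)]).to(c.device)
    return F.cross_entropy(sim, pos) + neg_entropy
```

The total training loss is then simply `instance_loss(z_a, z_b) + cluster_loss(y_a, y_b)`, mirroring $\ell = \ell_{ins} + \ell_{clu}$ above.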
Further, each data enhancement mode is randomly selected from random cropping, gray scale transformation, chrominance transformation, Gaussian blur and horizontal flipping.
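As a concrete illustration, such a randomly selected enhancement could be composed with torchvision as in the sketch below; the crop size, jitter strengths and blur kernel are assumptions, not values fixed by the invention.

```python
from torchvision import transforms

# Each call randomly picks one of the five enhancement modes; applying the pipeline
# twice to the same image yields the two enhanced views T^a(x_i) and T^b(x_i).
random_enhance = transforms.Compose([
    transforms.RandomChoice([
        transforms.RandomResizedCrop(224),                           # random cropping
        transforms.Grayscale(num_output_channels=3),                 # gray scale transformation
        transforms.ColorJitter(0.4, 0.4, 0.4, 0.1),                  # chrominance transformation
        transforms.GaussianBlur(kernel_size=23, sigma=(0.1, 2.0)),   # Gaussian blur
        transforms.RandomHorizontalFlip(p=1.0),                      # horizontal flipping
    ]),
    transforms.Resize((224, 224)),   # keep a fixed input size for the backbone
    transforms.ToTensor(),
])
```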
The invention has the following beneficial effects:
aiming at data with complex scenes and a plurality of different states in a building, the invention provides a method for labeling the data in multiple classes in an unsupervised clustering mode, compared with the method for labeling the data in a single class, the data distribution is easier to learn by the multiple classes in the multiple classes mode, so that the convergence fitting is better, and the model is easier to optimize after being divided into a plurality of classes; compared with the condition of manually dividing the building into a plurality of types of labels, the condition of each period of the building under construction is abnormal, complicated and difficult to define along with the conditions of camera shooting angle, house type, house shielding and the like, the potential features among the houses are learned in an unsupervised clustering mode, and the images with high similarity are divided into one type, so that the division of a data set is more accurate, the training and convergence of a model are more facilitated, and the prediction precision of the model on the building under construction is improved; aiming at the condition that the building is divided into a plurality of categories, the prediction precision of each category in the actual scene is different, different threshold values are set for each category according to the prediction condition on the test set, and then the labels of the building are uniformly output, so that the high recall rate can be effectively maintained, and the detection precision can be effectively improved.
Drawings
FIG. 1 is a schematic flow chart of the algorithm provided by the present invention;
FIG. 2 is a schematic diagram of the house data processing module provided by the present invention;
FIG. 3 is a schematic diagram of the contrastive clustering module provided by the present invention;
FIG. 4 is a schematic diagram of the diversity of the house data provided by the present invention;
FIG. 5 is a diagram of the detection effect when manually defined categories are used;
FIG. 6 is a diagram of the detection effect of the unsupervised clustering according to the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example 1
As shown in fig. 1 and fig. 2, the present embodiment provides a house under construction identification method based on unsupervised clustering, including the following steps:
S1, data set production: acquiring image data of houses under construction through an image acquisition device;
In this embodiment, preferably, image data of the houses under construction are acquired by the image acquisition device at each time period, from each angle and at different focal lengths. The image acquisition device is a camera: high-definition cameras controlled by a software platform acquire all the image data of houses under construction within a certain area as the collected data. When the data are collected, data of different scenes and different house types, as shown in fig. 4, are gathered as far as possible so as to establish a complete image data set of houses under construction.
S2, preprocessing of the data set: manually marking the positions of the houses under construction in the collected image data, and then cropping the marked houses to obtain a house image data set;
In this embodiment, preferably, the LabelImage tool is used to manually label the positions of the houses under construction in the images. Only the position information of the house in the image is of concern; the position coordinates of the house are parsed from the labelled Xml file, and the image is cropped accordingly to obtain the cropped image data set of houses under construction.
It should be noted that the collected image data may be labeled with the positions of the houses under construction only, or a category may be labeled at the same time; the category can be specified at random, because once the categories have been divided in the unsupervised contrastive clustering manner, the obtained categories replace the labeled ones.
S3, unsupervised clustering of data: setting the number of cluster categories, and then dividing the house under construction image data obtained in step S2 into categories in an unsupervised contrastive clustering manner;
It should be noted that the names of the categories are not important and the number of cluster categories can be set freely; for example, they can simply be named category 1, category 2, category 3, category 4, category 5 and so on. The main purpose of the category division is to group similar houses into the same category.
S4, label redefinition: after clustering, each house has a corresponding category, and this category is written back as the label of the house on the original image;
S5, training a detection model: training a model on the labelled house image data set using the Yolox target detection algorithm;
S6, setting a model confidence threshold: adjusting the confidence threshold of each category according to the prediction effect of each category of house data on the test set.
In this embodiment, different confidence thresholds are tried on the test set, the prediction effect is observed, and an appropriate threshold is then selected for each category. After the detection model is trained, the label "house under construction" is finally output uniformly for all categories, which ensures that the algorithm has both high detection precision and a high recall rate; fig. 5 shows the recognition effect with manually defined categories, and fig. 6 shows the recognition effect of the present algorithm.
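A minimal sketch of this post-processing step is shown below; the per-category threshold values and the detection-tuple layout are illustrative assumptions, with each threshold chosen per category from the test-set results as described above.

```python
# Per-cluster confidence thresholds chosen on the test set (illustrative values).
CLASS_THRESHOLDS = {0: 0.45, 1: 0.60, 2: 0.50, 3: 0.55, 4: 0.40}

def unify_detections(detections):
    """Keep a detection only if it clears its own category threshold, then
    collapse every cluster category into the single output label."""
    kept = []
    for box, cls_id, score in detections:          # (x1, y1, x2, y2), cluster id, confidence
        if score >= CLASS_THRESHOLDS.get(cls_id, 0.5):
            kept.append((box, "house under construction", score))
    return kept
```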
In summary, for data such as houses under construction, with complex scenes and many different states, a multi-category labelling mode is adopted. Compared with labelling the data with a single category, multiple categories make it easier for the model to learn the data distribution, so convergence and fitting are better, and after division into several categories the model is easier to optimise. Compared with manually dividing the houses into several categories of labels, where the conditions of each construction period are extremely varied and hard to define owing to camera shooting angle, house type, occlusion and so on, learning the latent features among houses through unsupervised clustering and grouping highly similar images into one category makes the division of the data set more accurate, which is more beneficial to the training and convergence of the model and improves the prediction precision of the model for houses under construction. Since the houses are divided into several categories and the prediction precision of each category differs in the actual scene, a different threshold is set for each category according to the prediction results on the test set and the house labels are then output uniformly, which effectively maintains a high recall rate while improving detection precision.
Example 2
As shown in fig. 3, based on embodiment 1, dividing the house under construction image data into categories in step S3 in an unsupervised contrastive clustering manner comprises the following steps:

For an under-construction house image data sample $x_i$, two data enhancement modes $T^a$ and $T^b$ are applied; each enhancement mode is randomly selected from random cropping, gray scale transformation, chrominance transformation, Gaussian blur and horizontal flipping. The enhanced images $x_i^a$ and $x_i^b$ are:

$$x_i^a = T^a(x_i), \qquad x_i^b = T^b(x_i)$$

Resnet34 is adopted as the feature extraction network for the enhanced images, and the extracted feature vectors are expressed as:

$$h_i^a = f(x_i^a), \qquad h_i^b = f(x_i^b)$$

where $h_i^a$ and $h_i^b$ denote the feature vectors and $f(\cdot)$ denotes the feature extraction network.

The feature vectors $h_i^a$ and $h_i^b$ are then mapped by a two-layer nonlinear perceptron (MLP) $g_I(\cdot)$:

$$z_i^a = g_I(h_i^a), \qquad z_i^b = g_I(h_i^b)$$

where $z_i^a$ and $z_i^b$ denote the mapped feature vectors.

The instance-level loss $\ell_i^a$ of the sample $x_i^a$ is defined as:

$$\ell_i^a = -\log \frac{\exp\left(s(z_i^a, z_i^b)/\tau_I\right)}{\sum_{j=1}^{N}\left[\exp\left(s(z_i^a, z_j^a)/\tau_I\right) + \exp\left(s(z_i^a, z_j^b)/\tau_I\right)\right]}$$

where $\tau_I$ is the instance-level temperature parameter (0.5 is taken in the experiment), $s(\cdot,\cdot)$ denotes the cosine similarity, and $N$ is the set batch size, so that after data enhancement there are $2N$ samples in a batch and each sample forms 1 positive pair and $2N-2$ negative pairs; $z_j^a$ and $z_j^b$ denote the feature vectors of the $j$-th sample in the batch after enhancement by $T^a$ and $T^b$ respectively.

The total instance-level loss $\ell_{ins}$ over the samples is:

$$\ell_{ins} = \frac{1}{2N}\sum_{i=1}^{N}\left(\ell_i^a + \ell_i^b\right)$$

When a data sample is mapped into a space whose dimension equals the number of clusters, each dimension of its features can be regarded as the probability that the sample belongs to that class. The feature vector extracted by the feature extraction network is therefore mapped by a two-layer nonlinear perceptron $g_C(\cdot)$ into an $M$-dimensional vector, where $M$ denotes the number of cluster categories:

$$y_i^a = g_C(h_i^a), \qquad y_i^b = g_C(h_i^b)$$

where $y_i^a$ and $y_i^b$ denote the $M$-dimensional vectors after mapping by the perceptron; the samples in one batch under the $T^a$ and $T^b$ data enhancements form the matrices $Y^a \in \mathbb{R}^{N \times M}$ and $Y^b \in \mathbb{R}^{N \times M}$.

Define $\hat{y}_i^a$ as the $i$-th column of $Y^a$ and, similarly, $\hat{y}_i^b$ as the $i$-th column of $Y^b$; the cluster-level loss $\hat{\ell}_i^a$ of $\hat{y}_i^a$ and $\hat{y}_i^b$ is therefore defined as:

$$\hat{\ell}_i^a = -\log \frac{\exp\left(s(\hat{y}_i^a, \hat{y}_i^b)/\tau_C\right)}{\sum_{j=1}^{M}\left[\exp\left(s(\hat{y}_i^a, \hat{y}_j^a)/\tau_C\right) + \exp\left(s(\hat{y}_i^a, \hat{y}_j^b)/\tau_C\right)\right]}$$

where $\tau_C$ is the cluster-level temperature parameter (0.9 is taken in the experiment), $M$ denotes the number of cluster categories, $\hat{y}_j^a$ is the cluster assignment of the $j$-th category under the $T^a$ data enhancement and, similarly, $\hat{y}_j^b$ is the cluster assignment of the $j$-th category under the $T^b$ data enhancement.

The total cluster-level loss $\ell_{clu}$ over the samples is:

$$\ell_{clu} = \frac{1}{2M}\sum_{i=1}^{M}\left(\hat{\ell}_i^a + \hat{\ell}_i^b\right) - H(Y)$$

where $H(Y)$ is the information entropy of the cluster assignment probabilities, included to prevent the network from assigning all instances to a single cluster; it is specifically expressed as:

$$H(Y) = -\sum_{i=1}^{M}\left[P(\hat{y}_i^a)\log P(\hat{y}_i^a) + P(\hat{y}_i^b)\log P(\hat{y}_i^b)\right], \qquad P(\hat{y}_i^k) = \frac{\|\hat{y}_i^k\|_1}{\|Y^k\|_1},\ k \in \{a, b\}$$

where $P(\hat{y}_i^k)$ denotes the probability that a sample is assigned to category $i$ under the $T^k$ data enhancement, $\|\cdot\|_1$ denotes the L1 norm, and $Y^k$ is the output matrix of the batch under the $T^k$ data enhancement.

The final model loss function is therefore $\ell = \ell_{ins} + \ell_{clu}$.

The loss function is minimized by error back-propagation to obtain the clustering model, and the data set is then divided into categories by the clustering model.
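To make the two-head structure of this embodiment concrete, the sketch below wires a Resnet34 backbone to an instance projection head and an M-way cluster head; the hidden width, the projection dimension and M = 5 are assumptions for illustration, while the temperatures 0.5 and 0.9 follow the values given above, and the losses referenced in the usage comment are the ones sketched earlier.

```python
import torch.nn as nn
from torchvision.models import resnet34

class ContrastiveClusteringModel(nn.Module):
    def __init__(self, num_clusters: int = 5, feat_dim: int = 128):
        super().__init__()
        backbone = resnet34(weights=None)
        backbone.fc = nn.Identity()                   # keep the 512-d pooled features
        self.backbone = backbone
        self.instance_head = nn.Sequential(           # two-layer MLP g_I(.)
            nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, feat_dim))
        self.cluster_head = nn.Sequential(            # two-layer MLP g_C(.) with softmax output
            nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, num_clusters), nn.Softmax(dim=1))

    def forward(self, x_a, x_b):
        h_a, h_b = self.backbone(x_a), self.backbone(x_b)
        return (self.instance_head(h_a), self.instance_head(h_b),
                self.cluster_head(h_a), self.cluster_head(h_b))

# One optimisation step with the losses sketched earlier (tau_I = 0.5, tau_C = 0.9):
# z_a, z_b, y_a, y_b = model(view_a, view_b)
# loss = instance_loss(z_a, z_b, tau_i=0.5) + cluster_loss(y_a, y_b, tau_c=0.9)
# loss.backward(); optimizer.step()
```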

Claims (5)

1. A method for identifying houses under construction based on unsupervised clustering, characterized by comprising the following steps:
S1, data set production: acquiring image data of houses under construction through an image acquisition device;
S2, preprocessing of the data set: manually marking the positions of the houses under construction in the collected image data, and then cropping the marked houses to obtain a house image data set;
S3, unsupervised clustering of data: setting the number of cluster categories, and then dividing the house under construction image data obtained in step S2 into categories in an unsupervised contrastive clustering manner;
S4, label redefinition: after clustering, each house has a corresponding category, and this category is written back as the label of the house on the original image;
S5, training a detection model: training a model on the labelled house image data set using the Yolox target detection algorithm;
S6, setting a model confidence threshold: adjusting the confidence threshold of each category according to the prediction effect of each category of house data on the test set.
2. The method according to claim 1, wherein in step S1, image data of the houses under construction at different times, angles and focal lengths are collected by the image acquisition device.
3. The method for identifying houses under construction based on unsupervised clustering according to claim 1, wherein in step S2, the positions of the houses under construction in the images are manually labeled using the LabelImage tool.
4. The method for identifying houses under construction based on unsupervised clustering according to claim 1, wherein the step S3 of dividing the house under construction image data into categories in an unsupervised contrastive clustering manner comprises the following steps:

for an under-construction house image data sample $x_i$, two data enhancement modes $T^a$ and $T^b$ are applied, and the enhanced images $x_i^a$ and $x_i^b$ are:

$$x_i^a = T^a(x_i), \qquad x_i^b = T^b(x_i)$$

Resnet34 is adopted as the feature extraction network for the enhanced images, and the extracted feature vectors are expressed as:

$$h_i^a = f(x_i^a), \qquad h_i^b = f(x_i^b)$$

where $h_i^a$ and $h_i^b$ denote the feature vectors and $f(\cdot)$ denotes the feature extraction network;

the feature vectors $h_i^a$ and $h_i^b$ are then mapped by a two-layer nonlinear perceptron (MLP) $g_I(\cdot)$:

$$z_i^a = g_I(h_i^a), \qquad z_i^b = g_I(h_i^b)$$

where $z_i^a$ and $z_i^b$ denote the mapped feature vectors;

the instance-level loss $\ell_i^a$ of the sample $x_i^a$ is defined as:

$$\ell_i^a = -\log \frac{\exp\left(s(z_i^a, z_i^b)/\tau_I\right)}{\sum_{j=1}^{N}\left[\exp\left(s(z_i^a, z_j^a)/\tau_I\right) + \exp\left(s(z_i^a, z_j^b)/\tau_I\right)\right]}$$

where $\tau_I$ is the instance-level temperature parameter, $s(\cdot,\cdot)$ denotes the cosine similarity, and $N$ is the set batch size, so that after data enhancement there are $2N$ samples in a batch and each sample forms 1 positive pair and $2N-2$ negative pairs; $z_j^a$ and $z_j^b$ denote the feature vectors of the $j$-th sample in the batch after enhancement by $T^a$ and $T^b$ respectively;

the total instance-level loss $\ell_{ins}$ over the samples is:

$$\ell_{ins} = \frac{1}{2N}\sum_{i=1}^{N}\left(\ell_i^a + \ell_i^b\right)$$

when a data sample is mapped into a space whose dimension equals the number of clusters, each dimension of its features can be regarded as the probability that the sample belongs to that class; the feature vector extracted by the feature extraction network is therefore mapped by a two-layer nonlinear perceptron $g_C(\cdot)$ into an $M$-dimensional vector, where $M$ denotes the number of cluster categories:

$$y_i^a = g_C(h_i^a), \qquad y_i^b = g_C(h_i^b)$$

where $y_i^a$ and $y_i^b$ denote the $M$-dimensional vectors after mapping by the perceptron; the samples in one batch under the $T^a$ and $T^b$ data enhancements form the matrices $Y^a \in \mathbb{R}^{N \times M}$ and $Y^b \in \mathbb{R}^{N \times M}$;

define $\hat{y}_i^a$ as the $i$-th column of $Y^a$ and, similarly, $\hat{y}_i^b$ as the $i$-th column of $Y^b$; the cluster-level loss $\hat{\ell}_i^a$ of $\hat{y}_i^a$ and $\hat{y}_i^b$ is therefore defined as:

$$\hat{\ell}_i^a = -\log \frac{\exp\left(s(\hat{y}_i^a, \hat{y}_i^b)/\tau_C\right)}{\sum_{j=1}^{M}\left[\exp\left(s(\hat{y}_i^a, \hat{y}_j^a)/\tau_C\right) + \exp\left(s(\hat{y}_i^a, \hat{y}_j^b)/\tau_C\right)\right]}$$

where $\tau_C$ is the cluster-level temperature parameter, $M$ denotes the number of cluster categories, $\hat{y}_j^a$ is the cluster assignment of the $j$-th category under the $T^a$ data enhancement and, similarly, $\hat{y}_j^b$ is the cluster assignment of the $j$-th category under the $T^b$ data enhancement;

the total cluster-level loss $\ell_{clu}$ over the samples is:

$$\ell_{clu} = \frac{1}{2M}\sum_{i=1}^{M}\left(\hat{\ell}_i^a + \hat{\ell}_i^b\right) - H(Y)$$

where $H(Y)$ is the information entropy of the cluster assignment probabilities, included to prevent the network from assigning all instances to a single cluster, and is specifically expressed as:

$$H(Y) = -\sum_{i=1}^{M}\left[P(\hat{y}_i^a)\log P(\hat{y}_i^a) + P(\hat{y}_i^b)\log P(\hat{y}_i^b)\right], \qquad P(\hat{y}_i^k) = \frac{\|\hat{y}_i^k\|_1}{\|Y^k\|_1},\ k \in \{a, b\}$$

where $P(\hat{y}_i^k)$ denotes the probability that a sample is assigned to category $i$ under the $T^k$ data enhancement, $\|\cdot\|_1$ denotes the L1 norm, and $Y^k$ is the output matrix of the batch under the $T^k$ data enhancement;

the final model loss function is therefore $\ell = \ell_{ins} + \ell_{clu}$;

the loss function is minimized by error back-propagation to obtain the clustering model, and the data set is then divided into categories by the clustering model.
5. The method of claim 4, wherein each data enhancement mode is randomly selected from random cropping, gray scale transformation, chrominance transformation, Gaussian blur and horizontal flipping.
CN202210063160.5A 2022-01-19 2022-01-19 Method for identifying houses under construction based on unsupervised clustering Pending CN114581769A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210063160.5A CN114581769A (en) 2022-01-19 2022-01-19 Method for identifying houses under construction based on unsupervised clustering

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210063160.5A CN114581769A (en) 2022-01-19 2022-01-19 Method for identifying houses under construction based on unsupervised clustering

Publications (1)

Publication Number Publication Date
CN114581769A true CN114581769A (en) 2022-06-03

Family

ID=81770825

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210063160.5A Pending CN114581769A (en) 2022-01-19 2022-01-19 Method for identifying houses under construction based on unsupervised clustering

Country Status (1)

Country Link
CN (1) CN114581769A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114898215A (en) * 2022-06-09 2022-08-12 西南交通大学 Automatic arrangement method of sound barrier


Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110232404A (en) * 2019-05-21 2019-09-13 江苏理工学院 A kind of recognition methods of industrial products surface blemish and device based on machine learning
CN110348364A (en) * 2019-07-05 2019-10-18 北京工业大学 A kind of basketball video group behavior recognition methods that Unsupervised clustering is combined with time-space domain depth network
WO2022001489A1 (en) * 2020-06-28 2022-01-06 北京交通大学 Unsupervised domain adaptation target re-identification method
CN112396035A (en) * 2020-12-07 2021-02-23 国网电子商务有限公司 Object detection method and device based on attention detection model
CN113312970A (en) * 2021-04-26 2021-08-27 上海工物高技术产业发展有限公司 Target object identification method, target object identification device, computer equipment and storage medium
CN113420619A (en) * 2021-06-07 2021-09-21 核工业北京地质研究院 Remote sensing image building extraction method
CN113903058A (en) * 2021-11-19 2022-01-07 上海玉贲智能科技有限公司 Intelligent control system based on regional personnel identification

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
YUNFAN LI et al.: "Contrastive Clustering"
YUNFAN LI et al.: "Contrastive Clustering", arXiv, vol. 35, no. 10, pages 8549-8551 *
DENG Rui et al.: "基于深度学习的建筑" (Building ... based on deep learning), vol. 36, no. 4, page 20 *



Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination