CN116012296B

CN116012296B - Prefabricated part detection method based on super-resolution and semi-supervised learning

Info

Publication number: CN116012296B
Application number: CN202211532025.7A
Authority: CN
Inventors: 万华平; 张文杰; 胡鹏华; 葛荟斌
Original assignee: Zhejiang University ZJU
Current assignee: Zhejiang University ZJU
Priority date: 2022-12-01
Filing date: 2022-12-01
Publication date: 2023-10-24
Anticipated expiration: 2042-12-01
Also published as: CN116012296A

Abstract

A prefabricated part detection method based on super-resolution and semi-supervised learning uses a super-resolution algorithm to improve the quality of prefabricated component pictures and a semi-supervised learning algorithm to reduce the high cost of data labeling. The implementation steps are as follows: (1) building and training the super-resolution network Real-ESRGAN; (2) collecting pictures of prefabricated components in various construction environments and inputting them into the generation model of the Real-ESRGAN network to improve picture quality; (3) introducing the semi-supervised learning algorithm mean-teacher to train the target detector Yolov5; (4) detecting field data collected in real time with the trained Yolov5 model, and locating and classifying the prefabricated components on the construction site. The invention improves image quality, achieves excellent detection performance with limited labeled data, and provides technical support for the management of prefabricated construction sites.

Description

Prefabricated part detection method based on super-resolution and semi-supervised learning

Technical Field

The invention relates to a method for detecting prefabricated components of prefabricated buildings, in particular to a technique for detecting prefabricated components on construction sites based on super-resolution and semi-supervised learning algorithms, and belongs to the field of structural engineering.

Background

With the development of the prefabricated building industry in China, the demand for prefabricated components has increased rapidly, and large numbers of prefabricated components of different types are stacked on construction sites. Real-time detection of these components to guide construction is therefore of considerable research significance. Existing detection of prefabricated components relies mainly on manual inspection, which is time-consuming and labor-intensive and cannot meet engineering requirements.

In recent years, computer vision technology has enabled automated remote dynamic monitoring of construction sites and improved construction management. With growing computing power, deep-learning-based target detection has developed rapidly. Many object detection models (such as Yolo and Fast R-CNN) have been widely used to detect prefabricated building components because of their high precision and non-contact nature.

However, training a deep-learning-based object detection model requires large-scale, high-quality, well-labeled datasets, and existing datasets still have the following problems: (1) the resolution of prefabricated component pictures is too low and the detection targets are small, so they carry little information and have weak feature expression, which leads to poor detection performance of the deep learning model; (2) the pictures must be labeled by professionals with the relevant knowledge, which is costly and prone to wrong or missed labels. A new technique is therefore needed to increase image resolution and overcome the limitations of manual labeling.

Disclosure of Invention

In view of the shortcomings of the prior art, the invention provides a prefabricated part detection method based on super-resolution and semi-supervised learning, so as to improve the feasibility of vision-based detection of precast concrete components in practical applications. The specific contents are as follows:

The prefabricated part detection method based on super-resolution and semi-supervised learning comprises the following steps:

A. training a super-resolution network Real-ESRGAN;

A1. collecting prefabricated component pictures and preprocessing the data to obtain the training dataset of the super-resolution network Real-ESRGAN;

A2. building the Real-ESRGAN network, which consists of a generation model G and a discrimination model D. The generation model G generates a super-resolution picture from the input low-resolution picture, and the discrimination model D judges whether a picture is a super-resolution picture generated by G or an original high-resolution picture. The quality of the generated pictures improves through the continual adversarial game between G and D;

A3. training the Real-ESRGAN network: first, the parameters of G are fixed and D is trained to accurately distinguish real pictures from generated ones; then the parameters of D are fixed and G is trained to generate super-resolution pictures that can confuse D. These two steps are repeated to finally obtain the required generation model;

B. collecting pictures of prefabricated components in various construction environments and inputting them into the generation model of the Real-ESRGAN network, which enlarges the resolution to twice the original and improves picture quality;

C. training a target detection model Yolov5 by using a semi-supervised learning algorithm;

C1. dividing the quality-improved pictures into a training set and a test set, and labeling 50% of the pictures to obtain labeled pictures $x_l$ with labels $y_l$ and unlabeled pictures $x_u$; adding random noise to the original dataset $(x_l, x_u)$ twice, separately, to form two training datasets, denoted $(x'_l, x'_u)$ and $(x''_l, x''_u)$;

C2. constructing the semi-supervised learning framework mean-teacher, which consists of two networks with the same structure, a student network and a teacher network;

C3. using the Yolov5 model as the target detector of both the student network and the teacher network, inputting the noise-added data $(x'_l, x'_u)$ and $(x''_l, x''_u)$ into the student network and the teacher network, respectively, to obtain their outputs; calculating the loss value according to the consistency regularization criterion and iteratively updating the parameters of the student network and the teacher network; finally, verifying the performance of the trained student network on the test dataset;

D. detecting image or video data acquired at the construction site with the trained Yolov5 model, and locating and classifying the prefabricated components on the construction site.
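The alternating schedule of step A3 can be sketched in a few lines of Python. This is only an illustration of the control flow, not the patent's training code: `alternate_training`, `train_d_step` and `train_g_step` are hypothetical names, and the two step functions are stubs standing in for real discriminator and generator updates.

```python
from typing import Callable, List


def alternate_training(
    rounds: int,
    train_d_step: Callable[[], float],
    train_g_step: Callable[[], float],
) -> List[str]:
    """Alternate discriminator and generator updates, as in step A3.

    train_d_step: hypothetical callable that updates D while G is frozen.
    train_g_step: hypothetical callable that updates G while D is frozen.
    Returns the order in which the two phases ran.
    """
    schedule = []
    for _ in range(rounds):
        train_d_step()  # G fixed: D learns to tell real from generated pictures
        schedule.append("D")
        train_g_step()  # D fixed: G learns to produce pictures that confuse D
        schedule.append("G")
    return schedule


# Stub steps so the sketch runs without a deep-learning framework.
calls = {"D": 0, "G": 0}


def fake_d_step() -> float:
    calls["D"] += 1
    return 0.0


def fake_g_step() -> float:
    calls["G"] += 1
    return 0.0


order = alternate_training(3, fake_d_step, fake_g_step)
```

In a real setup each stub would run one or more optimizer steps on a batch; the sketch fixes one D update followed by one G update per round, which is an assumption rather than something the text specifies.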

Further, the data preprocessing in step A1 comprises adjusting the pictures to a uniform size. To avoid distortion, the long side of each picture is scaled to 640 pixels and the short side is scaled by the same ratio, preserving the original aspect ratio; the resulting short-side length is denoted s, so the size of the original high-resolution picture is (640, s). The high-resolution picture is then scaled to (320, s/2) to obtain the low-resolution picture used as input to the generation model. The high-resolution and low-resolution pictures together form the training dataset of Real-ESRGAN.
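As a worked example of the size arithmetic in step A1, the following minimal Python sketch computes the high-resolution size (640, s) and the low-resolution size (320, s/2) for a given picture. The function name `training_pair_sizes` is hypothetical, and rounding s to the nearest integer and halving with integer division are assumptions, since the text does not say how fractional sizes are handled.

```python
from typing import Tuple


def training_pair_sizes(
    width: int, height: int, long_side: int = 640
) -> Tuple[Tuple[int, int], Tuple[int, int]]:
    """Return ((640, s), (320, s // 2)) in (long side, short side) order."""
    long_px, short_px = max(width, height), min(width, height)
    # Scale the short side by the same ratio as the long side (assumed rounding).
    s = round(short_px * long_side / long_px)
    high_res = (long_side, s)
    low_res = (long_side // 2, s // 2)  # half of each side (assumed floor)
    return high_res, low_res


hr_size, lr_size = training_pair_sizes(1920, 1080)
# hr_size is (640, 360) and lr_size is (320, 180)
```

The same sizes result for a portrait picture of 1080 x 1920 pixels, because only the long and short sides matter in the (640, s) notation.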

Further, the training loss function of step A3 comprises the L1 distance loss $L_1$, the adversarial loss $L_G$ and the perceptual loss $L_{percep}$, which can be expressed as

$$L_{percep} = \|\phi(z) - \phi(G(z))\|_1 \tag{3}$$

where $z$, $y$ and $x$ denote the input low-resolution picture, the generated super-resolution picture and the high-resolution picture, respectively; $\|\cdot\|_1$, $\mathbb{E}$ and $\phi(\cdot)$ denote the L1 norm, the expectation, and the VGG-based function used in the perceptual loss, respectively. The final loss value is obtained from

$$L = \eta L_1 + \lambda L_G + L_{percep} \tag{4}$$

where $\eta$ and $\lambda$ are weight coefficients, taken as 1 and 0.1, respectively.
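The weighted sum in equation (4) can be illustrated with a small Python sketch. The helper names `l1_norm` and `total_generator_loss` are hypothetical, the inputs are plain lists of floats standing in for pictures or feature maps, and the individual loss values in the example are made-up numbers, not results from the patent.

```python
from typing import Sequence

ETA = 1.0     # weight of the L1 distance loss, as stated in the text
LAMBDA = 0.1  # weight of the adversarial loss, as stated in the text


def l1_norm(a: Sequence[float], b: Sequence[float]) -> float:
    """L1 norm of the difference between two equally sized vectors."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return sum(abs(p - q) for p, q in zip(a, b))


def total_generator_loss(
    l1_loss: float, adv_loss: float, percep_loss: float,
    eta: float = ETA, lam: float = LAMBDA,
) -> float:
    """Equation (4): L = eta * L_1 + lambda * L_G + L_percep."""
    return eta * l1_loss + lam * adv_loss + percep_loss


# Toy values: an L1 distance between two short vectors, plus assumed
# adversarial and perceptual loss values.
example_l1 = l1_norm([1.0, 2.0], [0.5, 2.5])               # 0.5 + 0.5
example_total = total_generator_loss(example_l1, 2.0, 0.5)  # 1.0 + 0.2 + 0.5
```

With these weights the adversarial term contributes only one tenth of its raw value, so the pixel-level and perceptual terms dominate the total.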

Further, the mean-teacher network in step C2 consists of a student network and a teacher network; the student network optimizes its parameters by stochastic gradient descent, and the teacher network is updated from the parameters of the student network.
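The division of labor between the two networks can be sketched on a toy problem. In the Python example below, the "networks" are single scalar weights fitting y = 2x, which is purely an illustrative assumption: the student weight is updated by gradient descent on a squared error, and the teacher weight is never trained directly but follows the student through an exponential moving average with a fixed smoothing value of 0.9.

```python
def sgd_step(w: float, x: float, y: float, lr: float) -> float:
    """One gradient-descent step on the squared error (w * x - y) ** 2."""
    grad = 2.0 * (w * x - y) * x
    return w - lr * grad


def ema(teacher_w: float, student_w: float, alpha: float) -> float:
    """Move the teacher weight toward the student weight."""
    return alpha * teacher_w + (1.0 - alpha) * student_w


student_w = 0.0
teacher_w = 0.0
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # toy samples of y = 2x

for epoch in range(200):
    for x, y in data:
        # Student: updated by gradient descent on its own loss.
        student_w = sgd_step(student_w, x, y, lr=0.01)
        # Teacher: updated only from the student's parameters.
        teacher_w = ema(teacher_w, student_w, alpha=0.9)
```

Both weights end up close to 2; the teacher lags the student early in training and then settles on a smoothed version of the student's trajectory.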

Further, the loss function in step C3 comprises the loss function of the student network and the optimization function of the teacher network.

The loss function of the student network comprises the supervised learning loss $L_{sl}$ and the semi-supervised learning loss $L_{ssl}$ based on the consistency regularization criterion, expressed respectively as

$$L_{sl} = L_{OD}[f_s(x'_l), y_l] \tag{5}$$

$$L_{ssl} = L_{OD}[f_s(x'_l, x'_u), f_t(x''_l, x''_u)] \tag{6}$$

where $L_{OD}$ denotes the loss function of the target detector, $f_s$ the student network, and $f_t$ the teacher network. The total loss is $L_T = L_{sl} + L_{ssl}$.

The parameters $\theta'_t$ of the teacher network at training round $t$ are given by

$$\theta'_t = \alpha\,\theta'_{t-1} + (1-\alpha)\,\theta_t \tag{7}$$

where $\theta_t$ denotes the parameters of the student network and $\alpha$ is a smoothing parameter that increases as the number of training rounds increases.
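Equation (7) and the total loss can be sketched in Python as follows. The text states only that the smoothing parameter grows with the training round; the specific ramp used here, min(1 - 1/(step + 1), alpha_max), is an assumption chosen for illustration, as are the names `smoothing_alpha`, `update_teacher` and `total_loss` and the use of a plain dictionary for the parameters.

```python
from typing import Dict


def smoothing_alpha(step: int, alpha_max: float = 0.999) -> float:
    """Assumed ramp: alpha rises from 0 toward alpha_max as training proceeds."""
    return min(1.0 - 1.0 / (step + 1), alpha_max)


def update_teacher(
    teacher: Dict[str, float], student: Dict[str, float], alpha: float
) -> Dict[str, float]:
    """Equation (7): theta'_t = alpha * theta'_{t-1} + (1 - alpha) * theta_t."""
    return {
        name: alpha * teacher[name] + (1.0 - alpha) * student[name]
        for name in teacher
    }


def total_loss(l_sl: float, l_ssl: float) -> float:
    """Total student loss L_T = L_sl + L_ssl."""
    return l_sl + l_ssl


teacher = {"w": 0.0, "b": 0.0}
student = {"w": 1.0, "b": -2.0}
alpha = smoothing_alpha(step=3)                      # 1 - 1/4 = 0.75
teacher = update_teacher(teacher, student, alpha)    # {"w": 0.25, "b": -0.5}
```

A small alpha early on lets the teacher copy the still-changing student quickly, while a large alpha later makes the teacher a slowly varying average; other increasing schedules would satisfy the text equally well.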

Compared with the prior art, the technique has the following advantages:

(1) The detection model trained with the proposed technique has better detection performance, can overcome common challenges in construction-site detection such as occlusion, blur and small targets, and is more feasible for vision-based detection of precast concrete components in practice.

(2) Compared with images enlarged by linear interpolation, the super-resolution pictures of prefabricated components produced by the Real-ESRGAN network are clearer and provide more feature information, which can effectively improve the accuracy of the target detection model.

(3) A model trained with the proposed technique using 50% labeled data achieves detection performance comparable to that of supervised learning using 100% labeled data, which can greatly reduce the high cost of manual labeling.

Drawings

FIG. 1 is a flow chart of the present technique;

FIG. 2 is a diagram of a super resolution network Real-ESRGAN of the present invention;

FIG. 3(a) to FIG. 3(d) are super-resolution pictures according to the present invention, wherein FIG. 3(a) and FIG. 3(c) are original pictures, and FIG. 3(b) and FIG. 3(d) are the corresponding quality-improved super-resolution pictures;

FIG. 4 is a semi-supervised learning mean-teacher framework diagram of the present invention.

Detailed Description

The prefabricated part detection method based on super-resolution and semi-supervised learning is described in further detail below with reference to the accompanying drawings. The implementation flow of the invention is shown in FIG. 1 and specifically comprises the following steps:

A. training a super-resolution network Real-ESRGAN;

A1. collecting prefabricated component pictures (2000 pictures in this example) and preprocessing the data: the pictures are adjusted to a uniform size. To avoid distortion, the long side of each picture is scaled to 640 pixels and the short side is scaled by the same ratio, preserving the original aspect ratio; the resulting short-side length is denoted s, so the size of the original high-resolution picture is (640, s). The high-resolution picture is then scaled to (320, s/2) to obtain the low-resolution picture used as input to the generation model. The high-resolution and low-resolution pictures together form the training dataset of Real-ESRGAN.

A2. building the Real-ESRGAN network, which consists of a generation model G and a discrimination model D, as shown in FIG. 2;

A3. training the Real-ESRGAN network with the dataset: first, the parameters of G are fixed and D is trained to accurately distinguish real pictures from generated ones; then the parameters of D are fixed and G is trained to generate super-resolution pictures that can confuse D. These two steps are repeated to finally obtain the required generation model;

B. collecting prefabricated component pictures in various construction environments and inputting them into the generation model of the Real-ESRGAN network to improve picture quality; original pictures and the corresponding super-resolution pictures are shown in FIG. 3;

C1. labeling the quality-improved super-resolution pictures of prefabricated components and dividing them into a training set of 5000 pictures and a test set of 900 pictures, of which 2500 training pictures and all 900 test pictures are labeled;

C2. constructing the semi-supervised learning framework mean-teacher, which consists of two networks with the same structure, a student network and a teacher network, as shown in FIG. 4;

C3. taking the Yolov5 model as the target detector of the student network and the teacher network, inputting all pictures of the training dataset into the semi-supervised learning framework for training, and verifying the performance of the trained student network on the test dataset;

C4. the test results show that the model trained with the proposed technique performs far better than both the supervised learning model and the supervised learning model combined with super-resolution; the improvement is especially pronounced when the proportion of labeled data is low, which demonstrates the effectiveness of the proposed technique;

D. using the trained Yolov5 model to process image or video data acquired on the construction site and to locate and classify the prefabricated components on site; the detection results show that the technique is feasible for vision-based detection of precast concrete components.
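The data preparation of step C1 in this embodiment (5000 training pictures, half of them labeled, each fed to the student and teacher networks with different random noise) can be sketched in Python. The function names `split_labeled` and `noisy_view` are hypothetical, pictures are represented as short lists of pixel values in [0, 1], and Gaussian noise with clamping is an assumption, since the text specifies only "random noise".

```python
import random
from typing import List, Sequence, Tuple


def split_labeled(
    indices: Sequence[int], labeled_fraction: float, seed: int = 0
) -> Tuple[List[int], List[int]]:
    """Randomly choose which training pictures receive labels."""
    rng = random.Random(seed)
    shuffled = list(indices)
    rng.shuffle(shuffled)
    n_labeled = int(len(shuffled) * labeled_fraction)
    return shuffled[:n_labeled], shuffled[n_labeled:]


def noisy_view(
    pixels: Sequence[float], sigma: float, rng: random.Random
) -> List[float]:
    """Add Gaussian noise (an assumed noise model) and clamp to [0, 1]."""
    return [min(1.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in pixels]


# 5000 training pictures, 50% of them labeled, as in the embodiment.
labeled, unlabeled = split_labeled(range(5000), labeled_fraction=0.5)

# Two independently perturbed views of the same picture:
# one for the student network, one for the teacher network.
image = [0.2, 0.5, 0.8]
student_view = noisy_view(image, 0.05, random.Random(1))
teacher_view = noisy_view(image, 0.05, random.Random(2))
```

Because the two views share the same underlying picture but differ in noise, any disagreement between the student and teacher outputs can be penalized by the consistency loss.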

The embodiments described in this specification merely enumerate possible implementations of the inventive concept. The scope of protection of the present invention should not be construed as limited to the specific forms set forth in the embodiments; it also extends to equivalent technical solutions that those skilled in the art can conceive according to the inventive concept.

Claims

1. A prefabricated part detection method based on super-resolution and semi-supervised learning, characterized by comprising the following steps:

A. training a super-resolution network Real-ESRGAN;

A2. building a Real-ESRGAN network, wherein the network consists of a generation model G and a discrimination model D; the generation model G generates a super-resolution picture from the input low-resolution picture, and the discrimination model D judges whether a picture is a super-resolution picture generated by the generation model or an original high-resolution picture; the quality of the generated pictures improves through the continual adversarial game between G and D;

A3. training Real-ESRGAN network: firstly, fixing parameters of G, and training D to accurately distinguish real images from generated images; then, fixing the parameters of D, and training G to generate a super-resolution picture capable of confusing D; repeating the above processes to finally obtain a required generation model;

C1. dividing the quality-improved pictures into a training set and a test set, and labeling 50% of the pictures to obtain labeled pictures $x_l$ with labels $y_l$ and unlabeled pictures $x_u$; adding random noise to the original dataset $(x_l, x_u)$ twice, separately, to form two training datasets, denoted $(x'_l, x'_u)$ and $(x''_l, x''_u)$;

C3. using the Yolov5 model as the target detector of both the student network and the teacher network, inputting the noise-added data $(x'_l, x'_u)$ and $(x''_l, x''_u)$ into the student network and the teacher network, respectively, to obtain their outputs; calculating a loss value according to the consistency regularization criterion, and iteratively updating the parameters of the student network and the teacher network; finally, verifying the performance of the trained student network on the test dataset;

2. The prefabricated part detection method based on super-resolution and semi-supervised learning according to claim 1, characterized in that the data preprocessing in step A1 comprises: adjusting the pictures to a uniform size, wherein, to avoid distortion, the long side of each picture is scaled to 640 pixels and the short side is scaled by the same ratio, preserving the original aspect ratio; the resulting short-side length is denoted s, so the size of the original high-resolution picture is (640, s); the high-resolution picture is scaled to (320, s/2) to obtain the low-resolution picture used as input to the generation model; the high-resolution and low-resolution pictures together form the training dataset of Real-ESRGAN.

3. The prefabricated part detection method based on super-resolution and semi-supervised learning according to claim 1, characterized in that the training loss function of step A3 comprises the L1 distance loss $L_1$, the adversarial loss $L_G$ and the perceptual loss $L_{percep}$, which can be expressed as

$$L_{percep} = \|\phi(z) - \phi(G(z))\|_1 \tag{3}$$

$$L = \eta L_1 + \lambda L_G + L_{percep} \tag{4}$$

4. The prefabricated part detection method based on super-resolution and semi-supervised learning according to claim 1, characterized in that the mean-teacher network in step C2 consists of a student network and a teacher network, wherein the student network optimizes its parameters by stochastic gradient descent, and the teacher network is updated from the parameters of the student network.

5. The prefabricated part detection method based on super-resolution and semi-supervised learning according to claim 1, characterized in that the training loss function in step C3 comprises a student network loss function and a teacher network optimization function:

the student network loss function comprises the supervised learning loss $L_{sl}$ and the semi-supervised learning loss $L_{ssl}$ based on the consistency regularization criterion, expressed respectively as

$$L_{sl} = L_{OD}[f_s(x'_l), y_l] \tag{5}$$

$$L_{ssl} = L_{OD}[f_s(x'_l, x'_u), f_t(x''_l, x''_u)] \tag{6}$$

where $L_{OD}$ denotes the loss function of the target detector, $f_s$ the student network, and $f_t$ the teacher network;

the total loss is $L_T = L_{sl} + L_{ssl}$;

$$\theta'_t = \alpha\,\theta'_{t-1} + (1-\alpha)\,\theta_t \tag{7}$$