CN114821154A - Grain depot ventilation window state detection algorithm based on deep learning - Google Patents


Info

Publication number
CN114821154A
Authority
CN
China
Prior art keywords
ventilation window
network
grain depot
state detection
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210317338.4A
Other languages
Chinese (zh)
Inventor
Jin Xinyu (金心宇)
Shang Keke (尚珂珂)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CN202210317338.4A priority Critical patent/CN114821154A/en
Publication of CN114821154A publication Critical patent/CN114821154A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G06N 3/048 Activation functions
    • G06N 3/08 Learning methods
    • G06N 3/084 Backpropagation, e.g. using gradient descent


Abstract

The invention discloses a grain depot ventilation window state detection algorithm based on deep learning. A ventilation window scene picture of the grain depot is acquired by image or video acquisition equipment; in the upper computer, the picture is input into a grain depot ventilation window state detection network for ventilation window localization and state classification. The detection network takes the YOLOv5s network as its base network and comprises a backbone network, a feature fusion network, and a prediction head connected in sequence; a squeeze-and-excitation (SE) module is added into the C3 module of the backbone network to form an SE-C3 module, and in the prediction head the original coupled head is replaced with a decoupled head. The invention uses this improved YOLOv5s-based algorithm to judge the open and closed states of ventilation windows in the grain depot environment, and maintains high detection accuracy when the picture background, shooting angle, observation distance, illumination conditions, and the like change.

Description

Grain depot ventilation window state detection algorithm based on deep learning
Technical Field
The invention relates to the technical field of target recognition neural networks, in particular to a grain depot ventilation window state detection algorithm based on deep learning.
Background
Grain is not only an essential basic material in daily life but also an important national strategic reserve, and grain depots play an irreplaceable role in national and local grain storage and circulation. The main responsibility of a grain depot is to complete all links of grain receiving, storing, transporting, scheduling, and allocation, among which safe storage is paramount. For a closed grain depot, reasonable and effective ventilation is an important means of preventing grain deterioration. At present, the most widely applied ventilation technology at home and abroad is still mechanical ventilation, and controlling the opening and closing of ventilation windows according to weather conditions is a key means of grain depot ventilation.
At present, confirming whether the ventilation windows are in the correct state in most grain depots requires patrol inspection by workers. Because a depot contains many grain bins, the workload is large and omissions often occur; in particular, ventilation windows that are not tightly closed may not be found in time, which affects the quality of reserves and causes unnecessary economic loss. Although some intelligent ventilation systems that monitor ventilation conditions through computer systems and electrical signal transmission have been studied, their application range is limited by retrofit cost and difficulty. Therefore, research on ventilation window state detection methods has very important practical significance.
Although there are many door and window state detection algorithms based on computer vision, shortcomings remain in detecting ventilation windows in the complex environment of a grain depot. The following problems mainly remain to be solved:
(1) There is no related public data set that can be used directly for grain depot ventilation window detection, so a large amount of data acquisition, labeling, and preprocessing work is required.
(2) The grain depot environment is complicated, and the ventilation windows are numerous and densely distributed, which raises the difficulty of detection.
(3) Grain depot ventilation window images and videos are collected by unmanned aerial vehicles and fixed cameras; in complicated weather such as fog or rain, the collected images and videos suffer large noise interference, which reduces detection accuracy.
(4) Most current door and window state detection adopts traditional image processing: grayscale and Gaussian blur are applied to frames read from a video stream, a flood-fill operation is performed on a judgment area, and the open or closed state is judged by comparing the number of pixels returned by the flood fill against a set threshold. However, this method only suits simple scenes, detecting a single door or window with a fixed camera; its generalization and robustness are insufficient, and when the observation angle, observation distance, or ambient illumination changes, its performance can hardly meet the requirements of a production environment.
Disclosure of Invention
The invention aims to provide a grain depot ventilation window state detection algorithm based on deep learning that judges the opening and closing states of ventilation windows in a grain depot environment.
In order to solve this technical problem, the invention provides a grain depot ventilation window state detection algorithm based on deep learning, comprising the following steps: a ventilation window scene picture of the grain depot is acquired by image or video acquisition equipment; in an upper computer, the picture is input into the grain depot ventilation window state detection network for ventilation window localization and state classification, yielding a detection result image with the predicted box position of each ventilation window, its state category, and a confidence score;
the grain depot ventilation window state detection network takes the YOLOv5s network as its base network and comprises a backbone network, a feature fusion network, and a prediction head connected in sequence; a squeeze-and-excitation (SE) module is added into the C3 module of the backbone network to form an SE-C3 module, and in the prediction head the original coupled head is replaced with a decoupled head.
As an improvement of the grain depot ventilation window state detection algorithm based on deep learning of the invention:
in the SE-C3 module, the bottleneck in the C3 module of the YOLOv5s base network is modified into an SE-bottleneck. The input of the SE-bottleneck is a feature map X from the preceding layer; X first passes through two convolution layers to form feature map X1; X1 is processed by the squeeze-and-excitation SE module to obtain a vector t1 of dimension (number of channels) × 1; t1 is multiplied by X1 to obtain feature map X2; finally, the input feature map X and X2 are added to obtain the output of the SE-bottleneck module;
the squeeze-and-excitation SE module comprises a global pooling layer, two fully connected layers, and a sigmoid activation function.
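The recalibration performed by this SE module can be sketched in framework-agnostic NumPy. This is an illustrative sketch only: the function name, the reduction ratio r, and the random stand-in weights w1 and w2 (learned parameters in the real network) are all assumptions, not the patent's implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def se_reweight(x, w1, w2):
    """Squeeze-and-excitation recalibration: global-average-pool each
    channel of x (shape n, c, h, w), pass the pooled vector through two
    fully connected layers ending in a sigmoid, and rescale each channel
    of x by the resulting weight, as the SE module above describes."""
    t = x.mean(axis=(2, 3))          # squeeze: global pooling -> (n, c)
    t = np.maximum(t @ w1, 0.0)      # first FC layer + ReLU -> (n, c // r)
    t = sigmoid(t @ w2)              # second FC layer + sigmoid -> (n, c)
    return x * t[:, :, None, None]   # excite: channel-wise rescaling

rng = np.random.default_rng(0)
c, r = 8, 4                          # channels and an assumed reduction ratio
x = rng.standard_normal((2, c, 5, 5))
w1 = rng.standard_normal((c, c // r))
w2 = rng.standard_normal((c // r, c))
x2 = se_reweight(x, w1, w2)          # same shape as x, channels reweighted
```

Because the sigmoid output lies in (0, 1), each channel of the result is a damped copy of the input channel; the learned weights decide which channels are emphasized and which are suppressed.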
The grain depot ventilation window state detection algorithm based on deep learning is further improved as follows:
the input of the decoupled head is the feature map output by the feature fusion network. The feature map first passes through a 1×1 convolution layer and then enters a classification branch and a localization branch: the classification branch outputs the state category of the ventilation window through two 1×1 convolution layers and a sigmoid activation function; the localization branch passes through one 1×1 convolution layer and then splits into two sub-branches, one outputting the confidence through a 1×1 convolution layer and a sigmoid activation function, the other outputting the predicted box position of the ventilation window through a 1×1 convolution layer.
The grain depot ventilation window state detection algorithm based on deep learning is further improved as follows:
the training and testing process of the grain depot ventilation window state detection network is as follows: the training set is used as the input for training, a loss function is taken as the optimization target, and network parameters are updated by the backpropagation algorithm; the parameters are optimized with the stochastic gradient descent (SGD) algorithm in a training mode of warmup followed by cosine annealing; validation is run on the validation set after each epoch, 100 epochs are iterated in total, and a model file with an mAP value greater than 0.8 on the validation set is selected as the trained .pt model file. The test set is then input into the trained network, and the output is compared with the test set labels to confirm that the preset target is reached, yielding the online grain depot ventilation window state detection network.
The grain depot ventilation window state detection algorithm based on deep learning is further improved as follows:
the process of dividing the training and test sets is as follows: image data of ventilation windows in different open and closed states are gathered in the grain depot under different angles, weather, and illumination; images covering different combinations of angle, weather, and illumination are screened out and subjected to data enhancement operations including contrast enhancement, inversion, mirroring, and cropping; the ventilation window positions are then annotated in the enhanced images, and txt label files containing the target position and state category in each image are generated; the annotated images and their corresponding txt label files are divided into training, validation, and test sets at a ratio of 7:1:2.
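A minimal sketch of such a 7:1:2 split; the file names, the shuffling seed, and the assumption that each image has a same-stem txt label file are illustrative, since the text does not specify the tooling:

```python
import random

def split_dataset(image_names, seed=0):
    """Shuffle the annotated images and split them 7:1:2 into training,
    validation, and test sets, as described above. Each image name is
    assumed to have a matching txt label file with the same stem."""
    names = list(image_names)
    random.Random(seed).shuffle(names)
    n_train = int(len(names) * 0.7)
    n_val = int(len(names) * 0.1)
    train = names[:n_train]
    val = names[n_train:n_train + n_val]
    test = names[n_train + n_val:]
    return train, val, test

train, val, test = split_dataset([f"img_{i:04d}.jpg" for i in range(100)])
```

Shuffling before splitting matters here: images shot in the same session share background and illumination, and a sequential split would leak those conditions unevenly across the three sets.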
The grain depot ventilation window state detection algorithm based on deep learning is further improved as follows:
the state categories are divided into four types: open, closed, half-open, and not tightly closed.
The grain depot ventilation window state detection algorithm based on deep learning is further improved as follows:
the loss function comprises a classification loss, a position loss, and a confidence loss; the classification loss and confidence loss are consistent with those of the YOLOv5s network, and the position loss is the CIoU loss, with the formula:
$$L_{CIoU} = 1 - IoU + \frac{d^2}{c^2} + \alpha v$$
wherein d is the Euclidean distance between the two box center points, c is the diagonal length of the smallest enclosing box, and IoU (Intersection over Union) is the ratio of the overlap area of the predicted box and the ground-truth box to their total (union) area; α is an adjustment parameter and v measures aspect-ratio consistency, computed as follows:
$$\alpha = \frac{v}{(1 - IoU) + v}$$
$$v = \frac{4}{\pi^2}\left(\arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h}\right)^2$$
where w and h denote the width and height of the predicted box, and w^{gt} and h^{gt} those of the ground-truth box in the aspect-ratio term.
The invention has the following beneficial effects:
1. The open and closed states of ventilation windows in the grain depot environment are judged using an improved YOLOv5s-based algorithm, and high detection accuracy is still maintained when the picture background, shooting angle, observation distance, illumination conditions, and the like change;
2. On the basis of the original YOLOv5s network structure, an attention mechanism is introduced: the C3 module in the original network is replaced by the SE-C3 module, which recalibrates the feature map, increasing the weight of valuable features and suppressing the weight of worthless ones, further improving the detection accuracy of the network;
3. The original coupled YOLOv5s prediction head is replaced with a decoupled head. The prediction head in the original YOLOv5s network predicts classification and position information jointly through one channel; this joint prediction can make the two tasks conflict and hurt detection precision. Two branches output the target position and classification predictions separately, improving detection precision.
Drawings
The following describes embodiments of the present invention in further detail with reference to the accompanying drawings.
FIG. 1 is a flow chart of the grain depot ventilation window state detection method of the present invention;
FIG. 2 is a schematic structural diagram of a grain depot ventilation window state detection network of the present invention;
FIG. 3 is a schematic diagram of the structure of the SE-C3 module of FIG. 2;
fig. 4 is a schematic view of a decoupling head structure according to the present invention.
Detailed Description
The invention will be further described with reference to specific examples, but the scope of protection of the invention is not limited thereto:
Embodiment 1. A grain depot ventilation window state detection algorithm based on deep learning, as shown in FIG. 1; the specific process is as follows:
S1: Construction of a data set
Images containing ventilation windows in grain depot scenes are collected under different angles, weather, and illumination environments to guarantee the comprehensiveness and sufficiency of the data set, and the image data are preprocessed.
S101: the method comprises the steps of using an unmanned aerial vehicle and a fixed camera, shooting and collecting the ventilation windows in different switch states in the grain depot by adopting different angles under various weather and illumination environments, and obtaining image data containing ventilation window targets.
S102: screening the image data obtained in the step S101, and selecting the images of the grain depot ventilation windows combined under different conditions such as different angles, weather conditions, illumination conditions and the like as far as possible;
s103: performing data enhancement operations such as contrast enhancement, inversion, mirroring, cutting and the like on the image screened in the step S102 to expand the number of the data sets and improve the diversity of the data sets;
s104: and processing the image subjected to the data enhancement operation in the step S103 by using LabelImg, framing out the detection target, namely marking out the position of the ventilation window in the image by using a rectangular frame, and generating a txt label file containing the position and the state type of the target in the corresponding picture. According to the requirements of daily ventilation operation of grain depot operation management under different seasons and weather conditions, the ventilation window of the grain depot mainly has three states: fully open, closed, half open. In addition, in a real operation scene of the grain depot, the condition that the ventilation window is not closed tightly according to requirements but is judged as closed by mistake is frequently generated, and a non-close state is newly added, so that the state categories of the detection target in the data set are divided into four types of opening, closing, half-opening and non-close;
s105: and (4) according to the image marked in the step S104 and the txt label file corresponding to the image and the txt label file, the image is marked in the step S104 according to the following steps of 7: 1: the scale of 2 is divided into a training set, a validation set, and a test set.
S2: constructing grain depot ventilation window state detection network
An extrusion excitation module is added on the basis of the YOLOv5s network, the output end of the network is adjusted, a decoupling head structure is added to improve the network precision, and a grain depot ventilation window state detection network is obtained, wherein the overall structure is shown in FIG. 2.
S201: a YOLOv5s network is built as a basic structure and comprises a backbone network, a feature fusion network and a prediction head which are connected in sequence. The main network is based on a CSPDarkNet structure and comprises a Focus module, 4C 3 modules and a spatial pyramid pooling SPP module, wherein the Focus module performs segmentation and splicing operations on an input image, value taking operations are performed every other pixel to obtain 4 groups of down-sampling pictures, and each C3 module performs down-sampling operations on a feature map; the SPP module processes the input features through the maximum pooling layers with different scales and then performs splicing operation, so that the receptive field of the network can be increased, the algorithm is adaptive to images with different resolutions, and more information is obtained. The backbone network outputs 3 feature graphs with different sizes to the feature fusion network; the feature fusion network adopts an FPN + PAN mode to process and fuse features of different scales to generate a feature pyramid, so that the network can identify the same object with different sizes and scales; the prediction head part is responsible for outputting a prediction result, including position information of a prediction box, state types (open, closed, half-open and not closed) of ventilation windows in the prediction box and confidence of the prediction;
s202: adding a squeezing-and-Excitation module (SE for short) into a C3 module of a backbone network of the YOLOv5s network to form an SE-C3 module, and correcting the characteristics by introducing a channel attention mechanism into the SE-C3 module, so that the weight of valuable characteristics is increased, and the weight of worthless characteristics is suppressed, so that the extracted characteristics have better learnability and the accuracy of the network is improved; the SE-C3 module structure is as shown in fig. 3, a bottleeck part in a C3 module of an original YOLOv5s network is modified into a SE-bottleeck, an input of the SE-bottleeck is a feature diagram X of an upper network, the input first passes through two convolution layers to form a feature diagram X1, the feature diagram X1 is processed by using a squeeze excitation SE module, the squeeze excitation SE module includes a global pooling layer, two full connection layers and a sigmoid activation function, a vector t1 with a dimension of channel number 1 is obtained, weights of channels of the feature diagram X1 are represented, the vector t1 is multiplied by the feature diagram X1 to obtain a feature diagram X2, the feature diagram X2 learns different importance degrees of the channels through an attention mechanism, and finally the input feature diagram X is added to the feature diagram X2, and an output of the SE-bottleeck module can be obtained.
S203: in the probe of the YOLOv5s network, the original coupling head is modified into a decoupling head. The original prediction head in the YOLOv5s network directly combines and predicts the classification and position information through a channel, the combination prediction mode can cause the conflict of two tasks and influence the detection precision, and the decoupling head structure is used, and the position and classification prediction of a target are respectively output by using two branches, so that the detection precision can be improved. The input of the decoupling head is a characteristic diagram output by a characteristic fusion network, the structure is shown in fig. 4, the input characteristic diagram firstly passes through a 1x1 convolution layer, and the purpose is to perform channel dimension reduction and reduce the operation amount; then respectively entering a classification branch and a positioning branch, and respectively taking charge of classification and positioning tasks: and the classification branch outputs state class prediction of the ventilation window through two 1-by-1 convolution layers and a sigmoid activation function, the positioning branch passes through one 1-by-1 convolution layer and is divided into two branches, one branch outputs confidence prediction through the 1-by-1 convolution layer and the sigmoid activation function, and the other branch outputs position coordinate prediction of the ventilation window through the 1-by-1 convolution layer.
S3: training and testing grain depot ventilation window state detection network
S301: loss function
The loss function of the grain depot ventilation window state detection network mainly comprises three parts: position loss, classification loss, and confidence loss.
The classification loss and confidence loss are consistent with those of the YOLOv5s network; the position loss is the CIoU loss, with the formula:
$$L_{CIoU} = 1 - IoU + \frac{d^2}{c^2} + \alpha v$$
wherein d is the Euclidean distance between the two box center points, c is the diagonal length of the smallest enclosing box, and IoU (Intersection over Union) is the ratio of the overlap area of the predicted box and the ground-truth box to their total (union) area; α is an adjustment parameter and v measures aspect-ratio consistency, computed as follows:
$$\alpha = \frac{v}{(1 - IoU) + v}$$
$$v = \frac{4}{\pi^2}\left(\arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h}\right)^2$$
where w and h denote the width and height of the predicted box, and w^{gt} and h^{gt} those of the ground-truth box in the aspect-ratio term.
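Assuming the standard CIoU definition these formulas correspond to, the loss for one pair of axis-aligned boxes given as (x1, y1, x2, y2) corners can be computed as follows; this is a sketch, not the network's actual implementation:

```python
import math

def ciou_loss(box_p, box_g):
    """CIoU loss per the formulas above: 1 - IoU + d^2 / c^2 + alpha * v,
    where d is the distance between box centers, c the diagonal of the
    smallest enclosing box, and v the aspect-ratio consistency term."""
    px1, py1, px2, py2 = box_p
    gx1, gy1, gx2, gy2 = box_g
    # intersection over union
    iw = max(0.0, min(px2, gx2) - max(px1, gx1))
    ih = max(0.0, min(py2, gy2) - max(py1, gy1))
    inter = iw * ih
    union = (px2 - px1) * (py2 - py1) + (gx2 - gx1) * (gy2 - gy1) - inter
    iou = inter / union
    # squared center distance and squared enclosing-box diagonal
    d2 = ((px1 + px2 - gx1 - gx2) ** 2 + (py1 + py2 - gy1 - gy2) ** 2) / 4.0
    c2 = ((max(px2, gx2) - min(px1, gx1)) ** 2
          + (max(py2, gy2) - min(py1, gy1)) ** 2)
    # aspect-ratio consistency term v and its weight alpha
    v = (4.0 / math.pi ** 2) * (math.atan((gx2 - gx1) / (gy2 - gy1))
                                - math.atan((px2 - px1) / (py2 - py1))) ** 2
    alpha = v / ((1.0 - iou) + v) if v > 0 else 0.0
    return 1.0 - iou + d2 / c2 + alpha * v
```

Identical boxes give zero loss, and for disjoint boxes the center-distance term d²/c² still provides a useful gradient even though IoU is zero, which is the main reason CIoU trains better than a plain IoU loss.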
S302: training grain depot ventilation window state detection network
The training set obtained in step S105 is used as the input for training the grain depot ventilation window state detection network, with the loss function of step S301 as the optimization target, and network parameters are updated by the backpropagation algorithm; the training mode adopts warmup followed by cosine annealing. Warmup uses a smaller learning rate at the beginning of training because the model weights are randomly initialized at that point, and too large a learning rate would cause unstable oscillation; after a certain number of iterations, the model is trained with the preset learning-rate schedule, which accelerates convergence and improves the final model. In the experiment the initial learning rate is set to 0.01 and the final learning rate to 0.002, with the rate decayed by the cosine annealing algorithm. Network parameters are optimized with the stochastic gradient descent (SGD) algorithm; the momentum coefficient is set to 0.9, the weight decay to 0.0002, and, due to GPU memory limits, the batch size to 32 (the batch size is the number of pictures processed in one forward pass). 100 epochs are iterated in total, one epoch meaning that all training set pictures have been trained once. During training the model parameters are updated and saved to a .pt model file in real time; after each epoch, validation is run on the validation set and the precision, recall, and mAP are calculated, so that the performance of the network model can be evaluated in real time. After 100 epochs, a model file with an mAP value greater than 0.8 on the validation set is selected as the trained .pt model file, giving the trained grain depot ventilation window state detection network.
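The warmup-plus-cosine-annealing schedule can be sketched as a function of the epoch index. The 0.01 initial rate, 0.002 final rate, and 100 epochs follow the text; the linear warmup length of 3 epochs is an assumed value, since the text does not state it:

```python
import math

def lr_at_epoch(epoch, warmup_epochs=3, total_epochs=100,
                lr_init=0.01, lr_final=0.002):
    """Linear warmup from a small rate up to lr_init, then cosine
    annealing from lr_init down to lr_final over the remaining epochs."""
    if epoch < warmup_epochs:
        return lr_init * (epoch + 1) / warmup_epochs        # warmup ramp
    # normalized progress through the annealing phase, in [0, 1]
    t = (epoch - warmup_epochs) / max(1, total_epochs - 1 - warmup_epochs)
    return lr_final + 0.5 * (lr_init - lr_final) * (1 + math.cos(math.pi * t))
```

In PyTorch the same shape is typically obtained by combining `torch.optim.SGD` (momentum 0.9, weight decay 0.0002, per the text) with a warmup wrapper and a cosine-annealing scheduler.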
S303: detecting network for testing state of ventilation window of grain depot
The .pt model file trained in step S302 is loaded to obtain the trained grain depot ventilation window state detection network; the images of the test set obtained in step S105 are input into it, the outputs are compared with the labels of the corresponding test images, and the precision, recall, and mAP of the results are calculated to evaluate whether the network model reaches the preset target, thereby obtaining the online grain depot ventilation window state detection network.
In the target detection task, TP denotes a predicted box that exceeds the IoU threshold and has the highest confidence, FP a predicted box whose IoU is below the threshold or whose confidence is lower than that of the best matching box, and FN a ground-truth box that the detector misses.
Recall is calculated as TP/(TP + FN): the proportion of all real samples that the model predicts correctly.
Precision is calculated as TP/(TP + FP): the proportion of all predictions that are correct.
Average precision (AP) is defined per category: calculating the recall and precision of a category yields a complete P-R curve, with recall on the horizontal axis and precision on the vertical axis; the area enclosed under the curve is the AP. Averaging the AP of every category gives the mAP, which lies in the interval [0, 1]; the larger the value, the better the model.
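These definitions translate directly into code; average_precision below is a simple rectangle-rule approximation of the area under the P-R curve given points sorted by increasing recall, a sketch rather than the exact interpolation used by any particular benchmark:

```python
def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP); Recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def average_precision(recalls, precisions):
    """Approximate area under the P-R curve for one category; averaging
    the per-category APs gives the mAP in [0, 1]."""
    ap, prev_r = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_r) * p   # rectangle between successive recall points
        prev_r = r
    return ap
```

For example, a category with 8 true positives, 2 false positives, and 2 missed boxes has precision and recall of 0.8 each, one point on its P-R curve.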
S4: and detecting the opening and closing states of the ventilation windows by using a grain depot ventilation window state detection network on line.
Firstly, acquiring and acquiring a ventilation window scene picture of a grain depot by using image or video acquisition equipment, for example, a fixed camera or unmanned aerial vehicle equipment, performing ventilation window positioning and state classification on the acquired ventilation window scene picture by using the grain depot ventilation window state detection network which can be used online in the step 3, acquiring the position of a prediction frame of a ventilation window, the state type (open, half open, closed and not tight) and the confidence coefficient of the ventilation window, and drawing the target position and the state type of the ventilation window on an original input image by the information to obtain an image containing a detection result.
Experiment:
1) Experimental environment: the experiment was carried out on a server running the CentOS system, equipped with GPUs for accelerated training; the model was programmed in Python and built and trained with the PyTorch deep learning framework, which supports GPU computation well. The related software and hardware configuration is shown in Table 1.
TABLE 1 Software and hardware configuration of the experiment
Operating system: CentOS 7.3.1611
Processor: 12 × E5-2609 v3 @ 1.9 GHz, 15M cache
Graphics card: Tesla P4 8 GB × 2
Memory: 125 GB
Development environment: Python 3.7, PyTorch 1.4.0
2) Experimental data set:
data acquired by a ventilation window data set of the experiment mainly come from two ways, namely, 2300 pictures are screened out by crawling network related resource pictures by a crawler and from the operation environments of all national granaries; secondly, 3500 pictures of the detection data of the ventilation window of the grain depot are obtained in the field of the grain depot by acquiring and acquiring the pictures including different backgrounds, different angles and illumination conditions through a fixed camera and an unmanned aerial vehicle; the total number of data set pictures constructed by the two ways is 5800.
The images are then preprocessed: treatments such as illumination changes, noise and blurring are applied to part of the images so that the resulting data set generalizes better, giving a final experimental data set of 8300 pictures in total. The counts of the four ventilation window state categories across the pictures are: open: 7233, closed: 8784, half-open: 6921, not tightly closed: 5862; these four state categories are used as the labels of the corresponding pictures.
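The illumination and noise treatments mentioned above can be sketched in numpy as follows. This is a minimal illustration of the kind of preprocessing described, not the patent's own code; the brightness shift range, noise strength and random seed are assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)  # seed is an illustrative assumption

def augment_brightness(img, max_shift=40):
    """Shift the illumination of a uint8 HxWx3 image by a random offset."""
    shift = int(rng.integers(-max_shift, max_shift + 1))
    return np.clip(img.astype(np.int16) + shift, 0, 255).astype(np.uint8)

def augment_noise(img, sigma=10.0):
    """Add Gaussian noise to a uint8 HxWx3 image."""
    noisy = img.astype(np.float64) + rng.normal(0.0, sigma, img.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)
```

Each augmented copy keeps the original label file, which is how the data set grows from 5800 to 8300 pictures without new annotation work.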
3) Results of the experiment
For comparative analysis, the following 3 models were selected for training:
(a) YOLOv5s original network: the YOLOv5s network constructed as in step S201 of embodiment 1;
(b) YOLOv5s + SE-C3 network: the YOLOv5s network built according to step S201 in embodiment 1, with a squeeze-excitation module added into the C3 module of its backbone network according to step S202 in embodiment 1 to form an SE-C3 module;
(c) YOLOv5s + SE-C3 + decoupling head: namely the grain depot ventilation window state detection network of the invention.
80% of the experimental data set (6640 of the total 8300 pictures) is used as the training set, and the models are trained in the manner of step S302 of the specific embodiment, giving a trained YOLOv5s original network, a trained YOLOv5s + SE-C3 network and the trained grain depot ventilation window state detection network. The remaining 20% of the experimental data set (1660 pictures) is used as the test set and input into each of the three trained networks. The outputs obtained are compared with the labels, and the mAP and FPS (Frames Per Second: the number of pictures the model processes per second; a larger value means faster processing) of the three networks are calculated and compared to evaluate model performance. The experimental evaluation results are shown in Table 2.
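The FPS figure used in Table 2 can be obtained by timing inference over the test images. A generic sketch follows; the model is passed in as an arbitrary callable, since the patent's own inference code is not given.

```python
import time

def measure_fps(infer, images):
    """Time a model (any callable taking one image) over a set of images
    and return the number of images processed per second."""
    start = time.perf_counter()
    for image in images:
        infer(image)
    elapsed = time.perf_counter() - start
    return len(images) / elapsed
```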
Table 2 Experimental evaluation results:
Network | mAP (%) | FPS (frame/second)
YOLOv5s | 90.1 | 167
YOLOv5s + SE-C3 | 92.3 | 155
YOLOv5s + SE-C3 + decoupling head | 93.8 | 151
As can be seen from Table 2, the mAP of the original YOLOv5s network on the experimental data set is 90.1%; after the SE-C3 module is added, the mAP rises to 92.3%, and adding the decoupling head structure further raises it to 93.8%. The YOLOv5s + SE-C3 + decoupling head network structure proposed by the invention therefore effectively improves detection precision through the SE-C3 module and the decoupling head structure, at the cost of a modest loss of detection speed, so that the network identifies the state of the ventilation window more accurately.
Finally, it is noted that the above merely illustrates a few specific embodiments of the invention. The invention is obviously not limited to these embodiments, and many variations are possible; all modifications that a person skilled in the art can derive or infer from the disclosure of the present invention are to be considered within the scope of the invention.

Claims (7)

1. The grain depot ventilation window state detection algorithm based on deep learning is characterized in that:
acquiring a ventilation window scene picture of a grain depot through image or video acquisition equipment; in an upper computer, inputting the ventilation window scene picture into a grain depot ventilation window state detection network to carry out ventilation window positioning and state classification, and obtaining a detection result image with the prediction frame position of the ventilation window, the state category of the ventilation window and the confidence;
the grain depot ventilation window state detection network takes the YOLOv5s network as its basic network and comprises a backbone network, a feature fusion network and a prediction head which are sequentially connected, wherein a squeeze-excitation SE module is added into the C3 module of the backbone network to form an SE-C3 module; in the prediction head, the original coupled head is modified into a decoupling head.
2. The deep learning based grain depot ventilation window state detection algorithm of claim 1, wherein:
the SE-C3 module is characterized in that the bottleneck in the C3 module of the YOLOv5s basic network is modified into an SE-bottleneck; the input of the SE-bottleneck is a feature map X from the upper network; the feature map X first passes through two convolution layers to form a feature map X1; the feature map X1 is processed by the squeeze-excitation SE module to obtain a vector t1 whose dimension equals the number of channels; the vector t1 is multiplied channel-wise with the feature map X1 to obtain a feature map X2; finally the input feature map X and the feature map X2 are added to obtain the output of the SE-bottleneck module;
the squeeze-excitation SE module comprises a global pooling layer, two fully connected layers and a sigmoid activation function.
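The squeeze-excitation computation in this claim (global pooling, two fully connected layers, sigmoid, channel-wise reweighting) can be walked through numerically. The following is a numpy sketch of the forward pass, not the actual PyTorch module; the ReLU between the two fully connected layers and the reduction ratio r follow the standard squeeze-excitation design and are assumptions not stated in the claim.

```python
import numpy as np

def squeeze_excitation(x, w1, b1, w2, b2):
    """Forward pass of an SE block on a (C, H, W) feature map.

    w1: (C//r, C) and w2: (C, C//r) are the weights of the two fully
    connected layers; r is the (assumed) channel reduction ratio.
    """
    z = x.mean(axis=(1, 2))                    # squeeze: global average pooling -> (C,)
    h = np.maximum(w1 @ z + b1, 0.0)           # first FC + ReLU -> (C//r,)
    t = 1.0 / (1.0 + np.exp(-(w2 @ h + b2)))   # second FC + sigmoid -> t1, one weight per channel
    return x * t[:, None, None]                # multiply channel-wise with the input map
```

In the SE-bottleneck this reweighted map plays the role of X2, which is then added to the block input X as a residual.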
3. The deep learning based grain depot ventilation window state detection algorithm of claim 2, wherein:
the input of the decoupling head is a feature map output by the feature fusion network; the feature map first passes through a 1x1 convolution layer and then enters a classification branch and a positioning branch separately: the classification branch passes through two 1x1 convolution layers and a sigmoid activation function to output the state category of the ventilation window; the positioning branch passes through one 1x1 convolution layer and then splits into two sub-branches, one outputting the confidence through a 1x1 convolution layer and a sigmoid activation function, the other outputting the prediction frame position of the ventilation window through a 1x1 convolution layer.
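The branch structure of this claim can be sketched with 1x1 convolutions in numpy to make the tensor shapes concrete. The channel widths of the intermediate layers are illustrative assumptions; only the branch topology follows the claim.

```python
import numpy as np

def conv1x1(x, w):
    """1x1 convolution on a (C_in, H, W) map; w has shape (C_out, C_in)."""
    return np.einsum('oc,chw->ohw', w, x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def decoupled_head(feat, weights):
    """Branch structure of the decoupling head on one fused feature map.

    `weights` maps branch names to (assumed-width) 1x1 kernels.
    """
    stem = conv1x1(feat, weights['stem'])                  # shared 1x1 conv
    # classification branch: two 1x1 convs + sigmoid -> state categories
    cls = sigmoid(conv1x1(conv1x1(stem, weights['cls1']), weights['cls2']))
    # positioning branch: one 1x1 conv, then two sub-branches
    loc = conv1x1(stem, weights['loc'])
    obj = sigmoid(conv1x1(loc, weights['obj']))            # confidence, 1 channel
    box = conv1x1(loc, weights['box'])                     # prediction frame, 4 channels
    return cls, obj, box
```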
4. The deep learning based grain depot ventilation window state detection algorithm of claim 3, wherein:
the training and testing process of the grain depot ventilation window state detection network is as follows: the training set is taken as the input of grain depot ventilation window state detection network training, with the loss function as the optimization target; network parameters are updated with the back propagation algorithm and optimized with the stochastic gradient descent SGD algorithm, under a training schedule of warmup followed by cosine annealing; the validation set is evaluated once after each epoch, 100 epochs are iterated in total, and a model file with an mAP value greater than 0.8 on the validation set is selected as the trained .pt model file; the test set is then input into the trained grain depot ventilation window state detection network, and the output obtained is compared with the labels of the test set; when the preset target is reached, the online grain depot ventilation window state detection network is obtained.
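The warmup-plus-cosine-annealing schedule named in this claim can be expressed as a learning rate function of the training step. This is a generic sketch of that schedule; the base rate, warmup length and floor are illustrative assumptions, not values from the patent.

```python
import math

def lr_at(step, total_steps, warmup_steps, base_lr, min_lr=0.0):
    """Learning rate under linear warmup followed by cosine annealing."""
    if step < warmup_steps:
        # linear warmup from near 0 up to base_lr
        return base_lr * (step + 1) / warmup_steps
    # cosine decay from base_lr down to min_lr over the remaining steps
    t = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * t))
```

In PyTorch the same effect is typically obtained by combining an SGD optimizer with a cosine-annealing scheduler after a short warmup phase.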
5. The deep learning based grain depot ventilation window state detection algorithm of claim 4, wherein:
the process of dividing the training set and the test set is as follows: image data of grain depot ventilation windows in different open and closed states are gathered under different angles, weather and illumination; images covering different combinations of angle, weather and illumination are screened out and subjected to data enhancement operations, including contrast enhancement, inversion, mirroring and cropping; the ventilation window positions in the enhanced images are then annotated, and a txt label file containing the target position and state category in the corresponding image is generated; the annotated images and their corresponding txt label files are divided into the training set, validation set and test set in the ratio 7:1:2.
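The 7:1:2 division can be sketched as follows; the shuffle seed is an illustrative assumption, and integer arithmetic keeps the split exact.

```python
import random

def split_7_1_2(items, seed=0):
    """Shuffle and divide a list of (image, label-file) pairs in the
    ratio 7:1:2 into training, validation and test sets."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n = len(items)
    n_train = n * 7 // 10
    n_val = n // 10
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])
```

Applied to the 8300-picture data set of the experiment, this yields 5810 training, 830 validation and 1660 test pictures, matching the 1660-picture test set used above.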
6. The deep learning based grain depot ventilation window state detection algorithm of claim 5, wherein:
the state categories are divided into four types: open, closed, half-open and not tightly closed.
7. The deep learning based grain depot ventilation window state detection algorithm of claim 6, wherein:
the loss function comprises a classification loss, a position loss function and a confidence loss; the classification loss and the confidence loss are consistent with those of the YOLOv5s network, and the position loss function is the CIoU loss, with the formula:

$$L_{CIoU} = 1 - IoU + \frac{d^2}{c^2} + \alpha v$$

wherein d represents the Euclidean distance between the center points of the prediction frame and the real annotation frame, c represents the diagonal distance of the smallest closure enclosing both frames, and IoU (Intersection over Union) is the ratio of the overlapping area of the prediction frame and the real annotation frame to the area of their union; $\alpha$ is a trade-off parameter and v measures the consistency of the aspect ratios, calculated as follows:

$$v = \frac{4}{\pi^2}\left(\arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h}\right)^2$$

$$\alpha = \frac{v}{(1 - IoU) + v}$$

wherein w and h represent the width and height of the prediction frame, and $w^{gt}$ and $h^{gt}$ the width and height of the real annotation frame.
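The CIoU position loss of this claim can be sketched as a runnable function, assuming boxes given as (x1, y1, x2, y2) corner coordinates; the small epsilon guards against division by zero are implementation details, not part of the patent.

```python
import math

def ciou_loss(pred, gt, eps=1e-9):
    """CIoU loss for two axis-aligned boxes given as (x1, y1, x2, y2)."""
    px1, py1, px2, py2 = pred
    gx1, gy1, gx2, gy2 = gt
    # intersection and union areas -> IoU
    iw = max(0.0, min(px2, gx2) - max(px1, gx1))
    ih = max(0.0, min(py2, gy2) - max(py1, gy1))
    inter = iw * ih
    union = (px2 - px1) * (py2 - py1) + (gx2 - gx1) * (gy2 - gy1) - inter
    iou = inter / (union + eps)
    # d^2: squared Euclidean distance between the two center points
    d2 = ((px1 + px2) / 2 - (gx1 + gx2) / 2) ** 2 \
       + ((py1 + py2) / 2 - (gy1 + gy2) / 2) ** 2
    # c^2: squared diagonal of the smallest box enclosing both frames
    c2 = (max(px2, gx2) - min(px1, gx1)) ** 2 \
       + (max(py2, gy2) - min(py1, gy1)) ** 2
    # v: aspect-ratio consistency term; alpha: trade-off parameter
    v = (4 / math.pi ** 2) * (math.atan((gx2 - gx1) / (gy2 - gy1 + eps))
                              - math.atan((px2 - px1) / (py2 - py1 + eps))) ** 2
    alpha = v / ((1 - iou) + v + eps)
    return 1 - iou + d2 / (c2 + eps) + alpha * v
```

A perfectly matched prediction gives a loss of zero, and the loss grows with center distance, enclosure size and aspect-ratio mismatch, which is what makes CIoU a tighter localization objective than plain IoU loss.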
CN202210317338.4A 2022-03-28 2022-03-28 Grain depot ventilation window state detection algorithm based on deep learning Pending CN114821154A (en)

Publications (1)

Publication Number Publication Date
CN114821154A true CN114821154A (en) 2022-07-29

Family

ID=82531730
Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116664558A (en) * 2023-07-28 2023-08-29 Guangdong University of Petrochemical Technology Method, system and computer equipment for detecting surface defects of steel
CN116664558B (en) * 2023-07-28 2023-11-21 Guangdong University of Petrochemical Technology Method, system and computer equipment for detecting surface defects of steel
CN117197787A (en) * 2023-08-09 2023-12-08 Hainan University Intelligent security inspection method, device, equipment and medium based on improved YOLOv5


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination