CN113076860A - Bird detection system under field scene - Google Patents


Info

Publication number
CN113076860A
CN113076860A (application CN202110344311.XA)
Authority
CN
China
Prior art keywords
bird
data
detection
image
data set
Prior art date
Legal status
Granted
Application number
CN202110344311.XA
Other languages
Chinese (zh)
Other versions
CN113076860B (en)
Inventor
腊孟珂
肖伟康
钟稚昉
杨仕雄
王言
Current Assignee
Nanjing University Environmental Planning And Design Institute Group Co Ltd
Original Assignee
Nanjing University Environmental Planning And Design Institute Group Co Ltd
Priority date
Filing date
Publication date
Application filed by Nanjing University Environmental Planning And Design Institute Group Co Ltd
Priority to CN202110344311.XA
Publication of CN113076860A
Application granted
Publication of CN113076860B
Active legal status
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267 Segmentation of patterns in the image field by performing operations on regions, e.g. growing, shrinking or watersheds
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a detection method of a bird detection system in a field scene, which comprises the following steps: s1, collecting video data in an actual scene and bird pictures and video data existing on a network; s2, carrying out data preprocessing on the acquired bird data by using data preprocessing software; s3, performing data enhancement on the processed bird picture data set; s4, training a deep learning target detection model by using the bird data set after data enhancement and the background data set without birds; s5, deploying the trained target detection model on edge equipment in an actual field environment; s6, acquiring video data of the bird protection area in a field scene by using a camera; and S7, detecting the target according to the image by using the deep learning target detection model, and returning the detection result to the central server. The invention reduces the cost of manual detection, improves the detection effect of the model, obtains the result in accordance with the actual scene, and reduces the processing pressure of the central server.

Description

Bird detection system under field scene
Technical Field
The invention belongs to the technical field of bird detection methods, and particularly relates to a bird detection system in a field scene.
Background
The discovery and statistics of birds in protected areas have long been an important aspect of environmental protection. Recording the number of bird flocks, their time of appearance and their residence time effectively reflects the current health of the bird population and information such as the ecological quality of the protected area, and makes it easier for professionals to carry out environmental protection work. The conventional approach, however, is to classify and count manually the video data shot by cameras in the protected area, which requires experts to spend a great deal of time processing video and is time-consuming, labor-intensive and heavy in workload. Birds in the protected area can therefore be detected automatically with a deep-learning-based target detection algorithm from the field of artificial intelligence.
Target detection means finding objects of interest in an image and determining their category and position, for example detecting whether birds are present in an image and marking their position information. Most traditional target detection algorithms rely on manually designed image processing, can extract only shallow features, have low detection accuracy and generalize poorly under the various noise interferences of real environments. Deep learning models excel at extracting high-level features and representing complex ones, and in recent years more and more deep-learning-based target detection algorithms have been developed. Such algorithms extract features automatically, generalize better and are more suitable for deployment and application in real environments.
The target detection task focuses on detection accuracy and recognition speed. The resolution of the detected image, the size of the target and the complexity of the environment all add great difficulty to target detection. The earliest two-stage target detection algorithm, R-CNN, divides detection into two stages: generating candidate windows, then classifying the objects within them. Compared with traditional methods it markedly improves recognition accuracy, and the series of two-stage algorithms subsequently proposed on the basis of R-CNN further improve detection accuracy and speed. However, owing to its inherent limitations, the two-stage approach is computationally too expensive, its per-image detection time is too long for practical application, and it cannot exploit the relative position of a local target within the whole image. Researchers therefore proposed one-stage target detection algorithms, represented by the YOLO series, which place the whole process in a single network, directly produce candidate regions and target categories, and greatly improve detection speed.
Considering a series of characteristics of complex environment, small target bird size, limited calculation capacity of edge equipment and the like in an environment protection area, it is necessary to design a target detection system under a field scene according to the existing target detection algorithm and an actual scene.
Disclosure of Invention
The invention aims to overcome some of the defects of current deep-learning-based target detection methods when applied under field conditions.
To remedy these defects, the invention discloses a bird detection system under a field scene that combines a one-stage target detection model and can be practically deployed and applied.
In order to achieve the purpose, the invention adopts the following technical scheme:
a bird detection system under a field scene, wherein a detection method of the bird detection system comprises the following steps:
s1, acquiring bird video data and bird picture data; including the bird data D shot by a camera and existing unmarked in the InternetunlabeledbirdAnd acquiring a marked bird picture data set D from the Internetlabeled1Acquiring background video data D from a camera shot without birdsvideoenv
S2, collecting the unlabelled bird data DunlabeledbirdUsing the written data preprocessing software to carry out data preprocessing to obtain a bird picture data set Dlabeled2(ii) a Butt-miningCollected background video data D not including birdsvideoenvPreprocessing software written by data is used for preprocessing the data to obtain a background picture data set D not containing birdsnegative
S3, marking bird picture data set Dlabeled1And Dlabeled2Carrying out data enhancement to obtain an expanded bird data set Dpositive
S4 bird data set D enhanced by datapositiveAnd a background data set D not containing birdsnegativeModel for training deep learning target detection Modelbird
S5, detecting the Model of the trained targetbirdThe method comprises the following steps of deploying on edge equipment in an actual field environment;
s6, acquiring video data of the bird protection area in a field scene by using a camera;
s7, Model for detecting target by using deep learning targetbirdAnd detecting the target according to the image, and returning the detection result to the central server.
In order to optimize the technical scheme, the specific measures adopted further comprise:
further, step S1 includes:
s11, the camera is deployed in the actual environment protection area;
s12, shooting a bird group and an environment background in the protected area by a camera;
s13, acquiring marked bird images, unmarked bird images and video data from public data sets in the Internet;
s14, forming a data set D by the marked bird picture data disclosed in the Internetlabeled1
S15, the bird video data shot by the camera and the unmarked bird image and video data collected from the network form a data set Dunlabeledbird
S16, background video data containing no bird shot by camera is formed into data set Dnegative
Further, the data preprocessing process of step S2 includes:
designing and using data preprocessing software to extract frames from the bird-free background video data and the unlabeled bird video data, and storing the bird-free background images and the bird-containing images separately, the bird-free background images forming the data set D_negative;
performing quality detection on all unlabeled bird-containing image data and removing images in which the bird target occupies too few pixels;
labeling the quality-checked image data and recording bird bounding-box position information to obtain the labeled bird picture data set D_labeled2.
Further, step S2 further includes:
writing video data preprocessing software;
extracting images from the video data with the video data preprocessing software;
performing quality detection on the image data with the video data preprocessing software and removing images in which the bird target occupies too few pixels;
labeling the image data with the video data preprocessing software and recording bird position information.
Further, the data enhancement process of step S3 includes:
performing data enhancement on the data set by image cropping: a rectangular box smaller than the image is generated, the image is randomly cropped, and the image inside the box is used as training data;
performing data enhancement on the data set by image flipping: the image is flipped left-right, and the images before and after flipping are both used as training data;
performing data enhancement on the data set by image whitening: the image is normalized to a Gaussian(0, 1) distribution, and the images before and after normalization are both used as training data.
Further, step S4 includes training the target detection network YOLOv4 with the data-enhanced data set D_positive and the background data set D_negative, comprising the following steps:
S41, mixing the data sets D_positive and D_negative to form the training data set D_train;
S42, dividing the data set D_train into a training set, a validation set and a test set;
S43, modifying the YOLOv4 network into a binary classification network;
S44, calculating anchor box values from the data set D_positive and setting them as the YOLOv4 anchor box values;
S45, setting the parameters of the target detection network;
S46, training the target detection network with the training set and the validation set;
S47, verifying the target detection effect with the test set;
S48, repeating steps S45-S47 until suitable parameters are found.
Further, in step S44, calculating the anchor box values with the k-means method includes:
collecting all bounding-box information in the data set D_positive and dividing it into 9 classes with the k-means algorithm; the mean bounding-box size of each of the 9 classes is taken as the corresponding anchor box value.
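The anchor computation described above can be sketched as follows. This is a minimal plain-Euclidean k-means over (width, height) pairs under stated assumptions; note that many YOLO implementations cluster with a 1 - IoU distance instead, which the text does not specify, and the function name kmeans_anchors is illustrative.

```python
import numpy as np

def kmeans_anchors(wh, k=9, iters=50, seed=0):
    """Cluster bounding-box (width, height) pairs into k classes and return
    the mean box size of each class as the anchor values (as in step S44)."""
    rng = np.random.default_rng(seed)
    wh = np.asarray(wh, dtype=np.float64)
    centers = wh[rng.choice(len(wh), size=k, replace=False)]
    for _ in range(iters):
        # assign each box to the nearest center (Euclidean distance in w-h space)
        d = np.linalg.norm(wh[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # recompute each center as the mean box size of its class
        for j in range(k):
            if (labels == j).any():
                centers[j] = wh[labels == j].mean(axis=0)
    # sort by area, the conventional ordering for YOLO anchor values
    return centers[np.argsort(centers.prod(axis=1))]
```

The returned 9 (or k) mean box sizes would then replace the default anchor values of the detection network.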
Furthermore, the edge device is a gateway comprising CPU and GPU hardware;
the edge device directly controls the camera;
the edge device can transmit the results obtained by the deep learning target detection model back to the central server.
Further, the camera is deployed in a bird protection area of the field scene;
the video stream is pushed using ffmpeg and the RTSP protocol;
images in the video stream are acquired using OpenCV.
Further, in step S7, detecting targets in the images with the deep learning target detection model Model_bird specifically comprises:
sampling the images acquired by the field camera to reduce the computational load on the edge equipment;
detecting the image with the trained target detection model Model_bird to obtain a detection result;
the detection result comprises whether birds are present in the image, and the position information and confidence of each bird;
denoising the detection result to filter out detections caused by noise and detections with low confidence in the actual scene;
locally storing the images in which birds are detected, together with the bird position information in each image;
feeding the locally stored bird-containing images and corresponding position information back to the central server.
Further, the denoising process includes:
detecting the test set with the target detection model Model_bird to obtain detection results;
statistically analyzing the detection results and classifying them by detection box size;
standardizing the confidences of the detection boxes within the same size range to form a normal distribution;
using this normal distribution as a probability density function, computing the minimum confidence corresponding to a probability of 0.95;
taking the obtained minimum confidence as the confidence threshold for detection boxes in that size range;
for an actual detection result, looking up the corresponding threshold according to the detection box size, returning detections with confidence greater than the threshold as final results, and discarding detections with confidence below the threshold.
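The denoising scheme above can be sketched as follows, under the assumption that "minimum confidence at a probability of 0.95" means the 5th percentile of the fitted normal distribution, so that 95% of in-distribution detections pass; the size buckets and their edges are illustrative, not values from the patent.

```python
import numpy as np

Z_05 = -1.6448536269514722  # 5th-percentile z-score of the standard normal

def size_bucket(box, edges=(32, 96)):
    """Classify a detection box by size; the bucket edges are illustrative assumptions."""
    x1, y1, x2, y2 = box
    side = max(x2 - x1, y2 - y1)
    return sum(side >= e for e in edges)  # 0 = small, 1 = medium, 2 = large

def fit_thresholds(boxes, confidences, n_buckets=3):
    """Per-bucket threshold: the confidence below which only 5% of the fitted
    normal distribution lies, i.e. roughly 95% of in-distribution detections pass."""
    thr = {}
    for b in range(n_buckets):
        cs = np.array([c for box, c in zip(boxes, confidences) if size_bucket(box) == b])
        if len(cs):
            thr[b] = cs.mean() + Z_05 * cs.std()
    return thr

def denoise(detections, thr):
    """Keep detections whose confidence exceeds the threshold of their size bucket."""
    return [(box, c) for box, c in detections if c > thr.get(size_bucket(box), 0.0)]
```

fit_thresholds would be run once on the test-set detections; denoise is then applied to every live detection result on the edge device.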
The invention has the beneficial effects that: the invention uses the camera to monitor the bird protection area, and uses the target detection method to detect the birds appearing in the video, thereby reducing the cost of manual detection; data training in an actual scene is used, and preprocessing and data enhancement operations are performed on the data, so that the detection effect of the model is improved; denoising the detection result to obtain a result in accordance with an actual scene; and the target detection model is deployed on the edge equipment, so that the processing pressure of the central server is relieved.
Drawings
FIG. 1 is a flow chart of the operation of the system of the present invention;
FIG. 2 is a schematic diagram of data preprocessing software programmed in the present invention;
FIG. 3 is a diagram showing the labeling effect of the data preprocessing software written in the present invention;
FIG. 4 is a representation of a portion of a sample in an avian target detection dataset as used in the present invention;
FIG. 5 is a schematic diagram of a training process of the target detection model according to the present invention;
FIG. 6 is a schematic diagram of a process for detecting an object on the edge device according to the present invention;
fig. 7 is a detection effect display diagram of bird target detection used in the present invention.
Detailed Description
The present invention will now be described in further detail with reference to the accompanying drawings.
It should be noted that terms such as "upper", "lower", "left", "right", "front" and "back" used in the present invention are for clarity of description only and are not intended to limit the scope of the invention; changes to their relative relationships without essential change of the technical content are also considered within the scope of the invention.
Fig. 1 is a flow chart of the bird detection system in a field scene according to the present invention, which comprises the following steps.
Step 1, collecting data and constructing a bird data set and a background data set for subsequent processing. The bird data collected in this example come from three sources: first, public picture data sets on the Internet, including COCO, VOC and similar data sets, collected manually; the labeled bird data taken from these data sets form D_labeled1. Second, bird video data published on the Internet. Third, bird video data shot by cameras in the environment protection area; the cameras are fixed in areas where birds frequently appear, which increases the probability of birds appearing in the video. The latter two sources together form the data set D_unlabeledbird. The background data collected in this example come from bird-free environment video data shot by the cameras in the protection area and form the data set D_videoenv.
Step 2, data preprocessing. The background video data set D_videoenv is processed to obtain the background image data set D_negative, and the data set D_unlabeledbird is processed to obtain a noise-free, directly usable data set for subsequent data enhancement. Data preprocessing in this example includes video frame extraction, data quality detection and data annotation. Specifically, the data preprocessing software R shown in Fig. 2 is written; the background video data are opened with R and the current frame images are saved, finally yielding the background picture data set D_negative. The bird video data are opened with R and checked frame by frame: if a frame contains no birds, it is rejected directly; if a frame contains birds but the pixels are too low, it is also rejected; if birds are present in a frame and the minimum pixel size is satisfied, the positions of the birds are marked on the image with the mouse using the labeling function, as shown in Fig. 3. The next frame is then processed until the video segment is finished, and all bird-containing images and the corresponding bird position information are saved locally. Unlabeled bird image data are opened with R and labeled in a similar way to the video data. Finally, the labeled bird image data set D_labeled2 is obtained.
Step 3, data enhancement. The preprocessed data set is enhanced to increase its size and to enlarge the range of target postures shown in the images. In this example, cropping, flipping and whitening are applied to the images. Cropping generates a rectangular box smaller than the image, randomly crops the image and uses the image inside the box as training data. Flipping flips the image left-right and uses the images before and after flipping as training data. Whitening normalizes the image to a Gaussian(0, 1) distribution and uses the images before and after normalization as training data. Finally, a labeled bird image data set D_positive of 16981 images is obtained, as shown in Fig. 4.
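The three enhancement operations of this step can be sketched with NumPy as follows; the crop size and random seed are illustrative choices, not values from the patent.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_crop(img, crop_h, crop_w):
    """Randomly place a rectangle smaller than the image and keep its contents."""
    h, w = img.shape[:2]
    top = rng.integers(0, h - crop_h + 1)
    left = rng.integers(0, w - crop_w + 1)
    return img[top:top + crop_h, left:left + crop_w]

def flip_lr(img):
    """Left-right flip; both the original and the flipped image are kept as training data."""
    return img[:, ::-1]

def whiten(img):
    """Normalize the image to zero mean and unit variance (Gaussian(0, 1))."""
    img = img.astype(np.float64)
    return (img - img.mean()) / (img.std() + 1e-8)
```

Applying all three operations to each source image is what expands the labeled data into D_positive.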
Step 4, training the deep learning target detection model. In this embodiment, the bird picture data and the background picture data are mixed as positive and negative samples to obtain the complete data set D_train, which is randomly divided in the ratio 3:1:1 into training set : validation set : test set. The training set and validation set serve as model input to complete the training process; the test set is used to evaluate the model and judge whether it achieves the required effect. The model selected is YOLOv4, and its training process, shown in Fig. 5, comprises the following steps:
Step 4-1, modifying the classification layer in the network so that it performs a binary classification task.
Step 4-2, setting the model parameters: specifically, a learning rate of 0.001, a batch size of 64, a weight decay of 0.0005 and 2000 training steps, with Darknet-53 selected as the backbone network;
Step 4-3, dividing the bounding-box information of the bird picture data set D_positive into 9 classes with the k-means clustering method, taking the mean of the bounding-box information in each class as the final anchor box value, and using the resulting 9 anchor box values as prior knowledge for the target detection model in place of the default anchor box values.
Step 4-4, inputting the training set and validation set data into the model;
Step 4-5, training YOLOv4 on an Nvidia 2080 Ti graphics card starting from an open-source pre-trained model, and saving the weight file obtained after training is finished;
Step 4-6, verifying the effect of the trained model with the obtained weight file and the test set;
Step 4-7, judging the test result: if the mAP (mean Average Precision) reaches 0.7, the parameters and model obtained from this training round are regarded as the optimal result; otherwise, the parameter values are modified and steps 4-4 to 4-7 are repeated.
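The random 3:1:1 division of D_train described above can be sketched as follows; the seed and the handling of remainders are illustrative choices.

```python
import random

def split_dataset(samples, ratios=(3, 1, 1), seed=42):
    """Randomly divide a dataset into training/validation/test sets
    in the 3:1:1 proportion used in this embodiment."""
    samples = list(samples)
    random.Random(seed).shuffle(samples)
    total = sum(ratios)
    n_train = len(samples) * ratios[0] // total
    n_val = len(samples) * ratios[1] // total
    return (samples[:n_train],
            samples[n_train:n_train + n_val],
            samples[n_train + n_val:])
```

The first two splits drive training, while the held-out third split is what the mAP-0.7 check of step 4-7 is evaluated on.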
Step 5, deploying the algorithm in the actual field environment. Specifically, in this example the algorithm is deployed on edge equipment installed in a chassis already present in the environment protection area. The edge device is a gateway that directly controls the camera and acquires video of the actual environment from it. The gateway device comprises a 10700K CPU and a GeForce RTX 2080 Ti GPU, which provide the computing power required by the target detection network; all detection is completed on the edge device.
Step 6, acquiring the video data shot by the camera. In this embodiment the camera is a dome camera that can rotate and zoom to cover the whole area around it. The video stream acquired by the dome camera is transmitted over the RTSP protocol and pushed to the network by ffmpeg; the edge device then captures the video from the RTSP stream through OpenCV and extracts images from it.
Step 7, processing the images with the target detection model, which in this example proceeds as shown in Fig. 6 and includes the following steps:
Step 7-1, for each image acquired through OpenCV, considering the limited computing power of the edge equipment, the detection speed of the algorithm is improved by image down-sampling; in this example the image is down-sampled to 576 x 576 and then input into the target detection algorithm;
Step 7-2, the target detection algorithm detects the down-sampled image with the trained weight parameters stored on the edge device and obtains the corresponding detection result, which comprises the detection boxes and the probability that each box contains a bird. The detection result is then judged: if no birds are detected in the frame, the next frame is processed; if birds are detected in the current frame, the noise that may exist in the actual scene is taken into account and the detection result is denoised. Specifically, the collected test set data are statistically analyzed in advance, classified by the size of the detection boxes in the detection results, and the confidences of the detection boxes within the same size range are standardized to form a normal distribution. This normal distribution is then used as a probability density function, the minimum confidence corresponding to a probability of 0.95 is computed, and this minimum confidence is taken as the confidence threshold for detection boxes in that size range. For an actual detection result, the corresponding threshold is looked up according to the detection box size; detections with confidence above the threshold are returned as final results and detections below the threshold are discarded. Finally the frame and the position information of the corresponding birds are stored locally, as shown in Fig. 7, and the locally stored detection results are transmitted to the central server.
The locally stored results in this example are the images that the algorithm has detected as containing birds after denoising; the edge device transmits these results over the network to the central server, helping the central server compile statistics.
The method and the device monitor the protected area by using the camera, detect birds appearing in the video by using the target detection method, reduce the cost of manual detection, use data training in an actual scene, perform preprocessing and data enhancement operation on the data, and improve the detection effect of the model; denoising the detection result to obtain a result in accordance with an actual scene; and the target detection model is deployed on the edge equipment, so that the processing pressure of the central server is relieved.
The above is only a preferred embodiment of the present invention, and the protection scope of the present invention is not limited to the above-mentioned embodiments, and all technical solutions belonging to the idea of the present invention belong to the protection scope of the present invention. It should be noted that modifications and embellishments within the scope of the invention may be made by those skilled in the art without departing from the principle of the invention.

Claims (10)

1. A bird detection system under a field scene, characterized in that the detection method of the bird detection system comprises the following steps:
S1, acquiring bird video data and bird picture data, including unlabeled bird data D_unlabeledbird, a labeled bird picture data set D_labeled1 acquired from publicly labeled bird pictures, and bird-free background video data D_videoenv collected from camera footage;
S2, preprocessing the collected unlabeled bird data D_unlabeledbird with data preprocessing software to obtain a bird picture data set D_labeled2, and preprocessing the collected bird-free background video data D_videoenv with data preprocessing software to obtain a bird-free background picture data set D_negative;
S3, performing data enhancement on the labeled bird picture data sets D_labeled1 and D_labeled2 to obtain an expanded bird data set D_positive;
S4, training a deep learning target detection model Model_bird with the data-enhanced bird data set D_positive and the bird-free background data set D_negative;
S5, deploying the trained target detection model Model_bird on edge equipment in the actual field environment;
S6, acquiring video data of the bird protection area in the field scene with a camera;
S7, detecting targets in the images with the deep learning target detection model Model_bird and returning the detection results to the central server.
2. The bird detection system of claim 1, wherein the unlabeled bird data D_unlabeledbird of step S1 comprise unlabeled bird image data and video data obtained from the Internet, and bird-containing video data shot by a camera.
3. The bird detection system of claim 2, wherein the data preprocessing process of step S2 includes:
designing and using data preprocessing software to extract frames from the bird-free background video data and the unlabeled bird video data, and storing the bird-free background images and the bird-containing images separately, the bird-free background images forming the data set D_negative;
performing quality detection on all unlabeled bird-containing image data and removing images in which the bird target occupies too few pixels;
labeling the quality-checked image data and recording bird bounding-box position information to obtain the labeled bird picture data set D_labeled2.
4. The bird detection system of claim 1, wherein the data enhancement of step S3 includes:
performing data enhancement on the data set by image cropping, in which a rectangular frame smaller than the image is generated, the image is randomly cropped, and the image inside the rectangular frame is taken as training data;
performing data enhancement on the data set by image flipping, in which the image is flipped left and right, and the images before and after flipping are both used as training data;
performing data enhancement on the data set by image whitening, in which the image is normalized to a Gaussian(0, 1) distribution, and the images before and after normalization are both used as training data.
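The three enhancements of claim 4 can each be sketched in a few lines of NumPy; the function names, crop sizes and array shapes below are illustrative assumptions, not part of the claim.

```python
import numpy as np


def random_crop(img, crop_h, crop_w, rng=None):
    """Cut a random crop_h x crop_w window out of an H x W (x C) image."""
    if rng is None:
        rng = np.random.default_rng()
    h, w = img.shape[:2]
    top = rng.integers(0, h - crop_h + 1)
    left = rng.integers(0, w - crop_w + 1)
    return img[top:top + crop_h, left:left + crop_w]


def flip_lr(img):
    """Mirror the image left-right; original and flip are both kept as training data."""
    return img[:, ::-1]


def whiten(img):
    """Normalize pixel values to zero mean and unit variance (Gaussian(0, 1))."""
    img = img.astype(np.float64)
    return (img - img.mean()) / (img.std() + 1e-8)
```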
5. The bird detection system of claim 1, wherein step S4 includes using the data-enhanced data set D_positive and the background data set D_negative to train the target detection network YOLOv4, comprising the following steps:
S41, mixing the data sets D_positive and D_negative to compose the data set D_train;
S42, dividing the data set D_train into a training set, a validation set and a test set;
S43, modifying the YOLOv4 network into a two-class network;
S44, calculating anchor box values according to the data set D_positive and setting them as the YOLOv4 anchor box values;
S45, setting the parameters of the target detection network;
S46, training the target detection network with the training set and the validation set;
S47, verifying the target detection effect with the test set;
S48, repeating steps S45-S47 until suitable parameters are found.
6. The bird detection system of claim 4, wherein in step S44, calculating the anchor box values using the k-means method includes:
counting all the bounding-box information in the data set D_positive, and clustering it into 9 classes using the k-means algorithm; the average bounding-box size within each of the 9 classes is taken as the size of the corresponding anchor box.
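The anchor computation of claim 6 can be sketched as plain k-means over the labelled (width, height) pairs. The Euclidean distance used here is an illustrative assumption (YOLO tooling often clusters with an IoU-based distance instead), and the function name and defaults are ours.

```python
import numpy as np


def kmeans_anchors(box_wh, k=9, iters=100, seed=0):
    """Cluster (N, 2) box widths/heights into k anchors via plain k-means."""
    rng = np.random.default_rng(seed)
    centers = box_wh[rng.choice(len(box_wh), size=k, replace=False)]
    for _ in range(iters):
        # Assign each box to the nearest cluster centre.
        d = np.linalg.norm(box_wh[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Each new centre is the mean box size of its cluster (claim 6).
        new_centers = np.array([
            box_wh[labels == i].mean(axis=0) if np.any(labels == i) else centers[i]
            for i in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # Sort anchors by area, smallest to largest, as YOLO configs expect.
    return centers[np.argsort(centers.prod(axis=1))]
```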
7. The bird detection system of claim 1, wherein the edge device is a gateway comprising CPU and GPU hardware;
the edge device directly controls the camera;
the edge device transmits the results obtained by the deep learning target detection model back to the central server.
8. The bird detection system of claim 1, wherein the camera is deployed in a bird protection area of the field scene;
video stream pushing is performed using ffmpeg and the RTSP protocol;
images in the video stream are acquired using OpenCV.
9. The bird detection system of claim 4, wherein in step S7, using the deep learning target detection model Model_bird to detect targets in images specifically comprises:
sampling the images acquired by the field camera to reduce the computational load on the edge device;
detecting the images with the trained target detection model Model_bird to obtain detection results;
the detection results include whether birds are present in the image, together with the position information and confidence of the birds;
denoising the detection results, filtering out detections caused by noise in the actual scene and detections with low confidence;
locally storing the images in which birds are detected, together with the position information of the birds in the images;
and feeding back the locally stored images containing birds and the corresponding position information to the central server.
10. The bird detection system of claim 8, wherein the denoising process comprises:
detecting the test set with the target detection model Model_bird to obtain detection results;
performing statistical analysis on the detection results, classifying them according to the size of the detection boxes;
standardizing the confidences of the detection boxes within the same size range to form a normal distribution;
taking the normal distribution as a probability density function, calculating the lowest confidence corresponding to a probability of 0.95;
taking the obtained lowest confidence as the confidence threshold for detection boxes in that size range;
and for the actual detection results, finding the corresponding threshold according to the detection-box size, returning detection results with confidence greater than the threshold as final results, and discarding detection results with confidence below the threshold.
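The per-size-range threshold of claim 10 admits a compact sketch using the Python standard library's `NormalDist`. Interpreting "the lowest confidence corresponding to a probability of 0.95" as the quantile above which 95% of the fitted distribution lies is our assumption; the function name is ours.

```python
from statistics import NormalDist


def confidence_threshold(confidences, keep_prob=0.95):
    """Fit a normal distribution to the confidences of one box-size range
    and return the value above which `keep_prob` of that distribution lies."""
    mu = sum(confidences) / len(confidences)
    var = sum((c - mu) ** 2 for c in confidences) / len(confidences)
    return NormalDist(mu, var ** 0.5).inv_cdf(1.0 - keep_prob)
```

At inference time, a detection in this size range would be kept only if its confidence exceeds the returned threshold.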
CN202110344311.XA 2021-03-30 2021-03-30 Bird detection system under field scene Active CN113076860B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110344311.XA CN113076860B (en) 2021-03-30 2021-03-30 Bird detection system under field scene


Publications (2)

Publication Number Publication Date
CN113076860A true CN113076860A (en) 2021-07-06
CN113076860B CN113076860B (en) 2022-02-25

Family

ID=76611769

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110344311.XA Active CN113076860B (en) 2021-03-30 2021-03-30 Bird detection system under field scene

Country Status (1)

Country Link
CN (1) CN113076860B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113486866A (en) * 2021-09-06 2021-10-08 南京天朗防务科技有限公司 Visual analysis method and system for airport bird identification
CN113780063A (en) * 2021-07-27 2021-12-10 深圳泰豪信息技术有限公司 Photovoltaic operation and maintenance control method based on video intelligent analysis

Citations (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108509939A (en) * 2018-04-18 2018-09-07 北京大学深圳研究生院 A kind of birds recognition methods based on deep learning
CN108920855A (en) * 2018-07-12 2018-11-30 西安英特迈思信息科技有限公司 A kind of UAV Intelligent monitoring threshold based on probability density distribution determines method
CN109115770A (en) * 2018-06-14 2019-01-01 中科禾信遥感科技(苏州)有限公司 A kind of a wide range of crops remote-sensing monitoring method and device
CN109241897A (en) * 2018-08-29 2019-01-18 深圳华远云联数据科技有限公司 Processing method, device, gateway and the storage medium of monitoring image
CN109376605A (en) * 2018-09-26 2019-02-22 福州大学 A kind of electric inspection process image bird-resistant fault detection method
CN109615171A (en) * 2018-11-09 2019-04-12 阿里巴巴集团控股有限公司 Characteristic threshold value determines that method and device, problem objects determine method and device
CN109886083A (en) * 2019-01-03 2019-06-14 杭州电子科技大学 A kind of small face detecting method of real-time scene based on deep learning
CN110379143A (en) * 2019-07-12 2019-10-25 中铁工程机械研究设计院有限公司 A kind of alarm system Threshold for mechanical equipment
CN110378420A (en) * 2019-07-19 2019-10-25 Oppo广东移动通信有限公司 A kind of image detecting method, device and computer readable storage medium
CN110399868A (en) * 2018-04-19 2019-11-01 北京大学深圳研究生院 A kind of seashore wetland birds detection method
CN110889841A (en) * 2019-11-28 2020-03-17 江苏电力信息技术有限公司 YOLOv 3-based bird detection algorithm for power transmission line
CN111709421A (en) * 2020-06-18 2020-09-25 深圳市赛为智能股份有限公司 Bird identification method and device, computer equipment and storage medium
CN111709374A (en) * 2020-06-18 2020-09-25 深圳市赛为智能股份有限公司 Bird condition detection method and device, computer equipment and storage medium
CN111753702A (en) * 2020-06-18 2020-10-09 上海高德威智能交通系统有限公司 Target detection method, device and equipment
CN111768246A (en) * 2020-06-30 2020-10-13 创新奇智(北京)科技有限公司 Data processing method, model establishing device and electronic equipment
CN112183191A (en) * 2020-08-18 2021-01-05 天津大学 Method for detecting bird nest in power transmission line based on countermeasure training
CN112395959A (en) * 2020-10-30 2021-02-23 天合云能源互联网技术(杭州)有限公司 Power transformer fault prediction and diagnosis method and system based on audio features


Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
AL-MASNI M A et al.: "Simultaneous detection and classification of breast masses in digital mammograms via a deep learning YOLO-based CAD system", Computer Methods and Programs in Biomedicine *
H. XU et al.: "Performance Comparison of Small Object Detection Algorithms of UAV based Aerial Images", 2020 19th International Symposium on Distributed Computing and Applications for Business Engineering and Science (DCABES) *
I. CHYRKA et al.: "1D direction estimation with a YOLO network", 2019 European Microwave Conference in Central Europe (EuMCE) *
RUAN Jiyang: "Design and Implementation of a YOLO-based Object Detection Algorithm", China Master's Theses Full-text Database (Information Science and Technology) *


Also Published As

Publication number Publication date
CN113076860B (en) 2022-02-25


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CB03 Change of inventor or designer information
Inventor after: Lu Yu; Ramenco; Xiao Weikang; Zhong Zhifang; Yang Shixiong; Wang Yan
Inventor before: Ramenco; Xiao Weikang; Zhong Zhifang; Yang Shixiong; Wang Yan