CN118097721B - Wetland bird recognition method and system based on multi-source remote sensing observation and deep learning - Google Patents


Info

Publication number
CN118097721B
Authority: CN (China)
Prior art keywords: bird, image, orthographic, recognition model, feature map
Prior art date
Legal status: Active
Application number: CN202410530152.6A
Other languages: Chinese (zh)
Other versions: CN118097721A
Inventor
钟顺
黄敏
尚子安
常力书
付宇恒
林珲
Current Assignee: Jiangxi Normal University
Original Assignee: Jiangxi Normal University
Priority date
Filing date
Publication date
Application filed by Jiangxi Normal University
Priority to CN202410530152.6A
Publication of CN118097721A
Application granted
Publication of CN118097721B
Legal status: Active


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; body parts, e.g. hands
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/20 - Image preprocessing
    • G06V10/26 - Segmentation of patterns in the image field; cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; detection of occlusion
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 - Image or video pattern matching; proximity measures in feature spaces
    • G06V10/761 - Proximity, similarity or dissimilarity measures
    • G06V10/77 - Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; blind source separation
    • G06V10/774 - Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
    • G06V10/80 - Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V20/00 - Scenes; scene-specific elements
    • G06V20/10 - Terrestrial scenes
    • G06V20/17 - Terrestrial scenes taken from planes or by drones

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Remote Sensing (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a wetland bird recognition method and system based on multi-source remote sensing observation and deep learning. The method comprises the following steps: acquiring, by top-down unmanned aerial vehicle photography, a plurality of non-repeating orthographic images covering the birds of a wetland area; for each orthographic image, selecting the video frame with the highest similarity to it and unifying their sizes to form a plurality of image pairs; manually labeling the image pairs to obtain a training sample set and a validation sample set; inputting the training sample set into a bird recognition model for model training, and optimizing the model with the validation sample set to obtain a well-fitted bird recognition model; and re-acquiring orthographic images and video frames of the wetland area, processing them as above, and inputting them into the well-fitted model to obtain bird recognition results. By fusing the bird features of the side-view video frames shot by cameras with the bird features of the orthographic images shot by the unmanned aerial vehicle, the method addresses the loss of bird recognition accuracy caused by birds occluding one another in single-view images.

Description

Wetland bird recognition method and system based on multi-source remote sensing observation and deep learning
Technical Field
The invention relates to the fields of remote sensing image processing and geospatial information mining, and in particular to a wetland bird recognition method and system based on multi-source remote sensing observation and deep learning.
Background
Bird identification is a prerequisite for bird protection, and different species can be distinguished effectively from bird images. With the rise of artificial intelligence, deep learning methods have been applied to bird identification: a deep learning model identifies bird species by learning bird features from images. However, wetland birds flock together. When a flock is monitored from only a single viewing angle, the birds occlude one another in the captured images, so the deep learning model cannot accurately learn the features of the occluded birds and recognition accuracy drops.
At present, bird images are mainly acquired with cameras and unmanned aerial vehicles. The side views shot by bird-watching cameras clearly depict a bird's lateral outline and fine details, which aids identification; however, because birds within a flock occlude one another, occluded birds are easily misidentified. Unmanned aerial vehicles, by contrast, capture orthographic (top-down) images free of mutual occlusion, but owing to the viewing angle and distance these images show only the overall shape of the birds, and it is difficult to determine species from orthographic images alone.
Disclosure of Invention
In view of the above, the invention aims to provide a wetland bird recognition method and system based on multi-source remote sensing observation and deep learning, which fuse the bird features of side-view video frames shot by cameras with the bird features of orthographic images shot by an unmanned aerial vehicle, thereby addressing the loss of bird recognition accuracy caused by bird occlusion in single-view images.
The technical scheme adopted by the invention is as follows: the wetland bird recognition method based on multi-source remote sensing observation and deep learning comprises the following steps:
Step S1, acquiring, by top-down unmanned aerial vehicle photography, a plurality of orthographic images covering the birds of a wetland area, synthesizing them into a panoramic orthographic image, and uniformly cropping the panoramic orthographic image to obtain a plurality of non-repeating orthographic images;
Step S2, according to the coordinate and time information of the non-repeating orthographic images obtained in step S1, intercepting, from the videos recorded by the cameras covering the same area as each non-repeating orthographic image, the video frames captured at the same time; performing similarity matching between each non-repeating orthographic image and its candidate video frames and selecting the video frame with the highest similarity to form an image pair; and unifying the sizes of the orthographic image and the video frame in each pair, finally forming a plurality of spatio-temporally corresponding image pairs of the same size;
Step S3, manually labeling the spatio-temporally corresponding image pairs of the same size obtained in step S2 to obtain a training sample set and a validation sample set;
Step S4, inputting the training sample set into the bird recognition model for model training, and optimizing the bird recognition model with the validation sample set to obtain a well-fitted bird recognition model;
Step S5, re-acquiring a plurality of non-repeating orthographic images and video frames covering the wetland area, processing them as in steps S2 and S3, and inputting them into the well-fitted bird recognition model obtained in step S4 to obtain bird recognition results.
Further, in step S1, specifically:
Selecting a period when the cameras of the wetland area are working normally and the birds are at rest, performing route planning and flight tasks over the wetland area with an RTK-enabled unmanned aerial vehicle, acquiring a plurality of overlapping orthographic images covering the birds of the wetland area, generating a panoramic orthographic image with coordinate information using an image processing tool, uniformly cropping the georeferenced panoramic orthographic image to obtain n non-repeating orthographic images, and extracting the coordinate information and shooting time of each orthographic image.
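For illustration, the uniform cropping of the georeferenced panorama in step S1 can be sketched as follows. The tile size, the NumPy array representation, and the discarding of partial edge tiles are assumptions; the patent does not specify the cropping implementation.

```python
import numpy as np

def tile_orthophoto(panorama, tile_h, tile_w):
    """Cut a panoramic orthophoto (H x W [x C] array) into non-overlapping tiles.

    Edge remainders smaller than a full tile are discarded (an assumption).
    Returns (tile, (row_offset, col_offset)) pairs; the offsets stand in for
    the per-tile coordinate information extracted in step S1.
    """
    h, w = panorama.shape[:2]
    tiles = []
    for r in range(0, h - tile_h + 1, tile_h):
        for c in range(0, w - tile_w + 1, tile_w):
            tiles.append((panorama[r:r + tile_h, c:c + tile_w], (r, c)))
    return tiles

# Toy example: a 4x6 single-band "panorama" cut into 2x3 tiles.
pano = np.arange(24).reshape(4, 6)
tiles = tile_orthophoto(pano, 2, 3)
```

Each pixel offset can be converted back to map coordinates through the panorama's georeferencing, which is how the coordinate information of each cropped orthographic image would be retained.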
Further, in step S2, specifically:
Step S21, the wetland area is provided with m cameras of fixed height, which monitor the wetland area from all directions and capture clear, detailed pictures of the birds;
step S22, from the n non-repeating orthographic images obtained in step S1, selecting one orthographic image O_i; according to the coordinate information and shooting time of O_i, intercepting, from the videos recorded by the m cameras, the m video frames covering the same area at the same time;
Step S23, performing similarity matching between the m video frames and the orthographic image O_i using an image similarity matching algorithm, obtaining the video frame V_i with the highest similarity to O_i, forming an image pair (O_i, V_i), and unifying the sizes of the images in the pair;
Step S24, looping through steps S22-S23 to find, for the orthographic images O_1, O_2, ..., O_n, the video frames with the highest similarity V_1, V_2, ..., V_n, finally obtaining the n spatio-temporally corresponding image pairs of the same size (O_1, V_1), (O_2, V_2), ..., (O_n, V_n), where O_i is the i-th orthographic image and V_i is the video frame with the highest similarity to it.
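The frame selection of steps S22-S24 can be sketched as follows. The patent does not name the similarity measure, so a cosine similarity between normalized grayscale histograms is assumed here purely for illustration.

```python
import math

def histogram(img, bins=16):
    # Normalized grayscale histogram; img is a flat list of 0-255 intensities.
    h = [0] * bins
    for px in img:
        h[min(px * bins // 256, bins - 1)] += 1
    total = sum(h)
    return [v / total for v in h]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def best_matching_frame(ortho, frames):
    # Index of the video frame most similar to the orthographic image.
    ho = histogram(ortho)
    sims = [cosine(ho, histogram(f)) for f in frames]
    return max(range(len(sims)), key=sims.__getitem__)

# Toy example: frame 1 has nearly the same intensity distribution as the ortho image.
ortho = [10, 10, 200, 200]
frames = [[0, 0, 0, 0], [12, 9, 198, 205], [255, 255, 255, 255]]
idx = best_matching_frame(ortho, frames)  # -> 1
```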
Further, in step S3, specifically:
Step S31, using data labeling software, labeling the n spatio-temporally corresponding image pairs of the same size obtained in step S24 to obtain the position of each bird in each image pair and the category label of each bird;
Step S32, dividing the n labeled image pairs of the same size into a training sample set and a validation sample set at a ratio of 7:3.
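The 7:3 division of step S32 can be sketched as below; the seeded shuffle is an assumption (the patent specifies only the ratio).

```python
import random

def split_7_3(pairs, seed=0):
    # Shuffle the labeled image pairs, then split 7:3 into training and validation sets.
    pairs = list(pairs)
    random.Random(seed).shuffle(pairs)
    cut = round(len(pairs) * 0.7)
    return pairs[:cut], pairs[cut:]

# Toy example: 10 labeled pairs -> 7 training samples and 3 validation samples.
train, val = split_7_3([f"pair_{i}" for i in range(10)])
```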
Further, in step S4, the bird recognition model comprises an input layer, a convolution layer, a feature fusion layer, a fully connected layer, and an output layer. The training sample set is input into the bird recognition model for model training and the bird recognition model is optimized with the validation sample set, specifically:
Step S41, inputting the training sample set obtained in step S31 through the input layer of the bird recognition model into the convolution layer for feature extraction, obtaining a first bird feature map for each orthographic image and a second bird feature map for each video frame;
The training sample set consists of spatio-temporally corresponding image pairs of the same size (O_1, V_1), (O_2, V_2), ...; inputting them into the convolution layer of the bird recognition model for feature extraction yields feature-map pairs of the same size (A_1, B_1), (A_2, B_2), ..., where A_i is the first bird feature map of the i-th orthographic image and B_i is the second bird feature map of the i-th video frame;
step S42, performing weighted fusion of the first bird feature map of each orthographic image and the second bird feature map of the corresponding video frame in the feature fusion layer of the bird recognition model to obtain a bird feature fusion map;
the first bird feature map of the orthographic image and the second bird feature map of the video frame are weighted and fused according to formula (1):
F_i = α_i · A_i + β_i · B_i    (1)
where F_i is the i-th bird feature fusion map, A_i is the first bird feature map of the i-th orthographic image, B_i is the second bird feature map of the i-th video frame, α_i is the weighting coefficient of the first bird feature map, β_i is the weighting coefficient of the second bird feature map, α_i, β_i ∈ [0, 1], and α_i + β_i = 1;
The weighting coefficient α_i of the first bird feature map of the orthographic image and the weighting coefficient β_i of the second bird feature map of the video frame are obtained through an attention mechanism, as follows:
Step S421, inputting the first bird feature map of the orthographic image and the second bird feature map of the video frame into the pooling layer of the feature fusion layer and performing global average pooling on each, obtaining the global average pooling result g_i^A of the first bird feature map and the global average pooling result g_i^B of the second bird feature map;
Step S422, concatenating g_i^A and g_i^B and inputting them into the fully connected layer of the feature fusion layer to obtain the attention weight w_i^A of the first bird feature map and the attention weight w_i^B of the second bird feature map; the activation function used by the fully connected layer is denoted σ, as shown in formulas (2) and (3):
w_i^A = σ(W_A · [g_i^A; g_i^B] + b_A)    (2)
w_i^B = σ(W_B · [g_i^A; g_i^B] + b_B)    (3)
where W_A, W_B and b_A, b_B are the weights and biases of the fully connected layer; the weights are initialized with a random initialization algorithm, and the biases of the fully connected layer are initialized to a constant;
step S423, normalizing the attention weights w_i^A and w_i^B with the softmax function to obtain the weighting coefficient α_i of the first bird feature map and the weighting coefficient β_i of the second bird feature map; see formula (4):
α_i = exp(w_i^A) / (exp(w_i^A) + exp(w_i^B)),  β_i = exp(w_i^B) / (exp(w_i^A) + exp(w_i^B))    (4)
Step S424, looping the operations of steps S421-S423 over the feature-map pairs of the same size (A_1, B_1), (A_2, B_2), ... obtained in step S41 to obtain the weighting-coefficient pairs (α_1, β_1), (α_2, β_2), ..., where α_i is the weighting coefficient of the first bird feature map A_i of the i-th orthographic image and β_i is the weighting coefficient of the second bird feature map B_i of the i-th video frame; the feature-map pairs and their weighting coefficients are then fused according to formula (1), finally yielding the bird feature fusion maps F_1, F_2, ..., where F_i is the i-th bird feature fusion map;
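Steps S421-S424 can be sketched with NumPy as follows. The channel count, the ReLU activation, and the random weight scale are assumptions (the text specifies only randomly initialized weights, a constant bias, and a normalization of the two attention weights); in the trained model the fully connected weights would be learned rather than drawn at random.

```python
import numpy as np

rng = np.random.default_rng(42)

def attention_fuse(A, B, rng=rng):
    """Fuse an ortho-image feature map A and a video-frame feature map B (both C x H x W).

    Implements: global average pooling -> concatenation -> fully connected
    layer with activation -> softmax over the two attention weights ->
    weighted sum of the two feature maps.
    """
    C = A.shape[0]
    g = np.concatenate([A.mean(axis=(1, 2)), B.mean(axis=(1, 2))])  # GAP results, concatenated
    W = rng.normal(scale=0.1, size=(2, 2 * C))  # randomly initialized weights
    b = np.full(2, 0.1)                         # constant bias initialization
    w = np.maximum(W @ g + b, 0.0)              # attention weights; ReLU assumed
    e = np.exp(w - w.max())
    alpha, beta = e / e.sum()                   # softmax normalization
    return alpha * A + beta * B, alpha, beta    # weighted fusion

A = rng.random((8, 4, 4))  # first bird feature map of an orthographic image
B = rng.random((8, 4, 4))  # second bird feature map of the paired video frame
F, alpha, beta = attention_fuse(A, B)
```

The softmax guarantees that the two coefficients are non-negative and sum to 1, matching the constraint on the weighting coefficients in the fusion formula.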
Step S43, inputting the bird feature fusion maps into the fully connected layer of the bird recognition model to obtain a pre-trained bird recognition model, and optimizing the bird recognition model with the validation sample set of step S3 to obtain the optimized bird recognition model;
step S431, concatenating the bird feature fusion maps F_1, F_2, ... obtained in step S42, where F_i is the i-th bird feature fusion map, and inputting them into the fully connected layer of the bird recognition model for training, obtaining the pre-trained bird recognition model;
Step S432, optimizing the bird recognition model with the validation sample set of step S3, specifically using stochastic gradient descent and a back-propagation mechanism to minimize the loss function, obtaining the well-fitted bird recognition model;
The loss function is the cross-entropy loss function, as shown in formula (5):
L = -(1/N) Σ_{s=1}^{N} Σ_{c=1}^{C} y_{s,c} · log(p_{s,c})    (5)
where L is the cross-entropy loss, N is the number of samples, C is the number of categories, y_{s,c} is the c-th element of the true label vector of sample s, taking the value 1 when the true category of sample s is c and 0 otherwise, and p_{s,c} is the probability, output by the model, that sample s belongs to category c, with p_{s,c} ∈ (0, 1].
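The cross-entropy loss can be checked with a small sketch; the probabilities below are made-up numbers for illustration.

```python
import math

def cross_entropy(y_true, y_pred):
    """Mean cross-entropy over N samples.

    y_true: one-hot label vectors, y_true[s][c] = 1 when sample s's true class is c.
    y_pred: predicted class probabilities from the model's output layer.
    """
    n = len(y_true)
    return -sum(
        y * math.log(p)
        for yt, pt in zip(y_true, y_pred)
        for y, p in zip(yt, pt)
        if y  # only the true-class term of each sample contributes
    ) / n

# Two samples, three bird classes; the model is correct on both.
y_true = [[1, 0, 0], [0, 0, 1]]
y_pred = [[0.8, 0.1, 0.1], [0.2, 0.2, 0.6]]
loss = cross_entropy(y_true, y_pred)  # -(ln 0.8 + ln 0.6)/2 ≈ 0.367
```

A lower loss corresponds to the model assigning higher probability to each sample's true class, which is what the stochastic gradient descent of step S432 minimizes.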
Further, in step S5, specifically:
During a period, as in step S1, when the cameras of the wetland area are working normally and the birds are at rest, performing route planning and flight tasks over the wetland area with the unmanned aerial vehicle again, processing the acquired orthographic images and video frames as in steps S2 and S3 to obtain spatio-temporally corresponding image pairs of the same size, and inputting them into the well-fitted bird recognition model obtained in step S4 to obtain bird recognition results.
Further, the invention adopts another technical scheme: a wetland bird recognition system based on multi-source remote sensing observation and deep learning, mainly comprising the following modules:
An image acquisition module: acquires a plurality of orthographic images covering a wetland area with the unmanned aerial vehicle and, according to the orthographic images, intercepts the corresponding video frames shot by the cameras;
An image processing module: performs similarity matching and size unification on the images acquired by the image acquisition module to obtain spatio-temporally corresponding image pairs of the same size;
A model training module: labels the image pairs obtained by the image processing module, divides them into a training sample set and a validation sample set, inputs the training sample set into the bird recognition model for model training, and optimizes the model with the validation sample set to obtain the well-fitted bird recognition model; the model training module stops working after training and optimization are completed;
A bird recognition module: inputs the spatio-temporally corresponding image pairs of the same size obtained by the image processing module into the well-fitted bird recognition model obtained by the model training module to obtain bird recognition results.
The invention has the following beneficial effects: (1) by fusing the bird features of side-view video frames shot by cameras with the bird features of orthographic images shot by an unmanned aerial vehicle, the advantages of the two platforms complement each other, addressing the loss of bird recognition accuracy caused by bird occlusion in single-view images; (2) by fusing the bird features of the orthographic images and the video frames through an attention mechanism, the deep learning model learns bird features comprehensively and deeply, improving the accuracy and confidence of bird recognition; (3) the method can be widely applied to wetland bird recognition and is of great significance for building wetland bird species libraries and protecting wetland birds.
Drawings
FIG. 1 is a flow chart of the present invention.
FIG. 2 is a schematic diagram of a bird recognition model according to the present invention.
FIG. 3 is a flow chart of bird recognition model training and optimization in accordance with the present invention.
FIG. 4 is a schematic diagram of the system construction of the present invention.
Detailed Description
Referring to fig. 1, the technical scheme adopted by the invention is as follows: the wetland bird recognition method based on multi-source remote sensing observation and deep learning comprises the following steps:
Step S1, acquiring, by top-down unmanned aerial vehicle photography, a plurality of orthographic images covering the birds of a wetland area, synthesizing them into a panoramic orthographic image, and uniformly cropping the panoramic orthographic image to obtain a plurality of non-repeating orthographic images;
Step S2, according to the coordinate and time information of the non-repeating orthographic images obtained in step S1, intercepting, from the videos recorded by the cameras covering the same area as each non-repeating orthographic image, the video frames captured at the same time; performing similarity matching between each non-repeating orthographic image and its candidate video frames and selecting the video frame with the highest similarity to form an image pair; and unifying the sizes of the orthographic image and the video frame in each pair, finally forming a plurality of spatio-temporally corresponding image pairs of the same size;
Step S3, manually labeling the spatio-temporally corresponding image pairs of the same size obtained in step S2 to obtain a training sample set and a validation sample set;
Step S4, inputting the training sample set into the bird recognition model for model training, and optimizing the bird recognition model with the validation sample set to obtain a well-fitted bird recognition model;
Step S5, re-acquiring a plurality of non-repeating orthographic images and video frames covering the wetland area, processing them as in steps S2 and S3, and inputting them into the well-fitted bird recognition model obtained in step S4 to obtain bird recognition results.
Further, in step S1, specifically:
Selecting a period when the cameras of the wetland area are working normally and the birds are at rest, performing route planning and flight tasks over the wetland area with an RTK-enabled unmanned aerial vehicle, acquiring a plurality of overlapping orthographic images covering the birds of the wetland area, generating a panoramic orthographic image with coordinate information using an image processing tool (such as PhotoScan), uniformly cropping the georeferenced panoramic orthographic image to obtain n non-repeating orthographic images, and extracting the coordinate information and shooting time of each orthographic image.
Further, in step S2, specifically:
Step S21, the wetland area is provided with m cameras fixed at a height of 1.7 m, which monitor the wetland area from all directions and capture clear, detailed pictures of the birds;
step S22, from the n non-repeating orthographic images obtained in step S1, selecting one orthographic image O_i; according to the coordinate information and shooting time of O_i, intercepting, from the videos recorded by the m cameras, the m video frames covering the same area at the same time;
Step S23, performing similarity matching between the m video frames and the orthographic image O_i using an image similarity matching algorithm (e.g., a feature matching algorithm), obtaining the video frame V_i with the highest similarity to O_i, forming an image pair (O_i, V_i), and unifying the sizes of the images in the pair;
Step S24, looping through steps S22-S23 to find, for the orthographic images O_1, O_2, ..., O_n, the video frames with the highest similarity V_1, V_2, ..., V_n, finally obtaining the n spatio-temporally corresponding image pairs of the same size (O_1, V_1), (O_2, V_2), ..., (O_n, V_n), where O_i is the i-th orthographic image and V_i is the video frame with the highest similarity to it.
Further, in step S3, specifically:
Step S31, using data labeling software (e.g., LabelImg), labeling the n spatio-temporally corresponding image pairs of the same size obtained in step S24 to obtain the position of each bird in each image pair and the category label of each bird;
Step S32, dividing the n labeled image pairs of the same size into a training sample set and a validation sample set at a ratio of 7:3.
Further, referring to fig. 2, in step S4, the bird recognition model comprises an input layer, a convolution layer, a feature fusion layer, a fully connected layer, and an output layer. The training sample set is input into the bird recognition model for model training and the bird recognition model is optimized with the validation sample set; referring to fig. 3, specifically:
Step S41, inputting the training sample set obtained in step S31 through the input layer of the bird recognition model into the convolution layer for feature extraction, obtaining a first bird feature map for each orthographic image and a second bird feature map for each video frame;
The training sample set consists of spatio-temporally corresponding image pairs of the same size (O_1, V_1), (O_2, V_2), ...; inputting them into the convolution layer of the bird recognition model for feature extraction yields feature-map pairs of the same size (A_1, B_1), (A_2, B_2), ..., where A_i is the first bird feature map of the i-th orthographic image and B_i is the second bird feature map of the i-th video frame;
step S42, performing weighted fusion of the first bird feature map of each orthographic image and the second bird feature map of the corresponding video frame in the feature fusion layer of the bird recognition model to obtain a bird feature fusion map;
the first bird feature map of the orthographic image and the second bird feature map of the video frame are weighted and fused according to formula (1):
F_i = α_i · A_i + β_i · B_i    (1)
where F_i is the i-th bird feature fusion map, A_i is the first bird feature map of the i-th orthographic image, B_i is the second bird feature map of the i-th video frame, α_i is the weighting coefficient of the first bird feature map, β_i is the weighting coefficient of the second bird feature map, α_i, β_i ∈ [0, 1], and α_i + β_i = 1;
Further, the weighting coefficient α_i of the first bird feature map of the orthographic image and the weighting coefficient β_i of the second bird feature map of the video frame are obtained through an attention mechanism, as follows:
Step S421, inputting the first bird feature map of the orthographic image and the second bird feature map of the video frame into the pooling layer of the feature fusion layer and performing global average pooling on each, obtaining the global average pooling result g_i^A of the first bird feature map and the global average pooling result g_i^B of the second bird feature map;
Step S422, concatenating g_i^A and g_i^B and inputting them into the fully connected layer of the feature fusion layer to obtain the attention weight w_i^A of the first bird feature map and the attention weight w_i^B of the second bird feature map; the activation function used by the fully connected layer is denoted σ, as shown in formulas (2) and (3):
w_i^A = σ(W_A · [g_i^A; g_i^B] + b_A)    (2)
w_i^B = σ(W_B · [g_i^A; g_i^B] + b_B)    (3)
where W_A, W_B and b_A, b_B are the weights and biases of the fully connected layer; the weights are initialized with a random initialization algorithm, and the biases of the fully connected layer are initialized to a constant;
step S423, normalizing the attention weights w_i^A and w_i^B with the softmax function to obtain the weighting coefficient α_i of the first bird feature map and the weighting coefficient β_i of the second bird feature map; see formula (4):
α_i = exp(w_i^A) / (exp(w_i^A) + exp(w_i^B)),  β_i = exp(w_i^B) / (exp(w_i^A) + exp(w_i^B))    (4)
Step S424, cycling the operations of steps S421-S423 over the n groups of same-size feature maps (A_1, B_1), (A_2, B_2), ..., (A_n, B_n) obtained in step S41 to obtain n groups of feature map weighting coefficients (α_1, β_1), (α_2, β_2), ..., (α_n, β_n), where α_1 is the weighting coefficient of the bird first feature map A_1 of the first orthographic image, α_2 is the weighting coefficient of the bird first feature map A_2 of the second orthographic image, α_n is the weighting coefficient of the bird first feature map A_n of the n-th orthographic image, β_1 is the weighting coefficient of the bird second feature map B_1 of the first video frame, β_2 is the weighting coefficient of the bird second feature map B_2 of the second video frame, and β_n is the weighting coefficient of the bird second feature map B_n of the n-th video frame; the n groups of same-size feature maps (A_1, B_1), ..., (A_n, B_n) and the n groups of feature map weighting coefficients (α_1, β_1), ..., (α_n, β_n) are respectively weighted and fused according to formula (1) to finally obtain n bird feature fusion maps F_1, F_2, ..., F_n, where F_1 denotes the first bird feature fusion map, F_2 denotes the second bird feature fusion map, and F_n denotes the n-th bird feature fusion map;
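Steps S421-S424 and the weighted fusion of formula (1) can be sketched end to end as follows (a minimal NumPy illustration; the channel count, spatial size, scalar attention outputs, and the values of W_1, W_2, b_1, b_2 are assumptions for demonstration, not the patent's actual configuration):

```python
import numpy as np

def fuse_feature_maps(A, B, W1, W2, b1, b2):
    """Fuse an orthographic-image feature map A and a video-frame feature map B."""
    # Step S421: global average pooling of each feature map (one value per channel).
    u = A.mean(axis=(1, 2))
    v = B.mean(axis=(1, 2))
    g = np.concatenate([u, v])            # Step S422: connect the pooled results
    s = max(0.0, float(W1 @ g + b1))      # formula (2): ReLU attention weight of A
    t = max(0.0, float(W2 @ g + b2))      # formula (3): ReLU attention weight of B
    es, et = np.exp(s), np.exp(t)         # Step S423 / formula (4): Softmax
    alpha, beta = es / (es + et), et / (es + et)
    return alpha * A + beta * B, alpha, beta   # formula (1): weighted fusion

rng = np.random.default_rng(0)
A = rng.random((8, 16, 16))               # hypothetical 8-channel feature maps
B = rng.random((8, 16, 16))
W1, W2 = rng.random(16) * 0.1, rng.random(16) * 0.1
F, alpha, beta = fuse_feature_maps(A, B, W1, W2, 0.01, 0.01)
```

By construction the Softmax of step S423 guarantees α_i + β_i = 1, so the fusion map F stays on the same scale as the input feature maps.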
Step S43, inputting the bird feature fusion map into a full-connection layer of a bird recognition model to obtain a pre-trained bird recognition model, and optimizing the bird recognition model by using the verification sample set in the step S3 to obtain an optimized bird recognition model;
Step S431, connecting the n bird feature fusion maps F_1, F_2, ..., F_n obtained in step S42 and inputting them into a fully connected layer of the bird recognition model for training to obtain a pre-trained bird recognition model, where F_1 is the first bird feature fusion map, F_2 is the second bird feature fusion map, and F_n is the n-th bird feature fusion map;
Step S432, optimizing the bird recognition model with the verification sample set from step S3, specifically adopting stochastic gradient descent and a back-propagation mechanism to drive the loss function to its minimum value, thereby obtaining a bird recognition model with high fitting degree;
The loss function uses a cross entropy loss function, as shown in formula (5);
L = -(1/N) ∑_{i=1}^{N} ∑_{c=1}^{C} y_{i,c} · log(p_{i,c}) (5);
where L is the cross entropy loss function, N is the number of samples, C is the number of categories, y_{i,c} is the value of the c-th element of the true label vector of the i-th sample, taking 1 when category c is the correct prediction for sample i and 0 otherwise; and p_{i,c} is the probability value, output by the model, of predicting sample i as category c.
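Formula (5) together with the stochastic-gradient-descent and back-propagation optimization of step S432 can be illustrated with a toy example (a NumPy sketch of a softmax classifier; the feature dimension, two-class labels, learning rate, and full-batch updates are simplifying assumptions, not the patent's actual model):

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)      # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(p, y):
    """Formula (5): L = -(1/N) * sum_i sum_c y[i,c] * log(p[i,c])."""
    return float(-np.log(p[np.arange(len(y)), y]).mean())

rng = np.random.default_rng(1)
X = rng.random((60, 10))                      # 60 toy fused-feature vectors
y = (X[:, 0] > 0.5).astype(int)               # hypothetical 2-class labels
W, b = np.zeros((10, 2)), np.zeros(2)

lr = 0.5
for _ in range(200):                          # gradient-descent loop
    p = softmax(X @ W + b)                    # forward pass
    g = p.copy()
    g[np.arange(len(y)), y] -= 1.0            # back-propagated gradient w.r.t. logits
    g /= len(y)
    W -= lr * X.T @ g                         # parameter updates
    b -= lr * g.sum(axis=0)

initial = cross_entropy(softmax(X @ np.zeros((10, 2)) + np.zeros(2)), y)
final = cross_entropy(softmax(X @ W + b), y)
```

With zero-initialized parameters the loss starts at log 2 ≈ 0.693 for two classes and decreases as the gradient updates fit the labels, which is the behaviour step S432 relies on.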
Further, step S5 is specifically as follows:
In a time period, as in step S1, when the cameras of a certain wetland area are working normally and birds are resting, the unmanned aerial vehicle again performs route planning and flight tasks over the wetland area; the obtained orthographic images and video frames are processed as in step S2 and step S3 to obtain a plurality of groups of image pairs of the same size corresponding to space-time positions, and these are input into the bird recognition model with high fitting degree obtained in step S4 to obtain the bird recognition results.
Further, the invention also provides a technical solution: referring to fig. 4, the wetland bird recognition system based on multi-source remote sensing observation and deep learning mainly comprises the following modules:
An image acquisition module; the image acquisition module acquires a plurality of orthographic images covering a certain wetland by using the unmanned aerial vehicle, and intercepts a plurality of corresponding video frames shot by a camera according to the orthographic images;
An image processing module; the image processing module is used for performing similarity matching and uniform size operation on the images acquired by the image acquisition module to obtain a plurality of groups of image pairs with the same size corresponding to the space-time positions;
A model training module; the model training module comprises the steps of marking samples of a plurality of groups of image pairs with the same size and corresponding to space-time positions, which are obtained by the image processing module, dividing a training sample set and a verification sample set, inputting the training sample set into a bird recognition model for model training, and optimizing the model by using the verification sample set to obtain the bird recognition model with high fitting degree; the model training module stops working after model training and optimization are completed;
A bird recognition module; the bird recognition module inputs a plurality of groups of image pairs with the same size, which are corresponding to the space-time positions and are obtained by the image processing module, and a bird recognition model with high fitting degree is obtained by the model training module, so that a bird recognition result is obtained.
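The cooperation of the four modules can be sketched in a few lines of Python (a hypothetical skeleton for illustration only; every class and method name here is an assumption, and the real modules would wrap the UAV, camera, and deep-learning components described above):

```python
# Hypothetical skeleton of the four-module system; all names are illustrative.
class ImageAcquisitionModule:
    def acquire(self):
        # Would trigger the UAV flight and camera capture; stubbed here.
        return [("ortho_0.tif", "frame_0.png")]

class ImageProcessingModule:
    def process(self, raw_pairs):
        # Similarity matching and the uniform size operation would happen here.
        return [{"pair": p, "size": (512, 512)} for p in raw_pairs]

class ModelTrainingModule:
    def train(self, samples):
        # Would label the samples, split 7:3, train, and validate the model;
        # returns a stand-in recognition model and then stops working.
        self.trained = True
        return lambda pair: "bird"

class BirdRecognitionModule:
    def recognize(self, model, samples):
        return [model(s["pair"]) for s in samples]

acq, proc = ImageAcquisitionModule(), ImageProcessingModule()
trainer, recognizer = ModelTrainingModule(), BirdRecognitionModule()
samples = proc.process(acq.acquire())
model = trainer.train(samples)
results = recognizer.recognize(model, samples)
```

The one-way data flow (acquisition → processing → training → recognition) mirrors the module descriptions above, including the detail that the training module is only used until the optimized model is produced.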
The foregoing examples illustrate only a few embodiments of the invention and are described in detail herein, but they are not to be construed as limiting the scope of the invention. It should be noted that modifications and improvements can be made by those skilled in the art without departing from the spirit of the invention, and such modifications and improvements fall within the scope of the invention. Accordingly, the scope of protection of the present invention shall be determined by the appended claims.

Claims (6)

1. A wetland bird recognition method based on multi-source remote sensing observation and deep learning, characterized in that the method comprises the following steps:
Step S1, acquiring a plurality of orthographic images of birds covering a certain wetland area by overlooking shooting of an unmanned aerial vehicle, synthesizing a panoramic orthographic image, and uniformly cutting the panoramic orthographic image to obtain a plurality of non-repeated orthographic images;
Step S2, according to the coordinate information and the time information of the plurality of non-repeated orthographic images obtained in the step S1, intercepting a plurality of video frames at the same time in the video images recorded by the plurality of cameras in the same range corresponding to each non-repeated orthographic image, performing similarity matching on the plurality of video frames corresponding to each non-repeated orthographic image, selecting one video frame with the highest similarity with each non-repeated orthographic image, forming an image pair, unifying the sizes of the orthographic images and the video frames in each group of image pairs, and finally forming a plurality of groups of image pairs with the same size corresponding to space-time positions;
S3, manually labeling the image pairs with the same size corresponding to the plurality of groups of space-time positions obtained in the step S2 to obtain a training sample set and a verification sample set;
S4, inputting a training sample set into the bird recognition model for model training, and optimizing the bird recognition model by using a verification sample set to obtain the bird recognition model with high fitting degree;
S5, re-acquiring a plurality of non-repeated orthographic images and video frames covering the wetland area, and inputting the bird recognition model with high fitting degree obtained in the step S4 after the processing of the step S2 and the step S3 to obtain a bird recognition result;
In step S4, the bird recognition model includes an input layer, a convolution layer, a feature fusion layer, a full connection layer, and an output layer, the training sample set is input into the bird recognition model to perform model training, and the bird recognition model is optimized by using the verification sample set, specifically:
S41, inputting the training sample set obtained in the step S31 into a convolution layer through an input layer of a bird recognition model to perform feature extraction, so as to obtain a bird first feature map of an orthographic image and a bird second feature map of a video frame;
The training sample set is n groups of image pairs P_1, P_2, ..., P_n of the same size corresponding to space-time positions; they are input into the convolution layer of the bird recognition model for feature extraction to obtain n groups of same-size feature maps (A_1, B_1), (A_2, B_2), ..., (A_n, B_n), where A_1 is the bird first feature map of the first orthographic image, A_2 is the bird first feature map of the second orthographic image, A_n is the bird first feature map of the n-th orthographic image, B_1 is the bird second feature map of the first video frame, B_2 is the bird second feature map of the second video frame, and B_n is the bird second feature map of the n-th video frame;
step S42, carrying out weighted fusion on the bird first feature map of the orthographic image and the bird second feature map of the video frame by utilizing a feature fusion layer of the bird recognition model to obtain a bird feature fusion map;
weighting and fusing the bird first feature map of the orthographic image and the bird second feature map of the video frame according to the formula (1);
F_i = α_i · A_i + β_i · B_i (1);

where F_i is the i-th bird feature fusion map, A_i is the bird first feature map of the i-th orthographic image, B_i is the bird second feature map of the i-th video frame, α_i is the weighting coefficient of the bird first feature map of the i-th orthographic image, β_i is the weighting coefficient of the bird second feature map of the i-th video frame, and α_i + β_i = 1;
The weighting coefficient α_i of the bird first feature map of the orthographic image and the weighting coefficient β_i of the bird second feature map of the video frame are obtained through an attention mechanism, with the following specific steps:
Step S421, inputting the bird first feature map of the orthographic image and the bird second feature map of the video frame into a pooling layer in the feature fusion layer, and performing a global average pooling operation on each to obtain the global average pooling result u_i of the bird first feature map and the global average pooling result v_i of the bird second feature map;
Step S422, connecting the global average pooling result u_i of the bird first feature map with the global average pooling result v_i of the bird second feature map, and inputting the connected result into a fully connected layer of the feature fusion layer to obtain the attention weight s_i of the bird first feature map and the attention weight t_i of the bird second feature map; the activation function used by the fully connected layer is the ReLU function, as shown in formula (2) and formula (3);

s_i = ReLU(W_1 · [u_i; v_i] + b_1) (2);

t_i = ReLU(W_2 · [u_i; v_i] + b_2) (3);

where W_1, W_2 and b_1, b_2 are the weights and biases of the fully connected layer; the weights are initialized using a random initialization algorithm, and the biases of the fully connected layer are initialized to a constant;
Step S423, normalizing the attention weight s_i of the bird first feature map and the attention weight t_i of the bird second feature map through a Softmax function to obtain the weighting coefficient α_i of the bird first feature map and the weighting coefficient β_i of the bird second feature map, as shown in formula (4);

α_i = e^{s_i} / (e^{s_i} + e^{t_i}), β_i = e^{t_i} / (e^{s_i} + e^{t_i}) (4);
Step S424, cycling the operations of steps S421-S423 over the n groups of same-size feature maps (A_1, B_1), (A_2, B_2), ..., (A_n, B_n) obtained in step S41 to obtain n groups of feature map weighting coefficients (α_1, β_1), (α_2, β_2), ..., (α_n, β_n), where α_1 is the weighting coefficient of the bird first feature map A_1 of the first orthographic image, α_2 is the weighting coefficient of the bird first feature map A_2 of the second orthographic image, α_n is the weighting coefficient of the bird first feature map A_n of the n-th orthographic image, β_1 is the weighting coefficient of the bird second feature map B_1 of the first video frame, β_2 is the weighting coefficient of the bird second feature map B_2 of the second video frame, and β_n is the weighting coefficient of the bird second feature map B_n of the n-th video frame; the n groups of same-size feature maps (A_1, B_1), ..., (A_n, B_n) and the n groups of feature map weighting coefficients (α_1, β_1), ..., (α_n, β_n) are respectively weighted and fused according to formula (1) to finally obtain n bird feature fusion maps F_1, F_2, ..., F_n, where F_1 denotes the first bird feature fusion map, F_2 denotes the second bird feature fusion map, and F_n denotes the n-th bird feature fusion map;
Step S43, inputting the bird feature fusion map into a full-connection layer of a bird recognition model to obtain a pre-trained bird recognition model, and optimizing the bird recognition model by using the verification sample set in the step S3 to obtain an optimized bird recognition model;
Step S431, connecting the n bird feature fusion maps F_1, F_2, ..., F_n obtained in step S42 and inputting them into a fully connected layer of the bird recognition model for training to obtain a pre-trained bird recognition model, where F_1 is the first bird feature fusion map, F_2 is the second bird feature fusion map, and F_n is the n-th bird feature fusion map;
Step S432, optimizing the bird recognition model with the verification sample set from step S3, specifically adopting stochastic gradient descent and a back-propagation mechanism to drive the loss function to its minimum value, thereby obtaining a bird recognition model with high fitting degree;
The loss function uses a cross entropy loss function, as shown in formula (5);
L = -(1/N) ∑_{i=1}^{N} ∑_{c=1}^{C} y_{i,c} · log(p_{i,c}) (5);

where L is the cross entropy loss function, N is the number of samples, C is the number of categories, y_{i,c} is the value of the c-th element of the true label vector of the i-th sample, taking 1 when category c is the correct prediction for sample i and 0 otherwise; and p_{i,c} is the probability value, output by the model, of predicting sample i as category c.
2. The wetland bird recognition method based on multi-source remote sensing observation and deep learning as claimed in claim 1, characterized in that step S1 is specifically as follows:
Selecting a time period in which the cameras of a certain wetland area are working normally and birds are resting, performing route planning and flight tasks over the wetland area with an RTK-enabled unmanned aerial vehicle, acquiring a plurality of overlapping orthographic images covering the birds of the wetland area, generating a panoramic orthographic image with coordinate information using an image processing tool, uniformly cutting the panoramic orthographic image with coordinate information to obtain n non-repeated orthographic images O_1, O_2, ..., O_n, and extracting the coordinate information and shooting time information of each orthographic image.
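The uniform cutting of the panoramic orthographic image into non-repeated (non-overlapping) orthographic images can be sketched as follows (an illustrative Python sketch; the tile size and the choice to discard edge remainders are assumptions, not requirements of the claim):

```python
def crop_tiles(pano_h, pano_w, tile):
    """Uniformly cut a panoramic orthographic image of pano_h x pano_w pixels
    into non-overlapping tile x tile patches, returning their pixel offsets.
    Edge remainders smaller than a full tile are simply discarded here."""
    return [(r, c)
            for r in range(0, pano_h - tile + 1, tile)
            for c in range(0, pano_w - tile + 1, tile)]

# A hypothetical 1024 x 1536 panorama cut into 512 x 512 tiles -> 2 x 3 grid.
offsets = crop_tiles(1024, 1536, 512)
```

Because the steps between offsets equal the tile size, no two tiles overlap, which is what makes the resulting orthographic images "non-repeated".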
3. The wetland bird recognition method based on multi-source remote sensing observation and deep learning as claimed in claim 2, characterized in that step S2 is specifically as follows:
Step S21, m cameras are arranged in the wetland area; each camera is fixed in height, monitors the wetland area in an omnibearing manner, and shoots clear bird detail pictures;
Step S22, selecting one non-repeated orthographic image O_i from the n non-repeated orthographic images obtained in step S1, and, according to the coordinate information and shooting time information of the orthographic image O_i, intercepting m video frames V_i^1, V_i^2, ..., V_i^m of the same range and the same time from the video images recorded by the m cameras;
Step S23, performing similarity matching between the m video frames V_i^1, V_i^2, ..., V_i^m and the orthographic image O_i using an image similarity matching algorithm, obtaining the one video frame V_i with the highest similarity to the orthographic image O_i, forming a group of image pairs P_i = (O_i, V_i), and performing a uniform size operation on the image pair P_i;
Step S24, cycling through step S22 and step S23 to find the video frames V_1, V_2, ..., V_n with the highest similarity to the orthographic images O_1, O_2, ..., O_n, finally obtaining n groups of image pairs P_1, P_2, ..., P_n of the same size corresponding to space-time positions, where O_1 is the first orthographic image, O_2 is the second orthographic image, O_n is the n-th orthographic image, V_1 is the video frame with the highest similarity corresponding to the first orthographic image, V_2 is the video frame with the highest similarity corresponding to the second orthographic image, and V_n is the video frame with the highest similarity corresponding to the n-th orthographic image.
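The claim leaves the image similarity matching algorithm of step S23 unspecified; one simple stand-in is a grey-level histogram comparison (an illustrative Python sketch, not the patented algorithm; the image sizes, bin count, and cosine measure are all assumptions):

```python
import numpy as np

def histogram_similarity(img_a, img_b, bins=32):
    """Cosine similarity of grey-level histograms -- one possible stand-in
    for the unspecified image similarity matching algorithm of step S23."""
    h_a, _ = np.histogram(img_a, bins=bins, range=(0, 256), density=True)
    h_b, _ = np.histogram(img_b, bins=bins, range=(0, 256), density=True)
    denom = np.linalg.norm(h_a) * np.linalg.norm(h_b)
    return float(h_a @ h_b / denom) if denom else 0.0

def best_matching_frame(ortho, frames):
    """Step S23: index of the video frame most similar to the orthographic image."""
    scores = [histogram_similarity(ortho, f) for f in frames]
    return int(np.argmax(scores))

rng = np.random.default_rng(2)
ortho = rng.integers(0, 128, (64, 64))        # hypothetical low-brightness tile
frames = [255 - ortho,                        # inverted: dissimilar histogram
          rng.integers(0, 256, (64, 64)),     # unrelated frame
          ortho.copy()]                       # identical content
```

An identical frame scores 1.0, so `best_matching_frame` selects it over the inverted and unrelated candidates.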
4. The wetland bird recognition method based on multi-source remote sensing observation and deep learning as claimed in claim 3, characterized in that step S3 is specifically as follows:
Step S31, labeling the n groups of image pairs of the same size corresponding to space-time positions obtained in step S24 using data labeling software, to obtain the position of each bird in the image pairs and the category label of each bird;
Step S32, dividing the labeled n groups of images of the same size corresponding to space-time positions into a training sample set and a verification sample set in a ratio of 7:3, to obtain n_t groups of training samples and n_v groups of verification samples, where n_t + n_v = n.
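The 7:3 division of step S32 can be sketched as follows (a minimal Python illustration; the random shuffling before splitting is an assumption not stated in the claim):

```python
import numpy as np

def split_7_3(pairs, seed=0):
    """Step S32: split the labeled image pairs 7:3 into a training sample set
    and a verification sample set (shuffled first -- an assumption)."""
    idx = np.random.default_rng(seed).permutation(len(pairs))
    cut = int(round(0.7 * len(pairs)))
    return [pairs[i] for i in idx[:cut]], [pairs[i] for i in idx[cut:]]

train, val = split_7_3(list(range(10)))   # 10 toy "image pairs"
```

Every pair lands in exactly one of the two sets, so n_t + n_v = n as required.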
5. The wetland bird recognition method based on multi-source remote sensing observation and deep learning as claimed in claim 4, characterized in that step S5 is specifically as follows:
In a time period, as in step S1, when the cameras of a certain wetland area are working normally and birds are resting, the unmanned aerial vehicle again performs route planning and flight tasks over the wetland area; the obtained orthographic images and video frames are processed as in step S2 and step S3 to obtain a plurality of groups of image pairs of the same size corresponding to space-time positions, and these are input into the bird recognition model with high fitting degree obtained in step S4 to obtain the bird recognition results.
6. A wetland bird recognition system based on multi-source remote sensing observation and deep learning, applied to the wetland bird recognition method based on multi-source remote sensing observation and deep learning, characterized in that the system mainly comprises the following modules:
An image acquisition module; the image acquisition module acquires a plurality of orthographic images covering a certain wetland by using the unmanned aerial vehicle, and intercepts a plurality of corresponding video frames shot by a camera according to the orthographic images;
An image processing module; the image processing module is used for performing similarity matching and uniform size operation on the images acquired by the image acquisition module to obtain a plurality of groups of image pairs with the same size corresponding to the space-time positions;
A model training module; the model training module comprises the steps of marking samples of a plurality of groups of image pairs with the same size and corresponding to space-time positions, which are obtained by the image processing module, dividing a training sample set and a verification sample set, inputting the training sample set into a bird recognition model for model training, and optimizing the model by using the verification sample set to obtain the bird recognition model with high fitting degree; the model training module stops working after model training and optimization are completed;
A bird recognition module; the bird recognition module inputs a plurality of groups of image pairs with the same size, which are corresponding to the space-time positions and are obtained by the image processing module, and a bird recognition model with high fitting degree is obtained by the model training module, so that a bird recognition result is obtained.
CN202410530152.6A 2024-04-29 2024-04-29 Wetland bird recognition method and system based on multi-source remote sensing observation and deep learning Active CN118097721B (en)


Publications (2)

Publication Number Publication Date
CN118097721A CN118097721A (en) 2024-05-28
CN118097721B (en) 2024-06-25

Family

ID=91142528





Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant