CN118097721B - Wetland bird recognition method and system based on multi-source remote sensing observation and deep learning - Google Patents


Info

Publication number
CN118097721B
Authority: CN (China)
Prior art keywords: bird, image, orthographic, recognition model, feature map
Prior art date
Legal status: Active
Application number: CN202410530152.6A
Other languages: Chinese (zh)
Other versions: CN118097721A
Inventor
钟顺
黄敏
尚子安
常力书
付宇恒
林珲
Current Assignee: Jiangxi Normal University
Original Assignee: Jiangxi Normal University
Priority date
Filing date
Publication date
Application filed by Jiangxi Normal University
Priority to CN202410530152.6A
Publication of CN118097721A
Application granted
Publication of CN118097721B
Legal status: Active


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; body parts, e.g. hands
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/20 - Image preprocessing
    • G06V10/26 - Segmentation of patterns in the image field; cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; detection of occlusion
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 - Image or video pattern matching; proximity measures in feature spaces
    • G06V10/761 - Proximity, similarity or dissimilarity measures
    • G06V10/77 - Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; blind source separation
    • G06V10/774 - Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
    • G06V10/80 - Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V20/00 - Scenes; scene-specific elements
    • G06V20/10 - Terrestrial scenes
    • G06V20/17 - Terrestrial scenes taken from planes or by drones

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Remote Sensing (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a wetland bird recognition method and system based on multi-source remote sensing observation and deep learning. The method comprises the following steps: acquiring, by top-down unmanned aerial vehicle photography, a plurality of non-repeating orthographic images covering the birds of a wetland area; for each orthographic image, selecting the video frame with the highest similarity to it and unifying their sizes to form a plurality of image pairs; manually labeling the image pairs to obtain a training sample set and a validation sample set; inputting the training sample set into a bird recognition model for model training, and optimizing the model with the validation sample set to obtain a well-fitted bird recognition model; and re-acquiring orthographic images and video frames of the wetland area, processing them as above, and inputting them into the well-fitted model to obtain bird recognition results. By fusing the bird features of the side-view video frames shot by cameras with the bird features of the orthographic images shot by the unmanned aerial vehicle, the method addresses the loss of bird recognition accuracy caused by birds occluding one another in single-view images.

Description

Wetland bird recognition method and system based on multi-source remote sensing observation and deep learning
Technical Field
The invention relates to the fields of remote sensing image processing and geospatial information mining, and in particular to a wetland bird recognition method and system based on multi-source remote sensing observation and deep learning.
Background
Bird identification is a prerequisite for bird protection, and different species can be distinguished effectively from bird images. With the rise of artificial intelligence, deep learning methods have been applied to bird identification: a deep learning model identifies bird species by learning bird features from images. However, wetland birds flock together. When a flock is monitored from only a single viewing angle, the birds occlude one another in the captured images, so the deep learning model cannot accurately learn the features of the occluded birds and recognition accuracy drops.
At present, bird images are mainly acquired with cameras and unmanned aerial vehicles. The side views shot by bird-watching cameras clearly depict a bird's lateral outline and fine details, which aids identification; however, because birds within a flock occlude one another, occluded birds are easily misidentified. Unmanned aerial vehicles, by contrast, capture orthographic (top-down) images free of mutual occlusion, but owing to the viewing angle and distance these images show only the overall shape of the birds, and it is difficult to determine species from orthographic images alone.
Disclosure of Invention
In view of the above, the invention aims to provide a wetland bird recognition method and system based on multi-source remote sensing observation and deep learning, which fuse the bird features of side-view video frames shot by cameras with the bird features of orthographic images shot by an unmanned aerial vehicle, thereby addressing the loss of bird recognition accuracy caused by bird occlusion in single-view images.
The technical scheme adopted by the invention is as follows: the wetland bird recognition method based on multi-source remote sensing observation and deep learning comprises the following steps:
Step S1, acquiring, by top-down unmanned aerial vehicle photography, a plurality of orthographic images covering the birds of a wetland area, synthesizing them into a panoramic orthographic image, and uniformly cropping the panoramic orthographic image to obtain a plurality of non-repeating orthographic images;
Step S2, according to the coordinate and time information of the non-repeating orthographic images obtained in step S1, intercepting, from the videos recorded by the cameras covering the same area as each non-repeating orthographic image, the video frames captured at the same time; performing similarity matching between each non-repeating orthographic image and its candidate video frames and selecting the video frame with the highest similarity to form an image pair; and unifying the sizes of the orthographic image and the video frame in each pair, finally forming a plurality of spatio-temporally corresponding image pairs of the same size;
Step S3, manually labeling the spatio-temporally corresponding image pairs of the same size obtained in step S2 to obtain a training sample set and a validation sample set;
Step S4, inputting the training sample set into the bird recognition model for model training, and optimizing the bird recognition model with the validation sample set to obtain a well-fitted bird recognition model;
Step S5, re-acquiring a plurality of non-repeating orthographic images and video frames covering the wetland area, processing them as in steps S2 and S3, and inputting them into the well-fitted bird recognition model obtained in step S4 to obtain bird recognition results.
Further, in step S1, specifically:
Selecting a period when the cameras of the wetland area are working normally and the birds are at rest, performing route planning and flight tasks over the wetland area with an RTK-enabled unmanned aerial vehicle, acquiring a plurality of overlapping orthographic images covering the birds of the wetland area, generating a panoramic orthographic image with coordinate information using an image processing tool, uniformly cropping the georeferenced panoramic orthographic image to obtain n non-repeating orthographic images, and extracting the coordinate information and shooting time of each orthographic image.
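For illustration, the uniform cropping of the georeferenced panorama in step S1 can be sketched as follows. The tile size, the NumPy array representation, and the discarding of partial edge tiles are assumptions; the patent does not specify the cropping implementation.

```python
import numpy as np

def tile_orthophoto(panorama, tile_h, tile_w):
    """Cut a panoramic orthophoto (H x W [x C] array) into non-overlapping tiles.

    Edge remainders smaller than a full tile are discarded (an assumption).
    Returns (tile, (row_offset, col_offset)) pairs; the offsets stand in for
    the per-tile coordinate information extracted in step S1.
    """
    h, w = panorama.shape[:2]
    tiles = []
    for r in range(0, h - tile_h + 1, tile_h):
        for c in range(0, w - tile_w + 1, tile_w):
            tiles.append((panorama[r:r + tile_h, c:c + tile_w], (r, c)))
    return tiles

# Toy example: a 4x6 single-band "panorama" cut into 2x3 tiles.
pano = np.arange(24).reshape(4, 6)
tiles = tile_orthophoto(pano, 2, 3)
```

Each pixel offset can be converted back to map coordinates through the panorama's georeferencing, which is how the coordinate information of each cropped orthographic image would be retained.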
Further, in step S2, specifically:
Step S21, the wetland area is provided with m cameras of fixed height, which monitor the wetland area from all directions and capture clear, detailed pictures of the birds;
step S22, from the n non-repeating orthographic images obtained in step S1, selecting one orthographic image O_i; according to the coordinate information and shooting time of O_i, intercepting, from the videos recorded by the m cameras, the m video frames covering the same area at the same time;
Step S23, performing similarity matching between the m video frames and the orthographic image O_i using an image similarity matching algorithm, obtaining the video frame V_i with the highest similarity to O_i, forming an image pair (O_i, V_i), and unifying the sizes of the images in the pair;
Step S24, looping through steps S22-S23 to find, for the orthographic images O_1, O_2, ..., O_n, the video frames with the highest similarity V_1, V_2, ..., V_n, finally obtaining the n spatio-temporally corresponding image pairs of the same size (O_1, V_1), (O_2, V_2), ..., (O_n, V_n), where O_i is the i-th orthographic image and V_i is the video frame with the highest similarity to it.
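The frame selection of steps S22-S24 can be sketched as follows. The patent does not name the similarity measure, so a cosine similarity between normalized grayscale histograms is assumed here purely for illustration.

```python
import math

def histogram(img, bins=16):
    # Normalized grayscale histogram; img is a flat list of 0-255 intensities.
    h = [0] * bins
    for px in img:
        h[min(px * bins // 256, bins - 1)] += 1
    total = sum(h)
    return [v / total for v in h]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def best_matching_frame(ortho, frames):
    # Index of the video frame most similar to the orthographic image.
    ho = histogram(ortho)
    sims = [cosine(ho, histogram(f)) for f in frames]
    return max(range(len(sims)), key=sims.__getitem__)

# Toy example: frame 1 has nearly the same intensity distribution as the ortho image.
ortho = [10, 10, 200, 200]
frames = [[0, 0, 0, 0], [12, 9, 198, 205], [255, 255, 255, 255]]
idx = best_matching_frame(ortho, frames)  # -> 1
```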
Further, in step S3, specifically:
Step S31, using data labeling software, labeling the n spatio-temporally corresponding image pairs of the same size obtained in step S24 to obtain the position of each bird in each image pair and the category label of each bird;
Step S32, dividing the n labeled image pairs of the same size into a training sample set and a validation sample set at a ratio of 7:3.
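The 7:3 division of step S32 can be sketched as below; the seeded shuffle is an assumption (the patent specifies only the ratio).

```python
import random

def split_7_3(pairs, seed=0):
    # Shuffle the labeled image pairs, then split 7:3 into training and validation sets.
    pairs = list(pairs)
    random.Random(seed).shuffle(pairs)
    cut = round(len(pairs) * 0.7)
    return pairs[:cut], pairs[cut:]

# Toy example: 10 labeled pairs -> 7 training samples and 3 validation samples.
train, val = split_7_3([f"pair_{i}" for i in range(10)])
```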
Further, in step S4, the bird recognition model comprises an input layer, a convolution layer, a feature fusion layer, a fully connected layer, and an output layer. The training sample set is input into the bird recognition model for model training and the bird recognition model is optimized with the validation sample set, specifically:
Step S41, inputting the training sample set obtained in step S31 through the input layer of the bird recognition model into the convolution layer for feature extraction, obtaining a first bird feature map for each orthographic image and a second bird feature map for each video frame;
The training sample set consists of spatio-temporally corresponding image pairs of the same size (O_1, V_1), (O_2, V_2), ...; inputting them into the convolution layer of the bird recognition model for feature extraction yields feature-map pairs of the same size (A_1, B_1), (A_2, B_2), ..., where A_i is the first bird feature map of the i-th orthographic image and B_i is the second bird feature map of the i-th video frame;
step S42, performing weighted fusion of the first bird feature map of each orthographic image and the second bird feature map of the corresponding video frame in the feature fusion layer of the bird recognition model to obtain a bird feature fusion map;
the first bird feature map of the orthographic image and the second bird feature map of the video frame are weighted and fused according to formula (1):
F_i = α_i · A_i + β_i · B_i    (1)
where F_i is the i-th bird feature fusion map, A_i is the first bird feature map of the i-th orthographic image, B_i is the second bird feature map of the i-th video frame, α_i is the weighting coefficient of the first bird feature map, β_i is the weighting coefficient of the second bird feature map, α_i, β_i ∈ [0, 1], and α_i + β_i = 1;
The weighting coefficient α_i of the first bird feature map of the orthographic image and the weighting coefficient β_i of the second bird feature map of the video frame are obtained through an attention mechanism, as follows:
Step S421, inputting the first bird feature map of the orthographic image and the second bird feature map of the video frame into the pooling layer of the feature fusion layer and performing global average pooling on each, obtaining the global average pooling result g_i^A of the first bird feature map and the global average pooling result g_i^B of the second bird feature map;
Step S422, concatenating g_i^A and g_i^B and inputting them into the fully connected layer of the feature fusion layer to obtain the attention weight w_i^A of the first bird feature map and the attention weight w_i^B of the second bird feature map; the activation function used by the fully connected layer is denoted σ, as shown in formulas (2) and (3):
w_i^A = σ(W_A · [g_i^A; g_i^B] + b_A)    (2)
w_i^B = σ(W_B · [g_i^A; g_i^B] + b_B)    (3)
where W_A, W_B and b_A, b_B are the weights and biases of the fully connected layer; the weights are initialized with a random initialization algorithm, and the biases of the fully connected layer are initialized to a constant;
step S423, normalizing the attention weights w_i^A and w_i^B with the softmax function to obtain the weighting coefficient α_i of the first bird feature map and the weighting coefficient β_i of the second bird feature map; see formula (4):
α_i = exp(w_i^A) / (exp(w_i^A) + exp(w_i^B)),  β_i = exp(w_i^B) / (exp(w_i^A) + exp(w_i^B))    (4)
Step S424, looping the operations of steps S421-S423 over the feature-map pairs of the same size (A_1, B_1), (A_2, B_2), ... obtained in step S41 to obtain the weighting-coefficient pairs (α_1, β_1), (α_2, β_2), ..., where α_i is the weighting coefficient of the first bird feature map A_i of the i-th orthographic image and β_i is the weighting coefficient of the second bird feature map B_i of the i-th video frame; the feature-map pairs and their weighting coefficients are then fused according to formula (1), finally yielding the bird feature fusion maps F_1, F_2, ..., where F_i is the i-th bird feature fusion map;
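Steps S421-S424 can be sketched with NumPy as follows. The channel count, the ReLU activation, and the random weight scale are assumptions (the text specifies only randomly initialized weights, a constant bias, and a normalization of the two attention weights); in the trained model the fully connected weights would be learned rather than drawn at random.

```python
import numpy as np

rng = np.random.default_rng(42)

def attention_fuse(A, B, rng=rng):
    """Fuse an ortho-image feature map A and a video-frame feature map B (both C x H x W).

    Implements: global average pooling -> concatenation -> fully connected
    layer with activation -> softmax over the two attention weights ->
    weighted sum of the two feature maps.
    """
    C = A.shape[0]
    g = np.concatenate([A.mean(axis=(1, 2)), B.mean(axis=(1, 2))])  # GAP results, concatenated
    W = rng.normal(scale=0.1, size=(2, 2 * C))  # randomly initialized weights
    b = np.full(2, 0.1)                         # constant bias initialization
    w = np.maximum(W @ g + b, 0.0)              # attention weights; ReLU assumed
    e = np.exp(w - w.max())
    alpha, beta = e / e.sum()                   # softmax normalization
    return alpha * A + beta * B, alpha, beta    # weighted fusion

A = rng.random((8, 4, 4))  # first bird feature map of an orthographic image
B = rng.random((8, 4, 4))  # second bird feature map of the paired video frame
F, alpha, beta = attention_fuse(A, B)
```

The softmax guarantees that the two coefficients are non-negative and sum to 1, matching the constraint on the weighting coefficients in the fusion formula.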
Step S43, inputting the bird feature fusion maps into the fully connected layer of the bird recognition model to obtain a pre-trained bird recognition model, and optimizing the bird recognition model with the validation sample set of step S3 to obtain the optimized bird recognition model;
step S431, concatenating the bird feature fusion maps F_1, F_2, ... obtained in step S42, where F_i is the i-th bird feature fusion map, and inputting them into the fully connected layer of the bird recognition model for training, obtaining the pre-trained bird recognition model;
Step S432, optimizing the bird recognition model with the validation sample set of step S3, specifically using stochastic gradient descent and a back-propagation mechanism to minimize the loss function, obtaining the well-fitted bird recognition model;
The loss function is the cross-entropy loss function, as shown in formula (5):
L = -(1/N) Σ_{s=1}^{N} Σ_{c=1}^{C} y_{s,c} · log(p_{s,c})    (5)
where L is the cross-entropy loss, N is the number of samples, C is the number of categories, y_{s,c} is the c-th element of the true label vector of sample s, taking the value 1 when the true category of sample s is c and 0 otherwise, and p_{s,c} is the probability, output by the model, that sample s belongs to category c, with p_{s,c} ∈ (0, 1].
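The cross-entropy loss can be checked with a small sketch; the probabilities below are made-up numbers for illustration.

```python
import math

def cross_entropy(y_true, y_pred):
    """Mean cross-entropy over N samples.

    y_true: one-hot label vectors, y_true[s][c] = 1 when sample s's true class is c.
    y_pred: predicted class probabilities from the model's output layer.
    """
    n = len(y_true)
    return -sum(
        y * math.log(p)
        for yt, pt in zip(y_true, y_pred)
        for y, p in zip(yt, pt)
        if y  # only the true-class term of each sample contributes
    ) / n

# Two samples, three bird classes; the model is correct on both.
y_true = [[1, 0, 0], [0, 0, 1]]
y_pred = [[0.8, 0.1, 0.1], [0.2, 0.2, 0.6]]
loss = cross_entropy(y_true, y_pred)  # -(ln 0.8 + ln 0.6)/2 ≈ 0.367
```

A lower loss corresponds to the model assigning higher probability to each sample's true class, which is what the stochastic gradient descent of step S432 minimizes.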
Further, in step S5, specifically:
During a period, as in step S1, when the cameras of the wetland area are working normally and the birds are at rest, performing route planning and flight tasks over the wetland area with the unmanned aerial vehicle again, processing the acquired orthographic images and video frames as in steps S2 and S3 to obtain spatio-temporally corresponding image pairs of the same size, and inputting them into the well-fitted bird recognition model obtained in step S4 to obtain bird recognition results.
Further, the invention adopts another technical scheme: a wetland bird recognition system based on multi-source remote sensing observation and deep learning, mainly comprising the following modules:
An image acquisition module: acquires a plurality of orthographic images covering a wetland area with the unmanned aerial vehicle and, according to the orthographic images, intercepts the corresponding video frames shot by the cameras;
An image processing module: performs similarity matching and size unification on the images acquired by the image acquisition module to obtain spatio-temporally corresponding image pairs of the same size;
A model training module: labels the image pairs obtained by the image processing module, divides them into a training sample set and a validation sample set, inputs the training sample set into the bird recognition model for model training, and optimizes the model with the validation sample set to obtain the well-fitted bird recognition model; the model training module stops working after training and optimization are completed;
A bird recognition module: inputs the spatio-temporally corresponding image pairs of the same size obtained by the image processing module into the well-fitted bird recognition model obtained by the model training module to obtain bird recognition results.
The invention has the following beneficial effects: (1) by fusing the bird features of side-view video frames shot by cameras with the bird features of orthographic images shot by an unmanned aerial vehicle, the advantages of the two platforms complement each other, addressing the loss of bird recognition accuracy caused by bird occlusion in single-view images; (2) by fusing the bird features of the orthographic images and the video frames through an attention mechanism, the deep learning model learns bird features comprehensively and deeply, improving the accuracy and confidence of bird recognition; (3) the method can be widely applied to wetland bird recognition and is of great significance for building wetland bird species libraries and protecting wetland birds.
Drawings
FIG. 1 is a flow chart of the present invention.
FIG. 2 is a schematic diagram of a bird recognition model according to the present invention.
FIG. 3 is a flow chart of bird recognition model training and optimization in accordance with the present invention.
FIG. 4 is a schematic diagram of the system construction of the present invention.
Detailed Description
Referring to fig. 1, the technical scheme adopted by the invention is as follows: the wetland bird recognition method based on multi-source remote sensing observation and deep learning comprises the following steps:
Step S1, acquiring, by top-down unmanned aerial vehicle photography, a plurality of orthographic images covering the birds of a wetland area, synthesizing them into a panoramic orthographic image, and uniformly cropping the panoramic orthographic image to obtain a plurality of non-repeating orthographic images;
Step S2, according to the coordinate and time information of the non-repeating orthographic images obtained in step S1, intercepting, from the videos recorded by the cameras covering the same area as each non-repeating orthographic image, the video frames captured at the same time; performing similarity matching between each non-repeating orthographic image and its candidate video frames and selecting the video frame with the highest similarity to form an image pair; and unifying the sizes of the orthographic image and the video frame in each pair, finally forming a plurality of spatio-temporally corresponding image pairs of the same size;
Step S3, manually labeling the spatio-temporally corresponding image pairs of the same size obtained in step S2 to obtain a training sample set and a validation sample set;
Step S4, inputting the training sample set into the bird recognition model for model training, and optimizing the bird recognition model with the validation sample set to obtain a well-fitted bird recognition model;
Step S5, re-acquiring a plurality of non-repeating orthographic images and video frames covering the wetland area, processing them as in steps S2 and S3, and inputting them into the well-fitted bird recognition model obtained in step S4 to obtain bird recognition results.
Further, in step S1, specifically:
Selecting a period when the cameras of the wetland area are working normally and the birds are at rest, performing route planning and flight tasks over the wetland area with an RTK-enabled unmanned aerial vehicle, acquiring a plurality of overlapping orthographic images covering the birds of the wetland area, generating a panoramic orthographic image with coordinate information using an image processing tool (such as PhotoScan), uniformly cropping the georeferenced panoramic orthographic image to obtain n non-repeating orthographic images, and extracting the coordinate information and shooting time of each orthographic image.
Further, in step S2, specifically:
Step S21, the wetland area is provided with m cameras fixed at a height of 1.7 m, which monitor the wetland area from all directions and capture clear, detailed pictures of the birds;
step S22, from the n non-repeating orthographic images obtained in step S1, selecting one orthographic image O_i; according to the coordinate information and shooting time of O_i, intercepting, from the videos recorded by the m cameras, the m video frames covering the same area at the same time;
Step S23, performing similarity matching between the m video frames and the orthographic image O_i using an image similarity matching algorithm (e.g., a feature matching algorithm), obtaining the video frame V_i with the highest similarity to O_i, forming an image pair (O_i, V_i), and unifying the sizes of the images in the pair;
Step S24, looping through steps S22-S23 to find, for the orthographic images O_1, O_2, ..., O_n, the video frames with the highest similarity V_1, V_2, ..., V_n, finally obtaining the n spatio-temporally corresponding image pairs of the same size (O_1, V_1), (O_2, V_2), ..., (O_n, V_n), where O_i is the i-th orthographic image and V_i is the video frame with the highest similarity to it.
Further, in step S3, specifically:
Step S31, using data labeling software (e.g., LabelImg), labeling the n spatio-temporally corresponding image pairs of the same size obtained in step S24 to obtain the position of each bird in each image pair and the category label of each bird;
Step S32, dividing the n labeled image pairs of the same size into a training sample set and a validation sample set at a ratio of 7:3.
Further, referring to fig. 2, in step S4, the bird recognition model comprises an input layer, a convolution layer, a feature fusion layer, a fully connected layer, and an output layer. The training sample set is input into the bird recognition model for model training and the bird recognition model is optimized with the validation sample set; referring to fig. 3, specifically:
Step S41, inputting the training sample set obtained in step S31 through the input layer of the bird recognition model into the convolution layer for feature extraction, obtaining a first bird feature map for each orthographic image and a second bird feature map for each video frame;
The training sample set consists of spatio-temporally corresponding image pairs of the same size (O_1, V_1), (O_2, V_2), ...; inputting them into the convolution layer of the bird recognition model for feature extraction yields feature-map pairs of the same size (A_1, B_1), (A_2, B_2), ..., where A_i is the first bird feature map of the i-th orthographic image and B_i is the second bird feature map of the i-th video frame;
step S42, performing weighted fusion of the first bird feature map of each orthographic image and the second bird feature map of the corresponding video frame in the feature fusion layer of the bird recognition model to obtain a bird feature fusion map;
the first bird feature map of the orthographic image and the second bird feature map of the video frame are weighted and fused according to formula (1):
F_i = α_i · A_i + β_i · B_i    (1)
where F_i is the i-th bird feature fusion map, A_i is the first bird feature map of the i-th orthographic image, B_i is the second bird feature map of the i-th video frame, α_i is the weighting coefficient of the first bird feature map, β_i is the weighting coefficient of the second bird feature map, α_i, β_i ∈ [0, 1], and α_i + β_i = 1;
Further, the weighting coefficient α_i of the first bird feature map of the orthographic image and the weighting coefficient β_i of the second bird feature map of the video frame are obtained through an attention mechanism, as follows:
Step S421, inputting the first bird feature map of the orthographic image and the second bird feature map of the video frame into the pooling layer of the feature fusion layer and performing global average pooling on each, obtaining the global average pooling result g_i^A of the first bird feature map and the global average pooling result g_i^B of the second bird feature map;
Step S422, concatenating g_i^A and g_i^B and inputting them into the fully connected layer of the feature fusion layer to obtain the attention weight w_i^A of the first bird feature map and the attention weight w_i^B of the second bird feature map; the activation function used by the fully connected layer is denoted σ, as shown in formulas (2) and (3):
w_i^A = σ(W_A · [g_i^A; g_i^B] + b_A)    (2)
w_i^B = σ(W_B · [g_i^A; g_i^B] + b_B)    (3)
where W_A, W_B and b_A, b_B are the weights and biases of the fully connected layer; the weights are initialized with a random initialization algorithm, and the biases of the fully connected layer are initialized to a constant;
step S423, normalizing the attention weights w_i^A and w_i^B with the softmax function to obtain the weighting coefficient α_i of the first bird feature map and the weighting coefficient β_i of the second bird feature map; see formula (4):
α_i = exp(w_i^A) / (exp(w_i^A) + exp(w_i^B)),  β_i = exp(w_i^B) / (exp(w_i^A) + exp(w_i^B))    (4)
Step S424, cycling the operations of steps S421-S423 over the n groups of same-size feature maps (A_1, B_1), (A_2, B_2), ..., (A_n, B_n) obtained in step S41 to obtain n groups of feature map weighting coefficients (α_1, β_1), (α_2, β_2), ..., (α_n, β_n), where α_1 is the weighting coefficient of the bird first feature map A_1 of the first orthographic image, α_2 is the weighting coefficient of the bird first feature map A_2 of the second orthographic image, α_n is the weighting coefficient of the bird first feature map A_n of the n-th orthographic image, β_1 is the weighting coefficient of the bird second feature map B_1 of the first video frame, β_2 is the weighting coefficient of the bird second feature map B_2 of the second video frame, and β_n is the weighting coefficient of the bird second feature map B_n of the n-th video frame; the n groups of same-size feature maps (A_1, B_1), ..., (A_n, B_n) and the n groups of feature map weighting coefficients (α_1, β_1), ..., (α_n, β_n) are respectively weighted and fused according to formula (1) to finally obtain n bird feature fusion maps F_1, F_2, ..., F_n, where F_1 denotes the first bird feature fusion map, F_2 denotes the second bird feature fusion map, and F_n denotes the n-th bird feature fusion map;
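Steps S421-S424 and the weighted fusion of formula (1) can be sketched end to end as follows (a minimal NumPy illustration; the channel count, spatial size, scalar attention outputs, and the values of W_1, W_2, b_1, b_2 are assumptions for demonstration, not the patent's actual configuration):

```python
import numpy as np

def fuse_feature_maps(A, B, W1, W2, b1, b2):
    """Fuse an orthographic-image feature map A and a video-frame feature map B."""
    # Step S421: global average pooling of each feature map (one value per channel).
    u = A.mean(axis=(1, 2))
    v = B.mean(axis=(1, 2))
    g = np.concatenate([u, v])            # Step S422: connect the pooled results
    s = max(0.0, float(W1 @ g + b1))      # formula (2): ReLU attention weight of A
    t = max(0.0, float(W2 @ g + b2))      # formula (3): ReLU attention weight of B
    es, et = np.exp(s), np.exp(t)         # Step S423 / formula (4): Softmax
    alpha, beta = es / (es + et), et / (es + et)
    return alpha * A + beta * B, alpha, beta   # formula (1): weighted fusion

rng = np.random.default_rng(0)
A = rng.random((8, 16, 16))               # hypothetical 8-channel feature maps
B = rng.random((8, 16, 16))
W1, W2 = rng.random(16) * 0.1, rng.random(16) * 0.1
F, alpha, beta = fuse_feature_maps(A, B, W1, W2, 0.01, 0.01)
```

By construction the Softmax of step S423 guarantees α_i + β_i = 1, so the fusion map F stays on the same scale as the input feature maps.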
Step S43, inputting the bird feature fusion map into a full-connection layer of a bird recognition model to obtain a pre-trained bird recognition model, and optimizing the bird recognition model by using the verification sample set in the step S3 to obtain an optimized bird recognition model;
Step S431, connecting the n bird feature fusion maps F_1, F_2, ..., F_n obtained in step S42 and inputting them into a fully connected layer of the bird recognition model for training to obtain a pre-trained bird recognition model, where F_1 is the first bird feature fusion map, F_2 is the second bird feature fusion map, and F_n is the n-th bird feature fusion map;
Step S432, optimizing the bird recognition model with the verification sample set from step S3, specifically adopting stochastic gradient descent and a back-propagation mechanism to drive the loss function to its minimum value, thereby obtaining a bird recognition model with high fitting degree;
The loss function uses a cross entropy loss function, as shown in formula (5);
L = -(1/N) ∑_{i=1}^{N} ∑_{c=1}^{C} y_{i,c} · log(p_{i,c}) (5);
where L is the cross entropy loss function, N is the number of samples, C is the number of categories, y_{i,c} is the value of the c-th element of the true label vector of the i-th sample, taking 1 when category c is the correct prediction for sample i and 0 otherwise; and p_{i,c} is the probability value, output by the model, of predicting sample i as category c.
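Formula (5) together with the stochastic-gradient-descent and back-propagation optimization of step S432 can be illustrated with a toy example (a NumPy sketch of a softmax classifier; the feature dimension, two-class labels, learning rate, and full-batch updates are simplifying assumptions, not the patent's actual model):

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)      # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(p, y):
    """Formula (5): L = -(1/N) * sum_i sum_c y[i,c] * log(p[i,c])."""
    return float(-np.log(p[np.arange(len(y)), y]).mean())

rng = np.random.default_rng(1)
X = rng.random((60, 10))                      # 60 toy fused-feature vectors
y = (X[:, 0] > 0.5).astype(int)               # hypothetical 2-class labels
W, b = np.zeros((10, 2)), np.zeros(2)

lr = 0.5
for _ in range(200):                          # gradient-descent loop
    p = softmax(X @ W + b)                    # forward pass
    g = p.copy()
    g[np.arange(len(y)), y] -= 1.0            # back-propagated gradient w.r.t. logits
    g /= len(y)
    W -= lr * X.T @ g                         # parameter updates
    b -= lr * g.sum(axis=0)

initial = cross_entropy(softmax(X @ np.zeros((10, 2)) + np.zeros(2)), y)
final = cross_entropy(softmax(X @ W + b), y)
```

With zero-initialized parameters the loss starts at log 2 ≈ 0.693 for two classes and decreases as the gradient updates fit the labels, which is the behaviour step S432 relies on.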
Further, step S5 is specifically as follows:
In a time period, as in step S1, when the cameras of a certain wetland area are working normally and birds are resting, the unmanned aerial vehicle again performs route planning and flight tasks over the wetland area; the obtained orthographic images and video frames are processed as in step S2 and step S3 to obtain a plurality of groups of image pairs of the same size corresponding to space-time positions, and these are input into the bird recognition model with high fitting degree obtained in step S4 to obtain the bird recognition results.
Further, the invention also provides a technical solution: referring to fig. 4, the wetland bird recognition system based on multi-source remote sensing observation and deep learning mainly comprises the following modules:
An image acquisition module; the image acquisition module acquires a plurality of orthographic images covering a certain wetland by using the unmanned aerial vehicle, and intercepts a plurality of corresponding video frames shot by a camera according to the orthographic images;
An image processing module; the image processing module is used for performing similarity matching and uniform size operation on the images acquired by the image acquisition module to obtain a plurality of groups of image pairs with the same size corresponding to the space-time positions;
A model training module; the model training module comprises the steps of marking samples of a plurality of groups of image pairs with the same size and corresponding to space-time positions, which are obtained by the image processing module, dividing a training sample set and a verification sample set, inputting the training sample set into a bird recognition model for model training, and optimizing the model by using the verification sample set to obtain the bird recognition model with high fitting degree; the model training module stops working after model training and optimization are completed;
A bird recognition module; the bird recognition module inputs a plurality of groups of image pairs with the same size, which are corresponding to the space-time positions and are obtained by the image processing module, and a bird recognition model with high fitting degree is obtained by the model training module, so that a bird recognition result is obtained.
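The cooperation of the four modules can be sketched in a few lines of Python (a hypothetical skeleton for illustration only; every class and method name here is an assumption, and the real modules would wrap the UAV, camera, and deep-learning components described above):

```python
# Hypothetical skeleton of the four-module system; all names are illustrative.
class ImageAcquisitionModule:
    def acquire(self):
        # Would trigger the UAV flight and camera capture; stubbed here.
        return [("ortho_0.tif", "frame_0.png")]

class ImageProcessingModule:
    def process(self, raw_pairs):
        # Similarity matching and the uniform size operation would happen here.
        return [{"pair": p, "size": (512, 512)} for p in raw_pairs]

class ModelTrainingModule:
    def train(self, samples):
        # Would label the samples, split 7:3, train, and validate the model;
        # returns a stand-in recognition model and then stops working.
        self.trained = True
        return lambda pair: "bird"

class BirdRecognitionModule:
    def recognize(self, model, samples):
        return [model(s["pair"]) for s in samples]

acq, proc = ImageAcquisitionModule(), ImageProcessingModule()
trainer, recognizer = ModelTrainingModule(), BirdRecognitionModule()
samples = proc.process(acq.acquire())
model = trainer.train(samples)
results = recognizer.recognize(model, samples)
```

The one-way data flow (acquisition → processing → training → recognition) mirrors the module descriptions above, including the detail that the training module is only used until the optimized model is produced.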
The foregoing examples illustrate only a few embodiments of the invention and are described in detail herein, but they are not to be construed as limiting the scope of the invention. It should be noted that modifications and improvements can be made by those skilled in the art without departing from the spirit of the invention, and such modifications and improvements fall within the scope of the invention. Accordingly, the scope of protection of the present invention shall be determined by the appended claims.

Claims (6)

1. A wetland bird recognition method based on multi-source remote sensing observation and deep learning, characterized in that the method comprises the following steps:
Step S1, acquiring a plurality of orthographic images of birds covering a certain wetland area by overlooking shooting of an unmanned aerial vehicle, synthesizing a panoramic orthographic image, and uniformly cutting the panoramic orthographic image to obtain a plurality of non-repeated orthographic images;
Step S2, according to the coordinate information and the time information of the plurality of non-repeated orthographic images obtained in the step S1, intercepting a plurality of video frames at the same time in the video images recorded by the plurality of cameras in the same range corresponding to each non-repeated orthographic image, performing similarity matching on the plurality of video frames corresponding to each non-repeated orthographic image, selecting one video frame with the highest similarity with each non-repeated orthographic image, forming an image pair, unifying the sizes of the orthographic images and the video frames in each group of image pairs, and finally forming a plurality of groups of image pairs with the same size corresponding to space-time positions;
S3, manually labeling the image pairs with the same size corresponding to the plurality of groups of space-time positions obtained in the step S2 to obtain a training sample set and a verification sample set;
S4, inputting a training sample set into the bird recognition model for model training, and optimizing the bird recognition model by using a verification sample set to obtain the bird recognition model with high fitting degree;
S5, re-acquiring a plurality of non-repeated orthographic images and video frames covering the wetland area, and inputting the bird recognition model with high fitting degree obtained in the step S4 after the processing of the step S2 and the step S3 to obtain a bird recognition result;
In step S4, the bird recognition model includes an input layer, a convolution layer, a feature fusion layer, a full connection layer, and an output layer, the training sample set is input into the bird recognition model to perform model training, and the bird recognition model is optimized by using the verification sample set, specifically:
S41, inputting the training sample set obtained in the step S31 into a convolution layer through an input layer of a bird recognition model to perform feature extraction, so as to obtain a bird first feature map of an orthographic image and a bird second feature map of a video frame;
The training sample set is n groups of image pairs P_1, P_2, ..., P_n of the same size corresponding to space-time positions; they are input into the convolution layer of the bird recognition model for feature extraction to obtain n groups of same-size feature maps (A_1, B_1), (A_2, B_2), ..., (A_n, B_n), where A_1 is the bird first feature map of the first orthographic image, A_2 is the bird first feature map of the second orthographic image, A_n is the bird first feature map of the n-th orthographic image, B_1 is the bird second feature map of the first video frame, B_2 is the bird second feature map of the second video frame, and B_n is the bird second feature map of the n-th video frame;
step S42, carrying out weighted fusion on the bird first feature map of the orthographic image and the bird second feature map of the video frame by utilizing a feature fusion layer of the bird recognition model to obtain a bird feature fusion map;
weighting and fusing the bird first feature map of the orthographic image and the bird second feature map of the video frame according to the formula (1);
F_i = α_i · A_i + β_i · B_i (1);

where F_i is the i-th bird feature fusion map, A_i is the bird first feature map of the i-th orthographic image, B_i is the bird second feature map of the i-th video frame, α_i is the weighting coefficient of the bird first feature map of the i-th orthographic image, β_i is the weighting coefficient of the bird second feature map of the i-th video frame, and α_i + β_i = 1;
The weighting coefficient α_i of the bird first feature map of the orthographic image and the weighting coefficient β_i of the bird second feature map of the video frame are obtained through an attention mechanism, with the following specific steps:
Step S421, inputting the bird first feature map of the orthographic image and the bird second feature map of the video frame into a pooling layer in the feature fusion layer, and performing a global average pooling operation on each to obtain the global average pooling result u_i of the bird first feature map and the global average pooling result v_i of the bird second feature map;
Step S422, connecting the global average pooling result u_i of the bird first feature map with the global average pooling result v_i of the bird second feature map, and inputting the connected result into a fully connected layer of the feature fusion layer to obtain the attention weight s_i of the bird first feature map and the attention weight t_i of the bird second feature map; the activation function used by the fully connected layer is the ReLU function, as shown in formula (2) and formula (3);

s_i = ReLU(W_1 · [u_i; v_i] + b_1) (2);

t_i = ReLU(W_2 · [u_i; v_i] + b_2) (3);

where W_1, W_2 and b_1, b_2 are the weights and biases of the fully connected layer; the weights are initialized using a random initialization algorithm, and the biases of the fully connected layer are initialized to a constant;
Step S423, normalizing the attention weight s_i of the bird first feature map and the attention weight t_i of the bird second feature map through a Softmax function to obtain the weighting coefficient α_i of the bird first feature map and the weighting coefficient β_i of the bird second feature map, as shown in formula (4);

α_i = e^{s_i} / (e^{s_i} + e^{t_i}), β_i = e^{t_i} / (e^{s_i} + e^{t_i}) (4);
Step S424, cycling the operations of steps S421-S423 over the n groups of same-size feature maps (A_1, B_1), (A_2, B_2), ..., (A_n, B_n) obtained in step S41 to obtain n groups of feature map weighting coefficients (α_1, β_1), (α_2, β_2), ..., (α_n, β_n), where α_1 is the weighting coefficient of the bird first feature map A_1 of the first orthographic image, α_2 is the weighting coefficient of the bird first feature map A_2 of the second orthographic image, α_n is the weighting coefficient of the bird first feature map A_n of the n-th orthographic image, β_1 is the weighting coefficient of the bird second feature map B_1 of the first video frame, β_2 is the weighting coefficient of the bird second feature map B_2 of the second video frame, and β_n is the weighting coefficient of the bird second feature map B_n of the n-th video frame; the n groups of same-size feature maps (A_1, B_1), ..., (A_n, B_n) and the n groups of feature map weighting coefficients (α_1, β_1), ..., (α_n, β_n) are respectively weighted and fused according to formula (1) to finally obtain n bird feature fusion maps F_1, F_2, ..., F_n, where F_1 denotes the first bird feature fusion map, F_2 denotes the second bird feature fusion map, and F_n denotes the n-th bird feature fusion map;
Step S43, inputting the bird feature fusion map into a full-connection layer of a bird recognition model to obtain a pre-trained bird recognition model, and optimizing the bird recognition model by using the verification sample set in the step S3 to obtain an optimized bird recognition model;
Step S431, connecting the n bird feature fusion maps F_1, F_2, ..., F_n obtained in step S42 and inputting them into a fully connected layer of the bird recognition model for training to obtain a pre-trained bird recognition model, where F_1 is the first bird feature fusion map, F_2 is the second bird feature fusion map, and F_n is the n-th bird feature fusion map;
Step S432, optimizing the bird recognition model with the verification sample set from step S3, specifically adopting stochastic gradient descent and a back-propagation mechanism to drive the loss function to its minimum value, thereby obtaining a bird recognition model with high fitting degree;
The loss function uses a cross entropy loss function, as shown in formula (5);
L = -(1/N) ∑_{i=1}^{N} ∑_{c=1}^{C} y_{i,c} · log(p_{i,c}) (5);

where L is the cross entropy loss function, N is the number of samples, C is the number of categories, y_{i,c} is the value of the c-th element of the true label vector of the i-th sample, taking 1 when category c is the correct prediction for sample i and 0 otherwise; and p_{i,c} is the probability value, output by the model, of predicting sample i as category c.
2. The wetland bird recognition method based on multi-source remote sensing observation and deep learning as claimed in claim 1, characterized in that step S1 is specifically as follows:
Selecting a time period in which the cameras of a certain wetland area are working normally and birds are resting, performing route planning and flight tasks over the wetland area with an RTK-enabled unmanned aerial vehicle, acquiring a plurality of overlapping orthographic images covering the birds of the wetland area, generating a panoramic orthographic image with coordinate information using an image processing tool, uniformly cutting the panoramic orthographic image with coordinate information to obtain n non-repeated orthographic images O_1, O_2, ..., O_n, and extracting the coordinate information and shooting time information of each orthographic image.
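The uniform cutting of the panoramic orthographic image into non-repeated (non-overlapping) orthographic images can be sketched as follows (an illustrative Python sketch; the tile size and the choice to discard edge remainders are assumptions, not requirements of the claim):

```python
def crop_tiles(pano_h, pano_w, tile):
    """Uniformly cut a panoramic orthographic image of pano_h x pano_w pixels
    into non-overlapping tile x tile patches, returning their pixel offsets.
    Edge remainders smaller than a full tile are simply discarded here."""
    return [(r, c)
            for r in range(0, pano_h - tile + 1, tile)
            for c in range(0, pano_w - tile + 1, tile)]

# A hypothetical 1024 x 1536 panorama cut into 512 x 512 tiles -> 2 x 3 grid.
offsets = crop_tiles(1024, 1536, 512)
```

Because the steps between offsets equal the tile size, no two tiles overlap, which is what makes the resulting orthographic images "non-repeated".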
3. The wetland bird recognition method based on multi-source remote sensing observation and deep learning as claimed in claim 2, characterized in that step S2 is specifically as follows:
Step S21, m cameras are arranged in the wetland area; each camera is fixed in height, monitors the wetland area in an omnibearing manner, and shoots clear bird detail pictures;
Step S22, selecting one non-repeated orthographic image O_i from the n non-repeated orthographic images obtained in step S1, and, according to the coordinate information and shooting time information of the orthographic image O_i, intercepting m video frames V_i^1, V_i^2, ..., V_i^m of the same range and the same time from the video images recorded by the m cameras;
Step S23, performing similarity matching between the m video frames V_i^1, V_i^2, ..., V_i^m and the orthographic image O_i using an image similarity matching algorithm, obtaining the one video frame V_i with the highest similarity to the orthographic image O_i, forming a group of image pairs P_i = (O_i, V_i), and performing a uniform size operation on the image pair P_i;
Step S24, cycling through step S22 and step S23 to find the video frames V_1, V_2, ..., V_n with the highest similarity to the orthographic images O_1, O_2, ..., O_n, finally obtaining n groups of image pairs P_1, P_2, ..., P_n of the same size corresponding to space-time positions, where O_1 is the first orthographic image, O_2 is the second orthographic image, O_n is the n-th orthographic image, V_1 is the video frame with the highest similarity corresponding to the first orthographic image, V_2 is the video frame with the highest similarity corresponding to the second orthographic image, and V_n is the video frame with the highest similarity corresponding to the n-th orthographic image.
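The claim leaves the image similarity matching algorithm of step S23 unspecified; one simple stand-in is a grey-level histogram comparison (an illustrative Python sketch, not the patented algorithm; the image sizes, bin count, and cosine measure are all assumptions):

```python
import numpy as np

def histogram_similarity(img_a, img_b, bins=32):
    """Cosine similarity of grey-level histograms -- one possible stand-in
    for the unspecified image similarity matching algorithm of step S23."""
    h_a, _ = np.histogram(img_a, bins=bins, range=(0, 256), density=True)
    h_b, _ = np.histogram(img_b, bins=bins, range=(0, 256), density=True)
    denom = np.linalg.norm(h_a) * np.linalg.norm(h_b)
    return float(h_a @ h_b / denom) if denom else 0.0

def best_matching_frame(ortho, frames):
    """Step S23: index of the video frame most similar to the orthographic image."""
    scores = [histogram_similarity(ortho, f) for f in frames]
    return int(np.argmax(scores))

rng = np.random.default_rng(2)
ortho = rng.integers(0, 128, (64, 64))        # hypothetical low-brightness tile
frames = [255 - ortho,                        # inverted: dissimilar histogram
          rng.integers(0, 256, (64, 64)),     # unrelated frame
          ortho.copy()]                       # identical content
```

An identical frame scores 1.0, so `best_matching_frame` selects it over the inverted and unrelated candidates.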
4. The wetland bird recognition method based on multi-source remote sensing observation and deep learning as claimed in claim 3, characterized in that step S3 is specifically as follows:
Step S31, labeling the n groups of image pairs of the same size corresponding to space-time positions obtained in step S24 using data labeling software, to obtain the position of each bird in the image pairs and the category label of each bird;
Step S32, dividing the labeled n groups of images of the same size corresponding to space-time positions into a training sample set and a verification sample set in a ratio of 7:3, to obtain n_t groups of training samples and n_v groups of verification samples, where n_t + n_v = n.
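The 7:3 division of step S32 can be sketched as follows (a minimal Python illustration; the random shuffling before splitting is an assumption not stated in the claim):

```python
import numpy as np

def split_7_3(pairs, seed=0):
    """Step S32: split the labeled image pairs 7:3 into a training sample set
    and a verification sample set (shuffled first -- an assumption)."""
    idx = np.random.default_rng(seed).permutation(len(pairs))
    cut = int(round(0.7 * len(pairs)))
    return [pairs[i] for i in idx[:cut]], [pairs[i] for i in idx[cut:]]

train, val = split_7_3(list(range(10)))   # 10 toy "image pairs"
```

Every pair lands in exactly one of the two sets, so n_t + n_v = n as required.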
5. The wetland bird recognition method based on multi-source remote sensing observation and deep learning as claimed in claim 4, characterized in that step S5 is specifically as follows:
In a time period, as in step S1, when the cameras of a certain wetland area are working normally and birds are resting, the unmanned aerial vehicle again performs route planning and flight tasks over the wetland area; the obtained orthographic images and video frames are processed as in step S2 and step S3 to obtain a plurality of groups of image pairs of the same size corresponding to space-time positions, and these are input into the bird recognition model with high fitting degree obtained in step S4 to obtain the bird recognition results.
6. A wetland bird recognition system based on multi-source remote sensing observation and deep learning, applied to the wetland bird recognition method based on multi-source remote sensing observation and deep learning, characterized in that the system mainly comprises the following modules:
An image acquisition module; the image acquisition module acquires a plurality of orthographic images covering a certain wetland by using the unmanned aerial vehicle, and intercepts a plurality of corresponding video frames shot by a camera according to the orthographic images;
An image processing module; the image processing module is used for performing similarity matching and uniform size operation on the images acquired by the image acquisition module to obtain a plurality of groups of image pairs with the same size corresponding to the space-time positions;
A model training module; the model training module comprises the steps of marking samples of a plurality of groups of image pairs with the same size and corresponding to space-time positions, which are obtained by the image processing module, dividing a training sample set and a verification sample set, inputting the training sample set into a bird recognition model for model training, and optimizing the model by using the verification sample set to obtain the bird recognition model with high fitting degree; the model training module stops working after model training and optimization are completed;
A bird recognition module; the bird recognition module inputs a plurality of groups of image pairs with the same size, which are corresponding to the space-time positions and are obtained by the image processing module, and a bird recognition model with high fitting degree is obtained by the model training module, so that a bird recognition result is obtained.
CN202410530152.6A 2024-04-29 2024-04-29 Wetland bird recognition method and system based on multi-source remote sensing observation and deep learning Active CN118097721B (en)


Publications (2)

Publication Number Publication Date
CN118097721A CN118097721A (en) 2024-05-28
CN118097721B (en) 2024-06-25

Family

ID=91142528





Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant