CN114419444A - Lightweight high-resolution bird group identification method based on deep learning network - Google Patents

Lightweight high-resolution bird group identification method based on deep learning network

Info

Publication number
CN114419444A
CN114419444A (application CN202210078169.3A)
Authority
CN
China
Prior art keywords
network
bird
resolution
fidt
bird group
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210078169.3A
Other languages
Chinese (zh)
Inventor
Sun Hui (孙辉)
Shi Yulong (史玉龙)
Wang Rui (王蕊)
Current Assignee
Civil Aviation University of China
Original Assignee
Civil Aviation University of China
Priority date
Filing date
Publication date
Application filed by Civil Aviation University of China
Priority to CN202210078169.3A
Publication of CN114419444A
Legal status: Pending

Classifications

    • G (PHYSICS) › G06 (COMPUTING; CALCULATING OR COUNTING) › G06F (ELECTRIC DIGITAL DATA PROCESSING) › G06F18/00 Pattern recognition › G06F18/20 Analysing › G06F18/24 Classification techniques
    • G06F18/241 — Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 — Classification techniques based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G (PHYSICS) › G06 (COMPUTING; CALCULATING OR COUNTING) › G06N (COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS) › G06N3/00 Computing arrangements based on biological models › G06N3/02 Neural networks › G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 — Combinations of networks
    • G06N3/047 — Probabilistic or stochastic networks
    • G06N3/08 — Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a lightweight high-resolution bird group identification method based on a deep learning network, which comprises the following steps: constructing a bird group scene recognition network consisting of four parallel subnetworks based on the HRNet network; generating an FIDT (Focal Inverse Distance Transform) map based on point-level annotation and the focal inverse distance transform algorithm, and performing supervised training of the bird group scene recognition network with the FIDT map; importing an airport bird group image set into the supervised-trained bird group scene recognition network, and acquiring a final feature map set by joint upsampling; and determining the unique bird group counting and localization map based on preset counting and localization criteria according to the final feature map set. By combining asymmetric convolution with a redundant-feature-map linear transformation mechanism, the method enhances the network's ability to extract bird group flight-posture features and reduces model complexity while still obtaining rich bird group features. The invention supervises the training process with a joint loss function, reducing the influence of irrelevant background in images and accelerating network convergence.

Description

Lightweight high-resolution bird group identification method based on deep learning network
Technical Field
The invention relates to the technical field of airport bird identification, in particular to a lightweight high-resolution bird group identification method based on a deep learning network.
Background
With the rapid development of the aviation industry and the continual addition of air routes, competition between aircraft and birds for airspace has grown increasingly intense, and the safety of the take-off and landing environment has received close attention. In coastal areas in particular, large mudflat wetlands are distributed around airports and attract large numbers of wetland birds to forage in the wetland and shallow-water ecosystems; coastal airports are therefore affected by both land birds and water birds, creating a serious bird-strike risk. At present, the mainstream airport bird detection means include radar, thermal imaging, and image processing technologies. The radar method detects flying birds with a bird-detection radar, acquiring the longitude and latitude of a target and calculating its flight speed and direction; the thermal imaging method mainly converts the invisible infrared energy emitted by birds into visible thermal images for detection at night. However, both methods require specialized detection equipment, offer poor cost-effectiveness, and yield limited bird information. By contrast, image processing is relatively low-cost: image information only needs to be transmitted from a camera to a processor, which can then provide ground staff with information such as the number, position, and individual size of birds at the airport, improving the efficiency of bird-repelling work.
The visual task of bird group scene identification mainly comprises two subtasks, bird group counting and bird group localization: bird group counting aims to estimate the number of birds appearing in an image or video, while bird group localization identifies the position and size of each instance in the scene while counting, which has important practical value for improving existing bird-repelling methods and their effectiveness. In addition, a well-performing bird swarm counting and localization algorithm can be applied to other visual fields, such as public safety, traffic management, environmental protection, agricultural monitoring, and medical cell counting, and thus has important practical significance and application value.
Reasoning over images or videos of a bird swarm is a challenging computer vision task, with high demands on the efficiency and accuracy of the model. In the prior art, the task of counting and locating bird groups in images mainly faces the following four difficulties:
(1) bird swarm scenes have severe occlusion and varied poses, are strongly affected by illumination and colour changes, and are unfavourable for identifying the bird swarm region;
(2) because the distances between the imaging equipment and the bird group targets differ, the same target undergoes large scale changes between different images;
(3) the bird population is unevenly distributed, and the positive samples (birds) and negative samples (background) in an image are imbalanced, which poses great challenges to the training process;
(4) most currently published bird data sets focus on bird classification, each picture usually contains a small number of birds, and the shooting distance is short, so these data sets cannot be applied to research on bird group counting and localization.
Disclosure of Invention
The invention provides a lightweight high-resolution bird group identification method based on a deep learning network, aimed at solving the above problems: bird swarm scenes have severe occlusion, varied poses, and strong illumination and colour changes that hinder identification of the bird swarm region; the same target undergoes large scale changes between images because the distances between the imaging equipment and the bird group targets differ; the uneven distribution of the bird population and the imbalance between positive samples (birds) and negative samples (background) pose great challenges to the training process; and most currently published bird data sets focus on bird classification, usually contain few birds per picture at a short shooting distance, and therefore cannot be applied to research on bird group counting and localization.
A lightweight high-resolution bird group identification method based on a deep learning network comprises the following steps:
constructing a bird group scene recognition network consisting of four parallel subnetworks based on the HRNet network;
generating an FIDT (Focal Inverse Distance Transform) map based on point-level annotation and the focal inverse distance transform algorithm, and performing supervised training of the bird group scene recognition network with the FIDT map;
importing an airport bird group image set into the supervised-trained bird group scene recognition network, and acquiring a final feature map set by joint upsampling;
and determining the unique bird group counting and localization map based on preset counting and localization criteria according to the final feature map set.
As an embodiment of the present invention: the parallel subnets comprise a first subnet, a second subnet, a third subnet and a fourth subnet; wherein:
the resolution of the fourth sub-network is half of the resolution of the third sub-network;
the resolution of the third sub-network is half of the resolution of the second sub-network;
the resolution of the second sub-network is half of the resolution of the first sub-network;
the resolutions of the parallel subnets are halved in sequence; each time a subnet's resolution is halved, its number of channels is doubled;
the four parallel subnetworks are used to generate feature maps of four different sizes.
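The resolution/channel schedule above can be enumerated directly. The base values used below (a 256×256 high-resolution branch with 32 channels) are illustrative assumptions; the patent text does not fix them.

```python
def subnet_schedule(base_resolution=(256, 256), base_channels=32, branches=4):
    """Each new parallel branch halves spatial resolution and doubles channels."""
    schedule = []
    h, w = base_resolution
    c = base_channels
    for _ in range(branches):
        schedule.append(((h, w), c))
        h, w, c = h // 2, w // 2, c * 2
    return schedule

print(subnet_schedule())
# [((256, 256), 32), ((128, 128), 64), ((64, 64), 128), ((32, 32), 256)]
```

The four entries correspond to the four feature-map sizes produced by the four parallel subnetworks.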
As an embodiment of the present invention: the method further comprises the following steps:
constructing a residual module of the bird group scene recognition network based on an asymmetric convolution and redundant-feature-map linear transformation mechanism; wherein:
the residual module comprises a plurality of ACGblock modules and a plurality of ACGneck modules;
the first subnet is composed of four ACGneck modules;
the second sub-network is composed of one multi-resolution block;
the third sub-network is composed of four multi-resolution blocks;
the fourth sub-network is composed of three multi-resolution blocks;
the multi-resolution block is composed of four ACGblock modules.
As an embodiment of the present invention: the redundant-feature-map linear transformation mechanism comprises:
generating part of the original feature maps based on the convolution kernels of an ordinary convolution layer;
and performing linear transformation on the partial original feature maps through per-channel convolution layers to generate the remaining channel feature maps.
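The two-step mechanism just described can be sketched in plain NumPy: a few "intrinsic" maps come from an ordinary convolution, and the remaining maps are produced by cheap per-map linear transforms of them. The map counts, kernel sizes, and the choice of a padded 3×3 linear operation are illustrative assumptions, not the patent's exact configuration.

```python
import numpy as np

def conv2d(img, kernel):
    """Plain 'valid' 2-D correlation, enough for this sketch."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def ghost_features(x, intrinsic_kernels, cheap_kernels):
    # Step 1: an ordinary convolution with few kernels yields intrinsic maps.
    intrinsic = [conv2d(x, k) for k in intrinsic_kernels]
    # Step 2: each intrinsic map undergoes a cheap per-map linear transform
    # (a 'same'-padded 3x3 convolution here) to produce extra "ghost" maps.
    ghosts = [conv2d(np.pad(f, 1), k) for f, k in zip(intrinsic, cheap_kernels)]
    return np.stack(intrinsic + ghosts)

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8))
intrinsic_kernels = [rng.standard_normal((3, 3)) for _ in range(2)]
cheap_kernels = [rng.standard_normal((3, 3)) for _ in range(2)]
feats = ghost_features(x, intrinsic_kernels, cheap_kernels)
print(feats.shape)  # 4 maps of size 6x6: 2 intrinsic + 2 ghost
```

The point of the mechanism is that step 2 costs one kernel per map instead of one kernel per input channel, which is where the parameter savings come from.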
As an embodiment of the present invention: the method further comprises the following steps:
performing interpolation upsampling on the feature maps of the four different sizes, wherein the interpolation upsampling comprises the following steps:
adjusting the four feature maps with different sizes into the same channel number for output;
and mapping the bird group features in the feature map to the same space, and determining multi-scale information of the airport bird group image set under four different sizes.
As an embodiment of the present invention: the generating an FIDT map based on point-level annotation and the focal inverse distance transform algorithm includes:
acquiring a marking tool;
performing point-level labeling on each image in a preset training data set through the labeling tool;
determining coordinate positions and the bird group quantity from the point-level labels, and generating a labelled training data set;
and performing the focal inverse distance transform on the labelled training data set to generate an FIDT (Focal Inverse Distance Transform) map.
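A minimal sketch of turning point-level labels into an FIDT map. It follows the published focal inverse distance transform formulation, I(x, y) = 1 / (P^(α·P + β) + C), where P is the Euclidean distance to the nearest annotated point; the constants α = 0.02, β = 0.75, C = 1 come from the original FIDT paper and are assumptions here, not values quoted from the patent.

```python
import numpy as np

def fidt_map(points, shape, alpha=0.02, beta=0.75, c=1.0):
    """points: list of (row, col) bird annotations; returns an FIDT map."""
    ys, xs = np.mgrid[0:shape[0], 0:shape[1]]
    pts = np.asarray(points, dtype=float)
    # distance from every pixel to its nearest annotation (brute force)
    d = np.min(np.sqrt((ys[..., None] - pts[:, 0]) ** 2 +
                       (xs[..., None] - pts[:, 1]) ** 2), axis=-1)
    # focal inverse distance transform: response decays away from annotations
    return 1.0 / (d ** (alpha * d + beta) + c)

m = fidt_map([(4, 4)], (9, 9))
print(round(float(m[4, 4]), 3))  # peak response of 1.0 at the annotated point
```

Unlike adaptive-Gaussian density maps, the response here peaks sharply at each annotation, so nearby instances do not blur together.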
As an embodiment of the present invention: the generating an FIDT map based on point-level annotation and the focal inverse distance transform algorithm further comprises:
performing data expansion based on the FIDT map; wherein:
the data expansion includes: image scaling, image cropping, and image rotation;
after data expansion, constructing a 3×3 pooling layer;
determining bird group position information in the FIDT map according to the 3×3 pooling layer;
and setting an adaptive threshold according to the bird group position information in the FIDT map, and judging whether any point-level annotation has been missed.
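The 3×3 pooling step above can be sketched as local-maximum detection on the FIDT map: a pixel is kept as a bird position when it equals its own 3×3 neighbourhood maximum and exceeds an adaptive threshold. The specific threshold rule (a fraction of the global maximum) is an illustrative assumption, not the patent's exact rule.

```python
import numpy as np

def locate_birds(fidt, ratio=0.5):
    """Return (row, col) positions of local maxima above an adaptive threshold."""
    padded = np.pad(fidt, 1, constant_values=-np.inf)
    h, w = fidt.shape
    # 3x3 max pooling with stride 1 via nine shifted views
    pooled = np.max([padded[di:di + h, dj:dj + w]
                     for di in range(3) for dj in range(3)], axis=0)
    threshold = ratio * fidt.max()  # hypothetical adaptive threshold
    return np.argwhere((fidt == pooled) & (fidt > threshold))

m = np.zeros((7, 7))
m[2, 2], m[5, 5] = 1.0, 0.8
print(locate_birds(m))  # both peaks recovered
```

Comparing the recovered peaks with the point-level labels then reveals annotations that the labelling pass may have missed.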
As an embodiment of the present invention: the method further comprises the following steps:
according to the final feature map, a joint loss function based on a negative-sample suppression loss function and a region structure loss function is established; wherein:
the joint loss function is defined as:
L = L2 + L(I-S) + λ·Lmf
wherein λ is a hyperparameter controlling the weight of the negative-sample suppression loss function, L2 is the Euclidean loss, L(I-S) is the region structure loss function, and Lmf is the negative-sample suppression loss function;
the negative-sample suppression loss function is defined as:
Lmf = -(1/Z) · Σj [ (1 - ŷj)^γ · log(ŷj) when pixel j is labelled a bird; δ · ŷj^γ · log(1 - ŷj) when pixel j is labelled background ]
wherein Z represents the number of birds in image I and ŷj denotes the predicted probability at pixel j; a label of 1 means the current pixel is a bird and 0 means background; γ is a parameter controlling the weight of easily classified samples, and δ is a penalty parameter for reducing the proportion of background around the birds;
the region structure loss function is:
L(I-S) = (1/N) · Σ(n=1..N) [1 - SSIM(En, Gn)]
wherein En represents the region where the n-th bird lies in the predicted FIDT map; Gn represents the region where the n-th bird lies in the ground-truth FIDT map; N represents the number of birds in the FIDT map; n is a positive integer.
And judging whether the bird group counting and localization map is correct according to the joint loss function.
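A heavily hedged NumPy sketch of this supervision scheme. The publication renders the formulas as images, so the closed forms below are plausible reconstructions: Lmf is written as a focal-style loss using the γ and δ parameters described above, the Euclidean term is a pixel-wise MSE, and the region structure term is omitted for brevity. λ and all constants are illustrative.

```python
import numpy as np

def l2_loss(pred, gt):
    # Euclidean term between predicted and ground-truth FIDT maps
    return float(np.mean((pred - gt) ** 2))

def negative_sample_loss(pred, label, gamma=2.0, delta=0.5, eps=1e-6):
    # focal-style reconstruction: gamma down-weights easy pixels,
    # delta shrinks the contribution of background pixels (hypothetical form)
    p = np.clip(pred, eps, 1 - eps)
    pos = -((1 - p) ** gamma) * np.log(p) * (label == 1)
    neg = -delta * (p ** gamma) * np.log(1 - p) * (label == 0)
    return float(np.sum(pos + neg) / max(int(label.sum()), 1))

def joint_loss(pred, gt, label, lam=0.1):
    # region structure (SSIM) term omitted in this sketch
    return l2_loss(pred, gt) + lam * negative_sample_loss(pred, label)

label = np.array([[1.0, 0.0], [0.0, 0.0]])
good = np.array([[0.9, 0.1], [0.1, 0.1]])
bad = np.array([[0.1, 0.9], [0.9, 0.9]])
print(joint_loss(good, label, label) < joint_loss(bad, label, label))  # True
```

As expected, a prediction concentrated on the annotated bird pixel scores a lower joint loss than one concentrated on background.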
As an embodiment of the present invention: the counting criteria include:
calculating the mean absolute error and the mean square error according to the final feature map set;
taking the mean absolute error as a first evaluation criterion; wherein:
the first evaluation criterion is expressed by the following formula:
MAE = (1/T) · Σ(i=1..T) |Ci - Ci*|
taking the mean square error as a second evaluation criterion; wherein:
the second evaluation criterion is:
MSE = sqrt( (1/T) · Σ(i=1..T) (Ci - Ci*)² )
wherein T is the number of final feature maps; Ci represents the predicted number of birds in the i-th bird group image; Ci* represents the ground-truth count of the i-th bird group image; i is a positive integer;
and judging the counting accuracy probability based on the preset mean absolute error threshold and mean square error threshold.
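The two counting criteria can be computed directly from predicted and ground-truth counts. The square-root (RMSE) form of the MSE criterion follows the convention in crowd-counting work and is an assumption here.

```python
import numpy as np

def count_metrics(pred_counts, true_counts):
    """MAE and (root) MSE over a set of per-image bird counts."""
    pred = np.asarray(pred_counts, dtype=float)
    true = np.asarray(true_counts, dtype=float)
    mae = float(np.mean(np.abs(pred - true)))
    mse = float(np.sqrt(np.mean((pred - true) ** 2)))
    return mae, mse

mae, mse = count_metrics([10, 12, 8], [11, 10, 8])
print(mae)  # (1 + 2 + 0) / 3 = 1.0
```

Counting is then judged accurate when both values fall below their preset thresholds.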
As an embodiment of the present invention: the positioning criteria include:
respectively setting an accuracy threshold, a recall rate threshold and a comprehensive evaluation index threshold;
and based on the accuracy threshold, the recall rate threshold and the comprehensive evaluation index threshold, carrying out positioning performance evaluation on the unique counting feature map set to determine the positioning accuracy probability.
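A hedged sketch of the localization criteria: predicted points are matched to ground-truth points within a pixel radius, and precision, recall, and the combined F1 score are computed from the matches. The greedy matching strategy and the radius value are illustrative assumptions, not the patent's exact evaluation protocol.

```python
def localization_scores(pred_pts, gt_pts, radius=4.0):
    """Greedy matching of predictions to ground truth within a radius."""
    unmatched = list(gt_pts)
    tp = 0
    for p in pred_pts:
        for g in unmatched:
            if (p[0] - g[0]) ** 2 + (p[1] - g[1]) ** 2 <= radius ** 2:
                unmatched.remove(g)  # each ground-truth point matched once
                tp += 1
                break
    precision = tp / len(pred_pts) if pred_pts else 0.0
    recall = tp / len(gt_pts) if gt_pts else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

print(localization_scores([(2, 2), (20, 20)], [(3, 3), (40, 40)]))
# (0.5, 0.5, 0.5): one of two predictions hits one of two birds
```

Each score is then compared against its threshold to decide whether the localization performance is acceptable.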
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and drawings.
The technical solution of the present invention is further described in detail by the accompanying drawings and embodiments.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention. In the drawings:
FIG. 1 is a flowchart of a lightweight high-resolution bird group identification method based on a deep learning network according to an embodiment of the present invention;
FIG. 2 is a block diagram of a bird swarm identification scene network in an embodiment of the invention;
FIG. 3 is a diagram illustrating the structure of the ACGblock module and the ACGneck module according to an embodiment of the present invention;
FIG. 4 is a diagram of a joint upsampling structure in an embodiment of the present invention;
FIG. 5 is a schematic representation of a sample bird flock data set in accordance with an embodiment of the present invention;
FIG. 6 is a diagram of an FIDT map according to an embodiment of the present invention.
Detailed Description
The preferred embodiments of the present invention will be described in conjunction with the accompanying drawings, and it will be understood that they are described herein for the purpose of illustration and explanation and not limitation.
In the present invention:
HRNet stands for High-Resolution Network;
FIDT stands for Focal Inverse Distance Transform;
ACGblock module: a basic bird feature extraction module designed on the basis of the asymmetric convolution and redundant-feature-map linear transformation mechanism. The module is an improvement of the basic residual module, replacing the 3×3 convolution in the original residual module with an asymmetric convolution fused with the redundant-feature-map linear transformation mechanism. This improvement effectively reduces the complexity of the HRNet network and the number of parameters while letting the network obtain rich feature maps, and by fusing the asymmetric convolution kernels at the network inference stage it avoids any increase in computation.
ACGneck module: a bird feature extraction module designed on the basis of the asymmetric convolution and redundant-feature-map linear transformation mechanism. The module is an improvement of the bottleneck-structure residual module, replacing the 3×3 convolution in the original residual module with an asymmetric convolution fused with the redundant-feature-map linear transformation mechanism, with the same benefits as the ACGblock module.
Firstly, the invention mainly addresses the safety threat posed by birds in airport low-altitude areas to aircraft take-off and landing, and the difficulty of deploying large deep neural networks on resource-constrained platforms. The invention provides a lightweight high-resolution bird group identification method based on a deep learning network for counting and locating bird groups. Taking the high-resolution network (HRNet) as the basic network framework, it enhances the network's ability to extract bird group features using an asymmetric convolution and redundant-feature-map linear transformation mechanism, and lightens the network. Furthermore, a negative-sample suppression loss function is jointly used to supervise the network's training process. The method strikes a balance between model performance and computational complexity, with high accuracy and robustness.
To address the large number of HRNet parameters and high computational complexity, this design introduces asymmetric convolution and a redundant-feature-map linear transformation mechanism (GhostNet) to improve the residual module in the HRNet network, constructing a new asymmetric-convolution residual module, ACGhost, based on the redundant-feature-map transformation mechanism; the module comprises a basic asymmetric-convolution module ACGblock and an asymmetric-convolution bottleneck module ACGneck, both based on the redundant-feature-map transformation mechanism. Specifically, the design replaces the 3×3 convolution in the residual module with an asymmetric convolution based on the redundant-feature-map transformation mechanism, constructed from the asymmetric convolution and the redundant-feature-map linear transformation mechanism. This improvement effectively reduces the complexity of the HRNet network and the number of parameters while letting the network obtain rich feature maps, and by fusing the asymmetric convolution kernels at the network inference stage it avoids any increase in computation.
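The inference-time kernel fusion mentioned above relies on the linearity of convolution: because parallel 3×3, 1×3, and 3×1 branches are summed, the thin kernels can be zero-padded into 3×3 shape and added to the square kernel, collapsing three branches into a single 3×3 convolution. The sketch below verifies this numerically on one channel; real ACNet-style fusion also folds batch-norm parameters, which is ignored here.

```python
import numpy as np

def conv2d(img, kernel):
    """Plain 'valid' 2-D correlation."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def embed(kernel):
    """Zero-pad a 1x3 or 3x1 kernel into the centre of a 3x3 kernel."""
    k = np.zeros((3, 3))
    r0 = (3 - kernel.shape[0]) // 2
    c0 = (3 - kernel.shape[1]) // 2
    k[r0:r0 + kernel.shape[0], c0:c0 + kernel.shape[1]] = kernel
    return k

rng = np.random.default_rng(1)
x = rng.standard_normal((6, 6))
k33 = rng.standard_normal((3, 3))
k13 = rng.standard_normal((1, 3))
k31 = rng.standard_normal((3, 1))

# Training-time view: three parallel branches, outputs summed.
branch_sum = conv2d(x, k33) + conv2d(x, embed(k13)) + conv2d(x, embed(k31))
# Inference-time view: one fused 3x3 kernel; identical result by linearity.
fused = conv2d(x, k33 + embed(k13) + embed(k31))
print(np.allclose(branch_sum, fused))  # True
```

This is why the asymmetric branches enrich training without adding any computation at inference.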
Example 1:
as shown in fig. 1, a lightweight high-resolution bird group identification method based on a deep learning network includes:
constructing a bird group scene recognition network consisting of four parallel subnetworks based on the HRNet network;
generating an FIDT (Focal Inverse Distance Transform) map based on point-level annotation and the focal inverse distance transform algorithm, and performing supervised training of the bird group scene recognition network with the FIDT map;
importing an airport bird group image set into the supervised-trained bird group scene recognition network, and acquiring a final feature map set by joint upsampling;
and determining the unique bird group counting and localization map based on preset counting and localization criteria according to the final feature map set.
The principle of the above counting scheme is as follows: the main purpose of the method is to count and locate bird groups, chiefly by constructing an accurate localization prediction network, solving the problems that bird groups are difficult to locate and count and that large deep neural networks are hard to deploy on resource-constrained platforms. The invention therefore builds its structure on the HRNet network, taking four parallel subnets as the main body of the model; the bird group counting and localization tasks are performed by this backbone model as shown in fig. 1. A bird group data set is made through point-level annotation, and supervised training is carried out using the focal inverse distance transform algorithm, the annotation information of the point-level annotation comprising the coordinate positions of birds in the image and the number of birds in the group. Compared with the density maps generated with adaptive Gaussian kernels in prior crowd-counting work, the FIDT map does not overlap instance objects even in dense areas, which improves the accuracy of the algorithm's estimation. After training, the context information of multi-scale features is fused through a joint upsampling module to obtain a high-quality predicted density map, i.e., the final feature map, and the unique map with the most accurate count is determined through counting evaluation and localization evaluation of the final feature maps so as to determine the positions and number of the bird group.
The beneficial effect of above-mentioned counting scheme lies in:
according to the invention, the ACGblock module and the ACGtech module are constructed by combining the asymmetric convolution and the redundant feature graph linear transformation mechanism, so that the capability of extracting the flight attitude features of the bird group by a network is enhanced, abundant bird group features are obtained, and the model complexity is reduced.
The invention provides a new joint loss function for supervising the training process of a network, which restrains and predicts the consistency of an FIDT (Fidt finite field transformation) diagram and a ground real FIDT diagram, introduces a negative sample inhibition loss function into the joint loss function for reducing the influence of irrelevant backgrounds in images, so that the network focuses more on a bird swarm region in the learning process, and the network convergence speed is accelerated.
The invention solves the problems that the safety threat of birds to take off and landing of an aircraft in a low-altitude area of an airport is solved, and a large-scale deep neural network is difficult to deploy on a platform with less resources, and has better generalization performance and accurate bird swarm positioning and bird swarm counting in a more accurate, more stable model and more concise mode.
Example 2:
as an embodiment of the present invention: the parallel subnets comprise a first subnet, a second subnet, a third subnet and a fourth subnet; wherein the content of the first and second substances,
the resolution of the fourth sub-network is half of the resolution of the third sub-network;
each time a subnet's resolution is halved, its number of channels is doubled;
the four parallel subnetworks are used to generate feature maps of four different sizes.
The principle of the technical scheme is as follows: the invention is divided into four subnets; each subnet has a different resolution and channel count, so the output feature maps also differ in size, and the loss in precision can be reduced through the four different feature maps. The specific structure of the invention is shown in fig. 2; through the arrangement of the four subnets, the invention can effectively reduce computational complexity and memory occupation without losing model performance, and obtain a high-quality final feature map.
The beneficial effects of the above technical scheme are that: the invention can reduce the loss of precision through four different feature maps, and can effectively reduce the calculation complexity and the memory occupation through the arrangement of four subnets under the condition of not losing the model performance, thereby obtaining a high-quality final feature map.
Example 3:
as an embodiment of the present invention: the method further comprises the following steps:
constructing a residual error module of the bird group scene recognition network based on an asymmetric convolution and redundant feature map linear transformation mechanism; wherein the content of the first and second substances,
the residual module comprises a plurality of ACGblock modules and a plurality of ACGneck modules;
the first subnet is composed of four ACGneck modules;
the second sub-network is formed by a multi-resolution block;
the third sub-network is composed of four multi-resolution blocks;
the fourth sub-network is composed of three multi-resolution blocks;
the multi-resolution block is composed of four ACGblock modules.
The principle of the technical scheme is as follows: the ACGblock module and the ACGneck module are shown in fig. 3. When recognizing a picture, the model constructed by the method obtains multiple images of different resolutions, i.e., it determines feature maps of different sizes and resolutions; each picture can be divided step by step to obtain images of different sizes and resolutions.
The beneficial effects of the above technical scheme are that: by the method, the complexity of the HRNet network can be effectively reduced, the number of parameters is reduced, the network can obtain rich characteristic graphs, and the aim of not increasing the calculated amount is fulfilled by fusing asymmetric convolution kernels in a network reasoning stage.
Example 4:
as an embodiment of the present invention: the redundant-feature-map linear transformation mechanism comprises:
generating part of the original feature maps based on the convolution kernels of an ordinary convolution layer;
and performing linear transformation on the partial original feature maps through per-channel convolution layers to generate the remaining channel feature maps.
The principle of the technical scheme is as follows: the convolution transformation of the picture serves to obtain better feature maps; therefore part of the original feature maps are generated with fewer convolution kernels in an ordinary convolution, and the remaining channel feature maps are then generated by applying simple linear transformations to the generated channel feature maps.
The theoretical speed-up ratio and parameter compression ratio achievable using the redundant-feature-map linear transformation mechanism are calculated as shown in equations (1) and (2); taking the linear-transformation kernels to be of the same size k as the ordinary convolution kernels, both ratios simplify to the same expression:
(1) r_s = (H_out · H_in · k²) / ((H_out/S) · H_in · k² + (S-1) · (H_out/S) · k²) = S · H_in / (H_in + S - 1) ≈ S
(2) r_c = S · H_in / (H_in + S - 1) ≈ S
wherein H_in is the number of input channels, H_out is the number of output channels, k is the convolution kernel size, and S is the scaling factor.
The beneficial effects of the above technical scheme are that: the method can accelerate, and as the birds have changeable posture forms in the flying process, the network feature extraction capability is enhanced under the condition of limited computing resources, and the robustness of the network model on image overturning and rotation is improved.
Example 5:
as an embodiment of the present invention: the method further comprises the following steps:
performing interpolation upsampling on the feature maps of the four different sizes, wherein the interpolation upsampling comprises the following steps:
adjusting the four feature maps with different sizes into the same channel number for output;
and mapping the bird group features in the feature map to the same space, and determining multi-scale information of the airport bird group image set under four different sizes.
The principle of the technical scheme is as follows: as shown in fig. 4, the process is divided into four stages. As shown in fig. 4(a), the design goal is to map all input features into the same space, achieving better fusion while reducing computational complexity. The generated feature maps are then upsampled and concatenated, as shown in fig. 4(b). Next, features are extracted from the feature map in parallel by four dilated convolutions with dilation rates of 1, 2, 4 and 8, and finally concatenated in parallel, as shown in fig. 4(c). This joint use of multiple dilated convolution operations extracts multi-scale context information from the multi-level feature maps, resulting in better performance.
The beneficial effects of the above technical scheme are that: the invention can effectively reduce the calculation complexity and the memory occupation under the condition of not losing the model performance, and obtain the final characteristic diagram with high quality.
Example 6:
as an embodiment of the present invention: the generating an FIDT graph based on the point-level annotation and focus inverse distance algorithm includes:
acquiring a marking tool;
performing point-level labeling on each image in a preset training data set through the labeling tool;
determining coordinate positions and bird group quantity according to the point-level labels, and generating a label training data set of the point-level labels;
and carrying out focusing transformation on the labeled training data set through a focusing inverse distance algorithm to generate a focal inverse distance transform (FIDT) map.
The principle of the technical scheme is as follows: the invention is provided with a marking tool, the marking tool of the invention manually carries out point-level annotation on birds in each image, and annotation information comprises the coordinate positions of the birds in the image and the number of bird groups. And (3) generating a ground real FIDT graph by adopting a focusing inverse distance transformation algorithm for supervision training, wherein the FIDT graph has no overlap among example objects even in a dense area compared with a density graph generated by using an adaptive Gaussian kernel in the previous population counting work. The data set for labeling according to the invention is shown in FIG. 5, and the resulting TIDT image is shown in FIG. 6.
The beneficial effects of the above technical scheme are that: the above approach is mainly aimed at improving the accuracy of the algorithm estimation.
Example 7:
as an embodiment of the present invention: the generating an FIDT map based on the point-level annotation and focus inverse distance algorithm further comprises:
performing data expansion based on the FIDT graph; wherein,
the data set extension includes: image scaling, image cropping, and image rotation;
after data expansion, a 3x3 pooling layer is constructed; the 3x3 pooling layer is a max pooling layer;
determining bird group position information in the FIDT graph according to the 3x3 pooling layer;
and setting an adaptive threshold according to the position information of the bird group in the FIDT image, and judging whether the point-level annotation is missed.
The principle of the technical scheme is as follows: in the training process, the situation that the data set is small may occur, and at this time, the data set should be enlarged, so the data set is expanded in a data enhancement mode in the model training process, including random scaling, shearing and rotation, a pooling layer can be constructed, and the mode of setting the threshold value ensures that the annotation result is more correct.
The beneficial effects of the above technical scheme are that: the invention can realize data remembering expansion and also can make the annotation more and the result more correct.
Example 8:
as an embodiment of the present invention: the method further comprises the following steps:
according to the final feature map, establishing a joint loss function based on a negative sample suppression loss function and a region structure loss function; wherein,
the joint loss function is defined as:

L = L2 + α·L_mf + L_I-S

wherein α is a hyperparameter for controlling the weight of the negative sample suppression loss function, and L2 is the Euclidean loss:

L2 = (1/(2M))·Σ_{i=1}^{M} ||E(I_i; θ) − G_i||_2^2

L_I-S is the region structure loss function and L_mf is the negative sample suppression loss function; E(I_i; θ) represents the FIDT map predicted from the input i-th bird group image according to the network parameters θ; M represents the number of bird group images; I_i represents the i-th input bird group image; θ represents the network parameters; and G_i represents the ground-truth FIDT map of the i-th bird group image.
The negative sample suppression loss function is defined as:

L_mf = −(1/Z)·Σ_{j=1}^{J} [ y_j·(1 − p_j)^γ·log(p_j) + δ·(1 − y_j)·(p_j)^γ·log(1 − p_j) ]

wherein Z represents the number of birds in image I; p_j represents the predicted probability of the j-th of the J pixels; y_j = 1 indicates that the current pixel is a bird, and y_j = 0 indicates the background; γ is a parameter for controlling the weight of easily classified samples; and δ is a penalty parameter for reducing the proportion of the background around the birds.
The region structure loss function is:

L_I-S = (1/N)·Σ_{n=1}^{N} L_S(E_n, G_n)

wherein E_n represents the predicted region of the n-th bird in the FIDT map; G_n represents the region of the n-th bird in the ground-truth FIDT map; N represents the number of birds in the ground-truth FIDT map; N is a positive integer.

L_S is the structural similarity (SSIM) loss commonly used in computer vision, defined as follows:

L_S = 1 − [(2·μ_E·μ_G + λ_1)·(2·σ_EG + λ_2)] / [(μ_E^2 + μ_G^2 + λ_1)·(σ_E^2 + σ_G^2 + λ_2)]

wherein μ_E represents the image mean of the FIDT map estimated by the network; μ_G represents the image mean of the ground-truth FIDT map; σ_EG represents the covariance between the estimated FIDT map and the ground-truth FIDT map; λ_1 is a constant set to 0.0001 to prevent division by zero; λ_2 is a constant set to 0.0009 to prevent division by zero; σ_E^2 represents the variance of the estimated FIDT map; and σ_G^2 represents the variance of the ground-truth FIDT map.
And judging whether the counting positioning chart of the bird group is correct or not according to the combined loss function.
The principle of the technical scheme is as follows: in a bird group scene, high-density areas differ greatly from low-density areas and from the local patterns and texture features of the background. The number of negative examples (background) far exceeds the number of positive examples (birds), and most negative examples are easy to classify yet account for most of the total loss. Such class imbalance easily drives model optimization in the wrong direction, making a satisfactory result difficult to achieve. The invention therefore uses a joint loss function, which serves to determine the degree of error.
The beneficial effects of the above technical scheme are that: the combined loss function is the combination of two loss functions, and the invention can judge whether the counting positioning is correct or not through the fusion of the loss functions.
Example 9:
as an embodiment of the present invention: the counting criteria include:
calculating the average absolute error and the mean square error according to the final feature map set;
taking the mean absolute error as a first evaluation criterion; wherein,
the first evaluation criterion is represented by the following formula:

MAE = (1/T)·Σ_{i=1}^{T} |C_i − C_i^GT|

taking the mean square error as a second evaluation criterion; wherein,
the second evaluation criterion is:

MSE = sqrt( (1/T)·Σ_{i=1}^{T} (C_i − C_i^GT)^2 )

wherein T is the number of final feature maps; C_i represents the predicted number of birds in the i-th bird group image; C_i^GT represents the real number of birds in the i-th bird group image; i is a positive integer.
and judging the counting accuracy probability based on a preset mean absolute error threshold and a preset mean square error threshold.
The principle of the technical scheme is as follows: mean Absolute Error (MAE) and Mean Square Error (MSE) are used herein as evaluation criteria. MAE can reflect model counting accuracy, and MSE can reflect model robustness.
The beneficial effects of the above technical scheme are that: the size of the example object can be accurately estimated, and the accuracy of the obtained training image is ensured.
Example 10:
as an embodiment of the present invention: the positioning criteria include:
respectively setting an accuracy threshold, a recall rate threshold and a comprehensive evaluation index threshold;
and evaluating the positioning performance of the unique counting feature map set based on the accuracy threshold, the recall threshold and the comprehensive evaluation index threshold, and determining the positioning accuracy probability.
The principle of the technical scheme is as follows: the comprehensive evaluation is to set an accuracy threshold, a recall threshold and a comprehensive evaluation index threshold by combining historical data, and then judge the accuracy of the result by judging the accuracy of the result.
The beneficial effects of the above technical scheme are that: by evaluating the positioning criteria under a plurality of different positioning criteria, it can be determined whether the obtained result is accurate.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (10)

1. A light-weight high-resolution bird group identification method based on a deep learning network is characterized by comprising the following steps:
constructing a bird group scene identification network consisting of four parallel subnetworks based on the HRNet network;
generating a focal inverse distance transform (FIDT) map based on point-level annotation and a focusing inverse distance algorithm, and performing supervised training on the bird group scene recognition network through the FIDT map;
importing an airport bird group image set into the bird group scene recognition network after supervision training, and acquiring a final characteristic image set by joint up-sampling;
and determining the unique counting positioning diagram of the bird group based on a preset counting criterion and a positioning criterion according to the final feature diagram set.
2. The lightweight high-resolution bird group identification method based on a deep learning network as claimed in claim 1, wherein the parallel subnetworks comprise a first subnetwork, a second subnetwork, a third subnetwork and a fourth subnetwork; wherein,
the resolution of the fourth sub-network is half of the resolution of the third sub-network;
the resolution of the third sub-network is half of the resolution of the second sub-network;
the resolution of the second sub-network is half of the resolution of the first sub-network;
the resolution of the parallel subnetworks is reduced by half in sequence; wherein,
when the resolution of the subnet is reduced by half, the number of channels of the subnet is doubled;
the four parallel subnetworks are used to generate feature maps of four different sizes.
3. The lightweight high-resolution bird group identification method based on a deep learning network as claimed in claim 2, further comprising:
constructing a residual module of the bird group scene recognition network based on an asymmetric convolution and redundant feature map linear transformation mechanism; wherein,
the parallel sub-network comprises a plurality of ACGblock modules and a plurality of ACGtech modules;
the first subnet is composed of 4 ACGtech modules;
the second sub-network is formed by a multi-resolution block;
the third sub-network is composed of four multi-resolution blocks;
the fourth sub-network is composed of three multi-resolution blocks;
the multi-resolution block is composed of four ACGblock modules.
4. The method of claim 2, wherein the redundant feature map linear transformation mechanism comprises:
generating a partial original characteristic diagram based on a convolution kernel of the common convolution layer;
and performing linear transformation on the partial original characteristic diagram through the characteristic channel convolution layer to generate a channel characteristic diagram.
5. The lightweight high-resolution bird group identification method based on a deep learning network as claimed in claim 1, further comprising:
performing interpolation upsampling on the feature maps of the four different sizes, wherein the interpolation upsampling comprises the following steps:
adjusting the four feature maps with different sizes into the same channel number for output;
and mapping the bird group features in the feature map to the same space, and determining multi-scale information of the airport bird group image set under four different sizes.
6. The method of claim 1, wherein the generating FIDT maps based on point-level annotation and inverse distance-of-focus algorithm comprises:
acquiring a marking tool;
performing point-level labeling on each image in a preset training data set through the labeling tool;
determining coordinate positions and bird group quantity according to the point-level labels, and generating a label training data set of the point-level labels;
and carrying out focusing transformation on the labeled training data set through a focusing inverse distance algorithm to generate a focal inverse distance transform (FIDT) map.
7. The method of claim 6, wherein the generating FIDT maps based on point-level annotation and inverse distance-of-focus algorithm further comprises:
performing data expansion based on the FIDT map; wherein,
the data set extension includes: image scaling, image cropping, and image rotation;
after data expansion, constructing a 3 × 3 pooling layer;
determining bird group position information in the FIDT graph according to the 3x3 pooling layer;
and setting an adaptive threshold according to the position information of the bird group in the FIDT image, and judging whether the point-level annotation is missed.
8. The lightweight high-resolution bird group identification method based on a deep learning network as claimed in claim 1, further comprising:
according to the final feature map, establishing a joint loss function based on a negative sample suppression loss function and a region structure loss function; wherein,
the joint loss function is defined as:

L = L2 + α·L_mf + L_I-S

wherein α is a hyperparameter for controlling the weight of the negative sample suppression loss function, L2 is the Euclidean loss, L_I-S is the region structure loss function, and L_mf is the negative sample suppression loss function;
the negative sample suppression loss function is defined as:

L_mf = −(1/Z)·Σ_{j=1}^{J} [ y_j·(1 − p_j)^γ·log(p_j) + δ·(1 − y_j)·(p_j)^γ·log(1 − p_j) ]

wherein Z represents the number of birds in image I; p_j represents the predicted probability of the j-th of the J pixels; y_j = 1 indicates that the current pixel is a bird and y_j = 0 indicates the background; γ is a parameter for controlling the weight of easily classified samples; and δ is a penalty parameter for reducing the proportion of the background around the birds;
the region structure loss function is:

L_I-S = (1/N)·Σ_{n=1}^{N} L_S(E_n, G_n)

wherein E_n represents the predicted region of the n-th bird in the FIDT map; G_n represents the region of the n-th bird in the ground-truth FIDT map; N represents the number of birds in the ground-truth FIDT map; and N is a positive integer.
And judging whether the counting positioning chart of the bird group is correct or not according to the combined loss function.
9. The lightweight high-resolution bird group identification method based on a deep learning network as claimed in claim 1, wherein the counting criteria comprise:
calculating the average absolute error and the mean square error according to the final feature map set;
taking the mean absolute error as a first evaluation criterion; wherein,
the first evaluation criterion is represented by the following formula:

MAE = (1/T)·Σ_{i=1}^{T} |C_i − C_i^GT|

taking the mean square error as a second evaluation criterion; wherein,
the second evaluation criterion is:

MSE = sqrt( (1/T)·Σ_{i=1}^{T} (C_i − C_i^GT)^2 )

wherein T is the number of final feature maps; C_i represents the predicted number of birds in the i-th bird group image; C_i^GT represents the real number of birds in the i-th bird group image; i is a positive integer;
and judging the counting accuracy probability based on the preset calculated average absolute error threshold value and the preset calculated mean square error threshold value.
10. The lightweight high-resolution bird group identification method based on a deep learning network as claimed in claim 1, wherein the positioning criteria comprise:
respectively setting an accuracy threshold, a recall rate threshold and a comprehensive evaluation index threshold;
and based on the accuracy threshold, the recall rate threshold and the comprehensive evaluation index threshold, carrying out positioning performance evaluation on the unique counting feature map set to determine the positioning accuracy probability.
CN202210078169.3A 2022-01-24 2022-01-24 Lightweight high-resolution bird group identification method based on deep learning network Pending CN114419444A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210078169.3A CN114419444A (en) 2022-01-24 2022-01-24 Lightweight high-resolution bird group identification method based on deep learning network

Publications (1)

Publication Number Publication Date
CN114419444A true CN114419444A (en) 2022-04-29

Family

ID=81277798

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210078169.3A Pending CN114419444A (en) 2022-01-24 2022-01-24 Lightweight high-resolution bird group identification method based on deep learning network

Country Status (1)

Country Link
CN (1) CN114419444A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115690448A (en) * 2022-11-09 2023-02-03 广东省科学院动物研究所 AI-based bird species identification method and device
CN116935310A (en) * 2023-07-13 2023-10-24 百鸟数据科技(北京)有限责任公司 Real-time video monitoring bird density estimation method and system based on deep learning



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination