CN114419444A - Lightweight high-resolution bird group identification method based on deep learning network - Google Patents

Lightweight high-resolution bird group identification method based on deep learning network

Info

Publication number
CN114419444A
CN114419444A (application CN202210078169.3A)
Authority
CN
China
Prior art keywords
network
bird
resolution
fidt
bird group
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210078169.3A
Other languages
Chinese (zh)
Inventor
Sun Hui (孙辉)
Shi Yulong (史玉龙)
Wang Rui (王蕊)
Current Assignee
Civil Aviation University of China
Original Assignee
Civil Aviation University of China
Priority date
Filing date
Publication date
Application filed by Civil Aviation University of China
Priority to CN202210078169.3A
Publication of CN114419444A
Legal status: Pending

Classifications

    • G (PHYSICS) › G06 (COMPUTING; CALCULATING OR COUNTING) › G06F (ELECTRIC DIGITAL DATA PROCESSING) › G06F18/00 Pattern recognition › G06F18/20 Analysing › G06F18/24 Classification techniques
    • G06F18/241 — Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 — Classification techniques based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G (PHYSICS) › G06 (COMPUTING; CALCULATING OR COUNTING) › G06N (COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS) › G06N3/00 Computing arrangements based on biological models › G06N3/02 Neural networks › G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 — Combinations of networks
    • G06N3/047 — Probabilistic or stochastic networks
    • G06N3/08 — Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a lightweight high-resolution bird group identification method based on a deep learning network, which comprises the following steps: constructing a bird group scene recognition network consisting of four parallel subnetworks based on the HRNet network; generating an FIDT (Focal Inverse Distance Transform) map based on point-level annotation and the focal inverse distance transform algorithm, and performing supervised training of the bird group scene recognition network with the FIDT map; importing an airport bird group image set into the supervised-trained bird group scene recognition network, and acquiring a final feature map set by joint upsampling; and determining the unique bird group counting and localization map based on preset counting and localization criteria according to the final feature map set. By combining asymmetric convolution with a redundant-feature-map linear transformation mechanism, the method enhances the network's ability to extract bird group flight-posture features and reduces model complexity while still obtaining rich bird group features. The invention supervises the training process with a joint loss function, reducing the influence of irrelevant background in images and accelerating network convergence.

Description

Lightweight high-resolution bird group identification method based on deep learning network
Technical Field
The invention relates to the technical field of airport bird identification, in particular to a lightweight high-resolution bird group identification method based on a deep learning network.
Background
With the rapid development of the aviation industry and the continual addition of air routes, competition between aircraft and birds for airspace has grown increasingly intense, and the safety of the take-off and landing environment has received close attention. In coastal areas in particular, large mudflat wetlands are distributed around airports and attract large numbers of wetland birds to forage in the wetland and shallow-water ecosystems; coastal airports are therefore affected by both land birds and water birds, creating a serious bird-strike risk. At present, the mainstream airport bird detection means include radar, thermal imaging, and image processing technologies. The radar method detects flying birds with a bird-detection radar, acquiring the longitude and latitude of a target and calculating its flight speed and direction; the thermal imaging method mainly converts the invisible infrared energy emitted by birds into visible thermal images for detection at night. However, both methods require specialized detection equipment, offer poor cost-effectiveness, and yield limited bird information. By contrast, image processing is relatively low-cost: image information only needs to be transmitted from a camera to a processor, which can then provide ground staff with information such as the number, position, and individual size of birds at the airport, improving the efficiency of bird-repelling work.
The visual task of bird group scene identification mainly comprises two subtasks, bird group counting and bird group localization: bird group counting aims to estimate the number of birds appearing in an image or video, while bird group localization identifies the position and size of each instance in the scene while counting, which has important practical value for improving existing bird-repelling methods and their effectiveness. In addition, a well-performing bird swarm counting and localization algorithm can be applied to other visual fields, such as public safety, traffic management, environmental protection, agricultural monitoring, and medical cell counting, and thus has important practical significance and application value.
Reasoning over images or videos of a bird swarm is a challenging computer vision task, with high demands on the efficiency and accuracy of the model. In the prior art, the task of counting and locating bird groups in images mainly faces the following four difficulties:
(1) bird swarm scenes have severe occlusion and varied poses, are strongly affected by illumination and colour changes, and are unfavourable for identifying the bird swarm region;
(2) because the distances between the imaging equipment and the bird group targets differ, the same target undergoes large scale changes between different images;
(3) the bird population is unevenly distributed, and the positive samples (birds) and negative samples (background) in an image are imbalanced, which poses great challenges to the training process;
(4) most currently published bird data sets focus on bird classification, each picture usually contains a small number of birds, and the shooting distance is short, so these data sets cannot be applied to research on bird group counting and localization.
Disclosure of Invention
The invention provides a lightweight high-resolution bird group identification method based on a deep learning network, aimed at solving the above problems: bird swarm scenes have severe occlusion, varied poses, and strong illumination and colour changes that hinder identification of the bird swarm region; the same target undergoes large scale changes between images because the distances between the imaging equipment and the bird group targets differ; the uneven distribution of the bird population and the imbalance between positive samples (birds) and negative samples (background) pose great challenges to the training process; and most currently published bird data sets focus on bird classification, usually contain few birds per picture at a short shooting distance, and therefore cannot be applied to research on bird group counting and localization.
A lightweight high-resolution bird group identification method based on a deep learning network comprises the following steps:
constructing a bird group scene recognition network consisting of four parallel subnetworks based on the HRNet network;
generating an FIDT (Focal Inverse Distance Transform) map based on point-level annotation and the focal inverse distance transform algorithm, and performing supervised training of the bird group scene recognition network with the FIDT map;
importing an airport bird group image set into the supervised-trained bird group scene recognition network, and acquiring a final feature map set by joint upsampling;
and determining the unique bird group counting and localization map based on preset counting and localization criteria according to the final feature map set.
As an embodiment of the present invention: the parallel subnets comprise a first subnet, a second subnet, a third subnet and a fourth subnet; wherein:
the resolution of the fourth sub-network is half of the resolution of the third sub-network;
the resolution of the third sub-network is half of the resolution of the second sub-network;
the resolution of the second sub-network is half of the resolution of the first sub-network;
the resolutions of the parallel subnets are halved in sequence; each time a subnet's resolution is halved, its number of channels is doubled;
the four parallel subnetworks are used to generate feature maps of four different sizes.
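The resolution/channel schedule above can be enumerated directly. The base values used below (a 256×256 high-resolution branch with 32 channels) are illustrative assumptions; the patent text does not fix them.

```python
def subnet_schedule(base_resolution=(256, 256), base_channels=32, branches=4):
    """Each new parallel branch halves spatial resolution and doubles channels."""
    schedule = []
    h, w = base_resolution
    c = base_channels
    for _ in range(branches):
        schedule.append(((h, w), c))
        h, w, c = h // 2, w // 2, c * 2
    return schedule

print(subnet_schedule())
# [((256, 256), 32), ((128, 128), 64), ((64, 64), 128), ((32, 32), 256)]
```

The four entries correspond to the four feature-map sizes produced by the four parallel subnetworks.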
As an embodiment of the present invention: the method further comprises the following steps:
constructing a residual module of the bird group scene recognition network based on an asymmetric convolution and redundant-feature-map linear transformation mechanism; wherein:
the residual module comprises a plurality of ACGblock modules and a plurality of ACGneck modules;
the first subnet is composed of four ACGneck modules;
the second sub-network is composed of one multi-resolution block;
the third sub-network is composed of four multi-resolution blocks;
the fourth sub-network is composed of three multi-resolution blocks;
the multi-resolution block is composed of four ACGblock modules.
As an embodiment of the present invention: the redundant-feature-map linear transformation mechanism comprises:
generating part of the original feature maps based on the convolution kernels of an ordinary convolution layer;
and performing linear transformation on the partial original feature maps through per-channel convolution layers to generate the remaining channel feature maps.
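The two-step mechanism just described can be sketched in plain NumPy: a few "intrinsic" maps come from an ordinary convolution, and the remaining maps are produced by cheap per-map linear transforms of them. The map counts, kernel sizes, and the choice of a padded 3×3 linear operation are illustrative assumptions, not the patent's exact configuration.

```python
import numpy as np

def conv2d(img, kernel):
    """Plain 'valid' 2-D correlation, enough for this sketch."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def ghost_features(x, intrinsic_kernels, cheap_kernels):
    # Step 1: an ordinary convolution with few kernels yields intrinsic maps.
    intrinsic = [conv2d(x, k) for k in intrinsic_kernels]
    # Step 2: each intrinsic map undergoes a cheap per-map linear transform
    # (a 'same'-padded 3x3 convolution here) to produce extra "ghost" maps.
    ghosts = [conv2d(np.pad(f, 1), k) for f, k in zip(intrinsic, cheap_kernels)]
    return np.stack(intrinsic + ghosts)

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8))
intrinsic_kernels = [rng.standard_normal((3, 3)) for _ in range(2)]
cheap_kernels = [rng.standard_normal((3, 3)) for _ in range(2)]
feats = ghost_features(x, intrinsic_kernels, cheap_kernels)
print(feats.shape)  # 4 maps of size 6x6: 2 intrinsic + 2 ghost
```

The point of the mechanism is that step 2 costs one kernel per map instead of one kernel per input channel, which is where the parameter savings come from.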
As an embodiment of the present invention: the method further comprises the following steps:
performing interpolation upsampling on the feature maps of the four different sizes, wherein the interpolation upsampling comprises the following steps:
adjusting the four feature maps with different sizes into the same channel number for output;
and mapping the bird group features in the feature map to the same space, and determining multi-scale information of the airport bird group image set under four different sizes.
As an embodiment of the present invention: the generating an FIDT map based on point-level annotation and the focal inverse distance transform algorithm includes:
acquiring a marking tool;
performing point-level labeling on each image in a preset training data set through the labeling tool;
determining coordinate positions and the bird group quantity from the point-level labels, and generating a labelled training data set;
and performing the focal inverse distance transform on the labelled training data set to generate an FIDT (Focal Inverse Distance Transform) map.
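A minimal sketch of turning point-level labels into an FIDT map. It follows the published focal inverse distance transform formulation, I(x, y) = 1 / (P^(α·P + β) + C), where P is the Euclidean distance to the nearest annotated point; the constants α = 0.02, β = 0.75, C = 1 come from the original FIDT paper and are assumptions here, not values quoted from the patent.

```python
import numpy as np

def fidt_map(points, shape, alpha=0.02, beta=0.75, c=1.0):
    """points: list of (row, col) bird annotations; returns an FIDT map."""
    ys, xs = np.mgrid[0:shape[0], 0:shape[1]]
    pts = np.asarray(points, dtype=float)
    # distance from every pixel to its nearest annotation (brute force)
    d = np.min(np.sqrt((ys[..., None] - pts[:, 0]) ** 2 +
                       (xs[..., None] - pts[:, 1]) ** 2), axis=-1)
    # focal inverse distance transform: response decays away from annotations
    return 1.0 / (d ** (alpha * d + beta) + c)

m = fidt_map([(4, 4)], (9, 9))
print(round(float(m[4, 4]), 3))  # peak response of 1.0 at the annotated point
```

Unlike adaptive-Gaussian density maps, the response here peaks sharply at each annotation, so nearby instances do not blur together.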
As an embodiment of the present invention: the generating an FIDT map based on point-level annotation and the focal inverse distance transform algorithm further comprises:
performing data expansion based on the FIDT map; wherein:
the data expansion includes: image scaling, image cropping, and image rotation;
after data expansion, constructing a 3×3 pooling layer;
determining bird group position information in the FIDT map according to the 3×3 pooling layer;
and setting an adaptive threshold according to the bird group position information in the FIDT map, and judging whether any point-level annotation has been missed.
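The 3×3 pooling step above can be sketched as local-maximum detection on the FIDT map: a pixel is kept as a bird position when it equals its own 3×3 neighbourhood maximum and exceeds an adaptive threshold. The specific threshold rule (a fraction of the global maximum) is an illustrative assumption, not the patent's exact rule.

```python
import numpy as np

def locate_birds(fidt, ratio=0.5):
    """Return (row, col) positions of local maxima above an adaptive threshold."""
    padded = np.pad(fidt, 1, constant_values=-np.inf)
    h, w = fidt.shape
    # 3x3 max pooling with stride 1 via nine shifted views
    pooled = np.max([padded[di:di + h, dj:dj + w]
                     for di in range(3) for dj in range(3)], axis=0)
    threshold = ratio * fidt.max()  # hypothetical adaptive threshold
    return np.argwhere((fidt == pooled) & (fidt > threshold))

m = np.zeros((7, 7))
m[2, 2], m[5, 5] = 1.0, 0.8
print(locate_birds(m))  # both peaks recovered
```

Comparing the recovered peaks with the point-level labels then reveals annotations that the labelling pass may have missed.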
As an embodiment of the present invention: the method further comprises the following steps:
according to the final feature map, a joint loss function based on a negative-sample suppression loss function and a region structure loss function is established; wherein:
the joint loss function is defined as:
L = L2 + L(I-S) + λ·Lmf
wherein λ is a hyperparameter controlling the weight of the negative-sample suppression loss function, L2 is the Euclidean loss, L(I-S) is the region structure loss function, and Lmf is the negative-sample suppression loss function;
the negative-sample suppression loss function is defined as:
Lmf = -(1/Z) · Σj [ (1 - ŷj)^γ · log(ŷj) when pixel j is labelled a bird; δ · ŷj^γ · log(1 - ŷj) when pixel j is labelled background ]
wherein Z represents the number of birds in image I and ŷj denotes the predicted probability at pixel j; a label of 1 means the current pixel is a bird and 0 means background; γ is a parameter controlling the weight of easily classified samples, and δ is a penalty parameter for reducing the proportion of background around the birds;
the region structure loss function is:
L(I-S) = (1/N) · Σ(n=1..N) [1 - SSIM(En, Gn)]
wherein En represents the region where the n-th bird lies in the predicted FIDT map; Gn represents the region where the n-th bird lies in the ground-truth FIDT map; N represents the number of birds in the FIDT map; n is a positive integer.
And judging whether the bird group counting and localization map is correct according to the joint loss function.
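A heavily hedged NumPy sketch of this supervision scheme. The publication renders the formulas as images, so the closed forms below are plausible reconstructions: Lmf is written as a focal-style loss using the γ and δ parameters described above, the Euclidean term is a pixel-wise MSE, and the region structure term is omitted for brevity. λ and all constants are illustrative.

```python
import numpy as np

def l2_loss(pred, gt):
    # Euclidean term between predicted and ground-truth FIDT maps
    return float(np.mean((pred - gt) ** 2))

def negative_sample_loss(pred, label, gamma=2.0, delta=0.5, eps=1e-6):
    # focal-style reconstruction: gamma down-weights easy pixels,
    # delta shrinks the contribution of background pixels (hypothetical form)
    p = np.clip(pred, eps, 1 - eps)
    pos = -((1 - p) ** gamma) * np.log(p) * (label == 1)
    neg = -delta * (p ** gamma) * np.log(1 - p) * (label == 0)
    return float(np.sum(pos + neg) / max(int(label.sum()), 1))

def joint_loss(pred, gt, label, lam=0.1):
    # region structure (SSIM) term omitted in this sketch
    return l2_loss(pred, gt) + lam * negative_sample_loss(pred, label)

label = np.array([[1.0, 0.0], [0.0, 0.0]])
good = np.array([[0.9, 0.1], [0.1, 0.1]])
bad = np.array([[0.1, 0.9], [0.9, 0.9]])
print(joint_loss(good, label, label) < joint_loss(bad, label, label))  # True
```

As expected, a prediction concentrated on the annotated bird pixel scores a lower joint loss than one concentrated on background.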
As an embodiment of the present invention: the counting criteria include:
calculating the mean absolute error and the mean square error according to the final feature map set;
taking the mean absolute error as a first evaluation criterion; wherein:
the first evaluation criterion is expressed by the following formula:
MAE = (1/T) · Σ(i=1..T) |Ci - Ci*|
taking the mean square error as a second evaluation criterion; wherein:
the second evaluation criterion is:
MSE = sqrt( (1/T) · Σ(i=1..T) (Ci - Ci*)² )
wherein T is the number of final feature maps; Ci represents the predicted number of birds in the i-th bird group image; Ci* represents the ground-truth count of the i-th bird group image; i is a positive integer;
and judging the counting accuracy probability based on the preset mean absolute error threshold and mean square error threshold.
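The two counting criteria can be computed directly from predicted and ground-truth counts. The square-root (RMSE) form of the MSE criterion follows the convention in crowd-counting work and is an assumption here.

```python
import numpy as np

def count_metrics(pred_counts, true_counts):
    """MAE and (root) MSE over a set of per-image bird counts."""
    pred = np.asarray(pred_counts, dtype=float)
    true = np.asarray(true_counts, dtype=float)
    mae = float(np.mean(np.abs(pred - true)))
    mse = float(np.sqrt(np.mean((pred - true) ** 2)))
    return mae, mse

mae, mse = count_metrics([10, 12, 8], [11, 10, 8])
print(mae)  # (1 + 2 + 0) / 3 = 1.0
```

Counting is then judged accurate when both values fall below their preset thresholds.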
As an embodiment of the present invention: the positioning criteria include:
respectively setting an accuracy threshold, a recall rate threshold and a comprehensive evaluation index threshold;
and based on the accuracy threshold, the recall rate threshold and the comprehensive evaluation index threshold, carrying out positioning performance evaluation on the unique counting feature map set to determine the positioning accuracy probability.
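A hedged sketch of the localization criteria: predicted points are matched to ground-truth points within a pixel radius, and precision, recall, and the combined F1 score are computed from the matches. The greedy matching strategy and the radius value are illustrative assumptions, not the patent's exact evaluation protocol.

```python
def localization_scores(pred_pts, gt_pts, radius=4.0):
    """Greedy matching of predictions to ground truth within a radius."""
    unmatched = list(gt_pts)
    tp = 0
    for p in pred_pts:
        for g in unmatched:
            if (p[0] - g[0]) ** 2 + (p[1] - g[1]) ** 2 <= radius ** 2:
                unmatched.remove(g)  # each ground-truth point matched once
                tp += 1
                break
    precision = tp / len(pred_pts) if pred_pts else 0.0
    recall = tp / len(gt_pts) if gt_pts else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

print(localization_scores([(2, 2), (20, 20)], [(3, 3), (40, 40)]))
# (0.5, 0.5, 0.5): one of two predictions hits one of two birds
```

Each score is then compared against its threshold to decide whether the localization performance is acceptable.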
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and drawings.
The technical solution of the present invention is further described in detail by the accompanying drawings and embodiments.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention. In the drawings:
FIG. 1 is a flowchart of a lightweight high-resolution bird group identification method based on a deep learning network according to an embodiment of the present invention;
FIG. 2 is a block diagram of a bird swarm identification scene network in an embodiment of the invention;
FIG. 3 is a diagram illustrating the structure of the ACGblock module and the ACGneck module according to an embodiment of the present invention;
FIG. 4 is a diagram of a joint upsampling structure in an embodiment of the present invention;
FIG. 5 is a schematic representation of a sample bird flock data set in accordance with an embodiment of the present invention;
FIG. 6 is a diagram of an FIDT map according to an embodiment of the present invention.
Detailed Description
The preferred embodiments of the present invention will be described in conjunction with the accompanying drawings, and it will be understood that they are described herein for the purpose of illustration and explanation and not limitation.
In the present invention:
HRNet stands for High-Resolution Network;
FIDT stands for Focal Inverse Distance Transform;
ACGblock module: a basic bird feature extraction module designed on the basis of the asymmetric convolution and redundant-feature-map linear transformation mechanism. The module is an improvement of the basic residual module, replacing the 3×3 convolution in the original residual module with an asymmetric convolution fused with the redundant-feature-map linear transformation mechanism. This improvement effectively reduces the complexity of the HRNet network and the number of parameters while letting the network obtain rich feature maps, and by fusing the asymmetric convolution kernels at the network inference stage it avoids any increase in computation.
ACGneck module: a bird feature extraction module designed on the basis of the asymmetric convolution and redundant-feature-map linear transformation mechanism. The module is an improvement of the bottleneck-structure residual module, replacing the 3×3 convolution in the original residual module with an asymmetric convolution fused with the redundant-feature-map linear transformation mechanism, with the same benefits as the ACGblock module.
Firstly, the invention mainly addresses the safety threat posed by birds in airport low-altitude areas to aircraft take-off and landing, and the difficulty of deploying large deep neural networks on resource-constrained platforms. The invention provides a lightweight high-resolution bird group identification method based on a deep learning network for counting and locating bird groups. Taking the high-resolution network (HRNet) as the basic network framework, it enhances the network's ability to extract bird group features using an asymmetric convolution and redundant-feature-map linear transformation mechanism, and lightens the network. Furthermore, a negative-sample suppression loss function is jointly used to supervise the network's training process. The method strikes a balance between model performance and computational complexity, with high accuracy and robustness.
To address the large number of HRNet parameters and high computational complexity, this design introduces asymmetric convolution and a redundant-feature-map linear transformation mechanism (GhostNet) to improve the residual module in the HRNet network, constructing a new asymmetric-convolution residual module, ACGhost, based on the redundant-feature-map transformation mechanism; the module comprises a basic asymmetric-convolution module ACGblock and an asymmetric-convolution bottleneck module ACGneck, both based on the redundant-feature-map transformation mechanism. Specifically, the design replaces the 3×3 convolution in the residual module with an asymmetric convolution based on the redundant-feature-map transformation mechanism, constructed from the asymmetric convolution and the redundant-feature-map linear transformation mechanism. This improvement effectively reduces the complexity of the HRNet network and the number of parameters while letting the network obtain rich feature maps, and by fusing the asymmetric convolution kernels at the network inference stage it avoids any increase in computation.
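The inference-time kernel fusion mentioned above relies on the linearity of convolution: because parallel 3×3, 1×3, and 3×1 branches are summed, the thin kernels can be zero-padded into 3×3 shape and added to the square kernel, collapsing three branches into a single 3×3 convolution. The sketch below verifies this numerically on one channel; real ACNet-style fusion also folds batch-norm parameters, which is ignored here.

```python
import numpy as np

def conv2d(img, kernel):
    """Plain 'valid' 2-D correlation."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def embed(kernel):
    """Zero-pad a 1x3 or 3x1 kernel into the centre of a 3x3 kernel."""
    k = np.zeros((3, 3))
    r0 = (3 - kernel.shape[0]) // 2
    c0 = (3 - kernel.shape[1]) // 2
    k[r0:r0 + kernel.shape[0], c0:c0 + kernel.shape[1]] = kernel
    return k

rng = np.random.default_rng(1)
x = rng.standard_normal((6, 6))
k33 = rng.standard_normal((3, 3))
k13 = rng.standard_normal((1, 3))
k31 = rng.standard_normal((3, 1))

# Training-time view: three parallel branches, outputs summed.
branch_sum = conv2d(x, k33) + conv2d(x, embed(k13)) + conv2d(x, embed(k31))
# Inference-time view: one fused 3x3 kernel; identical result by linearity.
fused = conv2d(x, k33 + embed(k13) + embed(k31))
print(np.allclose(branch_sum, fused))  # True
```

This is why the asymmetric branches enrich training without adding any computation at inference.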
Example 1:
as shown in fig. 1, a lightweight high-resolution bird group identification method based on a deep learning network includes:
constructing a bird group scene recognition network consisting of four parallel subnetworks based on the HRNet network;
generating an FIDT (Focal Inverse Distance Transform) map based on point-level annotation and the focal inverse distance transform algorithm, and performing supervised training of the bird group scene recognition network with the FIDT map;
importing an airport bird group image set into the supervised-trained bird group scene recognition network, and acquiring a final feature map set by joint upsampling;
and determining the unique bird group counting and localization map based on preset counting and localization criteria according to the final feature map set.
The principle of the above counting scheme is as follows: the main purpose of the method is to count and locate bird groups, chiefly by constructing an accurate localization prediction network, solving the problems that bird groups are difficult to locate and count and that large deep neural networks are hard to deploy on resource-constrained platforms. The invention therefore builds its structure on the HRNet network, taking four parallel subnets as the main body of the model; the bird group counting and localization tasks are performed by this backbone model as shown in fig. 1. A bird group data set is made through point-level annotation, and supervised training is carried out using the focal inverse distance transform algorithm, the annotation information of the point-level annotation comprising the coordinate positions of birds in the image and the number of birds in the group. Compared with the density maps generated with adaptive Gaussian kernels in prior crowd-counting work, the FIDT map does not overlap instance objects even in dense areas, which improves the accuracy of the algorithm's estimation. After training, the context information of multi-scale features is fused through a joint upsampling module to obtain a high-quality predicted density map, i.e., the final feature map, and the unique map with the most accurate count is determined through counting evaluation and localization evaluation of the final feature maps so as to determine the positions and number of the bird group.
The beneficial effect of above-mentioned counting scheme lies in:
according to the invention, the ACGblock module and the ACGtech module are constructed by combining the asymmetric convolution and the redundant feature graph linear transformation mechanism, so that the capability of extracting the flight attitude features of the bird group by a network is enhanced, abundant bird group features are obtained, and the model complexity is reduced.
The invention provides a new joint loss function for supervising the training process of a network, which restrains and predicts the consistency of an FIDT (Fidt finite field transformation) diagram and a ground real FIDT diagram, introduces a negative sample inhibition loss function into the joint loss function for reducing the influence of irrelevant backgrounds in images, so that the network focuses more on a bird swarm region in the learning process, and the network convergence speed is accelerated.
The invention solves the problems that the safety threat of birds to take off and landing of an aircraft in a low-altitude area of an airport is solved, and a large-scale deep neural network is difficult to deploy on a platform with less resources, and has better generalization performance and accurate bird swarm positioning and bird swarm counting in a more accurate, more stable model and more concise mode.
Example 2:
as an embodiment of the present invention: the parallel subnets comprise a first subnet, a second subnet, a third subnet and a fourth subnet; wherein the content of the first and second substances,
the resolution of the fourth sub-network is half of the resolution of the third sub-network;
each time a subnet's resolution is halved, its number of channels is doubled;
the four parallel subnetworks are used to generate feature maps of four different sizes.
The principle of the technical scheme is as follows: the invention is divided into four subnets; each subnet has a different resolution and channel count, so the output feature maps also differ in size, and the loss in precision can be reduced through the four different feature maps. The specific structure of the invention is shown in fig. 2; through the arrangement of the four subnets, the invention can effectively reduce computational complexity and memory occupation without losing model performance, and obtain a high-quality final feature map.
The beneficial effects of the above technical scheme are that: the invention can reduce the loss of precision through four different feature maps, and can effectively reduce the calculation complexity and the memory occupation through the arrangement of four subnets under the condition of not losing the model performance, thereby obtaining a high-quality final feature map.
Example 3:
as an embodiment of the present invention: the method further comprises the following steps:
constructing a residual error module of the bird group scene recognition network based on an asymmetric convolution and redundant feature map linear transformation mechanism; wherein the content of the first and second substances,
the residual module comprises a plurality of ACGblock modules and a plurality of ACGneck modules;
the first subnet is composed of four ACGneck modules;
the second sub-network is formed by a multi-resolution block;
the third sub-network is composed of four multi-resolution blocks;
the fourth sub-network is composed of three multi-resolution blocks;
the multi-resolution block is composed of four ACGblock modules.
The principle of the technical scheme is as follows: the ACGblock module and the ACGneck module are shown in fig. 3. When recognizing a picture, the model constructed by the method obtains multiple images of different resolutions, i.e., it determines feature maps of different sizes and resolutions; each picture can be divided step by step to obtain images of different sizes and resolutions.
The beneficial effects of the above technical scheme are that: by the method, the complexity of the HRNet network can be effectively reduced, the number of parameters is reduced, the network can obtain rich characteristic graphs, and the aim of not increasing the calculated amount is fulfilled by fusing asymmetric convolution kernels in a network reasoning stage.
Example 4:
as an embodiment of the present invention: the redundant-feature-map linear transformation mechanism comprises:
generating part of the original feature maps based on the convolution kernels of an ordinary convolution layer;
and performing linear transformation on the partial original feature maps through per-channel convolution layers to generate the remaining channel feature maps.
The principle of the technical scheme is as follows: the convolution transformation of the picture serves to obtain better feature maps; therefore part of the original feature maps are generated with fewer convolution kernels in an ordinary convolution, and the remaining channel feature maps are then generated by applying simple linear transformations to the generated channel feature maps.
The theoretical speed-up ratio and parameter compression ratio achievable using the redundant-feature-map linear transformation mechanism are calculated as shown in equations (1) and (2); taking the linear-transformation kernels to be of the same size k as the ordinary convolution kernels, both ratios simplify to the same expression:
(1) r_s = (H_out · H_in · k²) / ((H_out/S) · H_in · k² + (S-1) · (H_out/S) · k²) = S · H_in / (H_in + S - 1) ≈ S
(2) r_c = S · H_in / (H_in + S - 1) ≈ S
wherein H_in is the number of input channels, H_out is the number of output channels, k is the convolution kernel size, and S is the scaling factor.
The beneficial effects of the above technical scheme are that: the method can accelerate, and as the birds have changeable posture forms in the flying process, the network feature extraction capability is enhanced under the condition of limited computing resources, and the robustness of the network model on image overturning and rotation is improved.
Example 5:
as an embodiment of the present invention: the method further comprises the following steps:
performing interpolation upsampling on the feature maps of the four different sizes, wherein the interpolation upsampling comprises the following steps:
adjusting the four feature maps with different sizes into the same channel number for output;
and mapping the bird group features in the feature map to the same space, and determining multi-scale information of the airport bird group image set under four different sizes.
The principle of the technical scheme is as follows: as shown in fig. 4, the process is divided into four stages. As shown in fig. 4(a), the design goal is to map all input features into the same space, achieving better fusion while reducing computational complexity. The generated feature maps are then upsampled and concatenated, as shown in fig. 4(b). Next, features are extracted from the feature map in parallel by four dilated convolutions with dilation rates of 1, 2, 4 and 8, and finally concatenated in parallel, as shown in fig. 4(c). This joint use of multiple dilated convolution operations extracts multi-scale context information from the multi-level feature maps, resulting in better performance.
The beneficial effects of the above technical scheme are that: the invention can effectively reduce the calculation complexity and the memory occupation under the condition of not losing the model performance, and obtain the final characteristic diagram with high quality.
Example 6:
as an embodiment of the present invention: the generating an FIDT graph based on the point-level annotation and focus inverse distance algorithm includes:
acquiring a marking tool;
performing point-level labeling on each image in a preset training data set through the labeling tool;
determining coordinate positions and bird group quantity according to the point-level labels, and generating a label training data set of the point-level labels;
and carrying out focusing transformation on the labeled training data set through a focusing inverse distance algorithm to generate a focal inverse distance transform (FIDT) map.
The principle of the technical scheme is as follows: the invention is provided with a marking tool, the marking tool of the invention manually carries out point-level annotation on birds in each image, and annotation information comprises the coordinate positions of the birds in the image and the number of bird groups. And (3) generating a ground real FIDT graph by adopting a focusing inverse distance transformation algorithm for supervision training, wherein the FIDT graph has no overlap among example objects even in a dense area compared with a density graph generated by using an adaptive Gaussian kernel in the previous population counting work. The data set for labeling according to the invention is shown in FIG. 5, and the resulting TIDT image is shown in FIG. 6.
The beneficial effects of the above technical scheme are that: the above approach is mainly aimed at improving the accuracy of the algorithm estimation.
Example 7:
as an embodiment of the present invention: the generating an FIDT map based on the point-level annotation and focus inverse distance algorithm further comprises:
performing data expansion based on the FIDT graph; wherein,
the data set extension includes: image scaling, image cropping, and image rotation;
after data expansion, a 3x3 pooling layer is constructed; the 3x3 pooling layer is a max pooling layer;
determining bird group position information in the FIDT graph according to the 3x3 pooling layer;
and setting an adaptive threshold according to the position information of the bird group in the FIDT image, and judging whether the point-level annotation is missed.
The principle of the technical scheme is as follows: in the training process, the situation that the data set is small may occur, and at this time, the data set should be enlarged, so the data set is expanded in a data enhancement mode in the model training process, including random scaling, shearing and rotation, a pooling layer can be constructed, and the mode of setting the threshold value ensures that the annotation result is more correct.
The beneficial effects of the above technical scheme are that: the invention can realize data remembering expansion and also can make the annotation more and the result more correct.
Example 8:
as an embodiment of the present invention: the method further comprises the following steps:
according to the final feature map, establishing a joint loss function based on a negative sample suppression loss function and a region structure loss function; wherein,
the joint loss function is defined as:

L = L2 + α·L_mf + L_I-S

wherein α is a hyperparameter for controlling the weight of the negative sample suppression loss function, and L2 is the Euclidean loss:

L2 = (1/(2M))·Σ_{i=1}^{M} ||E(I_i; θ) − G_i||_2^2

L_I-S is the region structure loss function and L_mf is the negative sample suppression loss function; E(I_i; θ) represents the FIDT map predicted from the input i-th bird group image according to the network parameters θ; M represents the number of bird group images; I_i represents the i-th input bird group image; θ represents the network parameters; and G_i represents the ground-truth FIDT map of the i-th bird group image.
The negative sample suppression loss function is defined as:

L_mf = −(1/Z)·Σ_{j=1}^{J} [ y_j·(1 − p_j)^γ·log(p_j) + δ·(1 − y_j)·(p_j)^γ·log(1 − p_j) ]

wherein Z represents the number of birds in image I; p_j represents the predicted probability of the j-th of the J pixels; y_j = 1 indicates that the current pixel is a bird, and y_j = 0 indicates the background; γ is a parameter for controlling the weight of easily classified samples; and δ is a penalty parameter for reducing the proportion of the background around the birds.
The region structure loss function is:

L_I-S = (1/N)·Σ_{n=1}^{N} L_S(E_n, G_n)

wherein E_n represents the predicted region of the n-th bird in the FIDT map; G_n represents the region of the n-th bird in the ground-truth FIDT map; N represents the number of birds in the ground-truth FIDT map; N is a positive integer.

L_S is the structural similarity (SSIM) loss commonly used in computer vision, defined as follows:

L_S = 1 − [(2·μ_E·μ_G + λ_1)·(2·σ_EG + λ_2)] / [(μ_E^2 + μ_G^2 + λ_1)·(σ_E^2 + σ_G^2 + λ_2)]

wherein μ_E represents the image mean of the FIDT map estimated by the network; μ_G represents the image mean of the ground-truth FIDT map; σ_EG represents the covariance between the estimated FIDT map and the ground-truth FIDT map; λ_1 is a constant set to 0.0001 to prevent division by zero; λ_2 is a constant set to 0.0009 to prevent division by zero; σ_E^2 represents the variance of the estimated FIDT map; and σ_G^2 represents the variance of the ground-truth FIDT map.
And judging whether the counting positioning chart of the bird group is correct or not according to the combined loss function.
The principle of the technical scheme is as follows: in a bird group scene, high-density areas differ greatly from low-density areas and from the local patterns and texture features of the background. The number of negative examples (background) far exceeds the number of positive examples (birds), and most negative examples are easy to classify yet account for most of the total loss. Such class imbalance easily drives model optimization in the wrong direction, making a satisfactory result difficult to achieve. The invention therefore uses a joint loss function, which serves to determine the degree of error.
The beneficial effects of the above technical scheme are that: the combined loss function is the combination of two loss functions, and the invention can judge whether the counting positioning is correct or not through the fusion of the loss functions.
Example 9:
as an embodiment of the present invention: the counting criteria include:
calculating the average absolute error and the mean square error according to the final feature map set;
taking the mean absolute error as a first evaluation criterion; wherein,
the first evaluation criterion is represented by the following formula:

MAE = (1/T)·Σ_{i=1}^{T} |C_i − C_i^GT|

taking the mean square error as a second evaluation criterion; wherein,
the second evaluation criterion is:

MSE = sqrt( (1/T)·Σ_{i=1}^{T} (C_i − C_i^GT)^2 )

wherein T is the number of final feature maps; C_i represents the predicted number of birds in the i-th bird group image; C_i^GT represents the real number of birds in the i-th bird group image; i is a positive integer.
and judging the counting accuracy probability based on a preset mean absolute error threshold and a preset mean square error threshold.
The principle of the technical scheme is as follows: mean Absolute Error (MAE) and Mean Square Error (MSE) are used herein as evaluation criteria. MAE can reflect model counting accuracy, and MSE can reflect model robustness.
The beneficial effects of the above technical scheme are that: the size of the example object can be accurately estimated, and the accuracy of the obtained training image is ensured.
Example 10:
as an embodiment of the present invention: the positioning criteria include:
respectively setting an accuracy threshold, a recall rate threshold and a comprehensive evaluation index threshold;
and evaluating the positioning performance of the unique counting feature map set based on the accuracy threshold, the recall threshold and the comprehensive evaluation index threshold, and determining the positioning accuracy probability.
The principle of the technical scheme is as follows: the comprehensive evaluation is to set an accuracy threshold, a recall threshold and a comprehensive evaluation index threshold by combining historical data, and then judge the accuracy of the result by judging the accuracy of the result.
The beneficial effects of the above technical scheme are that: by evaluating the positioning criteria under a plurality of different positioning criteria, it can be determined whether the obtained result is accurate.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (10)

1. A light-weight high-resolution bird group identification method based on a deep learning network is characterized by comprising the following steps:
constructing a bird group scene identification network consisting of four parallel subnetworks based on the HRNet network;
generating a focal inverse distance transform (FIDT) map based on point-level annotation and a focusing inverse distance algorithm, and performing supervised training on the bird group scene recognition network through the FIDT map;
importing an airport bird group image set into the bird group scene recognition network after supervision training, and acquiring a final characteristic image set by joint up-sampling;
and determining the unique counting positioning diagram of the bird group based on a preset counting criterion and a positioning criterion according to the final feature diagram set.
2. The lightweight high-resolution bird group identification method based on a deep learning network as claimed in claim 1, wherein the parallel subnetworks comprise a first subnetwork, a second subnetwork, a third subnetwork and a fourth subnetwork; wherein,
the resolution of the fourth sub-network is half of the resolution of the third sub-network;
the resolution of the third sub-network is half of the resolution of the second sub-network;
the resolution of the second sub-network is half of the resolution of the first sub-network;
the resolution of the parallel subnetworks is reduced by half in sequence; wherein,
when the resolution of the subnet is reduced by half, the number of channels of the subnet is doubled;
the four parallel subnetworks are used to generate feature maps of four different sizes.
3. The lightweight high-resolution bird group identification method based on a deep learning network as claimed in claim 2, further comprising:
constructing a residual module of the bird group scene recognition network based on an asymmetric convolution and redundant feature map linear transformation mechanism; wherein,
the parallel sub-network comprises a plurality of ACGblock modules and a plurality of ACGtech modules;
the first subnet is composed of 4 ACGtech modules;
the second sub-network is formed by a multi-resolution block;
the third sub-network is composed of four multi-resolution blocks;
the fourth sub-network is composed of three multi-resolution blocks;
the multi-resolution block is composed of four ACGblock modules.
4. The method of claim 2, wherein the redundant feature map linear transformation mechanism comprises:
generating a partial original characteristic diagram based on a convolution kernel of the common convolution layer;
and performing linear transformation on the partial original characteristic diagram through the characteristic channel convolution layer to generate a channel characteristic diagram.
5. The lightweight high-resolution bird group identification method based on a deep learning network as claimed in claim 1, further comprising:
performing interpolation upsampling on the feature maps of the four different sizes, wherein the interpolation upsampling comprises the following steps:
adjusting the four feature maps with different sizes into the same channel number for output;
and mapping the bird group features in the feature map to the same space, and determining multi-scale information of the airport bird group image set under four different sizes.
6. The method of claim 1, wherein the generating FIDT maps based on point-level annotation and inverse distance-of-focus algorithm comprises:
acquiring a marking tool;
performing point-level labeling on each image in a preset training data set through the labeling tool;
determining coordinate positions and bird group quantity according to the point-level labels, and generating a label training data set of the point-level labels;
and carrying out focusing transformation on the labeled training data set through a focusing inverse distance algorithm to generate a focal inverse distance transform (FIDT) map.
7. The method of claim 6, wherein the generating FIDT maps based on point-level annotation and inverse distance-of-focus algorithm further comprises:
performing data expansion based on the FIDT map; wherein,
the data set extension includes: image scaling, image cropping, and image rotation;
after data expansion, constructing a 3 × 3 pooling layer;
determining bird group position information in the FIDT graph according to the 3x3 pooling layer;
and setting an adaptive threshold according to the position information of the bird group in the FIDT image, and judging whether the point-level annotation is missed.
8. The lightweight high-resolution bird group identification method based on a deep learning network as claimed in claim 1, further comprising:
according to the final feature map, establishing a joint loss function based on a negative sample suppression loss function and a region structure loss function; wherein,
the joint loss function is defined as:

L = L2 + α·L_mf + L_I-S

wherein α is a hyperparameter for controlling the weight of the negative sample suppression loss function, L2 is the Euclidean loss, L_I-S is the region structure loss function, and L_mf is the negative sample suppression loss function;
the negative sample suppression loss function is defined as:

L_mf = −(1/Z)·Σ_{j=1}^{J} [ y_j·(1 − p_j)^γ·log(p_j) + δ·(1 − y_j)·(p_j)^γ·log(1 − p_j) ]

wherein Z represents the number of birds in image I; p_j represents the predicted probability of the j-th of the J pixels; y_j = 1 indicates that the current pixel is a bird and y_j = 0 indicates the background; γ is a parameter for controlling the weight of easily classified samples; and δ is a penalty parameter for reducing the proportion of the background around the birds;
the region structure loss function is:

L_I-S = (1/N)·Σ_{n=1}^{N} L_S(E_n, G_n)

wherein E_n represents the predicted region of the n-th bird in the FIDT map; G_n represents the region of the n-th bird in the ground-truth FIDT map; N represents the number of birds in the ground-truth FIDT map; and N is a positive integer.
And judging whether the counting positioning chart of the bird group is correct or not according to the combined loss function.
9. The lightweight high-resolution bird group identification method based on a deep learning network as claimed in claim 1, wherein the counting criteria comprise:
calculating the average absolute error and the mean square error according to the final feature map set;
taking the mean absolute error as a first evaluation criterion; wherein,
the first evaluation criterion is represented by the following formula:

MAE = (1/T)·Σ_{i=1}^{T} |C_i − C_i^GT|

taking the mean square error as a second evaluation criterion; wherein,
the second evaluation criterion is:

MSE = sqrt( (1/T)·Σ_{i=1}^{T} (C_i − C_i^GT)^2 )

wherein T is the number of final feature maps; C_i represents the predicted number of birds in the i-th bird group image; C_i^GT represents the real number of birds in the i-th bird group image; i is a positive integer;
and judging the counting accuracy probability based on the preset calculated average absolute error threshold value and the preset calculated mean square error threshold value.
10. The lightweight high-resolution bird group identification method based on a deep learning network as claimed in claim 1, wherein the positioning criteria comprise:
respectively setting an accuracy threshold, a recall rate threshold and a comprehensive evaluation index threshold;
and based on the accuracy threshold, the recall rate threshold and the comprehensive evaluation index threshold, carrying out positioning performance evaluation on the unique counting feature map set to determine the positioning accuracy probability.
CN202210078169.3A 2022-01-24 2022-01-24 Lightweight high-resolution bird group identification method based on deep learning network Pending CN114419444A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210078169.3A CN114419444A (en) 2022-01-24 2022-01-24 Lightweight high-resolution bird group identification method based on deep learning network

Publications (1)

Publication Number Publication Date
CN114419444A true CN114419444A (en) 2022-04-29

Family

ID=81277798

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210078169.3A Pending CN114419444A (en) 2022-01-24 2022-01-24 Lightweight high-resolution bird group identification method based on deep learning network

Country Status (1)

Country Link
CN (1) CN114419444A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115690448A (en) * 2022-11-09 2023-02-03 广东省科学院动物研究所 AI-based bird species identification method and device
CN116935310A (en) * 2023-07-13 2023-10-24 百鸟数据科技(北京)有限责任公司 Real-time video monitoring bird density estimation method and system based on deep learning



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination