CN113420759A - Anti-occlusion and multi-scale dead fish identification system and method based on deep learning - Google Patents

Publication number
CN113420759A
Authority
CN
China
Prior art keywords
dead fish
feature
scale
features
occlusion
Prior art date
Legal status
Granted
Application number
CN202110653176.7A
Other languages
Chinese (zh)
Other versions
CN113420759B (en)
Inventor
杨明东
张先奎
陈静
杨勇
周红坤
杨飞
Current Assignee
No 750 Test Field of China Shipbuilding Industry Corp
Original Assignee
No 750 Test Field of China Shipbuilding Industry Corp
Priority date
Filing date
Publication date
Application filed by No 750 Test Field of China Shipbuilding Industry Corp
Priority to CN202110653176.7A
Publication of CN113420759A
Application granted
Publication of CN113420759B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformation in the plane of the image
    • G06T3/60 Rotation of a whole image or part thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/40 Image enhancement or restoration by the use of histogram techniques
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A40/00 Adaptation technologies in agriculture, forestry, livestock or agroalimentary production
    • Y02A40/80 Adaptation technologies in agriculture, forestry, livestock or agroalimentary production in fisheries management
    • Y02A40/81 Aquaculture, e.g. of fish

Abstract

The invention discloses an anti-occlusion and multi-scale dead fish identification system and method based on deep learning. Aiming at the difficulties of dead fish identification, such as complex background, large scale change and occlusion, a Faster RCNN identification model is improved to suit the dead fish identification scene of a complex underwater environment. First, contrast-limited adaptive histogram equalization is applied to the image to be identified, enhancing the local contrast of the target; second, a multi-scale feature enhancement module is designed to solve dead fish identification over a large scale range, and an attention-based anti-occlusion module is designed to highlight the target region and suppress background interference such as noise; finally, a rotated rectangular box is used to represent the dead fish target, greatly improving identification accuracy in dense scenes.

Description

Anti-occlusion and multi-scale dead fish identification system and method based on deep learning
Technical Field
The invention relates to the technical field of image recognition applications, in particular to underwater dead fish recognition, and specifically to an anti-occlusion and multi-scale dead fish recognition system and method based on deep learning.
Background
In the fish farming process, fish deaths inevitably occur because bacteria and parasites are present in the culture water, or because excessive stocking density reduces water mobility and leaves the oxygen concentration insufficient. A fish first sinks to the bottom after death; once its internal organs ferment and generate gas, it floats to the surface. During this process, the movement of the dead fish inside the cultivation cage brings it into contact with live fish, or the live fish feed on it, spreading pathogens. To prevent this problem, a dead fish identification method is urgently needed that gives timely information such as the position and number of dead fish before they float, providing a basis for their early cleaning and collection.
Traditional dead fish identification usually relies on manual intervention, which offers a low level of intelligence and consumes considerable manpower and material resources. With the rapid development of artificial intelligence and deep learning, image target recognition has found ever wider application, and classic recognition models such as Faster RCNN and YOLO have appeared; however, these methods were designed for above-water environments. One dead fish identification method and early-warning system based on a deep convolutional neural network directly transfers the original Faster RCNN model to underwater dead fish identification; another method and system for monitoring cultured fish based on image recognition directly applies the YOLO model to fish recognition. Such direct application of existing models neither considers nor specifically solves the following four practical problems, leading to low identification accuracy: (1) underwater images generally exhibit low contrast and low brightness, differ greatly from above-water images, and cannot be handled by direct transfer; (2) after dead fish sink to the bottom, intra-class occlusion occurs between dead fish or between dead and live fish, while interference from aquatic plants causes inter-class occlusion, so the extracted identification features are easily contaminated; (3) owing to differences in species, size and so on, the scale of dead fish varies over a wide range; (4) Faster RCNN and similar methods all use horizontal rectangular boxes to represent the target position, yet because dead fish bodies are generally distributed in arbitrary orientations, horizontal boxes contain a large amount of useless background information, and in dense scenes the boxes overlap and are easily eliminated in the post-processing stage.
Disclosure of Invention
To remedy the deficiencies of the prior art, through research, development and design, the applicant provides a dead fish identification method based on deep learning: a multi-scale feature enhancement module is designed to effectively solve multi-scale dead fish identification; an attention-based anti-occlusion module is further designed to effectively highlight the foreground and suppress background interference; and by predicting a rotated rectangular box representing the dead fish position, a better identification effect is obtained in dense scenes.
Specifically, the invention is realized as follows: an anti-occlusion and multi-scale dead fish identification system based on deep learning, comprising:
a dead fish identification data set module, used for acquiring and storing images shot underwater and constructing a dead fish identification training set, verification set and test set;
an image feature extraction module, used for preprocessing the underwater images and extracting the low-level edge and high-level abstract feature maps of each image;
a multi-scale feature enhancement module, used for improving the multi-scale expression capability of the features by connecting a plurality of feature pyramids in series, generating candidate boxes that represent foreground targets, and adaptively extracting and fusing the features of region candidate boxes of different scales;
an anti-occlusion module, used for learning the mask of the foreground target through an attention mechanism and fusing it with the candidate-box features, suppressing background interference to obtain anti-occlusion features;
and a dead fish target identification module, used for taking the candidate-box features as the starting point and, combining the anti-occlusion features with fully connected layers, regressing the rotated rectangular box that represents the dead fish position and performing identification classification to complete dead fish target identification.
Meanwhile, based on the system, the invention also discloses an anti-occlusion and multi-scale dead fish identification method based on deep learning:
step S1, acquire underwater images and establish a dead fish identification data set for training and testing;
step S2, perform image preprocessing, input the image into the image feature extraction module, and extract the low-level edge and high-level abstract feature maps of the image;
step S3, the multi-scale feature enhancement module improves the multi-scale expression capability of the features by connecting a plurality of feature pyramids in series, obtaining multi-scale features;
step S4, generate horizontal rectangular region candidate boxes from the multi-scale features, and apply an adaptive ROIAlign algorithm to adaptively extract and fuse the features of region candidate boxes of different scales;
step S5, learn the mask of the foreground target from the region candidate-box features through an attention mechanism, and fuse the mask with the candidate-box features to suppress background interference;
and step S6, taking the horizontal rectangular region candidate box as the starting point and combining the features fused by the anti-occlusion branch with fully connected layers, regress the rotated rectangular box that represents the dead fish position and classify it, completing dead fish target identification.
The working principle and beneficial effects of the invention are introduced as follows. An underwater camera shoots underwater images, from which a dead fish identification data set is made for training and testing; at the same time an image preprocessing module is designed to provide high-quality, clear images to be identified for the subsequent steps. The image is input into a ResNet feature extraction module to extract the low-level edge and high-level abstract feature maps. A multi-scale feature enhancement module then improves the multi-scale expression capability of the features by connecting a plurality of feature pyramids in series, alleviating the difficulty that the scale of dead fish varies over a wide range. The multi-scale features are input into an RPN network, which generates horizontal rectangular region candidate boxes (ROIs) representing foreground targets (dead or live fish), and an adaptive ROIAlign algorithm adaptively extracts and fuses candidate-box features of different scales. The region candidate-box features are input into the anti-occlusion branch, which learns the mask of the foreground target through an attention mechanism and fuses it with the candidate-box features, suppressing background interference and achieving anti-occlusion. Finally, taking the generated horizontal candidate box as the starting point and combining the features fused by the anti-occlusion branch with fully connected layers, the rotated rectangular box representing the dead fish position is regressed and the dead fish classified, completing dead fish target identification.
The image feature extraction module may be a ResNet, VGG, MobileNet or EfficientNet series network, selected according to the real-time processing requirements of the system. Candidate-box generation comprises an RPN network and an adaptive ROIAlign algorithm unit; the candidate boxes output by the RPN are first sorted by confidence so that low-confidence boxes are removed before the subsequent NMS operation, improving the post-processing speed of the algorithm.
Aiming at the difficulties of dead fish identification (complex background, large scale change and occlusion), the Faster RCNN identification model is improved to suit the dead fish identification scene of a complex underwater environment. Before identification, the system and method apply a contrast-limited adaptive histogram equalization algorithm as image preprocessing, improving image detail and local contrast and thereby identification accuracy. The multi-scale feature enhancement module extracts rich multi-scale information by connecting a plurality of feature pyramids in series, and designs cross-stage connection units between the pyramids to reuse features and counter the vanishing gradients caused by network deepening; pyramid layers with the same stride are finally accumulated to output the enhanced feature pyramid. Adaptive ROIAlign differs from standard ROIAlign, which extracts features from a single pyramid layer only, by adaptively pooling and fusing the features of different feature layers, so that small and large targets alike share low-level and high-level information. The anti-occlusion module combines an attention mechanism with the annotated rectangular boxes to learn a mask of the candidate region in a weakly supervised manner; the mask emphasizes the visible region of the target and suppresses interference such as noise from the occluded region, so that the extracted features attend more to the foreground target, are more discriminative, and resist occlusion more strongly. The rotated rectangular box localizes the dead fish target and can frame dead fish positions without overlap in dense scenes, while at the same time eliminating the interference of many similar background regions and improving training robustness.
Drawings
FIG. 1 is a diagram of a dead fish identification model;
FIG. 2 is a schematic diagram of a data set annotation;
FIG. 3 is a schematic diagram of the multi-scale feature enhancement module of FIG. 1;
FIG. 4 is a schematic diagram of the feature pyramid construction shown in FIG. 3;
FIG. 5 is a schematic diagram of the cross-phase connection unit of FIG. 3;
FIG. 6 is a schematic diagram of the adaptive ROIAlign of FIG. 1;
FIG. 7 is a schematic view of the anti-occlusion module of FIG. 1;
FIG. 8 illustrates the rotated rectangular box representation.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the accompanying drawings in conjunction with the following detailed description. It should be understood that the description is intended to be exemplary only, and is not intended to limit the scope of the present invention. Moreover, in the following description, descriptions of well-known structures and techniques are omitted so as to not unnecessarily obscure the concepts of the present invention.
Example 1: the invention provides an anti-occlusion and multi-scale dead fish identification method based on deep learning, as shown in the flowchart of FIG. 1, comprising the following steps:
step 1, shooting an underwater image by using an underwater camera, and making a dead fish identification data set for training and testing. Meanwhile, an image preprocessing module is designed to provide high-quality and clear images to be identified for subsequent steps.
Step 2, input the image into a ResNet-50 feature extraction module and extract the low-level edge and high-level abstract feature maps of the image.
Step 3, after the feature maps are output in step 2, a multi-scale feature enhancement module improves the multi-scale expression capability of the features by connecting a plurality of feature pyramids in series, alleviating the difficulty that the scale of dead fish varies over a wide range.
Step 4, input the multi-scale features output in step 3 into an RPN network, generate horizontal rectangular region candidate boxes (ROIs) representing foreground targets (dead or live fish), and apply an adaptive ROIAlign algorithm to adaptively extract and fuse the candidate-box features of different scales.
Step 5, input the region candidate-box features into the anti-occlusion branch, learn the mask of the foreground target through an attention mechanism, and fuse the mask with the candidate-box features to suppress background interference and achieve anti-occlusion.
Step 6, taking the horizontal candidate box generated in step 4 as the starting point and combining the features fused by the anti-occlusion branch with fully connected layers, regress the rotated rectangular box representing the dead fish position, classify the dead fish, and complete dead fish target identification.
The step 1 comprises the following steps:
step 1-1, shooting a large number of underwater images by using an underwater camera, and dividing a data set into a training set, a testing set and a verification set according to the ratio of 6:3:1 after dead fish/live fish target labeling. Further, as shown in fig. 2, the labeling is first performed by using a solid line rotation rectangle box in the figure, which is represented by four vertices (x1, y1, x2, y2, x3, y3, x4, y4) according to the specific position and size of the dead fish. Further, the horizontal rectangular box (x0, y0, xM, yM) is used for training of the RPN network, and is generated by:
x0 = min(x1, x4), y0 = min(y1, y4), xM = max(x2, x3), yM = max(y2, y3)
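A minimal sketch of this conversion, taking the min/max over all four vertices, which coincides with the formula above under the assumed vertex ordering (the function name is illustrative):

```python
def horizontal_box(vertices):
    """Axis-aligned horizontal box (x0, y0, xM, yM) enclosing the four
    vertices of a rotated annotation box."""
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    return min(xs), min(ys), max(xs), max(ys)

# a rotated box annotated by its four vertices (x1, y1) ... (x4, y4)
box = horizontal_box([(1, 2), (5, 0), (7, 4), (3, 6)])
```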
Step 1-2, training data preprocessing: the training image is first normalized, then randomly scaled, an image block of a certain size is randomly cropped, and preprocessing is completed with random horizontal flipping and random contrast-limited adaptive histogram equalization for subsequent training.
Step 1-3, test data preprocessing: the image is first normalized, then contrast-limited adaptive histogram equalization is applied to complete preprocessing.
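The contrast-limited equalization step can be illustrated with a simplified, single-tile NumPy sketch; real CLAHE applies this per tile and blends the lookup tables bilinearly, and in practice a library routine such as OpenCV's `createCLAHE` would be used. All names and parameter values here are illustrative:

```python
import numpy as np

def clipped_equalize(img, clip_limit=0.02, nbins=256):
    """Simplified contrast-limited histogram equalization on one tile.

    The histogram is clipped at `clip_limit` and the clipped mass is
    redistributed uniformly before building the equalization LUT.
    """
    hist, _ = np.histogram(img, bins=nbins, range=(0, 256))
    hist = hist.astype(float) / img.size
    excess = np.clip(hist - clip_limit, 0, None).sum()
    hist = np.minimum(hist, clip_limit) + excess / nbins  # redistribute clipped mass
    cdf = np.cumsum(hist)
    lut = np.round(255 * cdf / cdf[-1]).astype(np.uint8)
    return lut[img]

img = (np.arange(64, dtype=np.uint8).reshape(8, 8) * 3) % 256
out = clipped_equalize(img)
```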
In step 2, a ResNet-50 network performs feature extraction on the input image data, yielding the five feature maps Res1 to Res5 shown in fig. 1, with channel counts 128, 256, 512, 1024 and 2048 and strides 2, 4, 8, 16 and 32, respectively.
The multi-scale feature enhancement module in step 3 is shown in fig. 3, and includes the following steps:
step 3-1, generating a characteristic pyramid: the generation method is as shown in fig. 4, firstly, input features of Res2, Res3, Res4 and Res5 are convolved by 1 × 1 in a lateral connection, the number of channels is reduced to 256, and bilinear interpolation is adopted to perform 2 times of upsampling on high-level low-resolution features; then, pixel-by-pixel addition is carried out on the up-sampled features and the down-channel features; and finally, fusing and adding the feature maps by using 3 x3 convolution to obtain the current reinforced features, and repeating the steps for 3 times to generate a feature pyramid consisting of 4 layers of features.
Step 3-2, generating a multi-stage feature pyramid: the output of the pyramid generated in the previous stage serves as the input of the next-stage pyramid, which is constructed in the manner of step 3-1; the multi-stage feature pyramid is generated by this serial connection.
Step 3-3, the multi-stage feature pyramids are linked by cross-stage connection units, shown schematically in fig. 5. A 1×1 convolution first reduces the channel dimension of the previous-stage features, which are then downsampled and accumulated with the current-stage features to form the new features. The spatial attention mask generation unit, which generates training mask labels in a weakly supervised manner from the annotated rectangular boxes, is described in step 5-2.
Step 3-4, the multi-scale feature generation process is shown in fig. 3: the n stage feature pyramids are denoted {P1, P2, …, Pn}, each composed of 4 feature levels {C1, C2, C3, C4} with strides 4, 8, 16 and 32, respectively; the feature levels with the same stride in each pyramid are accumulated pixel by pixel to form the multi-scale feature outputs {F1, F2, F3, F4}. In general, the larger n is set, the stronger the processing performance; considering the computational cost, n may be set to 2, and a larger n may be set if processing accuracy is to be improved.
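The pixel-by-pixel accumulation of same-stride levels across stages can be sketched as follows, with n = 2 and constant feature maps used purely to make the accumulation visible; all shapes are illustrative:

```python
import numpy as np

# two chained pyramids (n = 2), each with 4 levels at strides 4, 8, 16, 32
shapes = [(256, 64 // s, 64 // s) for s in (4, 8, 16, 32)]
P1 = [np.full(sh, 1.0) for sh in shapes]
P2 = [np.full(sh, 2.0) for sh in shapes]

# pixel-wise accumulation of same-stride levels across stages -> {F1..F4}
F = [p1 + p2 for p1, p2 in zip(P1, P2)]
```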
The step 4 is to generate a region level candidate frame and extract features, and a specific process thereof is shown in fig. 6, and includes the following steps:
and 4-1, generating horizontal candidate boxes by using the RPN network, wherein the candidate boxes may contain dead fish or live fish. The method is different from the method of directly filtering redundant candidate frames by using an NMS mode in fast RCNN, but ranks the confidence degrees of the candidate frames firstly, and takes 15000 as the top, then carries out subsequent NMS operation, and reserves 2000 candidate frames, and takes 5000 as the top to carry out NMS and reserves 200 frames in the test stage. The other links of the RPN are consistent with the Faster RCNN.
Step 4-2, Faster RCNN assigns each candidate box to one of {F1, F2, F3, F4} according to its size, then extracts the corresponding features with an ROIAlign module using a 7×7 pooling kernel.
Step 4-3, unlike Faster RCNN, which transforms the features through a fully connected layer, the invention processes them with a 3×3 convolution in order to preserve the spatial information of the features.
Step 4-4, the processed features are fused by taking the element-wise maximum, and the final output feature dimension is 7 × 7 × 256.
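Steps 4-2 to 4-4 can be sketched as follows; nearest-neighbour sampling stands in for ROIAlign's bilinear sampling, and the 3×3 convolution of step 4-3 is omitted. All shapes and the example box are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
strides = (4, 8, 16, 32)
# multi-scale features F1..F4 for a 64x64 input, 256 channels each
feats = [rng.standard_normal((256, 64 // s, 64 // s)) for s in strides]

def roi_sample(feat, box, stride, out=7):
    """Nearest-neighbour stand-in for ROIAlign with an out x out kernel."""
    x0, y0, x1, y1 = (v / stride for v in box)
    xi = np.clip(np.linspace(x0, x1, out).astype(int), 0, feat.shape[2] - 1)
    yi = np.clip(np.linspace(y0, y1, out).astype(int), 0, feat.shape[1] - 1)
    return feat[:, yi][:, :, xi]

box = (8, 8, 40, 48)  # one candidate box in image coordinates
# pool the same ROI from every level, then fuse by element-wise maximum
pooled = [roi_sample(f, box, s) for f, s in zip(feats, strides)]
fused = np.maximum.reduce(pooled)
```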
Step 5 is the construction of the anti-occlusion branch; the specific process is shown in fig. 7:
Step 5-1, spatial attention mask generation, test phase. The features output in step 4 are passed through two 3×3 convolutions, each followed by ReLU nonlinear activation; a 1×1 convolution then converts the features into a 1-channel feature map, and finally a Sigmoid function maps the output values to between 0 and 1.
Step 5-2, spatial attention mask generation, training phase. To train the mask, the mask label is computed in a weakly supervised manner: pixels inside the annotated rectangular box are labeled 1 and pixels outside it are labeled 0. In addition, the invention supervises the mask training process with a binary cross-entropy loss function; with mask labels ci and predicted mask values pi, the mask generation loss Lmask is:
Lmask = −(1/N) Σi [ci·log(pi) + (1 − ci)·log(1 − pi)]
where N represents the number of pixels, i.e. N = 7 × 7 = 49.
Step 5-3, the generated mask is used to weight the input features, specifically by multiplying them pixel by pixel, highlighting the unoccluded target part and suppressing interference from other non-target regions, making the features more discriminative and achieving anti-occlusion.
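Steps 5-1 to 5-3 can be sketched together in NumPy; the convolution stack is replaced by a simple channel mean purely to produce a 1-channel response map, and the label rectangle and shapes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

feat = rng.standard_normal((256, 7, 7))  # fused candidate-box features

# stand-in for the two 3x3 conv + ReLU layers and the 1x1 conv:
# collapse channels to a single-channel response map
logits = feat.mean(axis=0)
mask = sigmoid(logits)                   # attention mask, values in (0, 1)

# weak-supervision label: 1 inside the annotated box, 0 outside
label = np.zeros((7, 7))
label[2:6, 1:5] = 1.0

# binary cross-entropy mask loss over N = 49 pixels
eps = 1e-7
p = np.clip(mask, eps, 1 - eps)
L_mask = -np.mean(label * np.log(p) + (1 - label) * np.log(1 - p))

# anti-occlusion weighting: pixel-wise product of mask and features
weighted = feat * mask[None]
```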
Step 6 takes the horizontal candidate box generated in step 4 as the starting point and performs category prediction and rotated rectangular box regression for the dead fish target, as follows:
Step 6-1, dead fish target category prediction. In the invention there are 3 target categories: dead fish, live fish and background. During training, category prediction adopts a cross-entropy loss function; during testing, after softmax probability normalization and an argmax function, the category and its classification confidence are predicted.
Step 6-2, rotated rectangular box regression for the dead fish. The rotated rectangular box, illustrated in fig. 8, is represented in the form (x, y, w, h, θ), where (x, y) is the center of the box, (w, h) are its width and height, and θ is the acute angle between one side of the box and the x axis; that side is defined as w, so θ ∈ [−π/2, 0). The starting point of the rotated-box regression is the horizontal candidate box predicted by the RPN, encoded as:
tx = (x − x0)/w0, ty = (y − y0)/h0
tw = log(w/w0), th = log(h/h0), tθ = θ − θ0
where (x0, y0, w0, h0, θ0) is the horizontal candidate box, and the predicted and true offsets of the rotated box are denoted t and t* respectively; the parameters (y, w, h, θ) are treated analogously to x. The loss between the predicted values ti and the true values ti* is computed with the smooth-L1 function:
Lreg = Σ smoothL1(ti − ti*), i ∈ {x, y, w, h, θ}
smoothL1(d) = 0.5·d², if |d| < 1; |d| − 0.5, otherwise
At test time, the predicted values are decoded by the inverse rule:
x = tx·w0 + x0, y = ty·h0 + y0, w = w0·exp(tw), h = h0·exp(th), θ = tθ + θ0
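The regression targets and their decoding can be sketched with the standard rotated-box offset encoding; taking θ0 = −π/2 for the horizontal candidate box is an assumption here, not a value fixed by the text:

```python
import math

THETA0 = -math.pi / 2  # assumed angle of the horizontal candidate box

def encode(box, anchor):
    """Offsets t of a rotated box (x, y, w, h, theta) w.r.t. an anchor."""
    x, y, w, h, t = box
    xa, ya, wa, ha, ta = anchor
    return ((x - xa) / wa, (y - ya) / ha,
            math.log(w / wa), math.log(h / ha), t - ta)

def decode(offsets, anchor):
    """Inverse rule: recover the rotated box from offsets and anchor."""
    tx, ty, tw, th, tt = offsets
    xa, ya, wa, ha, ta = anchor
    return (tx * wa + xa, ty * ha + ya,
            wa * math.exp(tw), ha * math.exp(th), tt + ta)

anchor = (50.0, 60.0, 40.0, 20.0, THETA0)  # horizontal candidate box
box = (55.0, 58.0, 36.0, 18.0, -0.3)       # ground-truth rotated box
rebuilt = decode(encode(box, anchor), anchor)
```

Encode and decode are exact inverses, so the round trip reproduces the original box up to floating-point error.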
and 6-3, obtaining a detection result of the rotating frame after decoding, and removing the redundant detection frame by using the rotating NMS to obtain a final dead fish identification result.
It is to be understood that the above-described embodiments of the present invention are merely illustrative of or explaining the principles of the invention and are not to be construed as limiting the invention. Therefore, any modification, equivalent replacement, improvement and the like made without departing from the spirit and scope of the present invention should be included in the protection scope of the present invention. Further, it is intended that the appended claims cover all such variations and modifications as fall within the scope and boundaries of the appended claims or the equivalents of such scope and boundaries.

Claims (10)

1. An anti-occlusion and multi-scale dead fish recognition system based on deep learning is characterized by comprising:
a dead fish identification data set module, used for acquiring and storing images shot underwater and constructing a dead fish identification training set, verification set and test set;
an image feature extraction module, used for preprocessing the underwater images and extracting the low-level edge and high-level abstract feature maps of the images;
a multi-scale feature enhancement module, used for improving the multi-scale expression capability of the features by connecting a plurality of feature pyramids in series, generating candidate boxes representing foreground targets, and adaptively extracting and fusing the features of region candidate boxes of different scales;
an anti-occlusion module, used for learning the mask of the foreground target through an attention mechanism, fusing the mask with the candidate-box features, and suppressing background interference to obtain anti-occlusion features;
and a dead fish target identification module, used for taking the candidate-box features as the starting point and, combining the anti-occlusion features with fully connected layers, regressing the rotated rectangular box representing the dead fish position and performing identification classification to complete dead fish target identification.
2. The anti-occlusion and multi-scale dead fish identification system of claim 1, wherein: the dead fish identification data set module comprises data of three categories, dead fish, live fish and background, divided into three sets: a training set, a verification set and a test set; the image preprocessing module further applies a contrast-limited adaptive histogram equalization algorithm for image preprocessing, improving image detail and local contrast; the image feature extraction module comprises a ResNet, VGG, MobileNet or EfficientNet series network, selected according to the real-time processing requirements of the system.
3. The anti-occlusion and multi-scale dead fish identification system of claim 1, wherein: a single feature pyramid is generated as follows: firstly, a 1×1 convolution is applied through lateral connections to the output features of the image feature extraction module, reducing the number of channels to 256, and the high-level low-resolution features are up-sampled by a factor of 2 using bilinear interpolation; then, the up-sampled features and the channel-reduced features are added pixel by pixel; finally, a 3×3 convolution fuses the added feature maps to obtain the current enhanced features. These steps are repeated 3 times to generate a single feature pyramid consisting of 4 feature layers.
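One fusion step of the pyramid described in this claim might be sketched as follows (NumPy stand-ins, all names illustrative: the 1×1 lateral convolution is per-pixel channel mixing, nearest-neighbour upsampling stands in for the bilinear interpolation, and the trailing 3×3 smoothing convolution is omitted):

```python
import numpy as np

def lateral_1x1(feat, w):
    """1x1 convolution = per-pixel channel mixing. feat: (C, H, W), w: (256, C)."""
    c, h, ww = feat.shape
    return (w @ feat.reshape(c, -1)).reshape(256, h, ww)

def upsample2x(feat):
    """2x spatial upsampling (nearest-neighbour stands in for bilinear here)."""
    return feat.repeat(2, axis=1).repeat(2, axis=2)

def fuse_level(top_down, lateral_feat, w_lateral):
    """One top-down pyramid step: reduce the lateral feature to 256 channels,
    upsample the coarser map by 2x, and add pixel by pixel."""
    return upsample2x(top_down) + lateral_1x1(lateral_feat, w_lateral)
```

Repeating this step 3 times down the backbone outputs, as the claim describes, yields the 4-layer pyramid.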
4. The anti-occlusion and multi-scale dead fish identification system of claim 3, wherein: the multi-stage feature pyramid is formed by connecting a plurality of feature pyramids in series, with the output of the feature pyramid generated at the previous stage serving as the input of the feature pyramid at the next stage; the earlier pyramids express shallow features, the later pyramids express deep features, and each feature pyramid contains rich multi-scale information;
the cross-stage connecting unit transmits the features of the previous stage to the next stage, so that the current features can fully reuse the prior knowledge accumulated earlier, enhancing the feature expression capability;
the multi-scale features are generated as follows: assuming n multi-stage feature pyramids, denoted {P1, P2, …, Pn}, each pyramid consists of 4 feature layers {C1, C2, C3, C4} with strides 4, 8, 16 and 32 respectively; the feature layers with the same stride in each pyramid are accumulated pixel by pixel, outputting the multi-scale features {F1, F2, F3, F4}; where n is set to 2 or a larger value.
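The pixel-wise accumulation of same-stride layers across the n pyramids is straightforward; a minimal NumPy sketch, assuming each pyramid is given as a list of 4 same-shaped arrays (strides 4, 8, 16, 32):

```python
import numpy as np

def accumulate_pyramids(pyramids):
    """pyramids: list of n pyramids, each a list of feature arrays
    ordered by stride. Same-stride layers are summed pixel by pixel,
    producing the multi-scale outputs {F1, ..., F4}."""
    n_levels = len(pyramids[0])
    return [sum(p[k] for p in pyramids) for k in range(n_levels)]
```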
5. The anti-occlusion and multi-scale dead fish identification system of claim 1, wherein: the candidate frame generation comprises an RPN network and an adaptive ROIAlign algorithm unit; firstly, the candidate frames output by the RPN are sorted by confidence so that low-confidence frames are removed before the subsequent NMS operation, improving the post-processing speed of the algorithm; the adaptive ROIAlign algorithm unit adaptively pools and fuses the features of different feature layers: all generated candidate frames are first mapped onto the multi-scale feature layers {F1, F2, F3, F4}, ROIAlign is then performed with 7×7 pooling kernels, and the resulting mapped features are fused.
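The confidence filtering followed by greedy NMS described in this claim can be sketched in plain NumPy on horizontal boxes (thresholds are illustrative):

```python
import numpy as np

def nms(boxes, scores, score_thresh=0.05, iou_thresh=0.5):
    """Drop low-confidence boxes first, then greedy non-maximum suppression.
    boxes: (N, 4) float array as (x1, y1, x2, y2). Returns kept indices."""
    idx = np.nonzero(scores >= score_thresh)[0]          # confidence filter
    order = idx[np.argsort(-scores[idx])]                # sort by confidence
    kept = []
    while order.size:
        i = order[0]
        kept.append(int(i))
        if order.size == 1:
            break
        rest = order[1:]
        xx1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        yy1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        xx2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        yy2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.maximum(xx2 - xx1, 0) * np.maximum(yy2 - yy1, 0)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter)
        order = rest[iou <= iou_thresh]                  # suppress overlaps
    return kept
```

Pre-filtering by confidence shrinks the candidate list before the quadratic NMS loop, which is the speed-up the claim refers to.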
6. The anti-occlusion and multi-scale dead fish identification system of claim 1, wherein: the anti-occlusion module comprises a spatial attention mask generation unit and a feature weighting unit; the spatial attention mask generation unit takes the extracted 7×7×256 candidate-box features as input, maps the feature map to 7×7×1 through two 3×3 convolutions and one 1×1 convolution, and finally outputs a foreground target probability map with values in [0, 1] after a Sigmoid activation function; the feature weighting unit multiplies the probability map and the original input features pixel by pixel and outputs the result as a branch;
and the spatial attention mask generation unit generates training mask labels in a weakly supervised manner from the annotated rectangular boxes.
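The feature weighting unit reduces to a pixel-wise product between the sigmoid probability map and the input features; a minimal sketch, assuming 7×7×256 candidate-box features in channel-first layout (the conv stack producing the logits is not shown):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def attention_weight(roi_feat, mask_logits):
    """roi_feat: (256, 7, 7) candidate-box features; mask_logits: (1, 7, 7)
    output of the mask conv stack. The sigmoid yields a [0, 1] foreground
    probability map that re-weights the features pixel by pixel, suppressing
    occluding background."""
    prob = sigmoid(mask_logits)   # (1, 7, 7), values in [0, 1]
    return roi_feat * prob        # broadcast over the 256 channels
```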
7. The anti-occlusion and multi-scale dead fish identification system of claim 1, wherein: the rotated rectangular frame is represented in the form (x, y, w, h, θ), wherein (x, y) is the center point of the rotated rectangular frame, (w, h) are its width and height, and θ is the angle between the x axis and the side of the rotated rectangle forming an acute angle with it, that side being defined as w; under this convention θ ∈ [−π/2, 0).
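Under this angle convention, an arbitrary (w, h, θ) triple can be normalized into θ ∈ [−π/2, 0) by shifting θ in 90° steps and swapping width and height at each step; a small illustrative helper (the function name is not from the patent):

```python
import math

def canonical_rbox(x, y, w, h, theta):
    """Normalize a rotated box (x, y, w, h, theta) so that theta, measured
    from the x axis to the side labelled w, lies in [-pi/2, 0). Each 90-degree
    shift of theta exchanges the roles of w and h."""
    while theta >= 0:
        theta -= math.pi / 2
        w, h = h, w
    while theta < -math.pi / 2:
        theta += math.pi / 2
        w, h = h, w
    return x, y, w, h, theta
```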
8. The anti-occlusion and multi-scale dead fish identification system of claim 7, wherein: the rotated rectangular frame is regressed taking the horizontal candidate frame as the starting point, with the following encoding:
t_x = (x − x_0) / w_0,  t_y = (y − y_0) / h_0,  t_w = log(w / w_0),  t_h = log(h / h_0),  t_θ = θ − θ_0
t′_x = (x′ − x_0) / w_0,  t′_y = (y′ − y_0) / h_0,  t′_w = log(w′ / w_0),  t′_h = log(h′ / h_0),  t′_θ = θ′ − θ_0
wherein (x_0, y_0, w_0, h_0, θ_0) is the horizontal candidate box with θ_0 = −π/2; x and x′ are the predicted value and the true value of the rotated box respectively, and the other parameters (y, w, h, θ) have the same meaning; the smooth-L1 loss is calculated between the predicted value t and the true value t′.
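A NumPy sketch of this encoding and the smooth-L1 loss, following the standard rotated-anchor form consistent with the parameters named in the claim (an assumption about the exact formulas, since the published text renders them only as images):

```python
import numpy as np

def encode_rbox(box, anchor):
    """box, anchor: tuples (x, y, w, h, theta). The anchor is the horizontal
    candidate with theta0 = -pi/2. Returns regression targets
    (tx, ty, tw, th, t_theta): centers normalized by anchor size, sizes in
    log-ratio, angle as an offset."""
    x, y, w, h, t = box
    xa, ya, wa, ha, ta = anchor
    return np.array([(x - xa) / wa,
                     (y - ya) / ha,
                     np.log(w / wa),
                     np.log(h / ha),
                     t - ta])

def smooth_l1(pred, target, beta=1.0):
    """Smooth-L1: quadratic for small residuals, linear for large ones."""
    d = np.abs(pred - target)
    return np.where(d < beta, 0.5 * d * d / beta, d - 0.5 * beta).sum()
```

Both the predicted and the ground-truth rotated box are encoded against the same horizontal candidate, and the loss compares the two target vectors.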
9. An anti-occlusion and multi-scale dead fish identification method based on deep learning is characterized by comprising the following steps:
step S1, acquiring underwater images and establishing a dead fish identification data set for training and testing;
step S2, performing image preprocessing, inputting the image into the image feature extraction module, and extracting low-level edge features and high-level abstract feature maps of the image;
step S3, the multi-scale feature enhancement module improves the multi-scale expression capability of the features by connecting a plurality of feature pyramids in series, obtaining multi-scale features;
step S4, generating horizontal rectangular region candidate boxes from the multi-scale features, and designing an adaptive ROIAlign algorithm to adaptively extract and fuse the features of region candidate boxes of different scales;
step S5, learning the mask of the foreground target from the region candidate frame features through an attention mechanism, and fusing the mask with the candidate frame features to suppress background interference;
and step S6, taking the horizontal rectangular region candidate frame as the initial starting point, combining the features fused by the anti-occlusion branch with fully connected layers, regressing the rotated rectangular frame representing the dead fish position, and classifying dead fish to complete dead fish target identification.
10. The anti-occlusion and multi-scale dead fish identification method of claim 9, wherein: in step S2, a contrast-limited adaptive histogram equalization algorithm is designed to perform image preprocessing, improving image detail information and local contrast;
in step S3, pyramid layers with the same stride are accumulated to output an enhanced feature pyramid;
in step S4, the ROIAlign algorithm adaptively pools and fuses the features of different feature layers, so that both small and large targets can share low-level and high-level information.
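ROIAlign itself is bilinear sampling at regularly spaced points inside each output bin; a single-channel, one-sample-per-bin sketch (the adaptive variant of the claim would run this over every level {F1, …, F4} and fuse the pooled features):

```python
import numpy as np

def bilinear_sample(feat, y, x):
    """Bilinearly interpolate feat (H, W) at the continuous location (y, x)."""
    h, w = feat.shape
    y0f, x0f = int(np.floor(y)), int(np.floor(x))
    ly, lx = y - y0f, x - x0f
    y0 = min(max(y0f, 0), h - 1); y1 = min(max(y0f + 1, 0), h - 1)
    x0 = min(max(x0f, 0), w - 1); x1 = min(max(x0f + 1, 0), w - 1)
    return (feat[y0, x0] * (1 - ly) * (1 - lx) + feat[y0, x1] * (1 - ly) * lx
            + feat[y1, x0] * ly * (1 - lx) + feat[y1, x1] * ly * lx)

def roi_align(feat, box, out_size=7):
    """box = (x1, y1, x2, y2) in feature-map coordinates; samples one
    bilinear point at the center of each of the out_size x out_size bins.
    (Production implementations sample several points per bin and average.)"""
    x1, y1, x2, y2 = box
    bh, bw = (y2 - y1) / out_size, (x2 - x1) / out_size
    out = np.empty((out_size, out_size))
    for i in range(out_size):
        for j in range(out_size):
            out[i, j] = bilinear_sample(feat, y1 + (i + 0.5) * bh,
                                        x1 + (j + 0.5) * bw)
    return out
```

Because sampling is continuous, no coordinate quantization occurs; this is what lets small targets pooled from high-resolution layers and large targets pooled from coarse layers share the same 7×7 output grid.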
CN202110653176.7A 2021-06-11 2021-06-11 Anti-occlusion and multi-scale dead fish identification system and method based on deep learning Active CN113420759B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110653176.7A CN113420759B (en) 2021-06-11 2021-06-11 Anti-occlusion and multi-scale dead fish identification system and method based on deep learning

Publications (2)

Publication Number Publication Date
CN113420759A true CN113420759A (en) 2021-09-21
CN113420759B CN113420759B (en) 2023-04-18

Family

ID=77788414

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110653176.7A Active CN113420759B (en) 2021-06-11 2021-06-11 Anti-occlusion and multi-scale dead fish identification system and method based on deep learning

Country Status (1)

Country Link
CN (1) CN113420759B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109543679A (en) * 2018-11-16 2019-03-29 南京师范大学 A kind of dead fish recognition methods and early warning system based on depth convolutional neural networks
CN109583343A (en) * 2018-11-21 2019-04-05 荆门博谦信息科技有限公司 A kind of fish image processing system and method
US20200394413A1 (en) * 2019-06-17 2020-12-17 The Regents of the University of California, Oakland, CA Athlete style recognition system and method
CN112766274A (en) * 2021-02-01 2021-05-07 长沙市盛唐科技有限公司 Water gauge image water level automatic reading method and system based on Mask RCNN algorithm
CN112926652A (en) * 2021-02-25 2021-06-08 青岛科技大学 Fish fine-grained image identification method based on deep learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
CHUANG YU等: ""Segmentation and measurement scheme for fish morphological features based on Mask R-CNN"", 《INFORMATION PROCESSING IN AGRICULTURE》 *
LINGCAI ZENG等: ""Underwater target detection based on Faster R-CNN and adversarial occlusion network"", 《ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE》 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115049952A (en) * 2022-04-24 2022-09-13 南京农业大学 Juvenile fish limb identification method based on multi-scale cascade perception deep learning network
CN115049952B (en) * 2022-04-24 2023-04-07 南京农业大学 Juvenile fish limb identification method based on multi-scale cascade perception deep learning network
CN117115688A (en) * 2023-08-17 2023-11-24 广东海洋大学 Dead fish identification and counting system and method based on deep learning under low-brightness environment

Also Published As

Publication number Publication date
CN113420759B (en) 2023-04-18

Similar Documents

Publication Publication Date Title
CN115797931B (en) Remote sensing image semantic segmentation method and device based on double-branch feature fusion
CN112232349B (en) Model training method, image segmentation method and device
CN109934200B (en) RGB color remote sensing image cloud detection method and system based on improved M-Net
Isikdogan et al. Seeing through the clouds with deepwatermap
CN113378906B (en) Unsupervised domain adaptive remote sensing image semantic segmentation method with feature self-adaptive alignment
CN111797712B (en) Remote sensing image cloud and cloud shadow detection method based on multi-scale feature fusion network
WO2023000159A1 (en) Semi-supervised classification method, apparatus and device for high-resolution remote sensing image, and medium
CN109558806A (en) The detection method and system of high score Remote Sensing Imagery Change
CN113420759B (en) Anti-occlusion and multi-scale dead fish identification system and method based on deep learning
CN110751075A (en) Remote sensing image culture pond detection method based on example segmentation
CN112560865B (en) Semantic segmentation method for point cloud under outdoor large scene
CN113888547A (en) Non-supervision domain self-adaptive remote sensing road semantic segmentation method based on GAN network
CN116258719A (en) Flotation foam image segmentation method and device based on multi-mode data fusion
CN113255837A (en) Improved CenterNet network-based target detection method in industrial environment
CN114220126A (en) Target detection system and acquisition method
CN114943876A (en) Cloud and cloud shadow detection method and device for multi-level semantic fusion and storage medium
CN115410081A (en) Multi-scale aggregated cloud and cloud shadow identification method, system, equipment and storage medium
CN115471746A (en) Ship target identification detection method based on deep learning
CN115713632A (en) Feature extraction method and device based on multi-scale attention mechanism
CN115424059A (en) Remote sensing land use classification method based on pixel level comparison learning
CN116778470A (en) Object recognition and object recognition model training method, device, equipment and medium
CN116246138A (en) Infrared-visible light image target level fusion method based on full convolution neural network
CN113192018B (en) Water-cooled wall surface defect video identification method based on fast segmentation convolutional neural network
CN112560719B (en) High-resolution image water body extraction method based on multi-scale convolution-multi-core pooling
CN111860668A (en) Point cloud identification method of deep convolution network for original 3D point cloud processing

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant