CN115546668A - Marine organism detection method and device and unmanned aerial vehicle - Google Patents

Marine organism detection method and device and unmanned aerial vehicle

Info

Publication number
CN115546668A
CN115546668A (application CN202211253998.7A)
Authority
CN
China
Prior art keywords
target
marine organism
marine
mask
module
Prior art date
Legal status
Pending
Application number
CN202211253998.7A
Other languages
Chinese (zh)
Inventor
Ren Xuefeng
Luo Wei
Current Assignee
Beijing Zhuoyi Intelligent Technology Co Ltd
Original Assignee
Beijing Zhuoyi Intelligent Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Zhuoyi Intelligent Technology Co Ltd filed Critical Beijing Zhuoyi Intelligent Technology Co Ltd
Priority to CN202211253998.7A priority Critical patent/CN115546668A/en
Publication of CN115546668A publication Critical patent/CN115546668A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/10 Terrestrial scenes
    • G06V20/17 Terrestrial scenes taken from planes or by drones
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Databases & Information Systems (AREA)
  • Molecular Biology (AREA)
  • Remote Sensing (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

The disclosure provides a marine organism detection method and device and an unmanned aerial vehicle. The detection method comprises: acquiring a marine environment image captured by an image acquisition device; generating a prediction box of a target marine organism in the marine environment image and a pixel mask of the target marine organism within the prediction box by using a machine learning model; and determining the biological characteristics of the target marine organism according to the pixel mask of the target marine organism. According to various embodiments provided by the present disclosure, marine organisms are identified and their biological features are extracted through a machine learning model, thereby realizing dynamic monitoring of marine organisms.

Description

Detection method and device for marine organisms and unmanned aerial vehicle
Technical Field
The present disclosure relates generally to the field of artificial intelligence technology, and more particularly, to a method and an apparatus for detecting marine organisms, and an unmanned aerial vehicle.
Background
This section is intended to introduce a selection of aspects of the art, which may be related to various aspects of the present disclosure that are described and/or claimed below. This section is believed to be helpful in providing background information to facilitate a better understanding of the various aspects of the present disclosure. Accordingly, it should be understood that these statements are to be read in this light, and not as admissions of prior art.
Unmanned aerial vehicle remote sensing research on marine mammals such as whales mainly focuses on abundance, distribution, and body condition assessment. Conventional survey and monitoring methods present many challenges. Ship-based field work and photo identification can collect high-resolution data, but are limited by sea conditions and cannot effectively cover larger areas. Manned aircraft and satellites can rapidly collect data over large ranges, but both are limited by weather conditions; manned aerial photography is also expensive, and the marine environment poses risks to pilots. Scuba surveys place high technical demands on divers, who must understand underwater currents and guard against potential attacks by wild species. Tag-based tracking, although effective for monitoring, is invasive; it can cause damage that is hard to reverse in sensitive and fragile marine ecosystems (such as mangroves and coral reefs), and it may cause injury and behavioral changes in whales.
Unmanned aerial vehicle remote sensing overcomes these challenges: unmanned aerial vehicles are easy to operate, more flexible than other methods, reasonably priced, and efficient; they keep people away from potential harm by dangerous animals and environments, can collect information from places that are difficult to reach, and minimize disturbance. With unmanned aerial vehicle remote sensing, besides obtaining visual information such as animal presence, distribution, and behavior, researchers can perform morphometric measurements of wild animals without capturing individuals. Animal weight, size, health, and population statistics can thus be obtained, which helps provide more complete information for the protection and management of marine animals.
Most existing unmanned aerial vehicle wildlife survey methods acquire real-time information about the surrounding environment through cameras or sensors and simultaneously push it to a ground station, which transmits the captured video or image information to a dedicated image analysis server (workstation) for subsequent analysis. This process is resource intensive: the model network is complex, has many parameters, and runs inefficiently, and its real-time performance depends on the bandwidth and stability of the transmission network.
Disclosure of Invention
The purpose of the disclosure is to provide a marine organism detection method and device and an unmanned aerial vehicle, so as to realize dynamic monitoring of marine organisms.
According to a first aspect of the present disclosure, there is provided a method of detecting marine organisms, comprising: acquiring a marine environment image captured by image acquisition equipment; generating a prediction box of a target marine organism in the marine environment image and a pixel mask of the target marine organism within the prediction box using a machine learning model; and determining the biological characteristics of the target marine organism according to the pixel mask of the target marine organism.
In some embodiments, the machine learning model is a Mask R-CNN neural network model.
In some embodiments, the backbone network of the Mask R-CNN neural network model comprises a 101-layer ResNet network and a feature pyramid network.
In some embodiments, the backbone network of the Mask R-CNN neural network model comprises a 50-layer ResNext network and a feature pyramid network.
In some embodiments, the feature pyramid network further comprises a reverse branch with bottom-up and lateral connections.
In some embodiments, the backbone network further comprises a convolution attention module.
In some embodiments, the convolution attention module is a channel and space dual attention module.
In some embodiments, the channel and space dual attention module is a CBAM (Convolutional Block Attention Module), and the channel attention module of the CBAM module is an ECA-Net module.
In some embodiments, the marine organism is a whale and the biological characteristic is length.
In some embodiments, said determining a biological characteristic of said target marine organism from a pixel mask of said target marine organism comprises: extracting the pixel mask and determining a principal axis by principal component analysis; measuring target pixels along the principal axis; and determining the length of the target marine organism from the target pixels.
According to a second aspect of the present disclosure, there is provided a detection apparatus for marine organisms, comprising: an acquisition module configured to acquire a marine environment image captured by an image acquisition device; a generation module configured to generate a prediction box of a target marine organism in the marine environment image and a pixel mask of the target marine organism within the prediction box using a machine learning model; and a determination module configured to determine a biological characteristic of the target marine organism from the pixel mask of the target marine organism.
According to a third aspect of the present disclosure, there is provided a drone, comprising: the image acquisition equipment is used for acquiring images in real time; and a processor for performing the method according to any one of the first aspect of the present disclosure from the image acquired by the image acquisition device.
According to various embodiments provided by the present disclosure, marine organisms are identified and biological features are extracted through a machine learning model, thereby realizing dynamic monitoring of the marine organisms.
It should be understood that the statements herein are not intended to identify key or essential features of the claimed subject matter, nor are they intended to be used alone as an aid in determining the scope of the claimed subject matter.
Drawings
In order to more clearly illustrate the embodiments of the present disclosure or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly described below. It is obvious that the drawings in the following description show only embodiments of the present disclosure, and other drawings can be obtained from them by those skilled in the art without creative effort. Throughout the drawings, identical reference numbers designate similar, but not necessarily identical, elements.
FIG. 1 illustrates an example artificial neural network.
Fig. 2 shows a structural schematic of one example of a machine learning model according to the present disclosure.
Fig. 3 shows an improvement to the feature network in the backbone network of Mask R-CNN.
FIG. 4 shows an improvement to the Feature Pyramid Network (FPN) in the backbone network of Mask R-CNN.
Fig. 5 shows another improvement to the feature network in the backbone network of Mask R-CNN.
Fig. 6 shows one example of a detection method of marine organisms according to the present disclosure.
Fig. 7 shows a flow chart schematic of one example of a method of detection of marine organisms according to the present disclosure.
Fig. 8 shows a schematic configuration diagram of one example of the detection apparatus of marine organisms according to the present disclosure.
Fig. 9 shows a schematic structural diagram of an unmanned aerial vehicle provided in an embodiment of the present invention.
Detailed Description
The present disclosure will be described more fully hereinafter with reference to the accompanying drawings. The present disclosure may, however, be embodied in many alternate forms and should not be construed as limited to the embodiments set forth herein. Accordingly, while the disclosure is susceptible to various modifications and alternative forms, specific embodiments thereof are shown by way of example in the drawings and will herein be described in detail. It should be understood, however, that there is no intention to limit the disclosure to the specific forms disclosed, but on the contrary, the disclosure is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the disclosure as defined by the appended claims.
It will be understood that, although the terms first, second, etc. may be used herein to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element from another. For example, a first element could be termed a second element, and, similarly, a second element could be termed a first element, without departing from the teachings of the present disclosure.
Some examples are described herein in connection with block diagrams and/or flowchart illustrations, where each block represents a circuit element, module, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in other implementations, the functions noted in the blocks may occur out of the order noted. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved.
Reference herein to "according to an example" or "in an example" means that a particular feature, structure, or characteristic described in connection with the example can be included in at least one implementation of the present disclosure. The appearances of the phrases "according to an example" or "in an example" in various places herein do not necessarily all refer to the same example, nor are separate or alternative examples necessarily mutually exclusive of other examples.
A neural network is a mathematical computation model inspired by the structure of brain neurons and the principles of neural transmission, and realizing intelligent computation on the basis of such models is called brain-inspired computing. Neural networks take various forms of network structure, such as the back propagation (BP) neural network, the convolutional neural network (CNN), the recurrent neural network (RNN), and the long short-term memory network (LSTM); convolutional neural networks can be further subdivided into fully convolutional networks, deep convolutional networks, U-networks (U-Net), and so on.
Fig. 1 illustrates an example artificial neural network ("ANN" or simply "network") 100. In this embodiment, an ANN may refer to a computational model that includes one or more nodes. The example ANN 100 may include an input layer 110, hidden layers 120, 130, 160, and an output layer 150. Each layer of the ANN 100 may include one or more nodes, such as node 105 or node 115. In this embodiment, each node of the ANN may be connected to another node of the ANN. By way of example and not limitation, each node of the input layer 110 may be connected to one or more nodes of the hidden layer 120. In this embodiment, one or more nodes may be bias nodes (e.g., nodes in a layer that are not connected to, and do not receive input from, any node in a previous layer). In this embodiment, each node in each layer may be connected to one or more nodes of a previous or subsequent layer. Although fig. 1 depicts a particular ANN having a particular number of layers, a particular number of nodes, and particular connections between nodes, the present disclosure also encompasses any suitable ANN having any suitable number of layers, any suitable number of nodes, and any suitable connections between nodes. By way of example and not limitation, although fig. 1 depicts connections between each node of the input layer 110 and each node of the hidden layer 120, one or more nodes of the input layer 110 may not be connected to one or more nodes of the hidden layer 120.
In this embodiment, the ANN may be a feed-forward ANN (e.g., an ANN without cycles or loops, where propagation between nodes flows in one direction, starting from the input layer and proceeding to subsequent layers). By way of example and not limitation, the input of each node of the hidden layer 120 may include the output of one or more nodes of the input layer 110. As another example and not by way of limitation, the input of each node of the output layer 150 may include the output of one or more nodes of the hidden layer 160. In this embodiment, the ANN may be a deep neural network (e.g., a neural network including at least two hidden layers). In this embodiment, the ANN may be a deep residual network. The deep residual network may be a feed-forward ANN that includes hidden layers organized into residual blocks. The input of each residual block after the first residual block may be a function of the output of the previous residual block and the input of the previous residual block. By way of example and not limitation, the input to residual block N may be F(x) + x, where F(x) may be the output of residual block N-1 and x may be the input of residual block N-1. Although this disclosure describes a particular ANN, this disclosure also includes any suitable ANN.
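The residual computation F(x) + x just described can be illustrated with a short sketch. This is a minimal sketch assuming PyTorch; the two-convolution form of F(x) and the layer sizes are common illustrative choices, not taken from this disclosure.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Minimal residual block: output = F(x) + x, as described above."""
    def __init__(self, channels: int):
        super().__init__()
        self.f = nn.Sequential(               # F(x): an assumed two-conv form
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.relu(self.f(x) + x)       # skip connection: F(x) + x
```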
In this embodiment, an activation function may correspond to each node of the ANN. The activation function of a node may define the output of the node for a given input. In this embodiment, the input to the node may comprise a set of inputs. By way of example and not limitation, the activation function may be an identity function, a binary step function, a logistic function, or any other suitable function. As another example and not by way of limitation, the activation function of node k may be a sigmoid function:

F_k(s_k) = 1 / (1 + e^{-s_k})

a hyperbolic tangent function:

F_k(s_k) = (e^{s_k} - e^{-s_k}) / (e^{s_k} + e^{-s_k})

a rectifier activation function:

F_k(s_k) = max(0, s_k)

or any other suitable function F_k(s_k), where s_k may be a valid input to node k. In this embodiment, the inputs to the activation functions may be weighted, and each node may generate an output using a respective activation function based on the weighted inputs. In this embodiment, each connection between nodes may be associated with a weight. By way of example and not limitation, the connection 125 between node 105 and node 115 may have a weighting factor of 0.4, i.e., the output of node 105 multiplied by 0.4 is used as an input to node 115. As another example and not by way of limitation, the output y_k of node k may be y_k = F_k(s_k), where F_k may be the activation function corresponding to node k, s_k = Σ_j (W_jk · x_j) may be a valid input to node k, x_j may be the output of a node j connected to node k, and W_jk may be a weighting factor between node j and node k. Although this disclosure describes particular inputs and outputs of a node, this disclosure also includes any suitable inputs and outputs of a node. Further, although the present disclosure may describe particular connections and weights between nodes, the disclosure also includes any suitable connections and weights between nodes.
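The node output y_k = F_k(Σ_j W_jk · x_j) above can be written compactly in code. A minimal sketch assuming NumPy; the activation names and array shapes are illustrative:

```python
import numpy as np

def node_output(x: np.ndarray, w: np.ndarray, activation: str = "sigmoid") -> float:
    """Compute y_k = F_k(s_k) with s_k = sum_j(W_jk * x_j), per the formulas above."""
    s = np.dot(w, x)  # weighted valid input s_k
    if activation == "sigmoid":
        return 1.0 / (1.0 + np.exp(-s))
    if activation == "tanh":
        return np.tanh(s)
    if activation == "relu":
        return np.maximum(0.0, s)
    raise ValueError(f"unknown activation: {activation}")

# Example: the output of node 105 (here 0.7) weighted by 0.4 as input to node 115.
y = node_output(np.array([0.7]), np.array([0.4]))
```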
In this embodiment, the ANN may be trained using training data. By way of example and not limitation, the training data may include inputs to the ANN 100 and expected outputs. As another example and not by way of limitation, the training data may include vectors, each vector representing a training object, and an expected label for each training object. In this embodiment, training the ANN may include modifying the weights associated with connections between nodes of the ANN by optimizing an objective function. By way of example and not limitation, a training method (e.g., the conjugate gradient method, gradient descent, or stochastic gradient descent) may be used to back-propagate a sum-of-squares error representing the distances between the vectors (e.g., using a loss function that minimizes the sum-of-squares error). In this embodiment, the ANN may be trained using a Dropout technique. By way of example and not limitation, one or more nodes may be temporarily omitted during training (e.g., receiving no input and generating no output). For each training object, one or more nodes of the ANN may have some probability of being omitted. The nodes omitted for a particular training object may differ from the nodes omitted for other training objects (e.g., nodes may be temporarily omitted on an object-by-object basis). Although this disclosure describes training an ANN in a particular manner, this disclosure also includes training an ANN in any suitable manner.
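A minimal PyTorch sketch of one such training step, combining gradient-based weight updates with Dropout; the layer sizes, learning rate, dropout probability, and random placeholder data are assumptions, and mean squared error stands in for the sum-of-squares objective:

```python
import torch
import torch.nn as nn

# A small illustrative network with Dropout, trained by gradient descent
# on a mean-squared-error objective, as outlined above.
model = nn.Sequential(
    nn.Linear(16, 32), nn.ReLU(),
    nn.Dropout(p=0.5),               # nodes are temporarily omitted per sample
    nn.Linear(32, 1),
)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

inputs = torch.randn(8, 16)          # placeholder training vectors
targets = torch.randn(8, 1)          # placeholder expected outputs

model.train()
optimizer.zero_grad()
loss = loss_fn(model(inputs), targets)
loss.backward()                      # back-propagate the error
optimizer.step()                     # update connection weights
```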
Fig. 2 shows a structural schematic of one example of a machine learning model according to the present disclosure. As shown in fig. 2, whale images taken over the ocean may be input to the CNN backbone network. The first layers of the CNN backbone network may create feature maps of edges, curves, and color gradients. Deeper layers of the CNN backbone network may create more abstract feature maps, aggregating the feature maps of earlier layers into combinations of their simpler features. These features may represent the pectoral fins, tail fins, or a particular body shape of a whale. Through the CNN backbone network, discriminative features for efficient classification can be extracted.
The CNN backbone network may also include fully connected layers that obtain the final feature maps, represent useful high-level image components, and learn mappings from these feature maps to the output classes.
Also included in fig. 2 is a Region Proposal Network (RPN) added at the last feature map stage. The region proposal network passes a sliding window over the feature map and generates a number of candidate bounding boxes, together with a score estimating the likelihood that each box contains an object of a category of interest. The four corners of each proposed region may then be passed to the fully connected layers and fine-tuned. Finally, regression and classification are performed on the bounding boxes.
The machine learning model in fig. 2 also includes a mask branch. The mask branch is a convolutional neural network that performs mask prediction with a Fully Convolutional Network (FCN). Instance segmentation may be achieved by the mask branch, which may take an RPN-proposed region as input and generate a mask for that region.
In other examples, the machine learning model in fig. 2 may be Mask R-CNN. Mask R-CNN is a two-stage framework: the first stage scans the image and generates proposals (i.e., regions that may contain an object), and the second stage classifies the proposals and generates bounding boxes and masks. Mask R-CNN extends Faster R-CNN into an instance segmentation framework.
To more accurately process the captured images and output a bounding box around each object of interest, a category for each object (e.g., whale species), and a full pixel mask of the object within each bounding box, the present disclosure proposes improvements to Mask R-CNN, described below.
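As a reference point for this output format (prediction boxes, categories, confidence scores, and per-instance pixel masks), the following is a minimal sketch using torchvision's stock Mask R-CNN. It is a stand-in, not the improved model of this disclosure, and the 0.5 score threshold is an assumption.

```python
import torch
import torchvision

# Stock Mask R-CNN (ResNet-50 + FPN backbone) as a stand-in; it emits the
# same kinds of outputs the improved model is described as producing.
model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

image = torch.rand(3, 800, 800)   # placeholder aerial image, CHW, values in [0, 1]
with torch.no_grad():
    output = model([image])[0]

boxes = output["boxes"]           # prediction boxes, shape (N, 4)
labels = output["labels"]         # predicted categories
scores = output["scores"]         # confidence scores
masks = output["masks"]           # per-instance pixel masks, shape (N, 1, H, W)
keep = scores > 0.5               # assumed confidence threshold
```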
Fig. 3 shows an improvement to the feature network in the backbone network of Mask R-CNN. Improving accuracy by widening a network increases the parameters, enlarges the model, and raises training cost and difficulty. ResNext is a multi-branch convolutional neural network that uses the split-transform-merge strategy of the Inception networks to increase the fineness of the network's segmentation results and improve the expressive power of the model. ResNext inherits the repeat-layer strategy of ResNet while increasing the number of paths, using the same topology on each path to form the grouped convolution of a ResNext block, as shown in fig. 3. This structure improves the accuracy of ResNext without increasing computational complexity: a 50-layer ResNext reaches accuracy similar to a 101-layer ResNet with roughly half the computation. The present disclosure therefore replaces the ResNet module in the Mask R-CNN backbone network with a ResNext module, improving model accuracy and performance.
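A minimal PyTorch sketch of the ResNext block's grouped-convolution structure; the channel widths and the cardinality of 32 follow the common ResNext-50 (32x4d) configuration and are assumptions, not values taken from this disclosure.

```python
import torch.nn as nn

class ResNextBlock(nn.Module):
    """Sketch of a ResNext bottleneck: 32 paths of identical topology,
    realized as a grouped 3x3 convolution (cardinality = 32)."""
    def __init__(self, in_ch: int = 256, width: int = 128, cardinality: int = 32):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, width, 1, bias=False),            # split (reduce)
            nn.BatchNorm2d(width), nn.ReLU(inplace=True),
            nn.Conv2d(width, width, 3, padding=1,
                      groups=cardinality, bias=False),          # transform per group
            nn.BatchNorm2d(width), nn.ReLU(inplace=True),
            nn.Conv2d(width, in_ch, 1, bias=False),             # merge (expand)
            nn.BatchNorm2d(in_ch),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.block(x) + x)   # residual connection inherited from ResNet
```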
FIG. 4 shows an improvement to the Feature Pyramid Network (FPN) in the backbone network of Mask R-CNN. Mask R-CNN addresses the multi-scale problem in object detection with a feature pyramid network. The FPN fuses adjacent features through lateral connections and a top-down path to form a pyramid of features at different scales. The FPN in the Mask R-CNN model fuses feature maps of different layers in a top-down, laterally connected manner, merging high-level semantic information into low-level, precisely localized information, which compensates for the weak semantic abstraction of low-level features. During feature extraction, lateral 1 × 1 convolutions generate features with the same channel dimension. A main problem of this structure is that low-level features contain precise position information while high-level features contain strong semantic information; failing to reuse the information of every pyramid level can lose useful information and degrade experimental precision. To fully utilize the accurate position information of the features at each level, an improved FPN is proposed, as shown in fig. 4, by adding a reverse branch with bottom-up and lateral connections. In fig. 4, the newly generated feature maps N2, N3, N4, N5, and N6 incorporate both high-level and low-level features, while their main features still come from their own level.
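A minimal sketch of the added bottom-up branch, assuming PyTorch, 256-channel FPN outputs, and that each pyramid level halves the spatial size. The stride-2 convolution and elementwise fusion are assumed forms of the reverse branch, which the disclosure describes only at the level of fig. 4 (N6 would typically be an extra downsampling of N5).

```python
import torch.nn as nn

class BottomUpAugmentation(nn.Module):
    """Reverse (bottom-up) branch over FPN outputs: each N level is a
    stride-2 downsampling of the previous N level fused with the lateral
    P feature of the same scale, so N2 = P2 and Ni = down(Ni-1) + Pi."""
    def __init__(self, channels: int = 256, levels: int = 4):
        super().__init__()
        self.down = nn.ModuleList(
            nn.Conv2d(channels, channels, kernel_size=3, stride=2, padding=1)
            for _ in range(levels - 1)
        )

    def forward(self, p_feats):          # p_feats = [P2, P3, P4, P5], fine to coarse
        n_feats = [p_feats[0]]           # N2 reuses P2 unchanged
        for i, conv in enumerate(self.down):
            n_feats.append(conv(n_feats[-1]) + p_feats[i + 1])  # fuse with lateral P
        return n_feats                   # [N2, N3, N4, N5]
```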
Fig. 5 shows another improvement to the feature network in the backbone network of Mask R-CNN. An attention mechanism shifts attention to the most critical areas of an image and ignores irrelevant parts, capturing key information from complex image features. To improve the detection performance of the model, an attention mechanism is introduced to enhance the feature representation of the CNN, so that the key information of the task target is emphasized among the many signals while the attention paid to irrelevant information is reduced. Common attention modules include the Squeeze-and-Excitation (SE) module, the Efficient Channel Attention (ECA) module, and the Convolutional Block Attention Module (CBAM). CBAM is a lightweight convolutional attention module that improves model performance at little cost and is easily integrated into the backbone network of Mask R-CNN. CBAM combines two sub-modules, a channel attention module and a spatial attention module, generating attention feature map information in both the channel and spatial dimensions and multiplying it with the preceding feature map to adaptively refine the features and produce a more accurate feature map. To address the loss of target information when CBAM extracts channel information with a multi-layer perceptron structure, ECA-Net is adopted to replace the channel attention module of CBAM; the improved CBAM module is shown in FIG. 5.
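A minimal sketch of the improved CBAM, assuming PyTorch; the kernel sizes (3 for the ECA 1-D convolution, 7 for spatial attention) follow the original ECA-Net and CBAM papers rather than anything stated in this disclosure.

```python
import torch
import torch.nn as nn

class ECAChannelAttention(nn.Module):
    """ECA-Net style channel attention: global average pooling followed by a
    1-D convolution across channels (no MLP, so less information is lost)."""
    def __init__(self, kernel_size: int = 3):
        super().__init__()
        self.conv = nn.Conv1d(1, 1, kernel_size, padding=kernel_size // 2, bias=False)

    def forward(self, x):                         # x: (B, C, H, W)
        y = x.mean(dim=(2, 3))                    # squeeze: (B, C)
        y = self.conv(y.unsqueeze(1)).squeeze(1)  # local cross-channel interaction
        return x * torch.sigmoid(y)[:, :, None, None]

class SpatialAttention(nn.Module):
    """CBAM spatial attention: channel-wise max and mean maps, 7x7 conv, sigmoid."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size=7, padding=3, bias=False)

    def forward(self, x):
        attn = torch.cat([x.max(dim=1, keepdim=True).values,
                          x.mean(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.conv(attn))

class ImprovedCBAM(nn.Module):
    """CBAM with its channel attention module replaced by ECA, as in FIG. 5."""
    def __init__(self):
        super().__init__()
        self.channel = ECAChannelAttention()
        self.spatial = SpatialAttention()

    def forward(self, x):
        return self.spatial(self.channel(x))      # channel first, then spatial
```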
Experiments show that the improved Mask R-CNN correctly predicted the whale species in 57 of the test images (98%); 95% of the automatic measurements differed from the manual measurements by less than 5%, with a maximum difference of 13%. All species prediction confidence scores output by the improved Mask R-CNN were above 80%, except for one misclassified individual (with a prediction confidence score of 63%).
Agreement between the manually extracted masks and the masks extracted by the improved Mask R-CNN was evaluated using the intersection over union (IoU) method. The IoU is calculated by aligning the upper left corners of the two boxes, as shown in the following equation:

IoU = min(W_a, W_b) · min(H_a, H_b) / (W_a · H_a + W_b · H_b - min(W_a, W_b) · min(H_a, H_b))

where (W_a, H_a) and (W_b, H_b) are the dimensions of the two boxes; the more similar the two boxes, the closer the value is to 1.
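In code, the corner-aligned IoU above reduces to a few lines. This minimal Python sketch assumes the min/max form of the equation as reconstructed from the box-dimension definitions:

```python
def corner_aligned_iou(wa: float, ha: float, wb: float, hb: float) -> float:
    """IoU of two boxes aligned at their upper-left corners: the overlap is
    min(width) x min(height); the union is the total area minus the overlap."""
    inter = min(wa, wb) * min(ha, hb)
    union = wa * ha + wb * hb - inter
    return inter / union

# Identical boxes score 1.0; increasingly dissimilar boxes approach 0.
assert corner_aligned_iou(4.0, 2.0, 4.0, 2.0) == 1.0
```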
The IoU in this disclosure compares very favorably with other instance segmentation studies (where greater than 0.5 is generally considered a good detection): when comparing manually drawn and predicted masks, the average IoU was 0.85 with a standard deviation of 0.05. Detection performance is also typically expressed in terms of precision (the proportion of detections that are true positives) and recall (the proportion of actual objects that are detected). Since whales are large and easily spotted, no false negatives occurred in the tests.
Fig. 6 shows one example of a detection method of marine organisms according to the present disclosure. As shown in fig. 6, an image acquisition device on the drone system first acquires images. The acquired images are then input into a convolutional neural network for image analysis; the convolutional neural network here may be the improved Mask R-CNN proposed by the present disclosure. The network then outputs a bounding box around each object of interest (e.g., a whale), the category of the object (e.g., whale species), and a full pixel mask of the object within the bounding box. Next, the principal axis of the mask is found by principal component analysis, pixels are measured along the principal axis, and the result is finally converted to length units (meters) using the image metadata.
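The mask-to-length step just described (principal component analysis on the mask, pixel measurement along the principal axis, conversion to meters) might be sketched as follows. This is a minimal NumPy sketch; the function name and the `gsd_m_per_px` parameter are assumed stand-ins, and the metadata-based derivation of the meters-per-pixel scale is omitted.

```python
import numpy as np

def mask_length_meters(mask: np.ndarray, gsd_m_per_px: float) -> float:
    """Estimate body length from a binary pixel mask: find the principal
    axis by PCA, measure the pixel extent along it, and convert to meters.
    gsd_m_per_px would come from image metadata (altitude, focal length,
    sensor size); its derivation is not shown here."""
    ys, xs = np.nonzero(mask)                      # mask pixel coordinates
    pts = np.column_stack([xs, ys]).astype(float)
    pts -= pts.mean(axis=0)                        # center the point cloud
    # Principal axis = direction of greatest variance (first right singular vector).
    _, _, vt = np.linalg.svd(pts, full_matrices=False)
    proj = pts @ vt[0]                             # project pixels onto the principal axis
    return (proj.max() - proj.min()) * gsd_m_per_px
```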
Fig. 7 shows a flow chart schematic of one example of a method of detection of marine organisms according to the present disclosure. As shown in fig. 7, the method for detecting marine organisms includes:
step S702: and acquiring a marine environment image captured by the image acquisition equipment.
Step S704: generating a prediction box of a target marine organism in the marine environment image and a pixel mask of the target marine organism within the prediction box using a machine learning model.
Step S706: Determining the biological characteristics of the target marine organism according to the pixel mask of the target marine organism.
According to the marine organism detection method provided by the present disclosure, marine organisms are identified and their biological features are extracted by a machine learning model, thereby realizing dynamic monitoring of marine organisms.
Fig. 8 shows a schematic configuration diagram of one example of the detection apparatus of marine organisms according to the present disclosure. As shown in fig. 8, the marine organism detection apparatus 800 includes: an acquisition module 801 configured to acquire an image of a marine environment captured by an image acquisition device; a generation module 802 configured to generate a prediction box of a target marine organism in the marine environment image and a pixel mask of the target marine organism within the prediction box using a machine learning model; and a determination module 803 configured to determine a biological characteristic of the target marine organism from the pixel mask of the target marine organism.
According to the marine organism detection apparatus provided by the present disclosure, marine organisms are identified and their biological features are extracted by a machine learning model, thereby realizing dynamic monitoring of marine organisms.
In some embodiments, the machine learning model is a Mask R-CNN neural network model.
In some embodiments, the backbone network of the Mask R-CNN neural network model comprises a 101-layer ResNet network and a feature pyramid network.
In some embodiments, the backbone network of the Mask R-CNN neural network model comprises a 50-layer ResNext network and a feature pyramid network.
In some embodiments, the feature pyramid network further comprises a reverse branch with bottom-up and lateral connections.
In some embodiments, the backbone network further comprises a convolution attention module.
In some embodiments, the convolution attention module is a channel and space dual attention module.
In some embodiments, the channel and space dual attention module is a CBAM (Convolutional Block Attention Module), and the channel attention module of the CBAM module is an ECA-Net module.
In some embodiments, the marine organism is a whale and the biological characteristic is length.
In some embodiments, the determination module 803 is further configured to: extract the pixel mask and determine a principal axis by principal component analysis; measure target pixels along the principal axis; and determine the length of the target marine organism from the target pixels.
Fig. 9 is a schematic structural diagram of an unmanned aerial vehicle according to an embodiment of the present invention. The unmanned aerial vehicle may include: an image acquisition device 901 for acquiring images in real time; and a processor 902 for executing the marine organism detection method according to the present disclosure on the images acquired by the image acquisition device. The implementation principle, execution process, and technical effects of the drone shown in fig. 9 are similar to those described in the embodiment shown in fig. 7; refer to the statements above, which are not repeated here.
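As a rough illustration of how the image acquisition device 901 and processor 902 might cooperate, the following Python sketch wires the earlier pieces together; `camera.read()` is a hypothetical stand-in for the acquisition device (not a real API), `model` is a Mask R-CNN style detector as sketched earlier, and `mask_length_meters` is the PCA helper from the sketch above.

```python
import torch

def onboard_detection(camera, model, gsd_m_per_px: float, threshold: float = 0.5):
    """One pass of the assumed onboard pipeline: acquire a frame, detect
    marine organisms, and measure each confident detection's length."""
    frame = camera.read()                          # hypothetical acquisition call
    with torch.no_grad():
        det = model([frame])[0]                    # boxes, labels, scores, masks
    lengths = []
    for mask, score in zip(det["masks"], det["scores"]):
        if score >= threshold:                     # assumed confidence threshold
            binary = mask.squeeze(0).cpu().numpy() > 0.5
            lengths.append(mask_length_meters(binary, gsd_m_per_px))
    return det, lengths
```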
The above description is only an embodiment of the present invention and is not intended to limit its scope; all equivalent structures or equivalent processes derived from the contents of this specification, whether applied directly or indirectly in other related technical fields, are likewise included in the scope of the present invention.
Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention, not to limit them. While the invention has been described in detail with reference to the foregoing embodiments, those skilled in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be equivalently replaced, and such modifications or substitutions do not depart from the spirit of the corresponding technical solutions of the embodiments of the present invention.

Claims (12)

1. A method of detecting marine organisms comprising:
acquiring a marine environment image captured by image acquisition equipment;
generating a prediction box of a target marine organism in the marine environment image and a pixel mask of the target marine organism within the prediction box by using a machine learning model; and
determining the biological characteristics of the target marine organism according to the pixel mask of the target marine organism.
2. The detection method according to claim 1, wherein the machine learning model is a Mask R-CNN neural network model.
3. The detection method according to claim 2, wherein the backbone network of the Mask R-CNN neural network model comprises a 101-layer ResNet network and a feature pyramid network.
4. The detection method according to claim 2, wherein the backbone network of the Mask R-CNN neural network model comprises a 50-layer ResNext network and a feature pyramid network.
5. The detection method according to claim 4, wherein the feature pyramid network further comprises a reverse branch with bottom-up and lateral connections.
6. The detection method according to claim 3 or 4, wherein the backbone network further comprises a convolution attention module.
7. The detection method of claim 6, wherein the convolution attention module is a channel and space dual attention module.
8. The detection method according to claim 7, wherein the channel and space dual attention module is a CBAM (Convolutional Block Attention Module) module,
and the channel attention module in the CBAM module is an ECA-Net module.
9. The detection method according to claim 1, wherein the marine organism is a whale,
and the biological characteristic is length.
10. The detection method of claim 9, wherein said determining a biological characteristic of the target marine organism from the pixel mask of the target marine organism comprises:
extracting the pixel mask and determining a principal axis by principal component analysis;
measuring target pixels along the principal axis; and
determining the length of the target marine organism according to the target pixels.
11. A marine organism detection device comprising:
an acquisition module configured to acquire a marine environment image captured by an image acquisition device;
a generation module configured to generate a prediction box of a target marine organism in the marine environment image and a pixel mask of the target marine organism within the prediction box using a machine learning model; and
a determination module configured to determine a biological characteristic of the target marine organism from the pixel mask of the target marine organism.
12. An unmanned aerial vehicle, comprising:
an image acquisition device for acquiring images in real time; and
processor for performing the method according to at least one of claims 1 to 10, based on the image acquired by the image acquisition device.
CN202211253998.7A 2022-10-13 2022-10-13 Marine organism detection method and device and unmanned aerial vehicle Pending CN115546668A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211253998.7A CN115546668A (en) 2022-10-13 2022-10-13 Marine organism detection method and device and unmanned aerial vehicle

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211253998.7A CN115546668A (en) 2022-10-13 2022-10-13 Marine organism detection method and device and unmanned aerial vehicle

Publications (1)

Publication Number Publication Date
CN115546668A (en) 2022-12-30

Family

ID=84733189

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211253998.7A Pending CN115546668A (en) 2022-10-13 2022-10-13 Marine organism detection method and device and unmanned aerial vehicle

Country Status (1)

Country Link
CN (1) CN115546668A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116704264A (en) * 2023-07-12 2023-09-05 北京万里红科技有限公司 Animal classification method, classification model training method, storage medium, and electronic device
CN116704264B (en) * 2023-07-12 2024-01-30 北京万里红科技有限公司 Animal classification method, classification model training method, storage medium, and electronic device

Similar Documents

Publication Publication Date Title
CN109977918B (en) Target detection positioning optimization method based on unsupervised domain adaptation
CN104809443B (en) Detection method of license plate and system based on convolutional neural networks
CN107016357B (en) Video pedestrian detection method based on time domain convolutional neural network
CN108830188A (en) Vehicle checking method based on deep learning
CN112395951B (en) Complex scene-oriented domain-adaptive traffic target detection and identification method
CN110633632A (en) Weak supervision combined target detection and semantic segmentation method based on loop guidance
CN112464911A (en) Improved YOLOv 3-tiny-based traffic sign detection and identification method
CN111967480A (en) Multi-scale self-attention target detection method based on weight sharing
CN104504395A (en) Method and system for achieving classification of pedestrians and vehicles based on neural network
CN110889318A (en) Lane detection method and apparatus using CNN
CN111488879A (en) Method and apparatus for improving segmentation performance using dual-embedding configuration
CN114255403A (en) Optical remote sensing image data processing method and system based on deep learning
CN114548256A (en) Small sample rare bird identification method based on comparative learning
CN116469020A (en) Unmanned aerial vehicle image target detection method based on multiscale and Gaussian Wasserstein distance
CN115861756A (en) Earth background small target identification method based on cascade combination network
CN115527098A (en) Infrared small target detection method based on global mean contrast space attention
CN115546668A (en) Marine organism detection method and device and unmanned aerial vehicle
CN110334703B (en) Ship detection and identification method in day and night image
CN117333948A (en) End-to-end multi-target broiler behavior identification method integrating space-time attention mechanism
CN109284752A (en) A kind of rapid detection method of vehicle
CN113128308B (en) Pedestrian detection method, device, equipment and medium in port scene
CN115810123A (en) Small target pest detection method based on attention mechanism and improved feature fusion
CN114927236A (en) Detection method and system for multiple target images
CN115035429A (en) Aerial photography target detection method based on composite backbone network and multiple measuring heads
CN111160219B (en) Object integrity evaluation method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB03 Change of inventor or designer information

Inventor after: Ren Xuefeng

Inventor before: Ren Xuefeng

Inventor before: Luo Wei