CN115147706A - Method, device, equipment and medium for identifying fish school feeding behaviors


Info

Publication number: CN115147706A
Authority: CN (China)
Prior art keywords: feature, characteristic, image, convolution, features
Legal status: Pending
Application number: CN202210536634.3A
Other languages: Chinese (zh)
Inventors: 郑凯健, 杨仁友, 李日富, 秦浩, 杨靓, 严俊
Current Assignee: Southern Marine Science and Engineering Guangdong Laboratory Zhanjiang
Original Assignee: Southern Marine Science and Engineering Guangdong Laboratory Zhanjiang
Application filed by Southern Marine Science and Engineering Guangdong Laboratory Zhanjiang
Priority to CN202210536634.3A
Publication of CN115147706A


Classifications

    • G06V20/05 Underwater scenes (Scenes; Scene-specific elements)
    • G06T7/269 Analysis of motion using gradient-based methods (Image analysis)
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion (Image preprocessing)
    • G06V10/40 Extraction of image or video features
    • G06V10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level, of extracted features

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to the technical field of aquaculture information and discloses a method, a device, equipment and a medium for identifying fish school feeding behaviors. The method acquires a time-series image set of fish school feeding behavior and preprocesses it with a preset enhancement algorithm, a preset optical flow method and a preset Gaussian density model, respectively, to obtain corresponding spatial images, optical flow images and density images; extracts first features corresponding to each spatial image, second features corresponding to each optical flow image and third features corresponding to each density image based on these images and a preset attention mechanism model; performs feature fusion on the first, second and third features through the preset attention mechanism model, determining corresponding fusion features; and identifies the feeding behavior of the fish school from the fusion features, thereby realizing accurate feeding control.

Description

Method, device, equipment and medium for identifying fish school feeding behaviors
Technical Field
The invention relates to the technical field of aquaculture information, in particular to a method, a device, equipment and a medium for identifying fish school feeding behaviors.
Background
In recent years, with the continuous development of aquaculture technology in China, the yield of industrial-scale pond aquaculture has risen steadily, and the problem of accurately controlling fish feeding amounts in aquaculture has received more and more attention.
At present, in actual fishery breeding, feeding is generally performed manually based on empirical values, or by machine, with the bait feeding control likewise set manually according to breeding experience. Some published papers have proposed methods for identifying fish school behaviors, but most of them target laboratory fish-tank data and are difficult to apply in an actual aquaculture net cage.
However, owing to the complexity of the actual breeding application environment and the resulting difficulty of recognition, such approaches can hardly achieve accurate feeding control under real farming conditions.
Disclosure of Invention
The invention mainly aims to provide a method, a device, equipment and a medium for identifying the feeding behavior of fish shoal, and aims to realize accurate feeding control.
In order to achieve the above object, the present invention provides a method for identifying a fish school feeding behavior, comprising the steps of:
acquiring a time sequence image set of a fish school feeding behavior, and preprocessing the time sequence image set by respectively adopting a preset enhancement algorithm, a preset optical flow method and a preset Gaussian density model to obtain a corresponding space image, an optical flow image and a density image;
extracting first features corresponding to each space image, second features corresponding to each optical flow image and third features corresponding to each density image based on the space images, the optical flow images, the density images and a preset attention mechanism model;
performing feature fusion through the preset attention mechanism model based on the first feature, the second feature and the third feature, and determining corresponding fusion features;
and performing feeding behavior identification on the fish school based on the fusion characteristics.
Preferably, the preset attention mechanism model comprises a convolution transpose module, or an attention module, or a combination of the convolution transpose module and the attention module.
Preferably, the step of extracting, based on the spatial images, the optical flow images, the density images and a preset attention mechanism model, first features corresponding to each spatial image, second features corresponding to each optical flow image, and third features corresponding to each density image includes:
performing convolution and pooling on each spatial image, each optical flow image and each density image through the preset attention mechanism model according to the spatial image, the optical flow image and the density image to obtain a processed spatial image, a processed optical flow image and a processed density image;
and performing residual summation processing on the processed space images, the processed optical flow images and the processed density images for preset times to obtain first features corresponding to all the space images, second features corresponding to all the optical flow images and third features corresponding to all the density images.
Preferably, the step of performing feature fusion through the preset attention mechanism model based on the first feature, the second feature and the third feature, and determining a corresponding fusion feature includes:
according to the first feature, the second feature and the third feature, performing single-image internal feature fusion through the preset attention mechanism model to obtain a first internal feature corresponding to the first feature, a second internal feature corresponding to the second feature and a third internal feature corresponding to the third feature;
according to the first internal feature, the second internal feature and the third internal feature, feature fusion between a plurality of images is carried out through the preset attention mechanism model, and a first associated feature corresponding to the first internal feature, a second associated feature corresponding to the second internal feature and a third associated feature corresponding to the third internal feature are obtained;
and according to the first associated feature, the second associated feature and the third associated feature, performing feature fusion between different domain features through the preset attention mechanism model to obtain the corresponding fusion features.
Preferably, the preset attention mechanism model includes a convolution transpose module and an attention module, and the step of obtaining a first internal feature corresponding to the first feature, a second internal feature corresponding to the second feature, and a third internal feature corresponding to the third feature by performing intra-feature fusion on a single image through the preset attention mechanism model according to the first feature, the second feature, and the third feature includes:
performing convolution operation on the first feature, the second feature and the third feature through the convolution transposition module, and performing feature matrix transposition multiplication on a result after convolution and the first feature, the second feature and the third feature to obtain a first convolution feature corresponding to the first feature, a second convolution feature corresponding to the second feature and a third convolution feature corresponding to the third feature;
performing a preset number of convolution operations on the first convolution feature, the second convolution feature and the third convolution feature through the attention module, and performing feature matrix multiplication on the convolved results to obtain a first product feature corresponding to the first convolution feature, a second product feature corresponding to the second convolution feature and a third product feature corresponding to the third convolution feature;
and respectively performing feature summation on the first product feature, the second product feature and the third product feature with the first convolution feature, the second convolution feature and the third convolution feature to obtain the corresponding first internal feature, second internal feature and third internal feature.
Preferably, the preset attention mechanism model includes an attention module, and the step of performing feature fusion between a plurality of images through the preset attention mechanism model according to the first internal feature, the second internal feature and the third internal feature to obtain a first associated feature corresponding to the first internal feature, a second associated feature corresponding to the second internal feature and a third associated feature corresponding to the third internal feature includes:
respectively performing feature integration processing on the first internal feature, the second internal feature and the third internal feature to obtain a first integrated feature corresponding to the first internal feature, a second integrated feature corresponding to the second internal feature and a third integrated feature corresponding to the third internal feature;
performing a residual operation on the first integrated feature, the second integrated feature and the third integrated feature to obtain a first sum feature corresponding to the first integrated feature, a second sum feature corresponding to the second integrated feature and a third sum feature corresponding to the third integrated feature;
performing a pooling operation on the first sum feature, the second sum feature and the third sum feature to obtain the corresponding pooled first sum feature, pooled second sum feature and pooled third sum feature;
performing a preset number of convolution operations on the pooled first sum feature, the pooled second sum feature and the pooled third sum feature through the attention module, performing feature matrix multiplication of the convolved results with the pooled first sum feature, the pooled second sum feature and the pooled third sum feature respectively, and then summing, to obtain the corresponding first product feature, second product feature and third product feature;
performing a pooling operation on the first product feature, the second product feature and the third product feature to obtain a pooled first product feature, a pooled second product feature and a pooled third product feature;
performing a preset number of convolution operations on the pooled first product feature, the pooled second product feature and the pooled third product feature through the attention module, performing feature matrix multiplication of the convolved results with the pooled first product feature, the pooled second product feature and the pooled third product feature respectively, and then summing, to obtain the corresponding first intention feature, second intention feature and third intention feature;
and performing a pooling operation on the first intention feature, the second intention feature and the third intention feature to obtain the corresponding first associated feature, second associated feature and third associated feature.
Preferably, the preset attention mechanism model includes a convolution transpose module and an attention module, and the step of performing feature fusion between different domain features through the preset attention mechanism model according to the first associated feature, the second associated feature and the third associated feature to obtain corresponding fusion features includes:
performing channel splicing on the first associated feature, the second associated feature and the third associated feature to obtain the corresponding three-dimensional matrix feature;
performing a convolution operation on the three-dimensional matrix feature through the convolution transpose module, and performing feature matrix multiplication of the convolved result with the three-dimensional matrix feature to obtain a coding feature corresponding to the three-dimensional matrix feature;
performing a preset number of convolution operations on the coding feature through the attention module, and performing feature matrix multiplication on the convolved results to obtain a product coding feature corresponding to the coding feature;
and performing feature summation on the product coding feature to obtain a fusion feature corresponding to the product coding feature.
Further, to achieve the above object, the present invention provides a fish school feeding behavior recognizing device including:
the acquisition module is used for acquiring a time sequence image set of the fish school feeding behavior, and preprocessing the time sequence image set by respectively adopting a preset enhancement algorithm, a preset optical flow method and a preset Gaussian density model to obtain a corresponding spatial image, an optical flow image and a density image;
the extraction module is used for extracting first features corresponding to each space image, second features corresponding to each optical flow image and third features corresponding to each density image based on the space images, the optical flow images, the density images and a preset attention mechanism model;
a fusion module, configured to perform feature fusion through the preset attention mechanism model based on the first feature, the second feature, and the third feature, and determine a corresponding fusion feature;
and the identification module is used for carrying out feeding behavior identification on the fish school based on the fusion characteristics.
Further, to achieve the above object, the present invention also provides an apparatus for identifying fish school feeding behavior, comprising: a memory, a processor and a program for identifying fish school feeding behavior stored on the memory and executable on the processor, the program, when executed by the processor, implementing the steps of the method for identifying fish school feeding behavior as described above.
Furthermore, to achieve the above object, the present invention also provides a medium, namely a computer-readable storage medium, having stored thereon a program for identifying fish school feeding behavior which, when executed by a processor, implements the steps of the method for identifying fish school feeding behavior as described above.
The invention provides a method, a device, equipment and a medium for identifying fish school feeding behavior. The identification method comprises the following steps: acquiring a time-series image set of fish school feeding behavior, and preprocessing it with a preset enhancement algorithm, a preset optical flow method and a preset Gaussian density model, respectively, to obtain corresponding spatial images, optical flow images and density images; extracting first features corresponding to each spatial image, second features corresponding to each optical flow image and third features corresponding to each density image based on these images and a preset attention mechanism model; performing feature fusion through the preset attention mechanism model based on the first, second and third features, and determining corresponding fusion features; and identifying the feeding behavior of the fish school based on the fusion features, thereby realizing accurate feeding control.
Drawings
FIG. 1 is a schematic diagram of an apparatus architecture of a hardware operating environment according to an embodiment of the present invention;
FIG. 2 is a schematic flow chart of a first embodiment of the method for identifying feeding behavior of fish school according to the present invention;
FIG. 3 is a schematic diagram of the preset attention mechanism model of a second embodiment of the method for identifying fish school feeding behavior according to the present invention;
FIG. 4 is a schematic flow chart of a third embodiment of the method for identifying fish feeding behavior according to the present invention;
FIG. 5 is a schematic view of a sub-flow chart of a third embodiment of the method for identifying feeding behavior of fish school according to the present invention;
FIG. 6 is a schematic flow chart of a fourth embodiment of the method for identifying fish feeding behavior according to the present invention;
FIG. 7 is a schematic view of a sub-flow chart of a fourth embodiment of the method for identifying feeding behavior of fish school according to the present invention;
FIG. 8 is a schematic flow chart of a fifth embodiment of the method for fish school feeding behavior identification according to the present invention;
FIG. 9 is a schematic view of a fifth embodiment of the method for identifying fish feeding behavior according to the present invention;
FIG. 10 is a schematic flow chart of a sixth embodiment of the method for identifying fish feeding behavior according to the present invention;
FIG. 11 is a schematic view of a sub-flow chart of a sixth embodiment of the method for identifying feeding behavior of fish school according to the present invention;
FIG. 12 is a schematic flow chart showing a seventh embodiment of the method for identifying fish feeding behavior according to the present invention;
FIG. 13 is a schematic view of a sub-flow chart of a seventh embodiment of the method for identifying feeding behavior of fish in accordance with the present invention;
fig. 14 is a functional block diagram of the first embodiment of the device for identifying the feeding behavior of fish school according to the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
As shown in fig. 1, fig. 1 is a schematic device structure diagram of a hardware operating environment according to an embodiment of the present invention.
The device of the embodiment of the invention can be a mobile terminal or a server device.
As shown in fig. 1, the apparatus may include: a processor 1001 (e.g. a CPU), a network interface 1004, a user interface 1003, a memory 1005 and a communication bus 1002. The communication bus 1002 is used to enable connective communication between these components. The user interface 1003 may include a display (Display) and an input unit such as a keyboard (Keyboard), and may optionally also include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g. a WI-FI interface). The memory 1005 may be a high-speed RAM memory or a non-volatile memory (e.g. a magnetic disk memory), and may alternatively be a storage device separate from the processor 1001.
Those skilled in the art will appreciate that the configuration of the apparatus shown in fig. 1 is not intended to be limiting and may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components.
As shown in fig. 1, a memory 1005, which is a kind of computer storage medium, may include therein an operating system, a network communication module, a user interface module, and a recognition program of a fish feeding behavior.
The operating system is a program that manages and controls the hardware and software resources of the identification device for fish school feeding behavior, and supports the running of the network communication module, the user interface module, the identification program of fish school feeding behavior and other programs or software; the network communication module is used to manage and control the network interface 1004; the user interface module is used to manage and control the user interface 1003.
In the identification device of fish school feeding behavior shown in fig. 1, the identification device of fish school feeding behavior calls the identification program of fish school feeding behavior stored in the memory 1005 by the processor 1001 and performs the operations in the respective embodiments of the identification method of fish school feeding behavior described below.
Based on the hardware structure, the embodiment of the method for identifying the fish school feeding behavior is provided.
Referring to fig. 2, fig. 2 is a schematic flow chart of a first embodiment of the method for identifying fish school feeding behavior according to the present invention, and the method for identifying fish school feeding behavior includes:
step S10, acquiring a time sequence image set of a fish school feeding behavior, and preprocessing the time sequence image set by respectively adopting a preset enhancement algorithm, a preset optical flow method and a preset Gaussian density model to obtain a corresponding space image, an optical flow image and a density image;
step S20, extracting a first feature corresponding to the space image, a second feature corresponding to the optical flow image and a third feature corresponding to the density image based on the space image, the optical flow image, the density image and a preset attention mechanism model;
step S30, based on the first feature, the second feature and the third feature, performing feature fusion through the preset attention mechanism model, and determining corresponding fusion features;
and S40, carrying out feeding behavior identification on the fish school based on the fusion characteristics.
In the embodiment, a time sequence image set of a fish school feeding behavior is obtained, and a preset enhancement algorithm, a preset optical flow method and a preset Gaussian density model are respectively adopted to carry out image preprocessing on the time sequence image set, so that a corresponding space image, an optical flow image and a density image are obtained; extracting first features corresponding to each space image, second features corresponding to each optical flow image and third features corresponding to each density image according to the space images, the optical flow images, the density images and a preset attention mechanism model; according to the first feature, the second feature and the third feature, performing feature fusion on the first feature, the second feature and the third feature through a preset attention mechanism model, and determining corresponding fusion features; identifying the feeding behavior of the fish group according to the fusion characteristics; thereby realizing accurate feeding control.
The respective steps will be described in detail below:
and S10, acquiring a time sequence image set of the fish school feeding behavior, and preprocessing the time sequence image set by respectively adopting a preset enhancement algorithm, a preset optical flow method and a preset Gaussian density model to obtain a corresponding space image, an optical flow image and a density image.
In this embodiment, the method for identifying fish school feeding behavior is built around three main channels: a spatial channel S1 for the distribution of the fish school, an optical flow channel S2 for the swimming speed of the fish school, and a density channel S3 for the number of fish. The spatial channel S1 corresponds to the spatial images, the optical flow channel S2 corresponds to the optical flow images, and the density channel S3 corresponds to the density images.
A camera continuously films the fish school breeding cabin to obtain real-time video of the fish school feeding behavior; the video is split into frames, and images with clear fish-body targets are selected as the time-series image set of the fish school feeding behavior; this time-series image set is then preprocessed to obtain the spatial, optical flow and density images corresponding to it.
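As a rough illustration of this frame-splitting step, the following is a minimal Python sketch using OpenCV; the sampling step value and the subsequent sharpness filtering are assumptions for illustration, not values given by the patent.

```python
import cv2

def extract_frames(video_path, step=5):
    """Split a feeding-behavior video into frames, keeping every
    `step`-th frame as a candidate time-series image (selecting only
    frames with clear fish-body targets would follow as a separate
    filtering pass)."""
    cap = cv2.VideoCapture(video_path)
    frames, i = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if i % step == 0:
            frames.append(frame)
        i += 1
    cap.release()
    return frames
```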
And carrying out image enhancement preprocessing operation on the time-sequence image set by adopting a preset enhancement algorithm to form a space image of the fish school.
In image analysis, preprocessing refers to the processing applied to an input image before feature extraction, segmentation and matching. Its main purposes are to eliminate irrelevant information from the image, recover useful real information, enhance the detectability of relevant information and simplify the data as far as possible, thereby improving the reliability of feature extraction, image segmentation, matching and recognition. Preprocessing typically comprises digitization, geometric transformation, normalization, smoothing, restoration and enhancement.
Carrying out dense optical flow processing on the time sequence image set of the fish school feeding behavior to obtain an optical flow image corresponding to the time sequence image set; the dense optical flow is an image registration method for performing point-by-point matching on an image, and is different from the sparse optical flow only aiming at a plurality of feature points on the image, and the dense optical flow calculates the offset of all points on the image, so that a dense optical flow field is formed.
The optical flow is the representation of the image motion speed, and when each pixel point in the image is given a motion vector, the whole image forms an optical flow field. The motion vector of each frame of pixel point is obtained by calculating the change of each pixel point in the two frames of images before and after, and the pixel points with similar motion vectors form the same fish body, so that the fish body can be detected and identified.
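A minimal sketch of dense optical flow in this spirit, using OpenCV's Farneback method; the algorithm choice, parameter values and visualization scheme are illustrative assumptions, since the patent does not name a specific dense optical flow algorithm.

```python
import cv2
import numpy as np

def dense_optical_flow_image(prev_frame, next_frame):
    """Compute a per-pixel (dense) optical flow field between two
    consecutive frames and render it as an image where hue encodes
    motion direction and brightness encodes motion magnitude."""
    prev_gray = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)
    next_gray = cv2.cvtColor(next_frame, cv2.COLOR_BGR2GRAY)

    # Farneback dense optical flow: one motion vector for every pixel.
    flow = cv2.calcOpticalFlowFarneback(prev_gray, next_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)

    # Convert the (dx, dy) field to polar form for visualization.
    mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])
    hsv = np.zeros_like(prev_frame)
    hsv[..., 0] = ang * 180 / np.pi / 2   # direction -> hue
    hsv[..., 1] = 255
    hsv[..., 2] = cv2.normalize(mag, None, 0, 255, cv2.NORM_MINMAX)
    return cv2.cvtColor(hsv, cv2.COLOR_HSV2BGR)
```

Pixels with similar motion vectors cluster into the same fish body, which is what makes this representation useful for detecting the school's swimming activity.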
In order to obtain a real density image of a fish school, marking the whole fish body by taking the head of a fish as a detection area; after the time sequence image set containing the marks is subjected to Gaussian filtering and normalization processing, the filtering result is the density image corresponding to the time sequence image set. The density image represents the distribution of the fish school in the fish school feeding behavior scene, and the total number of the fish school in the fish school feeding behavior scene can be obtained by integrating or summing the density images.
When fish schools are dense, the shielding overlapping of fishes causes the head area displaying the fishes in the image to become small; when the fish schools are distributed sparsely, the shielding overlapping among the fish schools is less, and the head area of the fish in the image is larger.
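The density image can be illustrated with a standard crowd-counting-style construction: each annotated fish-head point is placed on a zero map and blurred with a Gaussian kernel, so that integrating (summing) the map approximates the fish count. This is a hedged sketch of that general technique; the patent's exact Gaussian density model and its parameters are not specified here, and the fixed sigma is an assumption (an adaptive kernel that shrinks where heads overlap, as described above, is a common refinement).

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def density_image(head_points, height, width, sigma=4.0):
    """Build a density map from annotated fish-head coordinates.

    head_points: iterable of (row, col) head annotations.
    Each point mass is spread by a Gaussian; summing the map
    recovers the approximate number of fish in the scene.
    """
    density = np.zeros((height, width), dtype=np.float32)
    for r, c in head_points:
        if 0 <= r < height and 0 <= c < width:
            density[int(r), int(c)] += 1.0
    return gaussian_filter(density, sigma=sigma)

# density_image(pts, 360, 640).sum() ~= len(pts)  (total fish count)
```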
And S20, extracting first features corresponding to all the space images, second features corresponding to all the optical flow images and third features corresponding to all the density images on the basis of the space images, the optical flow images, the density images and a preset attention mechanism model.
In this embodiment, based on the obtained spatial images, optical flow images and density images, the preset attention mechanism model performs convolution, pooling and residual summation on each spatial image, each optical flow image and each density image, extracting the internal feature of each single spatial image, each single optical flow image and each single density image; the internal feature of a single spatial image is the first feature corresponding to that spatial image, the internal feature of a single optical flow image is the second feature corresponding to that optical flow image, and the internal feature of a single density image is the third feature corresponding to that density image.
For example, for an input spatial image of size [3, 640, 360]: feature extraction through two layers of two-dimensional convolution first yields features of [32, 640, 360]; a max-pooling layer then compresses these to [32, 320, 180]; a further two-dimensional convolution layer and max-pooling layer yield low-dimensional features of [64, 160, 90]; finally, three repeated two-dimensional residual blocks perform high-dimensional feature extraction, forming the internal feature [64, 160, 90] of the single spatial image, i.e. the first feature corresponding to that spatial image.
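A minimal PyTorch sketch that reproduces the shape arithmetic of this example; only the channel counts and pooling steps follow the text above, while kernel sizes, activations and exact layer order are assumptions.

```python
import torch
import torch.nn as nn

class ResidualBlock2D(nn.Module):
    """One two-dimensional residual block: a conv layer plus a summation."""
    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x):
        return x + self.conv(x)                     # residual summation

class ChannelBackbone(nn.Module):
    """Backbone shared in spirit by the spatial / optical-flow / density
    channels (in_channels would differ for non-RGB inputs)."""
    def __init__(self, in_channels=3):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, padding=1), nn.ReLU(),  # [3,640,360] -> [32,640,360]
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                                      # -> [32,320,180]
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                                      # -> [64,160,90]
        )
        self.res = nn.Sequential(*[ResidualBlock2D(64) for _ in range(3)])

    def forward(self, x):
        return self.res(self.stem(x))               # per-image internal feature

x = torch.randn(1, 3, 640, 360)
print(ChannelBackbone()(x).shape)                   # torch.Size([1, 64, 160, 90])
```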
Step S30, performing feature fusion through the preset attention mechanism model based on the first feature, the second feature and the third feature, and determining the corresponding fusion features.
In this embodiment, the first feature of each spatial image, the second feature of each optical flow image and the third feature of each density image are input into the preset attention mechanism model, which performs attention fusion on them: feature fusion within each single image, feature fusion among multiple images, and feature fusion among different domain features are carried out in sequence, determining the fusion features corresponding to the spatial, optical flow and density images.
Step S40, identifying the feeding behavior of the fish school based on the fusion features.
In the embodiment, the feeding process of the fish school is divided into a plurality of levels through fusion features corresponding to the spatial image, the optical flow image and the density image, and the feeding state and the hunger level of the fish school are identified; therefore, the feeding behavior state of the fish school in the time sequence image set can be accurately identified, and the accurate feeding control of the fish school is further realized.
In the embodiment, a time sequence image set of a fish school feeding behavior is obtained, and image preprocessing is performed on the time sequence image set to obtain a corresponding spatial image, an optical flow image and a density image; extracting a first feature corresponding to each space image, a second feature corresponding to each optical flow image and a third feature corresponding to each density image through a preset attention mechanism model according to the space images, the optical flow images and the density images; according to the first feature, the second feature and the third feature, performing feature fusion on the first feature, the second feature and the third feature through a preset attention mechanism model, and determining corresponding fusion features; identifying the feeding behavior of the fish group according to the fusion characteristics; thereby realizing accurate feeding control.
Further, based on the first embodiment of the identification method of fish school feeding behavior of the present invention, the second embodiment of the identification method of fish school feeding behavior of the present invention is proposed.
The difference between the second embodiment of the method for identifying fish school feeding behavior and the first embodiment of the method for identifying fish school feeding behavior is that the present embodiment is a refinement of the preset attention mechanism model in step S20, and specifically includes:
the preset attention mechanism model comprises a convolution transpose module, or an attention module, or a combination of the convolution transpose module and the attention module.
In this embodiment, the predetermined attention mechanism model is preferably a convolution transpose module, or an attention module, or a combination of a convolution transpose module and an attention module. According to different task requirements, the preset attention mechanism model can be composed of at least one convolution transpose module, or at least one attention module, or a combination of at least one convolution transpose module and at least one attention module.
Referring to fig. 3, the flow of the preset attention mechanism model is as follows: first, a 1×1 convolution layer of the convolution transpose module performs a convolution operation on the input features, stretching their channel dimension; the convolved result is then multiplied by the transpose of the input features, finally yielding the compressed data features. The principle is that the 1×1 convolution operation reshapes the input features in the feature dimension, and the output of the 1×1 convolution layer is then used as a selection matrix over the input features, realizing a segmented division of the input.
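A minimal sketch of such a convolution transpose module, interpreting "feature matrix transpose multiplication" as flattening the spatial dimensions and multiplying the 1×1-convolution output by the transposed input; this reading is an assumption, chosen to be consistent with the shape example given in the fifth embodiment below.

```python
import torch
import torch.nn as nn

class ConvTransposeModule(nn.Module):
    """1x1 conv stretches the channel dimension, then the result is
    multiplied by the transpose of the input, compressing away the
    spatial dimensions (an assumed reading of the module)."""
    def __init__(self, in_ch=64, out_ch=128):
        super().__init__()
        self.conv1x1 = nn.Conv2d(in_ch, out_ch, kernel_size=1)

    def forward(self, x):                    # x: [B, C, H, W]
        sel = self.conv1x1(x)                # [B, C', H, W] channel stretch
        sel = sel.flatten(2)                 # [B, C', H*W] selection matrix
        flat = x.flatten(2).transpose(1, 2)  # [B, H*W, C] transposed input
        return torch.bmm(sel, flat)          # [B, C', C] compressed feature
```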
Furthermore, the output of the convolution transpose module serves as the input of the attention module; two convolutions represent the query and the key of the attention model respectively, while the data input to the attention module is used directly as the value for the subsequent operations. The results of the two convolution operations are multiplied together and added to the input of the attention module, finally yielding the attention mechanism feature, i.e. the fusion feature corresponding to the input features.
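A minimal sketch of this attention module under the same assumptions; the softmax normalization is an added assumption (the text only specifies multiplication and summation), and the 1D layout follows the compressed output of the convolution transpose module sketched above.

```python
import torch
import torch.nn as nn

class AttentionModule(nn.Module):
    """Two convolutions play the roles of query and key; the module
    input itself is used directly as the value, and the query-key
    product is added back to the input (a residual attention sum)."""
    def __init__(self, dim):
        super().__init__()
        # 1D convolutions over the compressed [B, C', C] features coming
        # out of the convolution transpose module (an assumed layout).
        self.query = nn.Conv1d(dim, dim, kernel_size=1)
        self.key = nn.Conv1d(dim, dim, kernel_size=1)

    def forward(self, x):                      # x: [B, C', C]
        q = self.query(x)                      # query branch
        k = self.key(x)                        # key branch
        attn = torch.softmax(torch.bmm(q, k.transpose(1, 2)), dim=-1)
        return x + torch.bmm(attn, x)          # value = input, residual sum
```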
The preset attention mechanism model thus provides feature compression and feature extraction in one structure: it stretches the dimensions of the input features while compressing them, and using it improves the accuracy of fish school feeding behavior identification.
Further, a third embodiment of the method for identifying a fish school feeding behavior of the present invention is proposed based on the first and second embodiments of the method for identifying a fish school feeding behavior of the present invention.
The third embodiment of the method for recognizing a fish school feeding behavior differs from the first and second embodiments of the method for recognizing a fish school feeding behavior in that in this embodiment, the first feature corresponding to each spatial image, the second feature corresponding to each optical flow image, and the third feature corresponding to each density image are extracted in step S20 based on the spatial image, the optical flow image, the density image, and a preset attention mechanism model, and with reference to fig. 4, the steps specifically include:
step S21, performing convolution and pooling processing on each space image, each optical flow image and each density image through the preset attention mechanism model according to the space image, the optical flow image and the density image to obtain a processed space image, a processed optical flow image and a processed density image;
and S22, performing residual summation processing on the processed space images, the processed optical flow images and the processed density images for preset times to obtain first features corresponding to the space images, second features corresponding to the optical flow images and third features corresponding to the density images.
In the embodiment, each spatial image, each optical flow image and each density image of the fish school feeding behavior are subjected to convolution and pooling processing through a preset attention mechanism model according to the spatial image, the optical flow image and the density image, and a processed spatial image, a processed optical flow image and a processed density image are obtained; performing residual summation processing on the processed space images, the processed optical flow images and the processed density images for preset times to obtain first features corresponding to the space images, second features corresponding to the optical flow images and third features corresponding to the density images; therefore, high-precision feature extraction of the space image, the optical flow image and the density image is realized, and the accuracy of feeding behavior identification is improved.
The respective steps will be described in detail below:
and S21, performing convolution and pooling processing on each space image, each optical flow image and each density image through the preset attention mechanism model according to the space image, the optical flow image and the density image to obtain a processed space image, a processed optical flow image and a processed density image.
In this embodiment, after the spatial, optical flow and density images of the fish school feeding behavior are acquired, the preset attention mechanism model performs feature extraction on each spatial image, each optical flow image and each density image through two layers of two-dimensional convolution; the convolved result is then compressed by a max-pooling layer; and a further two-dimensional convolution layer and max-pooling layer yield the processed spatial image, processed optical flow image and processed density image.
Referring to fig. 5, for example, an input spatial image has size [3, 640, 360]; feature extraction through two layers of two-dimensional convolution first yields features of [32, 640, 360]; a max-pooling layer then compresses the features to [32, 320, 180]; and a further two-dimensional convolution layer and max-pooling layer yield a processed spatial image with low-dimensional features of [64, 160, 90]. Here the 3 in [3, 640, 360] denotes the three RGB channels of the spatial image, and 640 × 360 is its pixel size.
And S22, performing residual summation processing on the processed space images, the processed optical flow images and the processed density images for preset times to obtain first features corresponding to the space images, second features corresponding to the optical flow images and third features corresponding to the density images.
In this embodiment, the processed spatial image, the processed optical flow image and the processed density image are subjected to high-dimensional feature extraction by three repeated two-dimensional residual blocks, finally forming the first feature corresponding to each spatial image, the second feature corresponding to each optical flow image and the third feature corresponding to each density image.
The two-dimensional residual block consists of a layer of two-dimensional convolution and a summation. The processed spatial image, the processed optical flow image and the processed density image are input and compressed by the convolution layer; the summation layer then sums the compressed features with the input features, giving the internal feature of each single spatial image, each single optical flow image and each single density image; the internal feature of a single spatial image is the first feature corresponding to that spatial image, the internal feature of a single optical flow image is the second feature, and the internal feature of a single density image is the third feature.
Referring to fig. 5, for example, the low-dimensional features are subjected to high-dimensional feature extraction through three repeated two-dimensional residual blocks, finally forming the internal feature [64, 160, 90] of the single image, i.e. the first feature corresponding to each spatial image.
In the embodiment, each spatial image, each optical flow image and each density image of the fish school feeding behavior are subjected to convolution and pooling processing through a preset attention mechanism model according to the spatial image, the optical flow image and the density image, and a processed spatial image, a processed optical flow image and a processed density image are obtained; performing residual summation processing on the processed space images, the processed optical flow images and the processed density images for preset times to obtain first features corresponding to the space images, second features corresponding to the optical flow images and third features corresponding to the density images; therefore, high-precision feature extraction of the space image, the optical flow image and the density image is realized, and the accuracy of fish school feeding behavior identification is improved.
Further, based on the first, second, and third embodiments of the method for identifying fish school feeding behavior of the present invention, a fourth embodiment of the method for identifying fish school feeding behavior of the present invention is proposed.
The fourth embodiment of the method for identifying fish school feeding behavior differs from the first, second and third embodiments of the method for identifying fish school feeding behavior in that in the present embodiment, in step S30, feature fusion is performed through the preset attention mechanism model based on the first feature, the second feature and the third feature, and refinement of the corresponding fusion feature is determined, and referring to fig. 6, the step specifically includes:
step S31, according to the first feature, the second feature and the third feature, performing single-image internal feature fusion through the preset attention mechanism model to obtain a first internal feature corresponding to the first feature, a second internal feature corresponding to the second feature and a third internal feature corresponding to the third feature;
step S32, according to the first internal feature, the second internal feature and the third internal feature, performing feature fusion between a plurality of images through the preset attention mechanism model to obtain a first associated feature corresponding to the first internal feature, a second associated feature corresponding to the second internal feature and a third associated feature corresponding to the third internal feature;
and S33, performing feature fusion among different domain features through the preset attention mechanism model according to the first correlation feature, the second correlation feature and the third correlation feature to obtain corresponding fusion features.
In this embodiment, a preset attention mechanism model is used to perform feature fusion within a single image, feature fusion between multiple images in each channel, and feature fusion between different domain features on the first features of each spatial image, the second features of each optical flow image, and the third features of each density image, so as to obtain corresponding fusion features; therefore, the anti-interference capability of fish school feeding behavior recognition is improved through complementation of three different characteristics.
The respective steps will be described in detail below:
step S31, according to the first feature, the second feature and the third feature, performing intra-image feature fusion by using the preset attention mechanism model to obtain a first intra-image feature corresponding to the first feature, a second intra-image feature corresponding to the second feature and a third intra-image feature corresponding to the third feature.
In the present embodiment, the first features of the spatial images, the second features of the optical flow images and the third features of the density images are input into the preset attention mechanism model. Here the preset attention mechanism model consists of one convolution transpose module and one attention module; depending on the requirements of the task, it may instead be formed by pairing one convolution transpose module with one or more attention modules, or by one or more attention modules alone.
In this embodiment, the preset attention mechanism model is composed of a convolution transpose module and an attention module; and performing internal feature fusion processing on the single image by the first feature, the second feature and the third feature through a convolution transpose module and an attention module to obtain a first internal feature corresponding to the first feature, a second internal feature corresponding to the second feature and a third internal feature corresponding to the third feature.
Referring to fig. 7, a first feature S12 of the spatial image, a second feature S22 of the optical flow image, and a third feature S32 of the density image are subjected to a single-image internal feature fusion process by a convolution transpose module and an attention module, so as to obtain a single-image internal feature association S13 of the corresponding spatial image, that is, a first internal feature corresponding to the first feature; the single-picture internal feature association S23 of the optical flow image is also a second internal feature corresponding to the second feature; and a single-picture internal feature association S33 of the density image, that is, a third internal feature corresponding to the third feature.
Step S32, according to the first internal feature, the second internal feature and the third internal feature, performing feature fusion between a plurality of images through the preset attention mechanism model to obtain a first associated feature corresponding to the first internal feature, a second associated feature corresponding to the second internal feature and a third associated feature corresponding to the third internal feature.
In this embodiment, the inter-frame stage of the preset attention mechanism model is built from two attention modules. The first internal feature, the second internal feature and the third internal feature undergo image integration, convolution and pooling, and then attention relation feature fusion between the frames of the same channel's video through the two attention modules, giving the first associated feature corresponding to the first internal feature, the second associated feature corresponding to the second internal feature and the third associated feature corresponding to the third internal feature.
Referring to fig. 7, after image integration, convolution and pooling followed by two attention modules, the first internal feature S13 of the spatial image, the second internal feature S23 of the optical flow image and the third internal feature S33 of the density image undergo attention relation feature fusion between frames of the same channel's video, giving the multi-image fusion feature S14 of the spatial image, i.e. the first associated feature; the multi-image fusion feature S24 of the optical flow image, i.e. the second associated feature; and the multi-image fusion feature S34 of the density image, i.e. the third associated feature.
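A loose PyTorch sketch of this inter-frame fusion, reusing the AttentionModule sketched in the second embodiment; stacking the per-frame features along a time axis, the layer sizes and the pooling layout are all assumptions.

```python
import torch
import torch.nn as nn

class InterFrameFusion(nn.Module):
    """Sketch of 'feature fusion between a plurality of images': the
    per-frame internal features of one channel are integrated, then
    refined by two attention passes with pooling, giving one associated
    feature per channel. Not the patented layout, only its outline."""
    def __init__(self, frames):
        super().__init__()
        self.integrate = nn.Conv1d(frames, frames, kernel_size=1)
        self.attn1 = AttentionModule(frames)   # sketched earlier
        self.attn2 = AttentionModule(frames)
        self.pool = nn.AvgPool1d(2)

    def forward(self, feats):                  # feats: [B, T, D] per-frame features
        x = self.integrate(feats)              # feature integration across frames
        x = self.pool(self.attn1(x))           # attention + pooling (stage 1)
        x = self.pool(self.attn2(x))           # attention + pooling (stage 2)
        return x                               # associated feature of the channel
```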
And S33, performing feature fusion among different domain features through the preset attention mechanism model according to the first correlation feature, the second correlation feature and the third correlation feature to obtain corresponding fusion features.
In this embodiment, the predetermined attention mechanism model is composed of a convolution transpose module and an attention module. And carrying out attention acquisition processing among different channels on the first associated feature, the second associated feature and the third associated feature through channel splicing, a convolution transpose module and an attention module to obtain corresponding fusion features.
Referring to fig. 7, after channel splicing, a convolution transpose module and an attention module are applied to the first associated feature S14 of the spatial image, the second associated feature S24 of the optical flow image and the third associated feature S34 of the density image, with attention acquisition performed between the different channels, the corresponding cross-domain fusion feature S4 is obtained.
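Using the two modules sketched earlier, this cross-domain step can be illustrated as follows; the per-feature shapes and the module dimensions are assumptions carried over from those sketches.

```python
import torch

def fuse_domains(f_spatial, f_flow, f_density, conv_t, attn):
    """Cross-domain fusion sketch: channel-splice the three associated
    features, encode them with a convolution transpose module, then let
    an attention module fuse information across the different domains.

    f_*:    per-channel associated features, each [B, C, H, W] (assumed).
    conv_t: ConvTransposeModule(in_ch=3 * C, out_ch=128) from above.
    attn:   AttentionModule(dim=128) from above.
    """
    stacked = torch.cat([f_spatial, f_flow, f_density], dim=1)  # channel splicing
    encoded = conv_t(stacked)   # conv + transpose multiplication -> coding feature
    return attn(encoded)        # attention + feature summation -> fusion feature
```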
In this embodiment, a preset attention mechanism model is used to perform feature fusion within a single image, feature fusion between multiple images in each channel, and feature fusion between different domain features on the first features of each spatial image, the second features of each optical flow image, and the third features of each density image, so as to obtain corresponding fusion features; therefore, the anti-interference capability of fish school feeding behavior recognition is improved through complementation of three different characteristics.
Further, a fifth example of the method for identifying a fish school feeding behavior of the present invention is provided based on the first, second, third, and fourth examples of the method for identifying a fish school feeding behavior of the present invention.
The fifth embodiment of the method for recognizing a fish school feeding behavior differs from the first, second, third and fourth embodiments of the method for recognizing a fish school feeding behavior in that in step S31, based on the first feature, the second feature and the third feature, a single-image internal feature fusion is performed by using the preset attention mechanism model to obtain a first internal feature corresponding to the first feature, a second internal feature corresponding to the second feature and a third internal feature corresponding to the third feature, and with reference to fig. 8, the step specifically includes:
Step A10, performing a convolution operation on the first feature, the second feature and the third feature through the convolution transpose module, and performing feature matrix transpose multiplication on the convolved result with the first feature, the second feature and the third feature to obtain a first convolution feature corresponding to the first feature, a second convolution feature corresponding to the second feature and a third convolution feature corresponding to the third feature;
Step A20, performing a preset number of convolution operations on the first convolution feature, the second convolution feature and the third convolution feature through the attention module, and performing feature matrix multiplication on the convolved results to obtain a first product feature corresponding to the first convolution feature, a second product feature corresponding to the second convolution feature and a third product feature corresponding to the third convolution feature;
Step A30, respectively performing feature summation on the first product feature, the second product feature and the third product feature with the first convolution feature, the second convolution feature and the third convolution feature to obtain the corresponding first internal feature, second internal feature and third internal feature.
In this embodiment, the convolution transpose module performs convolution and feature matrix transpose multiplication on the first feature, the second feature and the third feature to obtain a first convolution feature corresponding to the first feature, a second convolution feature corresponding to the second feature and a third convolution feature corresponding to the third feature; the attention module applies a preset number of convolutions to the first convolution feature, the second convolution feature and the third convolution feature, multiplies the convolution results, and adds the products back to the first convolution feature, the second convolution feature and the third convolution feature respectively to obtain the corresponding first internal feature, second internal feature and third internal feature; the accuracy of the internal features of a single image is thereby ensured.
The respective steps will be described in detail below:
Step A10: performing a convolution operation on the first feature, the second feature and the third feature through the convolution transpose module, and performing feature matrix transpose multiplication on the convolution results and the first feature, the second feature and the third feature to obtain a first convolution feature corresponding to the first feature, a second convolution feature corresponding to the second feature and a third convolution feature corresponding to the third feature.
In this embodiment, the preset attention mechanism model consists of a convolution transpose module and an attention module. A convolution operation is performed on the first feature, the second feature and the third feature, and the convolution result is transpose-multiplied, as a feature matrix, with the first feature, the second feature and the third feature respectively to obtain the first convolution feature corresponding to the first feature, the second convolution feature corresponding to the second feature and the third convolution feature corresponding to the third feature.
Referring to fig. 9, for example, the first feature of an input single spatial image is [64, 160, 90]; after the first feature [64, 160, 90] passes through a 1×1 convolution with 128 output channels, a feature [128, 160, 90] whose channels are stretched to 128 dimensions is obtained. The feature [128, 160, 90] is then multiplied by the transpose of the first feature [64, 160, 90] to obtain the corresponding first convolution feature [128, 64].
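As an illustration, this convolution transpose step can be sketched in a few lines of PyTorch (the framework is an assumption — the patent does not specify an implementation — and flattening the [C, H, W] feature over its spatial dimensions before the transposed product is one conformable reading of the example shapes):

```python
import torch
import torch.nn as nn

# Illustrative sketch of step A10: a 1x1 convolution stretches the channels
# from 64 to 128, and the result is transpose-multiplied with the input feature.
conv1x1 = nn.Conv2d(in_channels=64, out_channels=128, kernel_size=1)

first_feature = torch.randn(1, 64, 160, 90)        # [B, 64, H, W] feature of one spatial image
stretched = conv1x1(first_feature)                 # [B, 128, H, W]

b, _, h, w = stretched.shape
stretched_flat = stretched.reshape(b, 128, h * w)  # [B, 128, H*W]
first_flat = first_feature.reshape(b, 64, h * w)   # [B, 64, H*W]

# Feature matrix transpose multiplication: [B, 128, H*W] x [B, H*W, 64] -> [B, 128, 64]
first_conv_feature = torch.bmm(stretched_flat, first_flat.transpose(1, 2))
print(first_conv_feature.shape)                    # torch.Size([1, 128, 64])
```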
Step A20: performing a preset number of convolution operations on the first convolution feature, the second convolution feature and the third convolution feature through the attention module, and performing feature matrix multiplication on the convolution results to obtain a first product feature corresponding to the first convolution feature, a second product feature corresponding to the second convolution feature and a third product feature corresponding to the third convolution feature.
In this embodiment, the preset attention mechanism model consists of a convolution transpose module and an attention module. The first convolution feature, the second convolution feature and the third convolution feature are each passed simultaneously through two different convolutions by the attention module, and the results of the two convolution operations are multiplied as feature matrices to obtain the first product feature corresponding to the first convolution feature, the second product feature corresponding to the second convolution feature and the third product feature corresponding to the third convolution feature. The actual number of convolution operations is set according to actual conditions.
Referring to fig. 9, for example, the first convolution feature [128, 64] is passed simultaneously through two 3×3 two-dimensional convolutions with different strides, the first two-dimensional convolution being (3 × 3, s = (1,1)) and the second being (3 × 3, s = (2,1)), yielding features with dimensions [128, 64] and [64, 64] respectively; the feature [128, 64] and the feature [64, 64] are then multiplied as feature matrices to obtain the first product feature [128, 64] corresponding to the first convolution feature.
Step A30: respectively performing feature summation on the first product feature, the second product feature and the third product feature and the first convolution feature, the second convolution feature and the third convolution feature to obtain a corresponding first internal feature, a corresponding second internal feature and a corresponding third internal feature.
In this embodiment, the preset attention mechanism model consists of a convolution transpose module and an attention module. The first product feature, the second product feature and the third product feature are summed with the first convolution feature, the second convolution feature and the third convolution feature respectively to obtain the corresponding first internal feature, second internal feature and third internal feature.
Referring to fig. 9, for example, the first product feature [128, 64] is summed with the first convolution feature [128, 64] to obtain the single-image internal relation feature [128, 64] of the corresponding spatial image, i.e. the first internal feature [128, 64].
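Steps A20 and A30 admit a similar sketch. The shapes follow the example above; treating the [128, 64] convolution feature as a single-channel two-dimensional map and using padding 1 in both branches are assumptions chosen so that the strides (1,1) and (2,1) reproduce the [128, 64] and [64, 64] dimensions:

```python
import torch
import torch.nn as nn

# Two 3x3 convolutions with different strides (step A20); padding=1 keeps the
# stride-(1,1) branch at [128, 64] and halves the first dimension of the
# stride-(2,1) branch to [64, 64], matching the example shapes.
branch1 = nn.Conv2d(1, 1, kernel_size=3, stride=(1, 1), padding=1)
branch2 = nn.Conv2d(1, 1, kernel_size=3, stride=(2, 1), padding=1)

first_conv_feature = torch.randn(1, 1, 128, 64)    # [B, C, 128, 64]

f1 = branch1(first_conv_feature).squeeze()         # [128, 64]
f2 = branch2(first_conv_feature).squeeze()         # [64, 64]

# Feature matrix multiplication of the two convolution results (step A20):
first_product_feature = f1 @ f2                    # [128, 64] x [64, 64] -> [128, 64]

# Residual feature summation with the convolution feature (step A30):
first_internal_feature = first_product_feature + first_conv_feature.squeeze()
print(first_internal_feature.shape)                # torch.Size([128, 64])
```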
In this embodiment, the convolution transpose module performs convolution and feature transpose multiplication on the first feature, the second feature and the third feature to obtain the first convolution feature, the second convolution feature and the third convolution feature; the attention module applies a preset number of convolutions to these convolution features, multiplies the convolution results, and adds the products back to the first convolution feature, the second convolution feature and the third convolution feature respectively to obtain the corresponding first internal feature, second internal feature and third internal feature; the accuracy of the internal features of a single image is thereby ensured.
Further, based on the first, second, third, fourth, and fifth embodiments of the method for identifying fish school feeding behavior of the present invention, a sixth embodiment of the method for identifying fish school feeding behavior of the present invention is proposed.
The sixth embodiment of the method for identifying a fish school feeding behavior differs from the first, second, third, fourth and fifth embodiments in that it refines step S32, in which feature fusion among a plurality of images is performed through the preset attention mechanism model according to the first internal feature, the second internal feature and the third internal feature to obtain a first associated feature corresponding to the first internal feature, a second associated feature corresponding to the second internal feature and a third associated feature corresponding to the third internal feature. Referring to fig. 10, the step specifically includes:
Step B10: respectively performing feature integration processing on the first internal feature, the second internal feature and the third internal feature to obtain a first integrated feature corresponding to the first internal feature, a second integrated feature corresponding to the second internal feature and a third integrated feature corresponding to the third internal feature;
Step B20: performing a residual operation on the first integrated feature, the second integrated feature and the third integrated feature to obtain a first sum feature corresponding to the first integrated feature, a second sum feature corresponding to the second integrated feature and a third sum feature corresponding to the third integrated feature;
Step B30: performing a pooling operation on the first sum feature, the second sum feature and the third sum feature to obtain the corresponding pooled first sum feature, pooled second sum feature and pooled third sum feature;
Step B40: performing a preset number of convolution operations on the pooled first sum feature, the pooled second sum feature and the pooled third sum feature through the attention module, multiplying the convolution results as feature matrices respectively with the pooled first sum feature, the pooled second sum feature and the pooled third sum feature, and summing, to obtain the corresponding first product feature, second product feature and third product feature;
Step B50: performing a pooling operation on the first product feature, the second product feature and the third product feature to obtain a pooled first product feature, a pooled second product feature and a pooled third product feature;
Step B60: performing a preset number of convolution operations on the pooled first product feature, the pooled second product feature and the pooled third product feature through the attention module, multiplying the convolution results as feature matrices respectively with the pooled first product feature, the pooled second product feature and the pooled third product feature, and summing, to obtain the corresponding first intention feature, second intention feature and third intention feature;
Step B70: performing pooling processing on the first intention feature, the second intention feature and the third intention feature to obtain the corresponding first associated feature, second associated feature and third associated feature.
In this embodiment, the first internal feature, the second internal feature and the third internal feature are subjected to multi-image integration processing within the same channel to obtain the corresponding first integrated feature, second integrated feature and third integrated feature; the first integrated feature, the second integrated feature and the third integrated feature are then fused across the multiple images of the same channel through two pooling operations and the preset attention mechanism model to obtain the corresponding first associated feature, second associated feature and third associated feature; the accuracy of the associated features of the multiple images in the same channel is thereby ensured.
The respective steps will be described in detail below:
Step B10: respectively performing feature integration processing on the first internal feature, the second internal feature and the third internal feature to obtain a first integrated feature corresponding to the first internal feature, a second integrated feature corresponding to the second internal feature and a third integrated feature corresponding to the third internal feature.
In this embodiment, the time-series image sets of the same time period are integrated; that is, per-channel feature integration is performed on the first internal features of the spatial images, the second internal features of the optical flow images and the third internal features of the density images of the same time period to obtain the first integrated feature corresponding to the first internal feature, the second integrated feature corresponding to the second internal feature and the third integrated feature corresponding to the third internal feature.
Referring to fig. 11, if the time-series image set of the same time period consists of 3000 images, feature integration processing within the same channel is performed on the internal features [128, 64] of the 3000 spatial images, the 3000 optical flow images and the 3000 density images, yielding a first integrated feature [3000, 128, 64] corresponding to the first internal features, a second integrated feature [3000, 128, 64] corresponding to the second internal features and a third integrated feature [3000, 128, 64] corresponding to the third internal features.
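A minimal sketch of this integration step, assuming it amounts to stacking the per-image internal features of one channel along a new time axis (names illustrative):

```python
import torch

# 3000 single-image internal features of one channel (e.g. the spatial images),
# each of shape [128, 64], stacked into one integrated feature [3000, 128, 64].
internal_features = [torch.randn(128, 64) for _ in range(3000)]
first_integrated_feature = torch.stack(internal_features, dim=0)
print(first_integrated_feature.shape)  # torch.Size([3000, 128, 64])
```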
Step B20: performing a residual operation on the first integrated feature, the second integrated feature and the third integrated feature to obtain a first sum feature corresponding to the first integrated feature, a second sum feature corresponding to the second integrated feature and a third sum feature corresponding to the third integrated feature.
In this embodiment, the preset attention mechanism model is composed of two attention modules. A residual summation is performed on the first integrated feature, the second integrated feature and the third integrated feature; that is, each integrated feature is summed with its own result after a three-dimensional convolution, to obtain the first sum feature corresponding to the first integrated feature, the second sum feature corresponding to the second integrated feature and the third sum feature corresponding to the third integrated feature.
Referring to fig. 11, a three-dimensional convolution operation is performed on the first integrated feature [3000, 128, 64], the second integrated feature [3000, 128, 64] and the third integrated feature [3000, 128, 64], and the three-dimensional convolution results are summed with the first integrated feature [3000, 128, 64], the second integrated feature [3000, 128, 64] and the third integrated feature [3000, 128, 64] respectively, to obtain a first sum feature [3000, 128, 64] corresponding to the first integrated feature, a second sum feature [3000, 128, 64] corresponding to the second integrated feature and a third sum feature [3000, 128, 64] corresponding to the third integrated feature.
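One conformable reading of this residual operation, sketched below, is a shape-preserving three-dimensional convolution (kernel 3, padding 1 — both assumptions) whose output is summed back onto its input:

```python
import torch
import torch.nn as nn

# Shape-preserving 3-D convolution for the residual sum of step B20.
conv3d = nn.Conv3d(in_channels=1, out_channels=1, kernel_size=3, padding=1)

first_integrated_feature = torch.randn(3000, 128, 64)
x = first_integrated_feature.unsqueeze(0).unsqueeze(0)   # [1, 1, 3000, 128, 64]

# Residual operation: integrated feature + its three-dimensional convolution.
first_sum_feature = (x + conv3d(x)).squeeze()
print(first_sum_feature.shape)                           # torch.Size([3000, 128, 64])
```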
Step B30: performing a pooling operation on the first sum feature, the second sum feature and the third sum feature to obtain the corresponding pooled first sum feature, pooled second sum feature and pooled third sum feature.
In this embodiment, the preset attention mechanism model consists of two attention modules. The first sum feature, the second sum feature and the third sum feature are pooled through a maximum pooling layer to obtain the corresponding pooled first sum feature, pooled second sum feature and pooled third sum feature.
Referring to fig. 11, for example, the first sum feature [3000, 128, 64] of the spatial images is pooled through a maximum pooling layer, and the corresponding pooled first sum feature [3000, 1000] is obtained.
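The example does not state how [3000, 128, 64] is reduced to [3000, 1000]; one simple reading, sketched below, flattens each image's [128, 64] map and applies an adaptive max pooling to 1000 values per image:

```python
import torch
import torch.nn as nn

first_sum_feature = torch.randn(3000, 128, 64)

# Flatten each image's [128, 64] map to 8192 values, then max-pool to 1000.
flat = first_sum_feature.reshape(1, 3000, 128 * 64)   # [N=1, C=3000, L=8192]
pooled = nn.AdaptiveMaxPool1d(1000)(flat).squeeze(0)  # [3000, 1000]
print(pooled.shape)                                   # torch.Size([3000, 1000])
```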
Step B40: performing a preset number of convolution operations on the pooled first sum feature, the pooled second sum feature and the pooled third sum feature through the attention module, multiplying the convolution results as feature matrices respectively with the pooled first sum feature, the pooled second sum feature and the pooled third sum feature, and summing, to obtain the corresponding first product feature, second product feature and third product feature.
In this embodiment, the preset attention mechanism model is composed of two attention modules. The pooled first sum feature, the pooled second sum feature and the pooled third sum feature are each passed simultaneously through two different convolution layers by the attention module; the two convolution results are then multiplied as feature matrices respectively with the pooled first sum feature, the pooled second sum feature and the pooled third sum feature and summed to obtain the corresponding first product feature, second product feature and third product feature.
Referring to fig. 11, for example, the pooled first sum feature [3000, 1000] of the spatial images is passed simultaneously through two two-dimensional convolution layers with different strides by the attention module, the first two-dimensional convolution being (3 × 3, s = (1,1)) and the second being (6 × 6, s = (3,1)), yielding features with dimensions [3000, 1000] and [1000, 1000] respectively; the feature [3000, 1000] and the feature [1000, 1000] are then multiplied as feature matrices with the pooled first sum feature [3000, 1000] respectively, and the products are summed to obtain the first product feature [3000, 1000] corresponding to the pooled first sum feature.
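The multiply-and-sum of this step is not fully pinned down by the text; the sketch below uses one conformable reading of the example dimensions, in which the shape-preserving stride-(1,1) branch is combined with the pooled feature elementwise, the [1000, 1000] stride-(3,1) branch is combined by matrix multiplication, and the two results are summed. The asymmetric padding is an assumption chosen so the branch shapes come out as [3000, 1000] and [1000, 1000]:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

pooled = torch.randn(1, 1, 3000, 1000)                 # pooled first sum feature

branch1 = nn.Conv2d(1, 1, kernel_size=3, stride=(1, 1), padding=1)
branch2 = nn.Conv2d(1, 1, kernel_size=6, stride=(3, 1))

f1 = branch1(pooled).squeeze()                         # [3000, 1000], same shape as input
# Asymmetric padding (assumed) so the 6x6, stride-(3,1) branch yields [1000, 1000].
f2 = branch2(F.pad(pooled, (2, 3, 1, 2))).squeeze()    # [1000, 1000]

p = pooled.squeeze()                                   # [3000, 1000]
# One conformable reading: elementwise product for the shape-preserving branch,
# matrix product for the downsampled branch, then summation.
first_product_feature = f1 * p + p @ f2                # [3000, 1000]
print(first_product_feature.shape)
```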
Step B50: performing a pooling operation on the first product feature, the second product feature and the third product feature to obtain a pooled first product feature, a pooled second product feature and a pooled third product feature.
In this embodiment, the preset attention mechanism model is composed of two attention modules. The first product feature, the second product feature and the third product feature are pooled through a maximum pooling layer to obtain the pooled first product feature corresponding to the first product feature, the pooled second product feature corresponding to the second product feature and the pooled third product feature corresponding to the third product feature.
Referring to fig. 11, for example, the first product feature [3000, 1000] of the spatial images is pooled through a maximum pooling layer to obtain the corresponding pooled first product feature [1000, 500].
Step B60: performing a preset number of convolution operations on the pooled first product feature, the pooled second product feature and the pooled third product feature through the attention module, multiplying the convolution results as feature matrices respectively with the pooled first product feature, the pooled second product feature and the pooled third product feature, and summing, to obtain the corresponding first intention feature, second intention feature and third intention feature.
In this embodiment, the preset attention mechanism model consists of two attention modules. The pooled first product feature, the pooled second product feature and the pooled third product feature are each passed simultaneously through two different convolution layers by the attention module; the two convolution results are then multiplied as feature matrices respectively with the pooled first product feature, the pooled second product feature and the pooled third product feature and summed to obtain the corresponding first intention feature, second intention feature and third intention feature. The actual number of convolution operations is set according to actual conditions.
Referring to fig. 11, for example, the pooled first product feature [1000, 500] of the spatial images is passed simultaneously through two two-dimensional convolution layers with different strides, the first two-dimensional convolution being (3 × 3, s = (1,1)) and the second being (6 × 6, s = (3,1)), yielding features with dimensions [1000, 500] and [500, 500] respectively; the feature [1000, 500] and the feature [500, 500] are then multiplied as feature matrices with the pooled first product feature [1000, 500] respectively, and the products are summed to obtain the corresponding first intention feature [1000, 500].
Step B70: performing pooling processing on the first intention feature, the second intention feature and the third intention feature to obtain the corresponding first associated feature, second associated feature and third associated feature.
In this embodiment, the preset attention mechanism model is composed of two attention modules. The first intention feature, the second intention feature and the third intention feature are pooled through a maximum pooling layer to obtain the corresponding first associated feature, second associated feature and third associated feature.
Referring to fig. 11, for example, the first intention feature [1000, 500] of the spatial image is pooled through a maximum pooling layer to obtain a corresponding video relation feature [500, 250], i.e. a first associated feature [500, 250].
In this embodiment, the first internal feature, the second internal feature and the third internal feature are subjected to multi-image integration processing within the same channel to obtain the corresponding first integrated feature, second integrated feature and third integrated feature; the integrated features are then fused across the multiple images of the same channel through two pooling operations and the preset attention mechanism model to obtain the corresponding first associated feature, second associated feature and third associated feature; the accuracy of the associated features of the multiple images in the same channel is thereby ensured.
Further, based on the first, second, third, fourth, fifth and sixth embodiments of the method for identifying fish school feeding behavior of the present invention, a seventh embodiment of the method for identifying fish school feeding behavior of the present invention is proposed.
The seventh embodiment of the method for identifying a fish school feeding behavior differs from the preceding embodiments in that it refines step S33, in which feature fusion among different domain features is performed through the preset attention mechanism model according to the first associated feature, the second associated feature and the third associated feature to obtain the corresponding fusion feature. Referring to fig. 12, the step specifically includes:
Step C10: performing channel splicing on the first associated feature, the second associated feature and the third associated feature to obtain a corresponding three-dimensional matrix feature;
Step C20: performing a convolution operation on the three-dimensional matrix feature through the convolution transpose module, and performing feature matrix multiplication on the convolution result and the three-dimensional matrix feature to obtain a coding feature corresponding to the three-dimensional matrix feature;
Step C30: performing a preset number of convolution operations on the coding feature through the attention module, and performing feature matrix multiplication on the convolution results to obtain a product coding feature corresponding to the coding feature;
Step C40: performing feature summation on the product coding feature to obtain a fusion feature corresponding to the product coding feature.
In this embodiment, the corresponding three-dimensional matrix feature is obtained by channel splicing the first associated feature, the second associated feature and the third associated feature; feature normalization and fusion processing are then performed on the three-dimensional matrix feature through the preset attention mechanism model consisting of a convolution transpose module and an attention module to obtain the fusion feature corresponding to the product coding feature; the accuracy of the fusion feature of the multi-channel images is thereby ensured.
The respective steps will be described in detail below:
and C10, channel splicing is carried out on the first correlation characteristic, the second correlation characteristic and the third correlation characteristic to obtain corresponding three-dimensional matrix characteristics.
In this embodiment, the predetermined attention mechanism model is composed of a convolution transpose module and an attention module. And performing attention compression and splicing on the first correlation characteristic, the second correlation characteristic and the third correlation characteristic to form a three-dimensional matrix characteristic corresponding to the first correlation characteristic, the second correlation characteristic and the third correlation characteristic.
Referring to fig. 13, the first correlation features [500, 250] of the space image, the second correlation features [500, 250] of the optical flow image, and the third correlation features [500, 250] of the density image are subjected to a channel stitching process of attention compression and stitching to obtain corresponding three-dimensional matrix features [3, 500, 250].
Step C20: performing a convolution operation on the three-dimensional matrix feature through the convolution transpose module, and performing feature matrix multiplication on the convolution result and the three-dimensional matrix feature to obtain the coding feature corresponding to the three-dimensional matrix feature.
In this embodiment, the preset attention mechanism model consists of a convolution transpose module and an attention module. A convolution operation is performed on the three-dimensional matrix feature through the convolution transpose module, and the convolution result is multiplied by the transpose of the three-dimensional matrix feature to obtain the coding feature corresponding to the three-dimensional matrix feature.
Referring to fig. 13, for example, after the three-dimensional matrix feature [3, 500, 250] passes through a 1×1 convolution with 128 output channels in the convolution transpose module, the feature [3, 128, 250] is obtained; the feature [3, 128, 250] is then multiplied by the transpose of the three-dimensional matrix feature [3, 500, 250] to obtain the recombined coding feature [128, 250].
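Steps C10 and C20 can be sketched as follows. The stacking axis, the use of a 1-D convolution over the 500-dimensional axis, and the final product-and-sum that recombines [3, 128, 250] down to the example's [128, 250] are all assumptions — the patent text does not fully specify the recombination:

```python
import torch
import torch.nn as nn

# Step C10: channel splicing of the three associated features -> [3, 500, 250].
s1, s2, s3 = (torch.randn(500, 250) for _ in range(3))
matrix_feature = torch.stack([s1, s2, s3], dim=0)          # [3, 500, 250]

# Step C20: a 1x1 convolution with 128 output channels over the 500-dim axis.
conv1x1 = nn.Conv1d(in_channels=500, out_channels=128, kernel_size=1)
f = conv1x1(matrix_feature)                                # [3, 128, 250]

# One reading that reproduces the example's [128, 250] coding feature:
# transposed product with the matrix feature, then a product with the matrix
# feature itself, then summation over the three domains.
scores = torch.bmm(f, matrix_feature.transpose(1, 2))      # [3, 128, 500]
coding_feature = torch.bmm(scores, matrix_feature).sum(0)  # [128, 250]
print(coding_feature.shape)                                # torch.Size([128, 250])
```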
Step C30: performing a preset number of convolution operations on the coding feature through the attention module, and performing feature matrix multiplication on the convolution results to obtain the product coding feature corresponding to the coding feature.
In this embodiment, the preset attention mechanism model consists of a convolution transpose module and an attention module. The coding feature is passed simultaneously through two different convolutions by the attention module, and the results of the two convolution operations are multiplied as feature matrices to obtain the product coding feature corresponding to the coding feature. The actual number of convolution operations is set according to actual conditions.
Referring to fig. 13, for example, the coding feature [128, 250] is passed simultaneously through two two-dimensional convolutions with different strides by the attention module, the first two-dimensional convolution being (3 × 3, s = (1,1)) and the second being (1 × 1, s = (2,1)), yielding features with dimensions [128, 500] and [500, 500] respectively; the feature [128, 500] and the feature [500, 500] are then multiplied as feature matrices to obtain the corresponding product coding feature [128, 500].
Step C40: performing feature summation on the product coding feature to obtain the fusion feature corresponding to the product coding feature.
In this embodiment, the preset attention mechanism model consists of a convolution transpose module and an attention module. The product coding feature and the coding feature are summed to obtain the fusion feature corresponding to the product coding feature.
Referring to fig. 13, for example, the product coding feature [128, 500] and the coding feature [128, 500] are summed to obtain the multi-channel feature [128, 500], that is, the fusion feature corresponding to the product coding feature.
In this embodiment, channel splicing is performed on the first associated feature, the second associated feature and the third associated feature to obtain the corresponding three-dimensional matrix feature; feature normalization and fusion processing are then performed on the three-dimensional matrix feature through the preset attention mechanism model consisting of a convolution transpose module and an attention module to obtain the fusion feature corresponding to the product coding feature; the accuracy of the fusion feature of the multi-channel images is thereby ensured.
The invention also provides a device for identifying the feeding behavior of a fish school. Referring to fig. 14, the device for identifying a fish school feeding behavior of the present invention comprises:
the acquisition module 10, configured to acquire a time-series image set of fish school feeding behavior, and to preprocess the time-series image set by respectively using a preset enhancement algorithm, a preset optical flow method and a preset Gaussian density model to obtain a corresponding spatial image, an optical flow image and a density image;
an extraction module 20, configured to extract, based on the spatial images, the optical flow images, the density images, and a preset attention mechanism model, first features corresponding to the spatial images, second features corresponding to the optical flow images, and third features corresponding to the density images;
a fusion module 30, configured to perform feature fusion through the preset attention mechanism model based on the first feature, the second feature, and the third feature, and determine a corresponding fusion feature;
and the identification module 40, configured to identify the feeding behavior of the fish school based on the fusion features.
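As a structural sketch only, the four modules above can be pictured as a thin pipeline; all names and the placeholder computations below are illustrative assumptions, not the patented implementation:

```python
from typing import List, Tuple

import torch


class FeedingBehaviorRecognizer:
    """Skeleton mirroring the acquisition/extraction/fusion/identification modules."""

    def acquire(self, frames: List[torch.Tensor]) -> Tuple[torch.Tensor, torch.Tensor, torch.Tensor]:
        # Placeholder preprocessing: enhancement, optical flow and Gaussian
        # density estimation would produce the three image domains here.
        spatial = torch.stack(frames)
        optical_flow = torch.zeros_like(spatial)  # stand-in for the optical flow images
        density = torch.zeros_like(spatial)       # stand-in for the density images
        return spatial, optical_flow, density

    def extract(self, spatial, optical_flow, density):
        # Placeholder for the first/second/third feature extraction.
        return spatial.mean(dim=1), optical_flow.mean(dim=1), density.mean(dim=1)

    def fuse(self, f1: torch.Tensor, f2: torch.Tensor, f3: torch.Tensor) -> torch.Tensor:
        # Placeholder for the three-level attention fusion described above.
        return torch.cat([f1, f2, f3], dim=-1)

    def identify(self, fused: torch.Tensor) -> bool:
        # Placeholder decision; a real model would classify feeding behavior.
        return bool(fused.mean() > 0)


recognizer = FeedingBehaviorRecognizer()
frames = [torch.randn(3, 32, 32) for _ in range(4)]
spatial, flow, density = recognizer.acquire(frames)
print(recognizer.identify(recognizer.fuse(*recognizer.extract(spatial, flow, density))))
```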
Furthermore, the present invention provides a medium, which is a computer-readable storage medium having stored thereon a program for identifying fish school feeding behavior; when executed by a processor, the program implements the steps of the method for identifying fish school feeding behavior as described above.
For the method implemented when the program for identifying fish school feeding behavior is executed by the processor, reference may be made to the embodiments of the method for identifying fish school feeding behavior of the present invention, which will not be described herein again.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element preceded by the phrase "comprising a ..." does not exclude the presence of another identical element in the process, method, article, or system that comprises the element.
The above-mentioned serial numbers of the embodiments of the present invention are only for description, and do not represent the advantages and disadvantages of the embodiments.
Through the description of the foregoing embodiments, it is clear to those skilled in the art that the method of the foregoing embodiments may be implemented by software plus a necessary general hardware platform, and certainly may also be implemented by hardware, but in many cases, the former is a better implementation. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) as described above and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention, and is not intended to limit the scope of the present invention, and all equivalent structures or equivalent processes performed by the present specification and the attached drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (10)

1. A method for identifying a fish school feeding behavior, comprising the steps of:
acquiring a time-series image set of a fish school feeding behavior, and preprocessing the time-series image set by respectively adopting a preset enhancement algorithm, a preset optical flow method and a preset Gaussian density model to obtain a corresponding spatial image, an optical flow image and a density image;
extracting first features corresponding to each spatial image, second features corresponding to each optical flow image and third features corresponding to each density image based on the spatial images, the optical flow images, the density images and a preset attention mechanism model;
performing feature fusion through the preset attention mechanism model based on the first feature, the second feature and the third feature, and determining corresponding fusion features;
and performing feeding behavior identification on the fish school based on the fusion characteristics.
2. The method of claim 1, wherein the predetermined attention mechanism model comprises a convolution transpose module, an attention module, or a combination of a convolution transpose module and an attention module.
3. The method for identifying fish school feeding behavior according to claim 1, wherein the step of extracting a first feature corresponding to each spatial image, a second feature corresponding to each optical flow image and a third feature corresponding to each density image based on the spatial images, the optical flow images, the density images and a preset attention mechanism model comprises:
performing convolution and pooling on each spatial image, each optical flow image and each density image through the preset attention mechanism model to obtain processed spatial images, processed optical flow images and processed density images;
and performing residual summation processing on the processed spatial images, the processed optical flow images and the processed density images a preset number of times to obtain first features corresponding to each spatial image, second features corresponding to each optical flow image and third features corresponding to each density image.
4. The method for identifying fish feeding behavior as claimed in claim 2, wherein the step of performing feature fusion by the predetermined attention mechanism model based on the first feature, the second feature and the third feature, and determining a corresponding fusion feature comprises:
according to the first feature, the second feature and the third feature, performing single-image internal feature fusion through the preset attention mechanism model to obtain a first internal feature corresponding to the first feature, a second internal feature corresponding to the second feature and a third internal feature corresponding to the third feature;
according to the first internal feature, the second internal feature and the third internal feature, feature fusion is carried out among a plurality of images through the preset attention mechanism model, and a first associated feature corresponding to the first internal feature, a second associated feature corresponding to the second internal feature and a third associated feature corresponding to the third internal feature are obtained;
and performing feature fusion among different domain features through the preset attention mechanism model according to the first associated feature, the second associated feature and the third associated feature to obtain corresponding fusion features.
5. The method according to claim 4, wherein the preset attention mechanism model comprises a convolution transpose module and an attention module, and the step of performing single-image internal feature fusion through the preset attention mechanism model according to the first feature, the second feature and the third feature to obtain a first internal feature corresponding to the first feature, a second internal feature corresponding to the second feature and a third internal feature corresponding to the third feature comprises:
performing a convolution operation on the first feature, the second feature and the third feature through the convolution transpose module, and performing feature matrix transpose multiplication on the convolution results and the first feature, the second feature and the third feature to obtain a first convolution feature corresponding to the first feature, a second convolution feature corresponding to the second feature and a third convolution feature corresponding to the third feature;
performing a preset number of convolution operations on the first convolution feature, the second convolution feature and the third convolution feature through the attention module, and performing feature matrix multiplication on the convolution results to obtain a first product feature corresponding to the first convolution feature, a second product feature corresponding to the second convolution feature and a third product feature corresponding to the third convolution feature;
and respectively performing feature summation on the first product feature, the second product feature and the third product feature and the first convolution feature, the second convolution feature and the third convolution feature to obtain a corresponding first internal feature, a corresponding second internal feature and a corresponding third internal feature.
6. The method as claimed in claim 4, wherein the preset attention mechanism model comprises an attention module, and the step of performing feature fusion among a plurality of images through the preset attention mechanism model according to the first internal feature, the second internal feature and the third internal feature to obtain a first associated feature corresponding to the first internal feature, a second associated feature corresponding to the second internal feature and a third associated feature corresponding to the third internal feature comprises:
respectively performing feature integration processing on the first internal feature, the second internal feature and the third internal feature to obtain a first integrated feature corresponding to the first internal feature, a second integrated feature corresponding to the second internal feature and a third integrated feature corresponding to the third internal feature;
performing a residual operation on the first integrated feature, the second integrated feature and the third integrated feature to obtain a first sum feature corresponding to the first integrated feature, a second sum feature corresponding to the second integrated feature and a third sum feature corresponding to the third integrated feature;
performing a pooling operation on the first sum feature, the second sum feature and the third sum feature to obtain the corresponding pooled first sum feature, pooled second sum feature and pooled third sum feature;
performing a preset number of convolution operations on the pooled first sum feature, the pooled second sum feature and the pooled third sum feature through the attention module, performing feature matrix multiplication on the convolution results respectively with the pooled first sum feature, the pooled second sum feature and the pooled third sum feature, and then summing, to obtain corresponding first product features, second product features and third product features;
performing a pooling operation on the first product feature, the second product feature and the third product feature to obtain a pooled first product feature, a pooled second product feature and a pooled third product feature;
performing a preset number of convolution operations on the pooled first product feature, the pooled second product feature and the pooled third product feature through the attention module, performing feature matrix multiplication on the convolution results respectively with the pooled first product feature, the pooled second product feature and the pooled third product feature, and then summing, to obtain a corresponding first intention feature, a corresponding second intention feature and a corresponding third intention feature;
and performing pooling processing on the first intention feature, the second intention feature and the third intention feature to obtain a corresponding first associated feature, a corresponding second associated feature and a corresponding third associated feature.
7. The method as claimed in claim 4, wherein the preset attention mechanism model includes a convolution transpose module and an attention module, and the step of performing feature fusion among different domain features through the preset attention mechanism model according to the first associated feature, the second associated feature and the third associated feature to obtain a corresponding fusion feature comprises:
performing channel splicing on the first associated feature, the second associated feature and the third associated feature to obtain a corresponding three-dimensional matrix feature;
performing a convolution operation on the three-dimensional matrix feature through the convolution transpose module, and performing feature matrix multiplication on the convolution result and the three-dimensional matrix feature to obtain a coding feature corresponding to the three-dimensional matrix feature;
performing a preset number of convolution operations on the coding feature through the attention module, and performing feature matrix multiplication on the convolution results to obtain a product coding feature corresponding to the coding feature;
and performing feature summation on the product coding feature to obtain a fusion feature corresponding to the product coding feature.
8. A fish school feeding behavior recognition apparatus, comprising:
the acquisition module is used for acquiring a time-series image set of the fish school feeding behavior, and preprocessing the time-series image set by respectively adopting a preset enhancement algorithm, a preset optical flow method and a preset Gaussian density model to obtain a corresponding spatial image, an optical flow image and a density image;
the extraction module is used for extracting first features corresponding to each spatial image, second features corresponding to each optical flow image and third features corresponding to each density image based on the spatial images, the optical flow images, the density images and a preset attention mechanism model;
the fusion module is used for performing feature fusion through the preset attention mechanism model based on the first feature, the second feature and the third feature and determining corresponding fusion features;
and the identification module is used for identifying the feeding behavior of the fish school based on the fusion characteristics.
9. An apparatus for identifying a fish feeding behavior, comprising: a memory, a processor and a program for identifying fish school feeding behavior stored on the memory and executable on the processor, the program for identifying fish school feeding behavior when executed by the processor implementing the steps of the method for identifying fish school feeding behavior as claimed in any one of claims 1 to 7.
10. A medium, which is a computer-readable storage medium, characterized in that the computer-readable storage medium has stored thereon a program for recognition of fish shoal feeding behavior, which when executed by a processor implements the steps of the method for recognition of fish shoal feeding behavior according to any one of claims 1 to 7.
CN202210536634.3A 2022-05-17 2022-05-17 Method, device, equipment and medium for identifying fish feeding behaviors Pending CN115147706A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210536634.3A CN115147706A (en) 2022-05-17 2022-05-17 Method, device, equipment and medium for identifying fish feeding behaviors

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210536634.3A CN115147706A (en) 2022-05-17 2022-05-17 Method, device, equipment and medium for identifying fish feeding behaviors

Publications (1)

Publication Number Publication Date
CN115147706A true CN115147706A (en) 2022-10-04

Family ID: 83405982

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210536634.3A Pending CN115147706A (en) 2022-05-17 2022-05-17 Method, device, equipment and medium for identifying fish feeding behaviors

Country Status (1)

Country Link
CN (1) CN115147706A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117044661A (en) * 2023-08-30 2023-11-14 广州淏瀚生物科技有限公司 Healthy ecological water transfer system of aquatic products
CN117044661B (en) * 2023-08-30 2024-01-19 广州淏瀚生物科技有限公司 Healthy ecological water transfer system of aquatic products

Similar Documents

Publication Publication Date Title
CN108154105B (en) Underwater biological detection and identification method and device, server and terminal equipment
WO2020228446A1 (en) Model training method and apparatus, and terminal and storage medium
EP2676224B1 (en) Image quality assessment
CN102693528B (en) Noise suppressed in low light images
Sadgrove et al. Real-time object detection in agricultural/remote environments using the multiple-expert colour feature extreme learning machine (MEC-ELM)
CN107346414B (en) Pedestrian attribute identification method and device
CN111161090B (en) Method, device and system for determining containment column information and storage medium
WO2019090901A1 (en) Image display selection method and apparatus, intelligent terminal and storage medium
CN111104813A (en) Two-dimensional code image key point detection method and device, electronic equipment and storage medium
CN111832592A (en) RGBD significance detection method and related device
CN111833360B (en) Image processing method, device, equipment and computer readable storage medium
KR20150114950A (en) Increasing frame rate of an image stream
JP2023166444A (en) Capture and storage of magnified images
CN115147706A (en) Method, device, equipment and medium for identifying fish feeding behaviors
Saleh et al. Adaptive uncertainty distribution in deep learning for unsupervised underwater image enhancement
CN115578590A (en) Image identification method and device based on convolutional neural network model and terminal equipment
CN111507288A (en) Image detection method, image detection device, computer equipment and storage medium
CN115272691A (en) Training method, recognition method and equipment for steel bar binding state detection model
CN114049491A (en) Fingerprint segmentation model training method, fingerprint segmentation device, fingerprint segmentation equipment and fingerprint segmentation medium
CN112651333A (en) Silence living body detection method and device, terminal equipment and storage medium
CN109657083B (en) Method and device for establishing textile picture feature library
CN113762231B (en) End-to-end multi-pedestrian posture tracking method and device and electronic equipment
CN115690488A (en) Image identification method and device based on convolutional neural network model and terminal equipment
US20230100268A1 (en) Quantifying biotic damage on plant leaves, by convolutional neural networks
CN115995079A (en) Image semantic similarity analysis method and homosemantic image retrieval method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination