CN113627558A - Fish image identification method, system and equipment - Google Patents
- Publication number
- CN113627558A (application number CN202110955820.6A)
- Authority
- CN
- China
- Prior art keywords
- fish
- network
- module
- fish image
- image recognition
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06F18/2415—Classification techniques relating to the classification model based on parametric or probabilistic models, e.g. based on likelihood ratio
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G06N3/045—Neural network architectures; Combinations of networks
- G06N3/08—Neural networks; Learning methods
Abstract
The invention relates to a fish image identification method, system and equipment. The identification method comprises the following steps: S1, collecting fish images, screening them, applying data enhancement, normalizing their sizes, determining classification labels and constructing a training data set; S2, adding a CBAM module to each residual block of a deep residual network to construct a fish image recognition network; S3, training the fish image recognition network constructed in step S2 with the training data set, obtaining a fish image recognition model once training is complete; and S4, screening, enhancing and size-normalizing the fish images to be recognized, then recognizing the processed images with the fish image recognition model to obtain a recognition result. The method improves the recognition rate for common fish images, with a recognition accuracy above 80%.
Description
Technical Field
The invention belongs to the technical field of image recognition, relates to fish image recognition technology, and particularly relates to a fish image recognition method, system and equipment.
Background
With growing human demand for marine resources, marine fishery resources are receiving more and more attention. To protect marine ecosystems and prevent fishermen from catching unsuitable fish or fishing during unsuitable periods, many countries and international organizations install monitoring cameras on fishing-boat decks. However, conditions common during marine operations, such as rain mixed with spray and diffusing water mist, seriously degrade monitoring image quality, making it difficult for supervisors to identify the fish on screen; reviewing massive volumes of monitoring video also requires substantial human effort, and identification accuracy is poor.
Disclosure of Invention
Aiming at the above problems, the invention provides a fish image identification method, system and equipment that can identify and classify fish accurately and quickly.
To achieve the above object, the present invention provides a fish image recognition method comprising the following steps:
S1, collecting fish images, screening them, applying data enhancement, normalizing their sizes, determining classification labels and constructing a training data set;
S2, adding a CBAM module to each residual block of a deep residual network to construct a fish image recognition network;
S3, training the fish image recognition network constructed in step S2 with the training data set, and obtaining a fish image recognition model after training is complete;
and S4, screening, enhancing and size-normalizing the fish images to be recognized, and recognizing the processed images with the fish image recognition model to obtain a recognition result.
Preferably, in steps S1 and S4, the specific screening requirements are: remove fish images in which the fish features are unclear or more than 1/3 incomplete, remove fish images with a resolution below 100 pixels, and remove fish images that do not belong to any target fish category.
Preferably, in steps S1 and S4, the data enhancement uses the torchvision image library and specifically includes: horizontal flipping, random cropping, adding Gaussian noise, adjusting image brightness, and random rotation.
Preferably, the CBAM module comprises a channel attention module and a spatial attention module arranged in sequence. Given an input feature map, the CBAM module successively infers attention maps along the two separate dimensions of channel and space, then multiplies the attention maps with the input feature map to obtain a refined adaptive feature map, which is the final output feature map of the CBAM module, expressed as:

F′ = M_C(F) ⊗ F
F″ = M_S(F′) ⊗ F′

where F″ is the final output feature map, ⊗ denotes element-wise multiplication, F′ is the output feature map of the channel attention module, M_C(F) is the channel attention map inferred by the channel attention module, M_S(F′) is the spatial attention map inferred by the spatial attention module, and F is the input feature map.
Preferably, the channel attention module aggregates the spatial information of the input feature map F with average pooling and maximum pooling operations to generate two different spatial context descriptors, and forwards both descriptors to a shared network, a multi-layer perceptron (MLP) with one hidden layer, to generate the channel attention map.
Preferably, after the shared network is applied to each descriptor, the output feature vectors are merged by element-wise addition to obtain the channel attention map, expressed as:

M_C(F) = σ(MLP(Avg(F)) + MLP(Max(F))) = σ(W_1(W_0(F_avg^c)) + W_1(W_0(F_max^c)))

where Avg(·) is the average pooling function, Max(·) is the maximum pooling function, MLP(·) is the multi-layer perceptron output function, F_avg^c is the feature obtained by average pooling of the feature map F, F_max^c is the feature obtained by maximum pooling of F, W_0(·) is the linear function of the first layer of the shared network, W_1(·) is the linear function of the second layer of the shared network, and σ(·) is the sigmoid function.
Preferably, the spatial attention module aggregates the channel information of the feature map F′ with average pooling and maximum pooling operations to generate two 2-D maps, representing the average-pooled and maximum-pooled features across the channel dimension; these two maps are concatenated and convolved by a standard convolution layer to generate the spatial attention map, expressed as:

M_S(F′) = σ(f^{m×m}([F′_avg^s ; F′_max^s]))

where f^{m×m}(·) is a convolution with an m×m kernel, F′_avg^s is the feature obtained by average pooling of the feature map F′, and F′_max^s is the feature obtained by maximum pooling of F′.
Preferably, in step S3, when training the fish image recognition network, the Adam algorithm is used to optimize the network parameters, the output layer is classified with a Softmax function, and a cross-entropy loss function is used as the training objective.
To achieve the above object, the present invention also provides a fish image recognition system, comprising:
a data acquisition module for acquiring fish image data;
a training data set construction module for screening the fish images, applying data enhancement, normalizing sizes, determining classification labels and constructing a training data set;
a fish image recognition network construction module for adding a CBAM module into each residual block of a deep residual network to construct a fish image recognition network;
a training module for training the constructed fish image recognition network with the training data set, and obtaining a fish image recognition model after training is complete;
and a recognition module for recognizing the fish image to be recognized with the fish image recognition model.
In order to achieve the above object, the present invention further provides a fish image identification apparatus, comprising a computer memory, a computer processor, and a computer program stored in the computer memory and executable on the computer processor, the computer program being configured to implement the steps of the above fish image identification method.
Compared with the prior art, the invention has the advantages and positive effects that:
(1) The invention adds a CBAM module (an attention mechanism module) to each residual block of the convolution layers; CBAM is a simple and efficient attention module for feed-forward convolutional neural networks. Because the CBAM module is a lightweight general-purpose module, it can be integrated seamlessly into any CNN architecture with almost no impact on efficiency or computing power, enables end-to-end training, and improves recognition efficiency. By introducing an attention mechanism on top of the convolutional residual network, the method focuses on the fish information in the input picture that is most relevant to the current task, reduces attention to other information and even filters out irrelevant information, which alleviates information overload and improves the efficiency and accuracy of the task.
(2) The fish images and classification labels are obtained from the internet and by manual screening, so the proposed fish image recognition method achieves a high recognition rate for common fish images, with a recognition accuracy of 80.95%.
Drawings
FIG. 1 is a flow chart of a fish image recognition method according to an embodiment of the invention;
FIG. 2 is a schematic structural diagram of a CBAM module according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a channel attention module according to an embodiment of the invention;
FIG. 4 is a schematic structural diagram of a spatial attention module according to an embodiment of the present invention;
FIG. 5 is a diagram illustrating a structure of a residual block according to an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of a convolutional layer according to an embodiment of the present invention;
FIG. 7 is a schematic view of a display interface of a fish identification result according to an embodiment of the present invention.
Detailed Description
The invention is described in detail below by way of exemplary embodiments. It should be understood, however, that elements, structures and features of one embodiment may be beneficially incorporated in other embodiments without further recitation.
Example 1: referring to fig. 1, the embodiment provides a fish image identification method, which specifically includes the steps of:
S1, collecting fish images, screening them, applying data enhancement, normalizing their sizes, determining classification labels, and constructing a training data set.
Specifically, the screening may be manual or automatic (performed by screening software), with the following specific requirements: remove fish images in which the fish features are unclear or more than 1/3 incomplete, remove fish images with a resolution below 100 pixels, and remove fish images that do not belong to any target fish category.
Specifically, the data enhancement uses the torchvision image library and includes: horizontal flipping, random cropping, adding Gaussian noise, adjusting image brightness, and random rotation. The random rotation angle may be 30, 60, 90 or 270 degrees, chosen according to actual requirements.
After data enhancement, all fish pictures are normalized to a uniform size, for example 224 × 224 pixels, determined according to actual conditions.
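As a rough illustration, the enhancement and size-normalization steps can be sketched in plain NumPy (the embodiment itself uses torchvision; the crop ratio, noise level, brightness range and helper names below are illustrative assumptions, not from the patent):

```python
import numpy as np

def augment(img, rng):
    """Rough NumPy stand-in for the torchvision pipeline described above:
    horizontal flip, random crop, Gaussian noise, brightness change and
    rotation by a multiple of 90 degrees (arbitrary angles such as 30 or 60
    degrees would need an interpolating rotation, omitted here)."""
    if rng.random() < 0.5:                       # random horizontal flip
        img = img[:, ::-1, :]
    h, w, _ = img.shape                          # random crop to 90% per side
    ch, cw = int(h * 0.9), int(w * 0.9)
    y = int(rng.integers(0, h - ch + 1))
    x = int(rng.integers(0, w - cw + 1))
    img = img[y:y + ch, x:x + cw, :].astype(np.float32)
    img += rng.normal(0.0, 5.0, img.shape)       # additive Gaussian noise
    img *= rng.uniform(0.8, 1.2)                 # brighten or dim
    img = np.rot90(img, k=int(rng.choice([1, 2, 3])))  # 90/180/270 degrees
    return np.clip(img, 0, 255).astype(np.uint8)

def resize_nn(img, size=224):
    """Nearest-neighbour resize to size x size, mimicking the final
    normalization to 224 x 224 pixels."""
    h, w, _ = img.shape
    ys = np.arange(size) * h // size
    xs = np.arange(size) * w // size
    return img[ys][:, xs]
```

A single source picture can be passed through `augment` several times to produce multiple distinct training copies.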
S2, adding a CBAM module into each residual block of the deep residual network to construct the fish image recognition network.
Specifically, referring to fig. 2, the CBAM module comprises a channel attention module and a spatial attention module arranged in sequence. Given an input feature map, the CBAM module successively infers attention maps along the two separate dimensions of channel and space, then multiplies the attention maps with the input feature map to obtain a refined adaptive feature map, the final output feature map of the CBAM module, expressed as:

F′ = M_C(F) ⊗ F
F″ = M_S(F′) ⊗ F′

where F″ is the final output feature map, ⊗ denotes element-wise multiplication, F′ is the output feature map of the channel attention module, M_C(F) is the channel attention map inferred by the channel attention module, M_S(F′) is the spatial attention map inferred by the spatial attention module, and F is the input feature map.
Referring to fig. 3, the channel attention module aggregates the spatial information of the input feature map F using average pooling and maximum pooling operations to generate two different spatial context descriptors, and forwards both descriptors to a shared network, a multi-layer perceptron (MLP) with one hidden layer, to generate the channel attention map. After the shared network is applied to each descriptor, the output feature vectors are merged by element-wise addition, giving the channel attention map:

M_C(F) = σ(MLP(Avg(F)) + MLP(Max(F))) = σ(W_1(W_0(F_avg^c)) + W_1(W_0(F_max^c)))

where Avg(·) is the average pooling function, Max(·) is the maximum pooling function, MLP(·) is the multi-layer perceptron output function, F_avg^c and F_max^c are the features obtained by average and maximum pooling of F, W_0(·) and W_1(·) are the linear functions of the first and second layers of the shared network, and σ(·) is the sigmoid function.
Referring to fig. 4, the spatial attention module aggregates the channel information of the feature map F′ with average pooling and maximum pooling operations to generate two 2-D maps, representing the average-pooled and maximum-pooled features across the channel dimension; these two maps are concatenated and convolved by a standard convolution layer to generate the spatial attention map, expressed as:

M_S(F′) = σ(f^{m×m}([F′_avg^s ; F′_max^s]))

where f^{m×m}(·) is a convolution with an m×m kernel, F′_avg^s is the feature obtained by average pooling of the feature map F′, and F′_max^s is the feature obtained by maximum pooling of F′.
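The two attention formulas above can be sketched numerically. The following NumPy code is an illustrative toy implementation; the ReLU hidden layer, the shapes of the MLP weights W0/W1 and the small m×m kernel are assumptions for the sketch, not values fixed by the patent:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(F, W0, W1):
    """M_C(F) = sigmoid(W1(W0(Avg(F))) + W1(W0(Max(F)))) for F of shape
    (C, H, W); W0 and W1 are the shared-MLP weight matrices (a ReLU hidden
    layer is assumed, as in the original CBAM formulation)."""
    avg = F.mean(axis=(1, 2))                    # spatial average pooling, (C,)
    mx = F.max(axis=(1, 2))                      # spatial max pooling, (C,)
    mlp = lambda v: W1 @ np.maximum(W0 @ v, 0.0)
    return sigmoid(mlp(avg) + mlp(mx))           # channel attention map, (C,)

def spatial_attention(F, kernel):
    """M_S(F) = sigmoid(f_mxm([Avg_c(F); Max_c(F)])); kernel has shape
    (2, m, m), one m x m filter slice per pooled channel."""
    pooled = np.stack([F.mean(axis=0), F.max(axis=0)])   # (2, H, W)
    _, m, _ = kernel.shape
    p = m // 2
    xp = np.pad(pooled, ((0, 0), (p, p), (p, p)))
    H, W = pooled.shape[1:]
    out = np.zeros((H, W))
    for i in range(H):                           # naive 'same' convolution
        for j in range(W):
            out[i, j] = np.sum(xp[:, i:i + m, j:j + m] * kernel)
    return sigmoid(out)                          # spatial attention map, (H, W)

def cbam(F, W0, W1, kernel):
    """F' = M_C(F) * F, then F'' = M_S(F') * F' (element-wise products)."""
    Fp = channel_attention(F, W0, W1)[:, None, None] * F
    return spatial_attention(Fp, kernel)[None, :, :] * Fp
```

Both attention maps lie in (0, 1) because of the sigmoid, so the module reweights rather than replaces the input features.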
S3, training the fish image recognition network constructed in step S2 with the training data set, and obtaining the fish image recognition model after training is complete.
Specifically, when training the fish image recognition network, the Adam algorithm is used to optimize the network parameters, the output layer is classified with a Softmax function, and a cross-entropy loss function is used for training. Adam is adopted because it combines the advantages of the adaptive gradient algorithm (AdaGrad) and root-mean-square propagation (RMSProp): it dynamically adjusts the learning rate of each parameter by computing first-moment and second-moment estimates of that parameter's gradient.
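For reference, one Adam update step as just described (moving-average moment estimates with bias correction) might look as follows; the hyperparameter values are the usual Adam defaults, assumed here rather than taken from the patent:

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam parameter update: exponential moving averages of the
    gradient (first moment) and squared gradient (second moment), each
    bias-corrected, scale a per-parameter learning rate."""
    m = b1 * m + (1 - b1) * grad                 # first-moment estimate
    v = b2 * v + (1 - b2) * grad ** 2            # second-moment estimate
    m_hat = m / (1 - b1 ** t)                    # bias correction
    v_hat = v / (1 - b2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
```

Repeatedly applying `adam_step` with the gradient of a simple loss such as x² drives the parameter toward the minimum.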
The Softmax function is expressed as:

σ(z)_j = e^{z_j} / Σ_{k=1}^{n} e^{z_k},  j = 1, 2, …, n

where σ(z)_j is the normalized output of the j-th neuron, z_j is the output value of the j-th neuron, and n is the number of output neurons (classes).
It should be noted that the Softmax function maps the output values of multiple neurons into the interval [0, 1]; each value represents the probability that the sample belongs to the corresponding class, and the values sum to 1.
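A minimal sketch of the Softmax mapping described above (the max-subtraction is a standard numerical-stability trick added here, not part of the patent text):

```python
import numpy as np

def softmax(z):
    """sigma(z)_j = exp(z_j) / sum_k exp(z_k); subtracting max(z) first
    guards against overflow without changing the result."""
    e = np.exp(z - np.max(z))
    return e / e.sum()
```

The outputs are all in (0, 1) and sum to 1, so they can be read directly as class probabilities.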
The cross-entropy loss function is expressed as:

G_loss = -(1/m) Σ_{i=1}^{m} Σ_{j=1}^{n} ŷ_{ij} log(y_{ij})

where G_loss is the loss value, m is the number of samples in the current input batch, n is the number of classes, ŷ_{ij} is the true label, and y_{ij} is the predicted label (output probability).
It should be noted that the cross-entropy function describes the distance between the actual output probability distribution and the expected output probability distribution; the smaller its value, the better the learning effect during model training.
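The batch cross-entropy loss above can be sketched as follows (the small `eps` guard against log(0) is an added assumption for numerical safety):

```python
import numpy as np

def cross_entropy(y_true, y_pred, eps=1e-12):
    """G_loss = -(1/m) * sum_ij y_true[i, j] * log(y_pred[i, j]) for a
    batch of m samples; y_true holds one-hot labels and y_pred holds the
    predicted class probabilities (e.g. Softmax outputs)."""
    m = y_true.shape[0]
    return -np.sum(y_true * np.log(y_pred + eps)) / m
```

A uniform prediction over n classes gives a loss of log(n), and the loss shrinks toward 0 as predictions approach the true labels.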
S4, screening the fish images to be recognized, applying data enhancement and normalizing their sizes, then recognizing the processed images with the fish image recognition model to obtain the recognition result.
Specifically, the screening may be manual or automatic (performed by screening software), with the following specific requirements: remove fish images in which the fish features are unclear or more than 1/3 incomplete, remove fish images with a resolution below 100 pixels, and remove fish images that do not belong to any target fish category.
Specifically, the data enhancement uses the torchvision image library and includes: horizontal flipping, random cropping, adding Gaussian noise, adjusting image brightness (brightening or dimming), and random rotation. The random rotation angle may be 30, 60, 90 or 270 degrees, chosen according to actual requirements.
After data enhancement, all fish pictures are normalized to a uniform size, for example 224 × 224 pixels, determined according to actual conditions.
To evaluate the recognition accuracy of the fish image recognition model, this embodiment uses the Top-1 accuracy (Acc_Top-1) as the evaluation criterion. The Top-1 accuracy is the probability that the fish class with the maximum probability in the final output probability vector matches the correct fish class, with the formula:

Acc_Top-1 = N_Top-1 / N

where N is the total number of images and N_Top-1 is the number of correctly classified images.
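The Top-1 accuracy formula can be computed directly from the output probability vectors; a sketch:

```python
import numpy as np

def top1_accuracy(probs, labels):
    """Acc_Top-1 = N_Top-1 / N: the fraction of images whose highest-
    probability class (argmax of the output vector) matches the true class."""
    return float(np.mean(np.argmax(probs, axis=1) == labels))
```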
In summary, the fish image recognition method collects fish images, constructs a training data set, combines a deep residual network with CBAM modules to build the fish image recognition network, trains that network on the training data set to obtain the fish image recognition model, and finally recognizes the fish images to be recognized with the model. Because the method is built on the CBAM module, a lightweight general-purpose module that can be integrated seamlessly into any CNN architecture with almost no impact on efficiency or computing power, it supports end-to-end training, improves the feature expression capability of the network, and thereby improves the convergence speed and test accuracy of the fish image recognition model; the trained model has a simple structure and a good recognition effect.
Example 2: referring to the drawings, the present embodiment provides a fish image recognition system, including:
the data acquisition module 1 is used for acquiring fish image data;
the training data set construction module 2 is used for screening the fish images, applying data enhancement and normalizing sizes, determining classification labels and constructing a training data set;
the fish image recognition network construction module 3 is used for adding a CBAM module into each residual block of a deep residual network to construct a fish image recognition network;
the training module 4 is used for training the constructed fish image recognition network by utilizing the training data set, and obtaining a fish image recognition model after the training is finished;
and the identification module 5 is used for identifying the fish image to be identified according to the fish image identification model.
Referring to fig. 2, the CBAM module comprises a channel attention module and a spatial attention module arranged in sequence. Given an input feature map, the CBAM module successively infers attention maps along the two separate dimensions of channel and space, then multiplies the attention maps with the input feature map to obtain a refined adaptive feature map, the final output feature map of the CBAM module, expressed as:

F′ = M_C(F) ⊗ F
F″ = M_S(F′) ⊗ F′

where F″ is the final output feature map, ⊗ denotes element-wise multiplication, F′ is the output feature map of the channel attention module, M_C(F) is the channel attention map inferred by the channel attention module, M_S(F′) is the spatial attention map inferred by the spatial attention module, and F is the input feature map.
Referring to fig. 3, the channel attention module aggregates the spatial information of the input feature map F using average pooling and maximum pooling operations to generate two different spatial context descriptors, and forwards both descriptors to a shared network, a multi-layer perceptron (MLP) with one hidden layer, to generate the channel attention map. After the shared network is applied to each descriptor, the output feature vectors are merged by element-wise addition, giving the channel attention map:

M_C(F) = σ(MLP(Avg(F)) + MLP(Max(F))) = σ(W_1(W_0(F_avg^c)) + W_1(W_0(F_max^c)))

where Avg(·) is the average pooling function, Max(·) is the maximum pooling function, MLP(·) is the multi-layer perceptron output function, F_avg^c and F_max^c are the features obtained by average and maximum pooling of F, W_0(·) and W_1(·) are the linear functions of the first and second layers of the shared network, and σ(·) is the sigmoid function.
Referring to fig. 4, the spatial attention module aggregates the channel information of the feature map F′ with average pooling and maximum pooling operations to generate two 2-D maps, representing the average-pooled and maximum-pooled features across the channel dimension; these two maps are concatenated and convolved by a standard convolution layer to generate the spatial attention map, expressed as:

M_S(F′) = σ(f^{m×m}([F′_avg^s ; F′_max^s]))

where f^{m×m}(·) is a convolution with an m×m kernel, F′_avg^s is the feature obtained by average pooling of the feature map F′, and F′_max^s is the feature obtained by maximum pooling of F′.
Specifically, when training the constructed fish image recognition network with the training data set, the Adam algorithm is used to optimize the network parameters, the output layer is classified with a Softmax function, and a cross-entropy loss function is used for training. Adam combines the advantages of the adaptive gradient algorithm (AdaGrad) and root-mean-square propagation (RMSProp): it dynamically adjusts the learning rate of each parameter by computing first-moment and second-moment estimates of that parameter's gradient.
The Softmax function is expressed as:

σ(z)_j = e^{z_j} / Σ_{k=1}^{n} e^{z_k},  j = 1, 2, …, n

where σ(z)_j is the normalized output of the j-th neuron, z_j is the output value of the j-th neuron, and n is the number of output neurons (classes).
It should be noted that the Softmax function maps the output values of multiple neurons into the interval [0, 1]; each value represents the probability that the sample belongs to the corresponding class, and the values sum to 1.
The cross-entropy loss function is expressed as:

G_loss = -(1/m) Σ_{i=1}^{m} Σ_{j=1}^{n} ŷ_{ij} log(y_{ij})

where G_loss is the loss value, m is the number of samples in the current input batch, n is the number of classes, ŷ_{ij} is the true label, and y_{ij} is the predicted label (output probability).
It should be noted that the cross-entropy function describes the distance between the actual output probability distribution and the expected output probability distribution; the smaller its value, the better the learning effect during model training.
In the fish recognition system, the data acquisition module collects the fish images, the training data set construction module builds the training data set, the fish image recognition network construction module combines the deep residual network with CBAM modules to build the fish image recognition network, the training module then trains this network on the training data set to obtain the fish image recognition model, and finally the recognition module recognizes the fish images to be recognized with that model. When constructing the fish image recognition network, the system combines the residual network with the CBAM module; because CBAM is a lightweight general-purpose module that can be integrated seamlessly into any CNN architecture with almost no impact on efficiency or computing power, it supports end-to-end training, improves the feature expression capability of the network, and thereby improves the convergence speed and test accuracy of the fish image recognition model; the trained model has a simple structure and a good recognition effect.
Example 3: the present embodiment provides a fish image identification apparatus comprising a computer memory, a computer processor, and a computer program stored in the computer memory and executable on the computer processor, the computer program being configured to implement the steps of the fish image identification method of embodiment 1.
The above method is described below with reference to specific examples.
Fish images are collected; the collected images are screened, enhanced and size-normalized, classification labels are determined, and a training data set is constructed.
The screening method specifically comprises: removing fish images in which the fish features are unclear or more than 1/3 incomplete, removing fish images with a resolution lower than 100, and removing, by manual selection, fish images that differ from the target fish category. Screening improves the quality of the training data set; the screened fish images cover 21 common fish species, 300 images in total, in JPG or PNG format.
Because the number of fish images obtained by manual screening is small relative to the training sample size required by the deep residual network, data enhancement is used to expand the data set: a single picture can be expanded into multiple image copies, which greatly increases the training sample size, improves the generalization of the network, and reduces overfitting. In this embodiment, the torchvision image library is used to enhance the samples, specifically by horizontal flipping, random cropping, Gaussian noise addition, image brightness adjustment (brightening or dimming) and random rotation of the image. The random rotation angles are 30, 60, 90 and 270 degrees.
After data enhancement, all fish images were uniformly resized to 224 × 224 pixels.
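In the embodiment these operations would come from torchvision (e.g. transforms.RandomHorizontalFlip, transforms.RandomCrop, transforms.ColorJitter, transforms.RandomRotation). As a dependency-free sketch of what the listed operations do, here is a toy implementation on a greyscale image stored as nested lists; all names and values are illustrative, not the patented code:

```python
import random

def hflip(img):
    """Horizontal flip: reverse each row of the (H x W) image."""
    return [row[::-1] for row in img]

def rotate90(img):
    """Rotate 90 degrees clockwise by reversing rows, then transposing."""
    return [list(col) for col in zip(*img[::-1])]

def adjust_brightness(img, factor):
    """Brighten (factor > 1) or dim (factor < 1), clamped to [0, 255]."""
    return [[min(255, max(0, p * factor)) for p in row] for row in img]

def add_gaussian_noise(img, sigma=5.0):
    """Add zero-mean Gaussian noise to every pixel."""
    return [[p + random.gauss(0.0, sigma) for p in row] for row in img]

img = [[10, 20], [30, 40]]
assert hflip(img) == [[20, 10], [40, 30]]
assert rotate90(img) == [[30, 10], [40, 20]]
```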
Finally, a classification label is added to each image to form the training data set, which is written to a CSV file so that it can be conveniently read when training the deep residual network.
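The CSV layout is not specified in the embodiment; one plausible minimal format, written with Python's standard csv module (the file names and labels below are hypothetical), is a header row followed by one (image path, label) pair per line:

```python
import csv
import io

# Hypothetical (path, label) pairs making up the labelled training set.
samples = [("fish_001.jpg", "carp"), ("fish_002.png", "tilapia")]

buf = io.StringIO()  # stands in for an opened CSV file on disk
writer = csv.writer(buf)
writer.writerow(["image_path", "label"])  # header row read back by the loader
writer.writerows(samples)

# Reading it back the way a dataset loader might.
rows = list(csv.reader(io.StringIO(buf.getvalue())))
assert rows[0] == ["image_path", "label"]
assert rows[1] == ["fish_001.jpg", "carp"]
```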
ResNet50 is used as the backbone network of this embodiment, and a CBAM module is added to each residual block of the convolutional layers of ResNet50 to construct the fish image identification network. Adding CBAM to ResNet50 improves the feature-expression capability of the network on the one hand, and on the other hand tells the network what to pay attention to, enhancing the representation of specific regions.
Referring to fig. 5, for the deep residual network, if the optimal feature output is y and the input received by the CBAM module is x, the desired nonlinear processing result (i.e., the residual) provided by the CBAM module is F(x) = y − x, so that the output is F(x) + x. If the preceding shallow layers already provide the optimal output, i.e., x = y, then F(x) should approach 0, which ensures that the error rate of the fish image identification network does not increase. In this embodiment, the residual block comprises three convolutional layers followed by channel attention and spatial attention; the residual block structure actually used is shown in fig. 6. The results of fish image recognition by the fish image recognition model are shown in fig. 7.
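To make the F(x) + x structure with CBAM concrete, here is a toy NumPy sketch. It is not the patented implementation: the m × m spatial convolution is reduced to a 1 × 1 mixing of the two pooled maps, the shared-MLP weights are random placeholders, and the three convolution layers of the residual block are omitted so that only the attention path and the skip connection remain:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(F, W0, W1):
    """Mc(F) = sigmoid(MLP(AvgPool(F)) + MLP(MaxPool(F))).
    F has shape (C, H, W); W0, W1 are the shared two-layer MLP weights."""
    avg = F.mean(axis=(1, 2))                     # (C,) average-pooled descriptor
    mx = F.max(axis=(1, 2))                       # (C,) max-pooled descriptor
    mlp = lambda v: W1 @ np.maximum(W0 @ v, 0.0)  # shared MLP with ReLU hidden layer
    return sigmoid(mlp(avg) + mlp(mx))            # (C,) channel attention weights

def spatial_attention(F, kernel):
    """Ms(F) = sigmoid(conv([AvgPool_c(F); MaxPool_c(F)])); the m x m
    convolution is replaced here by a 1 x 1 mixing weight for brevity."""
    avg = F.mean(axis=0)                          # (H, W) channel-wise average
    mx = F.max(axis=0)                            # (H, W) channel-wise max
    return sigmoid(kernel[0] * avg + kernel[1] * mx)

rng = np.random.default_rng(0)
F = rng.standard_normal((4, 3, 3))                # toy feature map, C=4, H=W=3
Mc = channel_attention(F, rng.standard_normal((2, 4)), rng.standard_normal((4, 2)))
Fp = Mc[:, None, None] * F                        # F'  = Mc(F) (x) F
Ms = spatial_attention(Fp, np.array([0.5, 0.5]))
Fpp = Ms[None, :, :] * Fp                         # F'' = Ms(F') (x) F'
out = Fpp + F                                     # skip connection: F(x) + x
assert out.shape == F.shape
```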
The above-described embodiments are intended to illustrate rather than limit the invention; any modifications and variations within the spirit and scope of the claims fall within its protection.
Claims (10)
1. A fish image identification method is characterized by comprising the following specific steps:
s1, collecting fish images, screening the fish images, performing data enhancement and uniform size processing, determining classification labels and constructing a training data set;
s2, adding a CBAM module into each residual block of the deep residual network to construct a fish image identification network;
s3, training the fish image recognition network constructed in the step S2 by using a training data set, and obtaining a fish image recognition model after training is finished;
and S4, screening the fish images to be recognized, performing data enhancement and uniform size processing, and recognizing the processed fish images with the fish image recognition model to obtain a recognition result.
2. The fish image recognition method of claim 1, wherein in steps S1 and S4, the specific requirements of the screening are: removing fish images in which the fish features are unclear or more than 1/3 incomplete, removing fish images with a resolution lower than 100, and removing fish images that differ from the target fish category.
3. The fish image recognition method of claim 1, wherein in steps S1 and S4 the data enhancement uses the torchvision image library and specifically comprises: horizontally flipping, randomly cropping, adding Gaussian noise to, adjusting the brightness of, and randomly rotating the image.
4. The fish image recognition method of claim 1, wherein in step S2 the CBAM module comprises a channel attention module and a spatial attention module arranged in sequence; given an input feature map, the CBAM module sequentially infers attention maps along the two separate dimensions of channel and space, and multiplies each attention map with its input feature map to obtain a refined adaptive feature map, which is the final output feature map of the CBAM module and is represented as:
F' = Mc(F) ⊗ F,  F'' = Ms(F') ⊗ F'
wherein F'' is the final output feature map, ⊗ denotes element-wise multiplication, F' is the output feature map of the channel attention module, Mc(F) is the channel attention map inferred by the channel attention module, Ms(F') is the spatial attention map inferred by the spatial attention module, and F is the input feature map.
5. The fish image recognition method of claim 4, wherein the channel attention module aggregates the spatial information of the input feature map F using average pooling and maximum pooling operations to generate two different spatial context descriptors, and forwards both descriptors to a shared network to generate the channel attention, the shared network consisting of a multi-layer perceptron (MLP) with one hidden layer.
6. The fish image recognition method of claim 5, wherein after the shared network is applied to each descriptor, the output feature vectors are merged by element-wise addition to obtain the channel attention map, represented as:
Mc(F) = σ(MLP(Avg(F)) + MLP(Max(F))) = σ(W1(W0(F_avg^c)) + W1(W0(F_max^c)))
where Avg(·) is the average pooling function, Max(·) is the maximum pooling function, MLP(·) denotes the output of the multi-layer perceptron, F_avg^c is the feature obtained by average pooling of the feature map F, F_max^c is the feature obtained by maximum pooling of the feature map F, W0(·) is the linear function of the first layer of the shared network, W1(·) is the linear function of the second layer of the shared network, and σ(·) denotes the sigmoid function.
7. The fish image recognition method of claim 4, wherein the spatial attention module aggregates the channel information of the feature map F' using average pooling and maximum pooling operations to generate two 2D maps, representing the average-pooled and max-pooled features across the channel dimension, concatenates the two maps, and convolves them with a standard m × m convolution layer to generate the spatial attention map, represented as:
Ms(F') = σ(f^(m×m)([Avg(F'); Max(F')])) = σ(f^(m×m)([F'_avg^s; F'_max^s]))
where f^(m×m) denotes a convolution with an m × m kernel, [·;·] denotes concatenation along the channel dimension, and σ(·) denotes the sigmoid function.
8. The fish image recognition method of claim 1, wherein in step S3, when training the fish image recognition network, the Adam algorithm is used for network parameter optimization, the output layer performs classification via the Softmax function, and the cross-entropy loss function is used to optimize the network.
9. A fish image recognition system, comprising:
the data acquisition module is used for acquiring fish image data;
the training data set construction module is used for screening fish images, enhancing data and uniformly processing sizes, determining classification labels and constructing a training data set;
the fish image identification network construction module is used for adding a CBAM module into each residual block of the deep residual network to construct a fish image identification network;
the training module is used for training the constructed fish image recognition network by utilizing the training data set, and obtaining a fish image recognition model after the training is finished;
and the identification module is used for identifying the fish image to be identified according to the fish image identification model.
10. A fish image identification device comprising a computer memory, a computer processor, and a computer program stored in the computer memory and executable on the computer processor, characterized in that the computer program is configured to implement the steps of the fish image identification method according to any one of claims 1 to 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110955820.6A CN113627558A (en) | 2021-08-19 | 2021-08-19 | Fish image identification method, system and equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113627558A true CN113627558A (en) | 2021-11-09 |
Family
ID=78386721
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110955820.6A Pending CN113627558A (en) | 2021-08-19 | 2021-08-19 | Fish image identification method, system and equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113627558A (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110197205A (en) * | 2019-05-09 | 2019-09-03 | 三峡大学 | A kind of image-recognizing method of multiple features source residual error network |
CN110781921A (en) * | 2019-09-25 | 2020-02-11 | 浙江农林大学 | Depth residual error network and transfer learning-based muscarinic image identification method and device |
CN111291670A (en) * | 2020-01-23 | 2020-06-16 | 天津大学 | Small target facial expression recognition method based on attention mechanism and network integration |
CN111563473A (en) * | 2020-05-18 | 2020-08-21 | 电子科技大学 | Remote sensing ship identification method based on dense feature fusion and pixel level attention |
CN112200241A (en) * | 2020-10-09 | 2021-01-08 | 山东大学 | Automatic sorting method for fish varieties based on ResNet transfer learning |
CN112241679A (en) * | 2020-09-14 | 2021-01-19 | 浙江理工大学 | Automatic garbage classification method |
CN112651438A (en) * | 2020-12-24 | 2021-04-13 | 世纪龙信息网络有限责任公司 | Multi-class image classification method and device, terminal equipment and storage medium |
Non-Patent Citations (1)
Title |
---|
SANGHYUN WOO ET AL.: "CBAM: Convolutional Block Attention Module", ECCV 2018, pages 3-19 *
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114240686A (en) * | 2022-02-24 | 2022-03-25 | 深圳市旗扬特种装备技术工程有限公司 | Wisdom fishery monitoring system |
CN114240686B (en) * | 2022-02-24 | 2022-06-03 | 深圳市旗扬特种装备技术工程有限公司 | Wisdom fishery monitoring system |
CN115482419A (en) * | 2022-10-19 | 2022-12-16 | 江苏雷默智能科技有限公司 | Data acquisition and analysis method and system for marine fishery products |
CN115482419B (en) * | 2022-10-19 | 2023-11-14 | 江苏雷默智能科技有限公司 | Data acquisition and analysis method and system for marine fishery products |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113537106B (en) | Fish ingestion behavior identification method based on YOLOv5 | |
CN111046880A (en) | Infrared target image segmentation method and system, electronic device and storage medium | |
Barreiros et al. | Zebrafish tracking using YOLOv2 and Kalman filter | |
CN112598713A (en) | Offshore submarine fish detection and tracking statistical method based on deep learning | |
Alkhudaydi et al. | An exploration of deep-learning based phenotypic analysis to detect spike regions in field conditions for UK bread wheat | |
CN110781921A (en) | Depth residual error network and transfer learning-based muscarinic image identification method and device | |
CN111611889B (en) | Miniature insect pest recognition device in farmland based on improved convolutional neural network | |
Li et al. | Detection of uneaten fish food pellets in underwater images for aquaculture | |
CN113627558A (en) | Fish image identification method, system and equipment | |
WO2021238586A1 (en) | Training method and apparatus, device, and computer readable storage medium | |
CN112749654A (en) | Deep neural network model construction method, system and device for video fog monitoring | |
CN113349111A (en) | Dynamic feeding method, system and storage medium for aquaculture | |
CN115131325A (en) | Breaker fault operation and maintenance monitoring method and system based on image recognition and analysis | |
CN117253192A (en) | Intelligent system and method for silkworm breeding | |
Sosa-Trejo et al. | Vision-based techniques for automatic marine plankton classification | |
Miranda et al. | Pest identification using image processing techniques in detecting image pattern through neural network | |
CN114037737B (en) | Neural network-based offshore submarine fish detection and tracking statistical method | |
Nguyen et al. | Joint image deblurring and binarization for license plate images using deep generative adversarial networks | |
CN114581769A (en) | Method for identifying houses under construction based on unsupervised clustering | |
Lian et al. | A pulse-number-adjustable MSPCNN and its image enhancement application | |
CN116152699B (en) | Real-time moving target detection method for hydropower plant video monitoring system | |
Shetty et al. | Plant Disease Detection for Guava and Mango using YOLO and Faster R-CNN | |
Kaur et al. | Deep learning with invariant feature based species classification in underwater environments | |
CN116311086B (en) | Plant monitoring method, training method, device and equipment for plant monitoring model | |
CN112949438B (en) | Fruit visual classification method and system based on Bayesian network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||