CN116503896A - Fish image classification method, device and equipment - Google Patents


Info

Publication number
CN116503896A
CN116503896A
Authority
CN
China
Prior art keywords
fish
image
classification
target
fish image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310156728.2A
Other languages
Chinese (zh)
Inventor
赵然
逯嘉敏
张松
赵世理
李道亮
Current Assignee
China Agricultural University
Original Assignee
China Agricultural University
Priority date
Filing date
Publication date
Application filed by China Agricultural University filed Critical China Agricultural University
Priority to CN202310156728.2A
Publication of CN116503896A
Legal status: Pending

Classifications

    • G - PHYSICS
        • G06 - COMPUTING; CALCULATING OR COUNTING
            • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
                • G06N3/00 - Computing arrangements based on biological models
                    • G06N3/02 - Neural networks
                        • G06N3/08 - Learning methods
            • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
                • G06V10/00 - Arrangements for image or video recognition or understanding
                    • G06V10/40 - Extraction of image or video features
                        • G06V10/46 - Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
                    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
                        • G06V10/764 - using classification, e.g. of video objects
                        • G06V10/77 - Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
                            • G06V10/7715 - Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods
                        • G06V10/82 - using neural networks
                • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
                    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
        • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
            • Y02A - TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
                • Y02A40/00 - Adaptation technologies in agriculture, forestry, livestock or agroalimentary production
                    • Y02A40/80 - Adaptation technologies in fisheries management
                        • Y02A40/81 - Aquaculture, e.g. of fish

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

Embodiments of the invention provide a fish image classification method, apparatus and device. The method comprises: acquiring a target fish image to be classified; and inputting the target fish image into a pre-trained fish classification model to obtain the classification result corresponding to the target fish image. The fish classification model is trained on fish images of multiple classes, and determines the classification result corresponding to the target fish image based on the distances between the feature information of the target fish image and the class prototype of each fish class. The method provided by the embodiments of the invention accurately and effectively identifies and classifies fish images with cluttered backgrounds collected in complex underwater scenes.

Description

Fish image classification method, device and equipment
Technical Field
The present invention relates to the field of image processing technologies, and in particular, to a method, an apparatus, and a device for classifying fish images.
Background
Fish are a major global protein source, and fish identification is important for protecting, rationally developing and utilizing fish resources. In practice, however, different fish species are similar in shape and size, and underwater images are easily affected by noise interference and color distortion, so accurate identification of underwater fish remains a major challenge.
In the related art, research on fish image classification is limited: most studies focus on a single data set, and no complete detection and classification method is available, so fish images cannot be recognized effectively. How to accurately classify fish from images with cluttered backgrounds collected in complex underwater environments is therefore a technical problem to be solved by those skilled in the art.
Disclosure of Invention
Aiming at the problems in the prior art, the embodiment of the invention provides a fish image classification method, a device and equipment.
Specifically, the embodiment of the invention provides the following technical scheme:
in a first aspect, an embodiment of the present invention provides a fish image classification method, including:
acquiring an image of target fish to be classified;
inputting the target fish image into a pre-trained fish classification model to obtain the classification result corresponding to the target fish image; the fish classification model is trained based on fish images of multiple classes; and the fish classification model is used for determining the classification result corresponding to the target fish image based on the distances between the feature information of the target fish image and the class prototype of each fish class.
Further, the fish classification model includes:
a feature extraction module and a classification module, the feature extraction module being connected with the classification module;
the feature extraction module is used for extracting the feature information in the target fish image, and is obtained based on a residual network model and a mixed attention mechanism model;
the classification module is used for determining the classification result corresponding to the target fish image based on the distances between the feature information in the target fish image and the class prototype of each fish class.
Further, the mixed attention mechanism model comprises a channel attention mechanism model and a spatial attention mechanism model, wherein the channel attention mechanism model is constructed based on the selective kernel network (SKNet) model;
the spatial attention mechanism model comprises a pooling layer, a normalization layer, a convolution layer, and a sigmoid nonlinear activation function.
Further, the classification module determines a classification result corresponding to the target fish image based on the following manner:
determining the distance between each characteristic descriptor in the characteristic information of the target fish image and each characteristic descriptor of the class prototype of each fish;
According to the distances between each feature descriptor in the feature information of the target fish image and each feature descriptor of the class prototype of each fish, determining K nearest feature descriptors in the feature descriptors of the class prototype of each fish corresponding to each feature descriptor in the feature information of the target fish image;
determining the distance between the target fish image and each fish class prototype according to each characteristic descriptor in the characteristic information of the target fish image and K nearest characteristic descriptors in the characteristic descriptors of the class prototype of each fish corresponding to each characteristic descriptor in the characteristic information of the target fish image;
and determining a classification result of the target fish image according to the distance between the target fish image and each fish class prototype.
Further, the K nearest feature descriptors, among the feature descriptors of the class prototype of each fish class, corresponding to each feature descriptor in the feature information of the target fish image, are determined using the following formula:

$$\hat{x}_i^j = \mathrm{NN}_j(x_i, P_k), \qquad j = 1, \ldots, K$$

where $x_i$ denotes a feature descriptor in the feature information of the target fish image and $\hat{x}_i^j$ denotes its j-th nearest feature descriptor of the class prototype $P_k$ of class k.

The distance between the target fish image and each fish class prototype is determined using the following formula:

$$d\big(f_q(x), P_k\big) = \sum_{i=1}^{m} \sum_{j=1}^{K} \cos\big(x_i, \hat{x}_i^j\big)$$

where $d(f_q(x), P_k)$ denotes the distance between the feature information $f_q(x)$ of the target fish image and the class prototype $P_k$ of class k, measured here as an accumulated cosine similarity between descriptors, so a larger value indicates a closer match.
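The descriptor-level comparison described above (the k nearest class-prototype descriptors per query descriptor, accumulated into an image-to-class measure) can be sketched as follows. This is a hedged illustration: the cosine-similarity choice, the descriptor sizes, and all toy data are assumptions for the example, not the patent's exact implementation.

```python
import numpy as np

def image_to_class_similarity(query_desc, class_desc, k=3):
    """For each query descriptor, take the cosine similarity of its k
    most similar descriptors in the class pool and sum over the image.
    A larger accumulated similarity means a smaller image-to-class
    distance."""
    q = query_desc / np.linalg.norm(query_desc, axis=1, keepdims=True)
    c = class_desc / np.linalg.norm(class_desc, axis=1, keepdims=True)
    sim = q @ c.T                          # (m_query, m_class) cosine matrix
    topk = np.sort(sim, axis=1)[:, -k:]    # k nearest descriptors per query
    return float(topk.sum())

# Toy data: class A's descriptor pool contains near-copies of the query
# descriptors, class B's pool is unrelated noise.
rng = np.random.default_rng(0)
query = rng.normal(size=(4, 8))            # m = 4 descriptors, d = 8
class_a = np.repeat(query, 2, axis=0) + 0.05 * rng.normal(size=(8, 8))
class_b = rng.normal(size=(8, 8))
scores = {"A": image_to_class_similarity(query, class_a),
          "B": image_to_class_similarity(query, class_b)}
pred = max(scores, key=scores.get)         # classify by largest similarity
```

Because class A's pool resembles the query's descriptors, the accumulated similarity to A exceeds that to B and the query is assigned to class A.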
Further, the fish classification model is trained based on the following:
acquiring a plurality of types of fish image samples;
inputting the fish image samples of each type into a characteristic extraction module of a fish classification model to obtain characteristic information of each fish image sample of each type;
respectively inputting the characteristic information of each fish image sample of each type into a classification module in a fish classification model to obtain a class prototype of each type of fish;
outputting classification results corresponding to the fish image samples according to the distances between the feature information of the fish image samples and the class prototypes of the fish classes;
training the fish classification model according to classification results corresponding to the fish image samples, label information corresponding to the fish image samples and a target loss function to obtain a trained fish classification model.
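The training signal sketched above can be made concrete under one assumption: the patent does not specify the target loss function, so a cross-entropy-style objective over prototype distances is assumed here for illustration. The helper name is made up.

```python
import numpy as np

# Hedged sketch: turn the distances from one sample to each class
# prototype into class probabilities (smaller distance -> higher
# probability) and apply a cross-entropy loss against the label.
def prototype_loss(dists, label):
    logits = -np.asarray(dists, dtype=float)  # negate: near prototype wins
    logits -= logits.max()                    # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[label]), probs

# A sample whose nearest prototype (index 0) is also its true label
# receives the highest probability and a small loss.
loss, probs = prototype_loss([0.2, 1.5, 2.0], label=0)
```

Minimizing such a loss over many labeled fish image samples pushes each sample's features toward its own class prototype and away from the others.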
In a second aspect, an embodiment of the present invention further provides a fish image classification apparatus, including:
an acquisition module, used for acquiring the target fish image to be classified;
a classification module, used for inputting the target fish image into a pre-trained fish classification model to obtain the classification result corresponding to the target fish image; the fish classification model is trained based on fish images of multiple classes, and is used for determining the classification result based on the distances between the feature information of the target fish image and the class prototype of each fish class.
In a third aspect, an embodiment of the present invention further provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the fish image classification method according to the first aspect when executing the program.
In a fourth aspect, embodiments of the present invention also provide a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the fish image classification method according to the first aspect.
In a fifth aspect, embodiments of the present invention also provide a computer program product comprising a computer program which, when executed by a processor, implements the fish image classification method according to the first aspect.
According to the fish image classification method, apparatus and device provided by the embodiments of the invention, a target fish image to be classified is acquired and input into a pre-trained fish classification model to obtain the corresponding classification result. Based on the distances between the feature information of the target fish image and the class prototypes of the fish classes, the fish classification model accurately and effectively identifies and classifies fish images with cluttered backgrounds collected in complex underwater scenes.
Drawings
In order to more clearly illustrate the invention or the technical solutions of the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are some embodiments of the invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic flow chart of a fish image classification method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a channel attention mechanism model provided by an embodiment of the present invention;
FIG. 3 is a schematic diagram of a spatial attention mechanism model provided by an embodiment of the present invention;
Fig. 4 is a flow chart of a training method of a fish classification model according to an embodiment of the present invention;
fig. 5 is a schematic structural diagram of a fish classification model according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of a hybrid attention mechanism model provided by an embodiment of the present invention;
fig. 7 is a schematic structural diagram of a fish image classification device according to an embodiment of the present invention;
fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The method provided by the embodiment of the invention can be applied to the fish image recognition scene, and can accurately realize the effective recognition and classification of the fish images with disordered backgrounds, which are collected under the underwater complex scene.
In the related art, research on fish image classification is limited: most studies focus on a single data set, and no complete detection and classification method is available, so fish images cannot be recognized effectively. How to accurately classify fish from images with cluttered backgrounds collected in complex underwater environments is therefore a technical problem to be solved by those skilled in the art.
According to the fish image classification method provided by the embodiments of the invention, a target fish image to be classified is acquired and input into a pre-trained fish classification model to obtain the corresponding classification result. Based on the distances between the feature information of the target fish image and the class prototypes of the fish classes, the fish classification model accurately and effectively identifies and classifies fish images with cluttered backgrounds collected in complex underwater scenes.
The following describes the technical scheme of the present invention in detail with reference to fig. 1 to 8. The following embodiments may be combined with each other, and some embodiments may not be repeated for the same or similar concepts or processes.
Fig. 1 is a schematic flow chart of an embodiment of a fish image classification method according to an embodiment of the present invention. As shown in fig. 1, the method provided in this embodiment includes:
step 101, acquiring an image of target fish to be classified;
in particular, in the related art, less research is conducted on fish image classification, and most of the research is focused on one data set, and a complete detection and classification method is not available, so that fish image recognition cannot be effectively conducted. In real life, the shapes and the sizes of different fishes are similar, and the underwater images are easily affected by noise interference and color distortion, so that the fishes cannot be accurately classified through the collected images with disordered backgrounds under the scene of complex underwater environment, and the accurate identification of the underwater fishes has great challenges.
In order to solve the above problems, in the embodiment of the present invention, an image of a target fish to be classified is first acquired; alternatively, the acquired fish image to be classified may be a fish image with a cluttered background acquired in a complex underwater environmental scene.
Step 102, inputting the target fish image into a pre-trained fish classification model to obtain a classification result corresponding to the target fish image; the fish classification model is trained based on fish images of multiple classes; the fish classification model is used for determining the classification result corresponding to the target fish image based on the distances between the feature information of the target fish image and the class prototype of each fish class.
Specifically, after obtaining a target fish image to be classified, the embodiment of the invention inputs the target fish image into a pre-trained fish classification model to obtain a classification result corresponding to the target fish image; the fish classification model can determine a classification result corresponding to the target fish image based on the characteristic information of the target fish image and the distance between class prototypes of each fish; alternatively, the class prototypes of the fishes of the same type can be obtained by summing and averaging the characteristic information of the images of the fishes of the same type, so that each type of fishes can correspond to one class prototype.
For example, if the characteristic information of the target fish image a to be classified is a, the class prototype of the M-type fish is M, the class prototype of the N-type fish is N, and if the distance between the characteristic information a of the target fish image a and the class prototype N of the N-type fish is smaller than the distance between the characteristic information a of the target fish image a and the class prototype M of the M-type fish, it may be determined that the N-type fish corresponds to the target fish image a. Optionally, the class prototype N of the N-type fish is a class prototype of grass carp; optionally, the characteristic information of a plurality of grass carp images can be summed and averaged to obtain a grass carp category prototype; it can be determined that the grass carp corresponds to the target fish image a, so that a classification result corresponding to the target fish image can be obtained.
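The worked example above can be put in minimal numeric form: each class prototype is the mean of that class's feature vectors, and the query image is assigned to the class whose prototype is nearest. The 2-D features and class names below are made up for illustration.

```python
import numpy as np

def nearest_prototype(query_feat, prototypes):
    """prototypes: dict mapping class name -> prototype vector.
    Returns the nearest class and all distances."""
    dists = {name: float(np.linalg.norm(query_feat - p))
             for name, p in prototypes.items()}
    return min(dists, key=dists.get), dists

feats_m = np.array([[0.9, 0.1], [1.1, -0.1]])    # class-M sample features
feats_n = np.array([[-1.0, 0.8], [-0.8, 1.2]])   # class-N sample features
protos = {"M": feats_m.mean(axis=0),             # prototype = class mean
          "N": feats_n.mean(axis=0)}
label, dists = nearest_prototype(np.array([-0.9, 1.0]), protos)
```

The query lies among the class-N samples, so its distance to prototype N is smaller and it is labeled N, mirroring the grass carp example above.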
According to the method, the target fish images to be classified are obtained and input into a pre-trained fish classification model, so that classification results corresponding to the target fish images are obtained; the fish classification model accurately realizes effective identification and classification of the fish images with disordered backgrounds, which are collected under the underwater complex scene, based on the characteristic information of the target fish images to be classified and the distance between the class prototypes of the fishes.
In one embodiment, the fish classification model comprises:
a feature extraction module and a classification module, the feature extraction module being connected with the classification module;
the feature extraction module is used for extracting the feature information in the target fish image, and is obtained based on a residual network model and a mixed attention mechanism model;
the classification module is used for determining the classification result corresponding to the target fish image based on the distances between the feature information in the target fish image and the class prototype of each fish class.
Specifically, the fish classification model in the embodiment of the invention can accurately and effectively identify and classify fish images with cluttered backgrounds collected in complex underwater scenes, based on the distances between the feature information of the target fish image and the class prototype of each fish class. Optionally, the fish classification model comprises a feature extraction module and a classification module connected in sequence. The feature extraction module is obtained based on a residual network model and a mixed attention mechanism model and is used for extracting the feature information in the target fish image. Optionally, the residual network comprises a convolution layer, an average pooling layer, 8 residual blocks, and a fully connected layer. A mixed attention mechanism model is added to the prototype network (the residual network model) to improve it: the mixed attention mechanism strengthens the representation of fish image features, focuses on the important features of the fish image, and suppresses unnecessary ones, so that the improved prototype network extracts the feature information in the fish image more accurately, and the classification module can in turn determine the classification result more accurately from the distances between this more accurate feature information and the class prototype of each fish class.
In this way, by adding a mixed attention mechanism model to the prototype network, the representation of fish image features is strengthened, important features are emphasized and unnecessary ones suppressed, so the improved prototype network extracts feature information more accurately and the classification result corresponding to the fish image can be determined more accurately.
In one embodiment, the mixed attention mechanism model comprises a channel attention mechanism model and a spatial attention mechanism model, wherein the channel attention mechanism model is constructed based on the selective kernel network (SKNet) model;
the spatial attention mechanism model comprises a pooling layer, a normalization layer, a convolution layer, and a sigmoid nonlinear activation function.
Specifically, in the embodiment of the invention, a mixed attention mechanism model is added to the prototype network to improve it: the improved prototype network strengthens the representation of fish image features through the mixed attention mechanism model, focuses on the important features of the fish image, and suppresses unnecessary ones, so that the feature information in the fish image can be extracted more accurately. Optionally, the channel attention mechanism model is constructed based on the selective kernel network (SKNet) model, so that the receptive field adapts to fish images of different sizes. Optionally, the spatial attention mechanism model comprises a pooling layer, a normalization layer, a convolution layer, and a sigmoid nonlinear activation function: it first concatenates the maps produced by the average pooling and max pooling layers, compresses the vector magnitude with an L2-norm normalization, reduces the result to a single-channel feature map through a 7×7 convolution, and obtains the spatial attention map through a sigmoid:

$$M_s(V) = \sigma\big(W \cdot f^{7\times 7}\big(\ell_2([\mathrm{AvgPool}(V);\, \mathrm{MaxPool}(V)])\big) + b\big)$$

where σ denotes the sigmoid function, ℓ₂ the L2-norm (two-norm) normalization, $f^{7\times 7}$ a convolution with a 7×7 filter, and W and b parameters learned during training. By adding the attention mechanism and the L2-norm normalization layer to the feature extraction module, important features of the fish image are emphasized and unnecessary ones suppressed, so the feature information in the fish image can be extracted more accurately.
As an example, the channel attention mechanism model is shown in FIG. 2.

First, the feature map X extracted by the residual network model is convolved with 3×3 and 5×5 kernels to produce $\tilde{U} = \tilde{F}(X)$ and $\hat{U} = \hat{F}(X)$ respectively, with $\tilde{U}, \hat{U} \in \mathbb{R}^{H \times W \times C}$ (H, W and C denote height, width and channel dimension), where $\tilde{F}$ and $\hat{F}$ each comprise convolution, normalization and a ReLU. To improve computational efficiency, the 5×5 convolution kernel is replaced with a 3×3 kernel used as a dilated convolution with dilation rate 2.

Second, the two branches are added element-wise, $U = \tilde{U} + \hat{U}$; global average pooling then gathers global information into s, and a fully connected layer compresses s into z:

$$s_c = F_{gp}(U_c) = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} U_c(i, j)$$

$$z = F_{fc}(s) = \delta(B(W s)), \qquad d = \max(C / r, L)$$

where $s_c$ is the mean of the c-th channel of the feature map, d is the feature dimension of z, optionally controlled with a reduction ratio r, and L is the minimum value of d, typically taken as 32.

Finally, the compressed feature z is restored to C dimensions by two fully connected layers; the two results are stacked into a C×2 matrix and a softmax is applied across the two branch entries of each channel:

$$a_c = \frac{e^{A_c z}}{e^{A_c z} + e^{B_c z}}, \qquad b_c = \frac{e^{B_c z}}{e^{A_c z} + e^{B_c z}}$$

where A and a, and B and b, denote the attention matrices and soft attention vectors of $\tilde{U}$ and $\hat{U}$ respectively; $A_c$ is the c-th row of A and $a_c$ the c-th element of a, and likewise for $B_c$ and $b_c$. The final feature map V is obtained from the attention weights of the different convolution kernels, where $V_c$ is the feature map of the c-th channel:

$$V_c = a_c \cdot \tilde{U}_c + b_c \cdot \hat{U}_c, \qquad a_c + b_c = 1$$

$$V = [V_1, V_2, \ldots, V_C], \qquad V_c \in \mathbb{R}^{H \times W}$$
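The branch-weighting step above can be sketched numerically. This is a hedged illustration: the matrices A and B, the compressed feature z, and both branch feature maps are random stand-ins for the learned quantities, and the function name is made up.

```python
import numpy as np

def sk_fuse(u_tilde, u_hat, A, B, z):
    """Selective-kernel fusion: per channel, a softmax over the two
    branch logits yields weights a_c + b_c = 1, which mix the 3x3 and
    dilated 5x5 branch feature maps into the output V."""
    ea, eb = np.exp(A @ z), np.exp(B @ z)   # per-channel branch logits
    a = ea / (ea + eb)                      # softmax across the two branches
    b = 1.0 - a
    return a[:, None, None] * u_tilde + b[:, None, None] * u_hat, a, b

C, H, W, d = 4, 6, 6, 3
rng = np.random.default_rng(1)
u_tilde = rng.normal(size=(C, H, W))        # 3x3 branch feature map
u_hat = rng.normal(size=(C, H, W))          # dilated 5x5 branch feature map
A, B, z = rng.normal(size=(C, d)), rng.normal(size=(C, d)), rng.normal(size=d)
V, a, b = sk_fuse(u_tilde, u_hat, A, B, z)
```

Each output channel of V is a convex combination of the two branches, so channels whose logits favor the dilated branch effectively see a larger receptive field.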
As an example, the spatial attention mechanism model is shown in FIG. 3.

First, average pooling and max pooling are applied to the feature map V output by the channel attention mechanism model, generating two maps $V_{avg}^s$ and $V_{max}^s$, which are then concatenated into a feature descriptor. After the pooling layers, an L2-norm normalization compresses the vector magnitude. A 7×7 convolution reduces the result to a single-channel feature map, and a sigmoid function then generates the 2D spatial attention map:

$$M_s(V) = \sigma\big(W \cdot f^{7\times 7}\big(\ell_2([V_{avg}^s;\, V_{max}^s])\big) + b\big)$$

where σ denotes the sigmoid function, ℓ₂ the L2-norm (two-norm) normalization, $f^{7\times 7}$ a convolution with a filter size of 7×7, and W and b parameters learned during training.
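The spatial attention computation above can be sketched with a toy stand-in: to keep the example short, a learnable 1×1 weighting of the two pooled maps replaces the 7×7 convolution, and the weights are made-up constants rather than trained parameters.

```python
import numpy as np

def spatial_attention(V, w=(0.7, 0.3), b=0.0):
    """Channel average- and max-pooling, concatenation, L2-norm
    normalisation, reduction to one channel, then a sigmoid."""
    avg = V.mean(axis=0)                          # (H, W) channel-average map
    mx = V.max(axis=0)                            # (H, W) channel-max map
    stacked = np.stack([avg, mx])                 # concatenated 2-channel map
    stacked = stacked / np.linalg.norm(stacked)   # L2-norm normalisation
    pre = w[0] * stacked[0] + w[1] * stacked[1] + b  # 1-channel stand-in
    return 1.0 / (1.0 + np.exp(-pre))             # sigmoid -> values in (0, 1)

rng = np.random.default_rng(2)
V = rng.normal(size=(8, 6, 6))                    # C=8 channels, 6x6 spatial
M = spatial_attention(V)                          # one weight per pixel
```

The resulting map M has one value in (0, 1) per spatial location, which is broadcast over the channels of V to emphasize informative regions of the fish image.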
The feature extraction network does not include the final fully connected layer or the global average pooling operation. Given an input image X, the feature extraction module generates an h×w×d tensor (h, w and d denote the height, width and dimension of the convolutional feature map), which can be regarded as m = h×w d-dimensional feature descriptors:

$$f_\theta(X) = \{x_1, x_2, \ldots, x_m\}$$

where $x_i$ is the i-th feature descriptor. Optionally, h = w = 6 and d = 512 are set, which means there are 36 feature descriptors per image in total.
According to the method, the attention mechanism and the normalization layer are added into the feature extraction module, so that important features of the fish image can be focused, unnecessary features in the fish image are restrained, and feature information in the fish image can be extracted more accurately.
In one embodiment, the classification module determines a classification result corresponding to the target fish image based on:
determining the distance between each characteristic descriptor in the characteristic information of the target fish image and each characteristic descriptor of the class prototype of each fish;
according to the distances between each feature descriptor in the feature information of the target fish image and each feature descriptor of the class prototype of each fish, determining K nearest feature descriptors in the feature descriptors of the class prototype of each fish corresponding to each feature descriptor in the feature information of the target fish image;
determining the distance between the target fish image and each fish class prototype according to each characteristic descriptor in the characteristic information of the target fish image and K nearest characteristic descriptors in the characteristic descriptors of the class prototype of each fish corresponding to each characteristic descriptor in the characteristic information of the target fish image;
And determining a classification result of the target fish image according to the distance between the target fish image and each fish class prototype.
Specifically, the classification module of the fish classification model in the embodiment of the invention determines the classification result corresponding to the target fish image based on the distances between the feature information of the target fish image and the class prototypes of each fish. Optionally, given a fish image X, the feature extraction module in the fish classification model generates an h×w×d tensor (h, w, and d are the height, width, and dimension after convolutional feature mapping), which may be regarded as m (m = h×w) d-dimensional feature descriptors; that is, the feature information extracted by the feature extraction module may be represented by these descriptors, with the following expression:
f_θ(X) = {x_1, x_2, …, x_m}
where x_i is the i-th feature descriptor. Optionally, h = w = 6 and d = 512, so that the feature information of each fish image is represented by 36 feature descriptors.
Optionally, the class prototype of each type of fish may be obtained by summing and averaging the feature information of a plurality of fish images of that type, so that each type of fish corresponds to one class prototype. Optionally, the class prototype corresponding to each type of fish may be determined using the following formula:
c_k = (1/|S_k|) Σ_{(x_i, y_k) ∈ S_k} f_φ(x_i)
where f_φ(x_i) represents the feature information of a fish image extracted by the feature extraction module, and S_k = {(x_1, y_k), (x_2, y_k), …, (x_n, y_k)} represents the set of target fish images of class k. Optionally, each class prototype also corresponds to an h×w×d tensor (h, w, and d as above); that is, each class prototype may likewise be regarded as m (m = h×w) d-dimensional feature descriptors and represented by them.
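A minimal NumPy sketch of the prototype computation — averaging the descriptor tensors of one class's support images (the 5-shot support size and array shapes are illustrative assumptions):

```python
import numpy as np

def class_prototype(features):
    """Average the descriptor tensors of one class's support images.

    features: array of shape (n, m, d) - n images of a single class,
    each represented by m d-dimensional feature descriptors.
    """
    return features.mean(axis=0)          # (m, d) class prototype

rng = np.random.default_rng(1)
support = rng.standard_normal((5, 36, 512))   # 5-shot support set
proto = class_prototype(support)
```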
Optionally, after obtaining the feature descriptors of the target fish image and the feature descriptors of each class prototype, the distance between each feature descriptor of the target fish image and each feature descriptor of each class prototype may be determined; optionally, the following formula may be used, from which the K feature descriptors of each class prototype nearest to each descriptor of the target fish image may be determined, where K is optionally equal to 3:
d(x_i, x̂_j) = ‖x_i − x̂_j‖_2
where x_i represents a feature descriptor in the feature information of the target fish image, and x̂_j represents a feature descriptor of a class prototype. That is, for each feature descriptor in the feature information of the target fish image, the nearest K (3) feature descriptors are determined from the descriptors of each class prototype using the above formula. For example, assuming N class prototypes in total, the 3 descriptors nearest to the 1st descriptor of the target fish image are determined among the 36 descriptors of the first class prototype, then among the 36 descriptors of the second class prototype, and so on up to the N-th class prototype; similarly, the nearest 3 descriptors in each class prototype are determined for each of the remaining descriptors of the target fish image, up to the 36th.
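The per-descriptor nearest-neighbor search might be sketched as follows; Euclidean distance is an assumption here, since the patent's own formula image is not reproduced in the text:

```python
import numpy as np

def k_nearest(descriptor, proto_descriptors, k=3):
    """Return the k prototype descriptors nearest (Euclidean) to a single
    query descriptor, plus their distances.
    descriptor: (d,); proto_descriptors: (m, d)."""
    dists = np.linalg.norm(proto_descriptors - descriptor, axis=1)
    order = np.argsort(dists)[:k]          # indices of the k smallest
    return proto_descriptors[order], dists[order]

protos = np.array([[0.0, 1.0], [3.0, 0.0], [0.0, 2.0], [10.0, 10.0]])
nearest, nd = k_nearest(np.array([0.0, 0.0]), protos, k=3)
```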
Optionally, after determining the K nearest class-prototype feature descriptors for each feature descriptor in the feature information of the target fish image, the distance between the target fish image and each fish class prototype may be determined using the following formula:
D(f_q(x), c_K) = Σ_{i=1}^{m} Σ_{j=1}^{3} d(x_i, x̂_j^i)
where D(f_q(x), c_K) represents the distance between the feature information f_q(x) of the target fish image and the K-th class prototype c_K, and x̂_1^i, x̂_2^i, x̂_3^i are the 3 descriptors of the K-th class prototype nearest to descriptor x_i. That is, the feature information f_q(x) of the target fish image is represented by its descriptors x_i, and the K-th class prototype is represented, for each x_i, by the 3 nearest feature descriptors in that prototype; on the basis of ensuring accurate recognition and classification, this reduces the amount of computation and improves the classification efficiency for the target fish image. Optionally, assuming N class prototypes in total (K ≤ N), the distances between the target fish image and the N class prototypes can each be determined using the above formula.
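A NumPy sketch of this image-to-prototype distance (Euclidean descriptor distance is an assumption; k = 3 as in the embodiment):

```python
import numpy as np

def image_to_prototype_distance(query_desc, proto_desc, k=3):
    """Distance between a query image and one class prototype: for each
    query descriptor, sum the distances to its k nearest prototype
    descriptors. query_desc: (m, d); proto_desc: (m', d)."""
    diff = query_desc[:, None, :] - proto_desc[None, :, :]
    d = np.linalg.norm(diff, axis=2)       # pairwise (m, m') distances
    # k smallest per query descriptor, summed over all descriptors.
    return np.sort(d, axis=1)[:, :k].sum()

q = np.array([[0.0, 0.0]])
p = np.array([[1.0, 0.0], [2.0, 0.0], [3.0, 0.0], [5.0, 0.0]])
dist = image_to_prototype_distance(q, p, k=3)
```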
Optionally, after determining the distances between the target fish image and the N class prototypes, the classification result of the target fish image may be determined according to these distances. For example, if the distance between the target fish image and the 2nd class prototype is the smallest of the N, the target fish image may be assigned to the fish type corresponding to the 2nd class prototype.
According to the method, the characteristic information of each fish image sample and the class prototype of each type of fish are represented in the mode of the characteristic descriptors, and the distance between each characteristic descriptor in the characteristic information of the target fish image to be classified and each characteristic descriptor of the class prototype of each fish is further accurately determined, so that the classification result of the target fish image can be accurately determined.
In one embodiment, the fish classification model is trained based on the following:
acquiring a plurality of types of fish image samples;
inputting the fish image samples of each type into a characteristic extraction module of a fish classification model to obtain characteristic information of each fish image sample of each type;
characteristic information of each fish image sample of each type is input into a classification module in a fish classification model to obtain a class prototype of each type of fish;
outputting classification results corresponding to the fish image samples according to the characteristic information of the fish image samples and the distances of the class prototypes of the fish types;
Training the fish classification model according to classification results corresponding to the fish image samples, label information corresponding to the fish image samples and the target loss function to obtain a trained fish classification model.
Specifically, the fish classification model in the embodiment of the invention is used for determining the classification result corresponding to the target fish image based on the characteristic information of the target fish image and the distance between the class prototypes of each fish. The fish classification model can be obtained by training in the following manner: firstly, obtaining a plurality of types of fish image samples, and then inputting the fish image samples of all types into a characteristic extraction module of a fish classification model to obtain characteristic information of the fish image samples of all types; after the characteristic information of each fish image sample of each type is obtained, the classification module in the fish classification model can perform summation and average operation on the characteristic information of the fish images of the same type to obtain a class prototype of the fish of the type, so that each type of fish can correspond to one class prototype; then, the fish classification model can output classification results corresponding to the fish image samples according to the characteristic information of the fish image samples to be classified and the distances of the class prototypes of the fish types; alternatively, the fish type corresponding to the class prototype with the smallest distance from the fish image to be classified may be used as the classification result of the fish image to be classified; alternatively, the characteristic information of each fish image sample and the class prototype of each type of fish may be represented by means of the characteristic descriptor in the embodiment of the present invention, and the characteristic information of each fish image sample and the distance of the class prototype of each type of fish may be determined.
Optionally, the fish classification model in the embodiment of the invention is trained on a small-sample fish image data set. During training, the fish images of each type are divided into a training set, a validation set, and a test set; k types of fish are randomly selected from each of the training, validation, and test sets, n images of each type serve as the support set, and m images are selected from the remaining images of those types as the query set. On this basis, classification results corresponding to the fish image samples are output according to the feature information of the fish image samples and their distances to the class prototypes of each type of fish; the fish classification model is then trained according to the classification results corresponding to the fish image samples, the label information corresponding to the fish image samples, and the target loss function, yielding a trained fish classification model. Training of the fish classification model on small-sample fish images can thus be realized, and the trained model can accurately classify fish images.
Optionally, after the classification result corresponding to each fish image sample is obtained, training the fish classification model by using the classification result corresponding to each fish image sample, the label information corresponding to each fish image sample and the target loss function to obtain a trained fish classification model; alternatively, the target loss function may be a cross entropy loss function as follows:
L = −log p_q(y = k | x)
where p_q(y = k | x) represents the probability that the target fish image x to be classified belongs to the k-th class, and y represents the label information corresponding to the target fish image to be classified.
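A sketch of the loss computation; turning prototype distances into p_q(y = k | x) via a softmax over negated distances is an assumption borrowed from standard prototypical-network practice, not stated in the patent:

```python
import numpy as np

def episode_loss(neg_distances, true_class):
    """Cross-entropy loss: a softmax over negated prototype distances is
    taken as p_q(y = k | x); the loss is -log p at the true class."""
    z = neg_distances - neg_distances.max()    # numerical stability
    p = np.exp(z) / np.exp(z).sum()
    return -np.log(p[true_class])

# Three classes; class 1 is closest (smallest distance -> largest score).
loss = episode_loss(np.array([-2.0, -0.5, -4.0]), true_class=1)
```

Minimizing this loss pulls query images toward the prototype of their labeled class.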
Optionally, the tag information corresponding to each fish image sample is used for marking which type of fish the fish image belongs to; optionally, under the condition that the classification result corresponding to the fish image sample obtained by using the fish classification model is inconsistent with the fish type corresponding to the tag information of the fish image sample, optimizing and adjusting the parameters in the fish classification model by using the target loss function, so that the trained fish classification model can accurately identify and classify the target fish image.
As shown in fig. 4, the training method of the fish classification model in the embodiment of the invention is specifically as follows:
step one, acquiring a fish data set, wherein the fish data set optionally comprises the following steps: fish4 knowledges, wildFish and CroatianFishDataset.
Step two, preprocessing the data set. First, all images input to the network are converted into three-channel RGB images; the RGB images are then resized to a uniform size.
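The two preprocessing operations might be sketched as follows; the 84×84 target size and the nearest-neighbor resampling are illustrative assumptions (a library resize would normally be used):

```python
import numpy as np

def to_rgb_resized(img, size=84):
    """Ensure a 3-channel image and resize to size x size using
    nearest-neighbor sampling (a stand-in for a library resize call)."""
    if img.ndim == 2:                       # grayscale -> 3-channel RGB
        img = np.stack([img] * 3, axis=-1)
    h, w = img.shape[:2]
    rows = np.arange(size) * h // size      # nearest source row per output row
    cols = np.arange(size) * w // size      # nearest source column
    return img[rows][:, cols]

out = to_rgb_resized(np.zeros((100, 120), dtype=np.uint8))
```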
Step three, dividing the data set. The collected fish data set is divided into a training set, a validation set, and a test set. In each of the training, validation, and test sets, k types of fish are randomly selected, n images of each type serve as the support set, and m images are selected from the remaining images of those types as the query set.
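The support/query (episode) sampling described in this step can be sketched as follows; the dictionary layout and image-name strings are illustrative:

```python
import random

def sample_episode(class_to_images, k_way, n_shot, m_query, seed=None):
    """Randomly pick k_way classes; from each, draw n_shot support images
    and m_query query images (disjoint within a class)."""
    rng = random.Random(seed)
    classes = rng.sample(sorted(class_to_images), k_way)
    support, query = {}, {}
    for c in classes:
        picks = rng.sample(class_to_images[c], n_shot + m_query)
        support[c], query[c] = picks[:n_shot], picks[n_shot:]
    return support, query

dataset = {f"species_{i}": [f"species_{i}_img_{j}" for j in range(20)]
           for i in range(10)}
support, query = sample_episode(dataset, k_way=5, n_shot=5, m_query=5, seed=0)
```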
And fourthly, inputting the support set and the query set selected randomly from the training set and the verification set into an improved prototype network model (fish classification model) for training, so that the loss function is minimized. Optionally, the support set and the query set images are input into a feature extraction module to extract features to generate feature vectors, then similarity between the query set and the support set category prototype is obtained through a classification module, and finally the category of the query set is judged according to the similarity.
And fifthly, inputting the support set and query set images selected randomly in the test set into the model weights trained in the fourth step for testing and adjusting to obtain the trained fish classification model.
Illustratively, as shown in FIG. 5, the fish classification model includes: the device comprises a feature extraction module and a classification module; the specific steps of classifying by using the fish classification model are as follows: firstly, extracting image features of a sample and a query image x (target fish image to be classified) in a support set through a feature extraction module to generate feature vectors; secondly, the classification module calculates a class prototype of the support set; then, the query image x (target fish image to be classified) calculates distances from the class prototypes of the support sets, the distances are used for representing the similarity between the images, and the classification result corresponding to the query image x (target fish image to be classified) is predicted. Finally, comparing the classification result with the label information to obtain a result of prediction accuracy, and then obtaining a trained fish classification model by continuously optimizing and updating parameters in the training process, so that classification of fish images can be realized by using the trained fish classification model.
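The classification step above can be sketched end-to-end in NumPy; descriptor dimension 8 (instead of 512) and the class names are illustrative, and Euclidean descriptor distance is again an assumption:

```python
import numpy as np

def classify(query_desc, prototypes, k=3):
    """Assign the query image to the class prototype with the smallest
    summed k-nearest-descriptor distance.
    query_desc: (m, d); prototypes: dict name -> (m, d) array."""
    def dist(proto):
        d = np.linalg.norm(query_desc[:, None] - proto[None], axis=2)
        return np.sort(d, axis=1)[:, :k].sum()
    return min(prototypes, key=lambda name: dist(prototypes[name]))

rng = np.random.default_rng(2)
prototypes = {"carp": rng.standard_normal((36, 8)),
              "trout": rng.standard_normal((36, 8))}
# A query that is a lightly perturbed copy of one prototype.
query = prototypes["carp"] + 0.01 * rng.standard_normal((36, 8))
prediction = classify(query, prototypes)
```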
Optionally, the basic network structure of the feature extraction module is ResNet18, optionally, the structure of the mixed attention mechanism model is shown in fig. 6, and a channel attention model and a spatial attention model are added as direct mapping parts after convolution operation, wherein the channel attention mechanism model is selected from SKNet, and the spatial attention mechanism model consists of a pooling layer, a normalization layer, a convolution layer and a sigmoid function.
Optionally, the classification module is configured to calculate the distance between the query image (the target fish image to be classified) and each class. Specifically, after a query-set image x (the target fish image to be classified) passes through the feature extraction module, feature information f_q(x) = [x_1, x_2, …, x_m] is generated; meanwhile, the support-set images pass through the feature extraction module to obtain the class prototypes. For each descriptor x_i (i ∈ [1, 2, …, m]) of the query image, the k nearest values x̂_1, …, x̂_k are found in each class prototype; the distance d(x_i, x̂_j) between x_i and each x̂_j is then calculated, and these distances are summed to obtain the distance between the query image and the class prototype, thereby realizing accurate classification of the fish images.
The fish image classification device provided by the invention is described below, and the fish image classification device described below and the fish image classification method described above can be referred to correspondingly.
Fig. 7 is a schematic structural diagram of the fish image classification device provided by the invention. The fish image classification device provided in this embodiment includes:
an acquisition module 710 for acquiring an image of a target fish to be classified;
the classification module 720 is used for inputting the target fish image into a pre-trained fish classification model to obtain a classification result corresponding to the target fish image; the fish classification model is trained based on a plurality of types of fish images; the fish classification model is used for determining a classification result corresponding to the target fish image based on the characteristic information of the target fish image and the distance between the class prototypes of each fish.
Optionally, the fish classification model comprises:
the device comprises a feature extraction module and a classification module; the feature extraction module is connected with the classification module;
the characteristic extraction module is used for extracting characteristic information in the target fish image; the feature extraction module is obtained based on a residual network model and a mixed attention mechanism model;
the classification module is used for determining a classification result corresponding to the target fish image based on the characteristic information in the target fish image and the distance between the class prototypes of each fish.
Optionally, the mixed attention mechanism model comprises a channel attention mechanism model and a spatial attention mechanism model, wherein the channel attention mechanism model is constructed based on a selective kernel network SKNet model;
The spatial attention mechanism model includes a pooling layer, a normalization layer, a convolution layer, and a sigmoid activation function providing the neuron nonlinearity.
Optionally, the classification module determines a classification result corresponding to the target fish image based on the following manner:
determining the distance between each characteristic descriptor in the characteristic information of the target fish image and each characteristic descriptor of the class prototype of each fish;
according to the distances between each feature descriptor in the feature information of the target fish image and each feature descriptor of the class prototype of each fish, determining K nearest feature descriptors in the feature descriptors of the class prototype of each fish corresponding to each feature descriptor in the feature information of the target fish image;
determining the distance between the target fish image and each fish class prototype according to each characteristic descriptor in the characteristic information of the target fish image and K nearest characteristic descriptors in the characteristic descriptors of the class prototype of each fish corresponding to each characteristic descriptor in the characteristic information of the target fish image;
and determining a classification result of the target fish image according to the distance between the target fish image and each fish class prototype.
Optionally, the K nearest feature descriptors among the feature descriptors of the class prototype of each fish corresponding to each feature descriptor in the feature information of the target fish image are determined by using the following formula:
d(x_i, x̂_j) = ‖x_i − x̂_j‖_2
where x_i represents a feature descriptor in the feature information of the target fish image, and x̂_j represents a feature descriptor of a class prototype of the fish;
determining the distance between the target fish image and each fish class prototype by using the following formula:
D(f_q(x), c_K) = Σ_{i=1}^{m} Σ_{j=1}^{3} d(x_i, x̂_j^i)
where D(f_q(x), c_K) represents the distance between the feature information f_q(x) of the target fish image and the K-th class prototype c_K.
The device of the embodiment of the present invention is configured to perform the method of any of the foregoing method embodiments, and its implementation principle and technical effects are similar, and are not described in detail herein.
Fig. 8 illustrates a physical structure diagram of an electronic device, which may include: processor 810, communication interface (Communications Interface) 820, memory 830, and communication bus 840, wherein processor 810, communication interface 820, memory 830 accomplish communication with each other through communication bus 840. The processor 810 may invoke logic instructions in the memory 830 to perform a fish image classification method comprising: acquiring an image of target fish to be classified; inputting the target fish image into a pre-trained fish classification model to obtain a classification result corresponding to the target fish image; the fish classification model is trained based on a plurality of types of fish images; the fish classification model is used for determining a classification result corresponding to the target fish image based on the characteristic information of the target fish image and the distance between the class prototypes of each fish.
Further, the logic instructions in the memory 830 described above may be implemented in the form of software functional units and may be stored in a computer-readable storage medium when sold or used as a stand-alone product. Based on this understanding, the technical solution of the present invention may be embodied, essentially or in the part contributing to the prior art, in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or other media capable of storing program code.
In another aspect, the present invention also provides a computer program product comprising a computer program stored on a non-transitory computer readable storage medium, the computer program comprising program instructions which, when executed by a computer, enable the computer to perform the method of classifying fish images provided by the methods described above, the method comprising: acquiring an image of target fish to be classified; inputting the target fish image into a pre-trained fish classification model to obtain a classification result corresponding to the target fish image; the fish classification model is trained based on a plurality of types of fish images; the fish classification model is used for determining a classification result corresponding to the target fish image based on the characteristic information of the target fish image and the distance between the class prototypes of each fish.
In yet another aspect, the present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, is implemented to perform the above provided fish image classification methods, the method comprising: acquiring an image of target fish to be classified; inputting the target fish image into a pre-trained fish classification model to obtain a classification result corresponding to the target fish image; the fish classification model is trained based on a plurality of types of fish images; the fish classification model is used for determining a classification result corresponding to the target fish image based on the characteristic information of the target fish image and the distance between the class prototypes of each fish.
The apparatus embodiments described above are merely illustrative, wherein the elements illustrated as separate elements may or may not be physically separate, and the elements shown as elements may or may not be physical elements, may be located in one place, or may be distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art will understand and implement the present invention without undue burden.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus necessary general hardware platforms, or of course may be implemented by means of hardware. Based on this understanding, the foregoing technical solution may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a computer readable storage medium, such as ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the respective embodiments or some parts of the embodiments.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A fish image classification method, comprising:
acquiring an image of target fish to be classified;
inputting the target fish image into a pre-trained fish classification model to obtain a classification result corresponding to the target fish image; the fish classification model is obtained based on training of a plurality of types of fish images; the fish classification model is used for determining a classification result corresponding to the target fish image based on the characteristic information of the target fish image and the distance between the class prototypes of each fish.
2. The fish image classification method according to claim 1, wherein the fish classification model comprises:
the device comprises a feature extraction module and a classification module; the feature extraction module is connected with the classification module;
the characteristic extraction module is used for extracting characteristic information in the target fish image; the feature extraction module is obtained based on a residual network model and a mixed attention mechanism model;
the classification module is used for determining a classification result corresponding to the target fish image based on the characteristic information in the target fish image and the distance between the class prototypes of each fish.
3. The fish image classification method according to claim 2, wherein the mixed attention mechanism model comprises a channel attention mechanism model and a spatial attention mechanism model, and the channel attention mechanism model is constructed based on a selective kernel network SKNet model;
the spatial attention mechanism model includes a pooling layer, a normalization layer, a convolution layer, and a sigmoid activation function providing the neuron nonlinearity.
4. A fish image classification method according to claim 3, wherein the classification module determines the classification result corresponding to the target fish image based on:
determining the distance between each characteristic descriptor in the characteristic information of the target fish image and each characteristic descriptor of the class prototype of each fish;
according to the distances between each feature descriptor in the feature information of the target fish image and each feature descriptor of the class prototype of each fish, determining K nearest feature descriptors in the feature descriptors of the class prototype of each fish corresponding to each feature descriptor in the feature information of the target fish image;
determining the distance between the target fish image and each fish class prototype according to each characteristic descriptor in the characteristic information of the target fish image and K nearest characteristic descriptors in the characteristic descriptors of the class prototype of each fish corresponding to each characteristic descriptor in the characteristic information of the target fish image;
And determining a classification result of the target fish image according to the distance between the target fish image and each fish class prototype.
5. The fish image classification method according to claim 4, wherein K nearest feature descriptors among feature descriptors of class prototypes of respective fishes corresponding to respective feature descriptors in the feature information of the target fish image are determined by using the following formula:
d(x_i, x̂_j) = ‖x_i − x̂_j‖_2
wherein x_i represents a feature descriptor in the feature information of the target fish image, and x̂_j represents a feature descriptor of a class prototype of the fish;
determining the distance between the target fish image and each fish class prototype by using the following formula:
D(f_q(x), c_K) = Σ_{i=1}^{m} Σ_{j=1}^{3} d(x_i, x̂_j^i)
wherein D(f_q(x), c_K) represents the distance between the feature information f_q(x) of the target fish image and the K-th class prototype c_K.
6. The fish image classification method according to any one of claims 1-5, wherein the fish classification model is trained based on:
acquiring a plurality of types of fish image samples;
inputting the fish image samples of each type into a characteristic extraction module of a fish classification model to obtain characteristic information of each fish image sample of each type;
Respectively inputting the characteristic information of each fish image sample of each type into a classification module in a fish classification model to obtain a class prototype of each type of fish;
outputting classification results corresponding to the fish image samples according to the characteristic information of the fish image samples and the distances of the class prototypes of the fish types;
training the fish classification model according to classification results corresponding to the fish image samples, label information corresponding to the fish image samples and a target loss function to obtain a trained fish classification model.
7. A fish image classification apparatus, comprising:
the acquisition module is used for acquiring the target fish images to be classified;
the classification module is used for inputting the target fish image into a pre-trained fish classification model to obtain a classification result corresponding to the target fish image; the fish classification model is obtained based on training of a plurality of types of fish images; the fish classification model is used for determining a classification result corresponding to the target fish image based on the characteristic information of the target fish image and the distance between the class prototypes of each fish.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the fish image classification method of any one of claims 1 to 6.
9. A non-transitory computer readable storage medium having stored thereon a computer program, characterized in that the computer program, when executed by a processor, implements the fish image classification method according to any one of claims 1 to 6.
10. A computer program product having stored thereon executable instructions which, when executed by a processor, cause the processor to implement the fish image classification method of any of claims 1 to 6.
CN202310156728.2A 2023-02-13 2023-02-13 Fish image classification method, device and equipment Pending CN116503896A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310156728.2A CN116503896A (en) 2023-02-13 2023-02-13 Fish image classification method, device and equipment


Publications (1)

Publication Number Publication Date
CN116503896A 2023-07-28

Family

ID=87319111

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310156728.2A Pending CN116503896A (en) 2023-02-13 2023-02-13 Fish image classification method, device and equipment

Country Status (1)

Country Link
CN (1) CN116503896A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination