CN112016617A - Fine-grained classification method and device and computer-readable storage medium


Info

Publication number
CN112016617A
Authority
CN
China
Prior art keywords
original image
initial model
neural network
loss
acquiring
Prior art date
Legal status
Granted
Application number
CN202010880880.1A
Other languages
Chinese (zh)
Other versions
CN112016617B (en)
Inventor
杨若愚
Current Assignee
Ping An Property and Casualty Insurance Company of China Ltd
Original Assignee
Ping An Property and Casualty Insurance Company of China Ltd
Priority date
Filing date
Publication date
Application filed by Ping An Property and Casualty Insurance Company of China Ltd
Priority to CN202010880880.1A
Publication of CN112016617A
Application granted
Publication of CN112016617B
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/24: Classification techniques
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G06N3/08: Learning methods
    • G06N3/084: Backpropagation, e.g. using gradient descent


Abstract

The invention relates to artificial intelligence and discloses a fine-grained classification method comprising: constructing an initial model, acquiring an original image, and preprocessing the original image to form training data for training the initial model; acquiring loss data corresponding to the original image based on the initial model and the training data; performing backpropagation with stochastic gradient descent based on the loss data until the initial model converges within a preset range, thereby forming a classification model; and performing classification prediction on an image to be processed through the classification model. The invention also relates to blockchain technology, the original image being stored in a blockchain. The invention enables adaptive selection of local feature regions and improves the accuracy of classification prediction.

Description

Fine-grained classification method and device and computer-readable storage medium
Technical Field
The present invention relates to artificial intelligence, and in particular, to a fine-grained classification method, apparatus, electronic device, and computer-readable storage medium.
Background
Fine-grained classification refers to finer sub-class classification on top of basic class discrimination, such as distinguishing bird species or vehicle models. It currently has broad business demand and wide application scenarios in industry and everyday life.
At present, fine-grained classification methods fall mainly into two categories. The first locates local feature regions and extracts image features of these discriminative regions for classification. However, since no position annotation of the local key regions is available, most algorithms select local feature regions with windows of a pre-set size. Such a method cannot automatically adapt to the sizes of feature regions in different scenes and requires manually setting the aspect ratio and area of the sliding window, which weakens the adaptability of the fine-grained classification algorithm to data from different scenes. As a result, local feature regions cannot be accurately located, local features cannot be effectively extracted, and the algorithm generalizes poorly.
The second eliminates the influence of non-target regions on fine-grained classification through weakly supervised instance detection and segmentation. However, previous algorithms based on this idea cannot be trained end to end: the training process is repetitive and cannot be automated, requires continuous human intervention throughout, takes a long time, and does not guarantee the expected training result.
Disclosure of Invention
The invention provides a fine-grained classification method, a fine-grained classification apparatus, an electronic device, and a computer-readable storage medium, with the main aim of improving the accuracy and efficiency of fine-grained image classification.
In order to achieve the above object, the present invention provides a fine-grained classification method, including:
constructing an initial model, acquiring an original image, and preprocessing the original image to form training data for training the initial model;
acquiring loss data corresponding to the original image based on the initial model and the training data;
performing backpropagation with stochastic gradient descent based on the loss data until the initial model converges within a preset range to form a classification model;
and performing classification prediction on an image to be processed through the classification model.
Optionally, the training data is stored in a blockchain, and preprocessing the original image to form training data for training the initial model includes:
inputting the original image into the initial model to obtain saliency matrices corresponding to the original image;
determining, based on the saliency matrices, the information-strength ranking of the sum submatrices corresponding to the original image, and acquiring the image local feature regions of a preset number of top-ranked sum submatrices;
and determining a target image feature region as the training data according to the image local feature regions.
Optionally, the initial model includes a first neural network and a second neural network, and acquiring the loss data corresponding to the original image includes:
extracting and predicting features of the target image feature region through the first neural network, and acquiring a first prediction result and a first cross-entropy loss;
obtaining a pairwise ranking loss based on the information-strength ranking and the first prediction result; meanwhile, extracting a feature vector of the original image through the first neural network and concatenating it with the feature vector of the target image feature region;
and inputting the concatenated feature vectors into a fully connected layer of the first neural network for prediction, and acquiring a corresponding second prediction result and a second cross-entropy loss.
Optionally, the loss data is stored in a blockchain, wherein the loss data includes the first cross-entropy loss, the second cross-entropy loss, and the pairwise ranking loss.
Optionally, inputting the original image into the initial model and acquiring the saliency matrices corresponding to the original image includes:
inputting the original image into the first neural network, and acquiring the output results of the intermediate layers of the first neural network;
and passing each intermediate-layer output result through the second neural network to obtain the saliency matrix corresponding to that intermediate layer.
In order to solve the above problem, the present invention also provides a fine-grained classification apparatus, including:
a model construction and data acquisition module for constructing an initial model, acquiring an original image, and preprocessing the original image to form training data for training the initial model;
a loss data acquisition module for acquiring loss data corresponding to the original image based on the initial model and the training data;
a model training module for performing backpropagation with stochastic gradient descent based on the loss data until the initial model converges within a preset range to form a classification model;
and a classification prediction module for performing classification prediction on an image to be processed through the classification model.
Optionally, the training data is stored in a blockchain, and preprocessing the original image to form training data for training the initial model includes:
inputting the original image into the initial model to obtain saliency matrices corresponding to the original image;
determining, based on the saliency matrices, the information-strength ranking of the sum submatrices corresponding to the original image, and acquiring the image local feature regions of a preset number of top-ranked sum submatrices;
and determining a target image feature region as the training data according to the image local feature regions.
Optionally, the initial model includes a first neural network and a second neural network, and acquiring the loss data corresponding to the original image includes:
extracting and predicting features of the target image feature region through the first neural network, and acquiring a first prediction result and a first cross-entropy loss;
obtaining a pairwise ranking loss based on the information-strength ranking and the first prediction result; meanwhile, extracting a feature vector of the original image through the first neural network and concatenating it with the feature vector of the target image feature region;
and inputting the concatenated feature vectors into a fully connected layer of the first neural network for prediction, and acquiring a corresponding second prediction result and a second cross-entropy loss.
In order to solve the above problem, the present invention also provides an electronic device, including:
a memory storing at least one instruction; and
a processor that executes the instructions stored in the memory to implement the fine-grained classification method.
In order to solve the above problem, the present invention further provides a computer-readable storage medium storing at least one instruction, the at least one instruction being executed by a processor in an electronic device to implement the fine-grained classification method.
In the embodiments of the invention, an initial model is constructed, an original image is acquired and preprocessed to form training data; loss data corresponding to the original image are acquired according to the initial model and the training data; backpropagation with stochastic gradient descent is performed based on the loss data until training of the initial model is completed and a classification model is formed; finally, classification prediction is performed on the image to be processed through the classification model. In this way, key feature regions can be selected flexibly and local key feature regions can be selected adaptively, enabling effective extraction of fine-grained image features, so that fine-grained image features are classified and located and the features of local feature regions are highlighted.
Drawings
Fig. 1 is a schematic flowchart of a fine-grained classification method according to an embodiment of the present invention;
fig. 2 is a schematic block diagram of a fine-grained classification apparatus according to an embodiment of the present invention;
fig. 3 is a schematic diagram of an internal structure of an electronic device implementing a fine-grained classification method according to an embodiment of the present invention;
the implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The invention provides a fine-grained classification method. Fig. 1 is a schematic flow chart of a fine-grained classification method according to an embodiment of the present invention. The method may be performed by an apparatus, which may be implemented by software and/or hardware.
In this embodiment, the fine-grained classification method includes:
s110: an initial model is built, an original image is obtained, and the original image is preprocessed to form training data for training the initial model.
Specifically, preprocessing the original image to form training data for training the initial model includes:
first, inputting the original image into the initial model to obtain saliency matrices corresponding to the original image;
then, determining the information-strength ranking of the corresponding sum submatrices based on the saliency matrices, and acquiring the image local feature regions of a preset number of top-ranked sum submatrices;
and finally, determining a target image feature region as the training data according to the acquired image local feature regions.
In another aspect, the initial model includes a first neural network and a second neural network, and inputting the original image into the initial model to obtain the saliency matrices corresponding to the original image further includes:
1. inputting the original image into the first neural network of the initial model and acquiring the output results of its intermediate layers; when there are multiple intermediate layers, there are multiple intermediate output results, and the positions of local feature regions of the original image can be preliminarily determined through these outputs, i.e., finding local features through the first neural network prepares for obtaining the loss function in the next step;
2. passing each intermediate-layer output result through the second neural network to obtain the saliency matrix corresponding to that intermediate layer.
The second neural network may adopt a feature pyramid network (FPN): after the intermediate-layer outputs pass through the convolution layers of the feature pyramid network, the saliency matrix corresponding to each intermediate layer is obtained. The first and second neural networks may also adopt other network structures or types, as long as local feature search and intermediate output acquisition can be realized.
The intermediate-layer outputs at different scales are passed through the 1x1 convolution layers in the FPN, and the saliency matrices of the original image at different scales are computed.
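As an illustrative aid, the following is a minimal sketch, assuming a PyTorch implementation, of taking intermediate-layer outputs from a ResNet50 backbone and passing each through a 1x1 convolution to obtain a per-scale saliency matrix. The class and function names, the choice of stages, and the sigmoid squashing are assumptions, not the patented implementation.

```python
# Minimal sketch (PyTorch assumed): per-scale 1x1-conv saliency heads on
# ResNet50 intermediate outputs. Names such as SaliencyHeads are illustrative.
import torch
import torch.nn as nn
from torchvision.models import resnet50


class SaliencyHeads(nn.Module):
    """Turn selected backbone stage outputs into single-channel saliency matrices."""

    def __init__(self, stage_channels=(512, 1024, 2048)):
        super().__init__()
        # One 1x1 convolution per intermediate scale, each producing a 1-channel map.
        self.heads = nn.ModuleList(nn.Conv2d(c, 1, kernel_size=1) for c in stage_channels)

    def forward(self, features):
        # features: list of intermediate feature maps, shallowest to deepest.
        return [torch.sigmoid(head(f)).squeeze(1) for head, f in zip(self.heads, features)]


backbone = resnet50()  # first neural network (backbone), randomly initialised here


def intermediate_outputs(x):
    """Run the ResNet50 stem and stages, keeping the last three stage outputs."""
    x = backbone.maxpool(backbone.relu(backbone.bn1(backbone.conv1(x))))
    c2 = backbone.layer1(x)
    c3 = backbone.layer2(c2)
    c4 = backbone.layer3(c3)
    c5 = backbone.layer4(c4)
    return [c3, c4, c5]


heads = SaliencyHeads()
image = torch.randn(1, 3, 448, 448)                  # dummy original image
saliency_maps = heads(intermediate_outputs(image))   # one saliency matrix per scale
```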
It is emphasized that, in order to further ensure the privacy and security of the original image or training data, the original image or training data may also be stored in a node of a blockchain.
S120: and acquiring loss data corresponding to the original image based on the initial model and the training data.
The original image is the image to be classified and predicted. It is input into the first neural network of the initial model, and the output of the first neural network is input into the second neural network. Both networks may be deep convolutional neural networks that extract image features; many different convolutional network structures may be chosen in actual training or application, and a practitioner may select a suitable structure according to the characteristics of the data set, so the specific network structure is not limited here.
As a specific example, in the present invention the first neural network adopts the ResNet50 structure: a 7x7 convolutional layer with 64 output channels and stride 2, a 3x3 max-pooling layer with stride 2, followed by four residual stages with stride 2 between consecutive stages, each stage containing a number of convolutional blocks. Each block comprises a 1x1 convolutional layer, a 3x3 convolutional layer, and a 1x1 convolutional layer. After the four residual stages, an average pooling layer, a 1000-dimensional fully connected layer, and a softmax activation function are attached.
In addition, the second neural network may adopt a feature pyramid network: the outputs of the four residual stages of the ResNet50 network are upsampled with nearest-neighbour interpolation from the highest stage to the lowest (i.e., 4, 3, 2, 1), and each upsampled result is concatenated with the 1x1 convolution output of the lower stage's intermediate result (corresponding to the 1x1 convolutions in the feature pyramid structure). Finally, a 1x1 convolution is applied to the result of the above operation to obtain the saliency matrix needed to locate local feature regions.
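The following is a hedged sketch, again assuming PyTorch, of the FPN-style second network just described: deeper stage outputs are nearest-neighbour upsampled, concatenated with the 1x1-convolved output of the stage below, and a final 1x1 convolution yields a saliency matrix per scale. The channel widths, the sigmoid, and the exact wiring are assumptions.

```python
# Sketch of the FPN-style second network (PyTorch assumed); not the patented code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SaliencyFPN(nn.Module):
    def __init__(self, channels=(256, 512, 1024, 2048), mid=256):
        super().__init__()
        self.lateral = nn.ModuleList(nn.Conv2d(c, mid, 1) for c in channels)
        self.out_conv = nn.ModuleList(nn.Conv2d(2 * mid, 1, 1) for _ in channels[:-1])

    def forward(self, c2, c3, c4, c5):
        laterals = [l(c) for l, c in zip(self.lateral, (c2, c3, c4, c5))]
        saliency = []
        top = laterals[-1]
        # Walk the stages from high (stage 4) down to low (stage 1).
        for i in range(len(laterals) - 2, -1, -1):
            up = F.interpolate(top, size=laterals[i].shape[-2:], mode="nearest")
            merged = torch.cat([laterals[i], up], dim=1)   # concatenate with lower stage
            saliency.append(torch.sigmoid(self.out_conv[i](merged)).squeeze(1))
            top = laterals[i]
        return saliency  # saliency matrices at different scales


fpn = SaliencyFPN()
c2, c3, c4, c5 = (torch.randn(1, c, s, s) for c, s in
                  [(256, 112), (512, 56), (1024, 28), (2048, 14)])
maps = fpn(c2, c3, c4, c5)
```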
Specifically, acquiring the loss data corresponding to the original image includes:
1. performing feature extraction and prediction on the target image feature region through the first neural network, and acquiring a first prediction result and a first cross-entropy loss;
2. obtaining a pairwise ranking loss based on the information-strength ranking and the first prediction result; meanwhile, extracting a feature vector of the original image through the first neural network and concatenating it with the feature vector of the target image feature region;
3. inputting the concatenated feature vectors into a fully connected layer of the neural network for prediction, and obtaining a corresponding second prediction result and a second cross-entropy loss.
As an example, dynamic programming can be used to solve for the information-strength ranking of the sum submatrices on the three saliency matrices and to output the top three sum submatrices by information strength from each, giving nine sum submatrices in total. The image local feature regions corresponding to these nine sum submatrices are then acquired, and three of them are selected by a non-maximum suppression algorithm as the local feature regions proposed by the neural network (i.e., the target image feature regions).
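As a sketch of the dynamic-programming step (not the patented code), the maximum-sum submatrix search below runs a 1-D Kadane pass over column sums for every pair of row bounds and keeps the top-scoring rectangles; the saliency matrix is assumed to be centred so that uninformative regions contribute negative values.

```python
# Illustrative maximum-sum submatrix search ("information strength" ranking).
import numpy as np


def top_sum_submatrices(sal, k=3):
    """Return the k rectangular submatrices of `sal` with the largest element sums.

    sal: 2-D NumPy array of saliency values, assumed centred (e.g. sal - sal.mean())
         so that low-information regions contribute negatively.
    Returns a list of (score, (top, left, bottom, right)) tuples.
    """
    h, w = sal.shape
    candidates = []
    for top in range(h):
        col_sums = np.zeros(w)
        for bottom in range(top, h):
            col_sums += sal[bottom]          # running column sums for rows top..bottom
            best, cur, left = -np.inf, 0.0, 0
            for right in range(w):           # 1-D Kadane pass over the column sums
                if cur <= 0:
                    cur, left = 0.0, right
                cur += col_sums[right]
                if cur > best:
                    best = cur
                    box = (top, left, bottom, right)
            candidates.append((best, box))
    candidates.sort(key=lambda c: c[0], reverse=True)
    return candidates[:k]


sal = np.random.randn(14, 14)                # one (centred) saliency matrix
regions = top_sum_submatrices(sal, k=3)      # top-3 boxes on this scale
# Boxes gathered from the three scales can then be filtered with non-maximum
# suppression (e.g. torchvision.ops.nms) to keep three target feature regions.
```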
In addition, the first neural network (the backbone network) extracts features from the target image feature regions and predicts with an independent "first fully connected layer" to obtain the first prediction result (i.e., the predicted probability of the class to which the original image belongs); the first cross-entropy loss is then obtained from the first prediction result. Furthermore, the feature vectors extracted by the backbone network from the original image and the three image local feature regions are concatenated and fed into an independent "second fully connected layer" for prediction, yielding the second prediction result and the second cross-entropy loss.
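A minimal sketch of the two prediction heads, assuming PyTorch; the feature dimension, class count, and the way per-region confidences are read off for the ranking loss are illustrative assumptions.

```python
# Sketch of the two classification heads and cross-entropy losses (PyTorch assumed).
import torch
import torch.nn as nn
import torch.nn.functional as F

num_classes, feat_dim = 200, 1024                       # dummy values

first_fc = nn.Linear(feat_dim, num_classes)             # "first fully connected layer"
second_fc = nn.Linear(feat_dim * 4, num_classes)        # "second fully connected layer"


def classification_losses(backbone_feats, label):
    """backbone_feats: list of 4 feature vectors (original image + 3 local regions),
    each of shape (batch, feat_dim); label: (batch,) ground-truth class indices."""
    # First head: predict the class from each local-region feature independently.
    region_logits = [first_fc(f) for f in backbone_feats[1:]]
    first_ce = sum(F.cross_entropy(lg, label) for lg in region_logits) / len(region_logits)

    # Second head: concatenate all four feature vectors and predict once.
    fused = torch.cat(backbone_feats, dim=1)
    second_ce = F.cross_entropy(second_fc(fused), label)

    # Per-region confidences of the ground-truth class feed the pairwise ranking loss.
    confidences = [lg.softmax(dim=1).gather(1, label[:, None]).squeeze(1)
                   for lg in region_logits]
    return first_ce, second_ce, confidences


feats = [torch.randn(2, feat_dim) for _ in range(4)]    # dummy features
labels = torch.randint(0, num_classes, (2,))
first_ce, second_ce, confs = classification_losses(feats, labels)
```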
In addition, the loss data may also be stored in the blockchain, wherein the loss data comprise the first cross-entropy loss, the second cross-entropy loss, and the pairwise ranking loss.
S130: and carrying out backward propagation of random gradient descent based on the loss data until the initial model converges in a preset range to form a classification model.
S140: and carrying out classification prediction on the image to be processed through the classification model.
Specifically, according to the total loss (the sum of the first cross entropy loss, the second cross entropy loss and the pairing sorting loss) obtained in the steps, the gradient descent is performed to the neural network for back propagation, the steps are repeated until the neural network converges in a preset range, namely, the training of the classification model is completed, and then the classification model is obtained according to the training to perform classification prediction on the correlation of the graph to be processed. Therefore, the invention relates to artificial intelligence, and deep learning is carried out based on a neural network to obtain a final classification model.
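A minimal training-loop sketch, assuming PyTorch; `compute_losses`, the learning rate, and the convergence tolerance are placeholders for whatever a concrete implementation provides.

```python
# Sketch of training until the total loss converges within a preset range.
import torch


def train_until_convergence(model, train_loader, compute_losses, lr=1e-3, tolerance=1e-3):
    """compute_losses(model, images, labels) -> (first_ce, second_ce, rank_loss)."""
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    prev_epoch_loss = float("inf")
    while True:
        epoch_loss = 0.0
        for images, labels in train_loader:
            first_ce, second_ce, rank_loss = compute_losses(model, images, labels)
            total_loss = first_ce + second_ce + rank_loss   # sum of the three losses
            optimizer.zero_grad()
            total_loss.backward()                           # backpropagation
            optimizer.step()                                # stochastic gradient descent step
            epoch_loss += total_loss.item()
        # "Converges within a preset range": stop once the epoch loss stops moving.
        if abs(prev_epoch_loss - epoch_loss) < tolerance:
            return model
        prev_epoch_loss = epoch_loss
```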
As a specific example, classifying and predicting an image with the fine-grained classification method provided by the present invention includes the following steps:
1. Input the original image into the backbone network and obtain the required intermediate-layer results.
2. Pass the intermediate-layer outputs at different scales through the 1x1 convolution layers in the FPN (feature pyramid network) and compute the saliency matrices of the original image at different scales.
3. Superimpose the saliency matrices at the different scales and merge them into a single saliency matrix.
4. According to the saliency matrix, crop three regions from the original image: the minimum bounding rectangle of the region with positive saliency values, the minimum bounding rectangle of the region with saliency values greater than 0.5, and the minimum bounding rectangle of the region with saliency values greater than 0.9. Input these crops together with the original image into the backbone network and extract four 1024-dimensional feature vectors. Here, the 0.5 region or the 0.9 region refers to the minimum bounding rectangle enclosing the matrix elements of the saliency matrix whose saliency values are greater than 0.5 or 0.9, respectively.
5. Concatenate the four extracted feature vectors and input them into a fully connected layer to obtain the image classification result.
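A hedged sketch of step 4, assuming PyTorch and NumPy: the saliency matrix (assumed to be upsampled to the image resolution and scaled to [0, 1]) is thresholded, the minimum bounding rectangle of each thresholded region is cropped, and the backbone (assumed here to return one 1024-dimensional vector per image) extracts a feature vector per crop.

```python
# Sketch of cropping minimum bounding rectangles from a thresholded saliency matrix.
import numpy as np
import torch
import torch.nn.functional as F


def min_bounding_rect(sal, threshold):
    """Smallest (top, left, bottom, right) rectangle covering sal > threshold."""
    ys, xs = np.nonzero(sal > threshold)
    if len(ys) == 0:
        return None
    return int(ys.min()), int(xs.min()), int(ys.max()) + 1, int(xs.max()) + 1


def extract_region_features(backbone, image, sal, thresholds=(0.0, 0.5, 0.9)):
    """image: (1, 3, H, W) tensor; sal: (H, W) NumPy saliency matrix in [0, 1]."""
    crops = [image]                                   # the original image itself
    for t in thresholds:
        box = min_bounding_rect(sal, t)
        if box is None:
            continue
        top, left, bottom, right = box
        crop = image[:, :, top:bottom, left:right]
        crops.append(F.interpolate(crop, size=image.shape[-2:], mode="bilinear",
                                   align_corners=False))
    # One 1024-dimensional feature vector per crop (backbone assumed to return that).
    feats = [backbone(c) for c in crops]
    return torch.cat(feats, dim=1)                    # concatenated for the final FC layer
```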
As a specific application scenario, the fine-grained classification method based on deformable key feature region extraction can be applied to tasks such as crop pest and disease identification: it can automatically locate lesion feature regions without any pest or disease lesion annotations and extract effective image classification features for the lesion regions.
Specifically, crop image information is input into the backbone network (the first neural network) of the initial model, the intermediate-layer outputs of the last three layers of the backbone network are connected to two independent layers, convolutional layer 1 and fully connected layer 1, and the crop category to which the input image belongs is output.
Then, after the crop-category prediction is obtained, the intermediate output of the shallow network is input into the deep sub-network dedicated to that crop category within the backbone network; the intermediate result of the deep sub-network before max pooling is input into the proposal net, which outputs a two-dimensional saliency map, and a number of local key regions are output by solving the ranked sequence of sum submatrices.
Then, the selected local key regions are input into the backbone network, formed by the shallow network shared across crop types and the dedicated deep sub-network of the determined crop category, and feature vectors are extracted. The extracted feature vectors are concatenated and input into fully connected layer 2, which outputs the prediction of the pest or disease category affecting the crop.
A tanh activation function is attached to the output of the proposal net to generate a saliency map with values in (-1, 1). The ranked sequence of sum submatrices is computed on this saliency map, the image regions corresponding to the submatrices with the highest saliency are input into a teacher network model, and the confidence of the annotated label is obtained. The pairwise ranking loss is computed from the confidence ranking of the extracted local feature regions and the saliency ranking. By backpropagating the pairwise ranking loss to update the parameters of the proposal net and the backbone network, the saliency values of non-target regions on the saliency map output by the model approach -1, while those of target regions approach 1.
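An illustrative sketch of the pairwise ranking loss described above, assuming PyTorch and a hinge (margin) formulation: for every pair of proposed regions, if one ranks higher by saliency, its teacher-network confidence is pushed to be higher as well. The margin value and the dummy scores are assumptions.

```python
# Sketch of a margin-based pairwise ranking loss between saliency order and confidence.
import torch
import torch.nn.functional as F


def pairwise_ranking_loss(saliency_scores, confidences, margin=0.05):
    """saliency_scores, confidences: 1-D tensors, one entry per proposed region.

    For every pair (i, j) with saliency_scores[i] > saliency_scores[j], penalise the
    case where confidences[i] does not exceed confidences[j] by at least `margin`.
    """
    loss, pairs = confidences.new_zeros(()), 0
    n = saliency_scores.numel()
    for i in range(n):
        for j in range(n):
            if saliency_scores[i] > saliency_scores[j]:
                loss = loss + F.relu(margin - (confidences[i] - confidences[j]))
                pairs += 1
    return loss / max(pairs, 1)


# Saliency map in (-1, 1) via tanh, as in the crop application above.
saliency_map = torch.tanh(torch.randn(1, 14, 14))
region_scores = torch.tensor([0.9, 0.4, 0.1])         # sum-submatrix scores (dummy)
teacher_conf = torch.tensor([0.7, 0.8, 0.2])          # teacher-network label confidences
loss = pairwise_ranking_loss(region_scores, teacher_conf)
```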
Based on the generated saliency-map matrix, the image regions of the minimum bounding rectangles of the regions whose saliency values fall within different intervals are selected. In forward propagation, these image regions are input into the backbone network and feature vectors are extracted from them, thereby suppressing non-target regions and reducing background interference.
Therefore, to achieve feasible local feature region localization without position supervision of the local key feature regions, the fine-grained classification method based on deformable key feature region extraction provided by the invention extracts saliency-map matrices from the intermediate layers of the backbone network through an FPN (feature pyramid network) and applies a maximum-sum submatrix method, enabling local key region selection without increasing the computational cost. The top-ranked submatrices by information strength are selected as the local key regions chosen by the model and are input into the backbone network to extract feature vectors, so that the network can effectively extract fine-grained image features, achieving the goals of classifying and locating fine-grained image features and highlighting the features of local feature regions.
Fig. 2 is a functional block diagram of the fine-grained classification apparatus according to the present invention.
The fine-grained classification apparatus 100 of the present invention may be installed in an electronic device. According to the functions realized, the fine-grained classification apparatus may include a model construction and data acquisition module 101, a loss data acquisition module 102, a model training module 103, and a classification prediction module 104. A module according to the present invention, which may also be referred to as a unit, is a series of computer program segments that are stored in a memory of the electronic device, can be executed by a processor of the electronic device, and perform a fixed function.
In the present embodiment, the functions regarding the respective modules/units are as follows:
the model construction and data acquisition module 101 is configured to construct an initial model, acquire an original image, and preprocess the original image to form training data for training the initial model;
the loss data acquisition module 102 is configured to acquire loss data corresponding to the original image based on the initial model and the training data;
the model training module 103 is configured to perform backpropagation with stochastic gradient descent based on the loss data until the initial model converges within a preset range to form a classification model;
and the classification prediction module 104 is configured to perform classification prediction on an image to be processed through the classification model.
Optionally, the training data is stored in a blockchain, and preprocessing the original image to form training data for training the initial model includes:
inputting the original image into the initial model to obtain saliency matrices corresponding to the original image;
determining, based on the saliency matrices, the information-strength ranking of the sum submatrices corresponding to the original image, and acquiring the image local feature regions of a preset number of top-ranked sum submatrices;
and determining a target image feature region as the training data according to the image local feature regions.
Optionally, the initial model includes a first neural network and a second neural network, and acquiring the loss data corresponding to the original image includes:
extracting and predicting features of the target image feature region through the backbone network of the first neural network, and acquiring a first prediction result and a first cross-entropy loss;
obtaining a pairwise ranking loss based on the information-strength ranking and the first prediction result; meanwhile, extracting a feature vector of the original image through the backbone network and concatenating it with the feature vector of the target image feature region;
and inputting the concatenated feature vectors into a fully connected layer of the first neural network for prediction, and acquiring a corresponding second prediction result and a second cross-entropy loss.
Fig. 3 is a schematic structural diagram of an electronic device implementing the fine-grained classification method according to the present invention.
The electronic device 1 may comprise a processor 10, a memory 11 and a bus, and may further comprise a computer program, such as a fine-grained classification program 12, stored in the memory 11 and executable on the processor 10.
The memory 11 includes at least one type of readable storage medium, which includes flash memory, removable hard disk, multimedia card, card-type memory (e.g., SD or DX memory, etc.), magnetic memory, magnetic disk, optical disk, etc. The memory 11 may in some embodiments be an internal storage unit of the electronic device 1, such as a removable hard disk of the electronic device 1. The memory 11 may also be an external storage device of the electronic device 1 in other embodiments, such as a plug-in mobile hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, which are provided on the electronic device 1. Further, the memory 11 may also include both an internal storage unit and an external storage device of the electronic device 1. The memory 11 may be used not only to store application software installed in the electronic apparatus 1 and various types of data, such as codes of fine-grained classification programs, etc., but also to temporarily store data that has been output or is to be output.
The processor 10 may be composed of an integrated circuit in some embodiments, for example, a single packaged integrated circuit, or may be composed of a plurality of integrated circuits packaged with the same or different functions, including one or more Central Processing Units (CPUs), microprocessors, digital Processing chips, graphics processors, and combinations of various control chips. The processor 10 is a Control Unit (Control Unit) of the electronic device, connects various components of the whole electronic device by using various interfaces and lines, and executes various functions and processes data of the electronic device 1 by running or executing programs or modules (e.g., fine-grained classification programs, etc.) stored in the memory 11 and calling data stored in the memory 11.
The bus may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. The bus is arranged to enable connection communication between the memory 11 and at least one processor 10 or the like.
Fig. 3 shows only an electronic device with components, and it will be understood by those skilled in the art that the structure shown in fig. 3 does not constitute a limitation of the electronic device 1, and may comprise fewer or more components than those shown, or some components may be combined, or a different arrangement of components.
For example, although not shown, the electronic device 1 may further include a power supply (such as a battery) for supplying power to each component, and preferably, the power supply may be logically connected to the at least one processor 10 through a power management device, so as to implement functions of charge management, discharge management, power consumption management, and the like through the power management device. The power supply may also include any component of one or more dc or ac power sources, recharging devices, power failure detection circuitry, power converters or inverters, power status indicators, and the like. The electronic device 1 may further include various sensors, a bluetooth module, a Wi-Fi module, and the like, which are not described herein again.
Further, the electronic device 1 may further include a network interface, and optionally, the network interface may include a wired interface and/or a wireless interface (such as a WI-FI interface, a bluetooth interface, etc.), which are generally used for establishing a communication connection between the electronic device 1 and other electronic devices.
Optionally, the electronic device 1 may further comprise a user interface, which may be a Display (Display), an input unit (such as a Keyboard), and optionally a standard wired interface, a wireless interface. Alternatively, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, or the like. The display, which may also be referred to as a display screen or display unit, is suitable for displaying information processed in the electronic device 1 and for displaying a visualized user interface, among other things.
It is to be understood that the described embodiments are for purposes of illustration only and that the scope of the appended claims is not limited to such structures.
The fine-grained classification program 12 stored by the memory 11 in the electronic device 1 is a combination of instructions that, when executed in the processor 10, may implement:
constructing an initial model, acquiring an original image, and preprocessing the original image to form training data for training the initial model;
acquiring loss data corresponding to the original image based on the initial model and the training data;
performing backpropagation with stochastic gradient descent based on the loss data until the initial model converges within a preset range to form a classification model;
and performing classification prediction on an image to be processed through the classification model.
Optionally, the training data is stored in a blockchain, and preprocessing the original image to form training data for training the initial model includes:
inputting the original image into the initial model, and acquiring saliency matrices corresponding to the original image;
determining, based on the saliency matrices, the information-strength ranking of the sum submatrices corresponding to the original image, and acquiring the image local feature regions of a preset number of top-ranked sum submatrices;
and determining a target image feature region as the training data according to the image local feature regions.
Optionally, the initial model includes a first neural network and a second neural network, and acquiring the loss data corresponding to the original image includes:
extracting and predicting features of the target image feature region through the backbone network of the first neural network, and acquiring a first prediction result and a first cross-entropy loss;
obtaining a pairwise ranking loss based on the information-strength ranking and the first prediction result; meanwhile, extracting a feature vector of the original image through the backbone network and concatenating it with the feature vector of the target image feature region;
and inputting the concatenated feature vectors into a fully connected layer of the first neural network for prediction, and acquiring a corresponding second prediction result and a second cross-entropy loss.
Optionally, the loss data is stored in a blockchain, wherein the loss data include the first cross-entropy loss, the second cross-entropy loss, and the pairwise ranking loss.
Optionally, inputting the original image into the initial model and acquiring the saliency matrices corresponding to the original image includes:
inputting the original image into the first neural network, and acquiring the output results of the intermediate layers of the first neural network;
and passing each intermediate-layer output result through the second neural network to obtain the saliency matrix corresponding to that intermediate layer.
Specifically, for the way the processor 10 implements these instructions, reference may be made to the description of the relevant steps in the embodiment corresponding to fig. 1, which is not repeated here. It is emphasized that, to further ensure the privacy and security of the original image or training data, the original image or training data may also be stored in a node of a blockchain.
Further, if the integrated modules/units of the electronic device 1 are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, or a Read-Only Memory (ROM).
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus, device and method can be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is only one logical functional division, and other divisions may be realized in practice.
The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
In addition, functional modules in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional module.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof.
The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference signs in the claims shall not be construed as limiting the claim concerned.
A blockchain is a novel application mode of computer technologies such as distributed data storage, peer-to-peer transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks associated with one another by cryptographic methods, each data block containing information about a batch of network transactions, used to verify the validity (anti-counterfeiting) of the information and to generate the next block. A blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
Furthermore, it is obvious that the word "comprising" does not exclude other elements or steps, and the singular does not exclude the plural. A plurality of units or means recited in the system claims may also be implemented by one unit or means in software or hardware. Terms such as first and second are used to denote names and do not indicate any particular order.
Finally, it should be noted that the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting, and although the present invention is described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention.

Claims (10)

1. A fine-grained classification method, characterized in that the method comprises:
constructing an initial model, acquiring an original image, and preprocessing the original image to form training data for training the initial model;
acquiring loss data corresponding to the original image based on the initial model and the training data;
performing backpropagation with stochastic gradient descent based on the loss data until the initial model converges within a preset range to form a classification model;
and performing classification prediction on an image to be processed through the classification model.
2. A fine-grained classification method according to claim 1, wherein the training data is stored in a blockchain, and preprocessing the original image to form training data for training the initial model comprises:
inputting the original image into the initial model, and acquiring saliency matrices corresponding to the original image;
determining, based on the saliency matrices, the information-strength ranking of the sum submatrices corresponding to the original image, and acquiring the image local feature regions of a preset number of top-ranked sum submatrices;
and determining a target image feature region as the training data according to the image local feature regions.
3. A fine-grained classification method according to claim 2, wherein the initial model comprises a first neural network and a second neural network, and acquiring the loss data corresponding to the original image comprises:
extracting and predicting features of the target image feature region through the first neural network, and acquiring a first prediction result and a first cross-entropy loss;
obtaining a pairwise ranking loss based on the information-strength ranking and the first prediction result; meanwhile, extracting a feature vector of the original image through the first neural network and concatenating it with the feature vector of the target image feature region;
and inputting the concatenated feature vectors into a fully connected layer of the first neural network for prediction, and acquiring a corresponding second prediction result and a second cross-entropy loss.
4. A fine-grained classification method according to claim 3, wherein the loss data is stored in a blockchain, and the loss data comprises the first cross-entropy loss, the second cross-entropy loss, and the pairwise ranking loss.
5. A fine-grained classification method according to claim 3, wherein inputting the original image into the initial model and acquiring the saliency matrices corresponding to the original image comprises:
inputting the original image into the first neural network, and acquiring the output results of the intermediate layers of the first neural network;
and passing each intermediate-layer output result through the second neural network to obtain the saliency matrix corresponding to that intermediate layer.
6. A fine-grained classification apparatus, characterized in that the apparatus comprises:
a model construction and data acquisition module for constructing an initial model, acquiring an original image, and preprocessing the original image to form training data for training the initial model;
a loss data acquisition module for acquiring loss data corresponding to the original image based on the initial model and the training data;
a model training module for performing backpropagation with stochastic gradient descent based on the loss data until the initial model converges within a preset range to form a classification model;
and a classification prediction module for performing classification prediction on an image to be processed through the classification model.
7. A fine-grained classification apparatus according to claim 6, wherein the training data is stored in a blockchain, and preprocessing the original image to form training data for training the initial model comprises:
inputting the original image into the initial model to obtain saliency matrices corresponding to the original image;
determining, based on the saliency matrices, the information-strength ranking of the sum submatrices corresponding to the original image, and acquiring the image local feature regions of a preset number of top-ranked sum submatrices;
and determining a target image feature region as the training data according to the image local feature regions.
8. A fine-grained classification apparatus according to claim 7, wherein the initial model comprises a first neural network and a second neural network, and acquiring the loss data corresponding to the original image comprises:
extracting and predicting features of the target image feature region through the first neural network, and acquiring a first prediction result and a first cross-entropy loss;
obtaining a pairwise ranking loss based on the information-strength ranking and the first prediction result; meanwhile, extracting a feature vector of the original image through the first neural network and concatenating it with the feature vector of the target image feature region;
and inputting the concatenated feature vectors into a fully connected layer of the first neural network for prediction, and acquiring a corresponding second prediction result and a second cross-entropy loss.
9. An electronic device, characterized in that the electronic device comprises:
at least one processor; and
a memory communicatively connected to the processor; wherein
the memory stores instructions executable by the processor, and the instructions are executed by the processor to implement the fine-grained classification method according to any one of claims 1 to 6.
10. A computer-readable storage medium storing a computer program which, when executed by a processor, implements the fine-grained classification method according to any one of claims 1 to 6.
CN202010880880.1A 2020-08-27 2020-08-27 Fine granularity classification method, apparatus and computer readable storage medium Active CN112016617B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010880880.1A CN112016617B (en) 2020-08-27 2020-08-27 Fine granularity classification method, apparatus and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010880880.1A CN112016617B (en) 2020-08-27 2020-08-27 Fine granularity classification method, apparatus and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN112016617A true CN112016617A (en) 2020-12-01
CN112016617B CN112016617B (en) 2023-12-01

Family

ID=73503617

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010880880.1A Active CN112016617B (en) 2020-08-27 2020-08-27 Fine granularity classification method, apparatus and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN112016617B (en)


Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102833085A (en) * 2011-06-16 2012-12-19 北京亿赞普网络技术有限公司 System and method for classifying communication network messages based on mass user behavior data
WO2019154262A1 (en) * 2018-02-07 2019-08-15 腾讯科技(深圳)有限公司 Image classification method, server, user terminal, and storage medium
CN108549926A (en) * 2018-03-09 2018-09-18 中山大学 A kind of deep neural network and training method for refining identification vehicle attribute
US20190318405A1 (en) * 2018-04-16 2019-10-17 Microsoft Technology Licensing , LLC Product identification in image with multiple products
CN110084285A (en) * 2019-04-08 2019-08-02 安徽艾睿思智能科技有限公司 Fish fine grit classification method based on deep learning
CN110619369A (en) * 2019-09-23 2019-12-27 常熟理工学院 Fine-grained image classification method based on feature pyramid and global average pooling

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112507934A (en) * 2020-12-16 2021-03-16 平安银行股份有限公司 Living body detection method, living body detection device, electronic apparatus, and storage medium
CN112801391A (en) * 2021-02-04 2021-05-14 科大智能物联技术有限公司 Artificial intelligent scrap steel impurity deduction rating method and system
CN112801391B (en) * 2021-02-04 2021-11-19 科大智能物联技术股份有限公司 Artificial intelligent scrap steel impurity deduction rating method and system
CN115222955A (en) * 2022-06-13 2022-10-21 北京医准智能科技有限公司 Training method and device of image matching model, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN112016617B (en) 2023-12-01

Similar Documents

Publication Publication Date Title
CN109978893B (en) Training method, device, equipment and storage medium of image semantic segmentation network
CN109584248B (en) Infrared target instance segmentation method based on feature fusion and dense connection network
US20180336683A1 (en) Multi-Label Semantic Boundary Detection System
CN112465071A (en) Image multi-label classification method and device, electronic equipment and medium
CN112016617A (en) Fine-grained classification method and device and computer-readable storage medium
CN113283446B (en) Method and device for identifying object in image, electronic equipment and storage medium
CN110222718B (en) Image processing method and device
CN112396005A (en) Biological characteristic image recognition method and device, electronic equipment and readable storage medium
CN108596240B (en) Image semantic segmentation method based on discriminant feature network
CN112487207A (en) Image multi-label classification method and device, computer equipment and storage medium
CN112137591B (en) Target object position detection method, device, equipment and medium based on video stream
CN112132216B (en) Vehicle type recognition method and device, electronic equipment and storage medium
CN111695609A (en) Target damage degree determination method, target damage degree determination device, electronic device, and storage medium
CN111860496A (en) License plate recognition method, device, equipment and computer readable storage medium
CN111414916A (en) Method and device for extracting and generating text content in image and readable storage medium
CN114049568A (en) Object shape change detection method, device, equipment and medium based on image comparison
CN113487621A (en) Medical image grading method and device, electronic equipment and readable storage medium
CN112329666A (en) Face recognition method and device, electronic equipment and storage medium
CN112183303A (en) Transformer equipment image classification method and device, computer equipment and medium
CN114463685A (en) Behavior recognition method and device, electronic equipment and storage medium
CN113705686B (en) Image classification method, device, electronic equipment and readable storage medium
CN114187476A (en) Vehicle insurance information checking method, device, equipment and medium based on image analysis
CN112561893A (en) Picture matching method and device, electronic equipment and storage medium
CN113343882A (en) Crowd counting method and device, electronic equipment and storage medium
CN111915615A (en) Image segmentation method and device, electronic equipment and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant