CN112819073A - Classification network training method, image classification device and electronic equipment - Google Patents

Classification network training method, image classification device and electronic equipment Download PDF

Info

Publication number
CN112819073A
CN112819073A (application number CN202110137951.3A)
Authority
CN
China
Prior art keywords
feature
information
network
feature extraction
classification
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110137951.3A
Other languages
Chinese (zh)
Inventor
朱彦浩
胡郡郡
唐大闰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Minglue Artificial Intelligence Group Co Ltd
Original Assignee
Shanghai Minglue Artificial Intelligence Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Minglue Artificial Intelligence Group Co Ltd filed Critical Shanghai Minglue Artificial Intelligence Group Co Ltd
Priority to CN202110137951.3A priority Critical patent/CN112819073A/en
Publication of CN112819073A publication Critical patent/CN112819073A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Abstract

The application provides a classification network training method, an image classification device and an electronic device, wherein the classification network training method comprises the following steps: acquiring an image sample; extracting feature information of the image sample at each feature extraction stage in a feature extraction network based on the feature extraction network, wherein each feature extraction stage comprises at least one feature extraction network layer; correlating the attention of the feature information of each feature extraction stage to obtain fusion feature information; and training a classification network based on the fusion feature information to obtain a trained classification network model. In this way, the network can acquire global information, capture more features that distinguish different classes and more similarities shared within the same class, thereby achieving higher classification performance and improving the accuracy of classifying similar features in an image.

Description

Classification network training method, image classification device and electronic equipment
Technical Field
The application relates to the field of artificial intelligence, in particular to a classification network training method, an image classification device and electronic equipment.
Background
Traditional image classification is usually performed with convolutional neural networks, but when the feature information to be classified in an image is highly similar, classification based on a traditional convolutional neural network can produce inaccurate results. For example, in the field of LOGO detection, LOGO marks within the same industry often differ only slightly, so the degree of distinction between LOGOs is not high; in particular, for LOGOs of products with similar names, a traditional classification network structure can hardly meet practical requirements.
Therefore, the related art faces the problem of how to improve the accuracy of classifying similar features in an image.
Disclosure of Invention
The application provides a classification network training method, an image classification device and an electronic device, so as to at least solve the problem in the related art of how to improve the accuracy of classifying similar features in an image.
According to an aspect of an embodiment of the present application, there is provided a classification network training method, including: acquiring an image sample; extracting feature information of the image sample in each feature extraction stage in a feature extraction network based on the feature extraction network, wherein each feature extraction stage comprises at least one feature extraction network layer; correlating the attention of the feature information of each feature extraction stage to obtain fusion feature information; and training the classification network based on the fusion characteristic information to obtain a trained classification network model.
Optionally, the associating the feature information attention of each feature extraction stage to obtain the fused feature information includes: selecting target feature information from the feature information of each feature extraction stage based on a preset target feature; and sequentially performing attention weighting on the target feature information and the feature information extracted in the other feature extraction stages in the feature extraction network, wherein each attention weighting result is used as the target feature information in the next attention weighting process until attention weighting of all the feature information is completed.
Optionally, the performing attention weighting on the target feature information and the feature information of other feature extraction stages in the feature extraction network in sequence includes: acquiring the dimension scale of the target feature information and the dimension scale of the feature information to be operated on; adjusting the dimension scale of the feature information to be operated on to the dimension scale of the target feature information; adding the feature information to be operated on and the target feature information in one-to-one correspondence by dimension to obtain added feature information; normalizing the added feature information to obtain the weight of each feature dimension of the added feature information; and updating the target feature information based on the weights and the feature information to be operated on, wherein the updated target feature information is used as the target feature information for the next piece of feature information to be operated on.
Optionally, the feature information comprises a multi-dimensional feature vector; the updating the target feature information based on the weight and the feature information to be operated on includes: calculating the product of the weight and the vector value of the feature information to be operated on as the vector value of the updated target feature information.
Optionally, the extracting, based on the feature extraction network, feature information of the image sample at each feature extraction stage in the feature extraction network includes: inputting the image sample into the feature extraction network; respectively extracting the feature map output by each feature extraction stage; and respectively performing average pooling on the feature map output by each feature extraction stage to obtain the feature information.
Optionally, the training the classification network based on the fused feature information includes: inputting the fusion feature information into the classification network, and training the classification network by taking the ArcFace loss function as the loss function to obtain the trained classification network.
According to another aspect of the embodiments of the present application, there is also provided an image classification method, including: acquiring an image to be classified; and inputting the images to be classified into a trained classification network to obtain a classification result, wherein the classification network is obtained by training based on fusion characteristic information, and the fusion characteristic information is obtained by associating the attention of the characteristic information of each characteristic extraction stage when the characteristic extraction network extracts the characteristic information of the image sample.
According to another aspect of the embodiments of the present application, there is also provided a classification network training apparatus, including: the acquisition module is used for acquiring an image sample; the extraction module is used for extracting the feature information of the image sample in each feature extraction stage in the feature extraction network based on the feature extraction network, wherein each feature extraction stage comprises at least one feature extraction network layer; the association module is used for associating the attention of the feature information of each feature extraction stage to obtain fusion feature information; and the training module is used for training the classification network based on the fusion characteristic information to obtain a trained classification network model.
According to another aspect of the embodiments of the present application, there is also provided an electronic device, including a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory communicate with each other through the communication bus; the memory is configured to store a computer program; and the processor is configured to perform the method steps in any of the above embodiments by running the computer program stored in the memory.
According to a further aspect of the embodiments of the present application, there is also provided a computer-readable storage medium, in which a computer program is stored, wherein the computer program is configured to perform the method steps of any of the above embodiments when the computer program is executed.
Feature information of the image sample is extracted at each feature extraction stage based on a feature extraction network, and the attention of the feature information of each feature extraction stage is correlated to obtain fusion feature information. The classification network is then trained based on the fusion feature information, so that the network can acquire global information, capture more features that distinguish different classes and more similarities shared within the same class, thereby achieving higher classification performance and improving the accuracy of classifying similar features in an image.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without inventive exercise.
FIG. 1 is a schematic diagram of a hardware environment for an alternative classification network training and/or image classification method according to an embodiment of the present invention;
FIG. 2 is a schematic flow chart diagram illustrating an alternative classification network training method according to an embodiment of the present application;
FIG. 3 is a schematic diagram of an alternative network architecture for a feature extraction network according to an embodiment of the present application;
FIG. 4 is a schematic flow chart diagram of another alternative image classification method according to an embodiment of the present application;
FIG. 5 is a block diagram of an alternative classification network training apparatus according to an embodiment of the present application;
FIG. 6 is a block diagram of an alternative electronic device according to an embodiment of the present application.
Detailed Description
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only partial embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It should be noted that the terms "first," "second," and the like in the description and claims of this application and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
according to an aspect of an embodiment of the present application, a classification network training method is provided. Alternatively, in this embodiment, the above-mentioned classification network training method may be applied to a hardware environment formed by the terminal 102 and the server 104 as shown in fig. 1. As shown in fig. 1, the server 104 is connected to the terminal 102 through a network, and may be configured to provide services (such as game services, application services, and the like) for the terminal or a client installed on the terminal, set a database on the server or independent of the server, provide data storage services for the server 104, and process cloud services, where the network includes but is not limited to: the terminal 102 is not limited to a PC, a mobile phone, a tablet computer, etc. the terminal may be a wide area network, a metropolitan area network, or a local area network. The classification network training method according to the embodiment of the present application may be executed by the server 104, or may be executed by the terminal 102, or may be executed by both the server 104 and the terminal 102. The terminal 102 may also be configured to execute the classification network training method according to the embodiment of the present application by a client installed thereon.
Taking the method for training the classification network in the present embodiment executed by the server 104 and/or the terminal 102 as an example, fig. 2 is a schematic flowchart of an optional method for training the classification network according to the embodiment of the present application, and as shown in fig. 2, the flowchart of the method may include the following steps:
step S202, obtaining an image sample;
step S204, extracting the feature information of the image sample in each feature extraction stage in the feature extraction network based on the feature extraction network, wherein each feature extraction stage comprises at least one feature extraction network layer;
step S206, correlating the attention of the feature information of each feature extraction stage to obtain fusion feature information;
and S208, training the classification network based on the fusion characteristic information to obtain a trained classification network model.
Through the above steps S202 to S208, feature information of the image sample is extracted at each feature extraction stage based on the feature extraction network, and the attention of the feature information of each feature extraction stage is correlated to obtain fusion feature information; the classification network is trained based on the fusion feature information, so that the network can acquire global information, capture more features that distinguish different classes and more similarities shared within the same class, and thus achieve higher classification performance.
In the technical solution of step S202, an image sample is obtained. For example, the image sample may be an image in which the features to be classified are relatively similar, or which contains a plurality of features of the same type, for example an image sample containing a LOGO.
In the technical solution of step S204, the feature information of the image sample is extracted based on the feature extraction network. Specifically, a residual network may be used for extraction, for example a network structure such as resnet-18 or resnet-50; in this embodiment, resnet-18 is taken as an example. The resnet-18 network structure is divided into four stages: the size of the feature map is reduced stage by stage, and the number of channels of the feature map is increased stage by stage. Each feature extraction stage may include at least one feature extraction layer, as in the exemplary feature extraction network structure shown in fig. 3, where each feature extraction layer may be a convolution layer. Because the weights of the feature extraction stages differ, the features extracted at each stage attend to different information: feature maps from the shallow stages focus more on low-level information such as texture and color, since little information has been lost at that point, while feature maps from the deep stages focus more on high-level, semantic-level information.
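As an illustration only (not part of the original application), the stage-wise extraction described above can be sketched in PyTorch as follows, assuming torchvision's resnet18 as the backbone; the class name StageFeatureExtractor is a hypothetical name introduced here for clarity.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

class StageFeatureExtractor(nn.Module):
    """Return the feature map produced by each of the four resnet-18 stages."""
    def __init__(self):
        super().__init__()
        backbone = resnet18()  # pretrained weights may be loaded if desired
        # Stem: initial convolution, batch norm, ReLU and max pooling
        self.stem = nn.Sequential(backbone.conv1, backbone.bn1,
                                  backbone.relu, backbone.maxpool)
        # The four stages; spatial size halves and channel count doubles stage by stage
        self.stages = nn.ModuleList([backbone.layer1, backbone.layer2,
                                     backbone.layer3, backbone.layer4])

    def forward(self, x):
        x = self.stem(x)
        feature_maps = []
        for stage in self.stages:
            x = stage(x)
            feature_maps.append(x)  # 64, 128, 256 and 512 channels respectively
        return feature_maps

# Example: a 224x224 image sample yields four feature maps of decreasing spatial size
maps = StageFeatureExtractor()(torch.randn(1, 3, 224, 224))
print([tuple(m.shape) for m in maps])
```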
In the technical solution of step S206, the final output of the network is made to focus on global information. The attention of the feature information of each feature extraction stage is correlated to obtain fusion feature information. Illustratively, the feature information output by any one feature extraction stage may be attention-associated with the feature information output by the other feature extraction stages; the resulting fusion feature information is constrained by the attention of every feature extraction stage and fuses the importance of the features at each level, so that global information is fused and more features are available. Training the classification network with the fusion feature information enables the network to acquire global information, capture more features that distinguish different classes and more similarities shared within similar classes, and thereby achieve higher classification performance.
In the technical solution of step S208, the classification network may adopt a metric learning network, using ArcFace Loss as the loss function and performing metric learning based on the ArcFace loss function. In this way, the network structure can directly measure the distance between images in an angular space and, compared with the cosine distance, can more directly and effectively constrain the angle between similar and dissimilar images, thereby achieving a better classification result.
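As an editorial illustration of such a loss, the following is a minimal sketch of an ArcFace-style additive angular margin loss head; the embedding dimension of 512 (matching the fused feature), the scale s and the margin m are assumptions for the example, not values specified in this application.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ArcFaceLoss(nn.Module):
    """Additive angular margin loss: add a margin m to the target-class angle,
    scale the cosine logits by s, then apply cross-entropy."""
    def __init__(self, embedding_dim=512, num_classes=1000, s=30.0, m=0.5):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(num_classes, embedding_dim))
        nn.init.xavier_uniform_(self.weight)
        self.s, self.m = s, m

    def forward(self, embeddings, labels):
        # Cosine similarity between L2-normalised embeddings and class centres
        cosine = F.linear(F.normalize(embeddings), F.normalize(self.weight))
        theta = torch.acos(cosine.clamp(-1 + 1e-7, 1 - 1e-7))
        target = F.one_hot(labels, num_classes=self.weight.size(0)).bool()
        # The angular margin is applied only to the ground-truth class
        logits = torch.where(target, torch.cos(theta + self.m), cosine)
        return F.cross_entropy(self.s * logits, labels)
```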
As an exemplary embodiment, correlating the attention of the feature information of each feature extraction stage may add an attention mechanism to the feature extraction network so as to fuse the low-dimensional information with the high-dimensional information. Specifically, the target feature information is selected from the feature information of each feature extraction stage based on a preset target feature, and the target feature information is sequentially attention-weighted with the feature information extracted in the other feature extraction stages in the feature extraction network, where each attention weighting result is used as the target feature information in the next attention weighting process until attention weighting of all the feature information is completed. In this embodiment, the feature information of the last stage may be taken as the target feature information; specifically, the feature information of the last stage is attention-weighted with the features of the first three stages, so that the output of the last stage can be combined with the information of the earlier layers. For example, after the first stage and the last stage are attention-weighted, new feature information is obtained; this new feature information is used as the target feature information for the next stage, and so on, so that the new feature information obtained at each step is attention-weighted with the feature information of the remaining stages in turn, finally yielding the fusion feature information.
As an exemplary embodiment, the specific attention weighting method may be: acquiring the dimension scale of the target feature information and the dimension scale of the feature information to be operated on; adjusting the dimension scale of the feature information to be operated on to the dimension scale of the target feature information; adding the feature information to be operated on and the target feature information in one-to-one correspondence by dimension to obtain added feature information; normalizing the added feature information to obtain the weight of each feature dimension of the added feature information; and updating the target feature information based on the weights and the feature information to be operated on, wherein the updated target feature information is used as the target feature information for the next piece of feature information to be operated on.
For example, the attention mechanism may be a Query-Key-Value attention mechanism. The resnet-18 network structure is divided into four stages, where the first stage outputs a 1×64-dimensional feature vector, the second stage outputs a 1×128-dimensional feature vector, the third stage outputs a 1×256-dimensional feature vector, and the fourth stage outputs a 1×512-dimensional feature vector. In order for the output result to combine the low-dimensional texture information with the high-dimensional semantic information, the 1×512-dimensional feature vector output by the last stage is attention-weighted with the feature information of the first three stages, so that the output feature information of the last stage can be combined with the feature information of the earlier stages. Illustratively, the fourth-stage feature information may be the attention object, used as the Query, and the feature information of the first three stages may be the association objects, used as Keys, where the Values of the first three stages are identical to the Keys. First, the operation is performed for the first stage: the 1×64 feature is raised to 1×512 dimensions using a 1×1 convolution and then summed in one-to-one correspondence with the 1×512 feature vector output by the fourth stage; the summed vector is normalized using softmax so that, according to the magnitude of each value in the vector, a weight is assigned to each vector value and the weights of all values sum to 1. The obtained weights are multiplied by the vector values of the feature information output by the first stage, and the resulting vector is the result of the attention association between the first stage and the fourth stage. This vector is then used as the target feature information, the feature information of the second stage is processed in the same way, and so on, and the results of the second and third stages are added correspondingly. The finally obtained fused feature is a 1×512-dimensional feature, that is, the feature information of all four stages is fused; by training with the fusion feature information under the constraint of the attention mechanism, more robust features can be learned.
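The attention association described above can be sketched as follows. This is a non-authoritative PyTorch sketch: the 1×1 convolutions on pooled vectors are written as their linear-layer equivalents, and it assumes that the element-wise product is taken with the stage vector after projection to 512 dimensions; the module name StageAttentionFusion and its parameters are illustrative, not taken from the application.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class StageAttentionFusion(nn.Module):
    """Sequentially fuse the pooled stage vectors (1x64, 1x128, 1x256) into the
    last-stage 1x512 vector, which acts as the Query; each earlier stage acts
    as Key and Value."""
    def __init__(self, stage_dims=(64, 128, 256), target_dim=512):
        super().__init__()
        # Linear-layer equivalents of 1x1 convolutions that lift each stage
        # vector to the target dimension
        self.projections = nn.ModuleList([nn.Linear(d, target_dim)
                                          for d in stage_dims])

    def forward(self, stage_vectors, target_vector):
        query = target_vector                           # (batch, 512), fourth stage
        for proj, key in zip(self.projections, stage_vectors):
            value = proj(key)                           # lift Key/Value to 512 dims
            weights = F.softmax(query + value, dim=-1)  # weights sum to 1
            query = weights * value                     # updated target feature
        return query                                    # fused 1x512 feature

# Usage with pooled vectors v1 (1x64), v2 (1x128), v3 (1x256), v4 (1x512):
# fused = StageAttentionFusion()([v1, v2, v3], v4)
```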
As an exemplary embodiment, extracting the feature information of each feature extraction stage based on the feature extraction network may include: inputting the image sample into the feature extraction network; respectively extracting the feature map output by each feature extraction stage; and respectively performing average pooling on the feature map output by each feature extraction stage to obtain the feature information. Illustratively, the feature map output by the first stage has 64 channels (i.e., 64 maps). As shown in fig. 3, the values of each of the 64 maps are averaged using a global average pooling operation to generate a 1×64-dimensional feature vector. Similarly, the second, third and fourth stages use the same operation to generate vectors of different sizes, namely a 1×128-dimensional, a 1×256-dimensional and a 1×512-dimensional feature vector respectively.
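A minimal sketch of this pooling step (again an editorial illustration, assuming the stage feature maps produced by the extractor sketched earlier):

```python
import torch.nn.functional as F

def pool_stage_outputs(feature_maps):
    """Global average pooling: each (batch, C, H, W) stage output becomes a
    (batch, C) vector, e.g. 1x64, 1x128, 1x256 and 1x512 for resnet-18."""
    return [F.adaptive_avg_pool2d(m, 1).flatten(1) for m in feature_maps]
```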
According to another aspect of the embodiments of the present application, there is also provided an image classification method, as shown in fig. 4, the method may include:
s402, acquiring an image to be classified.
S404, inputting the images to be classified into a trained classification network to obtain a classification result, wherein the classification network is obtained by training based on fusion feature information, and the fusion feature information is obtained by associating the attention of the feature information of each feature extraction stage when the feature extraction network extracts the feature information of the image sample.
Through the technical solution of steps S402 to S404, the classification network is trained with fusion features obtained by correlating the attention of each stage of the feature extraction network. When classifying the images to be classified, the classification network can therefore acquire global information, capture more features that distinguish different classes and more similarities shared within the same class, so as to achieve higher classification performance; the accuracy of classifying similar features of the same kind is greatly improved.
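For illustration only, a hypothetical inference call with such a trained model might look as follows; model stands for any trained composition of the stage extractor, attention fusion and classifier head sketched earlier, and is not defined by this application.

```python
import torch

def classify(model, image_tensor):
    """Run one image (C, H, W tensor) through the trained classification model
    and return the index of the predicted class."""
    model.eval()
    with torch.no_grad():
        logits = model(image_tensor.unsqueeze(0))  # add the batch dimension
    return logits.argmax(dim=-1).item()
```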
It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present application is not limited by the order of acts described, as some steps may occur in other orders or concurrently depending on the application. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required in this application.
Through the above description of the embodiments, those skilled in the art can clearly understand that the method according to the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but the former is a better implementation mode in many cases. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which is stored in a storage medium (e.g., a ROM (Read-Only Memory)/RAM (Random Access Memory), a magnetic disk, an optical disk) and includes several instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, or a network device) to execute the methods according to the embodiments of the present application.
According to another aspect of the embodiment of the present application, there is also provided a classification network training apparatus for implementing the above classification network training method. Fig. 5 is a schematic diagram of an alternative classification network training apparatus according to an embodiment of the present application, and as shown in fig. 5, the apparatus may include:
an obtaining module 502 for obtaining an image sample;
an extracting module 504, configured to extract feature information of the image sample at each feature extraction stage in a feature extraction network based on the feature extraction network, where each feature extraction stage includes at least one feature extraction network layer;
the association module 506 is configured to associate the attention of the feature information of each feature extraction stage to obtain fused feature information;
and the training module 508 is configured to train the classification network based on the fusion feature information to obtain a trained classification network model.
It should be noted that the obtaining module 502 in this embodiment may be configured to execute the step S202, the extracting module 504 in this embodiment may be configured to execute the step S204, the associating module 506 in this embodiment may be configured to execute the step S206, and the training module 508 in this embodiment may be configured to execute the step S208.
According to another aspect of the embodiments of the present application, there is also provided an image classification apparatus for implementing the above image classification method. The apparatus may be configured to perform the following steps:
acquiring an image to be classified;
and inputting the images to be classified into a trained classification network to obtain a classification result, wherein the classification network is obtained by training based on fusion characteristic information, and the fusion characteristic information is obtained by associating the attention of the characteristic information of each characteristic extraction stage when the characteristic extraction network extracts the characteristic information of the image sample.
It should be noted here that the modules described above are the same as the examples and application scenarios implemented by the corresponding steps, but are not limited to the disclosure of the above embodiments. It should be noted that the modules described above as a part of the apparatus may be operated in a hardware environment as shown in fig. 1, and may be implemented by software, or may be implemented by hardware, where the hardware environment includes a network environment.
According to yet another aspect of the embodiments of the present application, there is also provided an electronic device for implementing the above classification network training and/or image classification method, which may be a server, a terminal, or a combination thereof.
Fig. 6 is a block diagram of an alternative electronic device according to an embodiment of the present invention, as shown in fig. 6, including a processor 602, a communication interface 604, a memory 606, and a communication bus 608, where the processor 602, the communication interface 604, and the memory 606 communicate with each other through the communication bus 608, where,
a memory 606 for storing computer programs;
the processor 602, when executing the computer program stored in the memory 606, implements the following steps:
acquiring an image sample;
extracting feature information of the image sample in each feature extraction stage in a feature extraction network based on the feature extraction network, wherein each feature extraction stage comprises at least one feature extraction network layer;
correlating the attention of the feature information of each feature extraction stage to obtain fusion feature information;
and training the classification network based on the fusion characteristic information to obtain a trained classification network model.
And/or performing the following steps:
acquiring an image to be classified;
and inputting the images to be classified into a trained classification network to obtain a classification result, wherein the classification network is obtained by training based on fusion characteristic information, and the fusion characteristic information is obtained by associating the attention of the characteristic information of each characteristic extraction stage when the characteristic extraction network extracts the characteristic information of the image sample.
Alternatively, in this embodiment, the communication bus may be a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown in FIG. 6, but this is not intended to represent only one bus or type of bus.
The communication interface is used for communication between the electronic equipment and other equipment.
The memory may include RAM, and may also include non-volatile memory (non-volatile memory), such as at least one disk memory. Alternatively, the memory may be at least one memory device located remotely from the processor.
As an example, as shown in fig. 6, the memory 606 may include, but is not limited to, various functional modules of the classification network training and/or image classification apparatus.
The processor may be a general-purpose processor, and may include but is not limited to: a CPU (Central Processing Unit), an NP (Network Processor), and the like; it may also be a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), an FPGA (Field Programmable Gate Array) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
Optionally, the specific examples in this embodiment may refer to the examples described in the above embodiments, and this embodiment is not described herein again.
It can be understood by those skilled in the art that the structure shown in fig. 6 is only an illustration, and the device implementing the above classification network training and/or image classification method may be a terminal device, such as a smart phone (e.g., an Android phone, an iOS phone, etc.), a tablet computer, a palm computer, a Mobile Internet Device (MID), a PAD, and the like. Fig. 6 does not limit the structure of the electronic device. For example, the terminal device may also include more or fewer components (e.g., a network interface, a display device, etc.) than shown in fig. 6, or have a different configuration from that shown in fig. 6.
Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by a program instructing hardware associated with the terminal device, where the program may be stored in a computer-readable storage medium, and the storage medium may include: flash disk, ROM, RAM, magnetic or optical disk, and the like.
According to still another aspect of an embodiment of the present application, there is also provided a storage medium. Optionally, in this embodiment, the storage medium may be used for program codes for performing the classification network training and/or the image classification method.
Optionally, in this embodiment, the storage medium may be located on at least one of a plurality of network devices in a network shown in the above embodiment.
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps:
acquiring an image sample;
extracting feature information of the image sample in each feature extraction stage in a feature extraction network based on the feature extraction network, wherein each feature extraction stage comprises at least one feature extraction network layer;
correlating the attention of the feature information of each feature extraction stage to obtain fusion feature information;
and training the classification network based on the fusion characteristic information to obtain a trained classification network model.
And/or performing the following steps:
acquiring an image to be classified;
and inputting the images to be classified into a trained classification network to obtain a classification result, wherein the classification network is obtained by training based on fusion characteristic information, and the fusion characteristic information is obtained by associating the attention of the characteristic information of each characteristic extraction stage when the characteristic extraction network extracts the characteristic information of the image sample.
Optionally, the specific example in this embodiment may refer to the example described in the above embodiment, which is not described again in this embodiment.
Optionally, in this embodiment, the storage medium may include, but is not limited to: various media capable of storing program codes, such as a U disk, a ROM, a RAM, a removable hard disk, a magnetic disk, or an optical disk.
The above-mentioned serial numbers of the embodiments of the present application are merely for description and do not represent the merits of the embodiments.
The integrated unit in the above embodiments, if implemented in the form of a software functional unit and sold or used as a separate product, may be stored in the above computer-readable storage medium. Based on such understanding, the technical solution of the present application may be substantially implemented or a part of or all or part of the technical solution contributing to the prior art may be embodied in the form of a software product stored in a storage medium, and including instructions for causing one or more computer devices (which may be personal computers, servers, network devices, or the like) to execute all or part of the steps of the method described in the embodiments of the present application.
In the above embodiments of the present application, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed client may be implemented in other manners. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one type of division of logical functions, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, and may also be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution provided in the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The foregoing is only a preferred embodiment of the present application and it should be noted that those skilled in the art can make several improvements and modifications without departing from the principle of the present application, and these improvements and modifications should also be considered as the protection scope of the present application.

Claims (10)

1. A classification network training method is characterized by comprising the following steps:
acquiring an image sample;
extracting feature information of the image sample in each feature extraction stage in a feature extraction network based on the feature extraction network, wherein each feature extraction stage comprises at least one feature extraction network layer;
correlating the attention of the feature information of each feature extraction stage to obtain fusion feature information;
and training the classification network based on the fusion characteristic information to obtain a trained classification network model.
2. The classification network training method according to claim 1, wherein the associating the feature information attention of each feature extraction stage to obtain the fused feature information comprises:
selecting target feature information from the feature information of each feature extraction stage based on preset target features;
and sequentially carrying out attention weighting on the target characteristic information and the characteristic information extracted in other characteristic extraction stages in the characteristic extraction network, wherein each attention weighting result is used as the target characteristic information in the next attention weighting process until the attention weighting of all the characteristic information is finished.
3. The classification network training method according to claim 2, wherein the attention-weighting the target feature information with the feature information of the other feature extraction stages in the feature extraction network in sequence comprises:
acquiring the dimension scale of the target characteristic information and the dimension scale of the characteristic information to be operated;
adjusting the dimension scale of the characteristic information to be operated to the dimension scale of the target characteristic information;
adding the feature information to be operated and the target feature information in a one-to-one correspondence manner according to dimensionality to obtain added feature information;
normalizing the addition characteristic information to obtain the weight of each characteristic dimension of the addition characteristic information;
and updating the target characteristic information based on the weight and the characteristic information to be operated, wherein the updated target characteristic information is used as the target characteristic information of the next characteristic information to be operated.
4. The classification network training method of claim 3, wherein the feature information includes a multi-dimensional feature vector;
the updating the target feature information based on the weight and the feature information to be operated includes:
and calculating the product of the weight and the vector value of the feature information to be operated as the vector value of the updated target feature information.
5. The classification network training method as claimed in claim 1, wherein the extracting feature information of the image sample at each feature extraction stage in the feature extraction network based on the feature extraction network comprises:
inputting the image sample into the feature extraction network;
respectively extracting a feature map output by each feature extraction stage;
and respectively carrying out average pooling on the feature map output by each feature extraction stage to obtain the feature information.
6. The classification network training method of claim 1, wherein the training a classification network based on the fused feature information comprises:
inputting the fusion characteristic information into the classification network, and training the classification network by taking an Arcface loss function as a loss function to obtain the trained classification network.
7. An image classification method, comprising:
acquiring an image to be classified;
and inputting the images to be classified into a trained classification network to obtain a classification result, wherein the classification network is obtained by training based on fusion characteristic information, and the fusion characteristic information is obtained by associating the attention of the characteristic information of each characteristic extraction stage when the characteristic extraction network extracts the characteristic information of the image sample.
8. A classification network training device is characterized by comprising:
The acquisition module is used for acquiring an image sample;
the extraction module is used for extracting the feature information of the image sample in each feature extraction stage in the feature extraction network based on the feature extraction network, wherein each feature extraction stage comprises at least one feature extraction network layer;
the association module is used for associating the attention of the feature information of each feature extraction stage to obtain fusion feature information;
and the training module is used for training the classification network based on the fusion characteristic information to obtain a trained classification network model.
9. An electronic device comprising a processor, a communication interface, a memory and a communication bus, wherein said processor, said communication interface and said memory communicate with each other via said communication bus,
the memory for storing a computer program;
the processor configured to execute the steps of the classification network training method of any one of claims 1 to 6 and/or the image classification method of claim 7 by running the computer program stored on the memory.
10. A computer-readable storage medium, in which a computer program is stored, wherein the computer program is configured to perform the steps of the classification network training method according to one of claims 1 to 6 and/or the image classification method according to claim 7 when executed.
CN202110137951.3A 2021-02-01 2021-02-01 Classification network training method, image classification device and electronic equipment Pending CN112819073A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110137951.3A CN112819073A (en) 2021-02-01 2021-02-01 Classification network training method, image classification device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110137951.3A CN112819073A (en) 2021-02-01 2021-02-01 Classification network training method, image classification device and electronic equipment

Publications (1)

Publication Number Publication Date
CN112819073A true CN112819073A (en) 2021-05-18

Family

ID=75861228

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110137951.3A Pending CN112819073A (en) 2021-02-01 2021-02-01 Classification network training method, image classification device and electronic equipment

Country Status (1)

Country Link
CN (1) CN112819073A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113569934A (en) * 2021-07-20 2021-10-29 上海明略人工智能(集团)有限公司 LOGO classification model construction method and system, electronic device and storage medium
CN117474464A (en) * 2023-09-28 2024-01-30 光谷技术有限公司 Multi-service processing model training method, multi-service processing method and electronic equipment

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109584248A (en) * 2018-11-20 2019-04-05 西安电子科技大学 Infrared surface object instance dividing method based on Fusion Features and dense connection network
CN110533084A (en) * 2019-08-12 2019-12-03 长安大学 A kind of multiscale target detection method based on from attention mechanism
CN111651692A (en) * 2020-06-02 2020-09-11 腾讯科技(北京)有限公司 Information recommendation method and device based on artificial intelligence and electronic equipment
CN111709291A (en) * 2020-05-18 2020-09-25 杭州电子科技大学 Takeaway personnel identity identification method based on fusion information
CN111723829A (en) * 2019-03-18 2020-09-29 四川大学 Full-convolution target detection method based on attention mask fusion
CN111860517A (en) * 2020-06-28 2020-10-30 广东石油化工学院 Semantic segmentation method under small sample based on decentralized attention network
CN112116599A (en) * 2020-08-12 2020-12-22 南京理工大学 Sputum smear tubercle bacillus semantic segmentation method and system based on weak supervised learning
CN112215271A (en) * 2020-09-27 2021-01-12 武汉理工大学 Anti-occlusion target detection method and device based on multi-head attention mechanism

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109584248A (en) * 2018-11-20 2019-04-05 西安电子科技大学 Infrared surface object instance dividing method based on Fusion Features and dense connection network
CN111723829A (en) * 2019-03-18 2020-09-29 四川大学 Full-convolution target detection method based on attention mask fusion
CN110533084A (en) * 2019-08-12 2019-12-03 长安大学 A kind of multiscale target detection method based on from attention mechanism
CN111709291A (en) * 2020-05-18 2020-09-25 杭州电子科技大学 Takeaway personnel identity identification method based on fusion information
CN111651692A (en) * 2020-06-02 2020-09-11 腾讯科技(北京)有限公司 Information recommendation method and device based on artificial intelligence and electronic equipment
CN111860517A (en) * 2020-06-28 2020-10-30 广东石油化工学院 Semantic segmentation method under small sample based on decentralized attention network
CN112116599A (en) * 2020-08-12 2020-12-22 南京理工大学 Sputum smear tubercle bacillus semantic segmentation method and system based on weak supervised learning
CN112215271A (en) * 2020-09-27 2021-01-12 武汉理工大学 Anti-occlusion target detection method and device based on multi-head attention mechanism

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
李勇 (Li Yong) et al.: "复杂情感分析方法及其应用" (Complex Sentiment Analysis Methods and Their Applications), 冶金工业出版社 (Metallurgical Industry Press) *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113569934A (en) * 2021-07-20 2021-10-29 上海明略人工智能(集团)有限公司 LOGO classification model construction method and system, electronic device and storage medium
CN113569934B (en) * 2021-07-20 2024-01-23 上海明略人工智能(集团)有限公司 LOGO classification model construction method, LOGO classification model construction system, electronic equipment and storage medium
CN117474464A (en) * 2023-09-28 2024-01-30 光谷技术有限公司 Multi-service processing model training method, multi-service processing method and electronic equipment

Similar Documents

Publication Publication Date Title
CN105354307B (en) Image content identification method and device
CN111275038A (en) Image text recognition method and device, computer equipment and computer storage medium
CN112434721A (en) Image classification method, system, storage medium and terminal based on small sample learning
CN110781911B (en) Image matching method, device, equipment and storage medium
CN110929080B (en) Optical remote sensing image retrieval method based on attention and generation countermeasure network
CN111738269B (en) Model training method, image processing device, model training apparatus, and storage medium
CN112819073A (en) Classification network training method, image classification device and electronic equipment
CN114638633A (en) Abnormal flow detection method and device, electronic equipment and storage medium
CN111179270A (en) Image co-segmentation method and device based on attention mechanism
CN114168768A (en) Image retrieval method and related equipment
CN113435531B (en) Zero sample image classification method and system, electronic equipment and storage medium
CN115222061A (en) Federal learning method based on continuous learning and related equipment
CN113254687B (en) Image retrieval and image quantification model training method, device and storage medium
CN116630768A (en) Target detection method and device, electronic equipment and storage medium
CN113569070A (en) Image detection method and device, electronic equipment and storage medium
CN113505716A (en) Training method of vein recognition model, and recognition method and device of vein image
CN111126503A (en) Training sample generation method and device
CN117058432B (en) Image duplicate checking method and device, electronic equipment and readable storage medium
CN112580658B (en) Image semantic description method, device, computing equipment and computer storage medium
CN117058498B (en) Training method of segmentation map evaluation model, and segmentation map evaluation method and device
CN115861771A (en) Key knowledge distillation method and device for image recognition
CN115115843A (en) Data processing method and device
CN117036843A (en) Target detection model training method, target detection method and device
CN114358979A (en) Hotel matching method and device, electronic equipment and storage medium
CN117009599A (en) Data retrieval method and device, processor and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination