CN112819073B - Classification network training, image classification method and device and electronic equipment - Google Patents

Classification network training, image classification method and device and electronic equipment

Info

Publication number
CN112819073B
CN112819073B
Authority
CN
China
Prior art keywords
feature information
feature
information
network
feature extraction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110137951.3A
Other languages
Chinese (zh)
Other versions
CN112819073A (en)
Inventor
朱彦浩
胡郡郡
唐大闰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Minglue Artificial Intelligence Group Co Ltd
Original Assignee
Shanghai Minglue Artificial Intelligence Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Minglue Artificial Intelligence Group Co Ltd filed Critical Shanghai Minglue Artificial Intelligence Group Co Ltd
Priority to CN202110137951.3A priority Critical patent/CN112819073B/en
Publication of CN112819073A publication Critical patent/CN112819073A/en
Application granted granted Critical
Publication of CN112819073B publication Critical patent/CN112819073B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The application provides a classification network training method, an image classification method, an apparatus and electronic equipment, wherein the classification network training method comprises the following steps: acquiring an image sample; extracting, based on a feature extraction network, feature information of the image sample at each feature extraction stage in the feature extraction network, wherein each feature extraction stage comprises at least one feature extraction network layer; associating the attention of the feature information of each feature extraction stage to obtain fusion feature information; and training the classification network based on the fusion feature information to obtain a trained classification network model. The network can thus acquire global information, discover more distinguishing features between classes and more similarities within a class, achieve higher classification performance, and improve the accuracy of classifying similar features in an image.

Description

Classification network training, image classification method and device and electronic equipment
Technical Field
The application relates to the field of artificial intelligence, and in particular to a classification network training method, an image classification method and apparatus, and electronic equipment.
Background
Traditional image classification is usually performed with a convolutional neural network, but classification based on a traditional convolutional neural network can lead to inaccurate results. For example, in the field of LOGO detection, LOGO marks within the same industry vary in size and are not highly distinguishable from one another; especially for the classification of LOGOs of products with similar names, the traditional classification network structure can hardly meet actual requirements.
Accordingly, the related art faces the problem of how to improve the accuracy of classifying similar features in an image.
Disclosure of Invention
The application provides a classification network training method, an image classification method and apparatus, and electronic equipment, which at least solve the problem in the related art of how to improve the accuracy of classifying similar features in images.
According to an aspect of an embodiment of the present application, there is provided a classification network training method, including: acquiring an image sample; extracting feature information of the image sample at each feature extraction stage in a feature extraction network based on the feature extraction network, wherein each feature extraction stage comprises at least one feature extraction network layer; correlating the attention of the feature information of each feature extraction stage to obtain fusion feature information; training the classification network based on the fusion characteristic information to obtain a trained classification network model.
Optionally, the associating the attention of the feature information of each feature extraction stage to obtain the fused feature information includes: selecting target feature information in the feature information of each feature extraction stage based on a preset target feature; and carrying out attention weighting on the target feature information and feature information extracted by other feature extraction stages in the feature extraction network in sequence, wherein the attention weighting result of each time is used as the target feature information in the next attention weighting until the attention weighting of all the feature information is completed.
Optionally, the sequentially performing attention weighting on the target feature information and the feature information of the other feature extraction stages in the feature extraction network includes: acquiring the dimension scale of the target feature information and the dimension scale of the feature information to be operated; adjusting the dimension scale of the feature information to be operated to the dimension scale of the target feature information; adding the feature information to be operated and the target feature information in a one-to-one correspondence by dimension to obtain summed feature information; normalizing the summed feature information to obtain the weight of each feature dimension of the summed feature information; and updating the target feature information based on the weights and the feature information to be operated, wherein the updated target feature information is used as the target feature information for the next piece of feature information to be operated.
Optionally, the feature information includes a multidimensional feature vector; the updating the target feature information based on the weight and the feature information to be operated comprises: and calculating the product of the weight and the vector value of the feature information to be operated as the vector value of the updated target feature information.
Optionally, the extracting, based on a feature extraction network, feature information of the image sample at each feature extraction stage in the feature extraction network includes: inputting the image sample into the feature extraction network; extracting the feature maps output by each feature extraction stage respectively; and respectively carrying out average pooling on the feature maps output by each feature extraction stage to obtain the feature information.
Optionally, the training the classification network based on the fusion feature information includes: inputting the fusion feature information into the classification network, and training the classification network with the Arcface loss function as the loss function, so as to obtain a trained classification network.
According to still another aspect of the embodiment of the present application, there is also provided an image classification method including: acquiring an image to be classified; inputting the images to be classified into a trained classification network to obtain a classification result, wherein the classification network is trained based on fusion feature information, and the fusion feature information is obtained by correlating the attention of the feature information of each feature extraction stage when the feature extraction network extracts the feature information of the image sample.
According to still another aspect of the embodiment of the present application, there is also provided a classified network training apparatus, including: the acquisition module is used for acquiring an image sample; an extraction module for extracting feature information of the image sample at each feature extraction stage in a feature extraction network based on the feature extraction network, wherein each feature extraction stage comprises at least one feature extraction network layer; the association module is used for associating the attention of the feature information of each feature extraction stage to obtain fusion feature information; and the training module is used for training the classification network based on the fusion characteristic information to obtain a trained classification network model.
According to still another aspect of the embodiments of the present application, there is provided an electronic device including a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory complete communication with each other through the communication bus; wherein the memory is used for storing a computer program; a processor for performing the method steps of any of the embodiments described above by running the computer program stored on the memory.
According to a further aspect of the embodiments of the present application there is also provided a computer readable storage medium having stored therein a computer program, wherein the computer program is arranged to perform the method steps of any of the embodiments described above when run.
By extracting the feature information of the image sample at each feature extraction stage based on the feature extraction network, associating the attention of the feature information of each feature extraction stage to obtain fusion feature information, and training the classification network based on the fusion feature information, the network can acquire global information, discover more distinguishing features between classes and more similarities within a class, obtain higher classification performance, and improve the accuracy of classifying similar features in an image.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.
In order to more clearly illustrate the embodiments of the invention or the technical solutions of the prior art, the drawings which are used in the description of the embodiments or the prior art will be briefly described, and it will be obvious to a person skilled in the art that other drawings can be obtained from these drawings without inventive effort.
FIG. 1 is a schematic diagram of a hardware environment of an alternative classification network training and/or image classification method according to an embodiment of the invention;
FIG. 2 is a flow diagram of an alternative method of training a classification network according to an embodiment of the application;
FIG. 3 is a network architecture diagram of an alternative feature extraction network in accordance with embodiments of the application;
FIG. 4 is a flow chart of another alternative image classification method according to an embodiment of the application;
FIG. 5 is a block diagram of an alternative classification network training apparatus in accordance with an embodiment of the application;
fig. 6 is a block diagram of an alternative electronic device in accordance with an embodiment of the present application.
Detailed Description
In order that those skilled in the art will better understand the present application, a technical solution in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in which it is apparent that the described embodiments are only some embodiments of the present application, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the present application without making any inventive effort, shall fall within the scope of the present application.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present application and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the application described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
According to one aspect of an embodiment of the present application, a method of training a classification network is provided. Optionally, in this embodiment, the classification network training method may be applied to a hardware environment constituted by the terminal 102 and the server 104 as shown in fig. 1. As shown in fig. 1, the server 104 is connected to the terminal 102 through a network and may be used to provide services (such as game services and application services) to the terminal or to a client installed on the terminal. A database may be set up on the server or independently of the server to provide data storage services for the server 104, and the server may also be used to process cloud services. The network includes, but is not limited to, a wide area network, a metropolitan area network, or a local area network, and the terminal 102 is not limited to a PC, a mobile phone, a tablet computer, or the like. The classification network training method of the embodiment of the present application may be performed by the server 104, by the terminal 102, or by both the server 104 and the terminal 102 together; when performed by the terminal 102, it may also be performed by a client installed on the terminal.
Taking the example that the server 104 and/or the terminal 102 perform the classification network training method in this embodiment, fig. 2 is a schematic flow diagram of an alternative classification network training method according to an embodiment of the present application, as shown in fig. 2, the flow of the method may include the following steps:
Step S202, obtaining an image sample;
Step S204, extracting feature information of each feature extraction stage of the image sample in the feature extraction network based on the feature extraction network, wherein each feature extraction stage comprises at least one feature extraction network layer;
step S206, correlating the attention of the feature information of each feature extraction stage to obtain fusion feature information;
And step S208, training the classification network based on the fusion characteristic information to obtain a trained classification network model.
Through the steps S202 to S208, the feature information of the image sample at each feature extraction stage is extracted based on the feature extraction network, and the attention of the feature information of each feature extraction stage is associated to obtain fusion feature information; the classification network is trained based on the fusion feature information, so that the network can acquire global information, discover more distinguishing features between classes and more similarities within a class, and thus achieve higher classification performance.
In the technical solution of step S202, an image sample is acquired. The image sample may be, for example, an image sample carrying a LOGO, or more generally an image containing a plurality of features of the same type, i.e. features similar to the features to be classified in the image.
In the technical solution of step S204, a feature extraction network is used to extract the feature information of the image sample. Specifically, a residual network may be used, for example a network structure such as resnet-18 or resnet-50; in this embodiment, resnet-18 is taken as an example. Resnet-18 is divided into four stages, in which the size of the feature map is reduced stage by stage and the number of channels of the feature map is increased stage by stage, so that the feature maps of the earlier (shallow) stages, where little information has yet been lost, focus on low-level information such as texture and color, while the later (deep) stages focus on higher-level, semantic information. Each feature extraction stage may include at least one feature extraction layer, as in the exemplary feature extraction network structure shown in fig. 3, where each feature extraction layer may be a convolution layer. The weights of each feature extraction stage are different, so the attention of the features extracted in each stage is different: the shallow feature maps attend more to low-dimensional information such as texture and color, and the deep feature maps attend more to high-dimensional, semantic-level information.
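By way of illustration only, the following minimal sketch shows how the four feature extraction stages of a resnet-18 backbone could be exposed, assuming a PyTorch/torchvision implementation; the framework choice and the wrapper name StagedResNet18 are assumptions of this sketch, not requirements of the embodiment.

```python
import torch.nn as nn
from torchvision.models import resnet18


class StagedResNet18(nn.Module):
    """Expose the four feature extraction stages of resnet-18 (illustrative)."""

    def __init__(self):
        super().__init__()
        backbone = resnet18(weights=None)
        # Stem shared by all stages: conv1 + bn1 + relu + maxpool.
        self.stem = nn.Sequential(
            backbone.conv1, backbone.bn1, backbone.relu, backbone.maxpool
        )
        # The four stages (layer1..layer4): channels grow 64 -> 128 -> 256 -> 512
        # while the spatial size of the feature map shrinks stage by stage.
        self.stages = nn.ModuleList(
            [backbone.layer1, backbone.layer2, backbone.layer3, backbone.layer4]
        )

    def forward(self, x):
        x = self.stem(x)
        feature_maps = []
        for stage in self.stages:
            x = stage(x)
            feature_maps.append(x)  # one feature map per feature extraction stage
        return feature_maps
```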
In the technical solution of step S206, the attention of the feature information of each feature extraction stage is associated to obtain fusion feature information, so that the final output of the network can focus on global information. The feature information output by any one feature extraction stage can be associated, in terms of attention, with the feature information output by the other feature extraction stages, so that the finally obtained fusion feature information is constrained by the attention of every feature extraction stage and fuses the importance of the features at every level. Global information is therefore fused and more features are available; training the classification network with the fusion feature information enables the network to acquire global information, discover more distinguishing features between classes and more similarities within a class, and obtain higher classification performance.
In the technical solution of step S208, the classification network may be a metric network, with Arcface Loss as the loss function. With metric learning based on the Arcface loss function, the network structure can directly measure the distance between images in the angle space, which constrains the angles between similar images and dissimilar images more directly and effectively than a cosine distance, and thus achieves a better classification result.
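By way of illustration only, a minimal sketch of an Arcface-style loss head, following the standard additive angular margin formulation, is given below; the scale and margin values, the class count, and the class name ArcFaceHead are illustrative assumptions and are not fixed by this embodiment.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ArcFaceHead(nn.Module):
    """Additive angular margin loss on L2-normalized features (illustrative)."""

    def __init__(self, feat_dim=512, num_classes=1000, scale=30.0, margin=0.5):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(num_classes, feat_dim))
        nn.init.xavier_uniform_(self.weight)
        self.scale = scale
        self.margin = margin

    def forward(self, features, labels):
        # Cosine similarity between normalized features and normalized class centers,
        # i.e. distances are measured in the angle space.
        cosine = F.linear(F.normalize(features), F.normalize(self.weight))
        theta = torch.acos(cosine.clamp(-1 + 1e-7, 1 - 1e-7))
        # Add the angular margin only to the target-class angle.
        target = F.one_hot(labels, cosine.size(1)).bool()
        logits = torch.where(target, torch.cos(theta + self.margin), cosine)
        return F.cross_entropy(self.scale * logits, labels)
```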
As an exemplary embodiment, the attention of the feature information of each feature extraction stage may be associated by adding an attention mechanism to the feature extraction network, so as to fuse low-dimensional information with high-dimensional information. Specifically, target feature information is selected from the feature information of each feature extraction stage based on a preset target feature, and attention weighting is carried out in sequence on the target feature information and the feature information extracted by the other feature extraction stages in the feature extraction network, wherein each attention weighting result is used as the target feature information for the next attention weighting, until the attention weighting of all the feature information is completed. For example, the feature information of one of the feature extraction stages may be selected as the target feature information; typically the feature information of the last stage is selected, although the feature information of another stage may also be selected. For example, after attention weighting is performed between the first stage and the last stage, new feature information is obtained; this new feature information is used as the target feature information for the next round and is then attention-weighted in sequence with the feature information of the remaining stages, so that the fusion feature information is finally obtained.
As an exemplary embodiment, a specific attention weighting method may be: acquiring the dimension scale of the target feature information and the dimension scale of the feature information to be operated; adjusting the dimension scale of the feature information to be operated to the dimension scale of the target feature information; adding the feature information to be operated and the target feature information in a one-to-one correspondence by dimension to obtain summed feature information; normalizing the summed feature information to obtain the weight of each feature dimension of the summed feature information; and updating the target feature information based on the weights and the feature information to be operated, wherein the updated target feature information is used as the target feature information for the next piece of feature information to be operated.
For example, the attention mechanism may adopt a Query-Key-Value attention mechanism. The resnet-18 network structure is divided into four stages, where, for illustration, the first stage outputs a 1x64-dimensional feature vector, the second stage a 1x128-dimensional feature vector, the third stage a 1x256-dimensional feature vector, and the fourth stage a 1x512-dimensional feature vector. In order to enable the output result to combine low-dimensional texture information and high-dimensional semantic information, the 1x512-dimensional feature vector output by the last stage is attention-weighted with the feature information of the first three stages, so that the output of the last stage is combined with the feature information of the first three stages. For example, the fourth-stage feature information may be regarded as the object of interest and taken as the Query, and the feature information of the first three stages may be regarded as the associated objects and taken as the Keys, where the Values of the first three stages are consistent with the Keys. First, the first stage is operated on: a 1x1 convolution is applied to the 1x64 feature to raise it to 1x512 dimensions; the resulting first-stage feature vector is then added, element by element, to the 1x512 feature vector output by the fourth stage; the summed vector is normalized by softmax(), which assigns a weight to each vector value according to the magnitude of the values in the vector, with all weights summing to 1. The obtained weights are multiplied by the vector values of the (dimension-raised) feature information output by the first stage, and the resulting vector is the result of the attention association between the first stage and the fourth stage. This vector is taken as the new target feature information, the same operation is performed with the feature information of the second stage, and then correspondingly with that of the third stage. The final fused feature is a 1x512-dimensional feature, namely the result of fusing the feature information of the four stages under the constraint of the attention mechanism; by training the network to learn this fusion feature information, more robust features can be obtained.
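By way of illustration only, the staged attention weighting described above might be sketched as follows, assuming the stage vectors have already been obtained by global average pooling; the module name StagedAttentionFusion and the use of Conv1d for the 1x1 projection are assumptions of this sketch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class StagedAttentionFusion(nn.Module):
    """Fuse the 1x64/1x128/1x256 stage vectors into the 1x512 target (illustrative)."""

    def __init__(self, stage_dims=(64, 128, 256), target_dim=512):
        super().__init__()
        # One 1x1 projection per earlier stage, raising it to the target dimension.
        self.projections = nn.ModuleList(
            [nn.Conv1d(d, target_dim, kernel_size=1) for d in stage_dims]
        )

    def forward(self, stage_vectors, target_vector):
        # stage_vectors: (batch, 64), (batch, 128), (batch, 256) tensors (Keys/Values)
        # target_vector: (batch, 512) tensor from the last stage (the Query)
        target = target_vector
        for proj, vec in zip(self.projections, stage_vectors):
            key = proj(vec.unsqueeze(-1)).squeeze(-1)   # raise to (batch, 512)
            # Element-wise addition, then softmax yields per-dimension weights summing to 1.
            weights = F.softmax(target + key, dim=-1)
            # The weighted stage vector becomes the target for the next round of weighting.
            target = weights * key
        return target                                    # fused 1x512 feature


# Example usage with stand-in vectors (batch of 8):
# fuse = StagedAttentionFusion()
# fused = fuse([torch.randn(8, 64), torch.randn(8, 128), torch.randn(8, 256)],
#              torch.randn(8, 512))
```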
As an exemplary embodiment, extracting the feature information of each feature extraction stage based on the feature extraction network may include: inputting the image sample into the feature extraction network; extracting the feature maps output by each feature extraction stage respectively; and respectively carrying out average pooling on the feature maps output by each feature extraction stage to obtain the feature information. Illustratively, the feature map output by the first stage has 64 channels (i.e., 64 maps). As shown in fig. 3, the values of each of the 64 maps are averaged using a global average pooling operation to generate a 1x64-dimensional feature vector. Similarly, the second, third and fourth stages generate vectors of different sizes by the same operation, for example 1x128-dimensional, 1x256-dimensional and 1x512-dimensional feature vectors respectively.
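By way of illustration only, the global average pooling step might look like the following minimal sketch; the helper name pool_stage_features is an assumption, and the stage feature maps are assumed to come from a wrapper such as the StagedResNet18 sketch above.

```python
import torch.nn.functional as F


def pool_stage_features(feature_maps):
    # Average every channel's map down to a single value, giving one vector per stage:
    # (batch, 64), (batch, 128), (batch, 256), (batch, 512) for the resnet-18 example.
    return [F.adaptive_avg_pool2d(fm, 1).flatten(1) for fm in feature_maps]
```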
According to another aspect of the embodiment of the present application, there is also provided an image classification method, as shown in fig. 4, the method may include:
S402, obtaining an image to be classified.
S404, inputting the images to be classified into a trained classification network to obtain classification results, wherein the classification network is trained based on fusion feature information, and the fusion feature information is obtained by associating the attention of the feature information of each feature extraction stage when the feature extraction network extracts the feature information of the image sample.
Through the technical solution of steps S402-S404, the classification network is obtained by training on fused features that associate the attention of each feature extraction stage of the network. When classifying the images to be classified, the classification network can therefore acquire global information, discover more distinguishing features between classes and more similarities within a class, and achieve higher classification performance; the accuracy of classifying similar features of the same kind is greatly improved.
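By way of illustration only, the inference flow of steps S402-S404 might be sketched as follows, assuming a trained model that composes the backbone, fusion and classification head sketched earlier; the preprocessing size and the helper name classify are illustrative assumptions.

```python
import torch
from PIL import Image
from torchvision import transforms

# Illustrative preprocessing; the actual input size and normalization are not
# prescribed by this embodiment.
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])


def classify(model, image_path):
    image = preprocess(Image.open(image_path).convert("RGB")).unsqueeze(0)
    model.eval()
    with torch.no_grad():
        logits = model(image)            # trained classification network
    return logits.argmax(dim=-1).item()  # predicted class index
```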
It should be noted that, for simplicity of description, the foregoing method embodiments are all described as a series of acts, but it should be understood by those skilled in the art that the present application is not limited by the order of acts described, as some steps may be performed in other orders or concurrently in accordance with the present application. Further, those skilled in the art will also appreciate that the embodiments described in the specification are all preferred embodiments, and that the acts and modules referred to are not necessarily required for the present application.
From the description of the above embodiments, it will be clear to a person skilled in the art that the method according to the above embodiments may be implemented by means of software plus the necessary general hardware platform, but of course also by means of hardware, but in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art in the form of a software product stored in a storage medium (such as ROM (Read-Only Memory)/RAM (Random Access Memory), magnetic disk, optical disk) and including instructions for causing a terminal device (which may be a mobile phone, a computer, a server, or a network device, etc.) to perform the method according to the embodiments of the present application.
According to another aspect of the embodiment of the application, a classification network training device for implementing the classification network training method is also provided. FIG. 5 is a schematic diagram of an alternative categorical network training apparatus, according to an embodiment of the application, as shown in FIG. 5, which may include:
An acquisition module 502, configured to acquire an image sample;
An extraction module 504, configured to extract feature information of the image sample at each feature extraction stage in the feature extraction network based on a feature extraction network, where each feature extraction stage includes at least one feature extraction network layer;
The association module 506 is configured to associate the attention of the feature information in each feature extraction stage to obtain fused feature information;
And the training module 508 is configured to train the classification network based on the fused feature information, and obtain a trained classification network model.
It should be noted that, the acquiring module 502 in this embodiment may be used to perform the step S202, the extracting module 504 in this embodiment may be used to perform the step S204, the associating module 506 in this embodiment may be used to perform the step S206, and the training module 508 in this embodiment may be used to perform the step S208.
According to another aspect of the embodiment of the application, an image classification apparatus for implementing the above image classification method is also provided. The apparatus may be configured to:
acquire an image to be classified; and
input the image to be classified into a trained classification network to obtain a classification result, wherein the classification network is trained based on fusion feature information, and the fusion feature information is obtained by associating the attention of the feature information of each feature extraction stage when the feature extraction network extracts the feature information of the image sample.
It should be noted that the above modules implement the same examples and application scenarios as the corresponding steps, but are not limited to what is disclosed in the above embodiments. It should also be noted that the above modules may be implemented, as part of the apparatus, in software or in hardware, and may run in a hardware environment such as that shown in fig. 1, where the hardware environment includes a network environment.
According to yet another aspect of an embodiment of the present application, there is also provided an electronic device for implementing the above classification network training and/or image classification method, which may be a server, a terminal, or a combination thereof.
Fig. 6 is a block diagram of an alternative electronic device, according to an embodiment of the application, as shown in fig. 6, including a processor 602, a communication interface 604, a memory 606, and a communication bus 608, wherein the processor 602, the communication interface 604, and the memory 606 communicate with each other via the communication bus 608, wherein,
A memory 606 for storing a computer program;
The processor 602, when executing the computer program stored on the memory 606, performs the following steps:
acquiring an image sample;
Extracting feature information of the image sample at each feature extraction stage in a feature extraction network based on the feature extraction network, wherein each feature extraction stage comprises at least one feature extraction network layer;
correlating the attention of the feature information of each feature extraction stage to obtain fusion feature information;
training the classification network based on the fusion characteristic information to obtain a trained classification network model.
And/or performing the steps of:
acquiring an image to be classified;
inputting the images to be classified into a trained classification network to obtain a classification result, wherein the classification network is trained based on fusion feature information, and the fusion feature information is obtained by correlating the attention of the feature information of each feature extraction stage when the feature extraction network extracts the feature information of the image sample.
Optionally, in the present embodiment, the above-described communication bus may be a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one thick line is shown in fig. 6, but this does not mean that there is only one bus or only one type of bus.
The communication interface is used for communication between the electronic device and other devices.
The memory may include RAM or may include non-volatile memory (non-volatile memory), such as at least one disk memory. Optionally, the memory may also be at least one memory device located remotely from the aforementioned processor.
As an example, as shown in FIG. 6, the memory 606 may include, but is not limited to, the various functional blocks of the classification network training and/or image classification apparatus described above.
The processor may be a general-purpose processor, including but not limited to a CPU (Central Processing Unit), an NP (Network Processor), and the like; it may also be a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), an FPGA (Field-Programmable Gate Array) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components.
Alternatively, specific examples in this embodiment may refer to examples described in the foregoing embodiments, and this embodiment is not described herein.
It will be appreciated by those skilled in the art that the structure shown in fig. 6 is merely illustrative. The device implementing the above-described classification network training and/or image classification method may be a terminal device, and the terminal device may be a smart phone (such as an Android phone or an iOS phone), a tablet computer, a palm computer, a Mobile Internet Device (MID), a PAD, or the like. Fig. 6 does not limit the structure of the electronic device; for example, the terminal device may also include more or fewer components (e.g., a network interface, a display device, etc.) than shown in fig. 6, or have a different configuration from that shown in fig. 6.
Those of ordinary skill in the art will appreciate that all or part of the steps in the various methods of the above embodiments may be implemented by a program for instructing a terminal device to execute in association with hardware, the program may be stored in a computer readable storage medium, and the storage medium may include: flash disk, ROM, RAM, magnetic or optical disk, etc.
According to yet another aspect of an embodiment of the present application, there is also provided a storage medium. Alternatively, in the present embodiment, the above-described storage medium may be used for program code for performing the classification network training and/or the image classification method.
Alternatively, in this embodiment, the storage medium may be located on at least one network device of the plurality of network devices in the network shown in the above embodiment.
Alternatively, in the present embodiment, the storage medium is configured to store program code for performing the steps of:
acquiring an image sample;
Extracting feature information of the image sample at each feature extraction stage in a feature extraction network based on the feature extraction network, wherein each feature extraction stage comprises at least one feature extraction network layer;
correlating the attention of the feature information of each feature extraction stage to obtain fusion feature information;
training the classification network based on the fusion characteristic information to obtain a trained classification network model.
And/or performing the steps of:
acquiring an image to be classified;
inputting the images to be classified into a trained classification network to obtain a classification result, wherein the classification network is trained based on fusion feature information, and the fusion feature information is obtained by correlating the attention of the feature information of each feature extraction stage when the feature extraction network extracts the feature information of the image sample.
Alternatively, specific examples in the present embodiment may refer to examples described in the above embodiments, which are not described in detail in the present embodiment.
Alternatively, in the present embodiment, the storage medium may include, but is not limited to: various media capable of storing program codes, such as a U disk, ROM, RAM, a mobile hard disk, a magnetic disk or an optical disk.
The foregoing embodiment numbers of the present application are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.
The integrated units in the above embodiments may be stored in the above-described computer-readable storage medium if implemented in the form of software functional units and sold or used as separate products. Based on such understanding, the technical solution of the present application may be embodied in essence or a part contributing to the prior art or all or part of the technical solution in the form of a software product stored in a storage medium, comprising several instructions for causing one or more computer devices (which may be personal computers, servers or network devices, etc.) to perform all or part of the steps of the method described in the embodiments of the present application.
In the foregoing embodiments of the present application, the descriptions of the embodiments are emphasized, and for a portion of this disclosure that is not described in detail in this embodiment, reference is made to the related descriptions of other embodiments.
In several embodiments provided by the present application, it should be understood that the disclosed client may be implemented in other manners. The above-described apparatus embodiments are merely exemplary; for example, the division of the units is merely a logical function division, and another division manner may be used in actual implementation, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the mutual coupling or direct coupling or communication connection shown or discussed may be through some interfaces, units or modules, and may be in electrical or other forms.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution provided in the present embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The foregoing is merely a preferred embodiment of the present application and it should be noted that modifications and adaptations to those skilled in the art may be made without departing from the principles of the present application, which are intended to be comprehended within the scope of the present application.

Claims (7)

1. A method of training a classification network, comprising:
acquiring an image sample;
Extracting feature information of the image sample at each feature extraction stage in a feature extraction network based on the feature extraction network, wherein each feature extraction stage comprises at least one feature extraction network layer;
correlating the attention of the feature information of each feature extraction stage to obtain fusion feature information; the fused feature information is constrained by the attention of the feature information of each feature extraction stage;
training the classification network based on the fusion characteristic information to obtain a trained classification network model;
Wherein, the associating the attention of the feature information of each feature extraction stage to obtain the fused feature information includes: selecting target feature information from feature information of each feature extraction stage based on preset target features; carrying out attention weighting on the target feature information and feature information extracted in other feature extraction stages in the feature extraction network in sequence, wherein the attention weighting result of each time is used as the target feature information in the next attention weighting until the attention weighting of all feature information is completed;
wherein the weighted attention of the target feature information and the feature information of other feature extraction stages in the feature extraction network sequentially includes: acquiring the dimension scale of the target feature information and the dimension scale of the feature information to be operated; adjusting the dimension scale of the feature information to be operated to the dimension scale of the target feature information; adding the feature information to be operated and the target feature information in a one-to-one correspondence manner according to dimensions to obtain added feature information; normalizing the summation characteristic information to obtain the weight of each characteristic dimension of the summation characteristic information; updating the target feature information based on the weight and the feature information to be operated, wherein the updated target feature information is used as the target feature information of the feature information to be operated next;
Wherein the extracting, based on the feature extraction network, feature information of the image sample at each feature extraction stage in the feature extraction network comprises: inputting the image sample into the feature extraction network; extracting the feature maps output by each feature extraction stage respectively; and respectively carrying out average pooling on the feature maps output by each feature extraction stage to obtain the feature information.
2. The classification network training method of claim 1, wherein the feature information comprises a multi-dimensional feature vector;
the updating the target feature information based on the weight and the feature information to be operated comprises:
and calculating the product of the weight and the vector value of the feature information to be operated as the vector value of the updated target feature information.
3. The method of training a classification network of claim 1, wherein training the classification network based on the fused feature information comprises
Inputting the fusion characteristic information into the classification network, and training the classification network by taking Arcface loss functions as loss functions to obtain a trained classification network.
4. An image classification method, comprising:
acquiring an image to be classified;
Inputting the images to be classified into a trained classification network to obtain a classification result, wherein the classification network is trained based on fusion feature information, and the fusion feature information is obtained by correlating the attention of the feature information of each feature extraction stage when the feature extraction network extracts the feature information of the image sample; the fused feature information is constrained by the attention of the feature information of each feature extraction stage;
Wherein the training phase of the classification network includes: selecting target feature information from feature information of each feature extraction stage based on preset target features; carrying out attention weighting on the target feature information and feature information extracted in other feature extraction stages in the feature extraction network in sequence, wherein the attention weighting result of each time is used as the target feature information in the next attention weighting until the attention weighting of all feature information is completed;
wherein the weighted attention of the target feature information and the feature information of other feature extraction stages in the feature extraction network sequentially includes: acquiring the dimension scale of the target feature information and the dimension scale of the feature information to be operated; adjusting the dimension scale of the feature information to be operated to the dimension scale of the target feature information; adding the feature information to be operated and the target feature information in a one-to-one correspondence manner according to dimensions to obtain added feature information; normalizing the summation characteristic information to obtain the weight of each characteristic dimension of the summation characteristic information; updating the target feature information based on the weight and the feature information to be operated, wherein the updated target feature information is used as the target feature information of the feature information to be operated next;
Wherein the training phase of the classification network further includes: inputting the image sample into the feature extraction network; extracting the feature maps output by each feature extraction stage respectively; and respectively carrying out average pooling on the feature maps output by each feature extraction stage to obtain the feature information.
5. A classification network training apparatus, comprising:
the acquisition module is used for acquiring an image sample;
An extraction module for extracting feature information of the image sample at each feature extraction stage in a feature extraction network based on the feature extraction network, wherein each feature extraction stage comprises at least one feature extraction network layer;
the association module is used for associating the attention of the feature information of each feature extraction stage to obtain fusion feature information; the fused feature information is constrained by the attention of the feature information of each feature extraction stage;
The training module is used for training the classification network based on the fusion characteristic information to obtain a trained classification network model;
Wherein, the association module includes: a selection unit for selecting target feature information from the feature information of each feature extraction stage based on a preset target feature; a weighting unit, configured to perform attention weighting on the target feature information and feature information extracted in other feature extraction stages in the feature extraction network in sequence, where each attention weighting result is used as target feature information in next attention weighting until attention weighting of all feature information is completed;
Wherein the weighting unit includes: the acquisition subunit is used for acquiring the dimension scale of the target feature information and the dimension scale of the feature information to be operated; the adjusting subunit is used for adjusting the dimension scale of the feature information to be operated to the dimension scale of the target feature information; the computing subunit is used for adding the feature information to be operated and the target feature information in a one-to-one correspondence manner according to the dimension to obtain added feature information; the processing subunit is used for normalizing the summation characteristic information to obtain the weight of each characteristic dimension of the summation characteristic information; the updating subunit is used for updating the target feature information based on the weight and the feature information to be operated, wherein the updated target feature information is used as the target feature information of the feature information to be operated next;
Wherein the extraction module includes: an input unit for inputting the image sample into the feature extraction network; an extraction subunit for respectively extracting the feature maps output by each feature extraction stage; and a processing subunit for respectively carrying out average pooling on the feature maps output by each feature extraction stage to obtain the feature information.
6. An electronic device comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with each other via the communication bus, characterized in that,
The memory is used for storing a computer program;
The processor is configured to execute the steps of the classification network training method according to any one of claims 1 to 3 and/or the image classification method according to claim 4 by running the computer program stored on the memory.
7. A computer-readable storage medium, characterized in that the storage medium has stored therein a computer program, wherein the computer program is arranged to perform the steps of the classification network training method of any of claims 1 to 3 and/or the image classification method of claim 4 when run.
CN202110137951.3A 2021-02-01 2021-02-01 Classification network training, image classification method and device and electronic equipment Active CN112819073B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110137951.3A CN112819073B (en) 2021-02-01 2021-02-01 Classification network training, image classification method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110137951.3A CN112819073B (en) 2021-02-01 2021-02-01 Classification network training, image classification method and device and electronic equipment

Publications (2)

Publication Number Publication Date
CN112819073A CN112819073A (en) 2021-05-18
CN112819073B true CN112819073B (en) 2024-08-20

Family

ID=75861228

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110137951.3A Active CN112819073B (en) 2021-02-01 2021-02-01 Classification network training, image classification method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN112819073B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113569934B (en) * 2021-07-20 2024-01-23 上海明略人工智能(集团)有限公司 LOGO classification model construction method, LOGO classification model construction system, electronic equipment and storage medium
CN113989541B (en) * 2021-09-23 2024-08-20 神思电子技术股份有限公司 Dressing classification method and system based on feature aggregation
CN117474464B (en) * 2023-09-28 2024-05-07 光谷技术有限公司 Multi-service processing model training method, multi-service processing method and electronic equipment

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109584248A (en) * 2018-11-20 2019-04-05 西安电子科技大学 Infrared surface object instance dividing method based on Fusion Features and dense connection network

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111723829B (en) * 2019-03-18 2022-05-06 四川大学 Full-convolution target detection method based on attention mask fusion
CN110533084B (en) * 2019-08-12 2022-09-30 长安大学 Multi-scale target detection method based on self-attention mechanism
CN111460889B (en) * 2020-02-27 2023-10-31 平安科技(深圳)有限公司 Abnormal behavior recognition method, device and equipment based on voice and image characteristics
CN111709291B (en) * 2020-05-18 2023-05-26 杭州电子科技大学 Takeaway personnel identity recognition method based on fusion information
CN111651692A (en) * 2020-06-02 2020-09-11 腾讯科技(北京)有限公司 Information recommendation method and device based on artificial intelligence and electronic equipment
CN111860517B (en) * 2020-06-28 2023-07-25 广东石油化工学院 Semantic segmentation method under small sample based on distraction network
CN111882002B (en) * 2020-08-06 2022-05-24 桂林电子科技大学 MSF-AM-based low-illumination target detection method
CN112116599B (en) * 2020-08-12 2022-10-28 南京理工大学 Sputum smear tubercle bacillus semantic segmentation method and system based on weak supervised learning
CN112215271B (en) * 2020-09-27 2023-12-12 武汉理工大学 Anti-occlusion target detection method and equipment based on multi-head attention mechanism

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109584248A (en) * 2018-11-20 2019-04-05 西安电子科技大学 Infrared surface object instance dividing method based on Fusion Features and dense connection network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
李勇 et al. 复杂情感分析方法及其应用 [Complex sentiment analysis methods and their applications]. Metallurgical Industry Press, 2020, pp. 158-161. *

Also Published As

Publication number Publication date
CN112819073A (en) 2021-05-18

Similar Documents

Publication Publication Date Title
CN112819073B (en) Classification network training, image classification method and device and electronic equipment
CN105354307B (en) Image content identification method and device
CN110188223B (en) Image processing method and device and computer equipment
CN109816009A (en) Multi-tag image classification method, device and equipment based on picture scroll product
CN110781911B (en) Image matching method, device, equipment and storage medium
JP5563494B2 (en) Corresponding reference image search device and method, content superimposing device, system and method, and computer program
JP6211407B2 (en) Image search system, image search device, search server device, image search method, and image search program
US11714921B2 (en) Image processing method with ash code on local feature vectors, image processing device and storage medium
CN111652054A (en) Joint point detection method, posture recognition method and device
CN113569070B (en) Image detection method and device, electronic equipment and storage medium
CN113254687B (en) Image retrieval and image quantification model training method, device and storage medium
CN110765292A (en) Image retrieval method, training method and related device
CN110059212A (en) Image search method, device, equipment and computer readable storage medium
CN115222061A (en) Federal learning method based on continuous learning and related equipment
CN114168768A (en) Image retrieval method and related equipment
CN113435531B (en) Zero sample image classification method and system, electronic equipment and storage medium
CN112766288B (en) Image processing model construction method, device, electronic equipment and readable storage medium
CN113128278B (en) Image recognition method and device
CN111062448B (en) Equipment type recognition model training method, equipment type recognition method and device
CN115797291B (en) Loop terminal identification method, loop terminal identification device, computer equipment and storage medium
CN117009599A (en) Data retrieval method and device, processor and electronic equipment
CN114297398B (en) Knowledge graph entity linking method and device based on neural network and electronic equipment
CN114358979A (en) Hotel matching method and device, electronic equipment and storage medium
CN112749565B (en) Semantic recognition method and device based on artificial intelligence and semantic recognition equipment
CN113392859A (en) Method and device for determining type of city functional area

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant