CN113989566A - Image classification method and device, computer equipment and storage medium


Info

Publication number
CN113989566A
CN113989566A
Authority
CN
China
Prior art keywords
image
feature
determining
image block
features
Prior art date
Legal status
Pending
Application number
CN202111275615.1A
Other languages
Chinese (zh)
Inventor
宗卓凡
黎昆昌
宋广录
刘宇
Current Assignee
Beijing Sensetime Technology Development Co Ltd
Original Assignee
Beijing Sensetime Technology Development Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Sensetime Technology Development Co Ltd filed Critical Beijing Sensetime Technology Development Co Ltd
Priority to CN202111275615.1A priority Critical patent/CN113989566A/en
Publication of CN113989566A publication Critical patent/CN113989566A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure provides an image classification method, an apparatus, a computer device and a storage medium, wherein the method comprises: determining, based on a target image to be processed, a first number of initial image blocks corresponding to the target image and the image block features corresponding to each initial image block; for each initial image block, determining importance information for the initial image block based on its image block features; aggregating the image block features corresponding to the first number of initial image blocks based on the importance information of each initial image block, to obtain a second number of target image blocks and the image block features corresponding to each target image block, the second number being smaller than the first number; and determining an image classification result of the target image based on the image block features corresponding to each target image block. Embodiments of the disclosure can improve the inference speed of the neural network.

Description

Image classification method and device, computer equipment and storage medium
Technical Field
The present disclosure relates to the field of computer vision technologies, and in particular, to an image classification method, an image classification device, a computer device, and a storage medium.
Background
In order to increase the network inference speed of a neural network and achieve efficient image classification prediction, the prior art generally prunes the neural network, for example by reducing the number of network layers or the number of features extracted by each network layer, to obtain a lightweight neural network, and then uses the lightweight neural network to process images, so as to increase the inference speed and achieve efficient image classification prediction.
However, the improvement in inference speed obtained by pruning the neural network is limited, and the pruning also reduces the prediction accuracy of the neural network.
Disclosure of Invention
The embodiment of the disclosure at least provides an image classification method, an image classification device, computer equipment and a storage medium.
In a first aspect, an embodiment of the present disclosure provides an image classification method, including:
determining a first number of initial image blocks corresponding to a target image and image block characteristics corresponding to each initial image block based on the target image to be processed;
for each initial image block, determining importance information corresponding to the initial image block based on the image block characteristics of the initial image block;
aggregating the image block characteristics respectively corresponding to the first number of initial image blocks based on the importance information corresponding to each initial image block to obtain a second number of target image blocks and image block characteristics corresponding to each target image block; the second number is less than the first number;
and determining an image classification result of the target image based on the image block characteristics corresponding to each target image block.
According to the above embodiment, by determining importance information that characterizes the degree of influence of each initial image block on the output classification result, and aggregating the image block features corresponding to the first number of initial image blocks on that basis, redundant image block features can be removed; for example, image block features that have little or no influence on the output classification result. In this way, the image block features of the second number of target image blocks that actually determine the output classification result are obtained, and the image classification result of the target image is determined from them. The accuracy of the determined image classification result is thus effectively ensured; and because the second number is smaller than the first number, the number of image block features to be processed is reduced, which effectively improves the inference speed while avoiding the drop in prediction accuracy caused by pruning operations.
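By way of illustration only, the following minimal PyTorch sketch shows how the four claimed steps could fit together in a ViT-style classifier; the module names, sizes, single-stage slimming and mean-pooled classification head are assumptions of this sketch, not limitations of the disclosure.

    import torch
    import torch.nn as nn

    class TokenSlimmingClassifier(nn.Module):
        def __init__(self, dim=384, num_classes=1000, second_num=98):
            super().__init__()
            # step 1: split the image into initial blocks and extract block features
            self.patch_embed = nn.Conv2d(3, dim, kernel_size=16, stride=16)
            # step 2: importance head, one score column per target block
            self.score = nn.Sequential(nn.LayerNorm(dim), nn.Linear(dim, second_num))
            self.head = nn.Linear(dim, num_classes)

        def forward(self, img):                                   # img: (B, 3, 224, 224)
            x = self.patch_embed(img).flatten(2).transpose(1, 2)  # (B, N, dim), N = first number
            a = self.score(x).softmax(dim=1)                      # importance of each initial block
            x = a.transpose(1, 2) @ x                             # step 3: (B, M, dim), M < N
            return self.head(x.mean(dim=1))                       # step 4: classification result

    logits = TokenSlimmingClassifier()(torch.randn(2, 3, 224, 224))  # (2, 1000)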
In a possible implementation manner, the determining, for each of the initial image blocks, importance information corresponding to the initial image block based on image block characteristics of the initial image block includes:
for each initial image block, coding the image block characteristics corresponding to the initial image block to obtain the coding characteristics corresponding to the initial image block;
and determining importance information corresponding to each initial image block based on the coding features corresponding to the initial image blocks.
According to the above embodiment, the encoding processing of the target encoding module achieves a dimension reduction of the image block features, yielding encoding features that are smaller in data size and easier to process; the importance information corresponding to each initial image block can then be determined accurately from these encoding features.
In a possible implementation, the determining importance information of the initial image blocks based on the coding features corresponding to each of the initial image blocks includes:
carrying out normalization processing on the coding features corresponding to each initial image block to obtain the coding features after normalization processing;
carrying out full-connection mapping processing on the coding features after the normalization processing to obtain first intermediate features; wherein the feature dimension corresponding to the first intermediate feature is smaller than the feature dimension corresponding to the encoding feature;
and determining importance information corresponding to each initial image block based on the first intermediate features corresponding to each initial image block.
According to the above embodiment, the encoding features corresponding to each initial image block are normalized, that is, the data range of the encoding features is restricted to the target interval (0, 1); the fully connected mapping then reduces the feature dimension of the normalized encoding features, which reduces both the amount of data to be processed and the difficulty of the processing when subsequently determining the importance information corresponding to each initial image block.
In a possible implementation manner, the normalizing the coding feature corresponding to each initial image block to obtain the normalized coding feature includes:
determining a normalization weight corresponding to each initial image block based on the coding features corresponding to each initial image block;
and based on the normalization weight corresponding to each initial image block, carrying out normalization processing on the coding features corresponding to each initial image block to obtain the coding features after normalization processing.
According to the above embodiment, based on the determined normalization weights, the encoding features corresponding to the initial image blocks can be weighted, which improves the accuracy and reasonableness of each normalized encoding feature, and in turn the accuracy of the subsequently determined importance information corresponding to each initial image block.
In a possible implementation manner, the performing full-concatenation mapping processing on the normalized coding feature to obtain a first intermediate feature includes:
determining a dimension compression weight corresponding to the coding features after the normalization processing based on the coding features after the normalization processing;
and performing full-connection mapping processing on the normalized coding features according to the dimension compression weight to obtain the first intermediate features.
According to the above embodiment, applying the fully connected mapping to the normalized encoding features with the dimension compression weights allows different normalized encoding features to be compressed to different degrees, yielding first intermediate features with fewer redundant components. This improves the accuracy of the first intermediate features while reducing their complexity, and therefore improves the accuracy, and reduces the difficulty, of subsequently determining the importance information corresponding to each initial image block.
In a possible implementation manner, the determining importance information corresponding to each of the initial image blocks based on the first intermediate feature corresponding to each of the initial image blocks includes:
carrying out nonlinear transformation on each first intermediate feature, and carrying out full-connection mapping processing on the first intermediate features after the nonlinear transformation to obtain a second intermediate feature corresponding to each first intermediate feature;
and determining importance information corresponding to each initial image block based on each second intermediate feature.
According to the above embodiment, the nonlinear transformation of the first intermediate features enriches the feature space, yielding second intermediate features with richer spatial information; based on these, the reasonableness and accuracy of the importance information determined for each initial image block can be improved.
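The chain of implementations above (normalization, dimension-reducing full-connection mapping, nonlinear transformation, second mapping, importance determination) can be pictured as the following hedged sketch; the hidden size, and the use of LayerNorm, GELU and softmax as the normalization, nonlinearity and importance mapping, are assumptions of the sketch.

    import torch
    import torch.nn as nn

    class ImportanceHead(nn.Module):
        def __init__(self, dim=384, hidden=96, second_num=98):
            super().__init__()
            self.norm = nn.LayerNorm(dim)             # normalization of the encoding features
            self.fc1 = nn.Linear(dim, hidden)         # full-connection mapping -> first intermediate
            self.act = nn.GELU()                      # nonlinear transformation
            self.fc2 = nn.Linear(hidden, second_num)  # -> second intermediate features

        def forward(self, enc):                       # enc: (B, N, dim) encoding features
            s = self.fc2(self.act(self.fc1(self.norm(enc))))
            return s.softmax(dim=1)                   # importance information over the N blocks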
In a possible implementation manner, the aggregating, based on the importance information corresponding to each of the initial image blocks, image block features respectively corresponding to the first number of initial image blocks to obtain a second number of target image blocks and an image block feature corresponding to each of the target image blocks includes:
determining a first feature matrix corresponding to the importance information based on the importance information corresponding to each initial image block, wherein the matrix dimension corresponding to the first feature matrix is NxM, N is the first quantity, and M is the second quantity;
performing matrix dimension conversion operation on the first feature matrix to obtain a second feature matrix with matrix dimension of M multiplied by N;
and aggregating the image block characteristics respectively corresponding to the first number of initial image blocks based on the second characteristic matrix and the image block matrix corresponding to the image block characteristics of the initial image blocks to obtain the second number of target image blocks and the image block characteristics corresponding to each target image block.
In the above embodiment, the matrix dimension conversion operation transposes the feature data corresponding to each piece of importance information in the first feature matrix, yielding a second feature matrix that can be matrix-multiplied with the image block matrix. The matrix multiplication then aggregates the image block features corresponding to each initial image block in the image block matrix, removing redundant image block features and producing the image block features of the second number of target image blocks that determine the output classification result. This effectively reduces the amount of image block feature data to be processed subsequently, and thus effectively improves the inference speed.
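Concretely, the aggregation amounts to one transpose and one matrix multiplication; a minimal sketch follows, with all sizes assumed for illustration.

    import torch

    N, M, dim = 196, 98, 384             # first number, second number, feature dimension
    A = torch.rand(N, M).softmax(dim=0)  # first feature matrix (importance information), N x M
    X = torch.randn(N, dim)              # image block matrix, one row per initial block
    X_target = A.transpose(0, 1) @ X     # (M, dim): image block features of the target blocks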
In a possible implementation manner, the determining an image classification result of the target image based on the image block feature corresponding to each target image block includes:
taking the target image blocks as new initial image blocks and the number of the new initial image blocks as a new first number, returning to the step of determining, for each initial image block, the importance information corresponding to the initial image block based on its image block features, until the number of returns reaches a preset value; and determining probability classification information corresponding to the target image based on the image block features corresponding to each finally determined target image block;
and determining an image classification result of the target image based on the probability classification information.
According to the above embodiment, redundant image block features can be removed from the image block features corresponding to the target image multiple times and with high precision, so that the finally determined image block features of each target image block contain fewer redundant features and are the image block features that determine the output classification result. Further, based on the determined probability classification information of the target image for each image category, an accurate image classification result can be determined.
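A hedged sketch of this iterative variant, assuming slim_blocks is a list of modules such as the ImportanceHead above and head is a classifier module, neither of which is fixed by the disclosure:

    def classify(x, slim_blocks, head):          # x: (B, N, dim) initial block features
        for slim in slim_blocks:                 # one pass per preset return
            a = slim(x)                          # importance of the current blocks
            x = a.transpose(1, 2) @ x            # fewer, aggregated target blocks
        return head(x.mean(dim=1)).softmax(-1)   # probability classification information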
In a possible implementation manner, the step of determining the image classification result of the target image is performed by a pre-trained pruning neural network based on the target image to be processed;
the method further comprises the step of training the pruning neural network:
acquiring a sample image;
inputting the sample image into a pruning neural network to be trained, processing the sample image by using the pruning neural network to be trained, determining a first predicted image feature output by each data processing block, and determining first prediction classification information corresponding to the sample image; the data processing block is used for determining importance information of initial prediction image blocks based on image block characteristics of the initial prediction image blocks corresponding to the sample image, and aggregating image block characteristics respectively corresponding to a third number of initial prediction image blocks based on importance information corresponding to each initial prediction image block to obtain a fourth number of target prediction image blocks and first prediction image characteristics corresponding to each target prediction image block;
inputting the sample image into a pre-trained teacher neural network, processing the sample image by using the teacher neural network, determining a second predicted image feature output by each data processing block in the teacher neural network, and determining second predicted classification information corresponding to the sample image; the data processing block in the teacher neural network comprises a target coding module;
and determining the prediction loss of the pruning neural network to be trained based on the first prediction image feature, the second prediction image feature, the first prediction classification information and the second prediction classification information, and performing iterative training on the pruning neural network to be trained by using the prediction loss until a preset training cut-off condition is met to obtain the trained pruning neural network.
According to the above embodiment, based on the second predicted image features and second prediction classification information output by the pre-trained teacher neural network, and the first predicted image features and first prediction classification information output by the pruning neural network to be trained, dense knowledge distillation of the pruning neural network to be trained can be realized, yielding a reasonable and accurate prediction loss. Iteratively training the pruning neural network to be trained with this prediction loss improves its prediction accuracy, ensuring that the final trained pruning neural network has reliable prediction accuracy and predicts accurate image classification results.
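The training procedure above can be sketched as follows. The student/teacher interfaces (each returning per-block features and logits), the restore modules that map slimmed student features back to the teacher's block count, and the unweighted sum of the loss terms are all assumptions of this sketch.

    import torch
    import torch.nn.functional as F

    def distill_step(student, teacher, restore, opt, img, label):
        s_feats, s_logits = student(img)                  # first predicted features / classification
        with torch.no_grad():
            t_feats, t_logits = teacher(img)              # second predicted features / classification
        feat_loss = sum(F.mse_loss(r(s), t)               # per-data-processing-block (first) losses
                        for r, s, t in zip(restore, s_feats, t_feats))
        kd_loss = F.kl_div(F.log_softmax(s_logits, -1),   # classification (second) loss
                           F.softmax(t_logits, -1), reduction='batchmean')
        ce_loss = F.cross_entropy(s_logits, label)        # probability prediction loss (ground truth)
        loss = feat_loss + kd_loss + ce_loss
        opt.zero_grad(); loss.backward(); opt.step()
        return loss.item()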
In a possible embodiment, the determining a prediction loss of the pruning neural network to be trained based on the first predicted image feature, the second predicted image feature, the first prediction classification information, and the second prediction classification information includes:
for each data processing block in the pruning neural network to be trained, determining a first loss corresponding to the data processing block based on a first predicted image feature and a second predicted image feature corresponding to the data processing block;
determining a second loss of the pruning neural network to be trained based on the first predictive classification information and the second predictive classification information;
determining the predicted penalty based on the corresponding first penalty and the second penalty for each of the data processing blocks.
In this embodiment, a first loss of each data processing block in predicting image features can be determined from the first and second predicted image features corresponding to that data processing block, and a second loss of the pruning neural network to be trained in predicting the final classification information can be determined from the first and second prediction classification information. A prediction loss that reflects both the data processing blocks and the network's final classification output is then determined from the first and second losses. Iteratively training the pruning neural network to be trained with this prediction loss improves the prediction accuracy of the data processing blocks in the trained network as well as the accuracy of its final classification output, and therefore improves the accuracy of the image classification results output by the trained pruning neural network.
In one possible implementation, the determining a first loss corresponding to the data processing block based on the first predicted image feature and the second predicted image feature corresponding to the data processing block includes:
determining a third number of restored predictive image features based on each of the first predictive image features; the third number is the number of initial prediction image blocks corresponding to the sample image;
determining a first loss corresponding to the data processing block based on the third number of restored predicted image features and the second predicted image features.
In this embodiment, since the first predicted image features output by the pruning neural network to be trained are predicted image features from which redundant image block features have been removed, their number is smaller than the number of initial predicted image blocks (the third number), while the number of second predicted image features output by the teacher neural network equals the number of initial predicted image blocks. By computing the third number of restored predicted image features, the feature count of the first predicted image features is restored to match that of the second predicted image features, enabling a one-to-one comparison between the restored and second predicted image features. The loss between each restored predicted image feature and the corresponding second predicted image feature can then be determined, realizing dense knowledge distillation of the pruning neural network to be trained and yielding a reasonable and accurate first loss.
In one possible implementation, the determining a third number of restored predicted image features based on each of the first predicted image features includes:
normalizing the second feature matrix corresponding to the first predicted image feature to obtain a normalized first predicted coding feature, and performing matrix dimension conversion operation on the feature matrix corresponding to the first predicted coding feature to obtain a converted third feature matrix;
performing full-connection mapping processing on the converted third feature matrix to obtain a second predictive coding feature, and performing nonlinear transformation on the second predictive coding feature to obtain a third predictive coding feature;
performing full-connection mapping processing on the feature matrix corresponding to the third predictive coding feature, performing a matrix dimension conversion operation on the mapped feature matrix to obtain a fourth feature matrix, and determining the third number of restored predicted image features based on the fourth feature matrix; wherein the feature number in the matrix dimension of the fourth feature matrix is the third number, and the feature dimension in the matrix dimension of the fourth feature matrix equals the feature dimension of the image block features of the initial predicted image blocks.
According to the above embodiment, the normalization processing, matrix dimension conversion and full-connection mapping operations invert the operations that the data processing blocks applied to the image block features of the initial predicted image blocks, restoring the number of first predicted image features to the third number of restored predicted image features and matching them in number with the second predicted image features.
In one possible embodiment, the determining the third number of restored predicted image features based on the fourth feature matrix includes:
carrying out normalization processing on the fourth feature matrix, and carrying out multiple times of full-connection mapping on the fourth feature matrix after normalization processing to obtain a fifth feature matrix;
determining the third number of restored predicted image features based on the fifth feature matrix and the fourth feature matrix.
According to the above embodiment, the normalization and the multiple full-connection mappings applied to the fourth feature matrix enrich the semantic information of each predicted image feature, yielding a fifth feature matrix with enriched semantics. Combining the predicted image features of the fifth and fourth feature matrices realizes a residual connection between the two feature matrices, which effectively avoids problems such as vanishing gradients, exploding gradients and network overfitting, and thus yields accurate restored predicted image features.
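One way to realize this restoration path is sketched below. Here A is taken to be the (M x N) second feature matrix kept from the slimming step and X the (M x dim) slimmed features; the layer shapes, the softmax normalization axis and the two-layer residual MLP are assumptions of the sketch.

    import torch
    import torch.nn as nn

    class Restore(nn.Module):
        def __init__(self, m=98, dim=384):
            super().__init__()
            self.fc1, self.act, self.fc2 = nn.Linear(m, m), nn.GELU(), nn.Linear(m, m)
            self.mlp = nn.Sequential(nn.LayerNorm(dim), nn.Linear(dim, dim),
                                     nn.GELU(), nn.Linear(dim, dim))

        def forward(self, A, X):                                 # A: (M, N), X: (M, dim)
            A = A.softmax(dim=0)                                 # normalize the second feature matrix
            W = self.fc2(self.act(self.fc1(A.transpose(0, 1))))  # (N, M) after transpose + mappings
            x4 = W @ X                                           # fourth feature matrix: (N, dim)
            return x4 + self.mlp(x4)                             # fifth matrix + residual connection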
In one possible implementation, the determining a first loss corresponding to the data processing block based on the third number of restored predicted image features and the second predicted image features includes:
determining a first sub-loss based on the third number of restored predicted image features and the second predicted image features;
performing feature processing operation on the reduced predicted image features to obtain first target predicted features corresponding to the reduced predicted image features, and determining third predicted classification information corresponding to the first target predicted features;
performing feature processing operation on the second predicted image feature to obtain a second target predicted feature corresponding to the second predicted image feature, and determining fourth predicted classification information corresponding to the second target predicted feature;
determining a second sub-penalty based on the third prediction classification information and the fourth prediction classification information; and determining the first loss based on the first sub-loss and the second sub-loss.
In this embodiment, a first sub-loss of the data processing block in predicting image features can be determined from the third number of restored predicted image features and the second predicted image features. The feature processing operation on the restored predicted image features discriminates them, determining a first probability that each restored predicted image feature belongs to the teacher network and a second probability that each second predicted image feature belongs to the teacher network. From the first and second probabilities, an adversarial loss measuring the plausibility of the restored predicted image features output by the student network, i.e. the second sub-loss, can be determined. Training the pruning neural network to be trained with the first loss obtained from the first and second sub-losses improves the accuracy and plausibility of the restored predicted image features output by the data processing blocks.
In a possible embodiment, the determining a second sub-loss based on the third prediction classification information and the fourth prediction classification information includes:
determining a third sub-loss based on the third prediction classification information and first standard classification information corresponding to the third prediction classification information;
determining a fourth sub-loss based on the fourth prediction classification information and second standard classification information corresponding to the fourth prediction classification information;
determining the second sub-loss based on the third sub-loss and the fourth sub-loss.
In this embodiment, the teacher network and the pruning neural network to be trained correspond to different standard classification information, and the targets of their output prediction classification information differ: the second probability that a second predicted image feature corresponding to the teacher network belongs to the teacher network should be close to 1, whereas the first probability that a restored predicted image feature corresponding to the pruning neural network to be trained belongs to the teacher network should be close to 0. Determining the losses with different standard classification information therefore improves the accuracy and reasonableness of the determined third and fourth sub-losses, yielding a reasonable and accurate second sub-loss.
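A hedged sketch of these sub-losses, where D is an assumed feature-processing head scoring whether features came from the teacher (teacher target 1, restored student target 0, as described above); its architecture is not specified by the disclosure.

    import torch
    import torch.nn.functional as F

    def second_sub_loss(D, restored_feats, teacher_feats):
        p_student = D(restored_feats)          # third prediction classification information
        p_teacher = D(teacher_feats)           # fourth prediction classification information
        third = F.binary_cross_entropy_with_logits(p_student, torch.zeros_like(p_student))
        fourth = F.binary_cross_entropy_with_logits(p_teacher, torch.ones_like(p_teacher))
        return third + fourth                  # second sub-loss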
In a possible implementation, the determining the predicted loss based on the first loss and the second loss corresponding to each of the data processing blocks includes:
determining probability prediction loss corresponding to the pruning neural network to be trained based on the first prediction classification information and standard classification information corresponding to the sample image;
determining the predicted loss based on the first loss, the second loss, and the probabilistic predicted loss for each of the data processing blocks.
According to the above embodiment, the loss between the first prediction classification information output by the pruning neural network to be trained and the true standard classification information, i.e. the probability prediction loss, can be determined from the first prediction classification information and the standard classification information corresponding to the sample image; training with this loss further improves the precision of the prediction classification information output by the trained pruning neural network.
In a possible implementation, the determining the predicted loss based on the first loss and the second loss corresponding to each of the data processing blocks includes:
performing feature processing on the sample image by using a pre-trained convolutional neural network to determine fifth prediction classification information corresponding to the sample image;
determining a third loss of the pruning neural network to be trained based on the fifth prediction classification information and the first prediction classification information;
determining the predicted loss based on the first loss, the second loss, and the third loss.
According to the above embodiment, from the fifth prediction classification information determined by the pre-trained convolutional neural network and the first prediction classification information, a third loss between the prediction classification information output by the pruning neural network to be trained and that of a trained prior-art convolutional neural network can be determined. This increases the flexibility and diversity of the loss terms available for training, and thus the flexibility of training the pruning neural network to be trained.
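For illustration, this optional CNN-guidance term might look like the following, assuming cnn is a frozen pre-trained convolutional classifier (e.g. a ResNet; the disclosure does not name one) and s_logits are the student's first prediction classification logits.

    import torch
    import torch.nn.functional as F

    def third_loss(cnn, img, s_logits):
        with torch.no_grad():
            fifth = cnn(img)                   # fifth prediction classification information
        return F.kl_div(F.log_softmax(s_logits, -1),
                        F.softmax(fifth, -1), reduction='batchmean')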
In a second aspect, an embodiment of the present disclosure further provides an image classification apparatus, including:
the first determining module is used for determining a first number of initial image blocks corresponding to a target image and image block characteristics corresponding to each initial image block based on the target image to be processed;
the second determining module is used for determining, for each initial image block, importance information corresponding to the initial image block based on the image block characteristics of the initial image block;
the aggregation module is used for aggregating the image block characteristics respectively corresponding to the first number of initial image blocks based on the importance information corresponding to each initial image block to obtain a second number of target image blocks and the image block characteristics corresponding to each target image block; the second number is less than the first number;
and the third determining module is used for determining the image classification result of the target image based on the image block characteristics corresponding to each target image block.
In a third aspect, an embodiment of the present disclosure further provides a computer device, including a processor and a memory, where the memory stores machine-readable instructions executable by the processor, and the processor is configured to execute the machine-readable instructions stored in the memory; when executed by the processor, the machine-readable instructions perform the steps in the first aspect or any one of the possible implementations of the first aspect.
In a fourth aspect, an embodiment of the present disclosure further provides a computer-readable storage medium having a computer program stored thereon, where the computer program, when executed, performs the steps in the first aspect or any one of the possible implementations of the first aspect.
For the description of the effects of the image classification apparatus, the computer device, and the computer-readable storage medium, reference is made to the description of the image classification method, which is not repeated herein.
In order to make the aforementioned objects, features and advantages of the present disclosure more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings required by the embodiments are briefly described below. The drawings, which are incorporated in and form a part of the specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the technical solutions of the present disclosure. It should be appreciated that the following drawings depict only certain embodiments of the disclosure and are therefore not to be considered limiting of its scope; those skilled in the art can derive additional related drawings from them without inventive effort.
Fig. 1 illustrates a flowchart of an image classification method provided by an embodiment of the present disclosure;
fig. 2 is a schematic diagram illustrating a network structure of a trained pruning neural network provided by an embodiment of the present disclosure;
fig. 3 illustrates an initial schematic diagram of determining a target image block corresponding to a target image and an image block feature corresponding to the target image block by using a data processing block according to an embodiment of the present disclosure;
fig. 4 illustrates a flow chart of a method of training a pruning neural network provided by an embodiment of the present disclosure;
FIG. 5 is a schematic diagram illustrating training a pruning neural network using a teacher neural network provided by an embodiment of the present disclosure;
fig. 6 is a schematic diagram illustrating a structure of a data recovery block according to an embodiment of the disclosure;
fig. 7 is a schematic diagram illustrating an image classification apparatus provided in an embodiment of the present disclosure;
fig. 8 shows a schematic structural diagram of a computer device provided by an embodiment of the present disclosure.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present disclosure more clear, the technical solutions of the embodiments of the present disclosure will be described clearly and completely with reference to the drawings in the embodiments of the present disclosure, and it is obvious that the described embodiments are only a part of the embodiments of the present disclosure, not all of the embodiments. The components of embodiments of the present disclosure, as generally described and illustrated herein, may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present disclosure is not intended to limit the scope of the disclosure, as claimed, but is merely representative of selected embodiments of the disclosure. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the disclosure without making creative efforts, shall fall within the protection scope of the disclosure.
Furthermore, the terms "first," "second," and the like in the description and in the claims, and in the drawings described above, in the embodiments of the present disclosure are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It will be appreciated that the data so used may be interchanged under appropriate circumstances such that the embodiments described herein may be practiced otherwise than as specifically illustrated or described herein.
Reference herein to "a plurality or a number" means two or more. "and/or" describes the association relationship of the associated objects, meaning that there may be three relationships, e.g., a and/or B, which may mean: a exists alone, A and B exist simultaneously, and B exists alone. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship.
Research shows that in the technical field of decision intelligence, a pre-trained neural network is generally required to be used for analyzing data, images and the like, and the neural network can help a user to make a reasonable decision based on an analysis result. For example, for intelligent driving, the control of the speed and direction of the vehicle is automatically realized based on the analysis of a neural network on driving images; or, for face payment, determining an image analysis result of the face image based on the processing of the neural network on the face image, and further determining whether the face image can pass verification, thereby determining whether the payment can be completed. Therefore, the intelligent decision making application brings great convenience to users.
In order to increase the network inference speed of a neural network and achieve efficient image classification prediction, so as to achieve timely and efficient decision intelligence, the prior art generally prunes the neural network, for example by reducing the number of network layers or the number of features extracted by each network layer, to obtain a lightweight neural network, and then uses the lightweight neural network to process images, so as to increase the inference speed and achieve efficient image classification prediction.
However, the improvement in inference speed obtained by pruning the neural network is limited, and the pruning also reduces the prediction accuracy of the neural network.
Based on the above research, the present disclosure provides an image classification method, apparatus, computer device, and storage medium, which can implement removal of redundant image block features by aggregating image block features respectively corresponding to a first number of initial image blocks by determining importance information capable of characterizing an influence degree of each initial image block on an output classification result; for example, removing image block features with little or no influence on the output classification result; thus, image block features corresponding to the second number of target image blocks that determine the output classification result are obtained. Determining an image classification result of the target image by using image block characteristics corresponding to the second number of target image blocks; the accuracy of the determined image classification result is effectively ensured, and the second quantity is smaller than the first quantity, so that the quantity of image block features to be processed is reduced, the reasoning speed is effectively improved, and the defect of reduced prediction caused by pruning operation is avoided.
The above-mentioned drawbacks were identified by the inventors through practice and careful study; therefore, the discovery of the above problems and the solutions proposed below by the present disclosure should be regarded as the inventors' contribution to the present disclosure.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
It should be noted that specific terms mentioned in the embodiments of the present disclosure include:
norm function: a function that assigns a length or size to each feature vector in a vector space;
linear module: a linear model that learns a prediction function through a linear combination of attributes; it is used for regression and classification tasks and applies a linear transformation to feature vectors in a vector space;
GELU: the Gaussian Error Linear Unit, a high-performance neural network activation function whose nonlinearity is an expectation-based stochastic regularization transformation; it adds nonlinear factors to a network model, realizing nonlinear transformation of the features in the model;
softmax function: widely used in multi-class scenarios; it maps the second intermediate features to real numbers between 0 and 1 to obtain the importance information corresponding to each second intermediate feature, with the mapped values summing to 1.
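As a small illustration, these four terms map onto common framework primitives; the PyTorch equivalents below are assumptions of this note, not part of the disclosure.

    import torch
    import torch.nn as nn

    x = torch.randn(4, 8)
    length = torch.linalg.vector_norm(x, dim=-1)  # Norm: a size for each feature vector
    y = nn.Linear(8, 3)(x)                        # Linear module: learned linear transformation
    z = nn.GELU()(y)                              # GELU: nonlinear activation
    p = z.softmax(dim=-1)                         # softmax: values in (0, 1) summing to 1 per row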
To facilitate understanding of the present embodiment, an image classification method disclosed in the embodiments of the present disclosure is first described in detail. The execution subject of the image classification method provided in the embodiments of the present disclosure is generally a computer device with certain computing power; in some possible implementations, the image classification method may be implemented by a processor calling computer-readable instructions stored in a memory.
The following describes the image classification method provided by the embodiment of the present disclosure in detail.
As shown in fig. 1, a flowchart of an image classification method provided in an embodiment of the present disclosure may include the following steps:
s101: based on a target image to be processed, a first number of initial image blocks corresponding to the target image and image block characteristics corresponding to each initial image block are determined.
Here, the target image to be processed may be an image captured by an image capturing device, and specifically may be an image including a target person, an image including a target animal, an image including a target landscape, or the like. The initial image blocks may be sub-images obtained by image segmentation of the target image, and one image block corresponds to a part of the area of the target image, for example, image block a corresponds to the upper left corner area of the target image, image block B corresponds to the upper right corner area of the target image, and image block C corresponds to the middle area of the target image.
The first number may be determined according to the image size of the target image, and may differ between target images; for example, the larger the image size of the target image, the larger the first number of initial image blocks, and conversely the smaller. Alternatively, the first number may be a predetermined number, so that for any acquired target image the first number of corresponding initial image blocks is that predetermined number. Splicing the first number of initial image blocks together restores the target image. The number of image block features equals the number of initial image blocks, namely the first number.
The image block features corresponding to each initial image block are the image features of the area of the target image corresponding to the initial image block, and the image features corresponding to the initial image blocks are spliced to form the image features corresponding to the target image. Specifically, the image block features may be feature vectors.
In specific implementation, when there is a need for image classification of a target image, the target image to be processed may be acquired, then the target image is segmented to obtain a first number of initial image blocks corresponding to the target image, and image block features corresponding to each initial image block are determined based on image recognition of the target image.
Here, determining the first number of initial image blocks corresponding to the target image and the image block features corresponding to each initial image block may be performed directly by a computer device, or by a trained pruning neural network. The trained pruning neural network is a neural network used for image classification, and may be a Transformer model. Fig. 2 is a schematic diagram of the network structure of a trained pruning neural network according to an embodiment of the present disclosure. The pruning neural network includes a plurality of data processing blocks (3 are shown in fig. 2) and a classifier, where the data processing blocks are configured to aggregate image block features to obtain the image block features corresponding to the target image blocks, and the classifier is configured to determine probability classification information corresponding to the target image (described later). Each data processing block includes a target encoding Module and an image block pruning Module (TSM); specifically, a data processing block may be referred to as a block, and the target encoding Module may be an Encoder. In addition to the Encoders in the data processing blocks, the pruning neural network may include at least one other Encoder (only one is shown in fig. 2), which processes the image block features corresponding to the target image blocks output by the last data processing block to obtain target classified image features (also described later).
In specific implementation, after a target image to be processed is obtained, the target image may be input to the trained pruning neural network, and the target image is subjected to image processing by using the pruning neural network, so as to obtain a first number of initial image blocks corresponding to the target image and an image block feature corresponding to each initial image block.
S102: and determining importance information corresponding to the initial image blocks based on the image block characteristics of the initial image blocks for each initial image block.
Here, the importance information is used to characterize the degree of influence of each initial image block on the output classification result, and one initial image block corresponds to one importance information. Specifically, the importance information may be an importance score, and the higher the importance score is, the higher the influence of the initial image block on the output classification result is.
For example, the target image is an image including a cat, and if the image block features of an initial image block corresponding to the target image are the features of pixel points corresponding to the cat, it can be determined that the initial image block has a higher influence on the output classification result, and the importance scores corresponding to the initial image block are relatively higher; if the image block features of an initial image block corresponding to the target image are all the features of the pixel points corresponding to the image background, it can be determined that the initial image block has a low influence on the output classification result, and it can be determined that the image features corresponding to the initial image block are redundant image features, so that the importance degree score corresponding to the initial image block will be relatively low, and even can be 0.
In specific implementation, for each initial image block, feature recognition may be performed on image block features of the initial image block, and based on a result of the feature recognition, an influence of the image block features of the initial image block on an output classification result is determined, and further, based on a magnitude of the determined influence of the image block features of the initial image block on the output classification result, importance information corresponding to the initial image block is determined. Furthermore, based on the image block characteristics corresponding to each initial image block, importance information corresponding to each initial image block can be determined.
Alternatively, in the case where the pruning neural network is used to process the target image, after determining the image block features corresponding to each initial image block, the pruning neural network may use the data processing block to apply multiple linear transformations to the image block features of each of the first number of initial image blocks, determine an importance weight for each initial image block based on the results, apply further linear transformations to the image block features of each initial image block based on these importance weights, and thereby determine the importance information corresponding to each initial image block.
S103: aggregating the image block characteristics respectively corresponding to the first number of initial image blocks based on the importance information corresponding to each initial image block to obtain a second number of target image blocks and the image block characteristics corresponding to each target image block; the second number is less than the first number.
Here, the second number is smaller than the first number, and the second number is the number of the obtained target image block and the image block features corresponding to the target image block. The target image block is obtained by removing redundant image features from the image block features corresponding to the initial image blocks based on the importance information corresponding to each initial image block. The second number may be related to the first number, and in particular, the second number may be half of the first number. For example, in the case that the first number corresponding to the initial image block is 200, the second number corresponding to the obtained target image block may be 100.
The image block features corresponding to all the target image blocks are the image block features with redundant image features removed. For example, if the image block features of an initial image block D corresponding to the target image are all the features of the pixel points corresponding to the image background, the obtained image block features of the target image block may be the image block features of the initial image block D.
In specific implementation, after the importance information corresponding to each initial image block is determined, redundant image block features in the image block features corresponding to the first number of initial image blocks may be determined based on the importance information corresponding to each initial image block, and then, the image block features corresponding to the first number of initial image blocks may be aggregated based on the determined redundant image block features to obtain a second number of target image blocks and image block features corresponding to each target image block.
Or, in the case of processing the target image by using the pruning neural network, after the pruning neural network obtains the importance information corresponding to each initial image block, according to the above steps, the image block features respectively corresponding to the first number of initial image blocks are aggregated based on the importance information corresponding to each initial image block, so as to obtain the second number of target image blocks and the image block feature corresponding to each target image block.
In addition, when the target image is processed by the pruning neural network, since the network includes a plurality of data processing blocks, each data processing block may follow the steps of S102 and S103. Specifically, after the first data processing block aggregates the image block features corresponding to the initial image blocks, a second number of target image blocks and the image block features corresponding to each target image block are obtained. The target image blocks may then be taken as new initial image blocks, with the second number as a new first number, and the second data processing block further aggregates the image block features corresponding to each target image block output by the first data processing block, that is, the image block features corresponding to the new first number of new initial image blocks, to obtain a new second number of target image blocks and their corresponding image block features. The third data processing block may then further aggregate the image block features corresponding to each target image block output by the second data processing block, obtaining new target image blocks and their corresponding image block features. In particular, the number of target image blocks (and corresponding image block features) output by each data processing block is smaller than that output by the previous data processing block.
For example, in the case that the number of initial image blocks corresponding to the target image and the number of image block features corresponding to the initial image blocks are 200, the number of target image blocks output by the first data processing block and the number of image block features corresponding to the target image blocks may be 100, the number of target image blocks output by the second data processing block and the number of image block features corresponding to the target image blocks may be 50, and the number of target image blocks output by the third data processing block and the number of image block features corresponding to the target image blocks may be 25.
In this way, by processing the plurality of data processing blocks, redundant image features in the target image can be sufficiently removed, and image block features that determine the output classification result can be obtained.
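As a purely illustrative sketch of this cascaded halving in PyTorch (the function name, shapes and the random placeholder weights below are our own assumptions, not part of the disclosure):

```python
import torch

def run_pruning_stages(tokens: torch.Tensor, num_blocks: int = 3) -> torch.Tensor:
    # tokens: (N, C) image block features of the initial image blocks.
    for _ in range(num_blocks):
        n = tokens.shape[0]
        # Placeholder aggregation: a real data processing block derives the
        # (n/2) x n weights from learned importance scores, not at random.
        weights = torch.softmax(torch.randn(n // 2, n), dim=-1)
        tokens = weights @ tokens  # (n/2, C)
        print(tokens.shape)
    return tokens

run_pruning_stages(torch.randn(200, 256))
# torch.Size([100, 256]) -> torch.Size([50, 256]) -> torch.Size([25, 256])
```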
S104: Determine the image classification result of the target image based on the image block features corresponding to each target image block.
Here, the image classification result is used to characterize the image category of the target image. For example, the image category of the target image may be an animal image, such as an image of a cat or an image of a squirrel; for another example, the image category of the target image may be a human image, a landscape image, and so on.
In specific implementation, feature recognition may be performed on the image block features corresponding to each final target image block, and the image category of the target image is determined based on the result of the feature recognition, so as to obtain the image classification result of the target image. Likewise, S104 may also be performed by using a pruning neural network, and will not be described herein.
In this way, by determining the importance information capable of representing the degree of influence of each initial image block on the output classification result, and aggregating the image block features respectively corresponding to the first number of initial image blocks accordingly, redundant image block features can be removed, for example image block features that have little or no influence on the output classification result; the image block features corresponding to the second number of target image blocks, which determine the output classification result, are thus obtained. The image classification result of the target image is then determined by using the image block features corresponding to the second number of target image blocks. This effectively guarantees the accuracy of the determined image classification result, and since the second number is smaller than the first number, the number of image block features to be processed is reduced, so that the inference speed is effectively improved.
In one embodiment, for S102, the following is described as an example of performing this step by using one data processing block in the pruning neural network:
S102-1: For each initial image block, encode the image block features corresponding to the initial image block by using the target encoding module, so as to obtain the encoding features corresponding to the initial image block.
In specific implementation, for each initial image block, an Encoder in a data processing block of the pruning neural network may be used to first encode the image block features corresponding to the initial image block, so as to obtain the encoding features corresponding to that initial image block; in this way, the encoding features corresponding to each initial image block can be obtained.
Here, after the pruning neural network processes the target image, the image block features corresponding to each initial image block obtained from the target image may be feature data in the form of a feature matrix, where one piece of feature data in the feature matrix corresponds to the image block feature (a feature vector) of one initial image block.
Specifically, the matrix dimension of the feature matrix corresponding to the initial image blocks may be N × C, where N represents the number of image block features, specifically the first number, and C represents the feature dimension corresponding to the image block features obtained after the pruning neural network processes the target image. Moreover, N is generally smaller than C.
In specific implementation, the Encoder may encode each image block feature in the N × C feature matrix to obtain the encoding feature corresponding to each initial image block. Here, the number of encoding features is the same as the number of image block features corresponding to the initial image blocks.
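For illustration, a minimal sketch of such an encoding step is given below, using a standard PyTorch transformer encoder layer as a stand-in for the target encoding module (the layer choice and all names are assumptions; the text does not fix the Encoder's internal structure here):

```python
import torch
import torch.nn as nn

N, C = 200, 256  # first number of image blocks, feature dimension

# Stand-in for the target encoding module: one standard transformer encoder layer.
encoder = nn.TransformerEncoderLayer(d_model=C, nhead=4, batch_first=True)

patch_features = torch.randn(1, N, C)  # the N x C feature matrix, batched
encoded = encoder(patch_features)      # shape preserved: 1 x N x C
print(encoded.shape)                   # torch.Size([1, 200, 256])
```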
S102-2: Determine the importance information corresponding to each initial image block based on the encoding features corresponding to the initial image block.
In specific implementation, after obtaining the coding features corresponding to each initial image block, an image block pruning module TSM in the data processing block may be used to perform further feature processing on the coding features corresponding to each initial image block, so as to determine the importance information corresponding to the initial image block.
In one embodiment, for S102-2, the following steps may be performed:
S102-2-1: Normalize the encoding features corresponding to each initial image block to obtain the normalized encoding features.
In specific implementation, after obtaining the coding feature corresponding to each initial image block output by the Encoder, the coding feature corresponding to each initial image block may be normalized by using a normalization module in the TSM, so as to obtain the normalized coding feature corresponding to each coding feature. Here, the number of obtained coding features after normalization processing is the same as the number of coding features corresponding to the initial image block.
In particular, the normalization module may be a Norm module, wherein the Norm module includes a Norm function.
In one embodiment, for S102-2-1, the following steps may be performed:
S102-2-1-1: Determine the normalization weight corresponding to each initial image block based on the encoding features corresponding to the initial image block.
Here, the normalization weight is used to perform normalization weighting processing on the encoding features corresponding to the initial image block. In specific implementation, when the normalization module obtains the coding features corresponding to each initial image block, the normalization module may first perform feature identification processing on the coding features corresponding to each initial image block to determine redundant features in each coding feature, and then determine the normalization weight corresponding to each coding feature according to the redundant features in each coding feature.
S102-2-1-2: Normalize the encoding features corresponding to each initial image block based on the normalization weight corresponding to the initial image block, so as to obtain the normalized encoding features.
In specific implementation, the normalization module may be used to perform normalization weighting processing on the coding features corresponding to each initial image block based on the normalization weight corresponding to each initial image block, so as to obtain the coding features after normalization processing.
S102-2-2: Perform full-connection mapping processing on the normalized encoding features to obtain first intermediate features, where the feature dimension corresponding to the first intermediate features is smaller than the feature dimension corresponding to the encoding features.
Here, the first intermediate feature is a feature obtained after the full-connection mapping process is performed. In specific implementation, the full-connection module in the data processing block may be used to perform full-connection mapping processing on each normalized coding feature output by the Norm module, so as to obtain a first intermediate feature corresponding to each normalized coding feature. Specifically, the fully-connected module may be a linear module.
Here, the obtained number of the first intermediate features is the same as the number of the coding features after the normalization processing, but the value of the feature dimension corresponding to each first intermediate feature is smaller than the value of the feature dimension of the image block feature corresponding to the first intermediate feature, and specifically, the value of the feature dimension corresponding to the first intermediate feature may be half of the value of the feature dimension of the image block feature corresponding to the first intermediate feature.
Taking the matrix dimension of the feature matrix corresponding to the initial image blocks being N × C as an example, that is, the value of the feature dimension corresponding to each image block feature being C: after the feature matrix corresponding to the initial image blocks is processed by the TSM, the number of first intermediate features obtained is N, and the value of the feature dimension corresponding to each first intermediate feature is C/2; that is, the matrix dimension of the feature matrix corresponding to the first intermediate features output by the full-connection module may be N × C/2, where C/2 is the compressed value of the feature dimension.
In one embodiment, for S102-2-2, the following steps may be performed:
S102-2-2-1: Determine the dimension compression weight corresponding to the normalized encoding features based on the normalized encoding features.
Here, the dimension compression weight is a degree of compression when the feature dimension corresponding to the encoding feature is dimension-compressed.
In specific implementation, the full-connection module may perform feature recognition processing on each normalized coding feature to determine a redundant coding feature in each normalized coding feature, and then determine a dimension compression weight corresponding to each normalized coding feature according to the redundant coding feature in each normalized coding feature.
S102-2-2-2: Perform full-connection mapping processing on the normalized encoding features according to the dimension compression weight to obtain the first intermediate features.
In specific implementation, the full-connection module may be used to perform feature dimension compression processing on the coding features corresponding to each initial image block based on the dimension compression weight corresponding to each normalized coding feature, so as to obtain a first intermediate feature corresponding to each normalized coding feature.
S102-2-3: Determine the importance information corresponding to each initial image block based on the first intermediate features corresponding to the initial image block.
In specific implementation, based on the first intermediate feature corresponding to each initial image block, further feature transformation processing may be performed on each first intermediate feature, so as to obtain importance information corresponding to each initial image block.
In one embodiment, for S102-2-3, the following steps may be performed:
S102-2-3-1: Perform nonlinear transformation on each first intermediate feature, and perform full-connection mapping processing on the nonlinearly transformed first intermediate features to obtain a second intermediate feature corresponding to each first intermediate feature.
In specific implementation, each first intermediate feature may be subjected to nonlinear transformation by using an activation function GELU in a nonlinear transformation module in the TSM, and then, a fully-connected module (specifically, the fully-connected module may also be a linear module) in the TSM is used to perform fully-connected mapping processing on the first intermediate feature after the nonlinear transformation, so as to obtain a second intermediate feature corresponding to each first intermediate feature.
Here, the obtained value of the feature dimension corresponding to the second intermediate feature is smaller than the value of the feature dimension corresponding to the first intermediate feature corresponding to the second intermediate feature, and specifically, the value of the feature dimension corresponding to the second intermediate feature may be half of the first number. For example, in the case where the first number is N, the feature dimension corresponding to the second intermediate feature may have a value of N/2.
Continuing with the example in which the matrix dimension of the feature matrix corresponding to the initial image blocks is N × C: the matrix dimension of the feature matrix corresponding to the first intermediate features output by the full-connection module may be N × C/2, and the matrix dimension of the obtained feature matrix corresponding to the second intermediate features is N × N/2.
S102-2-3-2: Determine the importance information corresponding to each initial image block based on each second intermediate feature.
In specific implementation, after the feature matrix corresponding to the second intermediate feature is obtained, the feature matrix may be input to a classification module in the TSM, each second intermediate feature in the second intermediate features is classified by using a softmax function in the classification module, a real value corresponding to each second intermediate feature is determined, and the real value corresponding to each second intermediate feature is used as an importance score corresponding to the second intermediate feature, that is, importance information corresponding to each second intermediate feature is obtained. Wherein, the importance information can be characterized by a feature vector.
Here, the obtained importance information may correspond to a feature matrix, and the feature matrix includes importance information corresponding to each initial image block. In specific implementation, when the matrix dimension of the feature matrix corresponding to each obtained second intermediate feature is N × N/2, the matrix dimension of the feature matrix corresponding to the importance information is also N × N/2.
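The scoring path described above (normalization, full-connection mapping, nonlinear transformation, second full-connection mapping, softmax) might be sketched as follows in PyTorch; the class name, the LayerNorm choice and the axis over which softmax is applied are our own assumptions:

```python
import torch
import torch.nn as nn

class ImportanceScorer(nn.Module):
    # Norm -> Linear(C -> C/2) -> GELU -> Linear(C/2 -> N/2) -> softmax,
    # mirroring the TSM scoring path walked through above.
    def __init__(self, num_tokens: int, dim: int):
        super().__init__()
        self.norm = nn.LayerNorm(dim)                    # normalization module
        self.fc1 = nn.Linear(dim, dim // 2)              # first intermediate features
        self.act = nn.GELU()                             # nonlinear transformation
        self.fc2 = nn.Linear(dim // 2, num_tokens // 2)  # second intermediate features

    def forward(self, encoded: torch.Tensor) -> torch.Tensor:  # encoded: (N, C)
        scores = self.fc2(self.act(self.fc1(self.norm(encoded))))  # (N, N/2)
        # Softmax over the N token axis (an assumption) makes each column
        # a weighting over the initial image blocks.
        return torch.softmax(scores, dim=0)

importance = ImportanceScorer(num_tokens=200, dim=256)(torch.randn(200, 256))
print(importance.shape)  # torch.Size([200, 100]), i.e. N x N/2
```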
In one embodiment, for S103, the following steps may be performed:
S103-1: Determine a first feature matrix corresponding to the importance information based on the importance information corresponding to each initial image block.
The matrix dimension corresponding to the first feature matrix is N × M, where N is the first number and M is the second number.
Here, the first feature matrix includes the importance information corresponding to each initial image block. As can be seen from the above embodiments, the matrix dimension of the feature matrix corresponding to the obtained importance information is N × N/2, where N is the first number and N/2 is the second number; this feature matrix is the first feature matrix, and therefore M = N/2.
In specific implementation, after obtaining the importance information corresponding to each second intermediate feature, the classification module in the TSM may determine the first feature matrix composed of each importance information directly based on the feature vector corresponding to each importance information.
S103-2: Perform a matrix dimension conversion operation on the first feature matrix to obtain a second feature matrix with matrix dimension M × N.
Here, the matrix dimension conversion operation may be an operation that converts the positions, in the feature matrix, of the feature vectors corresponding to the respective pieces of importance information. Specifically, the matrix dimension conversion operation may be a reshape operation; for example, after a reshape operation is performed on an N × M feature matrix, it may become an M × N feature matrix.
In specific implementation, a matrix dimension conversion operation may be performed on the first feature matrix, changing its rows into columns and its columns into rows, thereby obtaining the second feature matrix with matrix dimension M × N.
Here, in the process of performing the matrix dimension conversion operation on the first feature matrix, the feature vectors corresponding to the importance information of each row may be sequentially changed into the feature vectors corresponding to the importance information of each column, so as to obtain the second feature matrix. For example, if the first feature matrix is

$$\begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{pmatrix},$$

the second feature matrix obtained after conversion is

$$\begin{pmatrix} a_{11} & a_{21} \\ a_{12} & a_{22} \\ a_{13} & a_{23} \end{pmatrix}.$$
Alternatively, the position of each feature vector in the first feature matrix may be converted into a corresponding position in the second feature matrix according to a preset conversion relationship. For example, the conversion relationship may be that each feature vector in the first row becomes a feature vector in the second column, each feature vector in the second row becomes a feature vector in the third column, and so on, with each feature vector in the last row finally becoming a feature vector in the first column, thereby obtaining the converted second feature matrix. The conversion relationship may be set according to development requirements and is not limited here.
S103-3: Aggregate the image block features respectively corresponding to the first number of initial image blocks based on the second feature matrix and the image block matrix corresponding to the image block features of the initial image blocks, so as to obtain the second number of target image blocks and the image block features corresponding to each target image block.
Here, the image block feature corresponding to each target image block may be characterized in the form of a feature vector; the image block matrix corresponding to the image block feature of the initial image block may be a feature matrix corresponding to the encoding feature after encoding processing.
In specific implementation, matrix multiplication may be performed on the second feature matrix and the image block matrices corresponding to the image block features of the initial image blocks, so as to realize aggregation of the image block features respectively corresponding to the first number of initial image blocks, thereby obtaining a second number of target image blocks and an image block feature corresponding to each target image block.
For example, if the image block matrix corresponding to the image block features of the initial image blocks is an N × C feature matrix and the second feature matrix is an M × N feature matrix, where M is N/2, then performing matrix multiplication on the second feature matrix and the image block matrix yields an M × C feature matrix, and each feature vector in the M × C feature matrix may then be taken as the image block feature corresponding to one target image block.
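Continuing the hypothetical sketch above, the matrix dimension conversion and the aggregation by matrix multiplication might look as follows (shapes follow the text; the random inputs are placeholders):

```python
import torch

N, C = 200, 256
first = torch.softmax(torch.randn(N, N // 2), dim=0)  # N x N/2 first feature matrix
encoded = torch.randn(N, C)                           # N x C encoded image block features

second = first.transpose(0, 1)  # matrix dimension conversion: N/2 x N
targets = second @ encoded      # N/2 x C features of the target image blocks
print(targets.shape)            # torch.Size([100, 256])
```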
As shown in fig. 3, an embodiment of the present disclosure provides a schematic diagram of determining, by using a data processing block, the target image blocks corresponding to the target image and the image block features corresponding to the target image blocks. The feature matrix corresponding to the image block features of the initial image blocks is an N × C feature matrix, that is, the feature matrix corresponding to the target image is an N × C feature matrix. After the data processing block obtains the N × C feature matrix corresponding to the target image, the target encoding module is used to encode the image block features of the initial image blocks to obtain the encoding features, where the feature matrix corresponding to the encoding features may again be an N × C feature matrix. The N × C feature matrix corresponding to the encoding features is then input to the image block pruning network in the data processing block. After the image block pruning network obtains this matrix, the normalization module is first used to normalize each feature vector in the N × C feature matrix; then the first full-connection module performs full-connection mapping processing to obtain an N × C/2 feature matrix; next, the activation function in the nonlinear transformation module performs nonlinear transformation on the N × C/2 feature matrix, after which the second full-connection module performs full-connection mapping processing to obtain an N × N/2 feature matrix; the softmax function in the classification module then performs classification to obtain the N × N/2 first feature matrix corresponding to the importance information; a matrix dimension conversion operation is performed on this N × N/2 feature matrix to obtain the second feature matrix with matrix dimension N/2 × N; finally, matrix multiplication is performed on the N/2 × N second feature matrix and the feature matrix corresponding to the encoding features, yielding the N/2 × C feature matrix corresponding to the image block features of the target image blocks, and each feature vector in this feature matrix is taken as the image block feature corresponding to one target image block.
In an embodiment, for S104, because the pruning neural network includes a plurality of data processing blocks, after the first data processing block aggregates the first number of initial image blocks corresponding to the target image and the image block features corresponding to each initial image block to obtain the second number of target image blocks and the image block features corresponding to each target image block, the target image blocks output by the first data processing block may be taken as new initial image blocks, the number of new initial image blocks may be taken as a new first number, and the process may return to the step of determining, for each initial image block, the importance information corresponding to the initial image block based on its image block features, until the number of returns reaches a preset value. For the 3 data processing blocks shown in fig. 2, the preset value may be 2, that is, after aggregation processing has been performed three times, the image classification result of the target image is determined based on the image block features corresponding to each finally determined target image block. Specifically, the preset value may be set according to the number of data processing blocks, and is not limited here.
Here, referring to fig. 2, first, a first data processing block is used to perform aggregation processing on a first number of initial image blocks corresponding to a target image and image block characteristics corresponding to each initial image block, so as to obtain an output of the first data processing block; then, the output is used as the input of a second data processing block, and the input is subjected to aggregation processing by the second data processing block to obtain the output of the second data processing block; and finally, taking the output of the second data processing block as the input of a third data processing block, performing aggregation processing on the input by using the third data processing block to obtain the output corresponding to the third data processing block, and taking the output corresponding to the third data processing block as the finally determined image block characteristics corresponding to each target image block.
Then, based on the image block characteristics corresponding to each finally determined target image block, corresponding probability classification information of the target image can be determined, and based on the probability classification information, an image classification result of the target image can be determined.
Specifically, in the process of determining the corresponding probability classification information of the target image, the target classification image features corresponding to the target image may be determined based on the image block features of each target image block determined last.
Here, in fig. 2, the finally determined image block characteristics of each target image block are the image block characteristics of each target image block output by the third data processing block. The target classification image features are image features obtained by encoding the image block features of each finally determined target image block, and the target classification image features are fused with the image block features of each target image block.
In specific implementation, after the finally determined image block features of each target image block are obtained, other encoders in the pruning neural network may be used to perform further feature coding processing on the finally determined image block features of each target image block, so as to obtain target classification image features.
In fig. 2, the further Encoder shown in the pruning neural network may be used to perform further feature encoding processing on the finally determined image block features of each target image block, so as to obtain the target classification image features. In specific implementation, however, the pruning neural network may include a plurality of such further encoders; in that case, each further Encoder may be used in sequence to perform further feature encoding processing on the finally determined image block features of each target image block, and the image features output by the last further Encoder are taken as the target classification image features.
Further, corresponding probabilistic classification information of the target image may be determined based on the target classification image feature.
Here, the probability classification information is used to characterize a probability value of the target image corresponding to each image category, for example, the corresponding probability classification information of the target image may be: the probability value of the target image corresponding to the image of class a is 0.85, the probability value of the target image corresponding to the image of class b is 0.1, and the probability value of the target image corresponding to the image of class c is 0.05.
In specific implementation, the target classified image features can be input to a classifier in the pruning neural network, the classifier is used for carrying out feature classification processing on the target classified image features, and corresponding probability classification information of the target image is output.
Finally, an image classification result of the target image may be determined based on the probabilistic classification information.
In specific implementation, the classification result with the maximum probability value corresponding to the probability classification information can be used as the image classification result of the target image.
Continuing with the above example of the probability classification information corresponding to the target image, the classification result with the probability value of 0.85 may be taken as the image classification result of the target image; that is, it may be determined that the target image corresponds to a class-a image.
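As a trivial illustration (the dictionary form of the probability classification information is our own assumption):

```python
# Illustrative only: pick the category with the maximum probability value.
probability_classification = {"a": 0.85, "b": 0.10, "c": 0.05}
image_classification_result = max(probability_classification,
                                  key=probability_classification.get)
print(image_classification_result)  # 'a'
```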
In an embodiment, as can be seen from the foregoing embodiments, the step of determining the image classification result of the target image based on the target image to be processed is performed by a pre-trained pruning neural network. The embodiments of the present disclosure therefore further provide a method for training the pruning neural network. As shown in fig. 4, which is a flowchart of the method for training the pruning neural network provided by an embodiment of the present disclosure, the method may include the following steps:
S401: Acquire a sample image.
Here, the sample image may be an image obtained corresponding to any image class for training the pruning neural network. In specific implementation, the acquired sample images may include a plurality of sample images, the image categories corresponding to each sample image may be the same or different, and the image contents corresponding to each sample image may be different.
S402: Input the sample image into the pruning neural network to be trained, process the sample image by using the pruning neural network to be trained, determine the first predicted image features output by each data processing block, and determine the first prediction classification information corresponding to the sample image.
The data processing block is used for determining importance information of the initial prediction image blocks based on image block characteristics of the initial prediction image blocks corresponding to the sample image, and aggregating image block characteristics respectively corresponding to the third number of the initial prediction image blocks based on the importance information corresponding to each of the initial prediction image blocks to obtain a fourth number of target prediction image blocks and first prediction image characteristics corresponding to each of the target prediction image blocks.
Here, the first prediction classification information is used to represent the prediction probability values of the sample image corresponding to each image category; the first predicted image features are the prediction features, corresponding to the respective target prediction image blocks, output by each data processing block, and each data processing block outputs one set of first predicted image features.
In specific implementation, after the sample image is input to the pruning neural network to be trained, the pruning neural network to be trained may perform segmentation processing and identification processing on the sample image to obtain a third number of initial prediction image blocks corresponding to the sample image, and determine an image block feature corresponding to each initial prediction image block.
Then, the image block features corresponding to each initial prediction image block can be sequentially processed by using each data processing block in the pruning neural network to be trained, a first prediction image feature output by each data processing block is determined, and first prediction classification information corresponding to the sample image is determined based on the first prediction image feature output by the last data processing block.
Specifically, the step of determining the first predicted image feature output by each data processing block is the same as the step of determining the image block feature of the target image corresponding to the target image by each data processing block in the above embodiments, and the step of determining the first prediction classification information is the same as the step of determining the probability classification information corresponding to the target image in the above embodiments, which are not repeated here.
S403: Input the sample image into a pre-trained teacher neural network, process the sample image by using the teacher neural network, determine the second predicted image features output by each data processing block in the teacher neural network, and determine the second prediction classification information corresponding to the sample image.
The data processing block in the teacher neural network comprises a target coding module.
Here, the teacher neural network may be a neural network that is generated based on the pruning neural network and trained in advance, and may be a neural network for performing image classification on images; specifically, the teacher neural network may be a Transformer model. Here, the pruning neural network to be trained may act as the student network.
The number of data processing blocks in the teacher neural network is the same as the number of data processing blocks in the pruning neural network, but each data processing block in the teacher neural network includes only one target encoding module and does not include a TSM; that is, the teacher neural network does not reduce the number of image block features corresponding to the sample image. As shown in fig. 5, a schematic diagram of training the pruning neural network by using the teacher neural network according to an embodiment of the present disclosure is provided. The pruning neural network in fig. 5 may further include data restoring blocks (RTSM) for restoring the image block features output by the data processing blocks; in the pruning neural network, one data processing block corresponds to one data restoring block, and the data restoring block will be described later. The data restoring blocks are used only during the training of the pruning neural network; the trained pruning neural network does not include the data restoring blocks and includes only the data processing blocks. Fig. 5 also shows the network structure of the teacher neural network and the various training losses, specifically including the first loss, the second loss, the third loss, the second sub-loss, the third sub-loss and the fourth sub-loss, as well as the discrimination network for determining the third prediction classification information corresponding to the restored predicted image features and the fourth prediction classification information corresponding to the second predicted image features. The training losses and the discrimination network will be described in detail later.
The second predicted image features are the prediction features output by the data processing blocks in the teacher neural network, and each data processing block in the teacher neural network likewise outputs one set of second predicted image features; the second prediction classification information is information output by the teacher neural network that is capable of representing the prediction probability values of the sample image corresponding to each image category.
In specific implementation, after the sample image is input to the teacher neural network, the teacher neural network may perform segmentation processing and identification processing on the sample image to obtain a third number of initial prediction image blocks corresponding to the sample image, and determine an image block feature corresponding to each initial prediction image block.
Then, each data processing block in the teacher neural network may be utilized, and specifically, the target coding module in each data processing block may be utilized to sequentially perform coding processing on image block features corresponding to each initial prediction image block in the third number of initial prediction image blocks, determine second prediction image features output by each data processing block, and determine second prediction classification information corresponding to the sample image based on the second prediction image features output by the last data processing block. And the number of the second predicted image features output by each data processing block is a third number.
S404: Determine the prediction loss of the pruning neural network to be trained based on the first predicted image features, the second predicted image features, the first prediction classification information and the second prediction classification information, and iteratively train the pruning neural network to be trained by using the prediction loss until a preset training cut-off condition is met, so as to obtain the trained pruning neural network.
Here, the training cutoff condition may be that the number of rounds of iterative training reaches a preset number of rounds and/or the prediction accuracy of the pruned neural network obtained by training reaches a target accuracy.
In specific implementation, the loss corresponding to the predicted image features can be determined based on each first predicted image feature and each second predicted image feature, and the loss between the two pieces of prediction classification information can be determined based on the first prediction classification information and the second prediction classification information.
And then, determining the prediction loss of the pruning neural network to be trained based on the loss corresponding to the predicted image characteristics and the loss between the two pieces of prediction classification information, and performing iterative training on the pruning neural network to be trained by using the prediction loss until a preset training cut-off condition is met to obtain the trained pruning neural network.
In one embodiment, for S404, the following steps may be performed:
S404-1: For each data processing block in the pruning neural network to be trained, determine the first loss corresponding to the data processing block based on the first predicted image features and the second predicted image features corresponding to the data processing block.
Here, for each data processing block in the pruning neural network to be trained, the matching data processing block corresponding to it in the teacher neural network may first be determined; then, a loss between the first predicted image features corresponding to the data processing block and the second predicted image features corresponding to the matching data processing block is determined, and this loss is taken as the first loss corresponding to the data processing block.
Further, based on this step, a first loss corresponding to each data processing block in the pruned neural network to be trained may be determined.
S404-2: Determine the second loss of the pruning neural network to be trained based on the first prediction classification information and the second prediction classification information.
In specific implementation, the loss between the two pieces of prediction classification information can be determined based on the first prediction classification information and the second prediction classification information, and the loss is used as the second loss of the pruning neural network to be trained. Specifically, the second loss is shown in fig. 5.
S404-3: Determine the prediction loss based on the first loss corresponding to each data processing block and the second loss.
In this step, the first loss corresponding to each data processing block, together with the second loss, may be directly taken as the prediction loss.
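A hedged sketch of how these losses might be combined is given below; MSE for the per-block first losses and KL divergence for the second loss are assumed choices, since the disclosure does not fix the loss functions at this point:

```python
import torch
import torch.nn.functional as F

def prediction_loss(student_feats, teacher_feats, student_logits, teacher_logits):
    # One first loss per data processing block: restored student features
    # against the matching teacher features (MSE is an assumed choice).
    first_losses = [F.mse_loss(s, t) for s, t in zip(student_feats, teacher_feats)]
    # Second loss between the two pieces of prediction classification
    # information (KL divergence is likewise an assumed choice).
    second_loss = F.kl_div(F.log_softmax(student_logits, dim=-1),
                           F.softmax(teacher_logits, dim=-1),
                           reduction="batchmean")
    return sum(first_losses) + second_loss
```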
In one embodiment, for S404-1, the following steps may be performed:
S404-1-1: Determine a third number of restored predicted image features based on each first predicted image feature, where the third number is the number of initial prediction image blocks corresponding to the sample image.
Here, the restored predicted image features are predicted image features obtained after the first predicted image features are restored in a number dimension, and the number corresponding to the restored predicted image features is larger than the number corresponding to the first predicted image features.
As can be seen from the above embodiments, since each data processing block in the pruning neural network to be trained includes a TSM, the number of first predicted image features output by each data processing block is smaller than the number of image block features of the initial predicted image block, while each data processing block in the teacher neural network does not have a TSM, so the number of second predicted image features output by each data processing block in the teacher neural network is equal to the number of image block features of the initial predicted image block.
Therefore, after each first predicted image feature is obtained, the number of first predicted image features can be restored by using the RTSM corresponding to them, so as to obtain the third number of restored predicted image features corresponding to the first predicted image features. In this way, the number of restored predicted image features is made to match the number of second predicted image features.
S404-1-2: Determine the first loss corresponding to the data processing block based on the third number of restored predicted image features and the second predicted image features.
In specific implementation, for each restored predicted image feature among the third number of restored predicted image features, the second predicted image feature corresponding to that restored predicted image feature may first be determined; further, the loss between the restored predicted image feature and the second predicted image feature can be determined; thereafter, the first loss can be determined based on the losses between every two corresponding predicted image features (a restored predicted image feature and its corresponding second predicted image feature), and taken as the first loss corresponding to the data processing block in the pruning neural network to be trained.
Based on S404-1-1 and S404-1-2, a first loss corresponding to each data processing block in the pruning neural network to be trained can be determined, and is shown in fig. 5.
In one embodiment, for S404-1-1, the following steps may be performed:
S404-1-1-1: Normalize the second feature matrix corresponding to the first predicted image features to obtain normalized first predictive coding features, and perform a matrix dimension conversion operation on the feature matrix corresponding to the first predictive coding features to obtain a converted third feature matrix.
Take the matrix dimension of the feature matrix corresponding to the image block features of the initial prediction image blocks to be L × K, where L is equal to the third number and K is the value of the feature dimension corresponding to the image block features of each initial prediction image block. As can be seen from the above embodiments for determining the image block features of the target image blocks corresponding to the target image, the matrix dimension of the second feature matrix output by the first data processing block in the pruning neural network to be trained is L/2 × K, that output by the second data processing block is L/4 × K, and that output by the third data processing block is L/8 × K. The second feature matrix corresponding to each data processing block includes the respective first predicted image features output by that data processing block, and each first predicted image feature can be characterized in the form of a feature vector.
As shown in fig. 6, a schematic structural diagram of a data reduction block provided in the embodiment of the present disclosure is shown, where the data reduction block includes two normalization modules, four full-connection modules, and a nonlinear transformation module, and specific functions of the modules will be described in the following embodiments. Fig. 6 also shows a schematic diagram of restoring the L/2 × K second feature matrix output by the first data processing block to obtain the restored predicted image features, and the specific restoration steps will also be set forth below.
The following description takes restoring the first predicted image features output by the first data processing block in the pruning neural network to be trained as an example. Specifically, the L/2 × K second feature matrix output by the first data processing block may be input to the RTSM corresponding to the first data processing block (the first RTSM), and the first normalization module (specifically, a Norm module including a Norm function) in the RTSM shown in fig. 6 is used to normalize the L/2 × K second feature matrix, so as to obtain the normalized L/2 × K second feature matrix. The normalized L/2 × K second feature matrix includes the normalized first predictive coding feature corresponding to each first predictive coding feature.
Then, a matrix dimension conversion operation may be performed on the normalized L/2 × K second feature matrix, that is, a reshape operation may be performed on the normalized L/2 × K second feature matrix, so as to obtain a converted third feature matrix. The matrix dimension corresponding to the third feature matrix may be K L/2.
S404-1-1-2: Perform full-connection mapping processing on the converted third feature matrix to obtain second predictive coding features, and perform nonlinear transformation on the second predictive coding features to obtain third predictive coding features.
Taking the example that the matrix dimension corresponding to the third feature matrix may be K × L/2, in a specific implementation, a first full-connection module (specifically, a linear module) in the RTSM shown in fig. 6 may be used to perform full-connection mapping processing on the K × L/2 third feature matrix to obtain a K × (X × L) feature matrix, where the feature matrix includes second predictive coding features corresponding to each normalized first predictive coding feature in the third feature matrix, and a value (X × L) of the feature dimension corresponding to the second predictive coding feature is greater than a value (L/2) of the feature dimension corresponding to the normalized first predictive coding feature. In specific implementation, X in (X × L) may be 4, that is, the K × L/2 third feature matrix is subjected to full-connection mapping processing to obtain a K × 4L feature matrix.
Further, a nonlinear transformation module including an activation function GELU in RTSM as shown in fig. 6 may be utilized to perform nonlinear transformation on each second predictive coding feature in the K × 4L feature matrix to obtain a new K × 4L feature matrix, where the new K × 4L feature matrix includes a third predictive coding feature obtained by performing nonlinear transformation on each second predictive coding feature.
S404-1-1-3: Perform full-connection mapping processing on the feature matrix corresponding to the third predictive coding features, perform a matrix dimension conversion operation on the feature matrix after full-connection mapping processing to obtain a fourth feature matrix, and determine the third number of restored predicted image features based on the fourth feature matrix.
The number of features in the matrix dimension corresponding to the fourth feature matrix is the third number, and the value of the feature dimension in the matrix dimension corresponding to the fourth feature matrix is the value of the feature dimension corresponding to the image block features of the initial prediction image blocks.
Here, the description will be continued by taking, as an example, a new K × 4L feature matrix obtained by using a feature matrix corresponding to the third predictive coding feature:
in specific implementation, as shown in fig. 6, a second full-connection module (specifically, a linear module) in the RTSM may be used to perform full-connection mapping processing on the new K × 4L feature matrix, that is, perform full-connection mapping processing on the feature matrix corresponding to the third predictive coding feature, so as to obtain a feature matrix after full-connection mapping processing. The matrix dimension corresponding to the fully-concatenated feature matrix after the fully-concatenated mapping process is KxL, the KxL feature matrix comprises predicted coding features after the fully-concatenated mapping process is performed on each third predicted coding feature, the obtained predicted coding features corresponding to each third predicted coding feature are subjected to the fully-concatenated mapping process, and the numerical value (4L) of the feature dimension corresponding to each predicted coding feature after the fully-concatenated mapping process is smaller than the numerical value (L) of the feature dimension corresponding to the third predicted coding feature.
Further, the RTSM shown in fig. 6 may be used to perform a matrix dimension conversion operation on the K × L feature matrix after full-connection mapping processing; specifically, a reshape operation may be performed on this K × L feature matrix, so as to obtain the L × K fourth feature matrix.
Then, the RTSM can be used to determine, based on the L × K fourth feature matrix, the restored predicted image features corresponding to the first data processing block, where the number of restored predicted image features corresponding to the first data processing block is the third number; in this way, the number of restored predicted image features corresponding to the first data processing block is made to match the number of second predicted image features.
The steps of restoring the first predicted image features output by the second data processing block and by the third data processing block in the pruning neural network to be trained may be performed with reference to the step of restoring the first predicted image features output by the first data processing block, and details are not repeated here.
In one embodiment, the step of determining the third number of restored predicted image features based on the fourth feature matrix in S404-1-1-3 can be implemented as follows:
S404-1-1-3-1: Normalize the fourth feature matrix, and perform full-connection mapping on the normalized fourth feature matrix multiple times to obtain a fifth feature matrix.
The description is continued by taking the L × K fourth feature matrix as an example:
in specific implementation, the second normalization module (specifically, the Norm module including the Norm function) in the RTSM shown in fig. 6 may be used to perform normalization processing on the fourth feature matrix, and then two consecutive fully-connected modules (both may be linear modules) in the RTSM are used to sequentially perform two times of fully-connected mapping processing on the fourth feature matrix after the normalization processing, so as to obtain a fifth feature matrix. And the matrix dimension of the fifth feature matrix is L multiplied by K.
S404-1-1-3-2: Determine the third number of restored predicted image features based on the fifth feature matrix and the fourth feature matrix.
In specific implementation, a matrix addition operation may be performed on the L × K fifth feature matrix and the L × K fourth feature matrix to obtain a sixth feature matrix, and each image feature included in the sixth feature matrix is taken as a restored predicted image feature. Since the number of image features in the fifth feature matrix and in the fourth feature matrix is in each case the third number L, the number of image features included in the sixth feature matrix obtained after the matrix addition is also the third number L, and thus the number of restored predicted image features obtained is also the third number L.
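Gathering the steps of S404-1-1-1 to S404-1-1-3-2, a minimal PyTorch sketch of such a data restoring block might look as follows; the shapes and the expansion factor X = 4 follow the text, while the LayerNorm/Linear module choices and all names are assumptions:

```python
import torch
import torch.nn as nn

class RTSM(nn.Module):
    # Hypothetical data restoring block: upsamples the token count from L/2
    # back to L by operating on the transposed (K x L/2) feature matrix.
    def __init__(self, L: int, K: int):
        super().__init__()
        self.norm1 = nn.LayerNorm(K)         # normalize the L/2 x K second feature matrix
        self.up1 = nn.Linear(L // 2, 4 * L)  # K x L/2 -> K x 4L
        self.act = nn.GELU()
        self.up2 = nn.Linear(4 * L, L)       # K x 4L -> K x L
        self.norm2 = nn.LayerNorm(K)
        self.fc3 = nn.Linear(K, K)           # two successive full-connection mappings
        self.fc4 = nn.Linear(K, K)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (L/2, K)
        t = self.norm1(x).transpose(0, 1)               # K x L/2 third feature matrix
        t = self.up2(self.act(self.up1(t)))             # K x L
        fourth = t.transpose(0, 1)                      # L x K fourth feature matrix
        fifth = self.fc4(self.fc3(self.norm2(fourth)))  # L x K fifth feature matrix
        return fourth + fifth                           # L x K restored predicted image features

restored = RTSM(L=200, K=256)(torch.randn(100, 256))
print(restored.shape)  # torch.Size([200, 256])
```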
In one embodiment, the step of determining the first loss corresponding to the data processing block based on the third number of restored predicted image features and the second predicted image features may be further implemented as follows:
step one, based on the third quantity of the reduction predicted image characteristics and the second predicted image characteristics, determining a first sub-loss.
Here, the first sub-loss may be the loss described in S404-1-1 and S404-1-2, and with respect to the first sub-loss, reference may be made to S404-1-1 and S404-1-2, which is not described herein again.
Step two: perform a feature processing operation on the restored predicted image features to obtain the first target prediction features corresponding to the restored predicted image features, and determine the third prediction classification information corresponding to the first target prediction features.
Here, the embodiments of the present disclosure also provide a discrimination network for discriminating between the restored predicted image features and the second predicted image features, determining the probability that a restored predicted image feature is a predicted image feature output by the teacher neural network, and determining the probability that a second predicted image feature is a predicted image feature output by the teacher neural network.
The first target prediction features are prediction features obtained by reducing the feature dimension corresponding to the restored predicted image features. The third prediction classification information is used to characterize the probability that each first target prediction feature (or the restored predicted image feature corresponding to it) is a predicted image feature output by the teacher neural network; since the restored predicted image features are derived from the first predicted image features, the third prediction classification information can characterize the probability that the first predicted image features are predicted image features output by the teacher neural network.
In specific implementation, after the restored predicted image features corresponding to each data processing block in the pruning neural network to be trained are obtained, each restored predicted image feature may be input to the discrimination network, and the discrimination network performs a feature processing operation on each restored predicted image feature; specifically, a dimension reduction operation may be performed on the feature dimension corresponding to each restored predicted image feature, reducing the value of the feature dimension to 1, so as to obtain the first target prediction feature corresponding to each restored predicted image feature. The value of the feature dimension corresponding to each first target prediction feature is thus 1.
For example, when the value K (K greater than 1) of the feature dimension corresponding to a restored predicted image feature is reduced, the value of the feature dimension becomes 1 after the feature processing operation is performed on the restored predicted image feature by using the discrimination network. When the matrix dimension of the feature matrix corresponding to the restored predicted image features is L × K, an L × 1 feature matrix is obtained after the feature processing operation is performed on the L × K feature matrix by using the discrimination network.
Further, the discrimination network may be used to classify each first target prediction feature, determine the probability that each first target prediction feature is a predicted image feature output by the teacher neural network, and take this probability as the third prediction classification information corresponding to the first target prediction feature.
Step three: perform a feature processing operation on the second predicted image features to obtain the second target prediction features corresponding to the second predicted image features, and determine the fourth prediction classification information corresponding to the second target prediction features.
Here, the second target prediction features are prediction features obtained by reducing the feature dimension corresponding to the second predicted image features. The fourth prediction classification information is used to characterize the probability that each second target prediction feature (or the second predicted image feature corresponding to it) is a predicted image feature output by the teacher neural network.
In specific implementation, after the second predicted image features corresponding to each data processing block in the teacher neural network are obtained, each second predicted image feature may be input to the discrimination network, and the discrimination network performs the feature processing operation on each second predicted image feature. Specifically, a dimension reduction operation may be performed on the feature dimension corresponding to each second predicted image feature, reducing the value of that feature dimension to 1, so as to obtain the second target prediction feature corresponding to each second predicted image feature. The value of the feature dimension corresponding to each second target prediction feature is therefore 1.
Further, the discrimination network may be used to classify each second target prediction feature, determine the probability that each second target prediction feature corresponds to a predicted image feature output by the teacher neural network, and use this probability as the fourth prediction classification information corresponding to the second target prediction feature.
Step four: determine a second sub-loss based on the third prediction classification information and the fourth prediction classification information, and determine the first loss based on the first sub-loss and the second sub-loss.
In one embodiment, the step of determining the second sub-loss based on the third prediction classification information and the fourth prediction classification information may be implemented as the following steps:
s1: and determining a third sub-loss based on the third prediction classification information and the first standard classification information corresponding to the third prediction classification information.
Here, the first standard classification information corresponding to the third prediction classification information may be the label 0. The third prediction classification information characterizes the probability that each first target prediction feature corresponds to a predicted image feature output by the teacher neural network, and each first target prediction feature is derived from a first predicted image feature output by a data processing block in the pruning neural network to be trained; the probability that a first target prediction feature corresponds to a teacher output should therefore be close to 0. Accordingly, the label 0 is used as the first standard classification information, a loss is computed between the third prediction classification information and this label, the first loss is determined based on the obtained loss, and the first loss is then used to train the data processing blocks in the pruning neural network to be trained, which helps ensure the rationality of the first predicted image features output by the data processing blocks.
In specific implementation, a loss may be computed between each probability in each piece of third prediction classification information and the label 0, so as to determine the loss corresponding to each piece of third prediction classification information; that loss may then be used as a third sub-loss. Fig. 5 shows the third sub-loss corresponding to the first data processing block of the pruning neural network to be trained; the third sub-losses corresponding to the other data processing blocks are not shown one by one, but each data processing block in the pruning neural network to be trained may correspond to one third sub-loss.
S2: and determining a fourth sub-loss based on the fourth prediction classification information and second standard classification information corresponding to the fourth prediction classification information.
Here, the second standard classification information corresponding to the fourth prediction classification information may be the label 1. The fourth prediction classification information characterizes the probability that each second target prediction feature corresponds to a predicted image feature output by the teacher neural network, and each second target prediction feature is derived from a second predicted image feature output by a data processing block in the teacher neural network; the probability that a second target prediction feature corresponds to a teacher output should therefore be close to 1. Accordingly, the label 1 is used as the second standard classification information, a loss is computed between the fourth prediction classification information and this label, the first loss is determined based on the obtained loss, and the data processing blocks in the pruning neural network to be trained are trained using the first loss. This supervises the output of the data processing blocks in the pruning neural network to be trained, thereby improving the rationality of the first predicted image features they output.
In specific implementation, a loss may be computed between each probability in each piece of fourth prediction classification information and the label 1, so as to determine the loss corresponding to each piece of fourth prediction classification information; that loss may then be used as a fourth sub-loss. Fig. 5 shows the fourth sub-loss corresponding to the first data processing block of the teacher neural network; the fourth sub-losses corresponding to the other data processing blocks are not shown one by one, but each data processing block in the teacher neural network may correspond to one fourth sub-loss.
S3: and determining a second sub-loss based on the third sub-loss and the fourth sub-loss.
Here, the third sub-loss and the fourth sub-loss may be combined, and the combined loss taken as the second sub-loss. Fig. 5 shows the second sub-loss between the first data processing block of the teacher neural network and the first data processing block of the pruning neural network to be trained; the second sub-losses between the other corresponding pairs of data processing blocks are not shown one by one, but each pair of corresponding data processing blocks in the teacher neural network and the pruning neural network to be trained may correspond to one second sub-loss.
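Purely as an illustration of steps S1 to S3, the sketch below assumes the per-probability losses are binary cross-entropies against the labels 0 and 1, and that "combined" means a simple sum; neither choice is fixed by the disclosure.

```python
import torch
import torch.nn.functional as F

def second_sub_loss(student_probs: torch.Tensor,
                    teacher_probs: torch.Tensor) -> torch.Tensor:
    """Sketch of S1-S3: third sub-loss + fourth sub-loss -> second sub-loss.

    student_probs: third prediction classification information, i.e. the
        probabilities for the first target prediction features (label 0).
    teacher_probs: fourth prediction classification information, i.e. the
        probabilities for the second target prediction features (label 1).
    """
    third = F.binary_cross_entropy(student_probs,
                                   torch.zeros_like(student_probs))
    fourth = F.binary_cross_entropy(teacher_probs,
                                    torch.ones_like(teacher_probs))
    return third + fourth  # combination assumed to be a plain sum
```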
Further, in step four, after the second sub-loss and the first sub-loss are obtained, the second sub-loss and the first sub-loss may together be taken as the first loss. Alternatively, the second sub-loss and the first sub-loss may be combined, and the combined loss taken as the first loss.
In addition, based on steps one to four, the first loss corresponding to each data processing block in the pruning neural network to be trained is determined; each data processing block may then be iteratively trained using the first loss corresponding to that data processing block.
In an embodiment, the step of determining the prediction loss based on the first loss and the second loss corresponding to each data processing block may further be implemented as the following steps:
t1: and determining the probability prediction loss corresponding to the pruning neural network to be trained based on the first prediction classification information and the standard classification information corresponding to the sample image.
Here, the prediction loss may further include a probabilistic prediction loss between the first prediction classification information and the standard classification information corresponding to the sample image. The probabilistic prediction loss characterizes the loss between the first prediction classification information output by the pruning neural network to be trained and the standard classification information corresponding to the sample image, where the standard classification information is the real classification information corresponding to the sample image.
In specific implementation, a loss may be computed between the first prediction classification information and the standard classification information corresponding to the sample image, so as to determine the probabilistic prediction loss corresponding to the pruning neural network to be trained.
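A minimal sketch, assuming the standard classification information is a ground-truth class index per sample image and that the loss is a cross-entropy; the disclosure does not name the loss function, so both are assumptions.

```python
import torch
import torch.nn.functional as F

# first_pred_logits: first prediction classification information, (batch, classes)
# labels: standard (real) classification information, (batch,)
first_pred_logits = torch.randn(8, 1000)
labels = torch.randint(0, 1000, (8,))
prob_pred_loss = F.cross_entropy(first_pred_logits, labels)
```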
T2: and determining the predicted loss based on the first loss, the second loss and the probability predicted loss corresponding to each data processing block.
Here, the first loss corresponding to each data processing block in the pruning neural network to be trained, the determined second loss, and the determined probabilistic prediction loss may together be taken as the prediction loss. Alternatively, these losses may be combined to determine a total loss, and the pruning neural network to be trained is trained using the total loss.
In an embodiment, the step of determining the prediction loss based on the first loss and the second loss corresponding to each data processing block may further be implemented as the following steps:
p1: and performing feature processing on the sample image by using a pre-trained convolutional neural network to determine fifth prediction classification information corresponding to the sample image.
Here, the pre-trained convolutional neural network may be a convolutional neural network trained using the related art, that is, a neural network capable of performing image classification on an image. The fifth prediction classification information is used to characterize the prediction probability value of the sample image for each image class.
In specific implementation, the pre-trained convolutional neural network can be used to perform feature processing on the sample image, and fifth prediction classification information corresponding to the sample image output by the pre-trained convolutional neural network is determined.
P2: and determining a third loss of the pruning neural network to be trained based on the fifth prediction classification information and the first prediction classification information.
In specific implementation, a loss may be computed between the fifth prediction classification information and the first prediction classification information, and the obtained loss is used as the third loss of the pruning neural network to be trained. The third loss of the pruning neural network to be trained is shown in fig. 5.
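As one hedged possibility for P2, the CNN's outputs could be treated as soft targets and compared with a temperature-scaled KL divergence; the disclosure only says a loss is computed between the fifth and first prediction classification information, so this loss form is an assumption.

```python
import torch.nn.functional as F

def third_loss(first_pred_logits, fifth_pred_logits, tau: float = 1.0):
    """Sketch: loss between the pruned network's and the CNN's outputs."""
    student_log_probs = F.log_softmax(first_pred_logits / tau, dim=-1)
    cnn_probs = F.softmax(fifth_pred_logits / tau, dim=-1)
    return F.kl_div(student_log_probs, cnn_probs, reduction="batchmean")
```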
P3: a predicted loss is determined based on the first loss, the second loss, and the third loss.
In specific implementation, the first loss, the second loss, and the third loss may be taken as the prediction loss; alternatively, the first loss, the second loss, the third loss, and the probabilistic prediction loss in the above embodiment may be taken together as the prediction loss; further alternatively, these losses may be combined, and the combined loss used as the prediction loss.
In addition, in implementation, at least part of the losses mentioned in the above embodiments (the first loss, the second loss, the third loss, and the probabilistic prediction loss) may be used as the prediction loss to train the pruning neural network to be trained; the embodiments of the present disclosure do not specifically limit which losses are used.
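To make the combination concrete, the following is a sketch of one possible total prediction loss: a plain weighted sum, where the weights and the decision to include all four losses are assumptions, since the disclosure explicitly leaves this open.

```python
def prediction_loss(first_losses, second_loss, third_loss, prob_pred_loss,
                    w1=1.0, w2=1.0, w3=1.0, w4=1.0):
    """Sketch: one first loss per data processing block, plus global losses."""
    total = w1 * sum(first_losses)
    return total + w2 * second_loss + w3 * third_loss + w4 * prob_pred_loss
```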
It will be understood by those skilled in the art that, in the methods of the present disclosure, the order in which the steps are written does not imply a strict order of execution or any limitation on the implementation; the specific order of execution of the steps should be determined by their functions and possible inherent logic.
Based on the same inventive concept, an image classification apparatus corresponding to the image classification method is also provided in the embodiments of the present disclosure. Because the principle by which the apparatus solves the problem is similar to that of the image classification method in the embodiments of the present disclosure, the implementation of the apparatus may refer to the implementation of the method, and repeated details are not described again.
As shown in fig. 7, which is a schematic diagram of an image classification apparatus provided in an embodiment of the present disclosure, the apparatus includes:
a first determining module 701, configured to determine, based on a target image to be processed, a first number of initial image blocks corresponding to the target image and an image block feature corresponding to each initial image block;
a second determining module 702, configured to determine, for each initial image block, importance information corresponding to the initial image block based on an image block feature of the initial image block;
an aggregation module 703, configured to aggregate, based on the importance information corresponding to each initial image block, image block features corresponding to the first number of initial image blocks, to obtain a second number of target image blocks and an image block feature corresponding to each target image block; the second number is less than the first number;
a third determining module 704, configured to determine an image classification result of the target image based on an image block feature corresponding to each target image block.
In a possible implementation manner, the second determining module 702 is configured to, for each initial image block, perform coding processing on the image block features corresponding to the initial image block to obtain the coding features corresponding to the initial image block;
and determine importance information corresponding to each initial image block based on the coding features corresponding to the initial image blocks.
In a possible implementation manner, the second determining module 702 is configured to perform normalization processing on the coding feature corresponding to each initial image block to obtain a normalized coding feature;
carrying out full-connection mapping processing on the coding features after the normalization processing to obtain first intermediate features; wherein the feature dimension corresponding to the first intermediate feature is smaller than the feature dimension corresponding to the coding feature;
and determining importance information corresponding to each initial image block based on the first intermediate features corresponding to each initial image block.
In a possible implementation, the second determining module 702 is configured to determine a normalization weight corresponding to each of the initial image blocks based on the coding feature corresponding to each of the initial image blocks;
and based on the normalization weight corresponding to each initial image block, carrying out normalization processing on the coding features corresponding to each initial image block to obtain the coding features after normalization processing.
In a possible implementation manner, the second determining module 702 is configured to determine, based on the normalized encoding feature, a dimension compression weight corresponding to the normalized encoding feature;
and performing full-connection mapping processing on the normalized coding features according to the dimension compression weight to obtain the first intermediate features.
In a possible implementation manner, the second determining module 702 is configured to perform nonlinear transformation on each first intermediate feature, and perform full-connection mapping processing on the first intermediate features after the nonlinear transformation, so as to obtain a second intermediate feature corresponding to each first intermediate feature;
and determining importance information corresponding to each initial image block based on each second intermediate feature.
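For readers who want a concrete picture, the following is a minimal PyTorch sketch of the pipeline the second determining module 702 describes (normalization, a dimension-compressing full-connection mapping, a nonlinear transformation, and a second full-connection mapping producing the importance information); the specific layer types (LayerNorm, GELU) and sizes are assumptions.

```python
import torch
import torch.nn as nn

class ImportanceHead(nn.Module):
    """Sketch of the importance computation (layer choices assumed)."""

    def __init__(self, dim: int, hidden: int, num_target: int):
        super().__init__()
        self.norm = nn.LayerNorm(dim)       # normalization processing
        self.fc1 = nn.Linear(dim, hidden)   # full-connection mapping, hidden < dim
        self.act = nn.GELU()                # nonlinear transformation
        self.fc2 = nn.Linear(hidden, num_target)  # -> second intermediate features

    def forward(self, coding_features: torch.Tensor) -> torch.Tensor:
        # coding_features: (N, dim), one row per initial image block
        first_intermediate = self.fc1(self.norm(coding_features))
        second_intermediate = self.fc2(self.act(first_intermediate))
        return second_intermediate  # (N, M) importance information
```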
In a possible implementation manner, the aggregation module 703 is configured to determine, based on the importance information corresponding to each of the initial image blocks, a first feature matrix corresponding to the importance information, where the matrix dimension corresponding to the first feature matrix is N × M, N is the first number, and M is the second number;
performing matrix dimension conversion operation on the first feature matrix to obtain a second feature matrix with matrix dimension of M multiplied by N;
and aggregating the image block characteristics respectively corresponding to the first number of initial image blocks based on the second characteristic matrix and the image block matrix corresponding to the image block characteristics of the initial image blocks to obtain the second number of target image blocks and the image block characteristics corresponding to each target image block.
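The aggregation itself reduces to two matrix operations; a hedged sketch follows, assuming the transposed importance matrix is normalized with a softmax over the initial-block axis before weighting the image block matrix (the normalization choice is not stated here).

```python
import torch

def aggregate(tokens: torch.Tensor, importance: torch.Tensor) -> torch.Tensor:
    """tokens: (N, C) image block features; importance: (N, M) first matrix."""
    second_matrix = importance.transpose(0, 1)  # matrix dimension conversion: (M, N)
    weights = second_matrix.softmax(dim=-1)     # assumption: softmax over N
    return weights @ tokens                     # (M, C) target image block features
```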
In a possible implementation manner, the third determining module 704 is configured to take the target image blocks as new initial image blocks and take the number of the new initial image blocks as a new first number, return to the step of determining, for each initial image block, the importance information corresponding to the initial image block based on the image block characteristics of the initial image block, until the number of returns reaches a preset value, and then determine probability classification information corresponding to the target image based on the image block characteristics corresponding to each finally determined target image block;
and determining an image classification result of the target image based on the probability classification information.
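Schematically, the iteration described above can be read as repeatedly feeding the target image blocks back in as new initial image blocks; the sketch below uses hypothetical names (blocks, classifier) for the preset number of data processing blocks and the final classification head.

```python
def classify(tokens, blocks, classifier):
    """tokens: initial image block features; blocks: data processing blocks."""
    for block in blocks:          # each pass: target blocks -> new initial blocks
        tokens = block(tokens)    # number of blocks plays the role of the preset value
    probs = classifier(tokens)    # probability classification information
    return probs.argmax(dim=-1)   # image classification result
```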
In a possible implementation manner, the step of determining the image classification result of the target image is performed by a pre-trained pruning neural network based on the target image to be processed; the device further comprises:
a training module 705 for training the pruning neural network according to the following steps:
acquiring a sample image;
inputting the sample image into a pruning neural network to be trained, processing the sample image by using the pruning neural network to be trained, determining a first predicted image feature output by each data processing block, and determining first prediction classification information corresponding to the sample image; the data processing block is used for determining importance information of initial prediction image blocks based on image block characteristics of the initial prediction image blocks corresponding to the sample image, and aggregating image block characteristics respectively corresponding to a third number of initial prediction image blocks based on importance information corresponding to each initial prediction image block to obtain a fourth number of target prediction image blocks and first prediction image characteristics corresponding to each target prediction image block;
inputting the sample image into a pre-trained teacher neural network, processing the sample image by using the teacher neural network, determining a second predicted image feature output by each data processing block in the teacher neural network, and determining second predicted classification information corresponding to the sample image; the data processing block in the teacher neural network comprises a target coding module;
and determining the prediction loss of the pruning neural network to be trained based on the first prediction image feature, the second prediction image feature, the first prediction classification information and the second prediction classification information, and performing iterative training on the pruning neural network to be trained by using the prediction loss until a preset training cut-off condition is met to obtain the trained pruning neural network.
In a possible implementation manner, the training module 705 is configured to, for each data processing block in the pruning neural network to be trained, determine a first loss corresponding to the data processing block based on a first predicted image feature and a second predicted image feature corresponding to the data processing block;
determining a second loss of the pruning neural network to be trained based on the first predictive classification information and the second predictive classification information;
determining the predicted loss based on the corresponding first loss and the second loss for each of the data processing blocks.
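The second loss compares the two networks' classification outputs; a minimal sketch follows, assuming a standard knowledge-distillation KL divergence with temperature (the exact loss form is not specified by the disclosure).

```python
import torch.nn.functional as F

def second_loss(first_pred_logits, second_pred_logits, tau: float = 4.0):
    """Sketch: loss between first and second prediction classification info."""
    log_p = F.log_softmax(first_pred_logits / tau, dim=-1)   # pruned network
    q = F.softmax(second_pred_logits / tau, dim=-1)          # teacher network
    return F.kl_div(log_p, q, reduction="batchmean") * tau * tau
```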
In a possible implementation, the training module 705 is configured to determine a third number of restored predicted image features based on each of the first predicted image features; the third number is the number of initial prediction image blocks corresponding to the sample image;
determining a first loss corresponding to the data processing block based on the third number of restored predicted image features and the second predicted image features.
In a possible implementation manner, the training module 705 is configured to perform normalization processing on the second feature matrix corresponding to the first predicted image feature to obtain a first predicted coding feature after the normalization processing, and perform matrix dimension conversion operation on the feature matrix corresponding to the first predicted coding feature to obtain a third feature matrix after the conversion;
performing full-connection mapping processing on the converted third feature matrix to obtain a second predictive coding feature, and performing nonlinear transformation on the second predictive coding feature to obtain a third predictive coding feature;
performing full-connection mapping processing on the feature matrix corresponding to the third predictive coding feature, performing matrix dimension conversion operation on the feature matrix subjected to full-connection mapping processing to obtain a fourth feature matrix, and determining the third number of restored predictive image features based on the fourth feature matrix, where the feature number in the matrix dimension corresponding to the fourth feature matrix is the third number, and the feature dimension in the matrix dimension corresponding to the fourth feature matrix is: and initially predicting the numerical value of the feature dimension corresponding to the image block feature of the image block.
In a possible implementation manner, the training module 705 is configured to perform normalization processing on the fourth feature matrix, and perform multiple full-connection mappings on the normalized fourth feature matrix to obtain a fifth feature matrix;
determining the third number of restored predicted image features based on the fifth feature matrix and the fourth feature matrix.
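A loose PyTorch sketch of this restoration follows, reading the description as: the saved M × N aggregation matrix is normalized and transposed, passed through two full-connection mappings with a nonlinearity in between to form restoration weights, applied to the slimmed tokens to obtain the fourth feature matrix, which is then normalized, passed through further full-connection mappings, and combined with the fourth matrix. The exact wiring, layer types, and the residual combination are all assumptions, not the disclosure's definitive implementation.

```python
import torch
import torch.nn as nn

class RestoreTokens(nn.Module):
    """Sketch: restore N token features from M slimmed tokens (assumed wiring)."""

    def __init__(self, m: int, c: int, hidden: int):
        super().__init__()
        self.fc1 = nn.Linear(m, hidden)   # full-connection mapping
        self.act = nn.GELU()              # nonlinear transformation
        self.fc2 = nn.Linear(hidden, m)   # second full-connection mapping
        self.norm = nn.LayerNorm(c)       # normalization of the fourth matrix
        self.mlp = nn.Sequential(         # multiple full-connection mappings
            nn.Linear(c, c), nn.GELU(), nn.Linear(c, c))

    def forward(self, slim_tokens, second_matrix):
        # second_matrix: (M, N) saved from the slimming step; slim_tokens: (M, C)
        third = second_matrix.softmax(dim=0).transpose(0, 1)  # (N, M)
        weights = self.fc2(self.act(self.fc1(third)))         # (N, M)
        fourth = weights @ slim_tokens                        # (N, C) fourth matrix
        fifth = self.mlp(self.norm(fourth))                   # fifth feature matrix
        return fourth + fifth                                 # restored features
```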
In a possible implementation, the training module 705 is configured to determine a first sub-loss based on the third number of restored predicted image features and the second predicted image features;
performing a feature processing operation on the restored predicted image features to obtain the first target prediction features corresponding to the restored predicted image features, and determining the third prediction classification information corresponding to the first target prediction features;
performing feature processing operation on the second predicted image feature to obtain a second target predicted feature corresponding to the second predicted image feature, and determining fourth predicted classification information corresponding to the second target predicted feature;
determining a second sub-penalty based on the third prediction classification information and the fourth prediction classification information; and determining the first loss based on the first sub-loss and the second sub-loss.
In a possible implementation manner, the training module 705 is configured to determine a third sub-loss based on the third prediction classification information and first standard classification information corresponding to the third prediction classification information;
determining a fourth sub-loss based on the fourth prediction classification information and second standard classification information corresponding to the fourth prediction classification information;
determining the second sub-loss based on the third sub-loss and the fourth sub-loss.
In a possible implementation manner, the training module 705 is configured to determine a probabilistic predictive loss corresponding to the pruning neural network to be trained based on the first predictive classification information and the standard classification information corresponding to the sample image;
determining the predicted loss based on the first loss, the second loss, and the probabilistic predicted loss for each of the data processing blocks.
In a possible implementation manner, the training module 705 is configured to perform feature processing on the sample image by using a convolutional neural network trained in advance, and determine fifth prediction classification information corresponding to the sample image;
determining a third loss of the pruning neural network to be trained based on the fifth prediction classification information and the first prediction classification information;
determining the predicted loss based on the first loss, the second loss, and the third loss.
The description of the processing flow of each module in the device and the interaction flow between the modules may refer to the related description in the above method embodiments, and will not be described in detail here.
An embodiment of the present disclosure further provides a computer device. As shown in fig. 8, which is a schematic structural diagram of a computer device provided in an embodiment of the present disclosure, the computer device includes:
a processor 81 and a memory 82. The memory 82 stores machine-readable instructions executable by the processor 81, and the processor 81 is configured to execute the machine-readable instructions stored in the memory 82. When the machine-readable instructions are executed by the processor 81, the processor 81 performs the following steps: S101: determining, based on a target image to be processed, a first number of initial image blocks corresponding to the target image and the image block characteristics corresponding to each initial image block; S102: for each initial image block, determining importance information corresponding to the initial image block based on the image block characteristics of the initial image block; S103: aggregating the image block characteristics respectively corresponding to the first number of initial image blocks based on the importance information corresponding to each initial image block to obtain a second number of target image blocks and the image block characteristics corresponding to each target image block, the second number being smaller than the first number; and S104: determining the image classification result of the target image based on the image block characteristics corresponding to each target image block.
The memory 82 includes an internal memory 821 and an external memory 822. The internal memory 821 temporarily stores operation data of the processor 81 and data exchanged with the external memory 822, such as a hard disk; the processor 81 exchanges data with the external memory 822 through the internal memory 821.
For the specific execution process of the instruction, reference may be made to the steps of the image classification method described in the embodiments of the present disclosure, and details are not repeated here.
The embodiments of the present disclosure also provide a computer-readable storage medium, on which a computer program is stored; when the computer program is run by a processor, the steps of the image classification method described in the above method embodiments are performed. The storage medium may be a volatile or non-volatile computer-readable storage medium.
The computer program product of the image classification method provided in the embodiments of the present disclosure includes a computer-readable storage medium storing a program code, where instructions included in the program code may be used to execute steps of the image classification method described in the above method embodiments, which may be referred to specifically for the above method embodiments, and are not described herein again.
The computer program product may be embodied in hardware, software, or a combination thereof. In an alternative embodiment, the computer program product is embodied in a computer storage medium; in another alternative embodiment, the computer program product is embodied in a software product, such as a Software Development Kit (SDK).
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working process of the apparatus described above may refer to the corresponding process in the foregoing method embodiments and is not described again here. In the several embodiments provided in the present disclosure, it should be understood that the disclosed apparatus and method may be implemented in other ways. The apparatus embodiments described above are merely illustrative; for example, the division of the units is only one logical division, and there may be other divisions in actual implementation; for example, a plurality of units or components may be combined, or some features may be omitted or not implemented. In addition, the mutual coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection of devices or units through some communication interfaces, and may be in electrical, mechanical, or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present disclosure may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a non-volatile computer-readable storage medium executable by a processor. Based on such understanding, the technical solution of the present disclosure may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present disclosure. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
Finally, it should be noted that the above-mentioned embodiments are merely specific embodiments of the present disclosure, used to illustrate the technical solutions of the present disclosure rather than to limit them, and the protection scope of the present disclosure is not limited thereto. Although the present disclosure has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that any person skilled in the art may, within the technical scope of the present disclosure, still modify the technical solutions described in the foregoing embodiments, easily conceive of changes, or make equivalent replacements of some of the technical features therein; such modifications, changes, or replacements do not depart from the spirit and scope of the technical solutions of the embodiments of the present disclosure and should be construed as being included therein. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.

Claims (20)

1. An image classification method, comprising:
determining a first number of initial image blocks corresponding to a target image and image block characteristics corresponding to each initial image block based on the target image to be processed;
for each initial image block, determining importance information corresponding to the initial image block based on the image block characteristics of the initial image block;
aggregating the image block characteristics respectively corresponding to the first number of initial image blocks based on the importance information corresponding to each initial image block to obtain a second number of target image blocks and image block characteristics corresponding to each target image block; the second number is less than the first number;
and determining an image classification result of the target image based on the image block characteristics corresponding to each target image block.
2. The method according to claim 1, wherein the determining, for each of the initial image blocks, importance information corresponding to the initial image block based on the image block characteristics of the initial image block comprises:
for each initial image block, coding the image block characteristics corresponding to the initial image block to obtain the coding characteristics corresponding to the initial image block;
and determining importance information corresponding to each initial image block based on the coding features corresponding to the initial image blocks.
3. The method according to claim 2, wherein said determining the importance information of the initial image blocks based on the corresponding coding features of each of the initial image blocks comprises:
carrying out normalization processing on the coding features corresponding to each initial image block to obtain the coding features after normalization processing;
carrying out full-connection mapping processing on the coding features after the normalization processing to obtain first intermediate features; wherein the feature dimension corresponding to the first intermediate feature is smaller than the feature dimension corresponding to the encoding feature;
and determining importance information corresponding to each initial image block based on the first intermediate features corresponding to each initial image block.
4. The method according to claim 3, wherein the normalizing the coding features corresponding to each of the initial image blocks to obtain normalized coding features includes:
determining a normalization weight corresponding to each initial image block based on the coding features corresponding to each initial image block;
and based on the normalization weight corresponding to each initial image block, carrying out normalization processing on the coding features corresponding to each initial image block to obtain the coding features after normalization processing.
5. The method according to claim 3 or 4, wherein the performing full-connection mapping processing on the normalized coding features to obtain first intermediate features comprises:
determining a dimension compression weight corresponding to the coding features after the normalization processing based on the coding features after the normalization processing;
and performing full-connection mapping processing on the normalized coding features according to the dimension compression weight to obtain the first intermediate features.
6. The method according to any of claims 3 to 5, wherein the determining importance information corresponding to each of the initial image blocks based on the first intermediate feature corresponding to each of the initial image blocks comprises:
carrying out nonlinear transformation on each first intermediate feature, and carrying out full-connection mapping processing on the first intermediate features after the nonlinear transformation to obtain a second intermediate feature corresponding to each first intermediate feature;
and determining importance information corresponding to each initial image block based on each second intermediate feature.
7. The method according to any one of claims 1 to 6, wherein the aggregating, based on the importance information corresponding to each of the initial image blocks, the image block features respectively corresponding to the first number of initial image blocks to obtain a second number of target image blocks and an image block feature corresponding to each target image block comprises:
determining a first feature matrix corresponding to the importance information based on the importance information corresponding to each initial image block, wherein the matrix dimension corresponding to the first feature matrix is NxM, N is the first quantity, and M is the second quantity;
performing matrix dimension conversion operation on the first feature matrix to obtain a second feature matrix with matrix dimension of M multiplied by N;
and aggregating the image block characteristics respectively corresponding to the first number of initial image blocks based on the second characteristic matrix and the image block matrix corresponding to the image block characteristics of the initial image blocks to obtain the second number of target image blocks and the image block characteristics corresponding to each target image block.
8. The method according to any one of claims 1 to 7, wherein the determining an image classification result of the target image based on the image block characteristics corresponding to each target image block comprises:
taking the target image blocks as new initial image blocks, taking the number of the new initial image blocks as a new first number, returning to the step of determining importance information corresponding to the initial image blocks based on the image block characteristics of the initial image blocks aiming at each initial image block, and determining probability classification information corresponding to the target image based on the image block characteristics corresponding to each finally determined target image block until the number of times of return reaches a preset value;
and determining an image classification result of the target image based on the probability classification information.
9. The method according to claim 8, wherein the step of determining the image classification result of the target image is performed by a pre-trained pruning neural network based on the target image to be processed;
the method further comprises the following steps:
acquiring a sample image;
inputting the sample image into a pruning neural network to be trained, processing the sample image by using the pruning neural network to be trained, determining a first predicted image feature output by each data processing block, and determining first prediction classification information corresponding to the sample image; the data processing block is used for determining importance information of initial prediction image blocks based on image block characteristics of the initial prediction image blocks corresponding to the sample image, and aggregating image block characteristics respectively corresponding to a third number of initial prediction image blocks based on importance information corresponding to each initial prediction image block to obtain a fourth number of target prediction image blocks and first prediction image characteristics corresponding to each target prediction image block;
inputting the sample image into a pre-trained teacher neural network, processing the sample image by using the teacher neural network, determining a second predicted image feature output by each data processing block in the teacher neural network, and determining second predicted classification information corresponding to the sample image; the data processing block in the teacher neural network comprises a target coding module;
and determining the prediction loss of the pruning neural network to be trained based on the first prediction image feature, the second prediction image feature, the first prediction classification information and the second prediction classification information, and performing iterative training on the pruning neural network to be trained by using the prediction loss until a preset training cut-off condition is met to obtain the trained pruning neural network.
10. The method according to claim 9, wherein the determining a prediction loss of the pruning neural network to be trained based on the first predicted image feature, the second predicted image feature, the first prediction classification information, and the second prediction classification information comprises:
for each data processing block in the pruning neural network to be trained, determining a first loss corresponding to the data processing block based on a first predicted image feature and a second predicted image feature corresponding to the data processing block;
determining a second loss of the pruning neural network to be trained based on the first predictive classification information and the second predictive classification information;
determining the predicted loss based on the corresponding first loss and the second loss for each of the data processing blocks.
11. The method of claim 10, wherein determining the first loss for the data processing block based on the first predicted image feature and the second predicted image feature for the data processing block comprises:
determining a third number of restored predictive image features based on each of the first predictive image features; the third number is the number of initial prediction image blocks corresponding to the sample image;
determining a first loss corresponding to the data processing block based on the third number of restored predicted image features and the second predicted image features.
12. The method according to claim 11, wherein determining a third number of restored predictive image features based on each of the first predictive image features comprises:
normalizing the second feature matrix corresponding to the first predicted image feature to obtain a normalized first predicted coding feature, and performing matrix dimension conversion operation on the feature matrix corresponding to the first predicted coding feature to obtain a converted third feature matrix;
performing full-connection mapping processing on the converted third feature matrix to obtain a second predictive coding feature, and performing nonlinear transformation on the second predictive coding feature to obtain a third predictive coding feature;
performing full-connection mapping processing on the feature matrix corresponding to the third predictive coding feature, performing matrix dimension conversion operation on the feature matrix subjected to full-connection mapping processing to obtain a fourth feature matrix, and determining the third number of restored predictive image features based on the fourth feature matrix, where the feature number in the matrix dimension corresponding to the fourth feature matrix is the third number, and the feature dimension in the matrix dimension corresponding to the fourth feature matrix is: and initially predicting the numerical value of the feature dimension corresponding to the image block feature of the image block.
13. The method according to claim 12, wherein said determining the third number of restored predicted image features based on the fourth feature matrix comprises:
carrying out normalization processing on the fourth feature matrix, and carrying out multiple times of full-connection mapping on the fourth feature matrix after normalization processing to obtain a fifth feature matrix;
determining the third number of restored predicted image features based on the fifth feature matrix and the fourth feature matrix.
14. The method according to any of claims 11-13, wherein said determining a first loss corresponding to the data processing block based on the third number of restored predicted image features and the second predicted image features comprises:
determining a first sub-loss based on the third number of restored predicted image features and the second predicted image features;
performing feature processing operation on the restored predicted image features to obtain first target predicted features corresponding to the restored predicted image features, and determining third predicted classification information corresponding to the first target predicted features;
performing feature processing operation on the second predicted image feature to obtain a second target predicted feature corresponding to the second predicted image feature, and determining fourth predicted classification information corresponding to the second target predicted feature;
determining a second sub-penalty based on the third prediction classification information and the fourth prediction classification information; and determining the first loss based on the first sub-loss and the second sub-loss.
15. The method of claim 14, wherein determining a second sub-penalty based on the third prediction classification information and the fourth prediction classification information comprises:
determining a third sub-loss based on the third prediction classification information and first standard classification information corresponding to the third prediction classification information;
determining a fourth sub-loss based on the fourth prediction classification information and second standard classification information corresponding to the fourth prediction classification information;
determining the second sub-loss based on the third sub-loss and the fourth sub-loss.
16. The method according to any of claims 10 to 15, wherein said determining the predicted loss based on the corresponding first loss and the second loss for each of the data processing blocks comprises:
determining probability prediction loss corresponding to the pruning neural network to be trained based on the first prediction classification information and standard classification information corresponding to the sample image;
determining the predicted loss based on the first loss, the second loss, and the probabilistic predicted loss for each of the data processing blocks.
17. The method according to any of claims 10 to 16, wherein said determining said predicted loss based on said corresponding first loss and said second loss for each of said data processing blocks comprises:
performing feature processing on the sample image by using a pre-trained convolutional neural network to determine fifth prediction classification information corresponding to the sample image;
determining a third loss of the pruning neural network to be trained based on the fifth prediction classification information and the first prediction classification information;
determining the predicted loss based on the first loss, the second loss, and the third loss.
18. An image classification apparatus, comprising:
the first determining module is used for determining a first number of initial image blocks corresponding to a target image and image block characteristics corresponding to each initial image block based on the target image to be processed;
the second determining module is used for determining importance information corresponding to the initial image blocks based on the image block characteristics of the initial image blocks aiming at each initial image block;
the aggregation module is used for aggregating the image block characteristics respectively corresponding to the first number of initial image blocks based on the importance information corresponding to each initial image block to obtain a second number of target image blocks and the image block characteristics corresponding to each target image block; the second number is less than the first number;
and the third determining module is used for determining the image classification result of the target image based on the image block characteristics corresponding to each target image block.
19. A computer device, comprising: a processor, a memory storing machine-readable instructions executable by the processor, the processor for executing the machine-readable instructions stored in the memory, the processor performing the steps of the image classification method according to any one of claims 1 to 17 when the machine-readable instructions are executed by the processor.
20. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, which computer program, when executed by a computer device, performs the steps of the image classification method according to any one of claims 1 to 17.
CN202111275615.1A 2021-10-29 2021-10-29 Image classification method and device, computer equipment and storage medium Pending CN113989566A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111275615.1A CN113989566A (en) 2021-10-29 2021-10-29 Image classification method and device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111275615.1A CN113989566A (en) 2021-10-29 2021-10-29 Image classification method and device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN113989566A true CN113989566A (en) 2022-01-28

Family

ID=79744745

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111275615.1A Pending CN113989566A (en) 2021-10-29 2021-10-29 Image classification method and device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113989566A (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180053057A1 (en) * 2016-08-18 2018-02-22 Xerox Corporation System and method for video classification using a hybrid unsupervised and supervised multi-layer architecture
US20190373293A1 (en) * 2019-08-19 2019-12-05 Intel Corporation Visual quality optimized video compression
CN113268597A (en) * 2021-05-25 2021-08-17 平安科技(深圳)有限公司 Text classification method, device, equipment and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
TANG Y et al.: "Patch Slimming for Efficient Vision Transformers", arXiv:2106.02852, 5 June 2021 (2021-06-05), pages 1-11 *
ZHUOFAN ZONG et al.: "Self-slimmed Vision Transformer", arXiv:2111.12624v1 [cs.CV], 24 November 2021 (2021-11-24), pages 1-14 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114998634A (en) * 2022-08-03 2022-09-02 广州此声网络科技有限公司 Image processing method, image processing device, computer equipment and storage medium
CN114998634B (en) * 2022-08-03 2022-11-15 广州此声网络科技有限公司 Image processing method, image processing device, computer equipment and storage medium
WO2024175014A1 (en) * 2023-02-21 2024-08-29 华为技术有限公司 Image processing method and related device thereof

Similar Documents

Publication Publication Date Title
CN110070183B (en) Neural network model training method and device for weakly labeled data
Gao et al. Global second-order pooling convolutional networks
CN110084216B (en) Face recognition model training and face recognition method, system, device and medium
Zhang et al. Self-supervised convolutional subspace clustering network
CN107506740B (en) Human body behavior identification method based on three-dimensional convolutional neural network and transfer learning model
WO2021042828A1 (en) Neural network model compression method and apparatus, and storage medium and chip
CN108780519B (en) Structural learning of convolutional neural networks
KR102545128B1 (en) Client device with neural network and system including the same
Zhou et al. Stacked extreme learning machines
WO2021022521A1 (en) Method for processing data, and method and device for training neural network model
CN111507378A (en) Method and apparatus for training image processing model
CN112183577A (en) Training method of semi-supervised learning model, image processing method and equipment
CN107844784A (en) Face identification method, device, computer equipment and readable storage medium storing program for executing
CN110222718B (en) Image processing method and device
CN112561027A (en) Neural network architecture searching method, image processing method, device and storage medium
CN109543112A (en) A kind of sequence of recommendation method and device based on cyclic convolution neural network
CN112138403B (en) Interactive behavior recognition method and device, storage medium and electronic equipment
CN113989566A (en) Image classification method and device, computer equipment and storage medium
Chu et al. Stacked Similarity-Aware Autoencoders.
CN113128287A (en) Method and system for training cross-domain facial expression recognition model and facial expression recognition
CN113822264A (en) Text recognition method and device, computer equipment and storage medium
CN114037056A (en) Method and device for generating neural network, computer equipment and storage medium
CN116844041A (en) Cultivated land extraction method based on bidirectional convolution time self-attention mechanism
CN115018039A (en) Neural network distillation method, target detection method and device
CN113743594A (en) Network flow prediction model establishing method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination