CN115761343A

CN115761343A - Image classification method and image classification device based on continuous learning

Info

Publication number: CN115761343A
Application number: CN202211459077.6A
Authority: CN
Inventors: 赵旭鹰; 赵星; 于连源; 姚越宇; 梁小涛
Original assignee: Capital Normal University
Current assignee: Capital Normal University
Priority date: 2022-11-17
Filing date: 2022-11-17
Publication date: 2023-03-07

Abstract

The application provides an image classification method and an image classification device based on continuous learning. The method comprises: determining an image category of a target image based on an image classification model, wherein the image classification model is trained by the following steps: for any model training task, in the process of carrying out each round of training on the initial image classification model according to the model training task: judging whether the task information of the model training task in the round of training has entered a preset credible area; if it has entered the preset credible area, updating, in a first preset mode, the projection matrix corresponding to each layer of the initial image classification model obtained by the previous round of training; if it has not entered the preset credible area, updating the projection matrix corresponding to each layer obtained by the previous round of training in a second preset mode; and finally determining the projection matrix obtained by the round of training. In this way, the accuracy and stability of image classification can be improved when the trained image classification model is applied to identifying images of unknown classes.

Description

Image classification method and image classification device based on continuous learning

Technical Field

The present application relates to the field of deep learning technologies, and in particular, to an image classification method and an image classification apparatus based on continuous learning.

Background

Recently, with the development of deep learning, neural network models are widely applied to the fields of speech recognition, image classification, object detection, and the like. Among them, in the field of image classification, neural network models have achieved many research results, and in many cases, recognition speed and recognition accuracy close to those of humans have been achieved.

However, neural network models do not have the human ability to learn continuously, i.e., they cannot apply the knowledge of one task to a subsequent task while also retaining the previous task when learning the subsequent one. When a neural network model learns a new task, the knowledge learned from the previous task is forgotten. As a result, when the trained image classification model is applied to identifying images of unknown classes, the accuracy of image classification is unstable, the classification accuracy on some image classes is low, and image classification fails. In addition, a neural network model continuously learns the mapping between input data and output data during training, and the knowledge it learns in the early stage of training is not accurate enough. This inaccurate knowledge introduces misleading information that is persistently stored in the model, so that the performance of the trained image classification model is poor and the accuracy of image classification is affected.

Disclosure of Invention

In view of this, the present application aims to provide an image classification method and an image classification apparatus based on continuous learning, which adjust the model parameters by means of a suitable projection matrix, so as to improve the continuous learning capability of the model and to improve the accuracy and stability of image classification when the trained image classification model is applied to identifying images of unknown classes. In addition, the projection matrix is updated only after the task information of model training has entered a preset credible area, which prevents misleading information produced while the training task is not yet credible from being stored in the model, thereby further improving the performance of the image classification model and the accuracy of image classification.

The embodiment of the application provides an image classification method based on continuous learning, which comprises the following steps:

inputting a target image to be classified into a pre-trained image classification model, and determining the image category of the target image; wherein the image classification model is trained by the following steps:

sequentially acquiring a plurality of model training tasks under a continuous learning condition;

sequentially carrying out multiple rounds of training on the initial image classification model according to each model training task to obtain the trained image classification model; aiming at any model training task, in the process of carrying out each round of training on the initial image classification model according to the model training task:

judging whether the task information of the projection matrix corresponding to each layer in the initial image classification model has entered a preset credible area during this round of training of the model training task;

if the task information of the projection matrix corresponding to the layer in the initial image classification model has entered the preset credible area during this round of training, determining, in a first preset mode, the projection matrix corresponding to the layer obtained by this round of training according to the projection matrix corresponding to the layer obtained by the previous round of training; wherein the initial projection matrix corresponding to each layer is an identity matrix whose size matches the size of the model parameters of the initial image classification model, and the projection matrix is used to update the model parameters of the initial image classification model during back propagation.

Further, the method further comprises:

if the task information of the projection matrix corresponding to the layer in the initial image classification model has not entered the preset credible area during this round of training, determining, in a second preset mode, the projection matrix corresponding to the layer obtained by this round of training according to the projection matrix corresponding to the layer obtained by the previous round of training.

Further, each model training task comprises a preset total number of training rounds and a preset round number threshold corresponding to the model training task; judging whether the task information of the projection matrix corresponding to each layer in the initial image classification model has entered the preset credible area during this round of training of the model training task comprises:

acquiring the task information of the projection matrix corresponding to each layer during this round of training of the model training task, wherein the task information comprises the round number of this round of training;

if the difference between the preset total number of training rounds and the round number of this round of training is smaller than the preset round number threshold, judging that the task information of the projection matrix corresponding to each layer has entered the preset credible area during this round of training of the model training task.

Furthermore, each model training task comprises a preset loss value threshold corresponding to the model training task or a preset gradient threshold corresponding to each layer in the model training task; the method for judging whether the task information of the projection matrix corresponding to each layer in the initial image classification model of the model training task enters the preset credible area during the training of the round comprises the following steps:

acquiring task information of a projection matrix corresponding to each layer of the model training task during the round of training; the task information comprises a corresponding loss value in the training of the round or a gradient matrix of each layer obtained through back propagation;

if the corresponding loss value during the round of training is smaller than the preset loss value threshold, judging that the task information of the projection matrix corresponding to each layer of the model training task during the round of training enters a preset credible area;

or, if the sum of the absolute values of the elements in the gradient matrix of a layer is smaller than the preset gradient threshold corresponding to that layer, judging that the task information of the projection matrix corresponding to that layer has entered the preset credible area during this round of training of the model training task.

Further, each model training task comprises a ratio threshold of the gradient norm corresponding to each layer in the model training task; the method for judging whether the task information of the projection matrix corresponding to each layer in the initial image classification model of the model training task enters the preset credible area during the training of the round comprises the following steps:

for the projection matrix corresponding to each layer in the initial image classification model, acquiring the task information of the projection matrix corresponding to the layer during this round of training of the model training task, wherein the task information comprises the ratio of a first norm to a second norm corresponding to the layer in this round of training; the first norm is the norm of the product of the projection matrix corresponding to the layer and the current gradient, i.e., the norm of the projected gradient; the second norm is the norm of the current gradient;

if the ratio is smaller than the ratio threshold of the gradient norm corresponding to the layer, judging that the task information of the projection matrix corresponding to the layer has entered the preset credible area during this round of training of the model training task.
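As an illustration of this criterion, the following minimal Python sketch computes the ratio of the norm of the projected gradient to the norm of the raw gradient and compares it with the layer's threshold. The function name `ratio_in_credible_area`, the use of the Euclidean norm, the flattening of the gradient into a vector, and the handling of a zero gradient are all assumptions made for illustration; the application does not specify them.

```python
import math

def ratio_in_credible_area(P, grad, ratio_threshold):
    """Return (ratio, entered) for one layer.

    P is the layer's projection matrix (a list of rows) and grad is the
    current gradient as a flat vector. The Euclidean norm is an assumption.
    """
    projected = [sum(p * g for p, g in zip(row, grad)) for row in P]  # P @ grad
    first_norm = math.sqrt(sum(v * v for v in projected))  # norm of projected gradient
    second_norm = math.sqrt(sum(g * g for g in grad))       # norm of current gradient
    # Assumption: a zero gradient is given ratio 0.0 to avoid division by zero.
    ratio = first_norm / second_norm if second_norm > 0.0 else 0.0
    return ratio, ratio < ratio_threshold

# Hypothetical 2x2 projection matrix that keeps only the first coordinate.
P = [[1.0, 0.0], [0.0, 0.0]]
ratio, entered = ratio_in_credible_area(P, [3.0, 4.0], ratio_threshold=0.8)
```

Here the projected gradient is `[3.0, 0.0]`, so the ratio is 3/5 and the layer counts as having entered the credible area for a threshold of 0.8.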

Furthermore, each model training task also comprises a sample image set and image labels corresponding to the model training task; determining, in the first preset mode, the projection matrix corresponding to the layer obtained by this round of training according to the projection matrix corresponding to the layer obtained by the previous round of training comprises:

shuffling the sample image set corresponding to the model training task and splitting it into a plurality of sample image subsets;

for any sample image subset, determining the projection matrix corresponding to the layer updated by using the sample image subset, based on the sample image subset and the projection matrix corresponding to the layer updated by using the previous sample image subset; when the sample image subset is the first of the plurality of sample image subsets, taking the projection matrix obtained by the previous round of training, or the initial projection matrix, as the projection matrix updated by using the previous sample image subset;

determining the projection matrix updated by using the last of the plurality of sample image subsets as the projection matrix corresponding to the layer obtained by this round of training.
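The subset-by-subset chaining described above can be sketched as follows. This is a minimal illustration, not the claimed implementation: `split_into_subsets`, `train_one_round`, the `subset_size` parameter, and the `update_fn` callback (a placeholder for whichever per-subset update rule applies) are hypothetical names introduced here.

```python
import random

def split_into_subsets(samples, subset_size, seed=None):
    """Shuffle the sample set and cut it into consecutive subsets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i:i + subset_size]
            for i in range(0, len(shuffled), subset_size)]

def train_one_round(samples, subset_size, P_prev, update_fn, seed=None):
    """Chain the projection-matrix update across all subsets of one round.

    P_prev is the matrix from the previous round (or the initial identity
    matrix); update_fn(P, subset) stands in for the per-subset update rule.
    """
    P = P_prev
    for subset in split_into_subsets(samples, subset_size, seed):
        P = update_fn(P, subset)  # each update starts from the previous subset's result
    return P  # the matrix after the last subset is the result of this round

# Toy check of the chaining, using an integer counter in place of a matrix.
subsets = split_into_subsets(range(10), 4, seed=0)
seen = train_one_round(range(10), 4, 0, lambda P, s: P + len(s), seed=0)
```

In the toy call, the "matrix" is just a counter, which makes it easy to see that every sample is consumed exactly once and that the value returned is the one produced by the last subset.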

Further, when each model training task includes a ratio threshold of the gradient norm corresponding to the model training task, if the task information of the projection matrix corresponding to the layer in the initial image classification model has not entered the preset credible area during this round of training, determining, in the second preset mode, the projection matrix corresponding to the layer obtained by this round of training according to the projection matrix corresponding to the layer obtained by the previous round of training comprises:

for any sample image subset, determining a projection matrix corresponding to the layer updated by using the sample image subset based on the sample image subset and the projection matrix corresponding to the layer updated by using the previous sample image subset;

determining a projection matrix updated by using the last sample image subset in the plurality of sample image subsets as a projection matrix corresponding to the layer obtained by the round of training;

in the process of determining the projection matrix corresponding to the layer updated by using the sample image subset, if the task information of the projection matrix corresponding to the layer in the initial image classification model has not entered the preset credible area during this round of training, calculating, for each image data input into the projection matrix corresponding to the layer, a target ratio of the first norm to the second norm corresponding to the layer by using that image data;

sorting the target ratios calculated for the image data in ascending order, and selecting the m image data corresponding to the first m target ratios, wherein m is a positive integer;

based on the m image data, the projection matrix corresponding to the layer updated by using the previous sample image subset, the projection matrix corresponding to the layer updated by using the sample image subset is determined by the following formula:

wherein i denotes the index of the layer to which the projection matrix corresponds, k denotes the sequence number of the sample image subset, and i and k are positive integers; $P_{i,k}$ denotes the projection matrix corresponding to the i-th layer updated by using the sample image subset; $P_{i,k-1}$ denotes the projection matrix corresponding to the i-th layer updated by using the previous sample image subset; $x'_{i,k}$ denotes the average value of the m image data input into the projection matrix corresponding to the i-th layer; $x'^{T}_{i,k}$ denotes the transpose of $x'_{i,k}$; and α denotes a hyperparameter with α > 0.
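The selection of the m image data and the subsequent matrix update can be sketched as below. The update formula itself is not reproduced in this text, so `owm_style_update` assumes the recursive form commonly used by the OWM algorithm, P − (P x xᵀ P) / (α + xᵀ P x), built from the symbols defined above; this is an assumption, not a statement of the claimed formula. All function names are hypothetical.

```python
def select_m_smallest(image_data, ratios, m):
    """Keep the m items whose target ratios come first in ascending order."""
    order = sorted(range(len(ratios)), key=lambda j: ratios[j])
    return [image_data[j] for j in order[:m]]

def mean_vector(vectors):
    """Element-wise average of equally sized vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def owm_style_update(P_prev, x_mean, alpha):
    """Assumed OWM-style update: P - (P x x^T P) / (alpha + x^T P x)."""
    n = len(x_mean)
    Px = [sum(P_prev[r][c] * x_mean[c] for c in range(n)) for r in range(n)]  # P x
    xP = [sum(x_mean[r] * P_prev[r][c] for r in range(n)) for c in range(n)]  # x^T P
    denom = alpha + sum(x_mean[r] * Px[r] for r in range(n))  # alpha + x^T P x
    return [[P_prev[r][c] - Px[r] * xP[c] / denom for c in range(n)]
            for r in range(n)]

# Hypothetical layer inputs and their target ratios.
data = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
ratios = [0.2, 0.9, 0.1]
chosen = select_m_smallest(data, ratios, m=2)
x_mean = mean_vector(chosen)
P_new = owm_style_update([[1.0, 0.0], [0.0, 1.0]], x_mean, alpha=1.0)
```

With these toy values the two selected inputs both point along the first coordinate, so under the assumed formula the update shrinks the projection along that direction and leaves the second coordinate untouched.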

The embodiment of the present application further provides an image classification device based on continuous learning, the device includes:

the classification module is used for inputting a target image to be classified into a pre-trained image classification model and determining the image category of the target image;

the training module is used for obtaining the image classification model through the following steps:

judging whether task information of a projection matrix corresponding to each layer in the initial image classification model of the model training task enters a preset credible area during the round of training; the initial projection matrix corresponding to each layer is an identity matrix with the same matrix scale as the model parameter scale of the initial image classification model; the projection matrix is used for updating model parameters of the initial image classification model during backward propagation;

if the task information of the projection matrix corresponding to the layer in the initial image classification model has entered the preset credible area during this round of training, determining, in a first preset mode, the projection matrix corresponding to the layer obtained by this round of training according to the projection matrix corresponding to the layer obtained by the previous round of training.

An embodiment of the present application further provides an electronic device, including: a processor, a memory and a bus, the memory storing machine-readable instructions executable by the processor, the processor and the memory communicating via the bus when the electronic device is running, the machine-readable instructions being executable by the processor to perform the steps of a method for image classification based on continuous learning as described above.

Embodiments of the present application also provide a computer-readable storage medium, on which a computer program is stored, where the computer program is executed by a processor to perform the steps of the image classification method based on continuous learning as described above.

The embodiment of the application provides an image classification method and an image classification device based on continuous learning. The method comprises: inputting a target image to be classified into a pre-trained image classification model, and determining the image category of the target image; wherein the image classification model is trained by the following steps: sequentially acquiring a plurality of model training tasks under a continuous learning condition; sequentially carrying out multiple rounds of training on the initial image classification model according to each model training task to obtain the trained image classification model; for any model training task, in the process of carrying out each round of training on the initial image classification model according to the model training task: judging whether the task information of the projection matrix corresponding to each layer in the initial image classification model has entered a preset credible area during this round of training of the model training task; if the task information of the projection matrix corresponding to the layer has entered the preset credible area during this round of training, determining, in a first preset mode, the projection matrix corresponding to the layer obtained by this round of training according to the projection matrix corresponding to the layer obtained by the previous round of training; wherein the initial projection matrix corresponding to each layer is an identity matrix whose size matches the size of the model parameters of the initial image classification model, and the projection matrix is used to update the model parameters of the initial image classification model during back propagation.

Therefore, by adjusting the model parameters with a suitable projection matrix, the continuous learning capability of the model can be improved, and the accuracy and stability of image classification are improved when the trained image classification model is applied to identifying images of unknown classes. In addition, the projection matrix is updated only after the task information of model training has entered a preset credible area, which prevents misleading information produced while the training task is not yet credible from being stored in the model, thereby further improving the performance of the image classification model and the accuracy of image classification.

In order to make the aforementioned objects, features and advantages of the present application more comprehensible, preferred embodiments accompanied with figures are described in detail below.

Drawings

In order to illustrate the technical solutions of the embodiments of the present application more clearly, the drawings required by the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the present application and therefore should not be regarded as limiting its scope; those skilled in the art can obtain other related drawings from these drawings without inventive effort.

Fig. 1 shows a first flowchart of a training method of an image classification model according to an embodiment of the present application;

Fig. 2 shows a second flowchart of a training method of an image classification model according to an embodiment of the present application;

Fig. 3 shows a schematic structural diagram of an apparatus for training an image classification model according to an embodiment of the present application;

Fig. 4 shows a schematic structural diagram of an electronic device according to an embodiment of the present application.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all the embodiments. The components of the embodiments of the present application, as generally described and illustrated in the figures herein, could be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present application, presented in the accompanying drawings, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. Every other embodiment that one skilled in the art can obtain without inventive effort based on the embodiments of the present application falls within the scope of protection of the present application.

It is found through research that recently, with the development of deep learning, neural network models are widely applied in the fields of speech recognition, image classification, target detection, and the like. Among them, in the field of image classification, neural network models have achieved many research results, and in many cases, recognition speed and recognition accuracy close to those of humans have been achieved.

However, neural network models do not have the human ability to learn continuously, i.e., they cannot apply the knowledge of one task to a later task while also retaining the previous task when learning the later one. When a neural network model learns a new task, the knowledge learned from the previous task is forgotten. As a result, when the trained image classification model is applied to identifying images of unknown classes, the accuracy of image classification is unstable, the classification accuracy on some image classes is low, and image classification fails. In addition, a neural network model continuously learns the mapping between input data and output data during training, and the knowledge it learns in the early stage of training is not accurate enough. This introduces misleading information that is persistently stored in the model, so that the performance of the trained image classification model is poor and the accuracy of image classification is affected.

Based on this, the embodiment of the application provides an image classification method and an image classification device based on continuous learning, in which the model parameters are adjusted by means of an optimized, more suitable projection matrix. This improves the continuous learning capability of the model and improves the accuracy and stability of image classification when the trained image classification model is applied to identifying images of unknown classes. In addition, the projection matrix is updated only after the task information of model training has entered a preset credible area, so that misleading information produced while the training task is not yet credible is not stored in the model, which further improves the performance of the image classification model and the accuracy of image classification.

The image classification method based on continuous learning provided by the embodiment of the application comprises the following steps: inputting a target image to be classified into a pre-trained image classification model, and determining the image category of the target image.

Here, the pre-trained image classification model may be a neural network model, such as a convolutional neural network (CNN) or a recurrent neural network (RNN). Through a multi-layer network structure formed by neurons, the image classification model can extract feature information at different levels from the input target image and then determine the image category of the target image. Each layer of the neural network model is provided with a projection matrix, as in the orthogonal weights modification (OWM) algorithm, such that the projected gradient is orthogonal to the input space of the layer.

In a specific application, the image classification model can recognize handwritten digit images and determine the digits they contain; it can also recognize photographed images and determine the type of animal shown, such as cats, dogs, and birds; and it can further identify the specific breed of each animal, such as whether the dog in the photographed image is a caudad, a golden retriever, a Labrador, and the like.

Further, the training method for the image classification model provided in the embodiment of the present application may include:

Step 1: sequentially acquiring a plurality of model training tasks under a continuous learning condition.

It should be noted that, under the continuous learning condition, when the image classification model is trained on the next model training task, the data of the previous model training task is often lost or overwritten, so that when the neural network model learns a new task, the knowledge learned from the previous task is forgotten.

Step 2: sequentially carrying out multiple rounds of training on the initial image classification model according to each model training task to obtain the trained image classification model.

In a specific implementation, different model training tasks may be used to train the ability of the image classification model to recognize different types of images. Following the above example, the model training task Task1 may be used to train the image classification model to recognize images of cats, and the model training task Task2 may be used to train the image classification model to recognize images of dogs.

It should be noted that a plurality of model training tasks are usually stored in a model training queue and executed in sequence. After multiple rounds of training of the initial image classification model have been completed according to one model training task Task_i, multiple rounds of training continue according to another model training task Task_n, and so on, until all model training tasks in the model training queue have been executed and the trained image classification model is obtained.
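The sequential execution of the model training queue can be sketched as follows. This is only an outline under stated assumptions: `run_task_queue`, the `(name, num_rounds)` task tuples, and the `train_round` callback are hypothetical, and the "model" in the usage line is a plain list used to log which rounds ran.

```python
from collections import deque

def run_task_queue(tasks, model, train_round):
    """Run every queued task in order, each for its own number of rounds.

    tasks is an iterable of (name, num_rounds) pairs; train_round(model,
    name, epoch) performs one round of training and returns the model.
    """
    queue = deque(tasks)
    while queue:
        name, num_rounds = queue.popleft()  # next task in queue order
        for epoch in range(1, num_rounds + 1):
            model = train_round(model, name, epoch)
    return model  # the same model carried through all tasks

# Toy run: the "model" is a list that records the (task, round) pairs executed.
log = run_task_queue([("Task1", 2), ("Task2", 1)], [],
                     lambda m, name, epoch: m + [(name, epoch)])
```

The point of the sketch is that a single model object is passed from one task to the next, which is exactly the setting in which knowledge from earlier tasks can be overwritten.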

In the prior art, when an image classification model finishes an old model training task and starts a new one, the knowledge learned from the previous task may be abruptly forgotten, i.e., the problem of "catastrophic forgetting" occurs. In the early stage of a new model training task, the image classification model has not yet adapted to the new training task data and the knowledge it learns is not accurate enough; this inaccurate knowledge often introduces misleading information into the neural network, for example by wrongly changing the weights of neurons, and the misleading information is persistently stored in the model. Both situations cause the technical problems that the performance of the trained image classification model is poor and that the accuracy and stability of image classification are affected.

The following describes in detail a training method of the image classification model provided in the embodiment of the present application.

Referring to fig. 1, fig. 1 is a flowchart illustrating a method for training an image classification model according to an embodiment of the present disclosure. As shown in fig. 1, for any model training task, the process of performing each training round of the multiple training rounds on the initial image classification model according to the model training task may include:

S101, judging whether the task information of the projection matrix corresponding to each layer in the initial image classification model has entered a preset credible area during this round of training of the model training task.

The model parameters of the initial image classification model are initialized to $w_{i,0}$ (generally following a Gaussian distribution), and an initial projection matrix $P_{i,0}$ is obtained by initialization, where i denotes the index of the layer to which the projection matrix corresponds. The gradient projected by the initial projection matrix is orthogonal to the input space of the initial image classification model. The initial projection matrix is an identity matrix whose size matches the size of the model parameters of the initial image classification model (the number of rows of the initial projection matrix is the same as the number of rows of the model parameter matrix), namely $P_{i,0} = I_i$. The projection matrix is used to update the model parameters of the initial image classification model in the back propagation stage of the model training process.
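A minimal sketch of this initialization and of a projected parameter update is given below. The update rule w ← w − lr · (P · grad) is the usual OWM-style step and is assumed here rather than quoted from the application; the names `identity` and `projected_step`, the learning rate `lr`, and the treatment of the layer parameters as a flat vector are illustrative assumptions.

```python
def identity(n):
    """Initial projection matrix P_{i,0} = I for a layer with n rows."""
    return [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]

def projected_step(w, grad, P, lr):
    """One parameter update using the projected gradient (assumed rule)."""
    projected = [sum(p * g for p, g in zip(row, grad)) for row in P]  # P @ grad
    return [wi - lr * gi for wi, gi in zip(w, projected)]

P0 = identity(3)
w1 = projected_step([0.5, -0.2, 0.1], [1.0, 2.0, 3.0], P0, lr=0.1)
```

Because the initial projection matrix is the identity, the projected gradient equals the raw gradient, so the first update under this assumed rule is an ordinary gradient-descent step.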

In a first possible implementation manner, when each model training task includes a preset total number of training rounds corresponding to the model training task and a preset round threshold, step S101 may include:

S1011, acquiring the task information of the projection matrix corresponding to each layer during this round of training of the model training task.

Here, the task information comprises the round number of this round of training.

S1012, if the difference between the preset total number of training rounds and the round number of this round of training is smaller than the preset round number threshold, judging that the task information of the projection matrix corresponding to each layer has entered the preset credible area during this round of training of the model training task.

That is, assuming that this round of training is the epoch-th round, the preset total number of training rounds is N, and the preset round number threshold is r, then N − epoch < r indicates that the task information of the model training task during the epoch-th round of training has entered the preset credible area.
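The round-number criterion N − epoch < r amounts to a one-line check. The sketch below uses hypothetical names (`round_in_credible_area`, `total_rounds`, `round_threshold`) and assumes rounds are numbered from 1.

```python
def round_in_credible_area(epoch, total_rounds, round_threshold):
    """True when N - epoch < r, i.e. only the last few rounds qualify."""
    return total_rounds - epoch < round_threshold

# With N = 10 and r = 3, flag each round from 1 to 10.
flags = [round_in_credible_area(e, total_rounds=10, round_threshold=3)
         for e in range(1, 11)]
```

For N = 10 and r = 3, only rounds 8, 9, and 10 satisfy the condition, so the early rounds are kept outside the credible area.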

It should be appreciated that, in general, the knowledge learned by a neural network in the early stage of training is not sufficiently accurate. As training proceeds, the neural network gradually learns the mapping rule of the real data. With this implementation, the image classification model is not affected by the inaccurate knowledge produced during the training rounds that fall outside the preset credible area.

In a second possible implementation manner, when each model training task includes a preset loss threshold corresponding to the model training task or a preset gradient threshold corresponding to each layer in the model training task, step S101 may include:

S1013, acquiring the task information of the projection matrix corresponding to each layer during this round of training of the model training task.

Here, the task information comprises the loss value of this round of training or the gradient matrix of each layer obtained through back propagation.

S1014, if the loss value of this round of training is smaller than the preset loss value threshold, judging that the task information of the projection matrix corresponding to each layer has entered the preset credible area during this round of training of the model training task; or, if the sum of the absolute values of the elements in the gradient matrix of a layer (the L1 norm of the gradient matrix) is smaller than the preset gradient threshold corresponding to that layer, judging that the task information of the projection matrix corresponding to that layer has entered the preset credible area during this round of training of the model training task.

That is, assume that the preset loss threshold corresponding to this round of training is L and that the loss value obtained by forward propagation of the current input data x is loss; if loss ≤ L, the task information of the projection matrix corresponding to each layer of the model training task has entered the preset credible area in this round of training. Alternatively, assume that, for input data x, the gradient matrix of the i-th layer is $\Delta w_{i\_x}$ and the preset gradient threshold corresponding to the i-th layer in this round of training is $W_i$; if the sum of the absolute values of the elements of the gradient matrix of the i-th layer (the L1 norm of the gradient matrix) satisfies

$$\|\Delta w_{i\_x}\|_{L1} \le W_i$$

then the task information of the projection matrix corresponding to the i-th layer of the model training task has entered the preset credible area in this round of training.

It should be appreciated that the loss value and the L1 norm of the gradient matrix are typically large in the early stage of neural network training. As training continues, they drop rapidly into a certain range and then fluctuate within a small range, and the output of each layer of the network comes closer to the output that is optimal for overcoming catastrophic forgetting. With this implementation, when a batch of data is propagated forward, the batch is recorded by the projection matrix only if the calculated loss value or L1 norm of the gradient matrix is not higher than the corresponding threshold, so that much inaccurate information is discarded and the projection matrix is more useful in practice. Because the value of the loss function fluctuates throughout training, some inaccurate information, corresponding to a high loss value or a high L1 norm of the gradient matrix, still arises even in the late stage of training. With this implementation, the method provided by the embodiment of the application can therefore adaptively discard such inaccurate information even in the late stage of network training, which is more favorable for obtaining an image classification model with good classification performance and thus improves the accuracy and stability of image classification when the model is applied to identifying images of unknown classes.
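A minimal sketch of the two threshold tests follows. The function names, the example gradient matrices and the threshold values are all hypothetical; the non-strict comparison follows the explanatory formula above.

```python
import numpy as np


def loss_in_credible_area(loss, loss_threshold):
    """Loss criterion: the batch is trusted when loss <= L."""
    return loss <= loss_threshold


def layer_gradient_in_credible_area(grad, grad_threshold):
    """Gradient criterion for one layer: sum of |elements| of the gradient <= W_i."""
    l1_norm = float(np.abs(grad).sum())
    return l1_norm <= grad_threshold


# Hypothetical gradient matrices for a two-layer model and per-layer thresholds W_i.
grads = [np.array([[0.1, -0.2], [0.05, 0.0]]), np.array([[1.5, -2.0]])]
thresholds = [0.5, 1.0]
trusted_layers = [
    layer_gradient_in_credible_area(g, w) for g, w in zip(grads, thresholds)
]
```

Here the first layer (L1 norm 0.35) passes its threshold while the second (L1 norm 3.5) does not, which shows that the gradient criterion can give a different answer for each layer.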

In a third possible implementation manner, when each model training task includes a threshold value of a ratio of gradient norms corresponding to each layer in the model training task, step S101 may include:

S1015: for the projection matrix corresponding to each layer of the initial image classification model, acquire the task information of the projection matrix corresponding to that layer of the model training task in this round of training.

Wherein the task information comprises a ratio of a first norm to a second norm corresponding to the layer in the round of training; the first norm is the norm of the product of the projection matrix corresponding to the layer after projection and the current gradient; the second norm is a norm of the current gradient.

S1016: if the ratio is smaller than the ratio threshold of the gradient norm corresponding to the layer, it is determined that the task information of the projection matrix corresponding to the layer of the model training task has entered the preset credible area in this round of training.

It should be noted that the third implementation must determine layer by layer whether the task information of the projection matrix corresponding to each layer has entered the preset credible area. This differs from the first two implementations, in which a single determination can establish whether the task information of the projection matrix corresponding to every layer has entered the preset credible area.

Specifically, assuming that the ratio threshold of the gradient norm corresponding to the model training task is β, when the ratio of the first norm to the second norm corresponding to the layer in this round of training satisfies the following formula (1), it is determined that the task information of the projection matrix corresponding to the layer of the model training task has entered the preset credible area in this round of training:

$$\frac{\|P'_{i\_k}\,\Delta w_{i\_k-1}\|}{\|\Delta w_{i\_k-1}\|} < \beta \qquad (1)$$

In the formula, $\Delta w_{i\_k-1}$ represents the current gradient; $P'_{i\_k}$ represents the projection matrix corresponding to the layer after projection; $\|P'_{i\_k}\,\Delta w_{i\_k-1}\|$ represents the first norm; and $\|\Delta w_{i\_k-1}\|$ represents the second norm.
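The ratio test can be illustrated as below. The text does not say which matrix norm is used, so the Frobenius norm (NumPy's default for matrices) is an assumption here, as are the function names and the handling of a zero gradient.

```python
import numpy as np


def gradient_norm_ratio(projection, grad):
    """Ratio of ||P' @ grad|| (first norm) to ||grad|| (second norm)."""
    second_norm = np.linalg.norm(grad)
    if second_norm == 0.0:
        return 0.0  # assumption: a zero gradient is treated as ratio 0
    return float(np.linalg.norm(projection @ grad) / second_norm)


def layer_in_credible_area_by_ratio(projection, grad, beta):
    """Layer-wise criterion: the ratio must be smaller than the threshold beta."""
    return gradient_norm_ratio(projection, grad) < beta


# Projector onto the second coordinate axis: it removes the first row of the gradient.
P = np.array([[0.0, 0.0], [0.0, 1.0]])
g = np.array([[3.0], [4.0]])
ratio = gradient_norm_ratio(P, g)  # ||(0, 4)|| / ||(3, 4)|| = 4 / 5
```

A small ratio means that the projection removes most of the gradient, so the threshold β bounds how much of the current gradient may survive projection for the layer to count as trusted.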

It should be noted that the three implementations described above may each be used independently to determine whether the task information of the projection matrix corresponding to each layer has entered the preset credible area, or may be combined with one another. For example, the first and second implementations may be combined so that the task information of the projection matrix corresponding to each layer of the model training task is determined to have entered the preset credible area in this round of training only when both conditions are met, namely that the difference between the preset total number of training rounds and the round number of this round is smaller than the preset round threshold and that the loss value in this round is smaller than the preset loss threshold. As another example, the first and second implementations may be combined so that the task information is determined to have entered the preset credible area when either of the two conditions is met. As a further example, the first and/or second implementation may be combined with the third: after the task information of the projection matrix corresponding to each layer is determined, according to the first and/or second implementation, to have entered the preset credible area in this round of training, whether the task information of the projection matrix corresponding to each layer has entered the preset credible area is further determined layer by layer according to the third implementation.
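One way to express these combinations is sketched below. The function name, the `require_both` flag and the boolean inputs are illustrative assumptions; the three criteria are assumed to have been evaluated beforehand.

```python
def combined_credible_decision(round_ok, loss_ok, ratio_ok_per_layer, require_both=True):
    """Combine the two global criteria, then refine the result layer by layer.

    require_both=True  -> both the round and the loss condition must hold (AND).
    require_both=False -> either condition suffices (OR).
    """
    global_ok = (round_ok and loss_ok) if require_both else (round_ok or loss_ok)
    # The layer-wise ratio test only matters once the global test has passed.
    return [global_ok and layer_ok for layer_ok in ratio_ok_per_layer]


# Hypothetical outcome: round condition met, loss condition not met, three layers.
decision_and = combined_credible_decision(True, False, [True, False, True], require_both=True)
decision_or = combined_credible_decision(True, False, [True, False, True], require_both=False)
```

With the AND combination no layer is trusted in this example, whereas the OR combination lets the layer-wise ratio test decide.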

S102: if the task information of the projection matrix corresponding to the layer in the initial image classification model has entered the preset credible area in this round of training, determine, in a first preset manner, the projection matrix corresponding to the layer for this round of training from the projection matrix corresponding to the layer in the initial image classification model obtained in the previous round of training.

Therefore, by introducing the concept of the credible area and updating the projection matrix only after the task information of model training has entered the preset credible area, the misleading information that a training task produces in the model while in an unreliable state can be avoided, making the model immune to the training task in that state. This avoids using the projection matrix to update the model parameters too early, while the reliability of the model parameters of the image classification model is still low, thereby improving the convergence accuracy of the image classification model, its performance, and the accuracy of image classification. The technical effect is one of "discarding the dross and keeping the essence", which conforms to the general way in which the human brain acquires knowledge.

Further, each model training task also includes a sample image set and an image label corresponding to the model training task. Referring to fig. 2, fig. 2 is a second flowchart of a training method of an image classification model according to an embodiment of the present disclosure. As shown in fig. 2, determining, in the first preset manner, the projection matrix corresponding to the layer for this round of training from the projection matrix corresponding to the layer in the initial image classification model obtained in the previous round of training may include:

S201: shuffle the sample image set corresponding to the model training task and divide it into a plurality of sample image subsets.

In this step, the plurality of sample images in the sample image set corresponding to the model training task are randomly shuffled and divided into a plurality of mini-batch sample image subsets. In each round of training of the model training task, the image classification model is trained with the sample image subsets one after another in the subsequent steps; the round is considered finished once training with the last sample image subset is complete. Each sample image subset includes at least one sample image. The sample images corresponding to each model training task have the same image label, and different model training tasks correspond to different image labels.
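Step S201 can be sketched with the Python standard library alone. The function name, the fixed `batch_size` and the optional `seed` are assumptions made for illustration; the text only requires that the set be randomly shuffled and that each subset contain at least one image.

```python
import random


def shuffle_and_split(samples, batch_size, seed=None):
    """Randomly shuffle `samples` and cut them into consecutive mini-batches."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    rng = random.Random(seed)
    shuffled = list(samples)  # copy so the caller's sequence is left untouched
    rng.shuffle(shuffled)
    return [shuffled[i:i + batch_size] for i in range(0, len(shuffled), batch_size)]


# Ten placeholder sample identifiers split into subsets of at most four.
subsets = shuffle_and_split(range(10), batch_size=4, seed=0)
```

The last subset may be smaller than the others, which is compatible with the requirement that every subset hold at least one sample image.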

S202, aiming at any sample image subset, determining a projection matrix corresponding to the layer updated by the sample image subset based on the sample image subset and the projection matrix corresponding to the layer updated by the previous sample image subset.

When the sample image subset is a first sample image subset in the plurality of sample image subsets, taking a projection matrix obtained in a previous training round of the training (the training round is not the first training round) or the initial projection matrix (the training round is the first training round) as the projection matrix updated by using the previous sample image subset.

In a specific implementation, the projection matrix corresponding to the layer updated by using the sample image subset may be determined, based on the sample image subset and the projection matrix corresponding to the layer updated by using the previous sample image subset, by the following formula:

$$P_{i\_k} = P_{i\_k-1} - \frac{P_{i\_k-1}\, x_{i\_k}\, x_{i\_k}^{T}\, P_{i\_k-1}}{\alpha + x_{i\_k}^{T}\, P_{i\_k-1}\, x_{i\_k}}$$

wherein i denotes the index of the layer to which the projection matrix corresponds, k denotes the sequence number of the sample image subset, and i and k are positive integers; $P_{i\_k}$ represents the projection matrix corresponding to the i-th layer updated by using the sample image subset, namely the k-th sample subset; $P_{i\_k-1}$ represents the projection matrix corresponding to the i-th layer updated by using the previous sample image subset, namely the (k-1)-th sample subset; $x_{i\_k}$ represents the average of the plurality of input data input to the projection matrix corresponding to the i-th layer; $x_{i\_k}^{T}$ represents the transpose of $x_{i\_k}$; α represents a hyperparameter, and α > 0.

Here, when a plurality of sample images are included in the sample image subset, x _{i_k} The average value can be obtained by performing matrix operation on image data of a plurality of sample images.

S203: determine the projection matrix updated by using the last of the plurality of sample image subsets as the projection matrix corresponding to the layer obtained by this round of training.
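Steps S202 and S203 can be sketched as a loop over the sample image subsets. The sketch assumes that the update takes the recursive form used by the orthogonal weights modification (OWM) algorithm, P - P x x^T P / (alpha + x^T P x), and that each input to the layer has been flattened into a vector; the function names and example data are hypothetical.

```python
import numpy as np


def update_projection(P_prev, x_mean, alpha):
    """One assumed OWM-style step: P - P x x^T P / (alpha + x^T P x)."""
    x = x_mean.reshape(-1, 1)                 # column vector, shape (n, 1)
    Px = P_prev @ x                           # shape (n, 1)
    denom = alpha + (x.T @ Px).item()         # scalar alpha + x^T P x
    return P_prev - (Px @ (x.T @ P_prev)) / denom


def projection_after_round(P_start, batches, alpha):
    """S202: update once per subset; S203: return the matrix after the last subset."""
    P = P_start
    for batch in batches:
        P = update_projection(P, batch.mean(axis=0), alpha)
    return P


P0 = np.eye(2)                                     # initial projection matrix (identity)
batches = [np.array([[1.0, 0.0], [1.0, 0.0]])]     # one subset of two layer inputs
P1 = projection_after_round(P0, batches, alpha=1e-3)
```

After this single update the direction of the averaged input, (1, 0), is almost entirely removed from the projection, while the orthogonal direction (0, 1) is left unchanged.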

Further, after the sample image set corresponding to the model training task is shuffled and divided into a plurality of sample image subsets in S201, each round of training of the initial image classification model according to the model training task further includes:

Step 1: for each sample image subset, input the sample image subset into the image classification model obtained by iterative training with the previous sample image subset, and determine the image classification result of the sample image subset.

In this round of training, if the sample image subset is the first of the plurality of sample image subsets, the image classification model obtained in the previous round of training (when this round is not the first round) or the initially set initial image classification model (when this round is the first round) may be used as the image classification model obtained by iterative training with the previous sample image subset.

In this step, for each sample image subset, the sample image subset is input to an image classification model obtained by iterative training using a previous sample image subset, and an image classification result of the sample image subset output by the image classification model is obtained through a forward propagation process of the model.

Step 2: determine the loss value corresponding to the sample image subset based on the image classification result of the sample image subset and the image label.

In this step, the loss value corresponding to the sample image subset may be determined from the image classification result and the image label of the sample image subset in any manner known in the prior art; for example, the image classification result and the image label may be substituted into a cross-entropy loss function.
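As one example of such a loss, a cross-entropy computation is sketched below. It assumes that the image classification result is available as a matrix of raw class scores (logits) and that the image labels are integer class indices; both representations and the function name are assumptions.

```python
import numpy as np


def cross_entropy_loss(logits, labels):
    """Mean cross-entropy of integer class labels given raw class scores."""
    shifted = logits - logits.max(axis=1, keepdims=True)   # for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())


# Two samples, four classes, identical scores: the prediction is uniform.
logits = np.zeros((2, 4))
labels = np.array([0, 3])
loss = cross_entropy_loss(logits, labels)   # equals ln 4 for a uniform prediction
```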

Step 3: determine the current gradient of the loss value corresponding to the sample image subset with respect to the current initial image classification model.

Step 4: update the model parameters of the initial image classification model based on its current model parameters, the current gradient of the loss value corresponding to the sample image subset with respect to the current initial image classification model, and the projection matrix updated by using the sample image subset, and take the image classification model obtained after the update as the new initial image classification model, thereby completing the training of the initial image classification model with the sample image subset in this round of training.

In the step, through methods such as gradient descent, error back propagation and the like, the gradient direction of the initial image classification model is corrected based on a projection matrix orthogonal to an input space in the process of back propagation, so that model parameters are updated, the 'catastrophic forgetting problem' is overcome, the continuous learning capacity of the model is improved, and the accuracy and the stability of image classification of the trained image classification model are improved when the trained image classification model is applied to recognizing images of unknown classes.

In a specific implementation, based on the current model parameters of the initial image classification model, the current gradient of the loss value corresponding to the sample image subset with respect to the current initial image classification model, and the projection matrix updated by using the sample image subset, the model parameters of the initial image classification model can be updated according to the following formula:

$$w_{i\_k} = w_{i\_k-1} - \gamma \cdot P_{i\_k} \cdot \Delta w_{i\_k-1}$$

wherein i denotes the index of the layer to which the projection matrix corresponds, k denotes the sequence number of the sample image subset, and i and k are positive integers; $P_{i\_k}$ represents the projection matrix of the i-th layer updated by using the sample image subset; $w_{i\_k}$ represents the model parameter matrix of the i-th layer of the initial image classification model after the model parameters are updated; $w_{i\_k-1}$ represents the current model parameter matrix of the i-th layer of the initial image classification model before the model parameters are updated; $\Delta w_{i\_k-1}$ represents the current gradient matrix of the loss value corresponding to the sample image subset with respect to the i-th layer of the current initial image classification model; γ represents the learning rate.
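The parameter update can be written directly from the formula. The sketch assumes that the weight and gradient matrices of a layer have shape (n_in, n_out) and that the projection matrix has shape (n_in, n_in); the function name and the example values are hypothetical.

```python
import numpy as np


def projected_gradient_step(w_prev, grad, P, learning_rate):
    """w_k = w_{k-1} - gamma * P_k @ grad, applied to one layer."""
    return w_prev - learning_rate * (P @ grad)


# Projection that blocks updates along the first input direction.
P = np.array([[0.0, 0.0], [0.0, 1.0]])
w_prev = np.array([[1.0], [1.0]])
grad = np.array([[2.0], [2.0]])
w_new = projected_gradient_step(w_prev, grad, P, learning_rate=0.1)
```

In this example the first weight is left at 1.0 because the projection removes that component of the gradient, while the second weight moves from 1.0 to 0.8.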

Referring back to fig. 1, S103: if the task information of the projection matrix corresponding to the layer in the initial image classification model has not entered the preset credible area in this round of training, determine, in a second preset manner, the projection matrix corresponding to the layer for this round of training from the projection matrix corresponding to the layer in the initial image classification model obtained in the previous round of training.

In this step, for the first and second implementations of S101, if the task information in this round of training has not entered the preset credible area, then $P_{i\_k} = P_{i\_k-1}$, where i denotes the index of the layer to which the projection matrix corresponds. Accordingly, when this round of training is the first of the multiple rounds of training, the initial projection matrix $P_0$ may be used as the projection matrix obtained from the previous round of training.

It should be noted that even if the projection matrix obtained from the previous training round is not updated, the projection matrix obtained from the training round is also used for updating the model parameters of the initial image classification model in the back propagation stage of the model training round.

Further, for the third implementation in S101, step S103 may include:

S1031: shuffle the sample image set corresponding to the model training task and divide it into a plurality of sample image subsets.

S1032: for any sample image subset, determine the projection matrix corresponding to the layer updated by using the sample image subset, based on the sample image subset and the projection matrix corresponding to the layer updated by using the previous sample image subset.

S1033: determine the projection matrix updated by using the last of the plurality of sample image subsets as the projection matrix corresponding to the layer obtained by this round of training.

Here, for S1031 to S1033, reference may be made to the descriptions of S201 to S203; the same technical effects can be achieved, and details are not repeated here.

In the process of determining the projection matrix corresponding to the layer obtained by updating the sample image subset, if the task information of the projection matrix corresponding to the layer in the initial image classification model does not enter a preset credible region in the training round, then:

(1) For each image data input to the projection matrix corresponding to the layer, calculate the target ratio of the first norm to the second norm corresponding to the layer using that image data.

(2) Sort the target ratios calculated for the individual image data in ascending order, and select the m image data corresponding to the first m target ratios, where m is a positive integer.

(3) Based on the m image data and the projection matrix corresponding to the layer updated by using the previous sample image subset, determine the projection matrix corresponding to the layer updated by using the sample image subset by the following formula:

$$P_{i\_k} = P_{i\_k-1} - \frac{P_{i\_k-1}\, x'_{i\_k}\, {x'}_{i\_k}^{T}\, P_{i\_k-1}}{\alpha + {x'}_{i\_k}^{T}\, P_{i\_k-1}\, x'_{i\_k}}$$

wherein i denotes the index of the layer to which the projection matrix corresponds, k denotes the sequence number of the sample image subset, and i and k are positive integers; $P_{i\_k}$ represents the projection matrix corresponding to the i-th layer updated by using the sample image subset; $P_{i\_k-1}$ represents the projection matrix corresponding to the i-th layer updated by using the previous sample image subset; $x'_{i\_k}$ represents the average of the m image data input to the projection matrix corresponding to the i-th layer; ${x'}_{i\_k}^{T}$ represents the transpose of $x'_{i\_k}$; α represents a hyperparameter, and α > 0.
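Steps (1) to (3) can be sketched as follows. Several points are assumptions made only for illustration: a per-sample gradient is taken to be available for each image data, the ratio is computed with the projection matrix from the previous subset and the Frobenius norm, gradients are assumed to be non-zero, and the update is assumed to take the OWM recursive form. All names and example values are hypothetical.

```python
import numpy as np


def select_low_ratio_inputs(P, per_sample_grads, inputs, m):
    """Steps (1)-(2): compute each ratio, sort ascending, keep the first m inputs."""
    ratios = np.array(
        [np.linalg.norm(P @ g) / np.linalg.norm(g) for g in per_sample_grads]
    )
    keep = np.argsort(ratios, kind="stable")[:m]
    return inputs[keep], ratios


def update_projection(P_prev, x_mean, alpha):
    """Step (3), assumed OWM form: P - P x x^T P / (alpha + x^T P x)."""
    x = x_mean.reshape(-1, 1)
    Px = P_prev @ x
    return P_prev - (Px @ (x.T @ P_prev)) / (alpha + (x.T @ Px).item())


P_prev = np.array([[0.0, 0.0], [0.0, 1.0]])
inputs = np.array([[1.0, 0.0], [0.0, 1.0], [3.0, 4.0]])
# Hypothetical per-sample gradients, set equal to the inputs purely for illustration.
per_sample_grads = [row.reshape(-1, 1) for row in inputs]
selected, ratios = select_low_ratio_inputs(P_prev, per_sample_grads, inputs, m=2)
P_new = update_projection(P_prev, selected.mean(axis=0), alpha=1.0)
```

The sample with the largest ratio (the second one, ratio 1.0) is excluded, so only the two remaining inputs contribute to the average used in the update.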

With this implementation, the similarity of the input data of the projection matrix between training tasks or between sample image subsets (batches) can be adjusted, so that no outlier data participates in the update of the projection matrix. When a new task is trained, outlier data is eliminated before back propagation is applied, the input data of the projection matrix is kept as balanced as possible, and the gradient of a new sample is projected onto a data space orthogonal to the previously learned data.

In an experiment, an image classification model was designed on the basis of the orthogonal weights modification (OWM) algorithm with an added orthogonal projection matrix. The first nine layers of the image classification model are an alternating combination of convolutional layers, dropout layers and max pooling layers. The three convolutional layers have 64, 128 and 256 filters respectively, with the convolution kernel size uniformly set to 2 × 2, and serve mainly to extract features. The windows of the three max pooling layers are uniformly set to 2 × 2; the pooling layers reduce the dimension of the feature maps, preserve the feature invariance of the image, and prevent overfitting to a certain extent. The last three layers of the image classification model are dense layers with [1000-1000-10] neurons. The image classification model uses a cross-entropy loss function with an added L2 regularization term, ReLU as the activation function, and the Xavier method for weight initialization. The experimental results show that the image classification method based on continuous learning provided by the embodiment of the application achieves a marked improvement in image classification performance.

The embodiment of the application provides an image classification method based on continuous learning, which comprises the following steps: inputting a target image to be classified into a pre-trained image classification model, and determining the image category of the target image; wherein the image classification model is trained by the following steps: sequentially acquiring a plurality of model training tasks under a continuous learning condition; sequentially carrying out multiple rounds of training on the initial image classification model according to each model training task to obtain the trained image classification model; aiming at any model training task, in the process of carrying out each round of training on the initial image classification model according to the model training task: judging whether task information of a projection matrix corresponding to each layer in the initial image classification model of the model training task enters a preset credible area during the training; the initial projection matrix corresponding to each layer is an identity matrix with the same matrix scale as the model parameter scale of the initial image classification model; the projection matrix is used for updating model parameters of the initial image classification model during backward propagation; and if the task information of the projection matrix corresponding to the layer in the initial image classification model enters a preset credible area during the training, determining the projection matrix corresponding to the layer obtained by the training according to the projection matrix corresponding to the layer in the initial image classification model obtained by the previous training in a first preset mode.

Therefore, proper projection matrix is added to adjust model parameters, the continuous learning capacity of the model can be improved, and the accuracy and stability of image classification of the trained image classification model are improved when the trained image classification model is applied to recognizing unknown images; in addition, the projection matrix is updated after the task information of model training enters a preset credible area, misleading information of the training task in the model when the training task is not credible can be avoided, and therefore the performance of the image classification model and the accuracy of image classification are further improved.

Referring to fig. 3, fig. 3 is a schematic structural diagram of an image classification device based on continuous learning according to an embodiment of the present disclosure. As shown in fig. 3, the apparatus 300 comprises:

the classification module 310 is configured to input a target image to be classified into a pre-trained image classification model, and determine an image category of the target image;

a training module 320, configured to train to obtain the image classification model by:

performing multiple rounds of training on the initial image classification model according to each model training task in sequence to obtain the trained image classification model; aiming at any model training task, in the process of carrying out each round of training on the initial image classification model according to the model training task:

judging whether task information of a projection matrix corresponding to each layer in the initial image classification model of the model training task enters a preset credible area during the training; the initial projection matrix corresponding to each layer is an identity matrix with the same matrix scale as the model parameter scale of the initial image classification model; the projection matrix is used for updating model parameters of the initial image classification model during backward propagation;

Further, the training module 320 is further configured to:

Furthermore, each model training task comprises a preset total training round number and a preset round number threshold value corresponding to the model training task; when the training module 320 is configured to determine whether task information of a projection matrix corresponding to each layer in the initial image classification model during the training of the model training task in the round enters a preset trusted area, the training module 320 is configured to:

and if the difference value between the training round number corresponding to the training round and the preset total training round number is smaller than the preset round number threshold value, judging that the task information of the projection matrix corresponding to each layer of the model training task enters a preset credible area during the training round.

Furthermore, each model training task comprises a preset loss value threshold corresponding to the model training task or a preset gradient threshold corresponding to each layer in the model training task; when the training module 320 is configured to determine whether task information of a projection matrix corresponding to each layer in the initial image classification model during the training of the model training task in the round enters a preset confidence region, the training module 320 is configured to:

if the corresponding loss value during the round of training is smaller than the preset loss value threshold value, judging that the task information of the projection matrix corresponding to each layer of the model training task during the round of training enters a preset credible area;

or if the sum of the absolute values of each element in the gradient matrix of each layer is smaller than the preset gradient threshold corresponding to the layer, judging that the task information of the projection matrix corresponding to the layer enters a preset credible area during the round of training of the model training task.

Furthermore, each model training task comprises a ratio threshold value of the gradient norm corresponding to each layer in the model training task; when the training module 320 is configured to determine whether task information of a projection matrix corresponding to each layer in the initial image classification model during the training of the model training task in the round enters a preset confidence region, the training module 320 is configured to:

Furthermore, each model training task also comprises a sample image set and an image label corresponding to the model training task; when the training module 320 is configured to determine, according to a first predetermined manner, a projection matrix corresponding to the layer in the initial image classification model obtained through a previous round of training, the training module 320 is configured to:

for any sample image subset, determining a projection matrix corresponding to the layer updated by using the sample image subset based on the sample image subset and the projection matrix corresponding to the layer updated by using the previous sample image subset; when the sample image subset is the first sample image subset in the plurality of sample image subsets, taking the projection matrix obtained by the previous training round of the training or the initial projection matrix as the projection matrix obtained by using the previous sample image subset for updating;

and determining the projection matrix updated by using the last sample image subset in the plurality of sample image subsets as the projection matrix corresponding to the layer obtained by the round of training.

Further, when each model training task includes a ratio threshold of the gradient norm corresponding to the model training task, and the training module 320 is configured to determine, in a second preset manner, the projection matrix corresponding to the layer for this round of training from the projection matrix corresponding to the layer in the initial image classification model obtained in the previous round of training if the task information of the projection matrix corresponding to the layer in the initial image classification model has not entered the preset credible area in this round of training, the training module 320 is configured to:

in the process of determining the projection matrix corresponding to the layer obtained by updating the sample image subset, if the task information of the projection matrix corresponding to the layer in the initial image classification model does not enter a preset trusted area during the training, respectively calculating a target ratio of a first norm and a second norm corresponding to the layer by using each image data for each image data of the projection matrix corresponding to the input layer;

determining the projection matrix corresponding to the layer updated by using the sample image subset according to the following formula, based on the m image data and the projection matrix corresponding to the layer updated by using the previous sample image subset:

$$P_{i\_k} = P_{i\_k-1} - \frac{P_{i\_k-1}\, x'_{i\_k}\, {x'}_{i\_k}^{T}\, P_{i\_k-1}}{\alpha + {x'}_{i\_k}^{T}\, P_{i\_k-1}\, x'_{i\_k}}$$

wherein i denotes the index of the layer to which the projection matrix corresponds, k denotes the sequence number of the sample image subset, and i and k are positive integers; $P_{i\_k}$ represents the projection matrix corresponding to the i-th layer updated by using the sample image subset; $P_{i\_k-1}$ represents the projection matrix corresponding to the i-th layer updated by using the previous sample image subset; $x'_{i\_k}$ represents the average of the m image data input to the projection matrix corresponding to the i-th layer; ${x'}_{i\_k}^{T}$ represents the transpose of $x'_{i\_k}$; α represents a hyperparameter, and α > 0.

Referring to fig. 4, fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure. As shown in fig. 4, the electronic device 400 includes a processor 410, a memory 420, and a bus 430.

The memory 420 stores machine-readable instructions executable by the processor 410, when the electronic device 400 runs, the processor 410 communicates with the memory 420 through the bus 430, and when the machine-readable instructions are executed by the processor 410, the steps of the image classification method based on continuous learning in the method embodiments shown in fig. 1 and fig. 2 may be executed.

An embodiment of the present application further provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the steps of the image classification method based on continuous learning in the method embodiments shown in fig. 1 and fig. 2 may be performed.

It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.

In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. The apparatus embodiments described above are merely illustrative. For example, the division of the units is only a logical division, and other divisions are possible in actual implementation; as another example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection between devices or units through some communication interfaces, and may be electrical, mechanical or in another form.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.

The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a processor-executable non-transitory computer-readable storage medium. Based on such understanding, the technical solution of the present application in essence, or the part of it that contributes over the prior art, or a part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.

Finally, it should be noted that the above embodiments are only specific embodiments of the present application, used to illustrate its technical solutions rather than to limit them, and the scope of the present application is not limited thereto. Although the present application has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that any person skilled in the art may, within the technical scope disclosed in the present application, still modify the technical solutions described in the foregoing embodiments, readily conceive of changes to them, or make equivalent substitutions for some of their technical features; such modifications, changes or substitutions do not depart from the spirit and scope of the embodiments of the present application and are intended to be covered by the appended claims. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims

1. An image classification method based on continuous learning, characterized in that the method comprises:

judging whether task information of a projection matrix corresponding to each layer in the initial image classification model of the model training task enters a preset credible area during the round of training; the initial projection matrix corresponding to each layer is an identity matrix with the same matrix scale as the model parameter scale of the initial image classification model; the projection matrix is used for updating model parameters of the initial image classification model during back propagation;

and if the task information of the projection matrix corresponding to the layer in the initial image classification model enters a preset credible area during the training, determining the projection matrix corresponding to the layer obtained by the training according to the projection matrix corresponding to the layer in the initial image classification model obtained by the previous training in a first preset mode.

2. The method of claim 1, further comprising:

3. The method of claim 1, wherein each model training task comprises a preset total number of training rounds and a preset round threshold corresponding to the model training task; judging whether the task information of the projection matrix corresponding to each layer in the initial image classification model of the model training task in the training round enters a preset credible area or not comprises the following steps:

4. The method of claim 1, wherein each model training task comprises a preset loss threshold corresponding to the model training task or a preset gradient threshold corresponding to each layer in the model training task; the method for judging whether the task information of the projection matrix corresponding to each layer in the initial image classification model of the model training task enters the preset credible area during the training of the round comprises the following steps:

5. The method of claim 2, wherein each model training task comprises a ratio threshold of the gradient norm corresponding to each layer in the model training task; judging whether the task information of the projection matrix corresponding to each layer in the initial image classification model of the model training task in the training round enters a preset credible area or not comprises the following steps:

6. The method of claim 1, wherein each model training task further comprises a sample image set and an image label corresponding to the model training task; determining the projection matrix corresponding to the layer obtained by the round of training according to the projection matrix corresponding to the layer in the initial image classification model obtained by the previous round of training in a first preset mode comprises the following steps:

7. The method according to claim 5, wherein, when each model training task includes a ratio threshold of the gradient norm corresponding to the model training task, determining, if the task information of the projection matrix corresponding to the layer in the initial image classification model does not enter the preset credible area during the round of training, the projection matrix corresponding to the layer obtained by the round of training according to the projection matrix corresponding to the layer in the initial image classification model obtained by the previous round of training in a second preset mode comprises:

wherein i denotes the index of the layer to which the projection matrix corresponds, k denotes the sequence number of the sample image subset, and i and k are positive integers; $P_{i_k}$ represents the projection matrix corresponding to the ith layer updated by using the sample image subset; $P_{i_{k-1}}$ represents the projection matrix corresponding to the ith layer updated by using the previous sample image subset; $x'_{i_k}$ represents the average value of the m pieces of image data input to the projection matrix corresponding to the ith layer; ${x'_{i_k}}^{T}$ represents the transpose of $x'_{i_k}$; and $\alpha$ represents a hyperparameter with $\alpha > 0$.

8. An apparatus for image classification based on continuous learning, the apparatus comprising:

9. An electronic device, comprising: a processor, a memory and a bus, the memory storing machine-readable instructions executable by the processor, the processor and the memory communicating via the bus when the electronic device is operating, the machine-readable instructions when executed by the processor performing the steps of a method of image classification based on continuous learning according to any one of claims 1 to 7.

10. A computer-readable storage medium, having stored thereon a computer program which, when being executed by a processor, carries out the steps of a method for image classification based on continuous learning according to any one of claims 1 to 7.