CN114358197A - Method and device for training classification model, electronic equipment and storage medium - Google Patents


Info

Publication number
CN114358197A
CN114358197A
Authority
CN
China
Prior art keywords
model
parameter
classification model
lightweight
invoice
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210021820.3A
Other languages
Chinese (zh)
Inventor
单齐齐
周涛
史治国
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Yibao Health Management Co ltd
Zhejiang University ZJU
Original Assignee
Shanghai Yibao Health Management Co ltd
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Yibao Health Management Co ltd, Zhejiang University ZJU filed Critical Shanghai Yibao Health Management Co ltd
Priority to CN202210021820.3A
Publication of CN114358197A
Legal status: Pending

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application provides a training method and device for a classification model, an electronic device, and a computer-readable storage medium, wherein the method comprises the following steps: constructing a lightweight model corresponding to a trained standard classification model; inputting sample invoice images from a sample data set into the standard classification model and the lightweight model respectively, to obtain a first credential parameter of the standard classification model in the prediction process and a second credential parameter of the lightweight model in the prediction process; determining a target loss parameter according to the first credential parameter and the second credential parameter; and adjusting the network parameters of the lightweight model according to the target loss parameter, and iterating until the lightweight model converges, to obtain a lightweight classification model. According to this scheme, because the first credential parameter of the standard classification model is referenced during training, the knowledge the standard classification model has learned from the sample invoice images can be transferred to the lightweight model, so that a lightweight classification model with a better prediction effect and lower model complexity is trained.

Description

Method and device for training classification model, electronic equipment and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method and an apparatus for training a classification model, an electronic device, and a computer-readable storage medium.
Background
In the field of deep learning, two approaches are commonly used to make a network model achieve a better prediction effect: the first is to adopt an over-parameterized deep neural network; the second is to ensemble multiple weaker network models. Both approaches incur great overhead: the network model occupies very large computing resources and requires a large amount of computation at run time, so its deployment is limited by hardware conditions. To lower the deployment threshold of a network model, the model can be compressed by quantizing its weights or by pruning. However, quantizing the weights may make back-propagation impractical during training, because gradients cannot be back-propagated through discrete neurons, which makes convergence of the network model difficult. Pruning can only reduce the scale of the network model and cannot reduce the computation time.
Disclosure of Invention
An object of the embodiments of the present application is to provide a method and an apparatus for training a classification model, an electronic device, and a computer-readable storage medium, which are used for training a classification model that achieves a better prediction effect while keeping the complexity of the network model low.
In one aspect, the present application provides a training method for classification models, including:
constructing a lightweight model corresponding to the trained standard classification model;
respectively inputting sample invoice images in a sample data set into the standard classification model and the lightweight model to obtain a first credential parameter of the standard classification model in a prediction process and a second credential parameter of the lightweight model in the prediction process;
determining a target loss parameter according to the first credential parameter and the second credential parameter;
and adjusting the network parameters of the lightweight model according to the target loss parameters, and iterating until the lightweight model converges to obtain a lightweight classification model.
In an embodiment, before the building a lightweight model corresponding to the trained standard classification model, the method further comprises:
inputting a sample invoice image in a sample data set into a standard network model to obtain prediction category information output by the standard network model; the sample data set comprises a plurality of sample invoice images, and each sample invoice image carries an invoice category label;
and adjusting the network parameters of the standard network model based on the difference between the prediction category information and the invoice category label until the standard network model converges to obtain a standard classification model.
In an embodiment, the method further comprises:
and inputting the target invoice image into the lightweight classification model to obtain invoice category information output by the lightweight classification model.
In an embodiment, the standard classification model comprises a first convolutional neural network and a first classifier, and the first credential parameter is determined by:
in the process of predicting any sample image by the standard classification model, obtaining a convolution calculation result of the first convolution neural network on the sample image as the first credential parameter.
In an embodiment, the lightweight model includes a second convolutional neural network and a second classifier, the second credential parameter is determined by:
and in the process of predicting any sample image by the lightweight model, obtaining the convolution calculation result of the second convolution neural network on the sample image as the second credential parameter.
In an embodiment, said determining a target loss parameter based on said first credential parameter and said second credential parameter comprises:
determining a first loss parameter according to the first credential parameter, the second credential parameter, a preset first temperature parameter and a preset category total amount;
determining a second loss parameter according to the invoice category label of the sample invoice image, the second credential parameter, a preset second temperature parameter and the category total amount;
and weighting and summing the first loss parameter and the second loss parameter to obtain the target loss parameter.
In an embodiment, the method further comprises:
acquiring a plurality of initial invoice images, and performing data enhancement processing on the plurality of initial invoice images to obtain a plurality of sample invoice images;
and constructing the sample data set according to the plurality of sample invoice images.
In another aspect, the present application provides a training apparatus for classification models, including:
the building module is used for building a light weight model corresponding to the trained standard classification model;
the acquisition module is used for respectively inputting the sample invoice images in the sample data set into the standard classification model and the lightweight model to obtain a first credential parameter of the standard classification model in a prediction process and a second credential parameter of the lightweight model in the prediction process;
a determination module configured to determine a target loss parameter according to the first credential parameter and the second credential parameter;
and the updating module is used for adjusting the network parameters of the lightweight model according to the target loss parameters and iterating until the lightweight model converges to obtain the lightweight classification model.
Furthermore, the present application provides an electronic device comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to perform the above training method of the classification model.
Further, the present application provides a computer-readable storage medium storing a computer program executable by a processor to perform the above training method of a classification model.
According to the above scheme, after a trained standard classification model is obtained, a lightweight model corresponding to the standard classification model is built. When the lightweight model is trained, a sample invoice image is input into the standard classification model and the lightweight model respectively, a target loss parameter is determined according to the first credential parameter of the standard classification model in the prediction process and the second credential parameter of the lightweight model in the prediction process, and the network parameters of the lightweight model are adjusted according to the target loss parameter, so that the lightweight classification model is trained.
Because the first credential parameter of the trained standard classification model is referenced during training, the knowledge the standard classification model has learned from the sample invoice images can be transferred to the lightweight model, so that a lightweight classification model with a better prediction effect and lower model complexity is trained.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required to be used in the embodiments of the present application will be briefly described below.
Fig. 1 is a schematic structural diagram of an electronic device according to an embodiment of the present application;
fig. 2 is a schematic flowchart of a training method of a classification model according to an embodiment of the present application;
fig. 3 is a schematic flowchart of a method for determining a target loss parameter according to an embodiment of the present application;
fig. 4 is a block diagram of a training apparatus for a classification model according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application.
Like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures. Meanwhile, in the description of the present application, the terms "first", "second", and the like are used only for distinguishing the description, and are not to be construed as indicating or implying relative importance.
As shown in fig. 1, the present embodiment provides an electronic device 1 including: at least one processor 11 and a memory 12; fig. 1 takes one processor 11 as an example. The processor 11 and the memory 12 are connected by a bus 10, and the memory 12 stores instructions executable by the processor 11; the instructions are executed by the processor 11 so that the electronic device 1 can execute all or part of the flow of the method in the embodiments described below. In an embodiment, the electronic device 1 may be a host for performing the training method of the classification model.
The Memory 12 may be implemented by any type of volatile or non-volatile Memory device or combination thereof, such as Static Random Access Memory (SRAM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Erasable Programmable Read-Only Memory (EPROM), Programmable Read-Only Memory (PROM), Read-Only Memory (ROM), magnetic Memory, flash Memory, magnetic disk or optical disk.
The present application also provides a computer-readable storage medium, which stores a computer program executable by the processor 11 to perform the training method of the classification model provided herein.
Referring to fig. 2, a flowchart of a training method for a classification model according to an embodiment of the present application is shown, and as shown in fig. 2, the method may include the following steps 210 to 240.
Step 210: and constructing a lightweight model corresponding to the trained standard classification model.
The standard classification model is a network model with high model complexity. Here, model complexity may be measured by the number of parameters of the network model and by its floating-point operations per second (FLOPS). Illustratively, the number of parameters of the standard classification model is on the order of 10^7, and its floating-point operations per second are on the order of 10^8.
The lightweight model is a network model with low model complexity. Illustratively, the number of parameters of the lightweight model is on the order of 10^6, and its floating-point operations per second are on the order of 10^7.
After obtaining the trained standard classification model, the host computer may build a lightweight model with a similar structure to the standard classification model. The host computer can respond to the building instruction of the lightweight model and build the lightweight model according to the mode indicated by the building instruction.
Step 220: and respectively inputting the sample invoice images in the sample data set into a standard classification model and a lightweight model to obtain a first credential parameter of the standard classification model in the prediction process and a second credential parameter of the lightweight model in the prediction process.
Here, the sample invoice images are labeled invoice images used for training. The sample data set may include a plurality of sample invoice images, and each sample invoice image carries an invoice category label indicating the category of the sample invoice image. For example, the categories of the sample invoice images may include medical invoices, taxi invoices, catering consumption invoices, supermarket consumption invoices, hotel accommodation invoices, and the like.
The first credential parameter represents a characteristic parameter that the standard classification model extracts from the sample invoice image during the prediction process. The second credential parameter represents a characteristic parameter that the lightweight model extracts from the sample invoice image during the prediction process.
Step 230: a target loss parameter is determined based on the first credential parameter and the second credential parameter.
Step 240: and adjusting the network parameters of the lightweight model according to the target loss parameter, and iterating until the lightweight model converges to obtain the lightweight classification model.
Here, the target loss parameter is a parameter for evaluating an error of the lightweight model in the prediction process.
After obtaining the first credential parameter and the second credential parameter, the host may perform a calculation based on the two credential parameters to obtain a target loss parameter. After obtaining the target loss parameters, network parameters of the lightweight model may be adjusted.
After the adjustment, the host may input the sample invoice image in the sample data set into the standard classification model and the lightweight model again, so as to obtain the first credential parameter and the second credential parameter again, determine a new target loss parameter according to the new first credential parameter and the second credential parameter, and adjust the network parameter of the lightweight model with the new target loss parameter.
Through repeated iteration, when the target loss parameter tends to be stable, it can be determined that the lightweight model has converged; at this point, the lightweight classification model for classifying invoice images is obtained.
By the above measures, after the standard classification model with higher model complexity is obtained, it can be used as a teacher model, and a corresponding lightweight model with lower model complexity can be constructed as a student model. When the lightweight model is trained, the sample invoice image is input into the standard classification model and the lightweight model respectively, so as to obtain the first credential parameter of the standard classification model in the prediction process and the second credential parameter of the lightweight model in the prediction process, and the target loss parameter is determined according to the first credential parameter and the second credential parameter. Because the standard classification model is already trained and the target loss parameter is determined with reference to its first credential parameter, the knowledge learned by the standard classification model can be transferred to the lightweight model as a soft label, so that a lightweight classification model with lower complexity and a better prediction effect is obtained through training.
In one embodiment, the standard classification model may be trained prior to building a lightweight model corresponding to the standard classification model.
The host computer can input the sample invoice image in the sample data set into the standard network model to obtain the prediction category information output by the standard network model. Here, the prediction type information is type information output by the network model.
The host computer may adjust network parameters of the standard network model based on differences between the prediction category information and the invoice category label. The prediction category information and the invoice category label can be multidimensional vectors; each element of the multi-dimensional vector represents a confidence of the corresponding class.
The host can calculate the difference between the prediction category information and the invoice category label through a Cross Entropy Loss function (Cross Entropy Loss), obtain a Loss parameter, and adjust the network parameter of the standard network model according to the Loss parameter. The cross entropy loss function can be expressed by the following equation (1):
L = -(1/n) · Σ_{i=1}^{n} Σ_{m=1}^{M} y_{im} · log(y'_{im})    (1)
wherein n represents the number of sample invoice images in one batch; M represents the total number of categories; y_{im} represents the true confidence of the i-th sample invoice image on the m-th category; and y'_{im} represents the prediction confidence of the i-th sample invoice image on the m-th category.
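As an illustration, the batch cross entropy of equation (1) can be sketched in plain Python; the label and prediction values below are made up for the example and are not from the patent:

```python
import math

def cross_entropy_loss(y_true, y_pred):
    """Batch cross entropy of equation (1): average over the n sample
    invoice images of -sum_m y_im * log(y'_im), where y_im is the true
    confidence and y'_im the prediction confidence on the m-th category."""
    n = len(y_true)
    total = 0.0
    for y_i, p_i in zip(y_true, y_pred):
        total += -sum(y * math.log(p) for y, p in zip(y_i, p_i))
    return total / n

# A one-image batch with a one-hot label over three invoice categories.
labels = [[0.0, 1.0, 0.0]]
preds = [[0.1, 0.8, 0.1]]
loss = cross_entropy_loss(labels, preds)  # -log(0.8) ≈ 0.2231
```

In a real training loop this value would be back-propagated to adjust the network parameters of the standard network model.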
In the training process, after a plurality of iterations, the learning rate of the standard network model can be reduced. Illustratively, the total training amount of the standard network model is 100 rounds, and after 29 rounds of training, the learning rate can be reduced for the first time; after 59 rounds of training, the learning rate may be reduced a second time.
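The step schedule described above can be sketched as follows; the initial learning rate and the decay factor are illustrative assumptions, since the patent does not specify them:

```python
def learning_rate(epoch, base_lr=0.1, decay=0.1):
    """Step learning-rate schedule sketched from the description: out of
    100 training rounds, decay once after round 30 and again after round
    60. base_lr and decay are hypothetical values, not from the patent."""
    if epoch < 30:
        return base_lr
    if epoch < 60:
        return base_lr * decay
    return base_lr * decay * decay
```

For example, under these assumed values the rate starts at 0.1, drops to 0.01 in the middle third of training, and finishes at 0.001.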
Through repeated iteration, when the loss parameters tend to be stable, the standard network model can be determined to be converged, and at the moment, the standard classification model is obtained.
In an embodiment, after training to obtain the lightweight classification model, the invoice images may be classified by the lightweight classification model.
The host computer can input the target invoice image into the lightweight classification model, so as to obtain the invoice category information output by the lightweight classification model. Here, the target invoice image is an invoice image to be classified, and the invoice category information is used to indicate an invoice category to which the target invoice image belongs.
Due to the fact that the model complexity of the light-weight classification model is low, the calculated amount is small, the calculation time is short, and the classification result of the target invoice image can be obtained quickly through the light-weight classification model. In addition, due to the low deployment threshold of the lightweight classification model, the host can deploy the lightweight classification model to a device with low hardware condition (such as a mobile phone), so that the invoice category detection is performed on other devices through the lightweight classification model.
In an embodiment, the standard classification model includes a first convolutional neural network and a first classifier. Here, the first convolutional neural network is a convolutional neural network with higher complexity, and may be, for example, ResNet 18; the first classifier is used for calculating the prediction class information according to the output of the first convolutional neural network, and the first classifier can be softmax for example.
In the process of predicting any sample image by the standard classification model, the host computer can obtain a convolution calculation result of the first convolution neural network on the sample image as a first credential parameter. Here, the convolution calculation result is a calculation result output by the last network layer of the first convolution neural network.
In an embodiment, the lightweight model includes a second convolutional neural network and a second classifier. Here, the second convolutional neural Network is a convolutional neural Network with lower complexity, which may contain fewer convolutional layers than the first convolutional neural Network, and may be, for example, TRN8(Tiny Residual Network 8); the second classifier is used for calculating the prediction class information according to the output of the second convolutional neural network, and the second classifier can be softmax for example.
In the process of predicting any sample image by the lightweight model, the host computer can obtain a convolution calculation result of the second convolution neural network on the sample image as a second credential parameter. Here, the convolution calculation result is a calculation result output by the last network layer of the second convolutional neural network.
In an embodiment, referring to fig. 3, a flowchart of a method for determining a target loss parameter provided in an embodiment of the present application is shown in fig. 3, where the method may include the following steps 310 to 330.
Step 310: and determining a first loss parameter according to the first evidence parameter, the second evidence parameter, a preset first temperature parameter and a preset category total amount.
Wherein the first temperature parameter is a value greater than 1, and illustratively, the first temperature parameter is 2.5.
The host may determine the value (confidence) of softmax output of the standard classification model on each category at the first temperature parameter by the first credential parameter, the first temperature parameter, and the category total. Illustratively, this is expressed by the following formula (2):
p_i^T = exp(v_i / T) / Σ_{k=1}^{N} exp(v_k / T)    (2)
here, T is the first temperature parameter; p_i^T is the value of the softmax output of the standard classification model at the first temperature parameter on the i-th category; N is the total number of categories; v_i is the value of the first credential parameter on the i-th category; and v_k is the value of the first credential parameter on the k-th category.
The host may determine values of softmax output of the lightweight model at the first temperature parameter over the categories by the second credential parameter, the first temperature parameter, and the category total. Illustratively, this is expressed by the following formula (3):
q_i^T = exp(z_i / T) / Σ_{k=1}^{N} exp(z_k / T)    (3)
here, T is the first temperature parameter; q_i^T is the value of the softmax output of the lightweight model at the first temperature parameter on the i-th category; N is the total number of categories; z_i is the value of the second credential parameter on the i-th category; and z_k is the value of the second credential parameter on the k-th category.
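A minimal sketch of the softened softmax shared by equations (2) and (3), with made-up logits standing in for the credential parameters:

```python
import math

def softmax_with_temperature(logits, T):
    """Softened softmax of equations (2) and (3): p_i = exp(v_i / T) /
    sum_k exp(v_k / T). A temperature T > 1 flattens the distribution,
    exposing the model's relative confidence on the wrong categories."""
    exps = [math.exp(v / T) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

teacher_logits = [4.0, 1.0, 0.2]   # illustrative credential parameter values
p_T1 = softmax_with_temperature(teacher_logits, 1.0)
p_T25 = softmax_with_temperature(teacher_logits, 2.5)  # first temperature parameter
```

Both outputs sum to 1, but the T = 2.5 distribution is noticeably flatter, which is what lets the soft label carry inter-category information to the student.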
After obtaining values of the softmax output over the various categories for the standard classification model at the first temperature parameter, and values of the softmax output over the various categories for the lightweight model at the first temperature parameter, the host may determine a first loss parameter. Illustratively, this is expressed by the following formula (4):
L_soft = -Σ_{j=1}^{N} p_j^T · log(q_j^T)    (4)
here, L_soft is the first loss parameter; N is the total number of categories; q_j^T is the value of the softmax output of the lightweight model at the first temperature parameter on the j-th category; and p_j^T is the value of the softmax output of the standard classification model at the first temperature parameter on the j-th category.
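The first loss parameter of equation (4) can then be sketched as the cross entropy between the two softened distributions; the distributions below are illustrative:

```python
import math

def soft_loss(p_teacher, q_student):
    """First loss parameter of equation (4): the cross entropy
    L_soft = -sum_j p_j^T * log(q_j^T) between the teacher's and the
    student's softened (temperature-T) softmax outputs."""
    return -sum(p * math.log(q) for p, q in zip(p_teacher, q_student))

# Illustrative softened distributions over N = 3 categories.
p = [0.6, 0.3, 0.1]   # teacher output at temperature T
q = [0.5, 0.3, 0.2]   # student output at temperature T
loss_soft = soft_loss(p, q)
```

By Gibbs' inequality this cross entropy is smallest when the student's distribution matches the teacher's, which is exactly the behavior the distillation objective rewards.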
Step 320: and determining a second loss parameter according to the invoice category label of the sample invoice image, the second credential parameter, the preset second temperature parameter and the category total amount.
Wherein the second temperature parameter may be 1.
The host may determine values of softmax output of the lightweight model at the second temperature parameter over the categories by the second credential parameter, the second temperature parameter, and the category total. Illustratively, this is expressed by the following formula (5):
q_i^T = exp(z_i / T) / Σ_{k=1}^{N} exp(z_k / T)    (5)
here, T is the second temperature parameter; q_i^T is the value of the softmax output of the lightweight model at the second temperature parameter on the i-th category; N is the total number of categories; z_i is the value of the second credential parameter on the i-th category; and z_k is the value of the second credential parameter on the k-th category.
After obtaining the value of softmax output of the lightweight model at the second temperature parameter over the various categories, the host may determine a second loss parameter in conjunction with the invoice category tag. Illustratively, this is expressed by the following equation (6):
L_hard = -Σ_{j=1}^{N} p_j^T · log(q_j^T)    (6)
here, L_hard is the second loss parameter; N is the total number of categories; q_j^T is the value of the softmax output of the lightweight model at the second temperature parameter on the j-th category; and p_j^T is the value of the invoice category label on the j-th category: 1 if the invoice category label indicates the j-th category, and 0 otherwise.
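A sketch of the second loss parameter of equation (6); with a one-hot invoice category label it reduces to the negative log confidence on the true category:

```python
import math

def hard_loss(label_onehot, q_student):
    """Second loss parameter of equation (6): cross entropy between the
    one-hot invoice category label and the student's softmax output at
    the second temperature parameter (T = 1)."""
    return -sum(c * math.log(q) for c, q in zip(label_onehot, q_student))

label = [0.0, 1.0, 0.0]   # the sample belongs to the 2nd invoice category
q = [0.2, 0.7, 0.1]       # illustrative student output at T = 1
loss_hard = hard_loss(label, q)   # equals -log(0.7)
```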
Step 330: and weighting and summing the first loss parameter and the second loss parameter to obtain a target loss parameter.
The host may perform weighted summation on the first loss parameter and the second loss parameter according to a preset weight coefficient, so as to obtain a target loss parameter. Illustratively, this is expressed by the following formula (7):
L = α · L_soft + β · L_hard    (7)
wherein L is the target loss parameter; L_soft is the first loss parameter; α is the weight coefficient of the first loss parameter, which may be 1, for example; L_hard is the second loss parameter; and β is the weight coefficient of the second loss parameter, which may be 0.5, for example.
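The weighted sum of equation (7) is then straightforward; the weights α = 1 and β = 0.5 are the example values given in the text:

```python
def target_loss(l_soft, l_hard, alpha=1.0, beta=0.5):
    """Target loss parameter of equation (7): the weighted sum
    L = alpha * L_soft + beta * L_hard, using the example weights
    alpha = 1 and beta = 0.5 from the description."""
    return alpha * l_soft + beta * l_hard

L = target_loss(0.9, 0.4)   # 1 * 0.9 + 0.5 * 0.4 = 1.1
```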
In an embodiment, the sample data set may be constructed prior to training the standard classification model and the lightweight classification model.
The host computer can obtain a plurality of initial invoice images, and carry out data enhancement processing on the plurality of initial invoice images to obtain a plurality of sample invoice images.
The initial invoice image may be a photographed invoice image or an invoice image directly obtained from an external data source. The initial invoice image may be annotated with an invoice category label. The sample invoice image is the invoice image that is ultimately used for training.
The host can modify the brightness, contrast and saturation of the initial invoice images, and perform hue jittering, size unification and random rotation, thereby completing the data enhancement processing and increasing the amount of invoice image data. Each sample invoice image obtained through enhancement carries the invoice category label of its corresponding initial invoice image. After obtaining the plurality of sample invoice images, the host can construct the sample data set from the plurality of sample invoice images.
Through the above measures, the host can increase the amount of invoice image data, so that the subsequent training effect is better.
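As an illustration of two of the enhancement operations above (brightness and contrast adjustment), here is a framework-free sketch on a grayscale image stored as nested lists; a real pipeline would use an image library and also apply hue jitter, size unification, and random rotation:

```python
def adjust_brightness_contrast(image, brightness=0.0, contrast=1.0):
    """Illustrative brightness/contrast adjustment for a grayscale image
    given as a list of pixel rows. Contrast scales pixels about the
    mid-gray value 128, brightness shifts them, and results are clamped
    to [0, 255]. Parameter conventions here are assumptions, not the
    patent's."""
    def transform(p):
        v = contrast * (p - 128.0) + 128.0 + brightness
        return max(0.0, min(255.0, v))
    return [[transform(p) for p in row] for row in image]

img = [[100, 150], [200, 50]]
brighter = adjust_brightness_contrast(img, brightness=20)
```

Each enhanced image would keep the invoice category label of the initial image it was derived from.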
According to this scheme, the standard classification model with high model complexity is used as a teacher model, the lightweight model with low model complexity is used as a student model, and the learned knowledge of the teacher model is transferred to the student model through a Knowledge Distillation training mode, so that the lightweight classification model obtained by training the lightweight model has a better prediction effect. Here, the prediction effect may be represented by ACC (accuracy).
Exemplarily, the convolutional neural network in the standard classification model is resnet18, and after the training of the sample data set, the accuracy rate is 99.7%; the convolutional neural network in the lightweight classification model is trn8, and after the training of the sample data set, the accuracy rate is 99.18%; through a knowledge distillation mode, a light-weight classification model is trained on the basis of a trained standard classification model, and the accuracy rate is 99.68%. Therefore, under the condition that the network structure of the lightweight classification model is not changed, the prediction effect can be obviously improved by training in a knowledge distillation mode.
Fig. 4 is a block diagram of an apparatus for training a classification model according to an embodiment of the present invention, and as shown in fig. 4, the apparatus may include:
a construction module 410 for constructing a lightweight model corresponding to the trained standard classification model;
an obtaining module 420, configured to input sample invoice images in a sample data set into the standard classification model and the lightweight model, respectively, and obtain a first credential parameter of the standard classification model in a prediction process and a second credential parameter of the lightweight model in the prediction process;
a determining module 430, configured to determine a target loss parameter according to the first credential parameter and the second credential parameter;
and the updating module 440 is configured to adjust the network parameters of the lightweight model according to the target loss parameters, and iterate until the lightweight model converges to obtain a lightweight classification model.
For the implementation of the functions and actions of each module in the above apparatus, refer to the implementation of the corresponding steps in the above training method for the classification model; details are not repeated here.
In the embodiments provided in the present application, the disclosed apparatus and method can be implemented in other ways. The apparatus embodiments described above are merely illustrative, and for example, the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, functional modules in the embodiments of the present application may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
If the functions are implemented in the form of software functional modules and sold or used as a stand-alone product, they may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application, in essence, or the part thereof contributing to the prior art, may be embodied in the form of a software product stored in a storage medium, including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.

Claims (10)

1. A training method of a classification model is characterized by comprising the following steps:
constructing a lightweight model corresponding to the trained standard classification model;
respectively inputting sample invoice images in a sample data set into the standard classification model and the lightweight model to obtain a first credential parameter of the standard classification model in a prediction process and a second credential parameter of the lightweight model in the prediction process;
determining a target loss parameter according to the first credential parameter and the second credential parameter;
and adjusting the network parameters of the lightweight model according to the target loss parameters, and iterating until the lightweight model converges to obtain a lightweight classification model.
2. The method of claim 1, wherein prior to said building a lightweight model corresponding to a trained standard classification model, the method further comprises:
inputting a sample invoice image in a sample data set into a standard network model to obtain prediction category information output by the standard network model; the sample data set comprises a plurality of sample invoice images, and each sample invoice image carries an invoice category label;
and adjusting the network parameters of the standard network model based on the difference between the prediction category information and the invoice category label until the standard network model converges to obtain a standard classification model.
3. The method of claim 1, further comprising:
and inputting the target invoice picture into the lightweight classification model to obtain invoice category information output by the lightweight classification model.
4. The method of claim 1, wherein the standard classification model comprises a first convolutional neural network and a first classifier, and wherein the first credential parameter is determined by:
in the process of predicting any sample image by the standard classification model, obtaining a convolution calculation result of the first convolution neural network on the sample image as the first credential parameter.
5. The method of claim 1, wherein the lightweight model comprises a second convolutional neural network and a second classifier, and wherein the second credential parameter is determined by:
and in the process of predicting any sample image by the lightweight model, obtaining the convolution calculation result of the second convolution neural network on the sample image as the second credential parameter.
6. The method of claim 1, wherein said determining a target loss parameter based on said first credential parameter and said second credential parameter comprises:
determining a first loss parameter according to the first credential parameter, the second credential parameter, a preset first temperature parameter, and a preset total number of categories;
determining a second loss parameter according to the invoice category label of the sample invoice image, the second credential parameter, a preset second temperature parameter, and the total number of categories;
and weighting and summing the first loss parameter and the second loss parameter to obtain the target loss parameter.
7. The method of claim 1, further comprising:
acquiring a plurality of initial invoice images, and performing data augmentation processing on the plurality of initial invoice images to obtain a plurality of sample invoice images;
and constructing the sample data set according to the plurality of sample invoice images.
8. A training device for classification models, comprising:
the building module is used for building a light weight model corresponding to the trained standard classification model;
the acquisition module is used for respectively inputting the sample invoice images in the sample data set into the standard classification model and the lightweight model to obtain a first credential parameter of the standard classification model in a prediction process and a second credential parameter of the lightweight model in the prediction process;
a determination module configured to determine a target loss parameter according to the first credential parameter and the second credential parameter;
and the updating module is used for adjusting the network parameters of the lightweight model according to the target loss parameters and iterating until the lightweight model converges to obtain the lightweight classification model.
9. An electronic device, characterized in that the electronic device comprises:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to perform the method of training a classification model of any one of claims 1-7.
10. A computer-readable storage medium, characterized in that the storage medium stores a computer program executable by a processor to perform the method of training a classification model according to any one of claims 1 to 7.
CN202210021820.3A 2022-01-10 2022-01-10 Method and device for training classification model, electronic equipment and storage medium Pending CN114358197A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210021820.3A CN114358197A (en) 2022-01-10 2022-01-10 Method and device for training classification model, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114358197A true CN114358197A (en) 2022-04-15

Family

ID=81108597

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210021820.3A Pending CN114358197A (en) 2022-01-10 2022-01-10 Method and device for training classification model, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114358197A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115170840A (en) * 2022-09-08 2022-10-11 阿里巴巴(中国)有限公司 Data processing system, method and electronic equipment
CN115620081A (en) * 2022-09-27 2023-01-17 北京百度网讯科技有限公司 Training method of target detection model, target detection method and device
CN116340852A (en) * 2023-05-30 2023-06-27 支付宝(杭州)信息技术有限公司 Model training and business wind control method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination