CN111709493A - Object classification method, training method, device, equipment and storage medium - Google Patents

Object classification method, training method, device, equipment and storage medium

Info

Publication number
CN111709493A
Authority
CN
China
Prior art keywords
objective function
classification
model
neural network
parameters
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010662167.XA
Other languages
Chinese (zh)
Other versions
CN111709493B (en)
Inventor
沈力
黄浩智
王璇
刘威
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202010662167.XA priority Critical patent/CN111709493B/en
Publication of CN111709493A publication Critical patent/CN111709493A/en
Application granted granted Critical
Publication of CN111709493B publication Critical patent/CN111709493B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses an object classification method, a training method, an apparatus, a device and a storage medium. The classification method comprises: obtaining an object to be classified; and inputting the object into a classification model and outputting a classification result. The classification model is obtained by the following steps: constructing a first objective function of a neural network model, wherein the first objective function contains the zero (L0) norm of the parameters of the neural network model; converting the zero norm into an equivalent continuous representation; obtaining a continuous second objective function from the continuous representation and the first objective function; and training the neural network model based on the second objective function to obtain the trained classification model. Because the second objective function is continuous, the invention reduces the data processing resources occupied and lowers the computational difficulty; because the pruned classification model is obtained during training through an equivalent transformation rather than an approximation, no approximation error is introduced, so higher classification accuracy can be obtained. The invention can be widely applied in the technical field of artificial intelligence.

Description

Object classification method, training method, device, equipment and storage medium
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to an object classification method, a training method, a device, equipment and a storage medium.
Background
In recent years, with the rapid development of science and technology, artificial intelligence has steadily entered public view. As a core technology of artificial intelligence, deep learning is attracting attention from more and more people. Within deep learning, the classification problem is the most basic and important research direction: many other AI applications evolve from it, and many problems can be transformed into classification problems. For example, image segmentation of a natural scene in computer vision can be converted into classifying each pixel and then assigning it the corresponding label.
As deep learning develops, neural networks grow ever deeper, and their computation and parameter counts grow accordingly, making them difficult to deploy on devices with limited computing power (such as small servers, or terminal products such as mobile devices). Model pruning, which mainly comprises weight pruning and channel pruning, is an important step in deploying a deep learning model in actual production. Compressing the deep learning model of a classification problem through model pruning reduces the resources the classification problem occupies (such as computer memory) and thus saves cost.
Current model pruning solutions for the classification problem design various criteria to judge which network parameters are redundant, then prune and fine-tune the neural network model according to the judgment. For example, one may compute the norm of the weights of each channel of the neural network, sort the channels from largest to smallest, and delete the channels whose weight norms rank last, obtaining the pruned model. Alternatively, by introducing mask parameters, the pruning problem can be converted into a non-convex, non-smooth sparse optimization model, and the mask variables are then solved with sparse optimization techniques to prune the deep neural network of the classification problem. These model pruning methods all follow the idea of sparse optimization, but because of the introduced sparsity constraints/regularization terms, the pruning problem becomes a discontinuous non-convex optimization problem. Therefore, these methods cannot train the pruned model directly by back-propagation, and the computational difficulty of the classification problem is not reduced. For this reason, the related art may approximate the sparsity constraint/regularization term so that back-propagation can be performed while training the classification model. However, the approximation introduces a certain error between the pruned, compressed classification model and the original classification model, reducing classification accuracy.
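As a concrete illustration of the magnitude-based criterion just described, the sketch below ranks channels by the L1 norm of their weights and deletes those ranking last. The function names and the `keep_ratio` parameter are assumptions for illustration; this is the prior-art criterion the application improves on, not the method of the application.

```python
# Hedged sketch of magnitude-based channel pruning: rank channels by the
# L1 norm of their weights and keep only the top fraction. Channel values,
# function names and keep_ratio are illustrative assumptions.

def channel_l1_norm(channel_weights):
    """Sum of absolute values of one channel's weights."""
    return sum(abs(w) for w in channel_weights)

def prune_channels(channels, keep_ratio):
    """Keep the keep_ratio fraction of channels with the largest L1 norms."""
    ranked = sorted(range(len(channels)),
                    key=lambda i: channel_l1_norm(channels[i]),
                    reverse=True)
    n_keep = max(1, int(len(channels) * keep_ratio))
    kept = sorted(ranked[:n_keep])            # indices of surviving channels
    return [channels[i] for i in kept], kept

channels = [[0.9, -1.2], [0.01, 0.02], [0.5, 0.4], [-0.03, 0.02]]
pruned, kept = prune_channels(channels, keep_ratio=0.5)
print(kept)   # indices of the two channels with the largest L1 norms
```

After such pruning, the surviving channels are typically fine-tuned; as noted above, criteria of this kind approximate the original model and can cost accuracy.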
Disclosure of Invention
The embodiment of the application provides an object classification method, a training method, a device, equipment and a storage medium, so as to obtain higher classification precision.
According to a first aspect of embodiments of the present application, an object classification method includes the following steps:
obtaining an object to be classified;
inputting the object into a classification model, and outputting a classification result; the classification model is obtained by the following steps:
constructing a first objective function of a neural network model based on a sparse optimization mode, wherein the first objective function contains the zero (L0) norm of the parameters of the neural network model;
converting the zero norm into an equivalent continuous representation;
obtaining a continuous second objective function according to the continuous representation and the first objective function;
and training the neural network model based on the second objective function to obtain a trained classification model.
According to a second aspect of embodiments of the present application, an object classification apparatus includes:
the acquisition module is used for acquiring an object to be classified;
the classification module is used for inputting the object into a classification model and outputting a classification result; the classification model is obtained by the following steps:
constructing a first objective function of a neural network model based on a sparse optimization mode, wherein the first objective function contains the zero (L0) norm of the parameters of the neural network model;
converting the zero norm into an equivalent continuous representation;
obtaining a continuous second objective function according to the continuous representation and the first objective function;
and training the neural network model based on the second objective function to obtain a trained classification model.
According to a third aspect of an embodiment of the present application, a method for training an object classification model includes the following steps:
constructing a first objective function of a neural network model based on a sparse optimization mode, wherein the first objective function contains the zero (L0) norm of the parameters of the neural network model;
converting the zero norm into an equivalent continuous representation;
obtaining a continuous second objective function according to the continuous representation and the first objective function;
and training the neural network model based on the second objective function to obtain a trained classification model.
According to a fourth aspect of embodiments herein, an apparatus comprises:
at least one processor;
at least one memory for storing at least one program;
when executed by the at least one processor, the at least one program causes the at least one processor to implement the object classification method of the first aspect or the training method of the third aspect.
According to a fifth aspect of embodiments herein, a computer-readable storage medium has stored thereon a processor-executable program for implementing the object classification method of the first aspect or the training method of the third aspect when executed by a processor.
In the technical scheme provided by the embodiments of the application, a first objective function of a neural network model is constructed based on a sparse optimization mode during training of the classification model; the zero norm of the parameters of the neural network model is converted into an equivalent continuous representation; a continuous second objective function is obtained from the continuous representation and the first objective function; and the classification model is finally trained according to the second objective function. Through the equivalent transformation of the zero norm, a second objective function that is equivalent to the first objective function and continuous is obtained, so the pruned classification model can be trained by back-propagation. The resulting model has a more compact structure and lower complexity, reducing the data processing resources occupied and the computational difficulty. Because the pruned classification model used for object classification is obtained in the training process through an equivalent transformation, no approximation error needs to be introduced, so higher classification accuracy can be obtained.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings of the embodiments of the present application or of the related prior art are described below. It should be understood that the drawings in the following description serve only to conveniently and clearly describe some embodiments of the technical solutions of the present application, and those skilled in the art can obtain other drawings from these drawings without creative effort.
FIG. 1 is a flowchart of an object classification method according to an embodiment of the present application;
FIG. 2 is a flowchart illustrating an example of a classification model obtaining method according to the present application;
FIG. 3 is a flowchart illustrating the continuous second objective function acquisition in an embodiment of the object classification method of the present application;
FIG. 4 is a flowchart of training a neural network model based on a second objective function in an embodiment of the object classification method of the present application;
FIG. 5 is a flow chart illustrating a training process of an alternate optimization method according to an embodiment of the object classification method of the present application;
FIG. 6 is a schematic diagram of the pruning flow of two adjacent stages in an embodiment of the object classification method of the present application;
FIG. 7 is a flowchart of image classification using the object classification method of the present application;
FIG. 8 is a flowchart illustrating a method for training an object classification model according to an embodiment of the present disclosure;
FIG. 9 is a schematic structural diagram of an object classification apparatus according to an embodiment of the present application;
FIG. 10 is a schematic structural diagram of an apparatus according to an embodiment of the present application.
Detailed Description
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the drawings in the embodiments of the present application. Obviously, the described embodiments are only some embodiments of the present application, not all of them. All other embodiments obtained by a person skilled in the art from the embodiments given herein without creative effort shall fall within the protection scope of the present application.
The terms "first," "second," "third," and "fourth," etc. in the description and claims of this application and in the accompanying drawings are used for distinguishing between different objects and not for describing a particular order. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
First, the terms of the related nouns referred to in the embodiments of the present application are introduced and explained:
artificial Intelligence (AI): a theory, method, technique and application system for simulating, extending and expanding human intelligence, sensing environment, acquiring knowledge and using knowledge to obtain optimal results by using a digital computer or a machine controlled by a digital computer. In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and produce a new intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence is the research of the design principle and the realization method of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making. The artificial intelligence technology is a comprehensive subject and relates to the field of extensive technology, namely the technology of a hardware level and the technology of a software level. The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and the like.
Computer Vision (CV): the science of how to make machines "see"; more specifically, using cameras and computers instead of human eyes to recognize, track and measure targets, and further processing the resulting images so that they become more suitable for human observation or for transmission to instruments for detection. As a scientific discipline, computer vision studies theories and techniques for building artificial intelligence systems that can capture information from images or multidimensional data. Computer vision technologies generally include image processing, image recognition, image semantic understanding, image retrieval, OCR, video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D technologies, virtual reality, augmented reality, and simultaneous localization and mapping, as well as common biometric technologies such as face recognition and fingerprint recognition.
Machine Learning (ML): a multi-field interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory and other disciplines. It specializes in studying how computers can simulate or implement human learning behavior to acquire new knowledge or skills, and reorganize existing knowledge structures to continuously improve their own performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent; it is applied throughout all fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from instruction.
Deep Learning (DL): a newer research direction within machine learning that learns the intrinsic regularities and representation levels of sample data; the information obtained in the learning process greatly helps the interpretation of data such as text, images and sound. Its ultimate goal is to enable machines to analyze and learn like humans, and to recognize data such as text, images and sound.
Zero (L0) norm: the number of non-zero elements in a matrix or vector.
L1 norm: the sum of the absolute values of the individual elements in the matrix or vector.
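A minimal sketch of the two norms just defined; the vector is an arbitrary example:

```python
# Compute the zero (L0) norm and the L1 norm of a vector.

def l0_norm(v):
    """Number of non-zero elements."""
    return sum(1 for x in v if x != 0)

def l1_norm(v):
    """Sum of the absolute values of the elements."""
    return sum(abs(x) for x in v)

v = [0.0, -3.0, 0.0, 1.5]
print(l0_norm(v))  # 2
print(l1_norm(v))  # 4.5
```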
Sparse optimization mode: a class of methods for finding the optimal solution (i.e., the optimal sparse solution) of a sparse problem. Current sparse optimization methods can be roughly divided into: greedy algorithms for the L0-norm minimization problem, which solve for the L0 norm directly; optimization algorithms for the L1-norm minimization problem, which relax the L0 norm to the L1 norm for an approximate solution; and statistical optimization algorithms, which estimate the sparsest solution from a mathematical expectation.
Mask parameters: a binary matrix used to control model pruning; an element of 0 indicates that the corresponding parameter should be pruned, and an element of 1 indicates that it should be kept.
Dual parameters: parameters that control the sparsity of the mask parameters, introduced so that the mask parameters can be transformed into an equivalent continuous representation and trained using back-propagation algorithms.
Hadamard product: the element-wise product of two matrices of the same size. For an m x n matrix A = [a_ij] and an m x n matrix B = [b_ij], the Hadamard product A ⊙ B is the m x n matrix whose elements are (A ⊙ B)_ij = a_ij · b_ij, where i = 1, 2, ..., m and j = 1, 2, ..., n are the row and column indices.
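The following sketch illustrates the Hadamard product and how a binary mask applied this way zeroes out (prunes) individual weights; the matrices are made-up examples:

```python
# Hadamard (element-wise) product of two same-sized matrices, here used to
# apply a binary pruning mask to a weight matrix. Values are illustrative.

def hadamard(A, B):
    """(A ⊙ B)_ij = a_ij * b_ij for same-sized matrices (lists of lists)."""
    return [[a * b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(A, B)]

weights = [[0.8, 0.5], [0.1, 0.9]]
mask    = [[1,   0  ], [0,   1  ]]   # 0 = prune this weight, 1 = keep it

print(hadamard(weights, mask))       # [[0.8, 0.0], [0.0, 0.9]]
```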
Penalty function method: its basic idea is to construct an auxiliary function F and convert the original constrained problem into the unconstrained problem of minimizing F. For the optimal solutions to be equivalent, F must satisfy the following conditions: inside the feasible region it takes the same values as the original objective, and outside the feasible region its values are far greater than those of the objective function.
MPEC (Mathematical Programming with Equilibrium Constraints): mathematical programming with equilibrium constraints.
As described above, in classification tasks such as image classification, compressing the deep learning model that executes the classification task by model pruning is very important for reducing the number of model parameters, reducing the amount of model computation, and reducing the resources (such as computer memory) occupied by the classification problem. The prior art can apply the idea of sparse optimization to model pruning, but because of the introduced sparsity constraints/regularization terms, the pruning problem becomes a discontinuous non-convex optimization problem, so the pruned model cannot be trained by back-propagation. Although the prior art can approximate the sparsity constraint/regularization term, the pruned classification model obtained this way differs from the original classification model by a certain approximation error, which reduces classification accuracy. Therefore, the embodiments of the application provide an object classification method, a training method, a device, equipment and a storage medium: when training a classification model for object classification, a first objective function of a neural network model is constructed based on a sparse optimization mode, the zero norm of the parameters of the neural network model is converted into an equivalent continuous representation, a continuous second objective function is obtained from the continuous representation and the first objective function, and the classification model is finally trained according to the second objective function.
Therefore, based on the continuous second objective function, the pruned classification model can be trained by back-propagation. The obtained classification model has a more compact structure and lower complexity, reducing the data processing resources occupied and the computational difficulty. Because the pruned classification model used for object classification is obtained in the training process through an equivalent transformation, no approximation error needs to be introduced, so higher classification accuracy can be obtained. With this scheme, classification models with fewer parameters (a higher compression rate) and higher accuracy can be obtained; such models can be deployed directly on devices with limited computing power, such as mobile terminals or edge devices, classify faster, and lower the requirements on the deployment equipment.
The scheme provided by the embodiments of the application involves artificial intelligence technologies such as deep learning within machine learning, and is specifically explained by the following embodiments:
the embodiment of the application provides an object classification method, which is used for realizing classification tasks of objects such as images and the like based on a neural network model and a model pruning technology, and can be applied to a terminal, a server or an application scene consisting of the terminal and the server. The object method may be software running in a terminal or a server, such as an application having an object classification function. The terminal may be, but is not limited to, a smart phone, a tablet computer, a laptop computer, a desktop computer, a smart speaker, a smart watch, and the like. The server may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as a cloud service, a cloud database, cloud computing, a cloud function, cloud storage, a network service, cloud communication, middleware service, a domain name service, a security service, a CDN, a big data and artificial intelligence platform, and the like.
Take an application scenario consisting of a terminal and a server as an example. In this scenario, the server executes the classification method provided by the embodiments of the application, trains a classification model for object classification, and sends the classification model to the terminal; the terminal runs the classification model to classify and recognize objects such as images and speech. In order to reduce the resources occupied by the classification problem and save cost, the classification model trained by the server may be a model-pruned classification model; that is, the server's process of training the classification model based on a neural network model is also the process of model pruning. Pruning training can be performed on given samples and labels to obtain the pruned classification model, which is then sent to the terminal.
Specifically, the server in the embodiments of the application trains the pruned classification model in a sparse optimization manner. It first obtains given training samples, given labels, and initial values of the neural network model, then iteratively trains the classification model on them until the parameters of the current neural network model converge, and takes the converged neural network model as the pruned classification model. During this iterative training, the loss function (used to measure the difference between the classification result under the current model parameters and the label values) is continuous. The continuous loss function (i.e., the second objective function) can be determined using the idea of sparse optimization, the penalty function method, the theory of MPEC, and the duality of the zero norm; the specific determination process is as follows:
first, the server in the embodiment of the present application may adopt a sparse optimization mode to convert model pruning of the neural network model into an optimization problem (i.e., a first objective function) with a sparse constraint according to a given training sample and a given label, where the sparse constraint is an inequality constraint including a zero-norm, is discontinuous, and cannot be trained in a back propagation mode. Then, the server decomposes the model parameters of the classification model into Hadamard products of the weight parameters and the mask parameters, and substitutes the Hadamard products into the first objective function to obtain an equivalent representation of the first objective function, wherein the equivalent representation contains inequality constraints of zero modulus norm of the mask parameters, is still discontinuous and cannot be trained by applying a back propagation mode. Then, the server in the embodiment of the present application obtains the penalty problem representation form containing the zero norm of the mask parameter by using a penalty function method, and removes the inequality constraint, but the penalty problem representation form still contains a non-convex non-smooth term, which is the zero norm of the mask parameter, and the gradient still cannot be solved, so that the end-to-end training cannot be performed. Finally, the server of the embodiment of the application utilizes the correlation theory of MPEC and the duality of the zero norm of the mask parameter, converts the non-convex and non-smooth term of the zero norm of the mask parameter into an equivalent continuous representation form by introducing the duality parameter, and substitutes the continuous representation form into the penalty problem representation form, thereby obtaining a continuous second objective function.
Therefore, when the server iteratively trains the pruned classification model, the loss value can be computed from the continuous second objective function, so training can be performed directly with the back-propagation algorithm, end to end, without introducing approximation errors, which improves classification accuracy.
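To see why continuity is what unlocks back-propagation, the toy sketch below trains a one-dimensional model with a relaxed mask v kept in [0, 1], using plain gradient descent in place of back-propagation. The model, data, learning rate and penalty weight rho are all assumptions for illustration, not the application's actual objective:

```python
# Toy continuous objective: squared fit error of a masked weight v*w plus a
# continuous sparsity penalty rho*(1 - v). Because every term is
# differentiable in w and v, gradient descent applies directly.

data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]    # samples of y = 2x
w, v, rho, lr = 0.5, 0.5, 0.01, 0.02

def loss(w, v):
    fit = sum((v * w * x - y) ** 2 for x, y in data)
    return fit + rho * (1.0 - v)

initial = loss(w, v)
for _ in range(200):
    gw = sum(2 * (v * w * x - y) * v * x for x, y in data)
    gv = sum(2 * (v * w * x - y) * w * x for x, y in data) - rho
    w -= lr * gw
    v = min(1.0, max(0.0, v - lr * gv))        # keep the relaxed mask in [0, 1]

print(loss(w, v) < initial)  # True: the continuous loss decreased
```

With a discontinuous L0 term in place of rho*(1 - v), the gradient gv would not exist and this loop could not run, which is exactly the obstacle the equivalent continuous reformulation removes.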
After the server has trained the model-pruned classification model, it can send the model to the terminal. Compared with a classification model without pruning, the pruned model is smaller, so the terminal's processing capacity is sufficient to carry and run it, and the terminal can then classify objects such as images and speech based on the pruned model.
It should be understood that the above application scenario is only an example. In practice, besides the server, other devices with model-training capability, such as a terminal, may train the pruned classification model; likewise, devices other than the terminal may host the trained, pruned classification model. No limitation is placed on the application scenario of the object classification method provided in this embodiment.
The object classification method provided by the present application is described below by way of an embodiment.
As shown in fig. 1, the method comprises the following steps S100-S101:
S100, obtain an object to be classified;
In this embodiment, the object to be classified may be the subject of a classification task such as an image, video, text, or speech, and may be selected flexibly according to actual requirements. The object may be obtained in various ways: the user may input it to the terminal through human-computer interaction (e.g., touch or key input), it may be crawled from the Internet, or it may be provided by third-party software (such as image-processing software); no specific limitation is imposed here.
S101, inputting the object to be classified into a classification model, and outputting a classification result.
In this embodiment, a classification model for object classification can be trained in advance from given training samples and labels. The object to be classified is then fed into the trained classification model, which identifies or predicts the object's type as the classification result. As noted above, to reduce the model's parameters and the resources it occupies (such as computer memory), the pruned (i.e., compressed) classification model can be trained with the improved model-pruning method of this embodiment. As shown in fig. 2, the classification model may be obtained through the following steps S1010-S1013:
S1010, construct a first objective function of the neural network model based on a sparse-optimization formulation, where the first objective function contains a zero norm of the parameters of the neural network model.
Specifically, according to sparse-optimization theory and prior knowledge, model pruning (including channel pruning and weight pruning) can be abstracted as the following sparsity-constrained optimization problem:
min_W f(W, (X, Y))  s.t.  ‖W‖₀ ≤ κ    (1)
where f denotes the first objective function of the neural network model, min denotes minimization, and W denotes the parameters (such as weights and biases) of the neural network model. X and Y denote the training samples and the corresponding labels, s.t. introduces the constraint, and κ is a control variable for the degree of model pruning. ‖·‖₀ is the zero norm, so ‖W‖₀ is the zero norm of W, i.e., the number of non-zero entries of W.
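As an illustration (not part of the patent), the sparsity constraint of equation (1) can be checked numerically; the weight values and budget κ below are hypothetical:

```python
import numpy as np

# Hypothetical flattened weight tensor W of a small network layer.
W = np.array([0.0, 0.3, 0.0, -1.2, 0.0, 0.7])
kappa = 3  # assumed pruning budget: at most kappa non-zero weights

# The zero norm ||W||_0 counts the non-zero entries of W.
zero_norm = int(np.count_nonzero(W))
print(zero_norm)           # 3
print(zero_norm <= kappa)  # True: the constraint of (1) is satisfied
```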
The neural network model may be a classic network such as a VGGNet (Visual Geometry Group Network) model, a ResNet (Residual Neural Network) model, a GoogLeNet model, or a DenseNet (Densely Connected Convolutional Network) model, or any other network model containing convolutional layers.
S1011, convert the zero norm of the parameters of the neural network model into an equivalent continuous representation.
As can be seen from equation (1), ‖W‖₀ is discontinuous. To later convert the first objective function of equation (1) into a continuous function, W can be expressed as the Hadamard product of a mask parameter M and a weight parameter N, i.e., W = M ⊙ N. Then ‖W‖₀ ≤ κ is equivalent to ‖M ⊙ N‖₀ ≤ κ, and a necessary condition for ‖M ⊙ N‖₀ ≤ κ to hold is ‖M‖₀ ≤ κ. Therefore, the problem of equation (1) can be converted into the equivalent problem:
min_{N,M} f(M ⊙ N, (X, Y))  s.t.  ‖M‖₀ ≤ κ    (2)
and M is a mask parameter and is used for controlling whether model pruning is carried out, wherein when the element of M is 0, model pruning is required, and when the element of M is 1, model pruning is not required.
The penalty-problem representation of equation (2) is:
min_{N,M} f(M ⊙ N, (X, Y)) + λ‖M‖₀    (3)
where the regularization coefficient λ is the first penalty parameter and also serves as a control variable for the degree of model pruning.
The two problems (2) and (3) are equivalent in the sense that their optimal solutions coincide. However, because the non-convex, non-smooth term ‖M‖₀ is present, neither problem admits a gradient (‖M‖₀ is discontinuous and non-differentiable), so neither can be trained end-to-end.
According to the duality of ‖M‖₀ and the theory of mathematical programming with equilibrium constraints (MPEC), ‖M‖₀ can be characterized by the following optimization problem (i.e., a continuous representation of the zero norm of the mask parameter M can be obtained):
‖M‖₀ = min_V { ⟨E − V, E⟩  s.t.  ⟨V, |M|⟩ = 0,  0 ≤ V ≤ E }    (4)
where V is a dual parameter that controls the sparsity of M; E is a matrix or vector whose elements are all 1; ⟨·,·⟩ is the vector inner product; ⟨V, |M|⟩ = 0 is the complementary constraint obtained from MPEC; and |M| is the element-wise absolute value of M. The complementary constraint means: if an element of V is non-zero, the corresponding element of M must be 0; if an element of V is 0, no requirement is imposed on the corresponding element of M. The number of non-zeros of M can thus be controlled through V. Since ⟨V, |M|⟩ = 0 expresses orthogonality of vectors, V may also be called an orthogonal mask parameter. 0 ≤ V ≤ E means every element of V lies between 0 and 1.
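The characterization (4) can be verified numerically on a small example. The sketch below (values hypothetical) constructs the optimal dual V, i.e., V_i = 1 exactly where M_i = 0, and checks that the objective ⟨E − V, E⟩ equals ‖M‖₀:

```python
import numpy as np

M = np.array([0.0, 0.7, 0.0, -0.2, 1.5])
E = np.ones_like(M)

# Complementarity <V, |M|> = 0 forbids V_i > 0 wherever M_i != 0,
# so the objective <E - V, E> is minimized by V_i = 1 exactly where M_i = 0.
V = (M == 0).astype(float)

assert np.dot(V, np.abs(M)) == 0.0  # complementary constraint holds
print(np.dot(E - V, E))             # 3.0
print(np.count_nonzero(M))          # 3 -> equals ||M||_0
```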
And S1012, obtaining a continuous second objective function according to the equivalent continuous representation and the first objective function.
Specifically, as shown in fig. 3, step S1012 may further include the following steps S10121-S10122:
S10121, based on the weight parameters and the mask parameters, obtain a penalty-problem representation equivalent to the first objective function by the penalty-function method: i.e., starting from equations (1) and (2), apply the penalty to obtain equation (3).
S10122, the equivalent continuous representation (i.e., the vector inner product of equation (4)) is substituted into the penalty problem expression (3), and the second objective function is obtained.
Specifically, in conjunction with equations (3) and (4) above, the pruning problem of equation (1) (i.e., the first objective function) can be equivalently rewritten as the following continuous optimization problem (i.e., the second objective function):
min_{N,M,V} f(M ⊙ N, (X, Y)) + λ⟨E − V, E⟩    (5)
s.t.  ⟨V, |M|⟩ = 0,  0 ≤ V ≤ E
the problem of equation (5) has three variables, weight parameter N of the neural network, mask parameter M, and dual parameter (i.e., orthogonal mask parameter) V.
And S1013, training the neural network model based on the second objective function to obtain a trained model serving as a classification model.
Specifically, as shown in fig. 4, step S1013 may further include the following steps S10131-S10132:
S10131, transfer the equality constraint of the second objective function into the objective.
Because of the equality constraint ⟨V, |M|⟩ = 0 in the second objective function (5), common machine-learning algorithms cannot solve the problem directly. The equality constraint can therefore be moved into the objective by a penalty-function method, an augmented-Lagrangian method, or the like, so that the continuous pruning model (5) can be solved. Taking the penalty-function method as an example, the equality constraint is penalized linearly into the second objective function:
min_{N,M,V} f(M ⊙ N, (X, Y)) + λ⟨E − V, E⟩ + ρ⟨V, |M|⟩    (6)
s.t.  0 ≤ V ≤ E
where ρ is the second penalty parameter, which controls how strictly the complementary constraint is enforced.
According to penalty-function theory, the penalty problem (6) and the pruning problem (5) are globally equivalent in the sense that their optimal solutions coincide, so (6) is equivalent to the original pruning problem (1). Problem (6) is thus a continuous optimization problem with a simple box constraint, which can be solved by back propagation.
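As a sketch (not from the patent), the value of the penalized objective (6) can be computed from a precomputed task loss f and the three variables; the function name and signature are illustrative:

```python
import numpy as np

def penalized_objective(task_loss, M, V, lam, rho):
    """Value of equation (6): f(M ⊙ N, (X, Y)) + λ⟨E − V, E⟩ + ρ⟨V, |M|⟩.
    `task_loss` is assumed to be the already-computed value of f."""
    E = np.ones_like(M)
    return task_loss + lam * np.dot(E - V, E) + rho * np.dot(V, np.abs(M))

M = np.array([0.0, 0.7, 0.0])
V = np.array([1.0, 0.0, 1.0])
# With <V, |M|> = 0, only the zero-norm surrogate term is added:
print(penalized_objective(0.5, M, V, lam=0.1, rho=1.0))  # 0.6
```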
S10132, based on the transformed second objective function, train the neural network model on a training set in an alternating-optimization fashion, and take the neural network model that satisfies the convergence condition as the trained classification model.
In one possible implementation, to save training time, a maximum number of training stages stage_max can be preset; when the number of completed training stages reaches stage_max, the classification model is considered to have satisfied the convergence condition.
In general, stage_max can be set to 50: once training reaches 50 stages, the classification model can be considered converged and training can end. Of course, in practice stage_max may be set to other values, which is not limited here.
In another possible implementation, the convergence condition may be that the loss value of the current loss function (the difference between the prediction of the classification model under the current training parameters and the label) is smaller than a preset threshold.
It should be understood that, besides the above two convergence conditions, other conditions may be set as the convergence conditions according to actual requirements, and the convergence conditions are not specifically limited herein.
As shown in fig. 5, step S10132 may further include the following steps S101321-S101323:
S101321, fix the mask parameters and the dual parameters, and update the weight parameters with a gradient-descent algorithm;
S101322, fix the weight parameters and the dual parameters, and update the mask parameters with a gradient-descent algorithm;
S101323, fix the weight parameters and the mask parameters, and update the dual parameters.
Specifically, the Gradient Descent algorithm may employ an SGD (Stochastic Gradient Descent) algorithm, an adaptive moment estimation (Adam) algorithm, or the like.
For the three variables of equation (6), namely the weight parameter N of the neural network, the mask parameter M, and the dual parameter (i.e., orthogonal mask parameter) V, an alternating-optimization technique can be used: first fix M and V and update the weight parameter N; then fix N and V and update the mask parameter M; finally fix M and N and update V, which controls the sparsity of the mask parameters. Each of these updates is differentiable or has a closed form, so the whole training process can be carried out automatically with the back-propagation algorithm.
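The alternating scheme above can be sketched on a toy least-squares objective f(W) = ‖XW − Y‖²/(2n); all data, learning rates, and the closed-form V update below are illustrative assumptions, not the patent's exact algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))
true_w = np.array([2.0, -1.5, 0.0, 0.0, 0.0, 0.0, 0.8, 0.0])
Y = X @ true_w

N = rng.normal(size=8) * 0.1       # weight parameters
M = np.ones(8)                     # mask parameters
V = np.zeros(8)                    # dual (orthogonal mask) parameters
lam, rho, eta, gamma = 0.05, 0.1, 0.05, 0.05

def grad_W(W):                     # gradient of f(W) = ||XW - Y||^2 / (2n)
    return X.T @ (X @ W - Y) / len(Y)

for stage in range(30):
    for _ in range(20):            # step 1: fix M, V; update N (chain rule: dW/dN = M)
        N -= eta * M * grad_W(M * N)
    for _ in range(20):            # step 2: fix N, V; update M (includes the rho<V,|M|> term)
        M -= gamma * (N * grad_W(M * N) + rho * V * np.sign(M))
    V = (rho * np.abs(M) <= lam).astype(float)  # step 3: closed-form dual update (assumed)
    rho *= 1.1                     # gradually strengthen the complementarity penalty

print(np.round(M * N, 2))          # W* = M ⊙ N; coordinates where true_w is 0 shrink toward 0
```

The two inner loops mirror Algorithm 1's stages (gradient steps on N, then on M), followed by the dual update and a gradually increased penalty ρ.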
The optimization process of solving the model of equation (6) can be summarized as the pseudo code of algorithm 1 below:
[Algorithm 1 (pseudocode shown as an image in the original): alternating updates for problem (6), with steps 2-4 updating N, steps 5-8 updating M, and steps 9-10 updating V.]
In Algorithm 1, iteration steps 2-4 learn the weight parameter N of the neural network, iteration steps 5-8 learn the mask parameter M to perform model pruning, and iteration steps 9-10 estimate the dual parameter (i.e., orthogonal mask parameter) V to ensure that M is suitably sparse. As the training stages accumulate, the algorithm progressively yields a compressed model with fewer parameters and higher accuracy, along with more accurate mask-parameter estimates.
The pruning flow of Algorithm 1 from stage K to stage K+1 at the l-th hidden layer is shown in fig. 6. In fig. 6, c denotes the weight links or channel numbers between neurons, weight is the weight parameter N, mask is the mask parameter M, and orthogonal mask is the dual parameter V. As Algorithm 1 shows, when an element of the mask parameter equals 0 in stage K (the dashed-box region at the mask in fig. 6 indicates mask elements whose value is 0), the neural-network weight or channel at the corresponding position is pruned in the next stage (the dashed-box region at the weights in fig. 6 indicates pruned weight/channel parameters); for example, a mask element that is 0 in stage K causes the corresponding weight of stage K+1 to be cut. Conversely, because the model pruning of this embodiment proceeds dynamically in stages and the dual parameter V controls the mask parameter M through the complementary constraint, a pruned weight or channel can be restored: in a later training stage the corresponding mask element may be updated to a non-zero value again (by the complementary constraint, when the corresponding element of V is 0, the element of M is free to become non-zero), so the weight or channel of the corresponding network structure is recovered. This indicates that the pruning decision of the earlier stage was wrong, and the algorithm automatically repairs the network structure. After multi-stage training of Algorithm 1, stable mask parameters and dual parameters are obtained, yielding an optimal pruned model.
In this object classification method, a multi-stage, end-to-end model-pruning and object-classification scheme is used when training the classification model. Starting from the original classification model with a sparsity constraint/regularization term, the scheme first uses the duality of the zero norm and MPEC theory to obtain an equivalent continuous representation of the zero norm ‖M‖₀ in the sparsity constraint/regularization term. This equivalent continuous representation is then substituted into the original classification model, giving a classification model with a continuous objective function, which can be optimized (i.e., trained) directly with the back-propagation algorithm. The resulting classification model is completely equivalent to the original one and introduces no additional approximation error, so classification accuracy can be improved: experimental results show that the absolute accuracy of this classification method is 0.8% higher than that of a conventional classification algorithm using L1-norm pruning. Moreover, the classification model in this scheme is a continuous non-convex optimization model that can be solved directly with back-propagation techniques, which greatly reduces the computational difficulty; the equivalent continuous classification model can also be solved by methods such as the penalty-function method in an alternating-optimization fashion, enabling end-to-end training of the classification model.
The end-to-end pruning scheme provided here does not require training a high-accuracy neural network model in advance and pruning it afterwards: the pruning process runs dynamically during training, and the quality of the pruning mask parameters is adjusted automatically through the orthogonal mask parameters (i.e., dual parameters). This guarantees the accuracy of the pruned classification model and raises the degree of automation of model compression. The scheme applies broadly to real-world classification tasks such as image, video, text, and speech classification, and the end-to-end training method greatly reduces the workload of manually tuning pruning parameters. The resulting model can be deployed directly on a mobile terminal or edge device and has a higher inference speed, which greatly broadens the application scenarios of neural network models and provides strong support for rapid deployment of deep-learning models.
The following briefly describes, with reference to fig. 7, the flow of the object classification method provided by this embodiment, taking the image classification task as an example:
S701: acquire an image to be classified;
S702: input the image to be classified into an image classification model, and obtain the image classification result as output.
The image classification model in step S702 is a classification model after model pruning, and is obtained through the classification model training process of the object classification method. Specifically, the image classification model of step S702 can be obtained by alternately training by using the foregoing algorithm 1, and the specific implementation process is as follows:
1) determining the structure of a neural network to be trained (such as a ResNet network), and setting a weight parameter N, a mask parameter M and an initial value of a dual parameter V of the initial neural network;
2) acquiring a plurality of training images X and labels Y thereof;
3) with the mask parameter M and the dual parameter V fixed, update the weight parameter N of the neural network by gradient descent on the training images X and their labels Y until the maximum number of training epochs epoch_max is reached. The update (shown as an image in the original) is:

N ← N − η ∇_N f(M ⊙ N, (X, Y))

where η is the learning rate for the weight parameters and ∇_N denotes the gradient with respect to N;
4) with the weight parameter N and the dual parameter V fixed, update the mask parameter M of the neural network by gradient descent on the training images X and their labels Y until epoch_max is reached. The update (shown as an image in the original) is:

M ← M − γ ∇_M [ f(M ⊙ N, (X, Y)) + ρ⟨V, |M|⟩ ]

where γ is the learning rate for the mask parameters and ∇_M denotes the gradient with respect to M;
5) with the weight parameter N and the mask parameter M fixed, update the dual parameter V according to the training images X and their labels Y. Since objective (6) is linear in V, the update has a closed form (shown as an image in the original), namely:

V_i = 1 if ρ|M_i| ≤ λ, and V_i = 0 otherwise

where V_i is the i-th element of V and i is a natural number;
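A sketch of one plausible closed form for this dual update (not verbatim from the patent): minimizing (6) over V is linear in V, so each V_i snaps to 0 or 1 according to the sign of ρ|M_i| − λ. The values below are hypothetical:

```python
import numpy as np

lam, rho = 0.05, 0.5
M = np.array([0.001, 0.8, -0.02, 0.4])

# Objective (6) contributes sum_i V_i * (rho*|M_i| - lam) in V, so the
# minimizer sets V_i = 1 when rho*|M_i| <= lam, else V_i = 0.
V = (rho * np.abs(M) <= lam).astype(float)
print(V)  # [1. 0. 1. 0.] -> flags the entries of M small enough to prune
```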
6) increase the penalty parameter: the new penalty parameter is 1.1 × ρ. (Increasing ρ gradually makes the complementary constraint hold progressively; setting a very large penalty parameter from the start would make the solution unstable, so this embodiment increases it step by step during the iterations.)
7) repeat steps 3) to 6) until the maximum number of training stages stage_max is reached, then output the trained network parameters W* = M ⊙ N to obtain the pruned image classification model.
Referring to fig. 8, an embodiment of the present application further discloses a training method for an object classification model, including the following steps:
S801, construct a first objective function of a neural network model based on a sparse-optimization formulation, where the first objective function contains a zero norm of the parameters of the neural network model;
S802, convert the zero norm into an equivalent continuous representation;
S803, obtain a continuous second objective function from the equivalent continuous representation and the first objective function;
S804, train the neural network model based on the second objective function to obtain a trained classification model.
In this training method, during training of the classification model for object classification, a first objective function of the neural network model is constructed based on a sparse-optimization formulation; the zero norm of the parameters of the neural network model is converted into an equivalent continuous representation; a continuous second objective function is obtained from that continuous representation and the first objective function; and finally the pruned model is trained according to the second objective function. Through the equivalent conversion of the zero norm, a second objective function that is both equivalent to the first and continuous is obtained for training. Because the second objective function is continuous, the pruned classification model can be trained by back propagation; the resulting classification model has a simpler structure and lower complexity, occupies fewer data-processing resources, and is computationally cheaper. Because the pruned classification model is obtained during training through equivalent transformation, no approximation error is introduced, so higher classification accuracy can be achieved. This scheme yields classification models with fewer parameters (i.e., higher compression rate) and higher accuracy; such models can be deployed directly on devices with limited computing power, such as mobile terminals or edge devices, offer higher classification speed, and lower the requirements on the deployment device.
Referring to fig. 9, an embodiment of the present application further discloses an object classification apparatus, including:
an obtaining module 901, configured to obtain an object to be classified;
a classification module 902, configured to input the object into a classification model and output a classification result; wherein the classification model is obtained by the following steps:
constructing a first objective function of the neural network model based on a sparse optimization mode, wherein the first objective function has a zero-modulus norm of parameters of the neural network model;
converting the zero norm into an equivalent continuous representation;
obtaining a continuous second objective function according to the equivalent continuous representation and the first objective function;
train the neural network model based on the second objective function to obtain a pruned and trained classification model.
The contents of the classification-method embodiments shown in any of fig. 2-6 all apply to this apparatus embodiment; the functions implemented by this apparatus embodiment are the same as those of the classification-method embodiments shown in fig. 2-6, and so are the beneficial effects achieved.
Referring to fig. 10, an embodiment of the present application further discloses an apparatus, including:
at least one processor 1001;
at least one memory 1002 for storing at least one program;
when executed by the at least one processor 1001, the at least one program causes the at least one processor 1001 to implement the classification-method embodiment shown in any of fig. 2-6 or the training method shown in fig. 8.
The contents of the classification-method embodiments shown in any of fig. 2-6 or the training-method embodiment shown in fig. 8 all apply to this device embodiment; the functions implemented and the beneficial effects achieved by this device embodiment are the same as those of the respective method embodiments.
The embodiment of the application also discloses a computer readable storage medium, on which a program executable by a processor is stored, wherein the program executable by the processor is used for implementing the classification method embodiment shown in any one of fig. 2-6 or the training method shown in fig. 8 when being executed by the processor.
The contents of the classification-method embodiments shown in any of fig. 2-6 or the training-method embodiment shown in fig. 8 all apply to this storage-medium embodiment; the functions implemented and the beneficial effects achieved by this storage-medium embodiment are the same as those of the respective method embodiments.
Embodiments of the present application also provide a computer program product or computer program comprising computer instructions stored in a computer-readable storage medium. The computer instructions are read by a processor of a computer device from a computer-readable storage medium, and the computer instructions are executed by the processor to cause the computer device to perform the method provided in the various alternative implementations shown in any one of fig. 2-6 or shown in fig. 8.
It will be understood that all or some of the steps and systems of the methods disclosed above may be implemented as software, firmware, hardware, or suitable combinations thereof. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, digital signal processor, or microprocessor, or as hardware, or as an integrated circuit, such as an application-specific integrated circuit. Such software may be distributed on computer-readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). As is well known to those of ordinary skill in the art, the term computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules, or other data. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer. In addition, communication media typically embodies computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and includes any information delivery media, as known to those skilled in the art.
The embodiments of the present application have been described in detail with reference to the drawings, but the present application is not limited to the embodiments, and various changes can be made without departing from the spirit of the present application within the knowledge of those skilled in the art.

Claims (11)

1. An object classification method, characterized by: the method comprises the following steps:
obtaining an object to be classified;
inputting the object into a classification model, and outputting a classification result; the classification model is obtained by the following steps:
constructing a first objective function of a neural network model based on a sparse optimization mode, wherein the first objective function has a zero-modulus norm of parameters of the neural network model;
converting the zero norm into an equivalent continuous representation;
obtaining a continuous second objective function according to the continuous representation and the first objective function;
and training the neural network model based on the second objective function to obtain a trained classification model.
2. The object classification method according to claim 1, characterized in that the object comprises any one of an image, a video, a text or a speech.
3. The object classification method according to claim 1 or 2, characterized in that it further comprises the steps of:
and decomposing parameters of the neural network model into Hadamard products of weight parameters and mask parameters, wherein the mask parameters are used for controlling whether model pruning is carried out, and the sparsity degree of the mask parameters is controlled by dual parameters.
4. The object classification method according to claim 3, characterized in that said continuous representation is a vector inner product equivalent to said zero norm, said deriving a continuous second objective function from said continuous representation and said first objective function comprises:
obtaining a penalty problem expression equivalent to the first objective function by utilizing a penalty function method based on the weight parameter and the mask parameter;
and substituting the vector inner product into the penalty problem expression to obtain the second objective function.
5. The method of claim 3, wherein the second objective function includes an equality constraint between the mask parameter and the dual parameter, and the training the neural network model based on the second objective function to obtain a trained classification model comprises:
translating the equality constraint onto the second objective function;
and training the neural network model by adopting an alternate optimization mode through a training set based on the converted second objective function to obtain the neural network model meeting the convergence condition as the classification model.
6. The object classification method according to claim 5, characterized in that said translating said equality constraint onto said second objective function comprises:
and linearly penalizing the equation constraint to the second objective function through a penalty function method.
7. The object classification method according to claim 5, characterized in that the training of the neural network model by means of alternating optimization through a training set comprises:
fixing the mask parameters and the dual parameters, and updating the weight parameters by adopting a gradient descent algorithm;
fixing the weight parameters and the dual parameters, and updating the mask parameters by adopting a gradient descent algorithm;
and fixing the weight parameters and the mask parameters, and updating the dual parameters.
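The three alternating steps above can be sketched as a toy loop (NumPy; the quadratic stand-in loss, the step sizes, and the dual-ascent rule in step 3 are illustrative assumptions — the claims fix only which variables are held fixed at each step):

```python
import numpy as np

w = np.array([0.5, -0.8, 0.3, 0.7, -0.2, 0.8, -0.6, 0.4])  # weight parameters
m = np.ones_like(w)    # mask parameters
u = np.zeros_like(w)   # dual parameters
lr, rho = 0.1, 0.5     # assumed step sizes

def loss(w, m):
    # Toy quadratic loss standing in for the penalized second objective.
    return 0.5 * np.sum((w * m) ** 2)

initial = loss(w, m)
for _ in range(500):
    # Step 1: fix mask and dual parameters, gradient step on the weights.
    w = w - lr * (w * m) * m
    # Step 2: fix weights and dual parameters, gradient step on the mask;
    # the dual parameters enter the mask gradient linearly.
    m = m - lr * ((w * m) * w + u)
    # Step 3: fix weights and mask, update the dual parameters by ascent
    # on the residual of the box constraint m in [0, 1].
    u = u + rho * (m - np.clip(m, 0.0, 1.0))

final = loss(w, m)  # driven toward zero by the alternating updates
```

In this toy run the mask never leaves [0, 1], so the dual parameters stay at zero; they would accumulate a correction whenever a mask update violated the constraint.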
8. A method for training an object classification model is characterized by comprising the following steps:
constructing a first objective function of a neural network model based on a sparse optimization mode, wherein the first objective function contains the zero norm of the parameters of the neural network model;
converting the zero norm into an equivalent continuous representation;
obtaining a continuous second objective function according to the continuous representation and the first objective function;
and training the neural network model based on the second objective function to obtain a trained classification model.
9. An object classification apparatus, comprising:
an acquisition module, configured to acquire an object to be classified;
a classification module, configured to input the object into a classification model and output a classification result; wherein the classification model is obtained by the following steps:
constructing a first objective function of a neural network model based on a sparse optimization mode, wherein the first objective function contains the zero norm of the parameters of the neural network model;
converting the zero norm into an equivalent continuous representation;
obtaining a continuous second objective function according to the continuous representation and the first objective function;
and training the neural network model based on the second objective function to obtain a trained classification model.
10. An apparatus, comprising:
at least one processor;
at least one memory for storing at least one program;
wherein the at least one program, when executed by the at least one processor, causes the at least one processor to implement the method according to any one of claims 1-8.
11. A computer-readable storage medium having stored thereon a program executable by a processor, wherein the program, when executed by the processor, implements the method according to any one of claims 1-8.
CN202010662167.XA 2020-07-10 2020-07-10 Object classification method, training device, object classification equipment and storage medium Active CN111709493B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010662167.XA CN111709493B (en) 2020-07-10 2020-07-10 Object classification method, training device, object classification equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111709493A true CN111709493A (en) 2020-09-25
CN111709493B CN111709493B (en) 2024-02-23

Family

ID=72546262

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010662167.XA Active CN111709493B (en) 2020-07-10 2020-07-10 Object classification method, training device, object classification equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111709493B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190392323A1 (en) * 2018-06-22 2019-12-26 Moffett AI, Inc. Neural network acceleration and embedding compression systems and methods with activation sparsification
CN109492754A (en) * 2018-11-06 2019-03-19 深圳市友杰智新科技有限公司 One kind is based on deep neural network model compression and accelerated method
CN109948794A (en) * 2019-02-28 2019-06-28 清华大学 Neural network structure pruning method, pruning device and electronic equipment
CN110413993A (en) * 2019-06-26 2019-11-05 重庆兆光科技股份有限公司 A kind of semantic classification method, system and medium based on sparse weight neural network

Non-Patent Citations (3)

Title
Chen Yang, "Structured Pruning of Convolutional Neural Networks via L1 Regularization", IEEE Access, pages 106385-106394 *
Pavlo Molchanov, "Importance Estimation for Neural Network Pruning", arXiv:1906.10771v1 [cs.LG], pages 1-11 *
Shujun Bi, "Exact penalty decomposition method for zero-norm minimization based on MPEC formulation", arXiv:1412.4452v1 [math.OC], pages 1-32 *

Cited By (10)

Publication number Priority date Publication date Assignee Title
CN112465042A (en) * 2020-12-02 2021-03-09 中国联合网络通信集团有限公司 Generation method and device of classification network model
CN112465042B (en) * 2020-12-02 2023-10-24 中国联合网络通信集团有限公司 Method and device for generating classified network model
CN112733689A (en) * 2020-12-31 2021-04-30 网络通信与安全紫金山实验室 HTTPS terminal type classification method and device
CN112733689B (en) * 2020-12-31 2024-03-26 网络通信与安全紫金山实验室 HTTPS terminal type classification method and device
CN112905795A (en) * 2021-03-11 2021-06-04 证通股份有限公司 Text intention classification method, device and readable medium
CN113674244A (en) * 2021-08-20 2021-11-19 中汽创智科技有限公司 Image detection method and device, storage medium and electronic equipment
CN113674244B (en) * 2021-08-20 2024-05-28 中汽创智科技有限公司 Image detection method and device, storage medium and electronic equipment
CN114580794A (en) * 2022-05-05 2022-06-03 腾讯科技(深圳)有限公司 Data processing method, apparatus, program product, computer device and medium
CN114580794B (en) * 2022-05-05 2022-07-22 腾讯科技(深圳)有限公司 Data processing method, apparatus, program product, computer device and medium
CN117785964A (en) * 2024-02-28 2024-03-29 宜宾市万事通网络信息服务有限公司 Data processing method and system applied to network service

Also Published As

Publication number Publication date
CN111709493B (en) 2024-02-23

Similar Documents

Publication Publication Date Title
CN111709493B (en) Object classification method, training device, object classification equipment and storage medium
CN112487182B (en) Training method of text processing model, text processing method and device
CN111079532B (en) Video content description method based on text self-encoder
CN111444340A (en) Text classification and recommendation method, device, equipment and storage medium
CN111741330A (en) Video content evaluation method and device, storage medium and computer equipment
CN116664719B (en) Image redrawing model training method, image redrawing method and device
CN113705313A (en) Text recognition method, device, equipment and medium
CN112989212B (en) Media content recommendation method, device and equipment and computer storage medium
CN112989120B (en) Video clip query system and video clip query method
CN111930894A (en) Long text matching method and device, storage medium and electronic equipment
CN113761153A (en) Question and answer processing method and device based on picture, readable medium and electronic equipment
CN116580257A (en) Feature fusion model training and sample retrieval method and device and computer equipment
CN114298122A (en) Data classification method, device, equipment, storage medium and computer program product
CN113515669A (en) Data processing method based on artificial intelligence and related equipment
CN114329029A (en) Object retrieval method, device, equipment and computer storage medium
CN112733043A (en) Comment recommendation method and device
CN114627282A (en) Target detection model establishing method, target detection model application method, target detection model establishing device, target detection model application device and target detection model establishing medium
CN114330514A (en) Data reconstruction method and system based on depth features and gradient information
CN111008689A (en) Reducing neural network inference time using SOFTMAX approximation
CN114358109A (en) Feature extraction model training method, feature extraction model training device, sample retrieval method, sample retrieval device and computer equipment
CN117634459A (en) Target content generation and model training method, device, system, equipment and medium
CN114419514B (en) Data processing method, device, computer equipment and storage medium
CN116957006A (en) Training method, device, equipment, medium and program product of prediction model
CN113313381B (en) User interaction sensitive dynamic graph sequence recommendation system
CN113822291A (en) Image processing method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant