CN116957017A - Acceleration method and device for neural network model in computer equipment

Acceleration method and device for neural network model in computer equipment

Info

Publication number
CN116957017A
Authority
CN
China
Prior art keywords
neural network
network model
parameters
loss function
function value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310790932.XA
Other languages
Chinese (zh)
Inventor
朱明旭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202310790932.XA priority Critical patent/CN116957017A/en
Publication of CN116957017A publication Critical patent/CN116957017A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/0495: Quantised networks; Sparse networks; Compressed networks
    • G06N 3/08: Learning methods
    • G06N 3/082: Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G06N 3/0985: Hyperparameter optimisation; Meta-learning; Learning-to-learn
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT]
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application discloses an acceleration method and device for a neural network model in computer equipment, belonging to the technical field of artificial intelligence deep learning. The method comprises the following steps: acquiring a first neural network model; performing regularization training on the first neural network model to obtain a second neural network model with sparsified parameters, wherein regularization training refers to sparsifying the parameters of the first neural network model through regularization processing and performing model training based on the sparsified parameters; and performing a pruning operation on the second neural network model to obtain a pruned third neural network model, wherein the pruning operation refers to clipping parameters of the second neural network model. By this method, a neural network model with a smaller parameter quantity can be obtained, achieving a good acceleration effect for the neural network model.

Description

Acceleration method and device for neural network model in computer equipment
Technical Field
The embodiment of the application relates to the technical field of artificial intelligence deep learning, in particular to an acceleration method and device for a neural network model in computer equipment.
Background
With the continuous expansion of the research scope and research depth of neural network models, neural network models have shown absolute advantages over conventional methods in terms of prediction accuracy in various fields (such as speech recognition, machine translation, assisted medical treatment, automatic driving, etc.).
The neural network model can achieve higher prediction accuracy because it is capable of high-level information extraction and multi-level representation. In general, the depth of a neural network model correlates with its expressive capability: the deeper the network, the better the training effect and the higher the prediction accuracy. On the other hand, however, the deeper the network, the more parameters it has and, in turn, the more memory the neural network model requires. Acceleration techniques for neural network models, including network pruning, have been developed on this basis. Network pruning refers to deleting redundant structural or parameter information from a complex neural network model while keeping the performance loss small, so as to achieve model compression. Current mainstream network pruning mostly relies on manual pruning, and the pruning process is generally as follows: first, the original neural network model is trained on a training data set to obtain a trained original neural network model; then, weight parameters of lower importance are clipped from the original neural network model to obtain a sparse neural network model.
In the above technical solution, because the network structures of different neural network models differ greatly, the pruning must be customized for each network structure and cannot be applied universally; the pruning effect cannot be guaranteed, and the prediction accuracy of the neural network model is greatly affected.
Disclosure of Invention
The application provides an acceleration method and device for a neural network model in computer equipment, wherein the technical scheme is as follows:
according to an aspect of the present application, there is provided a method for accelerating a neural network model in a computer device, the method comprising:
acquiring a first neural network model;
performing regularization training on the first neural network model to obtain a second neural network model with sparsified parameters, wherein regularization training refers to sparsifying the parameters of the first neural network model through regularization processing and performing model training based on the sparsified parameters;
performing a pruning operation on the second neural network model to obtain a pruned third neural network model, wherein the pruning operation refers to clipping parameters of the second neural network model, and the parameter quantity of the third neural network model is smaller than that of the second neural network model.
According to an aspect of the present application, there is provided an acceleration apparatus for a neural network model in a computer device, the apparatus comprising:
the acquisition module is used for acquiring the first neural network model;
the training module is used for performing regularization training on the first neural network model to obtain a second neural network model with sparsified parameters, wherein regularization training refers to sparsifying the parameters of the first neural network model through regularization processing and performing model training based on the sparsified parameters;
the pruning module is used for performing a pruning operation on the second neural network model to obtain a pruned third neural network model, wherein the pruning operation refers to clipping parameters of the second neural network model, and the parameter quantity of the third neural network model is smaller than that of the second neural network model.
According to another aspect of the present application, there is provided a computer apparatus comprising: a processor and a memory, the memory storing at least one computer program, the at least one computer program loaded and executed by the processor to implement the acceleration method for neural network models in a computer device as described in the above aspect.
According to another aspect of the present application, there is provided a computer storage medium having stored therein at least one computer program, the at least one computer program being loaded and executed by a processor to implement the acceleration method for a neural network model in a computer device as described in the above aspect.
According to another aspect of the present application, there is provided a computer program product comprising a computer program stored in a computer readable storage medium; the computer program is read from the computer-readable storage medium and executed by a processor of a computer device, causing the computer device to perform the acceleration method for a neural network model in a computer device as described in the above aspect.
The technical scheme provided by the application has the beneficial effects that at least:
A first neural network model is acquired; regularization training is performed on the first neural network model to obtain a second neural network model with sparsified parameters; and a pruning operation is performed on the second neural network model to obtain a pruned third neural network model. In this method, regularization is introduced to constrain the training of the parameters of the first neural network model, yielding a second neural network model with sparsified parameters; the parameters of the second neural network model are then clipped to obtain a third neural network model with a smaller parameter quantity, thereby achieving a good acceleration effect for the neural network model.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of an acceleration method for a neural network model in a computer device, according to an exemplary embodiment of the present application;
FIG. 2 is a schematic diagram of the architecture of a computer system provided by an exemplary embodiment of the present application;
FIG. 3 is a flow chart of a method for accelerating a neural network model in a computer device, in accordance with an exemplary embodiment of the present application;
FIG. 4 is a flow chart of a method for accelerating a neural network model in a computer device, in accordance with an exemplary embodiment of the present application;
FIG. 5 is a schematic diagram of parameter values of a batch normalization layer prior to regularization training provided by an exemplary embodiment of the application;
FIG. 6 is a schematic diagram of parameter values of a batch normalization layer after regularization training provided by an exemplary embodiment of the application;
FIG. 7 is a schematic diagram of a method for optimizing a neural network model, provided by an exemplary embodiment of the present application;
FIG. 8 is a schematic diagram of knowledge distillation of a third neural network model, provided in accordance with an exemplary embodiment of the present application;
FIG. 9 is a schematic diagram of a method for accelerating neural network models in a computer device, according to an exemplary embodiment of the present application;
FIG. 10 is a block diagram of an acceleration apparatus for a neural network model in a computer device, provided in accordance with an exemplary embodiment of the present application;
fig. 11 is a schematic structural view of a computer device according to an exemplary embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the embodiments of the present application will be described in further detail with reference to the accompanying drawings. Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples do not represent all implementations consistent with the application. Rather, they are merely examples of apparatus and methods consistent with aspects of the application as detailed in the accompanying claims.
The terminology used in the present disclosure is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. As used in this disclosure and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any or all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, etc. may be used in this disclosure to describe various information, these information should not be limited to these terms. These terms are only used to distinguish one type of information from another.
First, a brief description will be made of terms involved in the embodiments of the present application:
artificial intelligence (Artificial Intelligence, AI): the system is a theory, a method, a technology and an application system which simulate, extend and extend human intelligence by using a digital computer or a machine controlled by the digital computer, sense environment, acquire knowledge and acquire an optimal result by using the knowledge. In other words, artificial intelligence is an integrated technology of computer science that attempts to understand the essence of intelligence and to produce a new intelligent machine that can react in a similar way to human intelligence. Artificial intelligence, i.e. research on design principles and implementation methods of various intelligent machines, enables the machines to have functions of sensing, reasoning and decision.
Artificial intelligence technology is a comprehensive discipline covering a wide range of fields, involving both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, pre-training models, operation/interaction systems, and mechatronics. The pre-training model, also called the large model or foundation model, can, after fine-tuning, be widely applied to downstream tasks in all major directions of artificial intelligence. Artificial intelligence software technology mainly includes directions such as computer vision, speech processing, natural language processing, and machine learning/deep learning.
Machine Learning (ML): a multi-domain interdiscipline involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, and other disciplines. It studies how a computer simulates or implements human learning behavior to acquire new knowledge or skills and reorganizes existing knowledge structures to continuously improve its own performance. Machine learning is the core of artificial intelligence and the fundamental way to endow computers with intelligence; it is applied throughout all areas of artificial intelligence. Machine learning and deep learning typically include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and teaching learning. The pre-training model is the latest development of deep learning and integrates the above techniques.
Neural Network (NN) model: also called artificial neural network model, a research hotspot in the field of artificial intelligence. The neural network model abstracts the human brain neural network from the perspective of information processing, builds a simple model, and forms different networks according to different connection modes. The neural network model is an operational model formed by a large number of interconnected nodes (or neurons). Each node represents a specific output function, called an activation function. The connection between every two nodes carries a weight, which is equivalent to the memory of the artificial neural network. The output of the network differs according to the connection mode, the weight values, and the activation functions. The neural network model is usually an approximation of some algorithm or function in nature, or an expression of a logical strategy.
In recent years, research on artificial neural networks has continued to deepen. Artificial neural networks have successfully solved many practical problems in fields such as pattern recognition, intelligent robots, automatic control, predictive estimation, biology, medicine, and economics that are difficult for modern computers to solve, showing good intelligent characteristics.
Heavyweight model, lightweight model: the heavyweight model (i.e., the first neural network model in the application) and the lightweight model (i.e., the third neural network model in the application) are two opposite model types. The heavyweight model indicates a model with good performance, a large parameter quantity, and high computational complexity, and is usually deployed in server back-end equipment with strong computing power; the server equipment is usually fixed in position and relatively large in volume, so it can carry many electronic components and has relatively strong computing power. The lightweight model indicates a model with a smaller parameter quantity and relatively lower computational complexity, and is typically deployed in a mobile terminal device with relatively weaker computing power. The mobile device is usually a portable device, such as a mobile phone, a tablet computer, or a wearable device, that needs to satisfy mobility requirements; it is therefore relatively small in volume and convenient to carry, the number and size of electronic components it can carry are limited, and its computing power is relatively weak.
Knowledge distillation (Knowledge Distillation, KD): the training of a student model is guided by a trained teacher model. The implicit task-specific knowledge learned by the deep, high-accuracy teacher model (Teacher Network) can be transferred to the lightweight student model (Student Network), so that the student model achieves better performance than with conventional training. The final student model is deployed online as the application model, while the teacher model is not deployed online. In the application, the teacher model is the first neural network model and the student model is the third neural network model.
Knowledge distillation may include two approaches. One is soft-label distillation (Soft-Label Distillation), which uses the probability distribution output by the teacher model as a smoothed label to train the student model.
For example, a teacher graph neural network for protein structure prediction, trained in advance, is additionally introduced. Inputting a sample protein structure into the teacher graph neural network yields the prediction probability distribution corresponding to that sample, and this prediction probability distribution is used as a smoothed label to train, from scratch, a student graph neural network for protein structure analysis.
The other approach is feature distillation (Feature Distillation), which adds regularization constraints in the feature representation space to the training of the neural network. The intermediate activation features of the parameterized neural network can be used directly as knowledge signals for feature distillation learning, or a feature transformation function can be designed to extract specific knowledge, such as a feature attention map or a similarity map, from the teacher model.
For example, a teacher graph neural network for protein structure prediction, trained in advance, is additionally introduced. Inputting a sample protein structure into the teacher graph neural network yields the network's intermediate output features, and these intermediate output features are used as knowledge signals for feature distillation learning to train the student graph neural network.
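As an illustration of the soft-label approach described above, the following is a minimal sketch in Python with PyTorch; the function name, the temperature T, and the use of KL divergence are illustrative assumptions rather than details specified by the application:

```python
import torch.nn.functional as F

def soft_label_distillation_loss(student_logits, teacher_logits, T=4.0):
    # The teacher's output distribution, softened by temperature T,
    # serves as the smoothed label for the student.
    soft_targets = F.softmax(teacher_logits / T, dim=1)
    log_student = F.log_softmax(student_logits / T, dim=1)
    # KL divergence between teacher and student distributions;
    # the T^2 factor keeps gradient magnitudes comparable across temperatures.
    return F.kl_div(log_student, soft_targets, reduction="batchmean") * (T * T)
```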
Pruning: systematically clipping part of the weights from the original neural network model while losing as little accuracy as possible, so as to reduce the parameter quantity of the neural network model and thereby reduce resource occupation. Deep learning algorithms represented by convolutional neural networks (Convolutional Neural Network, CNN) and recurrent neural networks (Recurrent Neural Network, RNN) are widely used in fields such as machine vision, natural language processing, automatic driving, and robotics. However, their complex structures bring huge power consumption and occupation of computing resources (processors, memory, storage space, and the like) during inference, which severely limits deployment on mobile and embedded platforms with limited power and resources; the models must therefore be compressed to a certain extent before deployment.
The embodiment of the application provides a schematic diagram of an acceleration method for a neural network model in computer equipment, and as shown in fig. 1, the method can be executed by the computer equipment, and the computer equipment can be a terminal or a server.
Illustratively, a computer device acquires a first neural network model 10; the computer device performs regularization training on the first neural network model 10 to obtain a second neural network model 20 with sparsified parameters; and the computer device performs a pruning operation on the second neural network model 20 to obtain a pruned third neural network model 30.
Alternatively, the first neural network model 10 is a trained neural network model, or the first neural network model 10 is an untrained neural network model, which is not particularly limited in the embodiment of the present application.
The regularization training refers to performing sparsification on parameters in the first neural network model 10 by a regularization processing manner, and performing model training on the first neural network model 10 based on the sparsified parameters.
Regularization refers to adding training constraints to the parameters of the network layer in the neural network model, i.e., the parameters of the network layer are constrained from random variation during training or iteration.
Sparsification means driving parameters in the neural network model toward 0; the more zeros among the parameters, the sparser the parameters.
Pruning refers to clipping or deleting parameters in the second neural network model 20.
The amount of parameters in the third neural network model 30 is smaller than the amount of parameters in the second neural network model 20.
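To make the notion of sparsity concrete, the fraction of near-zero parameters can be measured directly; a minimal sketch in Python with PyTorch, where the helper name and the tolerance tol are illustrative choices:

```python
import torch

def parameter_sparsity(model, tol=1e-3):
    # Fraction of parameters whose magnitude is effectively zero.
    total, zeros = 0, 0
    for p in model.parameters():
        total += p.numel()
        zeros += (p.abs() < tol).sum().item()
    return zeros / total
```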
The regularization training includes: the computer device performs regularization processing on the network layers in the first neural network model 10 to obtain the regular loss function values corresponding to the network layers; the computer device then updates the model parameters of the first neural network model 10 based on the total loss function value formed by the first training loss function value and the regular loss function values, obtaining a second neural network model 20 with sparsified parameters.
The first training loss function value is a loss function value obtained by updating the model parameters of the first neural network model 10 before the regularization training.
Optionally, before the computer device performs the first regularization training on the network layer in the first neural network model 10, the first training loss function value refers to an original loss function value for performing model parameter updating on the first neural network model 10; before the computer equipment performs the second regularization processing on the network layer in the first neural network model 10, the first training loss function value refers to a total loss function value for performing model parameter updating on the first neural network model 10 after the first regularization training; before the computer equipment performs the n-th regularization treatment on the network layer in the first neural network model 10, the first training loss function value refers to the total loss function value of model parameter updating on the first neural network model 10 after the n-1-th regularization training; n is a positive integer.
Illustratively, the neural network model includes an input layer 40, a network layer 50, and an output layer 60. The network layer 50 includes a convolution layer (Convolutional layer) and a batch normalization layer.
The convolution layers are used to extract different features from the input layer 40, e.g., a first layer convolution layer may only extract some low-level features, such as edges, lines, and corners, and a later convolution layer may iteratively extract more complex features from the low-level features. Each convolution layer is composed of a plurality of convolution units, and parameters of each convolution unit are optimized through a back propagation algorithm.
The batch normalization (Batch Normalization, BN) layer is used to standardize features, which alleviates numerical instability in the neural network, makes features within the same batch similarly distributed, and makes the network model easier to train.
Illustratively, the computer device performs regularization processing on the parameters of the convolution layer and the parameters of the batch normalization layer in the first neural network model 10 to obtain a convolution regular loss function value corresponding to the convolution layer and a batch normalization regular loss function value corresponding to the batch normalization layer.
Optionally, the regularization processing includes at least one of calculating a smooth absolute loss function value (Smooth-L1), calculating a square loss function value (L2), and calculating an absolute loss function value (L1), but is not limited thereto; the embodiment of the present application does not specifically limit this.
Optionally, the computer device calculates a smooth absolute loss function value corresponding to the weight matrix in the convolution layer in the first neural network model 10, to obtain a convolution regular loss function value corresponding to the convolution layer.
Optionally, the computer device calculates a square loss function value corresponding to the stretching parameter in the batch normalization layer in the first neural network model 10, to obtain a batch normalization regular loss function value corresponding to the batch normalization layer.
The computer equipment sums at least one of the convolution regular loss function value and the batch normalization regular loss function value with the first training loss function value to obtain a total loss function value; the computer device updates the model parameters of the first neural network model 10 based on the total loss function value, resulting in a second neural network model 20.
The second neural network model 20 represents the network model obtained by sparsifying the parameters of the first neural network model 10, that is, by driving the parameters of the first neural network model 10 as close to 0 as possible.
The second neural network model 20 has the same parameter quantity as the first neural network model 10; only the values of the parameters differ.
Compared with the second neural network model 20, the third neural network model 30 has a smaller parameter quantity.
Optionally, the computer device sums only the convolution regular loss function value with the first training loss function value to obtain the total loss function value, and updates the model parameters of the first neural network model 10 based on the total loss function value to obtain the second neural network model 20; or, the computer device sums only the batch normalization regular loss function value with the first training loss function value to obtain the total loss function value, and updates the model parameters of the first neural network model 10 based on the total loss function value to obtain the second neural network model 20; or, the computer device sums both the convolution regular loss function value and the batch normalization regular loss function value with the first training loss function value to obtain the total loss function value, and updates the model parameters of the first neural network model 10 based on the total loss function value to obtain the second neural network model 20. The embodiment of the present application does not specifically limit this.
After obtaining the second neural network model 20 with sparsified parameters, the computer device deletes part of the parameters corresponding to the network layers in the second neural network model 20, obtaining the third neural network model 30 after parameter deletion.
The manner of deleting the parameters corresponding to the network layer includes at least one of the following manners (a threshold-based sketch follows this list), but is not limited thereto:
deleting the parameters with the parameter values smaller than the threshold value;
deleting parameters corresponding to the network layer in proportion, for example, deleting 90% of the data volume;
deleting parameters corresponding to the network layer according to the size and proportion of the parameter values, for example, deleting 90% of data volume from small to large;
deleting parameters corresponding to the network layers according to the number, for example, deleting parameters corresponding to 10 network layers;
deleting parameters corresponding to the network layers according to the size and the number of the parameter values, for example, deleting parameters corresponding to 10 network layers from small to large;
deleting parameters corresponding to the network layer according to the parameter positions in the network layer, for example, deleting parameters corresponding to the edge positions in the network layer.
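As an illustration of the threshold-based and proportion-based manners above, a minimal PyTorch sketch that keeps only the largest-magnitude share of a layer's parameters; the 90% ratio mirrors the example in the list, and the helper name is hypothetical:

```python
import torch

def magnitude_keep_mask(weight, prune_ratio=0.9):
    # Mark the prune_ratio smallest-magnitude entries of a layer's
    # parameters for deletion, keeping the largest-magnitude remainder.
    k = int(weight.numel() * prune_ratio)
    if k == 0:
        return torch.ones_like(weight, dtype=torch.bool)
    threshold = weight.abs().flatten().kthvalue(k).values
    return weight.abs() > threshold  # True marks parameters to keep
```

Multiplying the parameter tensor by the returned mask realizes the deletion.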
Optionally, the computer device takes the absolute values of the elements in the weight matrix corresponding to a single convolution layer and sums them to obtain the weight matrix norm corresponding to that convolution layer; the computer device sorts the weight matrix norms corresponding to at least two convolution layers by value to obtain a weight matrix norm sequence; the computer device deletes part of the weight matrix norms from the weight matrix norm sequence according to their values, and obtains the third neural network model 30 based on the weight matrices corresponding to the remaining weight matrix norms.
The weight matrix norm refers to the sum of the absolute values of the individual elements in the weight matrix.
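A minimal sketch of ranking convolution layers by this weight matrix norm, assuming a PyTorch model; the pruning fraction and helper name are illustrative:

```python
import torch.nn as nn

def rank_conv_layers_by_norm(model, prune_fraction=0.3):
    # Weight matrix norm: sum of the absolute values of all elements
    # of each convolution layer's weight matrix.
    norms = [(m.weight.detach().abs().sum().item(), name)
             for name, m in model.named_modules()
             if isinstance(m, nn.Conv2d)]
    norms.sort()  # ascending: the smallest norms are the pruning candidates
    n_prune = int(len(norms) * prune_fraction)
    return [name for _, name in norms[:n_prune]]  # layers to clip
```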
Optionally, the computer device sorts the stretching parameters corresponding to at least two batch normalization layers by value to obtain a stretching parameter sequence; the computer device deletes part of the stretching parameters from the stretching parameter sequence according to their values, and obtains the third neural network model 30 based on the remaining stretching parameters.
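Correspondingly, a sketch of sorting the stretching parameters of the batch normalization layers and marking the smallest ones for deletion; PyTorch is assumed, and the global pruning fraction is an illustrative choice:

```python
import torch
import torch.nn as nn

def bn_channel_keep_masks(model, prune_fraction=0.5):
    # Gather |gamma| over all BN layers and derive a global threshold.
    gammas = torch.cat([m.weight.detach().abs().flatten()
                        for m in model.modules()
                        if isinstance(m, nn.BatchNorm2d)])
    threshold = gammas.sort().values[int(gammas.numel() * prune_fraction)]
    # Channels whose gamma falls below the threshold are deleted.
    return {name: m.weight.detach().abs() > threshold
            for name, m in model.named_modules()
            if isinstance(m, nn.BatchNorm2d)}
```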
In some embodiments, after obtaining the clipped lightweight third neural network model 30, the computer device performs model training on the third neural network model 30 based on training samples to obtain a parameter-adjusted third neural network model 30. The convolution layers retained after clipping in the third neural network model 30 are trained with the parameter values of the batch normalization layers as initial values, thereby improving the accuracy of the lightweight third neural network model 30.
It should be noted that the computer device may delete only the weight matrices in the convolution layers, that is, perform the pruning operation only on the convolution layers in the second neural network model; or delete only the stretching parameters in the batch normalization layers, that is, perform the pruning operation only on the batch normalization layers in the second neural network model; or delete both the weight matrices in the convolution layers and the stretching parameters in the batch normalization layers, that is, perform the pruning operation on both the convolution layers and the batch normalization layers in the second neural network model.
In summary, in the method provided by this embodiment, a first neural network model is acquired; regularization training is performed on the first neural network model to obtain a second neural network model with sparsified parameters; and a pruning operation is performed on the second neural network model to obtain a pruned third neural network model. In this method, regularization is introduced to constrain the training of the parameters of the first neural network model, yielding a second neural network model with sparsified parameters; the parameters of the second neural network model are then clipped to obtain a third neural network model with a smaller parameter quantity, thereby achieving a good acceleration effect for the neural network model while ensuring its accuracy.
FIG. 2 is a schematic diagram of a computer system according to an embodiment of the present application. The computer system may include: terminal 110 and server 120, wherein terminal 110 and server 120 are connected via communication network 130.
In some embodiments, the heavyweight first neural network model is pruned to obtain a lightweight third neural network model. The method of accelerating the first neural network model through pruning may be executed by the terminal 110, by the server 120, or by the terminal 110 and the server 120 in an interactive and coordinated manner, which is not limited by the present application.
After the first neural network model is accelerated through the pruning operation, the accuracy of the pruned third neural network model needs to be improved.
In some embodiments, the lightweight model 111 (i.e., the third neural network model in the present application) is deployed in the terminal 110, and the heavyweight model 121 (i.e., the first neural network model in the present application) is deployed in the server 120; alternatively, both the lightweight model 111 and the heavyweight model 121 are deployed in the server 120, and the lightweight model 111 is transferred to the terminal 110 for application after being trained in the server 120.
Optionally, the lightweight model 111 and the heavyweight model 121 are models with identical structures but different parameter quantities; or, the lightweight model 111 is a model obtained by parameter clipping of the heavyweight model 121.
In some embodiments, after the lightweight model 111 is trained, it can be applied offline, disconnected from the server 120.
Illustratively, the lightweight model 111 is deployed in the terminal 110 and the heavyweight model 121 is deployed in the server 120. When training the lightweight model 111 under this structure, the server 120 acquires a sample image, inputs it into the heavyweight model 121 to obtain a first recognition result, and sends the sample image to the terminal 110 through the communication network 130; the terminal 110 inputs the sample image into the lightweight model 111 to obtain a second recognition result and sends the second recognition result to the server 120; the server 120 determines, based on the first recognition result and the second recognition result, an adjustment mode for adjusting the parameters of the lightweight model 111, and feeds the adjustment mode back to the terminal 110.
The above embodiment takes training the lightweight model 111 in the terminal 110 as an example. In some embodiments, the lightweight model 111 is trained in the server 120: after acquiring a sample image, the server 120 inputs it into the lightweight model 111 and the heavyweight model 121 respectively, adjusts the parameters of the lightweight model 111 based on the output recognition results, and sends the lightweight model 111 to the terminal 110 after training is completed.
The terminal 110 may be an electronic device such as a mobile phone, a tablet computer, a vehicle-mounted terminal, a wearable device, a personal computer (Personal Computer, PC), a palm image recognition device, a home appliance with palm image recognition, an aircraft, or an unmanned vending terminal.
The server 120 may be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, content delivery networks (Content Delivery Network, CDN), and big data and artificial intelligence platforms. The server 120 may be a background server of a target application program, configured to provide background services for clients of the target application program.
Cloud technology refers to a hosting technology that unifies hardware, software, network, and other resources in a wide area network or local area network to realize the computation, storage, processing, and sharing of data. Cloud technology is the general term for the network technology, information technology, integration technology, management platform technology, application technology, and the like applied under the cloud computing business model; it can form a resource pool that is used on demand, flexibly and conveniently. Cloud computing technology will become an important support. Background services of technical network systems, such as video websites, picture websites, and many portal websites, require large amounts of computing and storage resources. As the internet industry develops, each article may have its own identification mark in the future, which will need to be transmitted to a background system for logical processing; data of different levels will be processed separately, and all kinds of industry data need strong system backing, which can only be realized through cloud computing.
In some embodiments, the servers described above may also be implemented as nodes in a blockchain system. Blockchain is a new application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks generated in association using cryptographic methods, where each data block contains the information of a batch of network transactions and is used to verify the validity (anti-counterfeiting) of the information and to generate the next block. A blockchain may include a blockchain underlying platform, a platform product services layer, and an application services layer.
Fig. 3 is a flowchart of a method for accelerating a neural network model in a computer device according to an exemplary embodiment of the present application. The method may be performed by a computer device, which may be the terminal or the server in fig. 2. The method comprises the following steps:
step 302: a first neural network model is acquired.
The first neural network model is also called an artificial neural network model. It abstracts the human brain neural network from the perspective of information processing, builds a simple model, and forms different networks according to different connection modes. The first neural network model can be applied to fields such as pattern recognition, intelligent robots, automatic control, predictive estimation, biology, medicine, economics, images, speech, and natural language processing.
Optionally, the first neural network model is a neural network model that has been trained, or the first neural network model is a neural network model that has not been trained, which is not particularly limited by the embodiment of the present application.
The method in the embodiments of the application can be applied to products based on neural network models, such as facial recognition products and intelligent traffic products, but is not limited thereto.
Step 304: regularization training is carried out on the first neural network model, and a second neural network model with sparse parameters is obtained through training.
The regularization training means that the parameters of the first neural network model are sparsified through regularization processing, and model training is performed on the first neural network model based on the sparsified parameters.
Regularization refers to adding training constraints to the parameters of the network layer in the neural network model, i.e., the parameters of the network layer are constrained from random variation during training or iteration.
Sparsification means driving parameters in the neural network model toward 0; the more zeros among the parameters, the sparser the parameters.
The second neural network model represents the network model obtained by sparsifying the parameters of the first neural network model, that is, by driving the parameters of the first neural network model as close to 0 as possible.
The second neural network model has the same parameter quantity as the first neural network model; only the values of the parameters differ.
Illustratively, the computer device sparsifies the parameters of the first neural network model through regularization processing; after the parameters are sparsified, the computer device performs iterative training on the first neural network model based on the sparsified parameters, finally obtaining the second neural network model.
Step 306: and pruning the second neural network model to obtain a pruned third neural network model.
Pruning refers to clipping or deleting parameters in the second neural network model.
The parameter quantity in the third neural network model is smaller than the parameter quantity in the second neural network model.
Compared with the second neural network model, the third neural network model has a smaller parameter quantity.
Illustratively, the computer device removes part of the parameters (also called network weights) of the second neural network model, for example, parameters of low importance (that is, low contribution to the model output), to obtain a lightweight neural network model with sparse parameters (i.e., the third neural network model).
In summary, in the method provided by this embodiment, a first neural network model is acquired; regularization training is performed on the first neural network model to obtain a second neural network model with sparsified parameters; and a pruning operation is performed on the second neural network model to obtain a pruned third neural network model. In this method, regularization is introduced to constrain the training of the parameters of the first neural network model, yielding a second neural network model with sparsified parameters; the parameters of the second neural network model are then clipped to obtain a third neural network model with a smaller parameter quantity, thereby achieving a good acceleration effect for the neural network model while ensuring its accuracy.
Fig. 4 is a flowchart of a method for accelerating a neural network model in a computer device according to an exemplary embodiment of the present application. The method may be performed by a computer device, which may be the terminal or the server in fig. 2. The method comprises the following steps:
step 402: a first neural network model is acquired.
The first neural network model is also called an artificial neural network model. It abstracts the human brain neural network from the perspective of information processing, builds a simple model, and forms different networks according to different connection modes. The first neural network model can be applied to fields such as pattern recognition, intelligent robots, automatic control, predictive estimation, biology, medicine, and economics.
Optionally, the first neural network model is a neural network model that has been trained, or the first neural network model is a neural network model that has not been trained, which is not particularly limited by the embodiment of the present application.
Step 404: and regularizing the network layer in the first neural network model to obtain a regularized loss function value corresponding to the network layer.
Regularization refers to adding training constraints to the parameters of the network layer in the neural network model, i.e., the parameters of the network layer are constrained from random variation during training or iteration.
Optionally, the regularization processing includes at least one of calculating a smooth absolute loss function value (Smooth-L1), calculating a square loss function value (L2), and calculating an absolute loss function value (L1), but is not limited thereto; the embodiment of the present application does not specifically limit this.
The network layer includes, but is not limited to, a convolution layer and a batch normalization layer; the regular loss function values include, but are not limited to, convolution regular loss function values and batch normalization regular loss function values.
The convolution layers are used to extract different features from the input; for example, the first convolution layer may extract only some low-level features such as edges, lines, and corners, while later convolution layers can iteratively extract more complex features from the low-level features.
The batch normalization layer is used to standardize features, which alleviates numerical instability in the neural network, makes features within the same batch similarly distributed, and makes the network model easier to train.
The regularization processing for the convolutional layer is as follows:
The computer device performs regularization processing on the parameters of the convolution layers in the first neural network model to obtain the convolution regular loss function value corresponding to the convolution layer.
In some embodiments, the parameters of the convolution layer include a weight matrix, and the regularization processing includes calculating a smooth absolute loss function value. The computer device calculates the smooth absolute loss function value corresponding to the weight matrix of the convolution layer in the first neural network model to obtain the convolution regular loss function value corresponding to the convolution layer.
For example, for any one convolution layer, the convolution layer includes a weight matrix and a bias matrix, assuming that W is the weight matrix of the convolution layer and b is the bias matrix of the convolution layer. The operation of the convolutional layer can be expressed as:
F(x)=Wx+b
where W is the weight matrix of the convolution layer, b is the bias matrix of the convolution layer, x is the input of the convolution layer, and F (x) is the output of the convolution layer.
When the convolution layer is trained, W and b are updated by the back propagation algorithm in each training cycle. From the computation of the convolution layer, W can be used directly to estimate the importance of the convolution layer: the smaller the value of W, the less important the information described by the convolution layer, that is, the less important convolution layer can be pruned. For this reason, a convolution regular loss function value is designed for the convolution layer to drive the values of the weight matrix W toward 0; the calculation formula of the convolution regular loss function value can be expressed as:
L_conv = λ_c · g(W)
where L_conv is the convolution regular loss function value, and λ_c is the regularization parameter of the convolution layer, used to control the degree of regularization of the convolution layer: the larger λ_c is, the stronger the regularization effect and the closer the values of W are driven toward 0; conversely, the smaller λ_c is, the weaker the regularization force and the less the values of W approach 0. The value of λ_c ranges between 0 and 1. g(W) is the regularization function; here Smooth-L1 is used for the regularization.
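For concreteness, a minimal PyTorch sketch of L_conv, applying the built-in Smooth-L1 loss against a zero target so that the penalty drives W toward 0; the helper name and the value of λ_c are illustrative:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def conv_regular_loss(model, lambda_c=1e-4):
    # L_conv = lambda_c * g(W), with g the Smooth-L1 penalty on each
    # convolution layer's weight matrix, measured against zero.
    loss = 0.0
    for m in model.modules():
        if isinstance(m, nn.Conv2d):
            loss = loss + F.smooth_l1_loss(
                m.weight, torch.zeros_like(m.weight), reduction="sum")
    return lambda_c * loss
```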
The regularization process for the batch normalization layer is as follows:
The computer device performs regularization processing on the parameters of the batch normalization layers in the first neural network model to obtain the batch normalization regular loss function value corresponding to the batch normalization layer.
In some embodiments, the parameters of the batch normalization layer include a stretching parameter, and the regularization processing includes calculating a square loss function value. The computer device calculates the square loss function value corresponding to the stretching parameter of the batch normalization layer in the first neural network model to obtain the batch normalization regular loss function value corresponding to the batch normalization layer.
For example, for any batch normalization layer, assume z_in is the input of the batch normalization layer and z_out is its output. The operation of the batch normalization layer can be expressed as:
ẑ = (z_in − μ_B) / √(σ_B² + ε)
z_out = γ · ẑ + β
where z_in is the input of the batch normalization layer, z_out is its output, μ_B is the mean of the feature map within the current batch size (mini-batch), the batch size representing the number of samples selected before each parameter adjustment, σ_B² is the variance of the feature map of the batch normalization layer, ε is an error coefficient, γ is the stretching parameter, and β is the offset parameter. γ and β are the parameters to be learned by the batch normalization layer and are updated by the back propagation algorithm during the training stage. From the computation of the batch normalization layer, the γ parameter can be used to measure the importance of the corresponding batch normalization layer: a γ that approaches 0 means the batch normalization layer is less important and can be clipped.
For this purpose, a batch normalization regular loss function value is designed for the batch normalization layer to drive the value of the stretching parameter gamma to approach 0, and the calculation formula of the batch normalization regular loss function value can be expressed as:
L_bn = λ_b · B(γ)
B(γ) = ∑ γ²
where L_bn is the batch normalization regular loss function value, and λ_b is the regularization parameter of the batch normalization layer, used to control the degree of regularization of the batch normalization layer: the larger λ_b is, the stronger the regularization effect and the closer the γ parameter is driven toward 0; conversely, the smaller λ_b is, the weaker the regularization force and the less the γ parameter approaches 0. The value of λ_b ranges between 0 and 1. B(γ) is the regularization function; here L2 is used for the regularization of the batch normalization layer.
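Similarly, a minimal sketch of L_bn, the L2 penalty on the stretching parameters γ, which PyTorch batch normalization layers hold in m.weight; the helper name and the value of λ_b are illustrative:

```python
import torch.nn as nn

def bn_regular_loss(model, lambda_b=1e-4):
    # L_bn = lambda_b * B(gamma), with B(gamma) the sum of squared
    # stretching parameters over all batch normalization layers.
    loss = 0.0
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            loss = loss + (m.weight ** 2).sum()
    return lambda_b * loss
```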
When regularizing the convolution layer and the batch normalization layer in the first neural network model, the regularizing treatment may be performed on the convolution layer and the batch normalization layer at the same time, or the regularizing treatment may be performed on the convolution layer and the batch normalization layer sequentially according to the need, or the regularizing treatment may be performed according to the connection sequence of the convolution layer and the batch normalization layer, which is not particularly limited in the embodiment of the present application.
When regularizing the convolution layer and the batch normalization layer in the first neural network model, the regularizing process may be performed only on the convolution layer, or only on the batch normalization layer, or both the convolution layer and the batch normalization layer.
Step 406: update the model parameters of the first neural network model based on the total loss function value formed by the first training loss function value and the regular loss function value to obtain the second neural network model after parameter sparsification.
The first training loss function value refers to a loss function value obtained by updating the model parameters of the first neural network model before the regularization training.
Optionally, before the computer device performs the first regularization training on the network layer in the first neural network model, the first training loss function value refers to the original loss function value used to update the model parameters of the first neural network model; before the computer device performs the second regularization processing on the network layer in the first neural network model, the first training loss function value refers to the total loss function value obtained by updating the model parameters of the first neural network model after the first regularization training; and before the computer device performs the n-th regularization processing on the network layer in the first neural network model, the first training loss function value refers to the total loss function value obtained by updating the model parameters of the first neural network model after the (n-1)-th regularization training, where n is a positive integer.
Illustratively, in the case of obtaining a convolution regular loss function value corresponding to the convolution layer and a batch normalization regular loss function value corresponding to the batch normalization layer, the computer device sums at least one of the convolution regular loss function value and the batch normalization regular loss function value with the first training loss function value to obtain a total loss function value; the computer equipment updates model parameters of the first neural network model based on the total loss function value to obtain a second neural network model.
Optionally, the computer device sums only the convolution regular loss function value and the first training loss function value to obtain a total loss function value; the computer equipment updates model parameters of the first neural network model based on the total loss function value to obtain a second neural network model.
Optionally, the computer device sums only the batch normalized regular loss function value with the first training loss function value to obtain a total loss function value; the computer equipment updates model parameters of the first neural network model based on the total loss function value to obtain a second neural network model.
Optionally, the computer device sums the convolution regular loss function value, the batch normalization regular loss function value and the first training loss function value to obtain a total loss function value; the computer equipment updates model parameters of the first neural network model based on the total loss function value to obtain a second neural network model.
For example, the calculation formula of the total loss function value can be expressed as:
L_total = L + λ_c * g(W) + λ_b * B(γ)
wherein L_total represents the total loss function value, λ_c * g(W) represents the convolution regular loss function value, λ_b * B(γ) represents the batch normalization regular loss function value, λ_c is the regularization parameter of the convolution layer, g(W) is the convolution regularization function, λ_b is the regularization parameter of the batch normalization layer, B(γ) is the batch normalization regularization function, and L represents the first training loss function value.
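A minimal sketch of the total loss function value is given below; it assumes PyTorch, uses a smooth absolute (smooth-L1) penalty for g(W) as described for the convolution layer above, and the λ values are illustrative.

```python
import torch
import torch.nn.functional as F

def total_loss(task_loss, model, lambda_c=1e-5, lambda_b=1e-4):
    # L_total = L + lambda_c * g(W) + lambda_b * B(gamma)
    g_w = 0.0       # convolution regularization term g(W)
    b_gamma = 0.0   # batch normalization regularization term B(gamma)
    for m in model.modules():
        if isinstance(m, torch.nn.Conv2d):
            # smooth absolute loss of the weight matrix against zero
            g_w = g_w + F.smooth_l1_loss(
                m.weight, torch.zeros_like(m.weight), reduction='sum')
        elif isinstance(m, torch.nn.BatchNorm2d):
            b_gamma = b_gamma + (m.weight ** 2).sum()
    return task_loss + lambda_c * g_w + lambda_b * b_gamma
```

Dropping either regularization term recovers the convolution-only or batch-normalization-only alternatives described above.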
Illustratively, sparse model parameters can be obtained through joint regularization training of the convolution layer and the batch normalization layer. Taking the batch normalization layer as an example, fig. 5 shows a schematic diagram of the parameter values of the batch normalization layer before regularization training, and fig. 6 shows them after regularization training; the abscissa in the figures is the parameter value and the ordinate is the number of iterations. As can be seen from fig. 5, before regularization training the parameter values of the batch normalization layer follow a roughly normal distribution concentrated near 1; as can be seen from fig. 6, after regularization training the parameter values are gradually pushed to near 0.1 and tend toward 0 as the number of iterations increases, so a sparsified model is obtained through regularization training.
Step 408: perform a pruning operation on the second neural network model to obtain a pruned third neural network model.
Pruning refers to clipping or deleting parameters in the second neural network model.
The parameter quantity in the third neural network model is smaller than the parameter quantity in the second neural network model.
The computer device deletes a part of the parameters corresponding to the network layers in the second neural network model to obtain the third neural network model after parameter deletion.
In some embodiments, the computer device deletes a part of parameters corresponding to the convolution layer and/or the batch normalization layer in the second neural network model, and obtains a third neural network model after deleting the parameters.
The step of deleting part of the parameters of the convolution layer is as follows:
the computer equipment takes absolute values of elements in the weight matrix corresponding to the single convolution layer and adds up the absolute values to obtain a weight matrix norm corresponding to the single convolution layer; the computer equipment sorts the weight matrix norms corresponding to the at least two convolution layers according to the numerical values to obtain a weight matrix norms sequence; and the computer equipment performs partial deletion on the weight matrix norms in the weight matrix norms sequence according to the numerical value of the weight matrix norms to obtain a third neural network model.
Optionally, the computer device obtains a third neural network model based on the weight matrix corresponding to the remaining weight matrix norms.
For example, for the weight matrices W corresponding to all the convolution layers, the weight matrix norm (also referred to as the L1 norm) of each W matrix is computed and denoted |W|; that is, every element in the weight matrix W of each single convolution layer is taken in absolute value and then summed. The calculation of the weight matrix norm can be expressed as:
|W| = |W_11| + |W_12| + … + |W_mn|
wherein |W| represents the weight matrix norm, and |W_11| represents the absolute value of the element in the first row and first column of the weight matrix.
For example, after the weight matrix norms are sorted, a model clipping ratio r is set, with r ranging from 0 to 1. According to r, the sorted values can be clipped in the same proportion. For example, if the weight matrix norm sequence is arranged in descending order and contains 100 values in total, and the clipping ratio is set to 0.9 (that is, 90% of the values are clipped), only the convolution layers corresponding to the first 10 of the 100 values are retained.
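The following sketch illustrates this selection step, assuming PyTorch; it only identifies which convolution layers survive clipping, while physically rebuilding the network without the clipped layers is omitted.

```python
import torch

def select_conv_layers(model, r=0.9):
    # |W| for each convolution layer: sum of absolute values of its elements.
    convs = [m for m in model.modules() if isinstance(m, torch.nn.Conv2d)]
    norms = [m.weight.abs().sum().item() for m in convs]
    # Sort layer indices by norm in descending order, keep the top (1 - r).
    order = sorted(range(len(convs)), key=lambda i: norms[i], reverse=True)
    keep = max(1, int(len(convs) * (1 - r)))
    return set(order[:keep])  # indices of the retained convolution layers
```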
The steps for deleting part of the parameters of the batch normalization layer are as follows:
the computer equipment sorts the stretching parameters corresponding to the at least two batch normalization layers according to the numerical values to obtain a stretching parameter sequence; and the computer equipment partially deletes the stretching parameters in the stretching parameter sequence according to the numerical value of the stretching parameters, and obtains a third neural network model based on the rest stretching parameters.
The computer device takes the absolute value of the γ parameter of every batch normalization layer and then sorts all the parameter values by magnitude.
After the stretching parameters are sorted, a model clipping ratio r is set, with r ranging from 0 to 1. According to r, the sorted parameter values can be clipped in the same proportion. For example, if the stretching parameter sequence is arranged in descending order and contains 500 parameter values in total, and the clipping ratio is set to 0.9 (that is, 90% of the parameter values are clipped), only the batch normalization layers corresponding to the first 50 of the 500 parameter values are retained.
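A corresponding sketch for the batch normalization layers follows, again assuming PyTorch; it derives a global threshold from the sorted |γ| values and returns per-layer masks rather than physically removing channels.

```python
import torch

def select_bn_channels(model, r=0.9):
    bns = [m for m in model.modules() if isinstance(m, torch.nn.BatchNorm2d)]
    # Absolute values of the stretching parameters of every layer.
    gammas = torch.cat([m.weight.detach().abs().flatten() for m in bns])
    sorted_g, _ = torch.sort(gammas, descending=True)
    keep = max(1, int(gammas.numel() * (1 - r)))
    threshold = sorted_g[keep - 1]  # smallest surviving |gamma|
    # A True entry marks a channel retained after clipping.
    return [m.weight.detach().abs() >= threshold for m in bns]
```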
In some embodiments, the computer device may partially remove parameters of the second neural network model to obtain a third neural network model.
Illustratively, the computer device identifies the importance of each parameter in the entire second neural network model and clips the unimportant parameters of the second neural network model through some importance-evaluation mechanism.
In one possible implementation, when the computer device prunes the second neural network model, its smallest pruning unit may be a single parameter (or referred to as a single weight) (referred to as unstructured pruning in the field of artificial intelligence (Artificial Intelligence, AI)), or may be a more structured set of parameters (such as weight bars, filters, feature maps, etc.) (referred to as structured pruning in the field of AI).
In one possible implementation, the recognition of the importance of the parameters by the computer device may be approximated empirically by certain indices (norms, gradients, etc.), or may be actually measured based on the loss of model expressive power caused by clipping each parameter.
For example, parameters with smaller absolute values (e.g., below a certain threshold, or in a later order from large to small), or structures of parameters with smaller norms L0, L1, and/or L2 (e.g., filters in convolutional neural networks, feature maps, etc.), are considered as unimportant parameters based on the empirical assumption that "smaller parameter values are less important".
In one possible implementation, the above clipping of unimportant parameters may be performed in a single pass, in which all parameters meeting the unimportance condition are clipped at once, or divided into multiple iterations: a portion is clipped first; if the compression ratio does not reach the target, the parameters to be clipped in the next iteration are selected again according to some condition (which may be the same as or different from that of the previous iteration); and this cycle repeats until clipping is completed. Note that the parameters clipped from the finally obtained sparse network may differ from those selected by one-shot clipping, even when the same global pruning proportion is used in both cases.
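A sketch of the iterative variant is given below; select_bn_channels is the helper sketched earlier, apply_masks is a hypothetical routine that zeroes the deselected channels, and the per-iteration step size is illustrative.

```python
def iterative_prune(model, target_ratio=0.9, step=0.3):
    # Simplified: per-iteration ratios are treated as additive fractions
    # of the original channel count.
    pruned = 0.0
    while pruned < target_ratio:
        step_ratio = min(step, target_ratio - pruned)
        # Re-select the least important channels under the current weights;
        # the selection condition may differ between iterations.
        masks = select_bn_channels(model, r=step_ratio)
        apply_masks(model, masks)  # hypothetical helper, not defined here
        pruned += step_ratio
```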
In one possible implementation, the pruning ratio of each layer of parameters in the entire neural network model may be the same or different.
In one possible implementation, the above-mentioned computer device may determine, when determining the number of clipping parameters, based on a number-to-number ratio, or based on a ratio of importance values of the clipped parameters.
In summary, in the method provided in this embodiment, the first neural network model is obtained; regularization training is carried out on the first neural network model, and a second neural network model with sparse parameters is obtained through training; and pruning the second neural network model to obtain a pruned third neural network model. According to the method, the regularization is introduced to conduct training constraint on parameters in the first neural network model, a second neural network model with sparse parameters is obtained, then parameters in the second neural network model are cut, and a third neural network model with smaller parameters is obtained, so that a good effect on acceleration of the neural network model is achieved, and accuracy of the neural network model is guaranteed.
In the method provided by this embodiment, regularization processing is performed on the parameters of the network layer in the first neural network model to obtain the regular loss function value corresponding to the network layer, and the model parameters of the first neural network model are updated based on the total loss function value formed by the first training loss function value and the regular loss function value, obtaining the second neural network model with sparse parameters. In this method, the parameters of the network layer in the first neural network model are constrained during training through regularization processing, the first neural network model is trained by combining its own loss function value with the regular loss function value corresponding to the network layer, and the second neural network model with sparse parameters is obtained through model training, so that the parameter sparsification of the neural network model is more uniform and accurate.
According to the method provided by the embodiment, the convolution layer for extracting the characteristics in the first neural network model and the normalization layer for normalizing the characteristics are regularized, the corresponding regularized loss function value is obtained, the model parameter updating is carried out on the first neural network model based on the total loss function value formed by the first training loss function value and the regularized loss function value, and the second neural network model with sparse parameters is obtained. According to the application, the parameters of the most core convolution layer and the batch normalization layer in the first neural network model are subjected to training constraint through regularization treatment, the model training is performed on the first neural network model by combining the loss function value of the first neural network model and the regular loss function values corresponding to the convolution layer and the batch normalization layer, and the parameter sparsification of the convolution layer and the batch normalization layer in the neural network model is more uniform and accurate through a model training mode.
According to the method provided by the embodiment, the third neural network model with the smaller parameter number is obtained by deleting part of the parameters in the second neural network model with the sparse parameters, so that a good effect is achieved on acceleration of the neural network model, and accuracy of the neural network model is guaranteed.
According to the method provided by the embodiment, the parameters of the convolution layer and the batch normalization layer are sequenced and partially deleted, so that the third neural network model with smaller parameter number is obtained, a good effect is achieved on acceleration of the neural network model, and accuracy of the neural network model is guaranteed.
According to the method provided by the embodiment, after the third neural network model with the smaller parameter is obtained, the third neural network model is subjected to model training through the training sample, so that the third neural network model with the adjusted parameter is obtained, the optimization of the third neural network model is realized, and the neural network model is enabled to keep higher accuracy.
According to the method provided by the embodiment, after the third neural network model with smaller parameters is obtained, the first neural network model is used for guiding the third neural network model to carry out model training, the third neural network model with the parameters adjusted is obtained, optimization of the third neural network model is achieved, and the neural network model is enabled to keep higher accuracy.
After the clipped lightweight third neural network model is obtained, its accuracy is reduced to a certain extent. To guarantee the accuracy of the third neural network model, it is optimized through coarse tuning and fine tuning. Illustratively, a schematic diagram of the method of optimizing the neural network model is shown in fig. 7. Coarse tuning and fine tuning may be performed by a computer device, which may be the terminal or the server in fig. 2.
Coarse tuning refers to making relatively large-granularity adjustments to the parameters of the third neural network model.
Fine tuning refers to making small-granularity refinements to the parameters of the third neural network model.
It should be noted that the order and implementation of coarse tuning and fine tuning are not limited. It is not required that both be performed; only one of them may be performed. Nor is it required that coarse tuning be performed first; fine tuning may be performed first and coarse tuning afterwards. This is not particularly limited in the embodiments of the present application.
The method for carrying out rough adjustment on the third neural network model comprises the following steps:
The computer device performs model training on the third neural network model based on training samples to obtain the third neural network model with adjusted parameters. Using the parameter values of the convolution layers and batch normalization layers retained after clipping in the third neural network model as initial values, model training is carried out for several training periods, which improves the accuracy of the lightweight third neural network model.
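As a sketch under stated assumptions (PyTorch, an SGD optimizer, and a cross-entropy task loss are illustrative choices, not prescribed by the method), coarse tuning simply retrains the pruned model from its retained parameter values:

```python
import torch

def coarse_tune(model, train_loader, epochs=10, lr=1e-3):
    # The pruned model keeps its surviving parameters as the starting point;
    # no re-initialization is performed.
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    loss_fn = torch.nn.CrossEntropyLoss()  # assumed task loss
    model.train()
    for _ in range(epochs):  # several training periods
        for x, y in train_loader:
            optimizer.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()
            optimizer.step()
    return model
```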
The method for fine-tuning the third neural network model comprises the following steps:
step 702: and simultaneously inputting the training sample into the first neural network model and the third neural network model for calculation to obtain a first result corresponding to the first neural network model and a second result corresponding to the third neural network model.
The third neural network model is a lightweight model corresponding to the first neural network model, and the first neural network model is a heavyweight model relative to the third neural network model.
The first neural network model may also be referred to as the teacher model and the third neural network model as the student model. The student model is ultimately deployed online to perform data processing work, while the teacher model provides guidance for the training of the student model; because the teacher model is typically deep, highly accurate and structurally complex, it is usually not deployed online.
Taking the neural network model as an image processing model as an example, the trained student model is used for super-resolution image processing; that is, after the computer device finishes training the student model, the student model can be deployed into devices with image processing requirements, such as embedded devices and mobile terminal devices, so that those devices can perform image processing using the trained student model. Of course, after the computer device completes training of the student model, it may also perform image processing itself using the trained student model. Because the student model is a lightweight deep learning model, performing image processing through the student model reduces the capability requirements of the deployment device. In addition, although the model structure of the student model is relatively simple and its depth is small, it is guided during training by a teacher model of large depth and high accuracy, so the lightweight student model can achieve an ideal image processing effect.
It should be noted that, in the embodiment of the present application, the application scenario of the neural network model is not limited, and optionally, taking the neural network model as an image processing model as an example, the image processing model in the embodiment of the present application may be applied to scenarios such as medical imaging, remote sensing imaging, image compression, image processing, video processing, and the like.
Illustratively, the first result includes a first feature result output by a feature layer in the first neural network model and a first output result output by an output layer, and the second result includes a second feature result output by a feature layer in the third neural network model and a second output result output by the output layer.
The computer equipment inputs the training sample into a first neural network model for calculation to obtain a first characteristic result and a first output result; the computer equipment inputs the training sample into the third neural network model for calculation to obtain a second characteristic result and a second output result.
Step 704: a loss function value is calculated based on the first result and the second result.
Illustratively, after obtaining the first result and the second result, the computer device obtains a first loss function value based on a square of a difference between the first feature result and the second feature result; the computer device obtains a second loss function value based on a square of a difference between the first output result and the second output result.
The calculation formula of the first loss function value can be expressed as:
L_1 = |f_1(x) − Y_1|²
wherein L_1 is the first loss function value, f_1(x) is the first feature result output by the feature layer in the first neural network model, Y_1 is the second feature result output by the feature layer in the third neural network model, and x is the input value.
The calculation formula of the second loss function value can be expressed as:
L_2 = |f_2(x) − Y_2|²
wherein L_2 is the second loss function value, f_2(x) is the first output result produced by the output layer in the first neural network model, Y_2 is the second output result produced by the output layer in the third neural network model, and x is the input value.
Step 706: update the model parameters of the third neural network model based on the loss function values.
The computer device updates model parameters of the third neural network model based on a sum of the first loss function value and the second loss function value.
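A minimal sketch of this joint loss, assuming PyTorch and assuming each model returns both its last feature-layer result and its output-layer result (real models may need hooks to expose the feature layer):

```python
import torch

def distillation_loss(teacher, student, x):
    # L1 = |f1(x) - Y1|^2 on the feature layer,
    # L2 = |f2(x) - Y2|^2 on the output layer;
    # the student is updated with the sum L1 + L2.
    with torch.no_grad():            # the teacher is not updated
        t_feat, t_out = teacher(x)
    s_feat, s_out = student(x)
    l1 = ((t_feat - s_feat) ** 2).mean()  # feature-layer loss
    l2 = ((t_out - s_out) ** 2).mean()    # output-layer loss
    return l1 + l2
```

Updating the student with only l1 or only l2 corresponds to the alternatives noted next.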
It should be noted that the computer device may update the model parameters of the third neural network model based on at least one of the first loss function value and the second loss function value.
Optionally, the computer device updates the model parameters of the third neural network model based only on the first loss function value.
Optionally, the computer device updates the model parameters of the third neural network model based only on the second loss function value.
For example, as shown in the schematic diagram of knowledge distillation of the third neural network model in fig. 8, after coarse tuning of the third neural network model 803 the accuracy generally improves greatly, but accuracy comparable to that of the first neural network model 802 is often still not reached. To further improve the accuracy of the third neural network model 803, joint knowledge distillation based on the feature layer and the output layer is used. Here the first neural network model 802 serves as the teacher model and the third neural network model 803 as the student model: for the same training sample 801, the third neural network model 803 drives its own output close to that of the first neural network model 802, i.e. imitates the first neural network model 802, thereby improving its own accuracy.
Assuming the training sample 801 is a picture, the picture is fed simultaneously into the first neural network model 802 and the clipped third neural network model 803 for calculation; the loss function values corresponding to the last feature layer and the output layer of the two models are then computed, and these loss function values narrow the output difference between the third neural network model 803 and the first neural network model 802, so that the third neural network model 803 learns the knowledge information of the first neural network model 802.
In summary, in the method provided by this embodiment, training samples are used to coarse-tune and fine-tune the parameters of the lightweight third neural network model, so that its accuracy performance remains essentially unchanged. Through this technique, a large deep learning model can be practically deployed on end-side devices with limited hardware resources while obtaining high algorithm real-time performance and algorithm accuracy.
The embodiment of the application provides a schematic diagram of an acceleration method for a neural network model in computer equipment, and as shown in fig. 9, the method can be executed by the computer equipment, and the computer equipment can be a terminal or a server.
Step 901: a first neural network model is acquired.
The first neural network model is also referred to as an artificial neural network model. The first neural network model abstracts the human brain neural network from the angle of information processing, builds a certain simple model, and forms different networks according to different connection modes. The first neural network model can be applied to the fields of pattern recognition, intelligent robots, automatic control, predictive estimation, biology, medicine, economy, images, voice, natural language processing and the like.
Optionally, the first neural network model is a neural network model that has been trained, or the first neural network model is a neural network model that has not been trained, which is not particularly limited by the embodiment of the present application.
Step 902: regularization training is performed based on the convolution layer and the batch normalization layer.
The regularization training means that parameters in the first neural network model are thinned in a regularization processing mode, and model training is carried out on the first neural network model based on the thinned parameters.
Regularization refers to adding training constraints to the parameters of the network layer in the neural network model, i.e., the parameters of the network layer are constrained from random variation during training or iteration.
The computer equipment carries out regularization treatment on parameters of a convolution layer and a batch normalization layer in the neural network model respectively to obtain a convolution regular loss function value corresponding to the convolution layer and a batch normalization regular loss function value corresponding to the batch normalization layer.
Under the condition that a convolution regular loss function value corresponding to a convolution layer and a batch normalization regular loss function value corresponding to a batch normalization layer are obtained, the computer equipment sums at least one of the convolution regular loss function value and the batch normalization regular loss function value with a first training loss function value to obtain a total loss function value; and the computer equipment updates the model parameters of the first neural network model based on the total loss function value to obtain a second neural network model with sparse parameters.
Step 903: pruning.
After the second neural network model with sparse parameters is obtained, the computer device deletes part of the parameters corresponding to the convolution layer and/or the batch normalization layer in the second neural network model to obtain the third neural network model after parameter deletion.
Step 904: and (5) rough adjustment of model parameters.
Coarse tuning refers to making relatively large-granularity adjustments to the parameters of the third neural network model.
After the clipped lightweight third neural network model is obtained, it is optimized through coarse tuning: the computer device performs model training on the third neural network model based on training samples to obtain the third neural network model with adjusted parameters. Using the parameter values of the convolution layers and batch normalization layers retained after clipping in the third neural network model as initial values, model training is carried out over several training periods, thereby optimizing the lightweight third neural network model.
Step 905: knowledge distillation is performed based on the feature layer and the output layer.
After coarse tuning the third neural network model, the computer device performs knowledge distillation on the third neural network model based on the feature layer and the output layer, thereby realizing fine tuning of parameters of the third neural network model.
The computer equipment simultaneously inputs the training sample into the first neural network model and the third neural network model for calculation to obtain a first result corresponding to the first neural network model and a second result corresponding to the third neural network model; after obtaining the first result and the second result, the computer device obtains a first loss function value based on a square of a difference between the first feature result and the second feature result; the computer device obtaining a second loss function value based on the square of the difference between the first output result and the second output result; the computer device updates model parameters of the third neural network model based on the sum of the first loss function value and the second loss function value, thereby obtaining an optimized third neural network model.
Step 906: and obtaining an optimized third neural network model.
The optimized third neural network model has had its parameters clipped while the accuracy of the model is maintained.
Fig. 10 is a schematic structural view of an acceleration apparatus for a neural network model in a computer device according to an exemplary embodiment of the present application. The apparatus may be implemented as all or part of a computer device by software, hardware, or a combination of both, the apparatus comprising:
An acquisition module 1001, configured to acquire a first neural network model;
the training module 1002 is configured to perform regularization training on the first neural network model, and perform training to obtain a second neural network model with sparse parameters, where the regularization training refers to performing sparsification on parameters in the first neural network model by a regularization processing manner, and performing model training based on the sparse parameters;
the pruning module 1003 is configured to perform pruning operation on the second neural network model to obtain a pruned third neural network model, where the pruning operation refers to clipping parameters in the second neural network model, and a parameter amount in the third neural network model is smaller than a parameter amount in the second neural network model.
In some embodiments, the training module 1002 is configured to perform regularization processing on a network layer in the first neural network model, to obtain a regularized loss function value corresponding to the network layer, where the regularization processing is configured to add a training constraint to parameters in the network layer.
In some embodiments, the training module 1002 is configured to update model parameters of the first neural network model based on a total loss function value formed by the first training loss function value and the regular loss function value, so as to obtain a second neural network model after parameter sparsification.
The first training loss function value refers to a loss function value obtained by updating model parameters of the first neural network model before the regularization training.
In some embodiments, the network layer comprises a convolutional layer, and the canonical loss function value comprises a convolutional canonical loss function value; and a training module 1002, configured to perform regularization processing on parameters of the convolutional layer in the first neural network model, so as to obtain the convolutional canonical loss function value corresponding to the convolutional layer.
In some embodiments, a training module 1002 is configured to sum the convolution regular loss function value with the first training loss function value to obtain the total loss function value; and updating model parameters of the first neural network model based on the total loss function value to obtain the second neural network model.
In some embodiments, the parameters in the convolutional layer comprise a weight matrix, and the regularization process comprises calculating a smoothed absolute loss function value; and a training module 1002, configured to calculate a smooth absolute loss function value corresponding to the weight matrix in the convolution layer, and obtain the convolution regular loss function value corresponding to the convolution layer.
In some embodiments, the network layer comprises a batch normalization layer, and the canonical loss function value comprises a batch normalization canonical loss function value; the training module 1002 is configured to perform regularization processing on parameters of the batch normalization layer in the first neural network model, to obtain the batch normalization canonical loss function value corresponding to the batch normalization layer; summing the batch normalized regular loss function value and the first training loss function value to obtain the total loss function value; and updating model parameters of the first neural network model based on the total loss function value to obtain the second neural network model.
In some embodiments, the parameters in the batch normalization layer include stretch parameters, and the regularizing process includes calculating a squared loss function value; the training module 1002 is configured to calculate a square loss function value corresponding to the stretching parameter in the batch normalization layer, and obtain the batch normalization regular loss function value corresponding to the batch normalization layer.
In some embodiments, the apparatus further includes a pruning module 1003, configured to delete a part of parameters corresponding to a network layer in the second neural network model, to obtain the third neural network model after deleting the parameters.
In some embodiments, the pruning module 1003 is configured to take absolute values of elements in a weight matrix corresponding to a single convolution layer and accumulate and sum the absolute values to obtain a weight matrix norm corresponding to the single convolution layer, where the weight matrix norm refers to a sum of absolute values of elements in the weight matrix; sequencing the weight matrix norms corresponding to at least two convolution layers according to the numerical values to obtain a weight matrix norms sequence; and partially deleting the weight matrix norms in the weight matrix norms sequence according to the numerical value of the weight matrix norms to obtain the third neural network model.
In some embodiments, the pruning module 1003 is configured to sort the stretching parameters corresponding to the at least two batch normalization layers according to the numerical values, to obtain a stretching parameter sequence; and partially deleting the stretching parameters in the stretching parameter sequence according to the numerical value of the stretching parameters to obtain the third neural network model.
In some embodiments, the training module 1002 is configured to perform model training on the third neural network model based on a training sample, to obtain a third neural network model with adjusted parameters.
In some embodiments, the training module 1002 is configured to input a training sample to the first neural network model and the third neural network model at the same time to perform calculation, so as to obtain a first result corresponding to the first neural network model and a second result corresponding to the third neural network model; calculating a loss function value based on the first result and the second result; and updating model parameters of the third neural network model based on the loss function value.
Wherein the third neural network model is a lightweight model of the first neural network model, and the parameter quantity in the third neural network model is smaller than the parameter quantity in the first neural network model.
In some embodiments, the first result comprises a first feature result output by a feature layer in the first neural network model and a first output result output by an output layer, and the second result comprises a second feature result output by a feature layer in the third neural network model and a second output result output by an output layer; the training module 1002 is configured to input the training sample to the first neural network model for calculation, to obtain the first feature result and the first output result; and inputting the training sample into the third neural network model for calculation to obtain the second characteristic result and the second output result.
In some embodiments, a training module 1002 is configured to obtain a first loss function value based on a square of a difference between the first feature result and the second feature result; obtaining a second loss function value based on the square of the difference between the first output result and the second output result;
in some embodiments, the training module 1002 is configured to update model parameters of the third neural network model based on a sum of the first loss function value and the second loss function value.
Fig. 11 shows a block diagram of a computer device 1100 according to an exemplary embodiment of the present application. The computer device may be implemented as the server in the above aspects of the present application. The computer device 1100 includes a central processing unit (Central Processing Unit, CPU) 1101, a system memory 1104 including a random access memory (Random Access Memory, RAM) 1102 and a read-only memory (Read-Only Memory, ROM) 1103, and a system bus 1105 connecting the system memory 1104 and the central processing unit 1101. The computer device 1100 also includes a mass storage device 1106 for storing an operating system 1109, application programs 1110, and other program modules 1111.
The mass storage device 1106 is connected to the central processing unit 1101 through a mass storage controller (not shown) connected to the system bus 1105. The mass storage device 1106 and its associated computer-readable media provide non-volatile storage for the computer device 1100. That is, the mass storage device 1106 may include a computer-readable medium (not shown) such as a hard disk or a compact disc read-only memory (Compact Disc Read-Only Memory, CD-ROM) drive.
The computer-readable medium may include computer storage media and communication media without loss of generality. Computer storage media include volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules or other data. Computer storage media include RAM, erasable programmable read-only memory (Erasable Programmable Read Only Memory, EPROM), electrically erasable programmable read-only memory (Electrically Erasable Programmable Read Only Memory, EEPROM), flash memory or other solid-state memory technology, CD-ROM, digital versatile discs (Digital Versatile Disc, DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage, or other magnetic storage devices. Of course, those skilled in the art will recognize that computer storage media are not limited to the above. The system memory 1104 and the mass storage device 1106 described above may be collectively referred to as memory.
According to various embodiments of the present disclosure, the computer device 1100 may also operate by being connected, through a network such as the Internet, to a remote computer on the network. That is, the computer device 1100 may be connected to the network 1108 through a network interface unit 1107 connected to the system bus 1105, or the network interface unit 1107 may be used to connect to other types of networks or remote computer systems (not shown).
The memory further includes at least one section of computer program stored in the memory, and the central processor 1101 implements all or part of the steps in the acceleration method for a neural network model in a computer device shown in the above embodiments by executing the at least one section of program.
The embodiment of the application also provides a computer device, which comprises a processor and a memory, wherein at least one program is stored in the memory, and the at least one program is loaded and executed by the processor to realize the acceleration method for the neural network model in the computer device provided by the above method embodiments.
The embodiment of the application also provides a computer readable storage medium, and at least one computer program is stored in the computer readable storage medium, and the at least one computer program is loaded and executed by a processor to realize the acceleration method for the neural network model in the computer equipment provided by the above method embodiments.
Embodiments of the present application also provide a computer program product comprising a computer program stored in a computer readable storage medium; the computer program is read from the computer readable storage medium and executed by a processor of the computer device, so that the computer device executes to implement the acceleration method for the neural network model in the computer device provided by the above method embodiments.
It will be appreciated that, in the specific embodiments of the present application, data related to user identity or characteristics, such as historical data and portraits, may be involved in user data processing; when the above embodiments of the present application are applied to specific products or technologies, user permission or consent is required, and the collection, use and processing of the related data must comply with the relevant laws, regulations and standards of the relevant countries and regions.
It should be understood that references herein to "a plurality" mean two or more. "And/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may indicate: A exists alone, A and B exist together, or B exists alone. The character "/" generally indicates that the associated objects before and after it are in an "or" relationship.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program for instructing relevant hardware, where the program may be stored in a computer readable storage medium, and the storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The foregoing description of the preferred embodiments of the application is not intended to limit the application to the precise form disclosed, and any such modifications, equivalents, and alternatives falling within the spirit and principles of the application are intended to be included within the scope of the application.

Claims (16)

1. A method for accelerating a neural network model in a computer device, the method comprising:
acquiring a first neural network model;
regularization training is carried out on the first neural network model to obtain a second neural network model with sparse parameters, wherein the regularization training refers to the steps of carrying out the sparse of the parameters in the first neural network model in a regularization processing mode, and carrying out model training based on the sparse parameters;
pruning operation is carried out on the second neural network model to obtain a third neural network model after pruning, wherein the pruning operation refers to cutting parameters in the second neural network model, and the parameter quantity in the third neural network model is smaller than the parameter quantity in the second neural network model.
2. The method of claim 1, wherein regularizing the first neural network model to obtain a second neural network model with sparse parameters comprises:
regularizing a network layer in the first neural network model to obtain a regular loss function value corresponding to the network layer, wherein the regularizing is used for adding training constraints to parameters in the network layer;
updating model parameters of the first neural network model based on a total loss function value formed by the first training loss function value and the regular loss function value to obtain a second neural network model with sparse parameters;
the first training loss function value refers to a loss function value obtained by updating model parameters of the first neural network model before the regularization training.
3. The method of claim 2, wherein the network layer comprises a convolutional layer, and the canonical loss function value comprises a convolutional canonical loss function value;
the regularization processing is performed on the network layer in the first neural network model to obtain a regularized loss function value corresponding to the network layer, including:
Regularizing parameters of the convolution layer in the first neural network model to obtain the convolution regular loss function value corresponding to the convolution layer;
the model parameter updating is performed on the first neural network model based on the total loss function value formed by the first training loss function value and the regular loss function value, so as to obtain a second neural network model with sparse parameters, which comprises the following steps:
summing the convolution regular loss function value and the first training loss function value to obtain the total loss function value;
and updating model parameters of the first neural network model based on the total loss function value to obtain the second neural network model.
4. A method according to claim 3, wherein the parameters in the convolutional layer comprise a weight matrix and the regularization process comprises calculating a smoothed absolute loss function value;
regularizing parameters of the convolution layer in the first neural network model to obtain the convolution regular loss function value corresponding to the convolution layer, including:
and calculating a smooth absolute loss function value corresponding to the weight matrix in the convolution layer to obtain the convolution regular loss function value corresponding to the convolution layer.
5. The method of claim 2, wherein the network layer comprises a batch normalization layer and the canonical loss function value comprises a batch normalization canonical loss function value;
the regularization processing is performed on the network layer in the first neural network model to obtain a regularized loss function value corresponding to the network layer, including:
regularizing parameters of the batch normalization layer in the first neural network model to obtain the batch normalization regular loss function value corresponding to the batch normalization layer;
the model parameter updating is performed on the first neural network model based on the total loss function value formed by the first training loss function value and the regular loss function value, so as to obtain a second neural network model with sparse parameters, which comprises the following steps:
summing the batch normalized regular loss function value and the first training loss function value to obtain the total loss function value;
and updating model parameters of the first neural network model based on the total loss function value to obtain the second neural network model.
6. The method of claim 5, wherein the parameters in the batch normalization layer comprise stretch parameters, and wherein the regularization process comprises calculating a square loss function value;
Regularizing parameters of the batch normalization layer in the first neural network model to obtain a batch normalization regular loss function value corresponding to the batch normalization layer, wherein the regularizing step comprises the following steps:
and calculating a square loss function value corresponding to the stretching parameter in the batch normalization layer to obtain the batch normalization regular loss function value corresponding to the batch normalization layer.
7. The method according to any one of claims 1 to 6, wherein the pruning operation is performed on the second neural network model to obtain a pruned third neural network model, including:
and deleting partial parameters corresponding to the network layer in the second neural network model to obtain the third neural network model after deleting the parameters.
8. The method of claim 7, wherein deleting the partial parameters corresponding to the network layer in the second neural network model to obtain the third neural network model after deleting the parameters, comprises:
taking absolute values of elements in a weight matrix corresponding to a single convolution layer, and accumulating and summing to obtain a weight matrix norm corresponding to the single convolution layer, wherein the weight matrix norm refers to the sum of the absolute values of all elements in the weight matrix;
Sequencing the weight matrix norms corresponding to at least two convolution layers according to the numerical values to obtain a weight matrix norms sequence;
and partially deleting the weight matrix norms in the weight matrix norms sequence according to the numerical value of the weight matrix norms to obtain the third neural network model.
9. The method of claim 7, wherein the partially deleting the parameters corresponding to the network layer in the second neural network model to obtain the third neural network model after deleting the parameters, comprises:
sequencing the stretching parameters corresponding to the at least two batch normalization layers according to the numerical values to obtain a stretching parameter sequence;
and partially deleting the stretching parameters in the stretching parameter sequence according to the numerical value of the stretching parameters to obtain the third neural network model.
10. The method according to any one of claims 1 to 6, further comprising:
and performing model training on the third neural network model based on the training sample to obtain a third neural network model with the parameters adjusted.
11. The method according to any one of claims 1 to 6, further comprising:
Simultaneously inputting a training sample into the first neural network model and the third neural network model for calculation to obtain a first result corresponding to the first neural network model and a second result corresponding to the third neural network model;
calculating a loss function value based on the first result and the second result;
updating model parameters of the third neural network model based on the loss function value;
wherein the third neural network model is a lightweight model of the first neural network model, and the parameter quantity in the third neural network model is smaller than the parameter quantity in the first neural network model.
12. The method of claim 11, wherein the first result comprises a first feature result output by a feature layer and a first output result output by an output layer in the first neural network model, and the second result comprises a second feature result output by a feature layer and a second output result output by an output layer in the third neural network model;
the step of inputting the training sample to the first neural network model and the third neural network model for calculation to obtain a first result corresponding to the first neural network model and a second result corresponding to the third neural network model, includes:
Inputting the training sample into the first neural network model for calculation to obtain the first characteristic result and the first output result;
and inputting the training sample into the third neural network model for calculation to obtain the second characteristic result and the second output result.
13. The method of claim 12, wherein the calculating a loss function value based on the first result and the second result comprises:
obtaining a first loss function value based on the square of the difference between the first feature result and the second feature result;
obtaining a second loss function value based on the square of the difference between the first output result and the second output result;
the updating the model parameters of the third neural network model based on the loss function value includes:
model parameters of the third neural network model are updated based on a sum of the first loss function value and the second loss function value.
14. An acceleration apparatus for a neural network model in a computer device, the apparatus comprising:
the acquisition module is used for acquiring the first neural network model;
The training module is used for carrying out regularization training on the first neural network model to obtain a second neural network model with sparse parameters, wherein the regularization training refers to carrying out the sparse of the parameters in the first neural network model in a regularization processing mode and carrying out model training based on the sparse parameters;
the pruning module is used for pruning the second neural network model to obtain a pruned third neural network model, wherein the pruning operation refers to the cutting of parameters in the second neural network model, and the parameter quantity in the third neural network model is smaller than the parameter quantity in the second neural network model.
15. A computer device, the computer device comprising: a processor and a memory, the memory having stored therein at least one computer program, the at least one computer program being loaded and executed by the processor to implement the acceleration method for a neural network model in a computer device as claimed in any one of claims 1 to 13.
16. A computer-readable storage medium, characterized in that at least one computer program is stored in the computer-readable storage medium, the at least one computer program being loaded and executed by a processor to implement the acceleration method for a neural network model in a computer device as claimed in any one of claims 1 to 13.
CN202310790932.XA 2023-06-29 2023-06-29 Acceleration method and device for neural network model in computer equipment Pending CN116957017A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310790932.XA CN116957017A (en) 2023-06-29 2023-06-29 Acceleration method and device for neural network model in computer equipment

Publications (1)

Publication Number Publication Date
CN116957017A true CN116957017A (en) 2023-10-27

Family

ID=88445393

Country Status (1)

Country Link
CN (1) CN116957017A (en)


Legal Events

Date Code Title Description
PB01 Publication