CN116187401B - Compression method and device for neural network, electronic equipment and storage medium - Google Patents

Compression method and device for neural network, electronic equipment and storage medium

Info

Publication number
CN116187401B
CN116187401B (application CN202310460357.7A)
Authority
CN
China
Prior art keywords
initial
tensor
tensors
target
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310460357.7A
Other languages
Chinese (zh)
Other versions
CN116187401A (en)
Inventor
冉仕举
卿勇
李珂
周鹏飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Capital Normal University
Original Assignee
Capital Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Capital Normal University filed Critical Capital Normal University
Priority to CN202310460357.7A priority Critical patent/CN116187401B/en
Publication of CN116187401A publication Critical patent/CN116187401A/en
Application granted granted Critical
Publication of CN116187401B publication Critical patent/CN116187401B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • G06N3/082: Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The invention provides a compression method and device for a neural network, an electronic device and a storage medium, and relates to the technical field of artificial intelligence. The method comprises the following steps: obtaining M high-order tensors of an initial neural network; generating a target tensor network corresponding to the M high-order tensors based on the M high-order tensors, the target tensor network comprising N target tensors which contain compression weight parameters corresponding to the initial weight parameters of the initial neural network; performing tensor contraction processing on the N target tensors to generate a tensor contraction result of the target tensor network, the tensor contraction result comprising updated weight parameters of the initial neural network; and training the initial neural network based on the updated weight parameters to obtain the target neural network. By writing the high-order tensors that carry the variational parameters of the neural network as the contraction of a tensor network, the method compresses the parameter quantity of the initial neural network, reduces the storage and transmission cost of the computer, alleviates the overfitting phenomenon of the neural network, and enhances its generalization capability.

Description

Compression method and device for neural network, electronic equipment and storage medium
Technical Field
The present invention relates to the field of artificial intelligence technologies, and in particular, to a method and apparatus for compressing a neural network, an electronic device, and a storage medium.
Background
With the development of artificial intelligence, neural networks have achieved remarkable results in computer vision, natural language processing, and even scientific research in mathematics and physics. One of the biggest costs of continuously increasing the ability of neural networks to handle complex tasks is the rapid growth of the number of parameters. To date, the parameter counts of large-scale neural networks have broken through the trillion magnitude; this huge parameter quantity improves the capability of the neural network, but it greatly increases the storage and transmission cost that the computer incurs for the neural network, aggravates the overfitting phenomenon, and harms the generalization capability.
Therefore, how to reduce the storage and transmission cost of the computer for the neural network and alleviate the overfitting phenomenon is a problem that needs to be solved at present.
Disclosure of Invention
Aiming at the problems existing in the prior art, the embodiment of the invention provides a compression method, a compression device, electronic equipment and a storage medium of a neural network.
The invention provides a compression method of a neural network, which comprises the following steps:
obtaining M high-order tensors of an initial neural network; the M higher-order tensors comprise initial weight parameters of the initial neural network, and M is a positive integer;
generating a target tensor network corresponding to the M high-order tensors based on the M high-order tensors; the target tensor network comprises N target tensors, the N target tensors comprise compression weight parameters corresponding to the initial weight parameters, the parameter quantity of the compression weight parameters is smaller than that of the initial weight parameters, and N is a positive integer;
performing tensor contraction processing on the N target tensors to generate tensor contraction results of the target tensor network; the tensor contraction result comprises updated weight parameters of the initial neural network;
training the initial neural network based on the updated weight parameters to obtain a target neural network.
Optionally, the N target tensors are connected by a tensor index;
performing tensor contraction processing on the N target tensors to generate a tensor contraction result of the target tensor network, including:
determining a common index among the N target tensors based on the connection relation among the N target tensors;
and carrying out summation operation on the common index between every two target tensors to generate the tensor contraction result.
Optionally, the generating, based on the M higher-order tensors, a target tensor network corresponding to the M higher-order tensors includes:
generating an initial tensor network corresponding to the M high-order tensors based on the M high-order tensors; the initial tensor network comprises N initial tensors, and the N initial tensors comprise initial compression weight parameters corresponding to the initial weight parameters;
performing tensor contraction processing on the N initial tensors to generate tensor contraction results of the initial tensor network; the tensor contraction result of the initial tensor network comprises initial updating weight parameters of the initial neural network;
and pre-training the initial tensor network based on the initial weight parameters in the M high-order tensors and the initial updating weight parameters until convergence to obtain the target tensor network.
Optionally, the pre-training the initial tensor network based on the initial weight parameter in the M higher-order tensors and the initial updated weight parameter includes:
based on the initial weight parameter and the initial updating weight parameter, pre-training the initial tensor network by using Euclidean distance as a first loss function;
the first loss function is represented by the following formula (1):
L1 = ||W' - W||   (1)

wherein L1 represents the first loss function, W' represents the initial updated weight parameter, and W represents the initial weight parameter.
Optionally, the training the initial neural network based on the updated weight parameter to obtain a target neural network includes:
and training the initial neural network by utilizing a loss function corresponding to the initial neural network based on the updated weight parameters until convergence to obtain the target neural network.
The invention also provides a compression device of the neural network, which comprises:
the acquisition module is used for acquiring M high-order tensors of the initial neural network; the M higher-order tensors comprise initial weight parameters of the initial neural network, and M is a positive integer;
the generation module is used for generating a target tensor network corresponding to the M high-order tensors based on the M high-order tensors; the target tensor network comprises N target tensors, the N target tensors comprise compression weight parameters corresponding to the initial weight parameters, the parameter quantity of the compression weight parameters is smaller than that of the initial weight parameters, and N is a positive integer;
the processing module is used for performing tensor contraction processing on the N target tensors and generating tensor contraction results of the target tensor network; the tensor contraction result comprises updated weight parameters of the initial neural network;
and the training module is used for training the initial neural network based on the updated weight parameters to obtain a target neural network.
The invention also provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing a method of compressing a neural network as described in any one of the above when executing the program.
The present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements a method of compressing a neural network as described in any one of the above.
The invention also provides a computer program product comprising a computer program which, when executed by a processor, implements a method of compressing a neural network as described in any one of the above.
According to the compression method and device for a neural network, the electronic device and the storage medium provided by the invention, M high-order tensors of the initial neural network are obtained, a target tensor network comprising N target tensors is generated, tensor contraction processing is performed on the N target tensors to generate a tensor contraction result comprising updated weight parameters of the initial neural network, and the initial neural network is trained based on the updated weight parameters to obtain the target neural network. The core idea of the method is to write the high-order tensors carrying the variational parameters of the neural network as the contraction of a tensor network, so that the parameter quantity of the compression weight parameters in the N target tensors corresponding to the initial weight parameters is far smaller than the parameter quantity of the initial weight parameters; the computer only needs to store the N target tensors to achieve efficient compression of the initial weight parameters, which reduces the storage and transmission cost of the computer for the neural network, alleviates the overfitting phenomenon and enhances the generalization capability of the neural network. Meanwhile, tensor contraction based on the N target tensors can restore updated weight parameters that approximate the initial weight parameters, and training the initial neural network with these updated weight parameters ensures the accuracy of the target neural network.
Drawings
In order to more clearly illustrate the invention or the technical solutions of the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are some embodiments of the invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic flow chart of a method for compressing a neural network according to the present invention;
FIG. 2 is a schematic diagram of a compression process of a neural network provided by the present invention;
FIG. 3 is a second flow chart of the compression method of the neural network according to the present invention;
FIG. 4 is a schematic diagram of a compressing apparatus of a neural network according to the present invention;
fig. 5 is a schematic structural diagram of an electronic device provided by the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
In order to facilitate a clearer understanding of the various embodiments of the present application, some relevant background knowledge is first presented below.
In recent years, with the development of artificial intelligence, neural networks have achieved remarkable results in computer vision, natural language processing, and even scientific research in mathematics and physics. One of the biggest costs of continuously increasing the ability of neural networks to handle complex tasks is the rapid growth of the number of parameters. To date, the parameter counts of large-scale neural networks have broken through the trillion magnitude. While the huge parameter quantity raises the capability of the neural network, it also brings several serious problems, such as rapidly increasing storage and transmission costs, a severe overfitting phenomenon, and damaged generalization capability. These problems severely limit the practical application of the technology.
In the related art, methods for compressing neural network parameters include model pruning, network distillation, weight sharing, tensor decomposition, and the like. Among them, representing the weight tensors of a neural network with matrix product operators can achieve relatively high compression efficiency. However, these methods still do not compress neural network parameters well.
Therefore, in order to reduce the huge parameter quantity of the neural network and the associated storage and transmission cost, and thereby alleviate the overfitting phenomenon and enhance the generalization capability of the neural network, the invention provides a new method for compressing neural network parameters using a deep tensor network model. The method can compress the parameter quantity of linear layers, convolutional layers and the like in the neural network to a tiny fraction (on the order of one part in tens of thousands) of the original amount, and in most cases it can enhance the generalization capability of the neural network, alleviate the overfitting phenomenon and improve accuracy on the test set.
The compression method of the neural network provided by the invention is specifically described below with reference to fig. 1 to 3. Fig. 1 is a schematic flow chart of a method for compressing a neural network according to the present invention. Referring to fig. 1, the method includes steps 101 to 104, wherein:
step 101, obtaining M high-order tensors of an initial neural network; the M higher-order tensors comprise initial weight parameters of the initial neural network, and M is a positive integer.
It should be noted that the execution body of the present invention may be any electronic device capable of implementing compression of the neural network, for example, any one of a smart phone, a smart watch, a desktop computer, a laptop computer, and the like.
In order to reduce the huge parameter number of the neural network, reduce the storage and transmission cost of the computer for the neural network, alleviate the overfitting phenomenon and enhance the generalization capability of the neural network, in this embodiment, initial weight parameters of the initial neural network need to be acquired first.
It should be noted that, the initial weight parameters are stored in the form of higher-order tensors in the computer.
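For intuition, the following sketch (PyTorch; the layer size, variable names and the choice of dimension-2 indices are illustrative assumptions, not values fixed by the invention) shows how the weight matrix of an ordinary linear layer can be viewed as such a higher-order tensor:

```python
import torch

# Hypothetical fully connected layer: 256 inputs, 64 outputs (no bias).
linear = torch.nn.Linear(256, 64, bias=False)
W = linear.weight.data                 # shape (64, 256): 16384 initial weight parameters

# View the weight matrix as a higher-order tensor with Q dimension-2 indices:
# 64 * 256 = 2**14, so Q = 14 and the tensor has shape (2, 2, ..., 2).
Q = 14
W_high_order = W.reshape([2] * Q)

print(W_high_order.shape)              # torch.Size([2, 2, ..., 2]) with 14 twos
```

Any layer whose weight count factorizes in this way can be viewed as a higher-order tensor; other sizes would need a different factorization or padding, which is left out of this sketch.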
The compression method of the neural network provided by the invention has high universality and can be applied to general neural network models; thus, the initial neural network may be, for example, a fully connected neural network (Fully Connected Neural Network), a convolutional neural network (Convolutional Neural Network, CNN), a recurrent neural network (Recurrent Neural Network, RNN), or the like.
In practical applications, the neural network may be expressed as:
y = f(x; W);

wherein x represents the input of the neural network, y represents the output of the neural network, and all the initial parameters W of the neural network are stored in the computer in the form of higher-order tensors.
Step 102, generating a target tensor network corresponding to the M high-order tensors based on the M high-order tensors; the target tensor network comprises N target tensors, the N target tensors comprise compression weight parameters corresponding to the initial weight parameters, the parameter quantity of the compression weight parameters is smaller than that of the initial weight parameters, and N is a positive integer.
In the present embodiment, since the target tensor network deforms a higher-order tensor into a plurality of lower-order tensors, the parameter quantity of the compression weight parameters stored in each target tensor of the target tensor network is much smaller than the parameter quantity stored in a higher-order tensor.
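As a small, generic illustration of deforming one tensor into several lower-order tensors (a plain SVD split used only to convey the idea; it is not the pre-training procedure of this embodiment, and an exact split like this does not by itself reduce the parameter count):

```python
import torch

# Hypothetical 4th-order tensor with dimension-2 indices (16 entries).
T = torch.randn(2, 2, 2, 2)

# Group the indices as (a, b) x (c, d), factor the resulting matrix with an SVD,
# and split it into two lower-order tensors whose contraction reproduces T.
M = T.reshape(4, 4)
U, S, Vh = torch.linalg.svd(M)
G1 = (U * S).reshape(2, 2, 4)          # indices (a, b, bond)
G2 = Vh.reshape(4, 2, 2)               # indices (bond, c, d)

T_rebuilt = torch.einsum('abk,kcd->abcd', G1, G2)
print(torch.allclose(T, T_rebuilt, atol=1e-5))     # True: T is written as a contraction of G1, G2
```

Compression only appears once the sizes and bond dimensions of the constituent tensors are restricted, as in the target tensor network described below.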
Step 103, performing tensor contraction processing on the N target tensors to generate tensor contraction results of the target tensor network; the tensor contraction result includes updated weight parameters of the initial neural network.
In this embodiment, the N target tensors in the target tensor network need to be subjected to tensor contraction processing to obtain the contraction result (i.e. the updated weight parameter of the initial neural network) of the target tensor network.
The object of the invention is to replace the higher order tensor containing the vast majority of the parameters with updated weight parameters of the initial neural network. Since the updated weight parameters are obtained by performing tensor contraction processing based on N target tensors, the parameter amounts of the compression weight parameters contained in the target tensors are far smaller than the parameter amounts of the initial weight parameters stored in the higher-order tensors.
Therefore, the computer only stores N target tensors, and can restore to obtain updated weight parameters similar to the initial weight parameters. That is, the computer can implement compression of the neural network parameters by storing N target tensors.
And 104, training the initial neural network based on the updated weight parameters to obtain a target neural network.
After the updated weight parameters are obtained, the updated weight parameters are utilized to replace the initial weight parameters in the initial neural network, and the initial neural network is trained, so that the performance of the target neural network is ensured.
It should be noted that the compression method of the neural network described in the embodiments of the present invention may be applicable to a variety of different application scenarios, such as image recognition field, text processing field, voice recognition field, and so on. The invention is not limited to the applicable scene of the compression method of the neural network and the type of the neural network.
According to the compression method of the neural network provided by the invention, M high-order tensors of the initial neural network are obtained, a target tensor network comprising N target tensors is generated, tensor contraction processing is performed on the N target tensors to generate a tensor contraction result comprising updated weight parameters of the initial neural network, and the initial neural network is trained based on the updated weight parameters to obtain the target neural network. The core idea of the method is to write the high-order tensors carrying the variational parameters of the neural network as the contraction of a tensor network, so that the parameter quantity of the compression weight parameters in the N target tensors corresponding to the initial weight parameters is far smaller than the parameter quantity of the initial weight parameters; the computer only needs to store the N target tensors to achieve efficient compression of the initial weight parameters, which reduces the storage and transmission cost of the computer for the neural network, alleviates the overfitting phenomenon and enhances the generalization capability of the neural network. Meanwhile, tensor contraction based on the N target tensors can restore updated weight parameters that approximate the initial weight parameters, and training the initial neural network with these updated weight parameters ensures the accuracy of the target neural network.
Optionally, the N target tensors are connected by a tensor index;
the tensor contraction processing is performed on the N target tensors, so as to generate a tensor contraction result of the target tensor network, which can be specifically implemented through the following steps (1) - (2):
step (1), determining a common index among the N target tensors based on the connection relation among the N target tensors;
and (2) carrying out summation operation on the common index between every two target tensors to generate the tensor contraction result.
N target tensors in the target tensor network are connected through tensor indexes, and the number of the tensor indexes represents the order of the target tensor. It should be noted that, each tensor index also has dimensions; the tensor index and the dimension of the target tensor can be flexibly adjusted.
In this embodiment, first, it is necessary to determine a common index between N target tensors based on the connection relationship between N target tensors in the target tensor network. That is, it is necessary to determine the tensor index of the connection between the target tensors.
A summation operation is then performed on the common index between every two target tensors to obtain the contraction result of the target tensor network (namely the updated weight parameters of the initial neural network).
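A minimal sketch of such a summation over a common index follows, assuming two hypothetical 4th-order target tensors with dimension-2 indices and one shared index (the shapes and index labels are illustrative only):

```python
import torch

# Two hypothetical 4th-order target tensors; every tensor index has dimension 2.
A0 = torch.randn(2, 2, 2, 2)
A1 = torch.randn(2, 2, 2, 2)

# Suppose the last index of A0 and the first index of A1 are their common index
# (drawn as the connecting line segment in the tensor-network diagram).
# Tensor contraction = summation over that common index x.
result = torch.einsum('abcx,xdef->abcdef', A0, A1)   # open indices abcdef, shape (2,)*6

# Equivalent explicit summation over the common index:
check = sum(A0[..., x].reshape(2, 2, 2, 1, 1, 1) * A1[x] for x in range(2))
print(torch.allclose(result, check))                  # True
```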
In practice, suppose for example that the initial neural network to be compressed stores a higher-order tensor containing 2^Q initial weight parameters (denoted W), i.e. the parameter complexity is O(2^Q). Assuming that the index dimension of each target tensor in the target tensor network is 2, the contraction result of the target tensor network, namely the updated weight parameters of the initial neural network (denoted W'), is likewise a higher-order tensor containing 2^Q parameters. The goal of this embodiment is to make the contraction of the N target tensors approximate the initial weight parameters, i.e. W' ≈ W.
The process of the tensor contraction process is described in detail below with reference to fig. 2. Fig. 2 is a schematic diagram of a compression process of a neural network provided by the present invention, and referring to fig. 2, the neural network at least includes an input layer, a convolution portion, and a full connection portion, where in a process of processing initial weight parameters of the network by the convolution portion and the full connection portion, the initial weight parameters of the neural network need to be encoded to generate a corresponding target tensor network. The tensor network in fig. 2 is a "brick wall" tensor network structure.
In the target tensor network, there are multiple target tensors A[0] to A[14] for storing the compression weight parameters. The target tensors are connected through tensor indexes, and the number of tensor indexes represents the order of a tensor; a common tensor index of two target tensors is represented by the line segment connecting them. When calculating the contraction of the target tensor network, the common indexes of the target tensors need to be summed over.
In fig. 2, the solid lines represent tensor indexes, and the broken line represents the contraction result of the target tensor network (i.e., the updated weight parameters of the initial neural network); each vertical line represents an activation function. The choice of activation function is flexible, and in this embodiment its type is not particularly limited; examples include a ReLU activation function, a Sigmoid activation function, and a Tanh activation function. Fig. 2 takes the ReLU activation function as an example.
Now assume that what is to be compressed is a higher-order tensor containing 2^Q initial weight parameters (denoted W), i.e. the parameter complexity is O(2^Q). Setting the index dimension of each target tensor in the target tensor network to 2, the contraction result of the target tensor network (denoted W') is likewise a higher-order tensor containing 2^Q parameters that approximates W; it is indicated by the uncontracted line segments, shown as the dashed lines on the right boundary of fig. 2.
In the tensor network of the present invention, the most important part is the group of 2 x 2 x 2 x 2 tensors that forms the target tensor network, A[n] with n = 0, 1, ..., N-1, where N is the total number of target tensors, 2 represents the dimension of each tensor index, and the four factors of 2 indicate that each target tensor is a 4th-order tensor.
The tensors A[n] contain all the variational parameters of the target tensor network, and the total parameter quantity is only 2^4 x N = 16N. Since the total number of target tensors N is in a linear relationship with the order Q of the higher-order tensor, the parameter quantity O(16N) of the compression weight parameters of the target tensor network is much smaller than the parameter quantity O(2^Q) of the initial weight parameters in the M higher-order tensors.
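This saving can be checked with a back-of-the-envelope count. The sketch below assumes, purely for illustration, a linear relation N = 2Q between the number of target tensors and the order Q; the exact relation depends on the chosen tensor network layout.

```python
# Back-of-the-envelope parameter count: N target tensors of shape (2, 2, 2, 2)
# versus one higher-order tensor with Q dimension-2 indices.
def compressed_params(n_target_tensors: int) -> int:
    return n_target_tensors * 2 ** 4        # 16 parameters per 4th-order target tensor

def uncompressed_params(order_q: int) -> int:
    return 2 ** order_q                      # parameters of the higher-order tensor

for q in (10, 20, 30):
    n = 2 * q                                # illustrative assumption: N grows linearly with Q
    print(q, compressed_params(n), uncompressed_params(q))
# e.g. Q = 30: 960 compressed parameters versus 1_073_741_824 original ones.
```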
After the target tensor is contracted, updated weight parameters are obtained, the updated weight parameters are decoded, and then the decoded updated weight parameters are utilized to train the initial neural network, so that the target neural network can be obtained.
In the above embodiment, by performing a summation operation on the common index between every two target tensors, the parameter quantity of the initial neural network can be efficiently compressed to obtain the compression parameters of the initial neural network. Because the parameter quantity of the compression parameters is far smaller than that of the initial parameters, the computer can achieve efficient compression of the initial weight parameters by storing only the N target tensors, which reduces the storage and transmission cost of the computer for the neural network, alleviates the overfitting phenomenon, and enhances the generalization capability of the neural network. Meanwhile, tensor contraction based on the N target tensors can restore updated weight parameters that approximate the initial weight parameters, and training the initial neural network with these updated weight parameters ensures the accuracy of the target neural network.
Optionally, the generating, based on the M higher-order tensors, a target tensor network corresponding to the M higher-order tensors may be specifically implemented by the following steps [1] to [3]:
step [1], based on the M high-order tensors, generating an initial tensor network corresponding to the M high-order tensors; the initial tensor network comprises N initial tensors, and the N initial tensors comprise initial compression weight parameters corresponding to the initial weight parameters;
step [2], performing tensor contraction processing on the N initial tensors to generate tensor contraction results of the initial tensor network; the tensor contraction result of the initial tensor network comprises initial updating weight parameters of the initial neural network;
and step [3], pre-training the initial tensor network based on the initial weight parameters and the initial updating weight parameters in the M higher-order tensors until convergence to obtain the target tensor network.
In this embodiment, in order to generate the optimal N target tensors based on the M high-order tensors, the tensor network first needs to be pre-trained, so that it is given a good initialization and the subsequent training is more stable.
Specifically, first, an initial tensor network corresponding to the M high-order tensors needs to be generated based on the M high-order tensors; the initial tensor network comprises N initial tensors, wherein the N initial tensors comprise initial compression weight parameters corresponding to the initial weight parameters, and the initial compression weight parameters are random.
Performing tensor contraction processing on the N initial tensors to generate tensor contraction results of the initial tensor network, wherein the tensor contraction results specifically comprise initial updating weight parameters of the initial neural network; based on the initial weight parameters and the initial updated weight parameters, the initial tensor network is pre-trained with a first loss function.
It should be noted that, the type of the first loss function is not limited in the present invention, and the loss function capable of implementing pre-training on the initial tensor network can be used as the first loss function.
In one implementation of this embodiment, the euclidean distance may be selected as the first loss function, and the pre-training may be performed by minimizing the distance between the initial weight parameter and the initial updated weight parameter.
Optionally, the first loss function is represented by the following formula (1):
L1 = ||W' - W||   (1)

wherein L1 represents the first loss function, W' represents the initial updated weight parameter, and W represents the initial weight parameter.
After the first loss function L1 converges, a target tensor network comprising N target tensors is obtained.
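A minimal pre-training sketch follows (PyTorch). It assumes a toy network of two 4th-order target tensors sharing a single common index, Adam as the optimizer, and a random higher-order weight tensor; all of these are illustrative assumptions rather than the brick-wall network of fig. 2.

```python
import torch

torch.manual_seed(0)

# Hypothetical higher-order tensor of initial weights: Q = 6 dimension-2 indices.
W = torch.randn(*([2] * 6))                       # 64 initial weight parameters

# Toy initial tensor network: two 4th-order tensors sharing one common index x,
# 2 * 16 = 32 initial compression weight parameters in total.
A = [torch.randn(2, 2, 2, 2, requires_grad=True) for _ in range(2)]

def contract(tensors):
    # Initial updated weights W' = contraction (summation over the common index x).
    return torch.einsum('abcx,xdef->abcdef', tensors[0], tensors[1])

optimizer = torch.optim.Adam(A, lr=1e-2)
for step in range(2000):                          # pre-train towards convergence
    optimizer.zero_grad()
    loss = torch.norm(contract(A) - W)            # first loss: Euclidean distance ||W' - W||
    loss.backward()
    optimizer.step()

print(float(torch.norm(contract(A) - W)))         # residual distance after pre-training
```

With 32 compressed parameters approximating 64 random original ones, the distance will generally not reach zero; the pre-training only supplies a good initialization for the subsequent task-driven training.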
Optionally, the training the initial neural network based on the updated weight parameter to obtain a target neural network is specifically implemented through the following steps:
and training the initial neural network by utilizing a loss function corresponding to the initial neural network based on the updated weight parameters until convergence to obtain the target neural network.
In this embodiment, the parameters of the initial neural network need to be optimized with the goal of minimizing the loss function of the machine learning task.
The same loss function as that used to train the initial neural network can be selected for this optimization, and the feed-forward process during optimization is the same as that of the initial neural network, except that the initial weight parameters W in the initial neural network are replaced with the updated weight parameters W'.
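A corresponding sketch of this replacement step (again PyTorch, reusing the toy two-tensor network above; the 8x8 layer shape, the MSE task loss and the random batch are illustrative assumptions):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Target tensors obtained from pre-training (toy shapes as in the earlier sketch).
A = [torch.randn(2, 2, 2, 2, requires_grad=True) for _ in range(2)]

def updated_weight():
    # Contract the target tensors and decode (reshape) back to the layer's weight shape.
    w_prime = torch.einsum('abcx,xdef->abcdef', A[0], A[1])
    return w_prime.reshape(8, 8)                   # hypothetical 8x8 linear-layer weight

criterion = torch.nn.MSELoss()                     # stand-in for the task loss function
optimizer = torch.optim.Adam(A, lr=1e-3)           # only the target tensors are optimized
x, y = torch.randn(32, 8), torch.randn(32, 8)      # hypothetical training batch

for step in range(200):
    optimizer.zero_grad()
    out = F.linear(x, updated_weight())            # feed-forward uses W' in place of W
    loss = criterion(out, y)
    loss.backward()
    optimizer.step()
```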
Fig. 3 is a second flow chart of a method for compressing a neural network according to the present invention, referring to fig. 3, the method includes steps 301 to 307, wherein:
step 301, obtaining M higher-order tensors of an initial neural network, where the M higher-order tensors include initial weight parameters of the initial neural network, and M is a positive integer.
Step 302, generating an initial tensor network corresponding to the M high-order tensors based on the M high-order tensors; the initial tensor network comprises N initial tensors, the N initial tensors comprise initial compression weight parameters corresponding to initial weight parameters, and N is a positive integer.
And 303, performing tensor contraction processing on the N initial tensors to generate tensor contraction results of the initial tensor network, wherein the tensor contraction results of the initial tensor network comprise initial updating weight parameters of the initial neural network.
Step 304, pre-training the initial tensor network by using the Euclidean distance as a first loss function based on initial weight parameters in M high-order tensors and initial updated weight parameters until convergence to obtain a target tensor network; the target tensor network comprises N target tensors, wherein the N target tensors comprise compression weight parameters corresponding to initial weight parameters, and the parameter quantity of the compression weight parameters is smaller than the parameter quantity of the initial weight parameters.
Step 305, determining a common index between the N target tensors based on the connection relationship between the N target tensors.
And 306, summing the common indexes between every two target tensors to generate a tensor contraction result, wherein the tensor contraction result comprises updated weight parameters of the initial neural network.
Step 307, based on the updated weight parameters, training the initial neural network by using a loss function corresponding to the initial neural network until convergence, thereby obtaining the target neural network.
According to the compression method of the neural network provided by the invention, M high-order tensors of the initial neural network are obtained, a target tensor network comprising N target tensors is generated, tensor contraction processing is performed on the N target tensors to generate a tensor contraction result comprising updated weight parameters of the initial neural network, and the initial neural network is trained based on the updated weight parameters to obtain the target neural network. The core idea of the method is to write the high-order tensors carrying the variational parameters of the neural network as the contraction of a tensor network, so that the parameter quantity of the compression weight parameters in the N target tensors corresponding to the initial weight parameters is far smaller than the parameter quantity of the initial weight parameters; the computer only needs to store the N target tensors to achieve efficient compression of the initial weight parameters, which reduces the storage and transmission cost of the computer for the neural network, alleviates the overfitting phenomenon and enhances the generalization capability of the neural network. Meanwhile, tensor contraction based on the N target tensors can restore updated weight parameters that approximate the initial weight parameters, and training the initial neural network with these updated weight parameters ensures the accuracy of the target neural network.
The compression device of the neural network provided by the invention is described below, and the compression device of the neural network described below and the compression method of the neural network described above can be referred to correspondingly. Fig. 4 is a schematic structural diagram of a compressing apparatus for a neural network according to the present invention, and as shown in fig. 4, a compressing apparatus 400 for a neural network includes: an acquisition module 401, a generation module 402, a processing module 403, and a training module 404, wherein:
an acquisition module 401, configured to acquire M higher-order tensors of an initial neural network; the M higher-order tensors comprise initial weight parameters of the initial neural network, and M is a positive integer;
a generating module 402, configured to generate a target tensor network corresponding to the M high-order tensors based on the M high-order tensors; the target tensor network comprises N target tensors, the N target tensors comprise compression weight parameters corresponding to the initial weight parameters, the parameter quantity of the compression weight parameters is smaller than that of the initial weight parameters, and N is a positive integer;
a processing module 403, configured to perform tensor contraction processing on the N target tensors, and generate a tensor contraction result of the target tensor network; the tensor contraction result comprises updated weight parameters of the initial neural network;
and a training module 404, configured to train the initial neural network based on the updated weight parameter, so as to obtain a target neural network.
According to the compression device for a neural network provided by the invention, M high-order tensors of the initial neural network are obtained, a target tensor network comprising N target tensors is generated, tensor contraction processing is performed on the N target tensors to generate a tensor contraction result comprising updated weight parameters of the initial neural network, and the initial neural network is trained based on the updated weight parameters to obtain the target neural network. The core idea is to write the high-order tensors carrying the variational parameters of the neural network as the contraction of a tensor network, so that the parameter quantity of the compression weight parameters in the N target tensors corresponding to the initial weight parameters is far smaller than the parameter quantity of the initial weight parameters; the computer only needs to store the N target tensors to achieve efficient compression of the initial weight parameters, which reduces the storage and transmission cost of the computer for the neural network, alleviates the overfitting phenomenon and enhances the generalization capability of the neural network. Meanwhile, tensor contraction based on the N target tensors can restore updated weight parameters that approximate the initial weight parameters, and training the initial neural network with these updated weight parameters ensures the accuracy of the target neural network.
Optionally, the N target tensors are connected by a tensor index;
the processing module 403 is further configured to:
determining a common index among the N target tensors based on the connection relation among the N target tensors;
and carrying out summation operation on the common index between every two target tensors to generate the tensor contraction result.
Optionally, the generating module 402 is further configured to:
generating an initial tensor network corresponding to the M high-order tensors based on the M high-order tensors; the initial tensor network comprises N initial tensors, and the N initial tensors comprise initial compression weight parameters corresponding to the initial weight parameters;
performing tensor contraction processing on the N initial tensors to generate tensor contraction results of the initial tensor network; the tensor contraction result of the initial tensor network comprises initial updating weight parameters of the initial neural network;
and pre-training the initial tensor network based on the initial weight parameters in the M high-order tensors and the initial updating weight parameters until convergence to obtain the target tensor network.
Optionally, the generating module 402 is further configured to:
based on the initial weight parameter and the initial updating weight parameter, pre-training the initial tensor network by using Euclidean distance as a first loss function;
the first loss function is represented by the following formula (1):
L1 = ||W' - W||   (1)

wherein L1 represents the first loss function, W' represents the initial updated weight parameter, and W represents the initial weight parameter.
Optionally, the training module 404 is further configured to:
and training the initial neural network by utilizing a loss function corresponding to the initial neural network based on the updated weight parameters until convergence to obtain the target neural network.
Fig. 5 illustrates a physical schematic diagram of an electronic device, as shown in fig. 5, which may include: processor 510, communication interface (Communications Interface) 520, memory 530, and communication bus 540, wherein processor 510, communication interface 520, memory 530 complete communication with each other through communication bus 540. Processor 510 may invoke logic instructions in memory 530 to perform a method of compressing a neural network, the method comprising: obtaining M high-order tensors of an initial neural network; the M higher-order tensors comprise initial weight parameters of the initial neural network, and M is a positive integer; generating a target tensor network corresponding to the M high-order tensors based on the M high-order tensors; the target tensor network comprises N target tensors, the N target tensors comprise compression weight parameters corresponding to the initial weight parameters, the parameter quantity of the compression weight parameters is smaller than that of the initial weight parameters, and N is a positive integer; performing tensor contraction processing on the N target tensors to generate tensor contraction results of the target tensor network; the tensor contraction result comprises updated weight parameters of the initial neural network; training the initial neural network based on the updated weight parameters to obtain a target neural network.
Further, the logic instructions in the memory 530 described above may be implemented in the form of software functional units and may be stored in a computer-readable storage medium when sold or used as a stand-alone product. Based on this understanding, the technical solution of the present invention may be embodied essentially or in a part contributing to the prior art or in a part of the technical solution, in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a random access Memory (RAM, random Access Memory), a magnetic disk, or an optical disk, or other various media capable of storing program codes.
In another aspect, the present invention also provides a computer program product, the computer program product including a computer program, the computer program being storable on a non-transitory computer readable storage medium, the computer program, when executed by a processor, being capable of executing a method of compressing a neural network provided by the methods described above, the method comprising: obtaining M high-order tensors of an initial neural network; the M higher-order tensors comprise initial weight parameters of the initial neural network, and M is a positive integer; generating a target tensor network corresponding to the M high-order tensors based on the M high-order tensors; the target tensor network comprises N target tensors, the N target tensors comprise compression weight parameters corresponding to the initial weight parameters, the parameter quantity of the compression weight parameters is smaller than that of the initial weight parameters, and N is a positive integer; performing tensor contraction processing on the N target tensors to generate tensor contraction results of the target tensor network; the tensor contraction result comprises updated weight parameters of the initial neural network; training the initial neural network based on the updated weight parameters to obtain a target neural network.
In yet another aspect, the present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, is implemented to perform a method of compressing a neural network provided by the above methods, the method comprising: obtaining M high-order tensors of an initial neural network; the M higher-order tensors comprise initial weight parameters of the initial neural network, and M is a positive integer; generating a target tensor network corresponding to the M high-order tensors based on the M high-order tensors; the target tensor network comprises N target tensors, the N target tensors comprise compression weight parameters corresponding to the initial weight parameters, the parameter quantity of the compression weight parameters is smaller than that of the initial weight parameters, and N is a positive integer; performing tensor contraction processing on the N target tensors to generate tensor contraction results of the target tensor network; the tensor contraction result comprises updated weight parameters of the initial neural network; training the initial neural network based on the updated weight parameters to obtain a target neural network.
The apparatus embodiments described above are merely illustrative, wherein the elements illustrated as separate elements may or may not be physically separate, and the elements shown as elements may or may not be physical elements, may be located in one place, or may be distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art will understand and implement the present invention without undue burden.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus necessary general hardware platforms, or of course may be implemented by means of hardware. Based on this understanding, the foregoing technical solution may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a computer readable storage medium, such as ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the respective embodiments or some parts of the embodiments.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (5)

1. A method of compressing a neural network, comprising:
obtaining M high-order tensors of an initial neural network; the M higher-order tensors comprise initial weight parameters of the initial neural network, and M is a positive integer;
generating a target tensor network corresponding to the M high-order tensors based on the M high-order tensors; the target tensor network comprises N target tensors, the N target tensors comprise compression weight parameters corresponding to the initial weight parameters, the parameter quantity of the compression weight parameters is smaller than that of the initial weight parameters, and N is a positive integer;
performing tensor contraction processing on the N target tensors to generate tensor contraction results of the target tensor network; the tensor contraction result comprises updated weight parameters of the initial neural network;
training the initial neural network based on the updated weight parameters to obtain a target neural network;
wherein the N target tensors are connected through tensor indexes;
performing tensor contraction processing on the N target tensors to generate a tensor contraction result of the target tensor network, including:
determining a common index among the N target tensors based on the connection relation among the N target tensors;
summing the common indexes between every two target tensors to generate a tensor contraction result;
the generating, based on the M higher-order tensors, a target tensor network corresponding to the M higher-order tensors includes:
generating an initial tensor network corresponding to the M high-order tensors based on the M high-order tensors; the initial tensor network comprises N initial tensors, and the N initial tensors comprise initial compression weight parameters corresponding to the initial weight parameters;
performing tensor contraction processing on the N initial tensors to generate tensor contraction results of the initial tensor network; the tensor contraction result of the initial tensor network comprises initial updating weight parameters of the initial neural network;
pre-training the initial tensor network based on the initial weight parameters and the initial updating weight parameters in the M high-order tensors until convergence to obtain the target tensor network;
the pre-training the initial tensor network based on the initial weight parameters in the M higher-order tensors and the initial updated weight parameters includes:
based on the initial weight parameter and the initial updating weight parameter, pre-training the initial tensor network by using Euclidean distance as a first loss function;
the first loss function is represented by the following formula (1):
L1 = ||W' - W||   (1)

wherein L1 represents the first loss function, W' represents the initial updated weight parameter, and W represents the initial weight parameter.
2. The method for compressing a neural network according to claim 1, wherein training the initial neural network based on the updated weight parameters to obtain a target neural network comprises:
and training the initial neural network by utilizing a loss function corresponding to the initial neural network based on the updated weight parameters until convergence to obtain the target neural network.
3. A compression device for a neural network, comprising:
the acquisition module is used for acquiring M high-order tensors of the initial neural network; the M higher-order tensors comprise initial weight parameters of the initial neural network, and M is a positive integer;
the generation module is used for generating a target tensor network corresponding to the M high-order tensors based on the M high-order tensors; the target tensor network comprises N target tensors, the N target tensors comprise compression weight parameters corresponding to the initial weight parameters, the parameter quantity of the compression weight parameters is smaller than that of the initial weight parameters, and N is a positive integer;
the processing module is used for performing tensor contraction processing on the N target tensors and generating tensor contraction results of the target tensor network; the tensor contraction result comprises updated weight parameters of the initial neural network;
the training module is used for training the initial neural network based on the updated weight parameters to obtain a target neural network;
wherein the N target tensors are connected through tensor indexes;
the processing module is further configured to:
determining a common index among the N target tensors based on the connection relation among the N target tensors;
summing the common indexes between every two target tensors to generate a tensor contraction result;
the generating module is further configured to:
generating an initial tensor network corresponding to the M high-order tensors based on the M high-order tensors; the initial tensor network comprises N initial tensors, and the N initial tensors comprise initial compression weight parameters corresponding to the initial weight parameters;
performing tensor contraction processing on the N initial tensors to generate tensor contraction results of the initial tensor network; the tensor contraction result of the initial tensor network comprises initial updating weight parameters of the initial neural network;
pre-training the initial tensor network based on the initial weight parameters and the initial updating weight parameters in the M high-order tensors until convergence to obtain the target tensor network;
the generating module is further configured to:
based on the initial weight parameter and the initial updating weight parameter, pre-training the initial tensor network by using Euclidean distance as a first loss function;
the first loss function is represented by the following formula (1):
L1 = ||W' - W||   (1)

wherein L1 represents the first loss function, W' represents the initial updated weight parameter, and W represents the initial weight parameter.
4. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method of compressing the neural network according to claim 1 or 2 when executing the program.
5. A non-transitory computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the method of compressing a neural network according to claim 1 or 2.
CN202310460357.7A 2023-04-26 2023-04-26 Compression method and device for neural network, electronic equipment and storage medium Active CN116187401B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310460357.7A CN116187401B (en) 2023-04-26 2023-04-26 Compression method and device for neural network, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310460357.7A CN116187401B (en) 2023-04-26 2023-04-26 Compression method and device for neural network, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN116187401A CN116187401A (en) 2023-05-30
CN116187401B true CN116187401B (en) 2023-07-14

Family

ID=86452565

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310460357.7A Active CN116187401B (en) 2023-04-26 2023-04-26 Compression method and device for neural network, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116187401B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116894457B (en) * 2023-09-11 2023-11-24 深存科技(无锡)有限公司 Network weight access method of deep learning model

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110276438A (en) * 2019-05-15 2019-09-24 长沙理工大学 A kind of neural network parameter compression method and relevant apparatus
CN111652349A (en) * 2020-04-22 2020-09-11 华为技术有限公司 Neural network processing method and related equipment
CN113011568A (en) * 2021-03-31 2021-06-22 华为技术有限公司 Model training method, data processing method and equipment

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107944556B (en) * 2017-12-12 2020-09-08 电子科技大学 Deep neural network compression method based on block item tensor decomposition
CN110263913A (en) * 2019-05-23 2019-09-20 深圳先进技术研究院 A kind of deep neural network compression method and relevant device
CN110428045A (en) * 2019-08-12 2019-11-08 电子科技大学 Depth convolutional neural networks compression method based on Tucker algorithm
US20240232576A1 (en) * 2021-08-04 2024-07-11 The Regents Of The University Of California Methods and systems for determining physical properties via machine learning
CN113989576A (en) * 2021-12-06 2022-01-28 西南大学 Medical image classification method combining wavelet transformation and tensor network
CN115205613A (en) * 2022-05-20 2022-10-18 中国建设银行股份有限公司 Image identification method and device, electronic equipment and storage medium
CN115346053A (en) * 2022-08-25 2022-11-15 中国建设银行股份有限公司 Image feature extraction method and device, electronic equipment and storage medium

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110276438A (en) * 2019-05-15 2019-09-24 长沙理工大学 A kind of neural network parameter compression method and relevant apparatus
CN111652349A (en) * 2020-04-22 2020-09-11 华为技术有限公司 Neural network processing method and related equipment
CN113011568A (en) * 2021-03-31 2021-06-22 华为技术有限公司 Model training method, data processing method and equipment

Also Published As

Publication number Publication date
CN116187401A (en) 2023-05-30

Similar Documents

Publication Publication Date Title
He et al. Asymptotic soft filter pruning for deep convolutional neural networks
CN111079532B (en) Video content description method based on text self-encoder
CN109671026B (en) Gray level image noise reduction method based on void convolution and automatic coding and decoding neural network
CN107943938A (en) A kind of large-scale image similar to search method and system quantified based on depth product
Dai et al. Incremental learning using a grow-and-prune paradigm with efficient neural networks
CN113011581B (en) Neural network model compression method and device, electronic equipment and readable storage medium
CN108334945B (en) Acceleration and compression method and device of deep neural network
CN116187401B (en) Compression method and device for neural network, electronic equipment and storage medium
CN109284761B (en) Image feature extraction method, device and equipment and readable storage medium
CN110929865A (en) Network quantification method, service processing method and related product
US20220222534A1 (en) System and method for incremental learning using a grow-and-prune paradigm with neural networks
CN113837940A (en) Image super-resolution reconstruction method and system based on dense residual error network
CN113421187B (en) Super-resolution reconstruction method, system, storage medium and equipment
CN110084250A (en) A kind of method and system of iamge description
Qi et al. Learning low resource consumption cnn through pruning and quantization
Verma et al. A" Network Pruning Network''Approach to Deep Model Compression
CN112257466B (en) Model compression method applied to small machine translation equipment
CN114595815A (en) Transmission-friendly cloud-end cooperation training neural network model method
CN117351299A (en) Image generation and model training method, device, equipment and storage medium
CN113536800A (en) Word vector representation method and device
CN114444690B (en) Migration attack method based on task augmentation
CN116109537A (en) Distorted image reconstruction method and related device based on deep learning
CN116090425A (en) Text generation method, system and storage medium based on word replacement
CN112257469B (en) Compression method of deep nerve machine translation model for small mobile equipment
CN110852361B (en) Image classification method and device based on improved deep neural network and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant