CN116187401B - Compression method and device for neural network, electronic equipment and storage medium - Google Patents

Compression method and device for neural network, electronic equipment and storage medium

Info

Publication number
CN116187401B
CN116187401B (application CN202310460357.7A)
Authority
CN
China
Prior art keywords
initial
tensor
tensors
target
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310460357.7A
Other languages
Chinese (zh)
Other versions
CN116187401A (en)
Inventor
冉仕举
卿勇
李珂
周鹏飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Capital Normal University
Original Assignee
Capital Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Capital Normal University filed Critical Capital Normal University
Priority to CN202310460357.7A priority Critical patent/CN116187401B/en
Publication of CN116187401A publication Critical patent/CN116187401A/en
Application granted granted Critical
Publication of CN116187401B publication Critical patent/CN116187401B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • G06N3/082: Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The invention provides a compression method and device for a neural network, an electronic device and a storage medium, and relates to the technical field of artificial intelligence. The method comprises the following steps: obtaining M high-order tensors of an initial neural network; generating a target tensor network corresponding to the M high-order tensors based on the M high-order tensors, the target tensor network comprising N target tensors which contain compression weight parameters corresponding to the initial weight parameters of the initial neural network; performing tensor contraction processing on the N target tensors to generate a tensor contraction result of the target tensor network, the tensor contraction result comprising updated weight parameters of the initial neural network; and training the initial neural network based on the updated weight parameters to obtain the target neural network. By writing the high-order tensors that carry the variational parameters of the neural network as the contraction of a tensor network, the method compresses the parameter quantity of the initial neural network, reduces the storage and transmission cost of the computer, alleviates the overfitting phenomenon of the neural network, and enhances its generalization capability.

Description

Compression method and device for neural network, electronic equipment and storage medium
Technical Field
The present invention relates to the field of artificial intelligence technologies, and in particular, to a method and apparatus for compressing a neural network, an electronic device, and a storage medium.
Background
With the development of artificial intelligence, neural networks have achieved remarkable results in computer vision, natural language processing, and even scientific research in mathematics and physics. One of the biggest costs of continuously increasing the ability of neural networks to handle complex tasks is the rapid growth of the number of parameters. To date, the parameter counts of large-scale neural networks have broken through the trillion magnitude; this huge parameter quantity improves the capability of the neural network, but it greatly increases the storage and transmission cost that the computer incurs for the neural network, aggravates the overfitting phenomenon, and harms the generalization capability.
Therefore, how to reduce the storage and transmission cost of the computer for the neural network and alleviate the overfitting phenomenon is a problem that needs to be solved at present.
Disclosure of Invention
Aiming at the problems existing in the prior art, the embodiment of the invention provides a compression method, a compression device, electronic equipment and a storage medium of a neural network.
The invention provides a compression method of a neural network, which comprises the following steps:
obtaining M high-order tensors of an initial neural network; the M higher-order tensors comprise initial weight parameters of the initial neural network, and M is a positive integer;
generating a target tensor network corresponding to the M high-order tensors based on the M high-order tensors; the target tensor network comprises N target tensors, the N target tensors comprise compression weight parameters corresponding to the initial weight parameters, the parameter quantity of the compression weight parameters is smaller than that of the initial weight parameters, and N is a positive integer;
performing tensor contraction processing on the N target tensors to generate tensor contraction results of the target tensor network; the tensor contraction result comprises updated weight parameters of the initial neural network;
training the initial neural network based on the updated weight parameters to obtain a target neural network.
Optionally, the N target tensors are connected by a tensor index;
performing tensor contraction processing on the N target tensors to generate a tensor contraction result of the target tensor network, including:
determining a common index among the N target tensors based on the connection relation among the N target tensors;
and carrying out summation operation on the common index between every two target tensors to generate the tensor contraction result.
Optionally, the generating, based on the M higher-order tensors, a target tensor network corresponding to the M higher-order tensors includes:
generating an initial tensor network corresponding to the M high-order tensors based on the M high-order tensors; the initial tensor network comprises N initial tensors, and the N initial tensors comprise initial compression weight parameters corresponding to the initial weight parameters;
performing tensor contraction processing on the N initial tensors to generate tensor contraction results of the initial tensor network; the tensor contraction result of the initial tensor network comprises initial updating weight parameters of the initial neural network;
and pre-training the initial tensor network based on the initial weight parameters in the M high-order tensors and the initial updating weight parameters until convergence to obtain the target tensor network.
Optionally, the pre-training the initial tensor network based on the initial weight parameter in the M higher-order tensors and the initial updated weight parameter includes:
based on the initial weight parameter and the initial updating weight parameter, pre-training the initial tensor network by using Euclidean distance as a first loss function;
the first loss function is represented by the following formula (1):
L1 = ||W' - W||   (1)

wherein L1 represents the first loss function, W' represents the initial updated weight parameter, and W represents the initial weight parameter.
Optionally, the training the initial neural network based on the updated weight parameter to obtain a target neural network includes:
and training the initial neural network by utilizing a loss function corresponding to the initial neural network based on the updated weight parameters until convergence to obtain the target neural network.
The invention also provides a compression device of the neural network, which comprises:
the acquisition module is used for acquiring M high-order tensors of the initial neural network; the M higher-order tensors comprise initial weight parameters of the initial neural network, and M is a positive integer;
the generation module is used for generating a target tensor network corresponding to the M high-order tensors based on the M high-order tensors; the target tensor network comprises N target tensors, the N target tensors comprise compression weight parameters corresponding to the initial weight parameters, the parameter quantity of the compression weight parameters is smaller than that of the initial weight parameters, and N is a positive integer;
the processing module is used for performing tensor contraction processing on the N target tensors and generating tensor contraction results of the target tensor network; the tensor contraction result comprises updated weight parameters of the initial neural network;
and the training module is used for training the initial neural network based on the updated weight parameters to obtain a target neural network.
The invention also provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing a method of compressing a neural network as described in any one of the above when executing the program.
The present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements a method of compressing a neural network as described in any one of the above.
The invention also provides a computer program product comprising a computer program which, when executed by a processor, implements a method of compressing a neural network as described in any one of the above.
According to the compression method and device for a neural network, the electronic device and the storage medium provided by the invention, M high-order tensors of the initial neural network are obtained, a target tensor network comprising N target tensors is generated, tensor contraction processing is performed on the N target tensors to generate a tensor contraction result comprising updated weight parameters of the initial neural network, and the initial neural network is trained based on the updated weight parameters to obtain the target neural network. The core idea of the method is to write the high-order tensors carrying the variational parameters of the neural network as the contraction of a tensor network, so that the parameter quantity of the compression weight parameters in the N target tensors corresponding to the initial weight parameters is far smaller than the parameter quantity of the initial weight parameters; the computer only needs to store the N target tensors to achieve efficient compression of the initial weight parameters, which reduces the storage and transmission cost of the computer for the neural network, alleviates the overfitting phenomenon and enhances the generalization capability of the neural network. Meanwhile, tensor contraction based on the N target tensors can restore updated weight parameters that approximate the initial weight parameters, and training the initial neural network with these updated weight parameters ensures the accuracy of the target neural network.
Drawings
In order to more clearly illustrate the invention or the technical solutions of the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are some embodiments of the invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic flow chart of a method for compressing a neural network according to the present invention;
FIG. 2 is a schematic diagram of a compression process of a neural network provided by the present invention;
FIG. 3 is a second flow chart of the compression method of the neural network according to the present invention;
FIG. 4 is a schematic diagram of a compressing apparatus of a neural network according to the present invention;
fig. 5 is a schematic structural diagram of an electronic device provided by the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
In order to facilitate a clearer understanding of the various embodiments of the present application, some relevant background knowledge is first presented below.
In recent years, with the development of artificial intelligence, neural networks have achieved remarkable results in computer vision, natural language processing, and even scientific research in mathematics and physics. One of the biggest costs of continuously increasing the ability of neural networks to handle complex tasks is the rapid growth of the number of parameters. To date, the parameter counts of large-scale neural networks have broken through the trillion magnitude. While the huge parameter quantity raises the capability of the neural network, it also brings several serious problems, such as rapidly increasing storage and transmission costs, a severe overfitting phenomenon, and damaged generalization capability. These problems severely limit the practical application of the technology.
In the related art, methods for compressing neural network parameters include model pruning, network distillation, weight sharing, tensor decomposition, and the like. Among them, representing the weight tensors of a neural network with matrix product operators can achieve relatively high compression efficiency. However, these methods still do not compress neural network parameters well.
Therefore, in order to reduce the huge parameter quantity of the neural network and the associated storage and transmission cost, and thereby alleviate the overfitting phenomenon and enhance the generalization capability of the neural network, the invention provides a new method for compressing neural network parameters using a deep tensor network model. The method can compress the parameter quantity of linear layers, convolutional layers and the like in the neural network to a tiny fraction (on the order of one part in tens of thousands) of the original amount, and in most cases it can enhance the generalization capability of the neural network, alleviate the overfitting phenomenon and improve accuracy on the test set.
The compression method of the neural network provided by the invention is specifically described below with reference to fig. 1 to 3. Fig. 1 is a schematic flow chart of a method for compressing a neural network according to the present invention. Referring to fig. 1, the method includes steps 101 to 104, wherein:
step 101, obtaining M high-order tensors of an initial neural network; the M higher-order tensors comprise initial weight parameters of the initial neural network, and M is a positive integer.
It should be noted that the execution body of the present invention may be any electronic device capable of implementing compression of the neural network, for example, any one of a smart phone, a smart watch, a desktop computer, a laptop computer, and the like.
In order to reduce the huge parameter number of the neural network, reduce the storage and transmission cost of the computer for the neural network, alleviate the overfitting phenomenon and enhance the generalization capability of the neural network, in this embodiment, initial weight parameters of the initial neural network need to be acquired first.
It should be noted that, the initial weight parameters are stored in the form of higher-order tensors in the computer.
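For intuition, the following sketch (PyTorch; the layer size, variable names and the choice of dimension-2 indices are illustrative assumptions, not values fixed by the invention) shows how the weight matrix of an ordinary linear layer can be viewed as such a higher-order tensor:

```python
import torch

# Hypothetical fully connected layer: 256 inputs, 64 outputs (no bias).
linear = torch.nn.Linear(256, 64, bias=False)
W = linear.weight.data                 # shape (64, 256): 16384 initial weight parameters

# View the weight matrix as a higher-order tensor with Q dimension-2 indices:
# 64 * 256 = 2**14, so Q = 14 and the tensor has shape (2, 2, ..., 2).
Q = 14
W_high_order = W.reshape([2] * Q)

print(W_high_order.shape)              # torch.Size([2, 2, ..., 2]) with 14 twos
```

Any layer whose weight count factorizes in this way can be viewed as a higher-order tensor; other sizes would need a different factorization or padding, which is left out of this sketch.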
The compression method of the neural network provided by the invention has high universality and can be applied to general neural network models; thus, the initial neural network may be, for example, a fully connected neural network (Fully Connected Neural Network), a convolutional neural network (Convolutional Neural Network, CNN), a recurrent neural network (Recurrent Neural Network, RNN), or the like.
In practical applications, the neural network may be expressed as:
y = f(x; W);

wherein x represents the input of the neural network, y represents the output of the neural network, and all the initial parameters W of the neural network are stored in the computer in the form of higher-order tensors.
Step 102, generating a target tensor network corresponding to the M high-order tensors based on the M high-order tensors; the target tensor network comprises N target tensors, the N target tensors comprise compression weight parameters corresponding to the initial weight parameters, the parameter quantity of the compression weight parameters is smaller than that of the initial weight parameters, and N is a positive integer.
In the present embodiment, since the target tensor network deforms a higher-order tensor into a plurality of lower-order tensors, the parameter quantity of the compression weight parameters stored in each target tensor of the target tensor network is much smaller than the parameter quantity stored in a higher-order tensor.
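As a small, generic illustration of deforming one tensor into several lower-order tensors (a plain SVD split used only to convey the idea; it is not the pre-training procedure of this embodiment, and an exact split like this does not by itself reduce the parameter count):

```python
import torch

# Hypothetical 4th-order tensor with dimension-2 indices (16 entries).
T = torch.randn(2, 2, 2, 2)

# Group the indices as (a, b) x (c, d), factor the resulting matrix with an SVD,
# and split it into two lower-order tensors whose contraction reproduces T.
M = T.reshape(4, 4)
U, S, Vh = torch.linalg.svd(M)
G1 = (U * S).reshape(2, 2, 4)          # indices (a, b, bond)
G2 = Vh.reshape(4, 2, 2)               # indices (bond, c, d)

T_rebuilt = torch.einsum('abk,kcd->abcd', G1, G2)
print(torch.allclose(T, T_rebuilt, atol=1e-5))     # True: T is written as a contraction of G1, G2
```

Compression only appears once the sizes and bond dimensions of the constituent tensors are restricted, as in the target tensor network described below.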
Step 103, performing tensor contraction processing on the N target tensors to generate tensor contraction results of the target tensor network; the tensor contraction result includes updated weight parameters of the initial neural network.
In this embodiment, the N target tensors in the target tensor network need to be subjected to tensor contraction processing to obtain the contraction result (i.e. the updated weight parameter of the initial neural network) of the target tensor network.
The object of the invention is to replace the higher order tensor containing the vast majority of the parameters with updated weight parameters of the initial neural network. Since the updated weight parameters are obtained by performing tensor contraction processing based on N target tensors, the parameter amounts of the compression weight parameters contained in the target tensors are far smaller than the parameter amounts of the initial weight parameters stored in the higher-order tensors.
Therefore, the computer only stores N target tensors, and can restore to obtain updated weight parameters similar to the initial weight parameters. That is, the computer can implement compression of the neural network parameters by storing N target tensors.
And 104, training the initial neural network based on the updated weight parameters to obtain a target neural network.
After the updated weight parameters are obtained, the updated weight parameters are utilized to replace the initial weight parameters in the initial neural network, and the initial neural network is trained, so that the performance of the target neural network is ensured.
It should be noted that the compression method of the neural network described in the embodiments of the present invention may be applicable to a variety of different application scenarios, such as image recognition field, text processing field, voice recognition field, and so on. The invention is not limited to the applicable scene of the compression method of the neural network and the type of the neural network.
According to the compression method of the neural network provided by the invention, M high-order tensors of the initial neural network are obtained, a target tensor network comprising N target tensors is generated, tensor contraction processing is performed on the N target tensors to generate a tensor contraction result comprising updated weight parameters of the initial neural network, and the initial neural network is trained based on the updated weight parameters to obtain the target neural network. The core idea of the method is to write the high-order tensors carrying the variational parameters of the neural network as the contraction of a tensor network, so that the parameter quantity of the compression weight parameters in the N target tensors corresponding to the initial weight parameters is far smaller than the parameter quantity of the initial weight parameters; the computer only needs to store the N target tensors to achieve efficient compression of the initial weight parameters, which reduces the storage and transmission cost of the computer for the neural network, alleviates the overfitting phenomenon and enhances the generalization capability of the neural network. Meanwhile, tensor contraction based on the N target tensors can restore updated weight parameters that approximate the initial weight parameters, and training the initial neural network with these updated weight parameters ensures the accuracy of the target neural network.
Optionally, the N target tensors are connected by a tensor index;
the tensor contraction processing is performed on the N target tensors, so as to generate a tensor contraction result of the target tensor network, which can be specifically implemented through the following steps (1) - (2):
step (1), determining a common index among the N target tensors based on the connection relation among the N target tensors;
and (2) carrying out summation operation on the common index between every two target tensors to generate the tensor contraction result.
N target tensors in the target tensor network are connected through tensor indexes, and the number of the tensor indexes represents the order of the target tensor. It should be noted that, each tensor index also has dimensions; the tensor index and the dimension of the target tensor can be flexibly adjusted.
In this embodiment, first, it is necessary to determine a common index between N target tensors based on the connection relationship between N target tensors in the target tensor network. That is, it is necessary to determine the tensor index of the connection between the target tensors.
A summation operation is then performed on the common index between every two target tensors to obtain the contraction result of the target tensor network (namely the updated weight parameters of the initial neural network).
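A minimal sketch of such a summation over a common index follows, assuming two hypothetical 4th-order target tensors with dimension-2 indices and one shared index (the shapes and index labels are illustrative only):

```python
import torch

# Two hypothetical 4th-order target tensors; every tensor index has dimension 2.
A0 = torch.randn(2, 2, 2, 2)
A1 = torch.randn(2, 2, 2, 2)

# Suppose the last index of A0 and the first index of A1 are their common index
# (drawn as the connecting line segment in the tensor-network diagram).
# Tensor contraction = summation over that common index x.
result = torch.einsum('abcx,xdef->abcdef', A0, A1)   # open indices abcdef, shape (2,)*6

# Equivalent explicit summation over the common index:
check = sum(A0[..., x].reshape(2, 2, 2, 1, 1, 1) * A1[x] for x in range(2))
print(torch.allclose(result, check))                  # True
```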
In practice, suppose for example that the initial neural network to be compressed stores a higher-order tensor containing 2^Q initial weight parameters (denoted W), i.e. the parameter complexity is O(2^Q). Assuming that the index dimension of each target tensor in the target tensor network is 2, the contraction result of the target tensor network, namely the updated weight parameters of the initial neural network (denoted W'), is likewise a higher-order tensor containing 2^Q parameters. The goal of this embodiment is to make the contraction of the N target tensors approximate the initial weight parameters, i.e. W' ≈ W.
The process of the tensor contraction process is described in detail below with reference to fig. 2. Fig. 2 is a schematic diagram of a compression process of a neural network provided by the present invention, and referring to fig. 2, the neural network at least includes an input layer, a convolution portion, and a full connection portion, where in a process of processing initial weight parameters of the network by the convolution portion and the full connection portion, the initial weight parameters of the neural network need to be encoded to generate a corresponding target tensor network. The tensor network in fig. 2 is a "brick wall" tensor network structure.
In the target tensor network, there are multiple target tensors A[0] to A[14] for storing the compression weight parameters. The target tensors are connected through tensor indexes, and the number of tensor indexes represents the order of a tensor; a common tensor index of two target tensors is represented by the line segment connecting them. When calculating the contraction of the target tensor network, the common indexes of the target tensors need to be summed over.
In fig. 2, the solid lines represent tensor indexes, and the broken line represents the contraction result of the target tensor network (i.e., the updated weight parameters of the initial neural network); each vertical line represents an activation function. The choice of activation function is flexible, and in this embodiment its type is not particularly limited; examples include a ReLU activation function, a Sigmoid activation function, and a Tanh activation function. Fig. 2 takes the ReLU activation function as an example.
Now assume that what is to be compressed is a higher-order tensor containing 2^Q initial weight parameters (denoted W), i.e. the parameter complexity is O(2^Q). Setting the index dimension of each target tensor in the target tensor network to 2, the contraction result of the target tensor network (denoted W') is likewise a higher-order tensor containing 2^Q parameters that approximates W; it is indicated by the uncontracted line segments, shown as the dashed lines on the right boundary of fig. 2.
In the tensor network of the present invention, the most important part is the group of 2 x 2 x 2 x 2 tensors that forms the target tensor network, A[n] with n = 0, 1, ..., N-1, where N is the total number of target tensors, 2 represents the dimension of each tensor index, and the four factors of 2 indicate that each target tensor is a 4th-order tensor.
The tensors A[n] contain all the variational parameters of the target tensor network, and the total parameter quantity is only 2^4 x N = 16N. Since the total number of target tensors N is in a linear relationship with the order Q of the higher-order tensor, the parameter quantity O(16N) of the compression weight parameters of the target tensor network is much smaller than the parameter quantity O(2^Q) of the initial weight parameters in the M higher-order tensors.
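This saving can be checked with a back-of-the-envelope count. The sketch below assumes, purely for illustration, a linear relation N = 2Q between the number of target tensors and the order Q; the exact relation depends on the chosen tensor network layout.

```python
# Back-of-the-envelope parameter count: N target tensors of shape (2, 2, 2, 2)
# versus one higher-order tensor with Q dimension-2 indices.
def compressed_params(n_target_tensors: int) -> int:
    return n_target_tensors * 2 ** 4        # 16 parameters per 4th-order target tensor

def uncompressed_params(order_q: int) -> int:
    return 2 ** order_q                      # parameters of the higher-order tensor

for q in (10, 20, 30):
    n = 2 * q                                # illustrative assumption: N grows linearly with Q
    print(q, compressed_params(n), uncompressed_params(q))
# e.g. Q = 30: 960 compressed parameters versus 1_073_741_824 original ones.
```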
After the target tensor is contracted, updated weight parameters are obtained, the updated weight parameters are decoded, and then the decoded updated weight parameters are utilized to train the initial neural network, so that the target neural network can be obtained.
In the above embodiment, by performing a summation operation on the common index between every two target tensors, the parameter quantity of the initial neural network can be efficiently compressed to obtain the compression parameters of the initial neural network. Because the parameter quantity of the compression parameters is far smaller than that of the initial parameters, the computer can achieve efficient compression of the initial weight parameters by storing only the N target tensors, which reduces the storage and transmission cost of the computer for the neural network, alleviates the overfitting phenomenon, and enhances the generalization capability of the neural network. Meanwhile, tensor contraction based on the N target tensors can restore updated weight parameters that approximate the initial weight parameters, and training the initial neural network with these updated weight parameters ensures the accuracy of the target neural network.
Optionally, the generating, based on the M higher-order tensors, a target tensor network corresponding to the M higher-order tensors may be specifically implemented by the following steps [1] to [3]:
step [1], based on the M high-order tensors, generating an initial tensor network corresponding to the M high-order tensors; the initial tensor network comprises N initial tensors, and the N initial tensors comprise initial compression weight parameters corresponding to the initial weight parameters;
step [2], performing tensor contraction processing on the N initial tensors to generate tensor contraction results of the initial tensor network; the tensor contraction result of the initial tensor network comprises initial updating weight parameters of the initial neural network;
and step [3], pre-training the initial tensor network based on the initial weight parameters and the initial updating weight parameters in the M higher-order tensors until convergence to obtain the target tensor network.
In this embodiment, in order to generate the optimal N target tensors based on the M high-order tensors, the tensor network first needs to be pre-trained, so that it is given a good initialization and the subsequent training is more stable.
Specifically, first, an initial tensor network corresponding to the M high-order tensors needs to be generated based on the M high-order tensors; the initial tensor network comprises N initial tensors, wherein the N initial tensors comprise initial compression weight parameters corresponding to the initial weight parameters, and the initial compression weight parameters are random.
Performing tensor contraction processing on the N initial tensors to generate tensor contraction results of the initial tensor network, wherein the tensor contraction results specifically comprise initial updating weight parameters of the initial neural network; based on the initial weight parameters and the initial updated weight parameters, the initial tensor network is pre-trained with a first loss function.
It should be noted that, the type of the first loss function is not limited in the present invention, and the loss function capable of implementing pre-training on the initial tensor network can be used as the first loss function.
In one implementation of this embodiment, the euclidean distance may be selected as the first loss function, and the pre-training may be performed by minimizing the distance between the initial weight parameter and the initial updated weight parameter.
Optionally, the first loss function is represented by the following formula (1):
L1 = ||W' - W||   (1)

wherein L1 represents the first loss function, W' represents the initial updated weight parameter, and W represents the initial weight parameter.
After the first loss function L1 converges, a target tensor network comprising N target tensors is obtained.
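A minimal pre-training sketch follows (PyTorch). It assumes a toy network of two 4th-order target tensors sharing a single common index, Adam as the optimizer, and a random higher-order weight tensor; all of these are illustrative assumptions rather than the brick-wall network of fig. 2.

```python
import torch

torch.manual_seed(0)

# Hypothetical higher-order tensor of initial weights: Q = 6 dimension-2 indices.
W = torch.randn(*([2] * 6))                       # 64 initial weight parameters

# Toy initial tensor network: two 4th-order tensors sharing one common index x,
# 2 * 16 = 32 initial compression weight parameters in total.
A = [torch.randn(2, 2, 2, 2, requires_grad=True) for _ in range(2)]

def contract(tensors):
    # Initial updated weights W' = contraction (summation over the common index x).
    return torch.einsum('abcx,xdef->abcdef', tensors[0], tensors[1])

optimizer = torch.optim.Adam(A, lr=1e-2)
for step in range(2000):                          # pre-train towards convergence
    optimizer.zero_grad()
    loss = torch.norm(contract(A) - W)            # first loss: Euclidean distance ||W' - W||
    loss.backward()
    optimizer.step()

print(float(torch.norm(contract(A) - W)))         # residual distance after pre-training
```

With 32 compressed parameters approximating 64 random original ones, the distance will generally not reach zero; the pre-training only supplies a good initialization for the subsequent task-driven training.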
Optionally, the training the initial neural network based on the updated weight parameter to obtain a target neural network is specifically implemented through the following steps:
and training the initial neural network by utilizing a loss function corresponding to the initial neural network based on the updated weight parameters until convergence to obtain the target neural network.
In this embodiment, the parameters of the initial neural network need to be optimized with the goal of minimizing the loss function of the machine learning task.
The same loss function as that used to train the initial neural network can be selected for this optimization, and the feed-forward process during optimization is the same as that of the initial neural network, except that the initial weight parameters W in the initial neural network are replaced with the updated weight parameters W'.
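A corresponding sketch of this replacement step (again PyTorch, reusing the toy two-tensor network above; the 8x8 layer shape, the MSE task loss and the random batch are illustrative assumptions):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Target tensors obtained from pre-training (toy shapes as in the earlier sketch).
A = [torch.randn(2, 2, 2, 2, requires_grad=True) for _ in range(2)]

def updated_weight():
    # Contract the target tensors and decode (reshape) back to the layer's weight shape.
    w_prime = torch.einsum('abcx,xdef->abcdef', A[0], A[1])
    return w_prime.reshape(8, 8)                   # hypothetical 8x8 linear-layer weight

criterion = torch.nn.MSELoss()                     # stand-in for the task loss function
optimizer = torch.optim.Adam(A, lr=1e-3)           # only the target tensors are optimized
x, y = torch.randn(32, 8), torch.randn(32, 8)      # hypothetical training batch

for step in range(200):
    optimizer.zero_grad()
    out = F.linear(x, updated_weight())            # feed-forward uses W' in place of W
    loss = criterion(out, y)
    loss.backward()
    optimizer.step()
```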
Fig. 3 is a second flow chart of a method for compressing a neural network according to the present invention, referring to fig. 3, the method includes steps 301 to 307, wherein:
step 301, obtaining M higher-order tensors of an initial neural network, where the M higher-order tensors include initial weight parameters of the initial neural network, and M is a positive integer.
Step 302, generating an initial tensor network corresponding to the M high-order tensors based on the M high-order tensors; the initial tensor network comprises N initial tensors, the N initial tensors comprise initial compression weight parameters corresponding to initial weight parameters, and N is a positive integer.
And 303, performing tensor contraction processing on the N initial tensors to generate tensor contraction results of the initial tensor network, wherein the tensor contraction results of the initial tensor network comprise initial updating weight parameters of the initial neural network.
Step 304, pre-training the initial tensor network by using the Euclidean distance as a first loss function based on initial weight parameters in M high-order tensors and initial updated weight parameters until convergence to obtain a target tensor network; the target tensor network comprises N target tensors, wherein the N target tensors comprise compression weight parameters corresponding to initial weight parameters, and the parameter quantity of the compression weight parameters is smaller than the parameter quantity of the initial weight parameters.
Step 305, determining a common index between the N target tensors based on the connection relationship between the N target tensors.
And 306, summing the common indexes between every two target tensors to generate a tensor contraction result, wherein the tensor contraction result comprises updated weight parameters of the initial neural network.
Step 307, based on the updated weight parameters, training the initial neural network by using a loss function corresponding to the initial neural network until convergence, thereby obtaining the target neural network.
According to the compression method of the neural network provided by the invention, M high-order tensors of the initial neural network are obtained, a target tensor network comprising N target tensors is generated, tensor contraction processing is performed on the N target tensors to generate a tensor contraction result comprising updated weight parameters of the initial neural network, and the initial neural network is trained based on the updated weight parameters to obtain the target neural network. The core idea of the method is to write the high-order tensors carrying the variational parameters of the neural network as the contraction of a tensor network, so that the parameter quantity of the compression weight parameters in the N target tensors corresponding to the initial weight parameters is far smaller than the parameter quantity of the initial weight parameters; the computer only needs to store the N target tensors to achieve efficient compression of the initial weight parameters, which reduces the storage and transmission cost of the computer for the neural network, alleviates the overfitting phenomenon and enhances the generalization capability of the neural network. Meanwhile, tensor contraction based on the N target tensors can restore updated weight parameters that approximate the initial weight parameters, and training the initial neural network with these updated weight parameters ensures the accuracy of the target neural network.
The compression device of the neural network provided by the invention is described below, and the compression device of the neural network described below and the compression method of the neural network described above can be referred to correspondingly. Fig. 4 is a schematic structural diagram of a compressing apparatus for a neural network according to the present invention, and as shown in fig. 4, a compressing apparatus 400 for a neural network includes: an acquisition module 401, a generation module 402, a processing module 403, and a training module 404, wherein:
an acquisition module 401, configured to acquire M higher-order tensors of an initial neural network; the M higher-order tensors comprise initial weight parameters of the initial neural network, and M is a positive integer;
a generating module 402, configured to generate a target tensor network corresponding to the M high-order tensors based on the M high-order tensors; the target tensor network comprises N target tensors, the N target tensors comprise compression weight parameters corresponding to the initial weight parameters, the parameter quantity of the compression weight parameters is smaller than that of the initial weight parameters, and N is a positive integer;
a processing module 403, configured to perform tensor contraction processing on the N target tensors, and generate a tensor contraction result of the target tensor network; the tensor contraction result comprises updated weight parameters of the initial neural network;
and a training module 404, configured to train the initial neural network based on the updated weight parameter, so as to obtain a target neural network.
According to the compression device for a neural network provided by the invention, M high-order tensors of the initial neural network are obtained, a target tensor network comprising N target tensors is generated, tensor contraction processing is performed on the N target tensors to generate a tensor contraction result comprising updated weight parameters of the initial neural network, and the initial neural network is trained based on the updated weight parameters to obtain the target neural network. The core idea is to write the high-order tensors carrying the variational parameters of the neural network as the contraction of a tensor network, so that the parameter quantity of the compression weight parameters in the N target tensors corresponding to the initial weight parameters is far smaller than the parameter quantity of the initial weight parameters; the computer only needs to store the N target tensors to achieve efficient compression of the initial weight parameters, which reduces the storage and transmission cost of the computer for the neural network, alleviates the overfitting phenomenon and enhances the generalization capability of the neural network. Meanwhile, tensor contraction based on the N target tensors can restore updated weight parameters that approximate the initial weight parameters, and training the initial neural network with these updated weight parameters ensures the accuracy of the target neural network.
Optionally, the N target tensors are connected by a tensor index;
the processing module 403 is further configured to:
determining a common index among the N target tensors based on the connection relation among the N target tensors;
and carrying out summation operation on the common index between every two target tensors to generate the tensor contraction result.
Optionally, the generating module 402 is further configured to:
generating an initial tensor network corresponding to the M high-order tensors based on the M high-order tensors; the initial tensor network comprises N initial tensors, and the N initial tensors comprise initial compression weight parameters corresponding to the initial weight parameters;
performing tensor contraction processing on the N initial tensors to generate tensor contraction results of the initial tensor network; the tensor contraction result of the initial tensor network comprises initial updating weight parameters of the initial neural network;
and pre-training the initial tensor network based on the initial weight parameters in the M high-order tensors and the initial updating weight parameters until convergence to obtain the target tensor network.
Optionally, the generating module 402 is further configured to:
based on the initial weight parameter and the initial updating weight parameter, pre-training the initial tensor network by using Euclidean distance as a first loss function;
the first loss function is represented by the following formula (1):
L1 = ||W' - W||   (1)

wherein L1 represents the first loss function, W' represents the initial updated weight parameter, and W represents the initial weight parameter.
Optionally, the training module 404 is further configured to:
and training the initial neural network by utilizing a loss function corresponding to the initial neural network based on the updated weight parameters until convergence to obtain the target neural network.
Fig. 5 illustrates a physical schematic diagram of an electronic device, as shown in fig. 5, which may include: processor 510, communication interface (Communications Interface) 520, memory 530, and communication bus 540, wherein processor 510, communication interface 520, memory 530 complete communication with each other through communication bus 540. Processor 510 may invoke logic instructions in memory 530 to perform a method of compressing a neural network, the method comprising: obtaining M high-order tensors of an initial neural network; the M higher-order tensors comprise initial weight parameters of the initial neural network, and M is a positive integer; generating a target tensor network corresponding to the M high-order tensors based on the M high-order tensors; the target tensor network comprises N target tensors, the N target tensors comprise compression weight parameters corresponding to the initial weight parameters, the parameter quantity of the compression weight parameters is smaller than that of the initial weight parameters, and N is a positive integer; performing tensor contraction processing on the N target tensors to generate tensor contraction results of the target tensor network; the tensor contraction result comprises updated weight parameters of the initial neural network; training the initial neural network based on the updated weight parameters to obtain a target neural network.
Further, the logic instructions in the memory 530 described above may be implemented in the form of software functional units and may be stored in a computer-readable storage medium when sold or used as a stand-alone product. Based on this understanding, the technical solution of the present invention may be embodied essentially or in a part contributing to the prior art or in a part of the technical solution, in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a random access Memory (RAM, random Access Memory), a magnetic disk, or an optical disk, or other various media capable of storing program codes.
In another aspect, the present invention also provides a computer program product, the computer program product including a computer program, the computer program being storable on a non-transitory computer readable storage medium, the computer program, when executed by a processor, being capable of executing a method of compressing a neural network provided by the methods described above, the method comprising: obtaining M high-order tensors of an initial neural network; the M higher-order tensors comprise initial weight parameters of the initial neural network, and M is a positive integer; generating a target tensor network corresponding to the M high-order tensors based on the M high-order tensors; the target tensor network comprises N target tensors, the N target tensors comprise compression weight parameters corresponding to the initial weight parameters, the parameter quantity of the compression weight parameters is smaller than that of the initial weight parameters, and N is a positive integer; performing tensor contraction processing on the N target tensors to generate tensor contraction results of the target tensor network; the tensor contraction result comprises updated weight parameters of the initial neural network; training the initial neural network based on the updated weight parameters to obtain a target neural network.
In yet another aspect, the present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, is implemented to perform a method of compressing a neural network provided by the above methods, the method comprising: obtaining M high-order tensors of an initial neural network; the M higher-order tensors comprise initial weight parameters of the initial neural network, and M is a positive integer; generating a target tensor network corresponding to the M high-order tensors based on the M high-order tensors; the target tensor network comprises N target tensors, the N target tensors comprise compression weight parameters corresponding to the initial weight parameters, the parameter quantity of the compression weight parameters is smaller than that of the initial weight parameters, and N is a positive integer; performing tensor contraction processing on the N target tensors to generate tensor contraction results of the target tensor network; the tensor contraction result comprises updated weight parameters of the initial neural network; training the initial neural network based on the updated weight parameters to obtain a target neural network.
The apparatus embodiments described above are merely illustrative, wherein the elements illustrated as separate elements may or may not be physically separate, and the elements shown as elements may or may not be physical elements, may be located in one place, or may be distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art will understand and implement the present invention without undue burden.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus necessary general hardware platforms, or of course may be implemented by means of hardware. Based on this understanding, the foregoing technical solution may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a computer readable storage medium, such as ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the respective embodiments or some parts of the embodiments.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (5)

1. A method of compressing a neural network, comprising:
obtaining M high-order tensors of an initial neural network; the M higher-order tensors comprise initial weight parameters of the initial neural network, and M is a positive integer;
generating a target tensor network corresponding to the M high-order tensors based on the M high-order tensors; the target tensor network comprises N target tensors, the N target tensors comprise compression weight parameters corresponding to the initial weight parameters, the parameter quantity of the compression weight parameters is smaller than that of the initial weight parameters, and N is a positive integer;
performing tensor contraction processing on the N target tensors to generate tensor contraction results of the target tensor network; the tensor contraction result comprises updated weight parameters of the initial neural network;
training the initial neural network based on the updated weight parameters to obtain a target neural network;
wherein the N target tensors are connected through tensor indexes;
performing tensor contraction processing on the N target tensors to generate a tensor contraction result of the target tensor network, including:
determining a common index among the N target tensors based on the connection relation among the N target tensors;
summing the common indexes between every two target tensors to generate a tensor contraction result;
the generating, based on the M higher-order tensors, a target tensor network corresponding to the M higher-order tensors includes:
generating an initial tensor network corresponding to the M high-order tensors based on the M high-order tensors; the initial tensor network comprises N initial tensors, and the N initial tensors comprise initial compression weight parameters corresponding to the initial weight parameters;
performing tensor contraction processing on the N initial tensors to generate tensor contraction results of the initial tensor network; the tensor contraction result of the initial tensor network comprises initial updating weight parameters of the initial neural network;
pre-training the initial tensor network based on the initial weight parameters and the initial updating weight parameters in the M high-order tensors until convergence to obtain the target tensor network;
the pre-training the initial tensor network based on the initial weight parameters in the M higher-order tensors and the initial updated weight parameters includes:
based on the initial weight parameter and the initial updating weight parameter, pre-training the initial tensor network by using Euclidean distance as a first loss function;
the first loss function is represented by the following formula (1):
L1 = ||W' - W||   (1)

wherein L1 represents the first loss function, W' represents the initial updated weight parameter, and W represents the initial weight parameter.
2. The method for compressing a neural network according to claim 1, wherein training the initial neural network based on the updated weight parameters to obtain a target neural network comprises:
and training the initial neural network by utilizing a loss function corresponding to the initial neural network based on the updated weight parameters until convergence to obtain the target neural network.
3. A compression device for a neural network, comprising:
the acquisition module is used for acquiring M high-order tensors of the initial neural network; the M higher-order tensors comprise initial weight parameters of the initial neural network, and M is a positive integer;
the generation module is used for generating a target tensor network corresponding to the M high-order tensors based on the M high-order tensors; the target tensor network comprises N target tensors, the N target tensors comprise compression weight parameters corresponding to the initial weight parameters, the parameter quantity of the compression weight parameters is smaller than that of the initial weight parameters, and N is a positive integer;
the processing module is used for performing tensor contraction processing on the N target tensors and generating tensor contraction results of the target tensor network; the tensor contraction result comprises updated weight parameters of the initial neural network;
the training module is used for training the initial neural network based on the updated weight parameters to obtain a target neural network;
wherein the N target tensors are connected through tensor indexes;
the processing module is further configured to:
determining a common index among the N target tensors based on the connection relation among the N target tensors;
summing the common indexes between every two target tensors to generate a tensor contraction result;
the generating module is further configured to:
generating an initial tensor network corresponding to the M high-order tensors based on the M high-order tensors; the initial tensor network comprises N initial tensors, and the N initial tensors comprise initial compression weight parameters corresponding to the initial weight parameters;
performing tensor contraction processing on the N initial tensors to generate tensor contraction results of the initial tensor network; the tensor contraction result of the initial tensor network comprises initial updating weight parameters of the initial neural network;
pre-training the initial tensor network based on the initial weight parameters and the initial updating weight parameters in the M high-order tensors until convergence to obtain the target tensor network;
the generating module is further configured to:
based on the initial weight parameter and the initial updating weight parameter, pre-training the initial tensor network by using Euclidean distance as a first loss function;
the first loss function is represented by the following formula (1):
L1 = ||W' - W||   (1)

wherein L1 represents the first loss function, W' represents the initial updated weight parameter, and W represents the initial weight parameter.
4. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method of compressing the neural network according to claim 1 or 2 when executing the program.
5. A non-transitory computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the method of compressing a neural network according to claim 1 or 2.
CN202310460357.7A 2023-04-26 2023-04-26 Compression method and device for neural network, electronic equipment and storage medium Active CN116187401B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310460357.7A CN116187401B (en) 2023-04-26 2023-04-26 Compression method and device for neural network, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310460357.7A CN116187401B (en) 2023-04-26 2023-04-26 Compression method and device for neural network, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN116187401A CN116187401A (en) 2023-05-30
CN116187401B true CN116187401B (en) 2023-07-14

Family

ID=86452565

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310460357.7A Active CN116187401B (en) 2023-04-26 2023-04-26 Compression method and device for neural network, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116187401B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116894457B (en) * 2023-09-11 2023-11-24 深存科技(无锡)有限公司 Network weight access method of deep learning model

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110276438A (en) * 2019-05-15 2019-09-24 长沙理工大学 A kind of neural network parameter compression method and relevant apparatus
CN111652349A (en) * 2020-04-22 2020-09-11 华为技术有限公司 Neural network processing method and related equipment
CN113011568A (en) * 2021-03-31 2021-06-22 华为技术有限公司 Model training method, data processing method and equipment

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107944556B (en) * 2017-12-12 2020-09-08 电子科技大学 Deep neural network compression method based on block item tensor decomposition
CN110263913A (en) * 2019-05-23 2019-09-20 深圳先进技术研究院 A kind of deep neural network compression method and relevant device
CN110428045A (en) * 2019-08-12 2019-11-08 电子科技大学 Depth convolutional neural networks compression method based on Tucker algorithm
US20240232576A1 (en) * 2021-08-04 2024-07-11 The Regents Of The University Of California Methods and systems for determining physical properties via machine learning
CN113989576A (en) * 2021-12-06 2022-01-28 西南大学 Medical image classification method combining wavelet transformation and tensor network
CN115205613A (en) * 2022-05-20 2022-10-18 中国建设银行股份有限公司 Image identification method and device, electronic equipment and storage medium
CN115346053A (en) * 2022-08-25 2022-11-15 中国建设银行股份有限公司 Image feature extraction method and device, electronic equipment and storage medium

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110276438A (en) * 2019-05-15 2019-09-24 长沙理工大学 A kind of neural network parameter compression method and relevant apparatus
CN111652349A (en) * 2020-04-22 2020-09-11 华为技术有限公司 Neural network processing method and related equipment
CN113011568A (en) * 2021-03-31 2021-06-22 华为技术有限公司 Model training method, data processing method and equipment

Also Published As

Publication number Publication date
CN116187401A (en) 2023-05-30

Similar Documents

Publication Publication Date Title
He et al. Asymptotic soft filter pruning for deep convolutional neural networks
CN111079532B (en) Video content description method based on text self-encoder
CN109671026B (en) Gray level image noise reduction method based on void convolution and automatic coding and decoding neural network
CN107943938A (en) A kind of large-scale image similar to search method and system quantified based on depth product
Dai et al. Incremental learning using a grow-and-prune paradigm with efficient neural networks
CN113011581B (en) Neural network model compression method and device, electronic equipment and readable storage medium
CN108334945B (en) Acceleration and compression method and device of deep neural network
CN116187401B (en) Compression method and device for neural network, electronic equipment and storage medium
CN109284761B (en) Image feature extraction method, device and equipment and readable storage medium
CN110929865A (en) Network quantification method, service processing method and related product
US20220222534A1 (en) System and method for incremental learning using a grow-and-prune paradigm with neural networks
CN113837940A (en) Image super-resolution reconstruction method and system based on dense residual error network
CN113421187B (en) Super-resolution reconstruction method, system, storage medium and equipment
CN110084250A (en) A kind of method and system of iamge description
Qi et al. Learning low resource consumption cnn through pruning and quantization
Verma et al. A" Network Pruning Network''Approach to Deep Model Compression
CN112257466B (en) Model compression method applied to small machine translation equipment
CN114595815A (en) Transmission-friendly cloud-end cooperation training neural network model method
CN117351299A (en) Image generation and model training method, device, equipment and storage medium
CN113536800A (en) Word vector representation method and device
CN114444690B (en) Migration attack method based on task augmentation
CN116109537A (en) Distorted image reconstruction method and related device based on deep learning
CN116090425A (en) Text generation method, system and storage medium based on word replacement
CN112257469B (en) Compression method of deep nerve machine translation model for small mobile equipment
CN110852361B (en) Image classification method and device based on improved deep neural network and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant