CN115357554B - Graph neural network compression method and device, electronic equipment and storage medium - Google Patents
- Publication number
- CN115357554B (application CN202211299256.8A)
- Authority
- CN
- China
- Prior art keywords
- network
- graph
- quantization
- neural network
- bit width
- Prior art date
- Legal status: Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/17—Details of further file system functions
- G06F16/174—Redundancy elimination performed by the file system
- G06F16/1744—Redundancy elimination performed by the file system using compression, e.g. sparse files
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
- G06F9/5011—Allocation of resources, e.g. of the central processing unit [CPU] to service a request, the resources being hardware resources other than CPUs, Servers and Terminals
- G06F9/5016—Allocation of resources, e.g. of the central processing unit [CPU] to service a request, the resources being hardware resources other than CPUs, Servers and Terminals, the resource being the memory
- G—PHYSICS
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
The invention provides a graph neural network compression method and device, an electronic device, and a storage medium, relating to the field of neural networks. The method comprises the following steps: acquiring a trained graph neural network and the graph data used in training it; determining the degree distribution range corresponding to all graph vertices in the graph data, and dividing the degree distribution range into a plurality of degree intervals; under the constraint of a preset resource limitation condition, determining the optimal interval quantization bit width corresponding to each degree interval and the optimal network quantization bit width corresponding to the graph neural network by using reinforcement learning and a hardware accelerator; and quantizing and compressing the vertex features of the graph vertices in the graph data according to their degrees by using the optimal interval quantization bit widths, and quantizing and compressing the graph neural network by using the optimal network quantization bit width, to obtain optimal quantization graph data and an optimal quantization graph neural network. By determining the optimal quantization bit widths for the graph neural network and the graph vertex features with reinforcement learning, the quantization graph neural network is ensured to have high accuracy and a low resource consumption rate.
Description
Technical Field
The present invention relates to the field of neural networks, and in particular, to a graph neural network compression method and apparatus, an electronic device, and a storage medium.
Background
In recent years, graph neural networks (GNNs) have received much attention because they can model irregularly structured data. GNNs are widely used in various fields such as graph-based vertex classification, molecular interaction, social networks, recommendation systems, and program understanding. Although GNN models usually have few parameters, GNNs are characterized by high memory usage and a high computational load (manifested as long training or inference times), because the storage and computation requirements of each application are closely related to the size of the input graph data. This characteristic makes GNNs impractical for most resource-constrained devices, such as embedded systems and Internet of Things devices. There are two main reasons behind this predicament. First, the input of a GNN consists of two types of data: the graph structure (edge list) and the vertex features (embeddings). As the graph grows, its storage size increases sharply, which puts great strain on small devices with very limited memory budgets. Second, larger-scale graph data requires more data operations (e.g., additions and multiplications) and data movement (e.g., memory transactions), which consume a large amount of energy and exhaust the limited power budget of these small devices.
To address the above challenges, quantization compression emerges as a "kill two birds with one stone" solution for resource-constrained devices: (1) it effectively reduces the storage size of the vertex features, thereby reducing memory usage; and (2) minimizing operand sizes reduces power consumption. However, existing quantization methods suffer from the following problems: (1) choosing a simple but aggressive uniform quantization for all data to minimize memory and power costs, resulting in a high accuracy loss; (2) choosing a very conservative quantization to maintain accuracy, which results in suboptimal memory and power savings; and (3) ignoring differences in hardware architecture and quantizing all layers of the GNN in a uniform way.
As such, how to perform quantization compression on the graph neural network and the corresponding graph data is a technical problem to be solved by those skilled in the art.
Disclosure of Invention
The invention aims to provide a graph neural network compression method, a graph neural network compression device, electronic equipment and a storage medium, which can automatically determine the optimal quantization bit width for a graph neural network and vertex characteristics in graph data by utilizing reinforcement learning under the constraint of a preset resource limiting condition so as to ensure that the obtained quantization graph neural network has higher precision and lower resource consumption rate.
In order to solve the above technical problem, the present invention provides a graph neural network compression method, including:
acquiring a trained graph neural network and graph data used in training the graph neural network;
determining degree distribution ranges corresponding to all graph vertexes in the graph data, and dividing the degree distribution ranges into a plurality of degree intervals;
under the constraint of a preset resource limiting condition, determining an optimal interval quantization bit width corresponding to each degree interval and an optimal network quantization bit width corresponding to the graph neural network by using a reinforcement learning and hardware accelerator;
and quantizing and compressing the vertex characteristics of the graph vertex corresponding to the degree in the graph data by using the optimal interval quantization bit width, and quantizing and compressing the graph neural network by using the optimal network quantization bit width to obtain optimal quantization graph data and an optimal quantization graph neural network.
Optionally, the determining of the degree distribution range corresponding to all graph vertices in the graph data, and the dividing of the degree distribution range into a plurality of degree intervals, includes:
arranging all graph vertexes in the graph data from small to large according to degrees to obtain a graph vertex sequence;
dividing the degree distribution range by using the graph vertex sequence to obtain a plurality of degree intervals; the number of graph vertices contained in each degree interval is the same, or the difference is smaller than a preset threshold value.
Optionally, after obtaining the optimal quantization graph data and the optimal quantization graph neural network, the method further includes:
and training the optimal quantization graph neural network by using the optimal quantization graph data to obtain a fine-tuned quantization graph neural network, so as to deploy the fine-tuned quantization graph neural network to an external service device.
Optionally, the temporal architecture of the hardware accelerator is a reconfigurable bit-serial matrix multiplication overlay (BISMO), and the spatial architecture is a BitFusion architecture.
Optionally, the determining, by using reinforcement learning and a hardware accelerator, an optimal interval quantization bit width corresponding to each degree interval and an optimal network quantization bit width corresponding to the graph neural network under the constraint of a preset resource limitation condition includes:
acquiring the reference accuracy corresponding to the graph neural network executing a specified task, and initializing an agent and a historical reward value used by the reinforcement learning; the agent comprises an actor module and a critic module;
setting the strategy times to 1, and initializing an action sequence and a historical state vector; the action sequence is used for storing the interval quantization bit width corresponding to each degree interval and the network quantization bit width corresponding to the graph neural network; the state vector is used for recording the memory occupation amount, the calculated amount, and the corresponding accuracy when the quantization graph neural network processes the quantization graph data;
setting the time step to be 1, determining continuous actions by using the actor module under the constraint of the preset resource limiting condition, performing numerical updating on the action sequence by using the continuous actions, and determining the memory occupation amount and the calculated amount corresponding to the action sequence after updating;
performing quantization compression on the vertex features in the graph data and the graph neural network by using the action sequence, and sending the obtained quantization graph data and the quantization graph neural network to the hardware accelerator, so that the hardware accelerator trains the quantization graph neural network by using the quantization graph data and determines the current accuracy corresponding to the designated task executed by the trained quantization graph neural network;
determining a current state vector by using the memory occupation amount, the calculated amount and the accuracy corresponding to the action sequence, and determining a reward value by using the reference accuracy and the current accuracy;
when the reward value is determined to be larger than the historical reward value, updating the historical reward value by using the reward value, and updating the optimal interval quantization bit width and the optimal network quantization bit width by using the updated action sequence;
generating conversion data by using the historical state vector, the continuous action, the reward value and the current state vector, and training the actor module and the critic module by using the conversion data so that the critic module updates a strategy used by the actor module when the numerical value is updated;
when the time step is determined not to reach the length of the action sequence, adding 1 to the time step, updating the historical state vector by using the current state vector, and entering the step of determining continuous actions by using the actor module under the constraint of the preset resource limit condition;
when the time step is determined to reach the length of the action sequence and the strategy times do not reach a preset value, adding 1 to the strategy times, and entering the steps of initializing the action sequence and the historical state vector;
and when the strategy times are determined to reach the preset value, outputting the optimal interval quantization bit width and the optimal network quantization bit width.
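For orientation, the nested episode/time-step search described in the above steps can be outlined in a short sketch. This is a schematic under assumed interfaces: `agent`, `accelerator`, `quantize_all`, `apply_action`, and `cost_model` are hypothetical collaborators, not names from the patent.

```python
from typing import Callable, List, Sequence

def search_bitwidths(
    agent,                   # DDPG agent: act(state), store(transition), update()
    accelerator,             # train_and_eval(q_gnn, q_graph) -> task accuracy
    quantize_all: Callable,  # (graph, gnn, actions) -> (q_graph, q_gnn)
    apply_action: Callable,  # (actions, t, a) -> actions meeting the resource budget
    cost_model: Callable,    # (actions) -> (memory, compute)
    graph, gnn,
    baseline_acc: float,     # reference accuracy of the full-precision network
    n_episodes: int,
    seq_len: int,            # k interval bit widths + network bit widths
) -> Sequence[int]:
    """One episode = one complete policy; one time step = one bit width."""
    best_reward, best_actions = float("-inf"), None
    for _ in range(n_episodes):
        actions: List[int] = [8] * seq_len          # initial action sequence
        state = (0.0, 0.0, baseline_acc)            # (memory, compute, accuracy)
        for t in range(seq_len):
            a = agent.act(state)                    # continuous action from the actor
            actions = apply_action(actions, t, a)   # discretize + enforce budget
            q_graph, q_gnn = quantize_all(graph, gnn, actions)
            acc = accelerator.train_and_eval(q_gnn, q_graph)  # hardware feedback
            mem, ops = cost_model(actions)
            next_state = (mem, ops, acc)
            reward = acc - baseline_acc             # accuracy difference as reward
            if reward > best_reward:                # keep the best allocation so far
                best_reward, best_actions = reward, list(actions)
            agent.store((state, a, reward, next_state))  # one piece of conversion data
            agent.update()                          # train actor and critic
            state = next_state
    return best_actions
```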
Optionally, the determining, by the actor module, a continuous action under the constraint of the preset resource limitation condition, performing numerical update on the action sequence by using the continuous action, and determining the memory occupation amount and the calculation amount corresponding to the action sequence after the update includes:
selecting the continuous action by the actor module according to a Behavior strategy, and discretizing the continuous action in the following mode to obtain a discrete action value:
$$a'_i = \mathop{\arg\min}_{q \in Q}\left|\, q - \mathrm{round}\!\left(q_{\min} + a_i \times (q_{\max} - q_{\min})\right)\right|$$

wherein $a_i$ denotes the continuous action corresponding to the $i$-th quantization bit width in the action sequence of the current time step, $a'_i$ denotes the discrete action value corresponding to $a_i$, $Q$ comprises a plurality of preset quantization bit width values, $\mathrm{round}(\cdot)$ denotes the rounding function, $q_{\min}$ and $q_{\max}$ represent the preset minimum and maximum quantization bit widths, and the $\arg\min$ function selects the target preset quantization bit width value $q$ in $Q$ that minimizes the absolute difference (sketched in code after this optional block);
performing numerical value updating on the action sequence by using the action value, determining the memory occupation amount, the calculated amount and the delay amount corresponding to the updated action sequence, and judging whether the memory occupation amount, the calculated amount and the delay amount meet the limitation of the preset resource limitation condition;
if yes, performing quantitative compression on the vertex features in the graph data and the graph neural network by using the action sequence;
if not, sequentially reducing the quantization bit width in the action sequence according to a preset sequence so as to update the action sequence again, and entering the step of determining the memory occupation amount, the calculated amount and the delay amount corresponding to the updated action sequence when each reduction action is completed.
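A minimal sketch of the discretization and budget-enforcement steps above, assuming `Q` is the preset bit-width set and that `costs`/`budget` expose the memory, computation, and delay quantities; the step-down-by-one reduction is a simplification of the preset reduction order:

```python
def discretize(a: float, Q: list, q_min: int, q_max: int) -> int:
    """Map a continuous action a in [0, 1] to the nearest preset bit width in Q."""
    target = round(q_min + a * (q_max - q_min))
    return min(Q, key=lambda q: abs(q - target))

def enforce_budget(actions, costs, budget, min_bits=2):
    """Reduce bit widths until memory/compute/latency all fit the budget.

    costs(actions) -> {'memory': ..., 'compute': ..., 'latency': ...}
    budget         -> thresholds for the same keys
    """
    def over(acts):
        c = costs(acts)
        return any(c[k] > budget[k] for k in budget)

    order = sorted(range(len(actions)), key=lambda i: -actions[i])  # assumed order
    for i in order:
        while actions[i] > min_bits and over(actions):
            actions[i] -= 1   # simplification: a real pass would step within Q
    return actions
```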
Optionally, the selecting, with the actor module, a continuous action according to a Behavior policy includes:
selecting, with the actor module, a continuous action according to a Behavior policy as follows:
$$a_t = \mu(O_{t-1} \mid \theta^{\mu}) + \mathcal{N}_t$$

wherein $\mathcal{N}_t$ denotes the random OU (Ornstein-Uhlenbeck) noise corresponding to the $t$-th time step, $O_{t-1}$ denotes the historical state vector corresponding to the $t$-th time step, $\mu$ represents the online actor network in the actor module, and $\theta^{\mu}$ represents the online actor network parameters.
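A sketch of this behavior policy, assuming standard Ornstein-Uhlenbeck noise (the theta and sigma values here are illustrative, not from the patent) and an actor network callable on the state:

```python
import numpy as np

class OUNoise:
    """Ornstein-Uhlenbeck process: temporally correlated exploration noise."""
    def __init__(self, dim, mu=0.0, theta=0.15, sigma=0.2):
        self.mu, self.theta, self.sigma = mu, theta, sigma
        self.x = np.full(dim, mu)

    def sample(self):
        dx = self.theta * (self.mu - self.x) + self.sigma * np.random.randn(*self.x.shape)
        self.x = self.x + dx
        return self.x

def behavior_action(actor_net, state, noise: OUNoise):
    """a_t = mu(O_{t-1} | theta_mu) + N_t, clipped to the valid action range."""
    a = actor_net(state) + noise.sample()
    return np.clip(a, 0.0, 1.0)
```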
Optionally, the hardware accelerator training the quantization graph neural network with the quantization graph data includes:
and the hardware accelerator trains the quantization graph neural network by utilizing the quantization graph data based on a mini-batch stochastic gradient descent method.
Optionally, the determining the memory occupation amount, the calculation amount, and the delay amount corresponding to the updated action sequence includes:
calculating the memory footprint using the following formula:
$$\mathrm{Memory} = N_v \sum_{l=1}^{L} d_l \cdot q_{\max} + \sum_{l=1}^{L}\sum_{s=1}^{S} |W_{l,s}| \cdot q_{l,s}$$

wherein $\mathrm{Memory}$ indicates the memory occupation amount, $N_v$ represents the number of graph vertices within a single mini-batch, $d_l$ represents the vertex dimension value corresponding to the $l$-th network layer of the quantization graph neural network, $l = 1, \ldots, L$, with $L$ representing the number of all network layers of the quantization graph neural network, $q_{\max}$ represents the maximum value of the interval quantization bit widths to which all graph vertices within a single mini-batch are assigned, $S$ represents the total number of convolution kernels, and $W_{l,s}$ and $q_{l,s}$ respectively represent the weight matrix associated with the $s$-th convolution kernel of the $l$-th network layer of the quantization graph neural network (with $|W_{l,s}|$ its number of elements) and the network quantization bit width corresponding to that convolution kernel;
the calculated amount is calculated using the following formula:
$$\mathrm{Compute} = \sum_{l=1}^{L} \mathrm{MAC}_l \cdot q_{W_l} \cdot q_{A_l}$$

wherein $\mathrm{Compute}$ represents the calculated amount, $q_{W_l}$ and $q_{A_l}$ represent the network quantization bit widths corresponding to the weight matrix and the activation matrix of the $l$-th network layer of the quantization graph neural network, and $\mathrm{MAC}_l$ represents the total number of multiply-accumulate operations of the $l$-th layer of the quantization graph neural network;
the delay amount is calculated using the following formula:
$$\mathrm{Latency} = \sum_{l=1}^{L} \mathrm{lat}_l$$

wherein $\mathrm{Latency}$ represents the delay amount, and $\mathrm{lat}_l$ represents the delay of the $l$-th network layer of the quantization graph neural network in processing a mini-batch of graph data.
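A sketch of such a cost model following the definitions above; all function and argument names are illustrative assumptions:

```python
def memory_cost(n_vertices, layer_dims, q_vertex_max, weight_sizes, weight_bits):
    """Memory = vertex-feature bits + parameter bits (cf. the formula above).

    n_vertices   : graph vertices in a single mini-batch
    layer_dims   : vertex feature dimension d_l of each layer
    q_vertex_max : maximum interval bit width assigned in the mini-batch
    weight_sizes : element counts of all weight/kernel matrices
    weight_bits  : the matching network quantization bit widths
    """
    feature = n_vertices * sum(layer_dims) * q_vertex_max
    params = sum(w * q for w, q in zip(weight_sizes, weight_bits))
    return feature + params

def compute_cost(macs_per_layer, q_weight, q_activation):
    """BitFusion-style cost: a MAC scales with the product of operand bit widths."""
    return sum(m * qw * qa for m, qw, qa in zip(macs_per_layer, q_weight, q_activation))

def latency_cost(per_layer_latency):
    """Total delay: sum of per-layer mini-batch latencies fed back by the accelerator."""
    return sum(per_layer_latency)
```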
Optionally, the performing quantization compression on the vertex features in the graph data by using the action sequence includes:
truncating the vertex features of each graph vertex in the graph data into the range [-c, c] (c > 0) and performing quantization compression on the truncated vertex features by using the interval quantization bit width corresponding to the degree of the graph vertex in the action sequence:

$$\mathrm{quantize}(x_j) = \mathrm{round}\!\left(\frac{\mathrm{clip}(x_j, -c, c)}{s}\right)\cdot s,\qquad s = \frac{c}{2^{q-1}-1}$$

wherein $\mathrm{quantize}(\cdot)$ represents the quantization compression function, $\mathrm{round}(\cdot)$ represents the rounding function, $\mathrm{clip}(\cdot, -c, c)$ represents the truncation function that truncates its argument to $[-c, c]$, $x$ represents the vertex feature and $x_j$ the $j$-th component of the vertex feature, $s$ represents the scaling factor, and $q$ represents the interval quantization bit width in the action sequence corresponding to the degree of the graph vertex.
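A small sketch of this truncate-scale-round quantization, matching the formula above:

```python
import numpy as np

def quantize_features(x, q_bits, c):
    """Truncate x to [-c, c], then quantize symmetrically to q_bits."""
    s = c / (2 ** (q_bits - 1) - 1)        # scaling factor
    x_clipped = np.clip(x, -c, c)
    return np.round(x_clipped / s) * s     # values snapped to the quantized grid

# Example: quantize a vertex feature vector to 4 bits with c = 1.5
x = np.array([-1.7, -0.2, 0.05, 0.9, 3.2])
xq = quantize_features(x, q_bits=4, c=1.5)
```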
Optionally, before performing quantization compression on vertex features in the graph data by using the action sequence, the method further includes:
the c value is determined by:
$$c = \mathop{\arg\min}_{x}\; D_{\mathrm{KL}}\big(f(X) \,\|\, f(\mathrm{quantize}(X; x))\big)$$

wherein the $\arg\min$ function selects the value of $x$ that minimizes $D_{\mathrm{KL}}$, the KL divergence between the feature distribution $f(X)$ of the original vertex features and the feature distribution of the quantized vertex features; the feature distribution may be the maximum value, minimum value, mean, variance, sharpness, or kurtosis.
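A sketch of this calibration as a histogram-based grid search over candidate c values, reusing `quantize_features` from the sketch above; the candidate grid and bin count are illustrative assumptions:

```python
import numpy as np
from scipy.stats import entropy   # entropy(p, q) computes KL(p || q)

def calibrate_clip_value(x, q_bits, candidates, bins=128):
    """Pick c minimizing KL(hist(x) || hist(quantize(x; c)))."""
    p, edges = np.histogram(x, bins=bins, density=True)
    best_c, best_kl = None, float("inf")
    for c in candidates:
        xq = quantize_features(x, q_bits, c)
        q, _ = np.histogram(xq, bins=edges, density=True)
        kl = entropy(p + 1e-10, q + 1e-10)   # smooth to avoid zero bins
        if kl < best_kl:
            best_c, best_kl = c, kl
    return best_c
```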
Optionally, the actor module includes an online actor network and a target actor network, the critic module includes an online critic network and a target critic network, and the initializing the agent used for reinforcement learning includes:
initializing online actor network parameters of the online actor network, and setting target actor network parameters of the target actor network and the online actor network parameters to be the same values;
initializing the online critic network parameters of the online critic network, and setting the target critic network parameters of the target critic network and the online critic network parameters to the same values.
Optionally, the training the actor module and the critic module using the transformation data comprises:
adding the conversion data to an experience replay pool, and randomly sampling a preset number of pieces of conversion data from the experience replay pool as training data;
determining a first gradient of the online critic network parameters using the training data, the target actor network, the target critic network, the online critic network, and the following loss function:

$$\mathrm{Loss} = \frac{1}{M}\sum_{i=1}^{M}\big(y_i - Q(O_{i-1}, a_i \mid \theta^{Q})\big)^2,\qquad y_i = r_i + \gamma\, Q'\!\big(O_i, \mu'(O_i \mid \theta^{\mu'}) \mid \theta^{Q'}\big)$$

wherein $\mathrm{Loss}$ represents the loss function, $a_i$ represents the continuous action, $O_{i-1}$ denotes the historical state vector corresponding to the $i$-th time step, $Q$ represents the online critic network, $\theta^{Q}$ represents the online critic network parameters, and $M$ represents the preset number; $y_i$ represents the estimate of the target critic network, $r_i$ denotes the reward value corresponding to the $i$-th time step, $\gamma$ represents the preset discount factor, $Q'$ represents the target critic network, $\theta^{Q'}$ represents the target critic network parameters, $\mu'$ represents the target actor network, $\theta^{\mu'}$ represents the target actor network parameters, and $O_i$ denotes the current state vector corresponding to the $i$-th time step;
updating the online critic network parameters according to the first gradient;
determining a performance objective using the training data, the updated online critic network, the online actor network, and an objective function, and determining a second gradient of the performance objective with respect to the online actor network parameters:

$$J = \mathbb{E}_{O \sim \rho^{\beta}}\!\left[Q\big(O, \mu(O \mid \theta^{\mu}) \mid \theta^{Q}\big)\right],\qquad \nabla_{\theta^{\mu}} J = \mathbb{E}_{O \sim \rho^{\beta}}\!\left[\nabla_{a} Q(O, a \mid \theta^{Q})\big|_{a=\mu(O)}\,\nabla_{\theta^{\mu}}\mu(O \mid \theta^{\mu})\right]$$

wherein $\mathbb{E}$ denotes the expected value when the environmental state $O$ obeys the distribution function $\rho^{\beta}$, $J$ is the performance objective, $\theta^{\mu}$ represents the online actor network parameters, and $\nabla_{\theta^{\mu}} J$ represents the second gradient;
updating the online actor network parameters based on the second gradient;
updating the target critic network parameters and the target actor network parameters by using the updated online critic network parameters and the updated online actor network parameters in the following way:

$$\theta^{Q'} \leftarrow \tau\,\theta^{Q} + (1-\tau)\,\theta^{Q'},\qquad \theta^{\mu'} \leftarrow \tau\,\theta^{\mu} + (1-\tau)\,\theta^{\mu'}$$

wherein $\tau$ is the soft-update coefficient, with $\tau \ll 1$.
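Taken together, one training update of the critic and actor followed by the soft target updates can be sketched in PyTorch as follows; the network classes, optimizers, batch layout, and hyperparameter values are assumptions, not the patent's specification:

```python
import torch
import torch.nn.functional as F

def ddpg_update(batch, actor, critic, target_actor, target_critic,
                actor_opt, critic_opt, gamma=0.99, tau=0.01):
    """One gradient step for critic and actor, then soft target updates."""
    state, action, reward, next_state = batch   # tensors sampled from the replay pool

    # Critic: minimize (y_i - Q(O_{i-1}, a_i))^2 with y_i from the target networks
    with torch.no_grad():
        y = reward + gamma * target_critic(next_state, target_actor(next_state))
    critic_loss = F.mse_loss(critic(state, action), y)
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    # Actor: ascend E[Q(O, mu(O))], i.e. minimize -Q(O, mu(O))
    actor_loss = -critic(state, actor(state)).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()

    # Soft updates: theta' <- tau * theta + (1 - tau) * theta'
    for net, target in ((critic, target_critic), (actor, target_actor)):
        for p, tp in zip(net.parameters(), target.parameters()):
            tp.data.mul_(1 - tau).add_(tau * p.data)
```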
The invention also provides a graph neural network compression device, comprising:
the acquisition module is used for acquiring the trained graph neural network and graph data used in the training process;
the interval determining module is used for determining degree distribution ranges corresponding to all graph vertexes in the graph data and dividing the degree distribution ranges into a plurality of degree intervals;
a quantization bit width determining module, configured to determine, by using a reinforcement learning and a hardware accelerator, an optimal interval quantization bit width corresponding to each degree interval and an optimal network quantization bit width corresponding to the graph neural network under a constraint of a preset resource restriction condition;
and the quantization compression module is used for performing quantization compression on the vertex characteristics of the graph vertex corresponding to the degree in the graph data by using the optimal interval quantization bit width and performing quantization compression on the graph neural network by using the optimal network quantization bit width to obtain optimal quantization graph data and an optimal quantization graph neural network.
The present invention also provides an electronic device, comprising:
a memory for storing a computer program;
a processor for implementing the graph neural network compression method as described above when executing the computer program.
The present invention also provides a computer-readable storage medium having stored thereon computer-executable instructions that, when loaded and executed by a processor, implement the graph neural network compression method as described above.
The invention provides a graph neural network compression method, which comprises the following steps: acquiring a trained graph neural network and graph data used in training the graph neural network; determining degree distribution ranges corresponding to all graph vertexes in the graph data, and dividing the degree distribution ranges into a plurality of degree intervals; under the constraint of a preset resource limiting condition, determining an optimal interval quantization bit width corresponding to each degree interval and an optimal network quantization bit width corresponding to the graph neural network by using a reinforcement learning and hardware accelerator; and quantizing and compressing the vertex characteristics of the graph vertex corresponding to the degree in the graph data by using the optimal interval quantization bit width, and quantizing and compressing the graph neural network by using the optimal network quantization bit width to obtain optimal quantization graph data and an optimal quantization graph neural network.
It can be seen that, when the trained graph neural network and the graph data used in training are obtained, the degree distribution range corresponding to all graph vertices in the graph data is counted first, and this range is divided into a plurality of degree intervals; subsequently, under the constraint of a preset resource limitation condition, reinforcement learning and a hardware accelerator are used to determine the optimal interval quantization bit width corresponding to each degree interval and the optimal network quantization bit width corresponding to the graph neural network, and the vertex features of the graph data and the graph neural network are quantized and compressed with these two kinds of quantization bit widths. Reinforcement learning can automatically search for the optimal quantization bit width allocation strategy corresponding to each degree interval and the graph neural network according to the feedback of the hardware accelerator, i.e., the automatic search for the optimal interval quantization bit width and the optimal network quantization bit width can be realized. Meanwhile, the automatic search action of reinforcement learning is restricted by the preset resource limitation condition, which ensures that the finally obtained optimal interval quantization bit width and optimal network quantization bit width are suitable for resource-constrained devices. Finally, the degree distribution range of the graph vertices is divided into a plurality of degree intervals, and the corresponding optimal interval quantization bit width is determined for each interval, i.e., the vertex features of graph vertices with different degrees can be quantized and compressed to different extents, which effectively avoids the high accuracy loss easily caused by the simple but aggressive uniform quantization of all data in existing schemes. In brief, because the invention adopts reinforcement learning to determine the optimal quantization bit widths for the graph neural network and the graph data used in training, the quantization bit widths can be determined automatically, and the trade-off between performance and network model accuracy can be balanced effectively, so that the finally obtained quantization graph data and quantization graph neural network not only have high accuracy but are also suitable for resource-constrained devices. The invention also provides a graph neural network compression device, an electronic device, and a computer-readable storage medium, which have the above beneficial effects.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only embodiments of the present invention; for those skilled in the art, other drawings can be obtained from the provided drawings without creative effort.
FIG. 1 is a flow chart of a method for compressing a neural network according to an embodiment of the present invention;
FIG. 2 is a diagram illustrating an exemplary architecture of a neural network according to an embodiment of the present invention;
FIG. 3 is a block diagram of a neural network compression system according to an embodiment of the present invention;
FIG. 4 is a block diagram illustrating a neural network compression apparatus according to an embodiment of the present invention;
fig. 5 is a block diagram of an electronic device according to an embodiment of the present invention;
fig. 6 is a block diagram of a computer-readable storage medium according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are some, but not all, of the embodiments of the present invention. All other embodiments obtained by a person skilled in the art without creative effort based on the embodiments of the present invention fall within the protection scope of the present invention.
In order to more effectively perform quantization compression on the graph neural network model to ensure that the quantized graph neural network obtained by compression has higher precision and lower resource consumption rate, embodiments of the present invention can provide a graph neural network compression method, which can automatically determine an optimal quantization bit width for the graph neural network and graph data by using reinforcement learning under the constraint of a preset resource limitation condition to ensure that the obtained quantized graph neural network has higher precision and lower resource consumption rate. Referring to fig. 1, fig. 1 is a flowchart of a neural network compression method according to an embodiment of the present invention, where the method includes:
s100, obtaining the trained graph neural network and graph data used in training.
It should be noted that the graph neural network obtained in this step is an original, full-precision graph neural network, and the graph data is training data of the graph neural network, where parameters such as weights and convolution kernels included in the graph neural network and the graph data are floating point type data and are mostly represented by FP 32. Floating point data are highly accurate, but correspondingly, the memory space required to store them is also large. The invention aims to find out proper quantization bit width for the weight, convolution kernel parameters and the like of each layer of the graph neural network and graph data on the premise of ensuring the inference precision of the graph neural network model so as to reduce the requirement of storage space. The quantization bit width here is generally an integer representing a low precision, such as int4, int8, etc.
For ease of understanding, the graph data and the graph neural network are first briefly described. Graph data is the basic input of a graph neural network. Consider a graph G = (V, E) with n vertices and m edges, i.e., |V| = n and |E| = m, with average vertex degree d = m/n. Connectivity in the graph is represented by an adjacency matrix $A \in \{0,1\}^{n \times n}$, whose element $A_{ij} = 1$ indicates that vertices $v_i$ and $v_j$ are adjacent, and $A_{ij} = 0$ indicates no adjacency. The degree matrix D is a diagonal matrix whose n main-diagonal elements are the degrees of the n vertices, all remaining elements being zero. Each vertex $v_i$ has a feature vector $x_i$ of length $f$, and the feature vectors of all graph vertices form the feature matrix $X \in \mathbb{R}^{n \times f}$. In the embodiment of the invention, the part of the graph data to be compressed is the feature matrix $X$ formed by the feature vectors of all graph vertices; this matrix is floating-point data.
Further, a graph neural network is a special neural network that can handle irregularly structured data. Although the structure of a graph neural network can be designed following different guidelines, almost all graph neural networks can be interpreted as performing message passing on vertex features, followed by feature transformation and activation. FIG. 2 illustrates the structure of a typical graph neural network: it consists of an input layer, L graph convolution layers, and an output layer. The input layer is responsible for reading the adjacency matrix A or adjacency list AdjList characterizing the graph topology, together with the vertex feature matrix. The graph convolution layers are responsible for extracting vertex features: each graph convolution layer $l$ reads in the adjacency matrix A or adjacency list AdjList and the vertex feature matrix $X^{(l)}$, and outputs a new vertex feature matrix $X^{(l+1)}$ through the graph convolution operation and a nonlinear transformation. The output layer can be freely configured for different tasks; for example, vertex classification can be realized through a softmax function. Typically, in a graph neural network consisting of L graph convolution layers, the graph convolution operation of the $l$-th layer can generally be written in the form:

$$X^{(l+1)} = \sigma\!\left(\sum_{s=1}^{S} F_s^{(l)}(A)\, X^{(l)}\, W_s^{(l)}\right)$$

wherein $F_s^{(l)}(A)$ denotes the graph convolution kernel defined by the message-passing operator; $\sigma(\cdot)$ represents a nonlinear activation function; $W_s^{(l)} \in \mathbb{R}^{d_l \times d_{l+1}}$ is the learnable linear weight matrix corresponding to the $s$-th convolution kernel of the $l$-th layer; and $d_l$ denotes the vertex feature dimension input to the $l$-th graph convolution layer. Within this general framework, the main difference among existing graph neural networks is the choice of graph convolution kernel. Whether the vertex feature matrix X, the graph convolution kernel F, or the weight W, they are typically floating-point data. It should be noted that only the graph convolution layers have convolution kernels and activations; the input and output layers have weights only.
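To make the general form concrete, here is a minimal NumPy sketch of one such graph convolution layer, with a GCN-style kernel as one example choice of F(A); the function names are illustrative:

```python
import numpy as np

def graph_conv_layer(A, X, weights, kernels, activation=np.tanh):
    """General graph convolution: X' = sigma(sum_s F_s(A) @ X @ W_s).

    A        : (n, n) adjacency matrix
    X        : (n, d_l) vertex feature matrix of layer l
    weights  : list of S weight matrices W_s, each (d_l, d_{l+1})
    kernels  : list of S functions F_s mapping A to an (n, n) operator
    """
    out = sum(F(A) @ X @ W for F, W in zip(kernels, weights))
    return activation(out)

def gcn_kernel(A):
    """Example kernel: F(A) = D^(-1/2) (A + I) D^(-1/2)."""
    A_hat = A + np.eye(A.shape[0])
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return D_inv_sqrt @ A_hat @ D_inv_sqrt
```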
It should be noted that the embodiments of the present invention are not limited to specific graph neural networks and graph data. As described above, the structure of the graph neural network can be designed following different guidelines; meanwhile, it can be understood that the specific content of the graph data, even the complexity thereof, may be different for different tasks, and thus the specific graph neural network and the graph data may be selected according to the actual application requirements. The compression method provided by the embodiment of the invention is suitable for various graph neural networks because the embodiment of the invention adopts reinforcement learning to determine the optimal quantization bit width corresponding to the graph neural networks and the graph data, and the reinforcement learning has stronger adaptability to various environments.
S200, determining the degree distribution range corresponding to all graph vertexes in the graph data, and dividing the degree distribution range into a plurality of degree intervals.
In the prior art, quantization compression of the vertex features of each graph vertex in graph data is generally performed with a uniform quantization bit width. Although this effectively reduces the complexity and storage scale of the graph data, such indiscriminate quantization compression brings a significant accuracy loss to the graph neural network model. Therefore, in the embodiment of the present invention, graph vertices with different degrees in the graph data may be compressed with different quantization bit widths, so as to alleviate the accuracy loss of the graph neural network model caused by quantizing the graph data. In particular, in graph neural network computation, vertices with higher degrees usually obtain richer information from neighboring vertices, which makes them more robust to low quantization bit widths, since the random error of quantization can usually be averaged toward 0 through a large number of aggregation operations. Specifically, given a quantization bit width q, the quantization error $\varepsilon_i$ of a vertex $v_i$ is a random variable following a uniform distribution. For a vertex with a larger degree, aggregating the features of $v_i$ and its many neighboring vertices $v_j$ accumulates a large number of error terms $\varepsilon_i$ and $\varepsilon_j$, and according to the law of large numbers their average converges to 0. Thus, vertices with large degrees are more robust to quantization errors, and smaller quantization bit widths may be used for those high-degree vertices, while larger quantization bit widths may be used for low-degree vertices.
Further, since the vertex degrees of real-world graphs mostly follow a power-law distribution, allocating a quantization bit width to every graph vertex of each distinct degree would cause a state space explosion. For example, even for graph data of small scale such as com-LiveJournal, a significant portion of the vertex degrees are spread between 1 and $10^4$. If the quantization space is 8, the state space would reach a staggering $8^{10000}$. Obviously, such a huge state space cannot meet application requirements. Therefore, in order to reduce the complexity of the state space, the embodiment of the present invention may first count the degree of each graph vertex in the graph data to obtain the degree distribution range corresponding to the graph data, and then divide this range into a plurality of degree intervals so as to determine an optimal interval quantization bit width for each interval; the size of the state space can thus be greatly reduced, which further improves the convenience of searching for the optimal quantization bit widths. According to the above description, the resulting distribution rule of the optimal interval quantization bit widths should be: the larger the degree values corresponding to a degree interval, the smaller the corresponding optimal interval quantization bit width. It should be noted that the embodiment of the present invention does not limit the method of dividing the degree distribution range; for example, the range may be divided equally, or divided according to the distribution of graph vertices within the range, e.g., by ensuring that the numbers of graph vertices corresponding to the degree intervals are the same or close. In order to further reduce the accuracy loss, in the embodiment of the present invention the degree distribution range may be divided according to the distribution of graph vertices within the range, so as to ensure that each interval contains the same number of graph vertices.
In one possible case, determining a degree distribution range corresponding to all graph vertices in the graph data, and dividing the degree distribution range into a plurality of degree intervals may include:
step S201, arranging all graph vertexes in graph data from small to large according to degrees to obtain a graph vertex sequence;
step S202, dividing a degree distribution range by using a graph vertex sequence to obtain a plurality of degree intervals; the number of the chart vertexes contained in each degree interval is the same or the difference value is smaller than a preset threshold value.
It should be noted that the embodiment of the present invention does not limit the specific value of the preset threshold, which may be set according to actual application requirements. In order to reduce the data difference between graph vertices within the same degree interval, the preset threshold value should be as small as possible. Specifically, for graph data G = (V, E), the vertex degree distribution may be counted first, and all vertices in graph G sorted by degree from small to large. A list of vertex-degree split points $d_1 < d_2 < \cdots < d_{k-1}$ is then found in this sequence to divide all vertices into k intervals $[d_{\min}, d_1), [d_1, d_2), \ldots, [d_{k-1}, d_{\max}]$, such that the numbers of vertices falling in the intervals are equal or close to one another, where $d_{\min}$ and $d_{\max}$ respectively represent the minimum and maximum of all vertex degrees in the graph data. On this basis, a vertex degree-to-quantization-bit-width allocation table is established, and graph vertices in the same interval are assigned the same quantization bit width: if a vertex degree falls within the $j$-th interval, the bit width $q_j$ is allocated.
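As an illustration of this equal-frequency partitioning and bit-width lookup, a short sketch with hypothetical helper names:

```python
import numpy as np

def build_degree_intervals(degrees, k):
    """Split vertices into k intervals with (near-)equal vertex counts.

    degrees : 1-D array of vertex degrees
    k       : number of degree intervals
    Returns the k - 1 interval boundaries d_1 <= ... <= d_{k-1}.
    """
    sorted_deg = np.sort(degrees)
    idx = (np.arange(1, k) * len(sorted_deg)) // k   # equal-frequency split points
    return sorted_deg[idx]

def assign_bitwidths(degrees, boundaries, interval_bits):
    """Map each vertex degree to its interval's quantization bit width."""
    interval_ids = np.searchsorted(boundaries, degrees, side="right")
    return np.asarray(interval_bits)[interval_ids]

# Example: 4 intervals; larger-degree intervals get smaller bit widths
deg = np.array([1, 2, 2, 3, 5, 8, 13, 40, 200, 10000])
bnd = build_degree_intervals(deg, 4)
bits = assign_bitwidths(deg, bnd, interval_bits=[8, 6, 4, 2])
```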
S300, under the constraint of a preset resource limiting condition, determining the optimal interval quantization bit width corresponding to each degree interval and the optimal network quantization bit width corresponding to the graph neural network by using reinforcement learning and a hardware accelerator.
After the division of the degree intervals is completed, the embodiment of the invention determines, under the constraint of a preset resource limitation condition, the optimal interval quantization bit width corresponding to each degree interval and the optimal network quantization bit width corresponding to the graph neural network model by using reinforcement learning and a hardware accelerator. It should be noted that the optimal network quantization bit width corresponding to the graph neural network specifically refers to the optimal network quantization bit widths corresponding to the graph convolution kernel matrices, weight matrices, and activation matrices of the graph neural network; the optimal network quantization bit widths corresponding to these three kinds of matrices may be the same or different. In addition, the optimal network quantization bit widths corresponding to the convolution kernel matrix, weight matrix, and activation matrix of each network layer of the graph neural network may likewise be the same or different, and may be chosen according to practical application requirements; note that not every network layer has all three matrices — the graph convolution layers have convolution kernel matrices and activation matrices, while the input and output layers have weight matrices only. It can be understood that although layer-wise different optimal network quantization bit widths can bring higher network model accuracy, they also tend to increase the search computation for the optimal network quantization bit width; therefore, the setting of the optimal network quantization bit widths of the three kinds of matrices can be chosen as needed after balancing network model accuracy against search computation. Since network layers differ in which matrices they possess, the network quantization bit widths for the graph neural network can further be set according to the specific structure of the graph neural network.
Further, the preset resource limitation condition is used to limit the computation resources consumed for processing the quantized graph data and the quantized graph neural network (such as training, executing a specified task, and the like), which is because the graph neural network consumes more computation resources, and if the quantized compression is performed arbitrarily without considering a specific hardware framework, the quantized graph data and the quantized graph neural network obtained finally may have a larger processing computation amount, a larger memory occupation amount, and a longer processing delay, and are not favorable for deployment and application. Therefore, the embodiment of the invention limits the reinforcement learning by adopting the preset resource limiting condition. It should be noted that the embodiment of the present invention does not limit the specific preset resource limiting conditions, and may include a calculated amount threshold, a memory occupied amount threshold, and a delay amount threshold, for example, and each threshold is provided with a corresponding calculation formula for calculating the calculated amount, the memory occupied amount, and the delay amount corresponding to the quantized graph data and the quantized graph neural network. It can be understood that the calculated amount, the memory occupied amount and the delay amount corresponding to the quantized graph data and the quantized graph neural network should be less than or equal to the corresponding calculated amount threshold value, the memory occupied amount threshold value and the delay amount threshold value. The threshold and the corresponding formula are determined by direct feedback of a hardware accelerator, wherein the hardware accelerator is used for verifying the quantization effect of the graph data and the graph neural network, such as verifying the consumption of the quantization compression network on the computing resources and the corresponding accuracy of the network when executing the specified task. It should be noted that, the embodiment of the present invention does not limit the specific calculated amount threshold, the memory occupied amount threshold, and the delay amount threshold, nor the specific corresponding calculation formula of the above thresholds, and may be set according to the actual application requirements, or refer to the description in the following embodiments. The embodiment of the present invention also does not limit the specific structure of the hardware accelerator, for example, the time sequence structure of the hardware accelerator may be a reconfigurable Bit-Serial Matrix Multiplication Overlay (BISMO), and the space structure may be a BitFusion architecture. A preferred hardware accelerator configuration can be found in the table below.
TABLE 1 Configuration of hardware accelerators

| Hardware accelerator model | Batch size | PE array | AXI port | Block RAM |
|---|---|---|---|---|
| Zynq-7020 | 1 | 8×8 | 4×64b | 140×36Kb |
| F37X | 16 | 16×16 | 4×256b | 2160×36Kb |
Further, reinforcement learning is one of the paradigms and methodologies of machine learning, describing and solving the problem of an agent learning a strategy to maximize its return or achieve a specific goal during interaction with the environment. The problem to be solved by reinforcement learning is: to let an agent learn how to perform actions in an environment so as to obtain the maximum total reward. This reward value is typically associated with a task goal defined for the agent. The main things the agent learns are: first, the action policy, and second, planning. The learning goal of the action policy is an optimal policy, i.e., a policy with which the agent's behavior in the given environment obtains the maximum reward value, thereby realizing the agent's task goal. Actions can be simply divided into: (1) continuous actions, such as steering wheel angle, throttle, and brake control signals in racing games, or the joint servo motor control signals of a robot; and (2) discrete actions, as in Go or the Snake game.
The embodiment of the present invention specifically uses a reinforcement learning method based on both value and policy, which may also be referred to as an Actor-Critic method. The Actor-Critic method combines the advantages of value-based and policy-based methods: it improves sampling efficiency by learning a Q-value function or state-value function V with the value-based part (handled by the critic), and learns a policy function with the policy-based part (handled by the actor), thereby being applicable to continuous or high-dimensional action spaces. The Actor-Critic method can be regarded as an extension of value-based methods to continuous action spaces, and also as an improvement of policy-based methods in terms of reducing sample variance and improving sampling efficiency.
Specifically, referring to FIG. 3, FIG. 3 is a block diagram of a graph neural network compression system according to an embodiment of the present invention. The system includes four parts: a DDPG (Deep Deterministic Policy Gradient) agent based on the Actor-Critic framework, the policy, the quantization implementation, and the hardware accelerator. According to the current environment state O, and on the premise of satisfying the hardware accelerator resource constraints (i.e., the preset resource limitation condition), the DDPG agent gives actions according to a specific policy: appropriate quantization bit widths are assigned to the vertex features of each degree interval and to the graph convolution kernels (if any), weights, and activations (if any) of all layers of the graph neural network. The host computer quantizes the trained floating-point graph neural network model and graph data according to the quantization bit width allocation scheme provided by the DDPG agent, obtaining a quantization graph neural network model and quantization graph data. Then, the quantized data and quantized network are mapped or distributed to the hardware accelerator; the hardware accelerator trains the quantization graph neural network with the quantization graph data, executes the specified task with the trained quantization graph neural network, and feeds the accuracy difference between the quantized and original networks back to the DDPG agent as the reward. The DDPG agent adjusts its policy according to the information fed back by the environment and outputs new actions until the optimal policy is obtained. Of course, the system also involves other workflows; to avoid redundancy, please refer to the description in the following embodiments for the system's detailed workflow.
S400, quantizing and compressing the vertex characteristics of the graph vertex corresponding to the degree in the graph data by using the optimal interval quantization bit width, and quantizing and compressing the graph neural network by using the optimal network quantization bit width to obtain the optimal quantization graph data and the optimal quantization graph neural network.
After the optimal interval quantization bit width and the optimal network quantization bit width are obtained, the vertex features of each graph vertex in the graph data and the graph neural network can be quantized and compressed accordingly to obtain the optimal quantization graph data and the optimal quantization graph neural network. The embodiment of the present invention does not limit the specific steps of the quantization compression, which may be set according to actual application requirements or with reference to the description in the following embodiments. It should be noted that although the embodiments of the present invention endeavor to improve the accuracy of the optimal quantization graph neural network, quantization compression itself may negatively impact the accuracy with which the optimal quantization graph neural network performs the specified task. Therefore, after the quantization compression is finished, the optimal quantization graph data is used again to train the optimal quantization graph neural network so as to recover its accuracy in executing the specified task, and the finally obtained fine-tuned quantization graph neural network is deployed to an external service device to provide services.
In one possible case, after obtaining the optimal quantization map data and the optimal quantization map neural network, the method may further include:
s500, training the optimal quantization graph neural network by using the optimal quantization graph data to obtain a fine tuning quantization graph neural network, and deploying the fine tuning quantization graph neural network into external service equipment.
It should be noted that, the embodiment of the present invention does not limit the training process of the optimal quantization graph neural network, and reference may be made to the related art of the graph neural network.
Based on the above embodiment, when the trained graph neural network and the graph data used in its training are obtained, the degree distribution range corresponding to all graph vertices in the graph data is first counted, and the degree distribution range is divided into a plurality of degree intervals. Then, under the constraint of a preset resource limiting condition, reinforcement learning and a hardware accelerator are used to determine the optimal interval quantization bit width corresponding to each degree interval and the optimal network quantization bit width corresponding to the graph neural network, and the vertex features of each graph vertex in the graph data and the graph neural network are quantized and compressed with these two quantization bit widths. Reinforcement learning can automatically search for the optimal quantization bit width allocation strategy for each degree interval and the graph neural network according to the feedback of the hardware accelerator; that is, automatic search of the optimal interval quantization bit width and the optimal network quantization bit width can be realized. Meanwhile, the automatic search actions of reinforcement learning are limited by the preset resource limiting condition, which ensures that the finally obtained optimal interval quantization bit width and optimal network quantization bit width are suitable for resource-limited equipment. Finally, because the degree distribution range of the graph vertices is divided into a plurality of degree intervals and a corresponding optimal interval quantization bit width is determined for each interval, the vertex features of graph vertices with different degrees can be quantized and compressed to different extents, which effectively avoids the high precision loss easily caused by existing schemes that simply apply one unified quantization bit width to all data in advance. In brief, because the invention adopts reinforcement learning to determine the optimal quantization bit widths for the graph neural network and the graph data used in its training, the quantization bit widths can be determined automatically and the trade-off between performance and network model precision can be balanced effectively, so that the finally obtained quantization graph data and quantization graph neural network not only have high precision but are also suitable for resource-limited equipment.
Based on the above embodiments, the specific workflow of the graph neural network compression system is described below. For ease of understanding, the action sequence, policy, time step, reward value and conversion data used hereinafter are described first. The action sequence is used to store the interval quantization bit width corresponding to each degree interval and the network quantization bit widths corresponding to the graph neural network. For example, for given graph data G = (V, E), the vertex degree distribution range is counted first and divided into k intervals according to a certain strategy; then, for the k degree intervals and the three matrices of the graph neural network, the length of the action sequence is k + 3 (a minimal sketch of this bookkeeping follows this paragraph). The process of determining one complete action sequence is called one episode, which contains N time steps (steps), where the value of N equals the length of the action sequence. It should be noted that the action sequence is updated once per time step, so one episode can typically produce N different action sequences. Further, each action sequence can be used for quantization compression, and because the previous action sequence differs from the next one, the compression effects corresponding to the two also differ; in other words, the resource consumption (such as memory occupancy and calculation amount) of the quantized graph data and quantized graph neural network generated with the two action sequences is different, and so is the accuracy of executing the specified task. Therefore, in the embodiment of the present invention, state vectors are used to record the changes in resource consumption and accuracy: for the quantized graph data and quantized graph neural network compressed with the previous action sequence, the memory occupancy, calculation amount and accuracy of executing the specified task are recorded in the historical state vector; for those compressed with the next action sequence, the corresponding quantities are recorded in the current state vector. Further, the reward value is determined from the reference accuracy with which the original graph neural network executes the specified task and the accuracy with which the quantized graph neural network executes the same task, where the reference accuracy specifically refers to the inference accuracy of the graph neural network after it has been trained with the original graph data, such as the classification accuracy in a classification task. The historical state vector, action sequence, reward value and current state vector corresponding to each time step constitute one piece of conversion data (a transition); obviously, this data includes the action, the reward and the state transition of the quantization compression, so the agent can perceive the execution effect of an action through it. In other words, the agent can be trained with the conversion data to update the strategy it employs when determining actions.
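As an illustration only, the following Python sketch shows one way the above bookkeeping could look; the toy degree values, the interval count k and all variable names are assumptions of the example, not values fixed by the embodiment.

```python
import numpy as np

# Hypothetical sketch: k degree intervals plus one bit width each for
# graph convolution kernels, weights and activations (length k + 3).
degrees = np.array([1, 2, 2, 3, 5, 8, 8, 13, 21, 40])  # toy vertex degrees
k = 4                                                   # assumed interval count
edges = np.quantile(degrees, np.linspace(0, 1, k + 1))  # equal-count split

action_seq = np.full(k + 3, 8, dtype=np.int64)  # [b_1..b_k, b_gck, b_w, b_act]
N = len(action_seq)                             # time steps per episode

def interval_of(deg: float) -> int:
    """Index of the degree interval a vertex falls into."""
    return int(np.clip(np.searchsorted(edges, deg, side="right") - 1, 0, k - 1))

def bits_for_vertex(deg: float) -> int:
    """Interval quantization bit width assigned to a vertex of this degree."""
    return int(action_seq[interval_of(deg)])
```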
Based on the above description, the following describes in detail a specific workflow of the graph neural network compression system, where in a possible case, under the constraint of a preset resource limitation condition, determining an optimal interval quantization bit width corresponding to each degree interval and an optimal network quantization bit width corresponding to the graph neural network by using a reinforcement learning and hardware accelerator may include:
s310, acquiring the reference accuracy corresponding to the execution of the designated task by the graph neural network, and initializing an agent and a historical reward value used for reinforcement learning; the agent includes an actor module and a critic module.
It should be noted that the embodiments of the present invention do not limit the specific task performed by the graph neural network, which can be set according to actual application requirements. The accuracy with which the original graph neural network performs the task is set as the reference accuracy. The embodiment of the invention likewise does not limit the way the accuracy is calculated, which can also be set according to actual application requirements. In one possible case, for a multi-classification task, a test graph dataset $V_{test}$ is given in which each vertex has exactly one class label and there are $m$ category labels in total; the vertices with category label $i$ account for a proportion $p_i$ of the total number of vertices, with $\sum_{i=1}^{m} p_i = 1$. Considering each class in turn as the "positive" class and the rest as the "negative" class, and using the definitions of the corresponding indices in the classical two-classification problem, the classification accuracy of the multi-classification problem can be defined as:

$$\mathrm{acc} = \sum_{i=1}^{m} p_i \cdot \frac{TP_i}{TP_i + FN_i}$$

where $TP_i$ and $FN_i$ denote the true positives and false negatives of class $i$.
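For concreteness, the following sketch computes the accuracy defined above; it assumes NumPy label arrays and is not part of the embodiment. Note that the weighted sum of per-class recalls reduces to the ordinary fraction of correctly classified vertices.

```python
import numpy as np

def multiclass_accuracy(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """acc = sum_i p_i * TP_i / (TP_i + FN_i), identical to plain accuracy."""
    total = len(y_true)
    acc = 0.0
    for c in np.unique(y_true):
        mask = (y_true == c)
        p_c = mask.sum() / total                 # class proportion p_i
        recall_c = (y_pred[mask] == c).mean()    # TP_i / (TP_i + FN_i)
        acc += p_c * recall_c
    return acc  # equals (y_pred == y_true).mean()
```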
further, in order to determine the optimal interval quantization bit width and the optimal network quantization bit width during the agent's search process, the embodiment of the present invention also sets a historical reward value for recording the highest reward value that has occurred during the search. Whenever a new highest reward value appears, the embodiment of the invention updates the historical reward value, the optimal interval quantization bit width and the optimal network quantization bit width. Of course, the historical reward value must have an initial value, and the initialization here sets that initial value. The embodiment of the invention does not limit the specific initial value of the historical reward value; it only needs to be as small as possible.
Further, the embodiment of the present invention does not limit the specific process of initializing the agent, where the initialization mainly initializes the parameters in the agent, and reference may be made to the related technology of the DDPG agent.
S320, setting the strategy frequency to be 1, and initializing an action sequence and a historical state vector; the action sequence is used for storing the interval quantization bit width corresponding to each degree interval and the network quantization bit width corresponding to the graph neural network; the state vector is used for recording the corresponding memory occupation amount, the calculated amount and the corresponding accuracy when the quantized graph neural network processes the quantized graph data.
Specifically, the action sequence may be represented as:

$$a = \left\{ b_1, b_2, \ldots, b_k, b_{gck}, b_w, b_{act} \right\}$$

wherein $b_i$ ($1 \le i \le k$) denotes the quantization bit width allocated to the vertex features belonging to the $i$-th degree interval, and $b_{gck}$, $b_w$ and $b_{act}$ denote the quantization bit widths set for the graph convolution kernels (if any), the weights and the activations (if any) of all layers of the graph neural network respectively, each entry satisfying $b_{min} \le b \le b_{max}$. Of course, if different quantization bit widths are assigned to the graph convolution kernels (or weights, or activations) of different layers of the graph neural network, the length of the action sequence a of the DDPG agent becomes k + 3L + 2, where L represents the number of graph convolution layers, that is:

$$a = \left\{ b_1, \ldots, b_k,\; b_{gck}^{(1)}, \ldots, b_{gck}^{(L)},\; b_w^{(1)}, \ldots, b_w^{(L)},\; b_{act}^{(1)}, \ldots, b_{act}^{(L)}, \ldots \right\}$$
further, the state vector may be represented as:

$$O = \left[ \text{acc}, \text{store}, \text{comp} \right]$$
where acc represents accuracy, store represents memory footprint, and comp represents computational effort. The determination of the memory occupation amount and the calculation amount can refer to the description in the subsequent embodiments.
S330, setting the time step to be 1, determining continuous actions by using an actor module under the constraint of a preset resource limiting condition, updating numerical values of the action sequence by using the continuous actions, and determining the memory occupation amount and the calculated amount corresponding to the action sequence after updating.
It will be appreciated that the numerical update of the action sequence by the actor module corresponds to the actor module giving an action based on the current state and strategy. Note that the actor module (actor) first determines a continuous action and then uses it to update the value of the action sequence. However, since quantization bit widths are usually discrete values (for example, the conventional quantization bit widths are 2, 4, 8, 16, 32 bits, etc.), after the continuous action is obtained it must first be discretized to obtain a discrete action value, and the action sequence is then updated with that discrete action value. This process is described in detail below.
In a possible case, under the constraint of a preset resource limitation condition, determining continuous actions by using an actor module, performing numerical update on an action sequence by using the continuous actions, and determining the memory occupation amount and the calculated amount corresponding to the action sequence after the update, the method comprises the following steps:
step S331, selecting continuous action by using the actor module according to a Behavior strategy, and discretizing the continuous action in the following way to obtain a discrete action value:
$$b_i^{(t)} = \mathop{\arg\min}_{q \in Q} \left| q - \mathrm{round}\!\left( b_{min} + a_i^{\prime(t)} \cdot \left( b_{max} - b_{min} \right) \right) \right|$$

wherein $a_i^{\prime(t)}$ denotes the continuous action corresponding to the $i$-th quantization bit width in the action sequence of the $t$-th time step, $b_i^{(t)}$ denotes the discrete action value corresponding to $a_i^{\prime(t)}$, $Q$ comprises a plurality of preset quantization bit width values, $\mathrm{round}(\cdot)$ represents the rounding function, $b_{min}$ and $b_{max}$ indicate the preset minimum and maximum quantization bit widths, and the $\arg\min$ function is used to select the target preset quantization bit width value $q$ in $Q$ such that the above absolute difference is minimum;
step S332, performing numerical value updating on the action sequence by using the action value, determining the memory occupation amount, the calculated amount and the delay amount corresponding to the updated action sequence, and judging whether the memory occupation amount, the calculated amount and the delay amount meet the limit of a preset resource limiting condition; if yes, go to step S333, otherwise, go to step S334;
step S333, if yes, performing quantization compression on the vertex features and the graph neural network in the graph data by using the action sequence;
and step S334, if not, sequentially reducing the quantization bit width in the action sequence according to a preset sequence to update the action sequence again, and when each reduction action is completed, determining a memory occupation amount, a calculation amount, and a delay amount corresponding to the updated action sequence.
Specifically, for the action sequence of length k + 3, at the $t$-th time step the DDPG agent takes a continuous action $a'_t = \left( a'_1, \ldots, a'_{k+3} \right)$ satisfying $a'_i \in [0, 1]$, and each component is rounded by the above formula to the preset bit width value nearest to it, i.e. the $q \in Q$ that minimizes $\left| q - \mathrm{round}\!\left( b_{min} + a'_i \cdot (b_{max} - b_{min}) \right) \right|$, where $b_{min} = 2$ and $b_{max} = 32$. For example, when the mapped and rounded result of the above formula is closest to 4, selecting q = 4 minimizes the absolute difference compared with the other preset quantization bit widths, and the corresponding discrete action value should therefore be set to 4.
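A minimal sketch of this discretization, assuming the preset set Q = {2, 4, 8, 16, 32} and a continuous action in [0, 1]:

```python
import numpy as np

Q = np.array([2, 4, 8, 16, 32])   # preset quantization bit width values
B_MIN, B_MAX = 2, 32

def discretize(a_cont: float) -> int:
    """Map a continuous action in [0, 1] to the nearest preset bit width."""
    target = round(B_MIN + a_cont * (B_MAX - B_MIN))  # linear mapping + round
    return int(Q[np.argmin(np.abs(Q - target))])      # argmin_q |q - target|

# e.g. discretize(0.07) -> round(2 + 0.07 * 30) = 4, nearest preset value 4
```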
Further, in practical applications, given a limited computation budget (i.e., computational load, latency, and memory footprint), embodiments of the present invention seek the quantization bit width allocation scheme with the best inference performance under the given constraints. Embodiments of the present invention encourage the agent to meet the computation budget by limiting the action space. Specifically, each time the agent issues an action, the embodiment of the invention estimates the amount of hardware resources that the quantized graph neural network will use. If the current allocation scheme exceeds the hardware accelerator resource budget, the bit widths of the vertices of each degree interval and of the graph convolution kernels (if any), weights and activations (if any) of all layers of the graph neural network are reduced sequentially until the hardware accelerator resource budget constraint is finally met; a sketch of this reduction loop follows this paragraph. The bit width values may also be decreased in other orders, for example from large to small, which the embodiment of the present invention does not limit.
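The reduction loop mentioned above could be sketched as follows; `within_budget` stands in for the hardware cost model fed back by the accelerator, and the fixed reduction order is an assumption of the example:

```python
import numpy as np

Q = np.array([2, 4, 8, 16, 32])

def lower_one_step(b: int) -> int:
    """Return the next smaller preset bit width (or the minimum)."""
    idx = int(np.searchsorted(Q, b))
    return int(Q[max(idx - 1, 0)])

def enforce_budget(action_seq, within_budget):
    """Sequentially reduce bit widths until the estimated cost fits the
    accelerator budget. `within_budget` is an assumed cost-model callback."""
    seq = list(action_seq)
    while not within_budget(seq):
        reduced = False
        for i in range(len(seq)):          # fixed reduction order (assumed)
            if seq[i] > int(Q[0]):
                seq[i] = lower_one_step(seq[i])
                reduced = True
                if within_budget(seq):
                    return seq
        if not reduced:                    # everything already at the minimum
            break
    return seq
```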
Further, the Behavior strategy $\beta$ is a random process generated from the current strategy of the actor module and random UO (Ornstein-Uhlenbeck) noise, which can specifically be as follows:
in one possible scenario, selecting a continuous action according to the Behavior policy with the actor module includes:
step S3311, selecting a continuous action according to the Behavior strategy by using the actor module as follows:

$$a'_t = \mu\!\left( O_{t-1} \mid \theta^{\mu} \right) + \mathcal{N}_t$$

wherein $\mathcal{N}_t$ denotes the random UO noise corresponding to the $t$-th time step, $O_{t-1}$ denotes the historical state vector corresponding to the $t$-th time step, $\mu$ represents the online actor network in the actor module, and $\theta^{\mu}$ represents the online actor network parameters.
It should be noted here that one strategy of the actor module may be specifically represented by the specific model parameters in the module. In other words, a policy update to an actor module actually performs a parameter update to that module.
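A sketch of the Behavior strategy, assuming an OU process with typical theta/sigma values (the embodiment does not fix these) and actions clipped to [0, 1]:

```python
import numpy as np

class OUNoise:
    """Ornstein-Uhlenbeck process; theta/sigma are common default choices."""
    def __init__(self, dim: int, theta: float = 0.15, sigma: float = 0.2,
                 mu: float = 0.0):
        self.theta, self.sigma, self.mu = theta, sigma, mu
        self.x = np.full(dim, float(mu))

    def sample(self) -> np.ndarray:
        # dx = theta * (mu - x) + sigma * dW : mean-reverting random walk
        self.x += self.theta * (self.mu - self.x) \
                  + self.sigma * np.random.randn(*self.x.shape)
        return self.x

def behavior_action(actor, state, noise: OUNoise) -> np.ndarray:
    """a'_t = mu(O_{t-1} | theta^mu) + N_t, kept inside the action range."""
    return np.clip(actor(state) + noise.sample(), 0.0, 1.0)
```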
And S340, performing quantization compression on the vertex features and the graph neural network in the graph data by using the action sequence, and sending the obtained quantization graph data and the quantization graph neural network to the hardware accelerator, so that the hardware accelerator trains the quantization graph neural network by using the quantization graph data, and determines the current accuracy corresponding to the execution of the specified task by the trained quantization graph neural network.
S350, determining a current state vector by using the memory occupation amount, the calculated amount and the accuracy corresponding to the action sequence, and determining a reward value by using the reference accuracy and the current accuracy;
specifically, the reward value may be calculated as follows:

$$r = \lambda \cdot \left( \mathrm{acc}_{quant} - \mathrm{acc}_{origin} \right)$$

wherein $\mathrm{acc}_{origin}$ is the reference accuracy of the original graph neural network after it has been trained with the original training set, $\mathrm{acc}_{quant}$ is the accuracy of the fine-tuned quantization graph neural network, and $\lambda$ is a scaling factor whose value may preferably be 0.1.
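The reward calculation itself is a one-liner; lambda = 0.1 follows the preferred value above:

```python
LAMBDA = 0.1  # scaling factor, preferred value from the text above

def reward(acc_quant: float, acc_origin: float) -> float:
    """r = lambda * (acc_quant - acc_origin): positive when the quantized
    network approaches or exceeds the floating-point reference accuracy."""
    return LAMBDA * (acc_quant - acc_origin)
```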
S360, when the reward value is determined to be larger than the historical reward value, updating the historical reward value by using the reward value, and updating the optimal interval quantization bit width and the optimal network quantization bit width by using the updated action sequence;
s370, generating conversion data by using the historical state vector, the continuous action, the reward value and the current state vector, and training the actor module and the critic module by using the conversion data so as to enable the critic module to update strategies used by the actor module when numerical value updating is carried out;
it should be noted that, the embodiment of the present invention does not limit the specific process of training the actor module and the critic module, and reference may be made to the description in the following embodiments. The significance of the training is to update the model parameters of the actor module so that it can use new strategies to determine the next action.
S380, when the time step is determined not to reach the length of the action sequence, adding 1 to the time step, updating the historical state vector by using the current state vector, and entering the step of determining continuous action by using the actor module under the constraint of a preset resource limiting condition;
s390, when the time step is determined to reach the length of the action sequence and the strategy frequency does not reach a preset value, adding 1 to the strategy frequency, and entering the step of initializing the action sequence and the historical state vector;
and S3100, outputting the optimal interval quantization bit width and the optimal network quantization bit width when the strategy times are determined to reach a preset value.
It should be noted that the embodiment of the present invention does not limit the specific preset value, which can be set according to actual application requirements. It can be understood that the larger the preset value, the better the agent perceives the environment and the more appropriate the resulting optimal interval quantization bit width and optimal network quantization bit width, but the longer the corresponding computation time and the larger the computation amount; therefore, the preset upper limit on the number of episodes can be set as required after balancing precision against computational resources.
Based on the above-described embodiment, the manner of calculating the memory footprint, the calculation amount, and the delay amount will be described below. Of course, considering that the threshold values and the calculation formulas of the three quantities are determined by the direct feedback of the hardware accelerator, the processing manner of the hardware accelerator on the quantization map data and the quantization map neural network is also described. Specifically, the main processing content of the hardware accelerator for the quantization map data and the quantization map neural network is to train the quantization map neural network by using the quantization map data, and the training process may be optimized in various ways, for example, optimization of strategies such as full-batch (full-batch), mini-batch (mini-batch), or single-element (one-example) Stochastic Gradient Descent (SGD). In the embodiment of the invention, in order to improve the training efficiency, the hardware accelerator can optimize the training process of the quantization map neural network by adopting a small-batch stochastic gradient descent method.
In one possible scenario, the hardware accelerator training the quantization map neural network with quantization map data may include:
and S341, training the quantization map neural network by the hardware accelerator by using the quantization map data based on a small batch stochastic gradient descent method.
Based on the above training method, the following describes the calculation method of the memory footprint, the calculation amount, and the delay amount. In one possible case, determining the memory footprint, the computation amount and the delay amount corresponding to the updated action sequence includes:
s3321, calculating the memory occupation amount by using the following formula:

$$\text{store} = n \cdot \sum_{l=1}^{L} d_l \cdot b_F \;+\; \sum_{l=1}^{L} \left| W_l \right| \cdot b_w^{(l)} \;+\; \sum_{s=1}^{S} \left| K_s \right| \cdot b_{gck}^{(s)}$$

wherein store represents the memory occupation amount, $n$ represents the number of graph vertices within a single mini-batch, $d_l$ represents the vertex dimension value corresponding to the $l$-th network layer of the quantization graph neural network, $l = 1, \ldots, L$, with $L$ representing the number of all network layers of the quantization graph neural network, $b_F$ represents the maximum value among the interval quantization bit widths to which all graph vertices within a single mini-batch are allocated, $S$ represents the total number of convolution kernels, $b_w^{(l)}$ and $b_{gck}^{(s)}$ respectively represent the network quantization bit widths corresponding to the weight matrices and convolution kernels of the network layers of the quantization graph neural network, and $\left| W_l \right|$ and $\left| K_s \right|$ denote the numbers of elements of the $l$-th weight matrix and the $s$-th convolution kernel;
s3322, calculating the calculated amount by using the following formula:

$$\text{comp} = \sum_{l=1}^{L} \mathrm{MAC}_l \cdot b_w^{(l)} \cdot b_a^{(l)}$$

wherein comp represents the calculation amount (in bit operations), $b_a^{(l)}$ represents the network quantization bit width corresponding to the activation matrix of the $l$-th network layer of the quantization graph neural network, and $\mathrm{MAC}_l$ represents the total number of multiply-accumulate operations of the $l$-th layer of the quantization graph neural network;
s3323, calculating the delay amount using the following equation:

$$\text{lat} = \sum_{l=1}^{L} \text{lat}_l$$

wherein lat represents the delay amount, and $\text{lat}_l$ represents the delay with which the $l$-th network layer of the quantization graph neural network processes a mini-batch of graph data.
It should be noted that after the memory occupation amount, the calculated amount and the delay amount are obtained, corresponding thresholds may be used to determine whether the three quantities meet the requirements. $\text{store}_{limit}$, $\text{comp}_{limit}$ and $\text{lat}_{limit}$ may be adopted to represent the memory footprint threshold, the computation threshold and the latency threshold, where $\text{store}_{limit}$ is the storage capacity that the hardware accelerator can provide, $\text{comp}_{limit}$ represents the upper limit of the total number of bit operations available per second on the hardware accelerator, and $\text{lat}_{limit}$ refers to the delay characteristic of the hardware accelerator. All three are determined by the characteristics of the hardware accelerator and can be obtained directly or through measurement.
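A rough cost model matching our reading of the three formulas above might look as follows; the argument layout and all names are assumptions of the sketch:

```python
def estimate_costs(n, dims, b_F, b_w, b_a, kernel_sizes, b_gck,
                   macs, layer_lat):
    """dims: per-layer feature dimensions (length L + 1, including input);
    b_w, b_a, macs, layer_lat: per-layer lists of length L;
    kernel_sizes, b_gck: per-kernel element counts and bit widths."""
    store = n * sum(d * b_F for d in dims)                        # features
    store += sum(dims[l] * dims[l + 1] * b_w[l]                   # weights
                 for l in range(len(b_w)))
    store += sum(ks * b for ks, b in zip(kernel_sizes, b_gck))    # kernels
    comp = sum(m * bw * ba for m, bw, ba in zip(macs, b_w, b_a))  # bit-ops
    lat = sum(layer_lat)                                          # delays
    return store, comp, lat

def within_limits(costs, limits):
    """(store, comp, lat) against (store_limit, comp_limit, lat_limit)."""
    return all(c <= lim for c, lim in zip(costs, limits))
```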
Based on the above embodiments, the following describes a specific process of quantization compression. The embodiment of the present invention will be described by taking quantization and compression of graph data as an example. In one possible case, performing quantization compression on vertex features in graph data by using a sequence of actions may include:
s341, truncating the vertex features of each graph vertex in the graph data to the range [-c, c] (c > 0), and performing quantization compression on the truncated vertex features with the interval quantization bit width corresponding to the degree of the graph vertex in the action sequence:

$$\mathrm{quantize}\!\left( x_j, b, c \right) = \mathrm{round}\!\left( \frac{\mathrm{clamp}\!\left( x_j, -c, c \right)}{s} \right) \cdot s, \qquad s = \frac{c}{2^{b-1} - 1}$$

wherein $\mathrm{quantize}(\cdot)$ represents the quantization compression function, $\mathrm{round}(\cdot)$ represents the rounding function, $\mathrm{clamp}\!\left( x_j, -c, c \right)$ represents the truncation function that truncates $x_j$ to $[-c, c]$, $x$ represents the vertex feature and $x_j$ its $j$-th component, $s$ represents the scaling factor, and $b$ denotes the interval quantization bit width in the action sequence corresponding to the degree of the graph vertex.
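A direct transcription of this quantization function, assuming NumPy arrays:

```python
import numpy as np

def quantize(x: np.ndarray, b: int, c: float) -> np.ndarray:
    """Truncate x to [-c, c] and quantize to b bits:
    round(clip(x, -c, c) / s) * s with s = c / (2**(b - 1) - 1)."""
    s = c / (2 ** (b - 1) - 1)            # scaling factor
    return np.round(np.clip(x, -c, c) / s) * s
```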
Of course, in order to further reduce the accuracy loss of the quantized graph data due to the selection of the cutoff value c, the embodiment of the present invention also designs that a method based on minimizing the characteristic distribution distance of the data before and after quantization is adopted to determine the appropriate c value. Specifically, before performing quantization compression on vertex features in the graph data by using the action sequence, the method may further include:
and S342, determining the value of c in the following way:

$$c^{*} = \mathop{\arg\min}_{x} \; D_{KL}\!\left( F(X) \,\Vert\, F\!\left( \mathrm{quantize}\!\left( X, b, x \right) \right) \right)$$

wherein the $\arg\min$ function is used to select the value of $x$ such that $D_{KL}$ is minimum, and $D_{KL}$ represents the KL divergence between the feature distribution $F(X)$ of the original data $X$ and the feature distribution of the quantized data; the feature distribution is a maximum, minimum, mean, variance, sharpness, or kurtosis.
It should be noted that the embodiment of the present invention does not limit the calculation manner of the KL divergence (Kullback-Leibler divergence); of course, the distance between the two feature distributions may also be determined in other manners, for example JS distance (Jensen-Shannon divergence) or Mutual Information, and may be set according to actual application requirements. The embodiment of the invention likewise does not limit the specific acquisition mode of the feature distribution data; for example, the maximum, minimum, mean and variance can be obtained directly from the target data, while the sharpness and kurtosis are obtained by constructing a histogram of the target data. As for the graph convolution kernels (if any), weights and activations (if any) of the different layers of the graph neural network, the embodiments of the present invention quantize them similarly. The difference is that activations are truncated to the range [0, c] rather than [-c, c], because the activation values (i.e., the outputs of the ReLU (rectified linear unit) layer) are non-negative.
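A histogram-based search for c minimizing the KL divergence, as one of the options above, could be sketched like this; the candidate grid and bin count are assumptions:

```python
import numpy as np

def quantize(x, b, c):                     # as sketched above
    s = c / (2 ** (b - 1) - 1)
    return np.round(np.clip(x, -c, c) / s) * s

def hist(x, bins, rng):
    h, _ = np.histogram(x, bins=bins, range=rng, density=True)
    return h + 1e-10                       # avoid zero bins in the log ratio

def best_cutoff(x, b, candidates, bins=128):
    """Pick c minimizing D_KL(F(X) || F(quantize(X, b, c)))."""
    rng = (float(x.min()), float(x.max()))
    p = hist(x, bins, rng)
    kl = [float(np.sum(p * np.log(p / hist(quantize(x, b, c), bins, rng))))
          for c in candidates]
    return float(candidates[int(np.argmin(kl))])

# e.g. best_cutoff(features, b=4, candidates=np.linspace(0.1, 5.0, 50))
```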
Based on the above embodiment, the initialization and training process of the actor module and the critic module will be described in detail below. First, a brief description of the structure of a DDPG agent will be given. The Actor-Critic framework consists of an Actor (which may also be referred to as a policy network μ) and Critic (which may also be referred to as a Q network or a value network). Wherein the Actor is responsible for interacting with the environment and learning a better strategy by a strategy gradient method under the guidance of a Critic value function; the Critic task is to learn a value function Q by utilizing collected data of interaction between the Actor and the environment, and the function of the function is to judge whether the current state-action pair is good or not so as to assist the Actor to update strategies. Both Actor and Critic contain two networks, one called online and one called target. Thus, four networks in the DDPG algorithm are available, namely an online Actor network (online Actor network), a target Actor network (target Actor network), an online Critic network (online Critic network), and a target Critic network (target Critic network). The online Actor network and the target Actor network have the same structure and different parameters; the same is true for the online critical network and the target critical network. In the network training process, the DDPG algorithm adopts the skill of freezing target network: the online network parameters are updated in real time, while the target network parameters are temporarily frozen. And when the target network is frozen, the online network is tried and explored, the target network summarizes the experience according to the samples generated by the online network, then acts again, and assigns the parameters of the online network to the target network.
In addition, the DDPG algorithm also employs an experience replay mechanism to remove data dependencies and improve sample utilization efficiency. Specifically, an experience replay pool is maintained, the conversion data quadruple (state, action, reward, next state) sampled from the environment each time is stored into the experience replay pool, and when the policy network and the Q network are trained, several pieces of data are randomly sampled from the replay buffer. Doing so serves two functions: (1) making the samples satisfy the independence assumption — experience replay breaks the correlation between samples so that they approximately satisfy the independence assumption; and (2) improving sample utilization. A minimal sketch of such a replay pool is given below.
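```python
import random
from collections import deque

class ReplayBuffer:
    """Experience replay pool: stores (state, action, reward, next_state)
    quadruples and samples uniformly to break sample correlation.
    The capacity is an assumed default."""
    def __init__(self, capacity: int = 100_000):
        self.buf = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state):
        self.buf.append((state, action, reward, next_state))

    def sample(self, n: int):
        return random.sample(self.buf, n)

    def __len__(self) -> int:
        return len(self.buf)
```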
The functions of the four networks of the DDPG agent are as follows:
online Actor network (online Actor network): responsible for the iterative update of the policy network parameters $\theta^{\mu}$; selects the current optimal action $a_t$ according to the current environmental state $O_t$, and is responsible for interacting with the environment to generate the next state $O_{t+1}$ and the reward $r$;

target Actor network (target Actor network): responsible for selecting the next optimal action $a_{t+1}$ based on the next state $O_{t+1}$ sampled from the experience replay pool, and for periodically updating the target Actor network parameters $\theta^{\mu'}$ from the Online Actor parameters $\theta^{\mu}$ using the exponential moving average method;

online Critic network (online Critic network): responsible for the iterative update of the value network parameters $\theta^{Q}$, for calculating the online Q value $Q\!\left( O_t, a_t \mid \theta^{Q} \right)$ of the current state-action pair, and for calculating the target estimate $y_t$ from the Target Critic network output;

target Critic network (target Critic network): responsible for calculating the $Q'\!\left( O_{t+1}, a_{t+1} \mid \theta^{Q'} \right)$ part of the target estimate $y_t$, and for periodically updating the target Critic network parameters $\theta^{Q'}$ from the Online Critic parameters $\theta^{Q}$ using the exponential moving average method.
In one possible scenario, where the actor module comprises an online actor network and a target actor network, the critic module comprises an online critic network and a target critic network, initializing an agent used for reinforcement learning may include:
s311, initializing online actor network parameters of the online actor network, and setting the target actor network parameters of the target actor network and the online actor network parameters to be the same values;
s312, initializing the online critic network parameters of the online critic network, and setting the target critic network parameters of the target critic network to the same values as the online critic network parameters.
In particular, the parameters $\theta^{\mu}$ and $\theta^{Q}$ of the online actor and online critic networks may be initialized first, and the parameters of the online networks are then copied to the corresponding target network parameters:

$$\theta^{\mu'} \leftarrow \theta^{\mu}, \qquad \theta^{Q'} \leftarrow \theta^{Q}$$
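A minimal sketch of this initialization in PyTorch; the two-layer MLP architectures and dimensions are assumptions (the state vector O = [acc, store, comp] suggests state_dim = 3):

```python
import copy
import torch.nn as nn

def build_agent(state_dim: int = 3, action_dim: int = 1, hidden: int = 64):
    """Hypothetical actor/critic pair; the embodiment does not fix these."""
    actor = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU(),
                          nn.Linear(hidden, action_dim), nn.Sigmoid())
    critic = nn.Sequential(nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
                           nn.Linear(hidden, 1))
    # copy online parameters to the target networks, as in S311/S312
    target_actor = copy.deepcopy(actor)
    target_critic = copy.deepcopy(critic)
    return actor, critic, target_actor, target_critic
```

The Sigmoid output keeps the continuous action inside [0, 1], matching the discretization formula described earlier.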
in one possible scenario, training the actor module and the critic module with the transformed data may include:
s371, adding the conversion data to an experience playback pool, and randomly sampling a preset number of conversion data from the experience playback pool to serve as training data;
s372, determining a first gradient of the online critic network parameters by utilizing the training data, the target actor network, the target critic network, the online critic network and the following loss function;
$$L\!\left( \theta^{Q} \right) = \frac{1}{N} \sum_{t=1}^{N} \left( y_t - Q\!\left( O_{t-1}, a'_t \mid \theta^{Q} \right) \right)^{2}, \qquad y_t = r_t + \gamma \cdot Q'\!\left( O_t, \mu'\!\left( O_t \mid \theta^{\mu'} \right) \mid \theta^{Q'} \right)$$

wherein $L\!\left( \theta^{Q} \right)$ represents the loss function, $a'_t$ represents the continuous action, $O_{t-1}$ represents the historical state vector corresponding to the $t$-th time step, $Q$ represents the online critic network and $\theta^{Q}$ the online critic network parameters, and $N$ represents the preset number; $y_t$ represents the estimate of the target critic network, $r_t$ represents the reward value corresponding to the $t$-th time step, $\gamma$ represents a preset discount factor, $Q'$ represents the target critic network and $\theta^{Q'}$ the target critic network parameters, $\mu'$ represents the target actor network and $\theta^{\mu'}$ the target actor network parameters, and $O_t$ represents the current state vector corresponding to the $t$-th time step;
s373, updating the online critic network parameters according to the first gradient;
s374, determining a performance target by using the training data, the updated online critic network, the updated online actor network and the target function, and determining a second gradient of the performance target relative to the determined online actor network parameters:
$$J\!\left( \theta^{\mu} \right) = \mathbb{E}_{O \sim \rho^{\beta}}\!\left[ Q\!\left( O, \mu\!\left( O \mid \theta^{\mu} \right) \mid \theta^{Q} \right) \right]$$

wherein $\mathbb{E}_{O \sim \rho^{\beta}}[\cdot]$ indicates the expected value when the environmental state $O$ obeys the distribution $\rho^{\beta}$; $\theta^{\mu}$ represents the online actor network parameters, and $\nabla_{\theta^{\mu}} J$ represents the second gradient.
For the second gradient calculation, it should be noted that the object of the embodiment of the present invention is to find an optimal policy network parameter $\theta^{\mu*}$ such that the DDPG agent, acting according to the optimal strategy corresponding to this parameter, maximizes the expectation of the cumulative reward accrued in the environment. To evaluate the performance of a policy $\mu$, the present invention defines an objective function $J$ called the performance objective:

$$J\!\left( \theta^{\mu} \right) = \mathbb{E}_{O \sim \rho^{\beta}}\!\left[ Q\!\left( O, \mu\!\left( O \mid \theta^{\mu} \right) \mid \theta^{Q} \right) \right]$$

wherein $Q\!\left( O, \mu\!\left( O \mid \theta^{\mu} \right) \mid \theta^{Q} \right)$ denotes the Q value generated in each state $O$ if the action is selected according to the policy $\mu$, and the expectation is taken with the environmental state $O$ obeying the distribution $\rho^{\beta}$. The gradient of the objective function $J$ with respect to the policy network parameters $\theta^{\mu}$ (the policy gradient for short) can be calculated by the following formula:

$$\nabla_{\theta^{\mu}} J = \mathbb{E}_{O \sim \rho^{\beta}}\!\left[ \nabla_{a}\, Q\!\left( O, a \mid \theta^{Q} \right) \Big|_{a = \mu(O)} \cdot \nabla_{\theta^{\mu}}\, \mu\!\left( O \mid \theta^{\mu} \right) \right]$$
the calculation of the strategy gradient utilizes a chain rule, firstly derives the action a, and then takes the strategy network parameterAnd (6) derivation. Then, the function Q is maximized by a gradient ascending method, resulting in the action with the largest value.
The expected value can be estimated by the Monte-Carlo method. The state transitions $\left( O_{t-1}, a'_t, r_t, O_t \right)$ are stored in the experience replay pool P, wherein $a'_t$ is generated by the DDPG agent according to the Behavior strategy $\beta$ and is converted into discrete action values based on the methods provided in the above embodiments. When N pieces of conversion data are randomly sampled from the experience replay pool P to form a single batch, according to the Monte-Carlo method the single batch can be substituted into the policy gradient formula as an unbiased estimate of the expected value, so the policy gradient can be rewritten as:

$$\nabla_{\theta^{\mu}} J \approx \frac{1}{N} \sum_{t=1}^{N} \nabla_{a}\, Q\!\left( O_{t-1}, a \mid \theta^{Q} \right) \Big|_{a = \mu(O_{t-1})} \cdot \nabla_{\theta^{\mu}}\, \mu\!\left( O_{t-1} \mid \theta^{\mu} \right)$$
s375, updating the network parameters of the online actors based on the second gradient;
s376, updating the target critic network parameters and the target actor network parameters with the updated online critic network parameters and online actor network parameters in the following way:

$$\theta^{Q'} \leftarrow \tau \theta^{Q} + (1 - \tau)\,\theta^{Q'}, \qquad \theta^{\mu'} \leftarrow \tau \theta^{\mu} + (1 - \tau)\,\theta^{\mu'}$$

wherein $\tau$ is the soft update coefficient of the moving average.
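Putting S371–S376 together, one DDPG update step could be sketched in PyTorch as follows; the gamma and tau values are assumptions, and the batch is assumed to hold tensors of shape (N, dim):

```python
import torch
import torch.nn.functional as F

GAMMA, TAU = 0.99, 0.01   # discount factor and soft-update rate (assumed)

def ddpg_update(batch, actor, critic, target_actor, target_critic,
                actor_opt, critic_opt):
    """One update following S371-S376; reward has shape (N, 1)."""
    state, action, reward, next_state = batch

    # critic: minimize (y_t - Q(O_{t-1}, a'_t))^2 with frozen targets
    with torch.no_grad():
        next_a = target_actor(next_state)
        y = reward + GAMMA * target_critic(torch.cat([next_state, next_a], 1))
    q = critic(torch.cat([state, action], 1))
    critic_loss = F.mse_loss(q, y)
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    # actor: ascend the policy gradient via the chain rule through Q
    actor_loss = -critic(torch.cat([state, actor(state)], 1)).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()

    # soft update: theta' <- tau * theta + (1 - tau) * theta'
    for net, tgt in ((actor, target_actor), (critic, target_critic)):
        for p, tp in zip(net.parameters(), tgt.parameters()):
            tp.data.mul_(1 - TAU).add_(TAU * p.data)
```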
The neural network compression method of the above-described figure is described in detail below based on a specific example.
(a) A heterogeneous parallel computing system consisting of a host (namely, an upper computer) and a hardware accelerator is built. A Xilinx Zynq-7020 FPGA or an Inspur F37X FPGA is used as the GNN inference hardware accelerator. For the temporal architecture, the reconfigurable Bit-Serial Matrix Multiplication Overlay (BISMO) is utilized; for the spatial architecture, the BitFusion architecture is adopted. The computation, storage and delay characteristic data of the hardware accelerator are obtained.
(b) GCN (Graph Convolutional Network) is selected as the graph neural network, a graph dataset is constructed using PubMed (an abstract database), the graph learning task is chosen as vertex classification, and an objective function and evaluation criteria matching the learning task are then designed. A GNN instance containing L graph convolution layers is constructed, and the GNN model is trained on the upper computer with a CPU or GPU by the mini-batch stochastic gradient descent method, obtaining the trained floating-point GNN model. The graph data and the trained floating-point GNN model are the objects to be quantized by the present invention.
(c) The DDPG reinforcement learning environment is constructed and initialization is completed. 1) Construct the Actor (policy network) and Critic (value network); each network has one copy, one being the online network and the other the target network. 2) Initialize the online network parameters $\theta^{\mu}$ and $\theta^{Q}$ of Actor and Critic, and copy the parameters of the online networks to the corresponding target network parameters: $\theta^{\mu'} \leftarrow \theta^{\mu}$, $\theta^{Q'} \leftarrow \theta^{Q}$. 3) Initialize the environmental state $O_0$. 4) Initialize the experience replay pool (replay buffer) P and the sampling threshold. 5) Initialize the maximum reward r_best and the optimal action a_best.
(d) And finding an optimal quantization bit width distribution strategy by using a DDPG algorithm. All steps are performed on the upper computer unless explicitly stated. The method comprises the following specific steps:
(1) initializing a UO random process;
(3) Repeatedly execute T time steps; at each time step t, the following operations are performed in sequence:
a. The Actor selects an action $a'_t = \mu\!\left( O_{t-1} \mid \theta^{\mu} \right) + \mathcal{N}_t$ according to the Behavior strategy $\beta$, wherein $\mathcal{N}_t$ is random UO (Ornstein-Uhlenbeck) noise. $a'_t$ is then converted into the discrete action $a_t$.
b. According to the quantization bit widths specified by $a_t$, the upper computer quantizes the features of all graph vertices and the graph convolution kernels (if any), weights and activations (if any) of all GNN layers, adopting the quantization method based on minimizing the feature-distribution distance between data before and after quantization. The quantized graph vertex feature data and GNN model are obtained, and the GNN model is mapped to the hardware accelerator;
c. The hardware accelerator reads the quantized graph vertex features and the adjacency matrix from the upper computer, trains the GNN model by the mini-batch stochastic gradient descent method, tests the classification accuracy $\mathrm{acc}_t$, and calculates the output value $r_t$ of the reward function; $\mathrm{acc}_t$ and $r_t$ are returned to the upper computer;
d. The upper computer updates r_best and a_best: it compares the returned $r_t$ with r_best, and if $r_t$ > r_best, it sets r_best = $r_t$ and a_best = $a_t$.
e. The upper computer assembles the state transition $\left( O_{t-1}, a'_t, r_t, O_t \right)$ and stores it into the experience replay pool P.
f. When the number of transitions in the experience replay pool P exceeds the sampling threshold, sampling is performed: the upper computer randomly samples N pieces of transition data from the experience replay pool P as batch training data for the online Actor and online Critic networks.
g. The upper computer updates the gradients of the online Actor and online Critic networks: it computes the gradient of the loss $L\!\left( \theta^{Q} \right)$ with respect to $\theta^{Q}$ and calculates the policy gradient $\nabla_{\theta^{\mu}} J$; the online Critic network parameters $\theta^{Q}$ and the online Actor network parameters $\theta^{\mu}$ are then updated with the Adam optimizer;
h. The upper computer soft-updates the parameters of the target Actor and target Critic networks, using the moving average method on the corresponding online and target network parameters:

$$\theta^{Q'} \leftarrow \tau \theta^{Q} + (1 - \tau)\,\theta^{Q'}, \qquad \theta^{\mu'} \leftarrow \tau \theta^{\mu} + (1 - \tau)\,\theta^{\mu'}$$
(4) and the upper computer outputs r _ best and a _ best.
(e) According to a_best, the hardware accelerator retrains the quantized model for one epoch to recover its performance, obtaining the final fixed-point GNN quantization model and the quantized graph vertex feature data.
The following describes a neural network compression device, an electronic device, and a computer-readable storage medium according to embodiments of the present invention, and the neural network compression device, the electronic device, and the computer-readable storage medium described below and the neural network compression method described above may be referred to correspondingly.
Referring to fig. 4, fig. 4 is a block diagram of a neural network compression device according to an embodiment of the present invention. The apparatus may include:
an obtaining module 401, configured to obtain a trained graph neural network and graph data used in training the graph neural network;
an interval determining module 402, configured to determine degree distribution ranges corresponding to vertices of all graphs in graph data, and divide the degree distribution ranges into a plurality of degree intervals;
a quantization bit width determining module 403, configured to determine, by using a reinforcement learning and hardware accelerator, an optimal interval quantization bit width corresponding to each degree interval and an optimal network quantization bit width corresponding to the graph neural network under the constraint of a preset resource limitation condition;
and the quantization compression module 404 is configured to perform quantization compression on vertex features of graph vertices corresponding to degrees in the graph data by using the optimal interval quantization bit width, and perform quantization compression on the graph neural network by using the optimal network quantization bit width to obtain optimal quantization graph data and an optimal quantization graph neural network.
Optionally, the interval determining module 402 may include:
the arrangement submodule is used for arranging all graph vertexes in the graph data from small to large according to degrees to obtain a graph vertex sequence;
the dividing submodule is used for dividing the degree distribution range by using the graph vertex sequence to obtain a plurality of degree intervals; the number of the chart vertexes contained in each degree interval is the same or the difference value is smaller than a preset threshold value.
Optionally, the apparatus may further include:
and the training module is used for training the optimal quantization map neural network by using the optimal quantization map data to obtain a fine tuning quantization map neural network so as to deploy the fine tuning quantization map neural network to external service equipment.
Optionally, the temporal architecture of the hardware accelerator is a reconfigurable bit-serial matrix multiplication overlay (BISMO), and the spatial architecture is the BitFusion architecture.
Optionally, the quantization bit width determining module 403 includes:
the initialization submodule is used for acquiring the reference accuracy corresponding to the execution of the designated task by the graph neural network and initializing an agent and a historical reward value used for reinforcement learning; the intelligent agent comprises an actor module and a critic module;
the first setting submodule is used for setting the strategy frequency to be 1 and initializing an action sequence and a historical state vector; the action sequence is used for storing the interval quantization bit width corresponding to each degree interval and the network quantization bit width corresponding to the graph neural network; the state vector is used for recording the corresponding memory occupation amount, the calculated amount and the corresponding accuracy when the quantized graph neural network processes the quantized graph data;
the second setting submodule is used for setting the time step to be 1, determining continuous actions by using the actor module under the constraint of a preset resource limiting condition, updating the numerical values of the action sequence by using the continuous actions, and determining the memory occupation amount and the calculated amount corresponding to the action sequence after updating;
the compression and training submodule is used for carrying out quantization compression on the vertex features and the graph neural network in the graph data by using the action sequence and sending the obtained quantization graph data and the quantization graph neural network to the hardware accelerator so that the hardware accelerator trains the quantization graph neural network by using the quantization graph data and determines the current accuracy corresponding to the execution of the designated task by the trained quantization graph neural network;
the calculation submodule is used for determining a current state vector by utilizing the memory occupation amount, the calculated amount and the accuracy corresponding to the action sequence and determining a reward value by utilizing the reference accuracy and the current accuracy;
the refinement submodule is used for updating the historical reward value by using the reward value when the reward value is determined to be larger than the historical reward value, and updating the optimal interval quantization bit width and the optimal network quantization bit width by using the updated action sequence;
the intelligent agent training submodule is used for generating conversion data by utilizing the historical state vector, the continuous action, the reward value and the current state vector, and training the actor module and the critic module by utilizing the conversion data so as to update the strategy used by the critic module when the actor module is subjected to numerical value updating;
a third setting submodule, configured to add 1 to the time step when it is determined that the time step has not reached the length of the action sequence, update the historical state vector with the current state vector, and enter the step of determining continuous actions with the actor module under the constraint of the preset resource limiting condition;
a fourth setting submodule, configured to add 1 to the strategy times when it is determined that the time step reaches the length of the action sequence and the strategy times does not reach a preset value, and enter a step of initializing the action sequence and the historical state vector;
and the output sub-module is used for outputting the optimal interval quantization bit width and the optimal network quantization bit width when the strategy times are determined to reach the preset value.
Optionally, the second setting submodule may include:
the discrete action determining unit is used for selecting continuous action according to the Behavior strategy by utilizing the actor module, and discretizing the continuous action in the following mode to obtain a discrete action value:
$$b_i^{(t)} = \mathop{\arg\min}_{q \in Q} \left| q - \mathrm{round}\!\left( b_{min} + a_i^{\prime(t)} \cdot \left( b_{max} - b_{min} \right) \right) \right|$$

wherein $a_i^{\prime(t)}$ denotes the continuous action corresponding to the $i$-th quantization bit width in the action sequence of the $t$-th time step, $b_i^{(t)}$ denotes the discrete action value corresponding to $a_i^{\prime(t)}$, $Q$ comprises a plurality of preset quantization bit width values, $\mathrm{round}(\cdot)$ represents the rounding function, $b_{min}$ and $b_{max}$ represent the preset minimum and maximum quantization bit widths, and the $\arg\min$ function is used to select the target preset quantization bit width value $q$ in $Q$ such that the above absolute difference is minimum;
the updating unit is used for carrying out numerical value updating on the action sequence by utilizing the action value, determining the memory occupation amount, the calculated amount and the delay amount corresponding to the updated action sequence, and judging whether the memory occupation amount, the calculated amount and the delay amount meet the limitation of the preset resource limitation condition or not;
the first processing unit is used for performing quantization compression on the vertex features and the graph neural network in the graph data by utilizing the action sequence if the vertex features and the graph neural network are in the same state;
and if not, sequentially reducing the quantization bit width in the action sequence according to a preset sequence so as to update the action sequence again, and entering the step of determining the memory occupation amount, the calculated amount and the delay amount corresponding to the updated action sequence when each reduction action is completed.
Optionally, the discrete motion determination unit may include:
a continuous action determining subunit, configured to select a continuous action according to the Behavior strategy by using the actor module in the following manner:

$$a'_t = \mu\!\left( O_{t-1} \mid \theta^{\mu} \right) + \mathcal{N}_t$$

wherein $\mathcal{N}_t$ denotes the random UO noise corresponding to the $t$-th time step, $O_{t-1}$ denotes the historical state vector corresponding to the $t$-th time step, $\mu$ represents the online actor network in the actor module, and $\theta^{\mu}$ represents the online actor network parameters.
Optionally, the compression and training submodule may include:
and the hardware accelerator unit is used for training the quantization map neural network by using the quantization map data based on a small batch stochastic gradient descent method.
Optionally, the updating unit may include:
the first calculating subunit is configured to calculate the memory occupation amount by using the following formula:

$$\text{store} = n \cdot \sum_{l=1}^{L} d_l \cdot b_F \;+\; \sum_{l=1}^{L} \left| W_l \right| \cdot b_w^{(l)} \;+\; \sum_{s=1}^{S} \left| K_s \right| \cdot b_{gck}^{(s)}$$

wherein store represents the memory occupation amount, $n$ represents the number of graph vertices within a single mini-batch, $d_l$ represents the vertex dimension value corresponding to the $l$-th network layer of the quantization graph neural network, $l = 1, \ldots, L$, with $L$ representing the number of all network layers of the quantization graph neural network, $b_F$ represents the maximum value among the interval quantization bit widths to which all graph vertices within a single mini-batch are assigned, $S$ represents the total number of convolution kernels, $b_w^{(l)}$ and $b_{gck}^{(s)}$ respectively represent the network quantization bit widths corresponding to the weight matrices and convolution kernels of the network layers of the quantization graph neural network, and $\left| W_l \right|$ and $\left| K_s \right|$ denote the numbers of elements of the $l$-th weight matrix and the $s$-th convolution kernel;
a second calculating subunit, configured to calculate the calculated amount using the following formula:

$$\text{comp} = \sum_{l=1}^{L} \mathrm{MAC}_l \cdot b_w^{(l)} \cdot b_a^{(l)}$$

wherein comp represents the calculation amount (in bit operations), $b_a^{(l)}$ represents the network quantization bit width corresponding to the activation matrix of the $l$-th network layer of the quantization graph neural network, and $\mathrm{MAC}_l$ represents the total number of multiply-accumulate operations of the $l$-th layer of the quantization graph neural network;
a third calculating subunit for calculating the delay amount using the following formula:

$$\text{lat} = \sum_{l=1}^{L} \text{lat}_l$$

wherein lat represents the delay amount, and $\text{lat}_l$ represents the delay with which the $l$-th network layer of the quantization graph neural network processes a mini-batch of graph data.
Optionally, the compression and training submodule comprises:
a compression unit, configured to truncate the vertex features of each graph vertex in the graph data to the range [-c, c] (c > 0), and to perform quantization compression on the truncated vertex features with the interval quantization bit width corresponding to the degree of the graph vertex in the action sequence:

$$\mathrm{quantize}\!\left( x_j, b, c \right) = \mathrm{round}\!\left( \frac{\mathrm{clamp}\!\left( x_j, -c, c \right)}{s} \right) \cdot s, \qquad s = \frac{c}{2^{b-1} - 1}$$

wherein $\mathrm{quantize}(\cdot)$ represents the quantization compression function, $\mathrm{round}(\cdot)$ represents the rounding function, $\mathrm{clamp}\!\left( x_j, -c, c \right)$ represents the truncation function that truncates $x_j$ to $[-c, c]$, $x$ represents the vertex feature and $x_j$ its $j$-th component, $s$ represents the scaling factor, and $b$ denotes the interval quantization bit width in the action sequence corresponding to the degree of the graph vertex.
Optionally, the compression and training sub-module further comprises:
a cutoff value determination unit for determining the value of c by:

$$c^{*} = \mathop{\arg\min}_{x} \; D_{KL}\!\left( F(X) \,\Vert\, F\!\left( \mathrm{quantize}\!\left( X, b, x \right) \right) \right)$$

wherein the $\arg\min$ function is used to select the value of $x$ such that $D_{KL}$ is minimum, and $D_{KL}$ represents the KL divergence between the feature distribution $F(X)$ of the original data $X$ and the feature distribution of the quantized data; the feature distribution is a maximum, minimum, mean, variance, sharpness, or kurtosis.
Optionally, the actor module includes an online actor network and a target actor network, the critic module includes an online critic network and a target critic network, and the initialization sub-module includes:
the first initialization unit is used for initializing the online actor network parameters of the online actor network and setting the target actor network parameters of the target actor network and the online actor network parameters to be the same values;
the second initialization unit is used for initializing the online critic network parameters of the online critic network and setting the target critic network parameters of the target critic network to the same values as the online critic network parameters.
Optionally, the agent training submodule may include:
the training data extraction unit is used for adding the conversion data to the experience playback pool and randomly sampling a preset number of conversion data from the experience playback pool as training data;
the first gradient calculation unit is used for determining a first gradient of the online critic network parameters by utilizing the training data, the target actor network, the target critic network, the online critic network and the following loss function;
$$L\!\left( \theta^{Q} \right) = \frac{1}{N} \sum_{t=1}^{N} \left( y_t - Q\!\left( O_{t-1}, a'_t \mid \theta^{Q} \right) \right)^{2}, \qquad y_t = r_t + \gamma \cdot Q'\!\left( O_t, \mu'\!\left( O_t \mid \theta^{\mu'} \right) \mid \theta^{Q'} \right)$$

wherein $L\!\left( \theta^{Q} \right)$ represents the loss function, $a'_t$ represents the continuous action, $O_{t-1}$ represents the historical state vector corresponding to the $t$-th time step, $Q$ represents the online critic network and $\theta^{Q}$ the online critic network parameters, and $N$ represents the preset number; $y_t$ represents the estimate of the target critic network, $r_t$ represents the reward value corresponding to the $t$-th time step, $\gamma$ represents a preset discount factor, $Q'$ represents the target critic network and $\theta^{Q'}$ the target critic network parameters, $\mu'$ represents the target actor network and $\theta^{\mu'}$ the target actor network parameters, and $O_t$ represents the current state vector corresponding to the $t$-th time step;
the first updating unit is used for updating the online critic network parameters according to the first gradient;
the second gradient calculation unit is used for determining a performance target by utilizing the training data, the updated online critic network, the updated online actor network and the target function, and determining a second gradient of the performance target relative to the determined online actor network parameters:
$$J\!\left( \theta^{\mu} \right) = \mathbb{E}_{O \sim \rho^{\beta}}\!\left[ Q\!\left( O, \mu\!\left( O \mid \theta^{\mu} \right) \mid \theta^{Q} \right) \right]$$

wherein $\mathbb{E}_{O \sim \rho^{\beta}}[\cdot]$ indicates the expected value when the environmental state $O$ obeys the distribution $\rho^{\beta}$; $\theta^{\mu}$ represents the online actor network parameters, and $\nabla_{\theta^{\mu}} J$ represents the second gradient;
the second updating unit is used for updating the network parameters of the online actors based on the second gradient;
a third updating unit, configured to update the target critic network parameters and the target actor network parameters with the updated online critic network parameters and online actor network parameters in the following manner:

$$\theta^{Q'} \leftarrow \tau \theta^{Q} + (1 - \tau)\,\theta^{Q'}, \qquad \theta^{\mu'} \leftarrow \tau \theta^{\mu} + (1 - \tau)\,\theta^{\mu'}$$
Referring to fig. 5, fig. 5 is a block diagram of an electronic device according to an embodiment of the present invention, where the embodiment of the present invention further provides an electronic device, including:
a memory 501 for storing a computer program;
a processor 502 for implementing the steps of the neural network compression method as described above when executing the computer program.
Since the embodiment of the electronic device portion corresponds to the embodiment of the neural network compression method portion, please refer to the description of the embodiment of the neural network compression method portion for the embodiment of the electronic device portion, and details are not repeated here.
Referring to fig. 6, fig. 6 is a block diagram of a computer-readable storage medium according to an embodiment of the present invention, and the embodiment of the present invention further provides a computer-readable storage medium 601 having a computer program stored thereon, where the computer program is executed by a processor to implement the steps of the graph neural network compression method according to any of the embodiments.
Since the embodiment of the computer-readable storage medium portion corresponds to the embodiment of the graph neural network compression method portion, please refer to the description of the method embodiment for the embodiment of the storage medium portion; details are not repeated here.
The embodiments are described in a progressive manner in the specification, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. The device disclosed by the embodiment corresponds to the method disclosed by the embodiment, so that the description is simple, and the relevant points can be referred to the method part for description.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or a combination of both. To clearly illustrate this interchangeability of hardware and software, the various illustrative components and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and the design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in Random Access Memory (RAM), Read-Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The graph neural network compression method, apparatus, electronic device and storage medium provided by the present invention are described in detail above. The principles and embodiments of the present invention are explained herein using specific examples, which are presented only to assist in understanding the method and its core concepts. It should be noted that, for those skilled in the art, various improvements and modifications can be made to the present invention without departing from its principle, and those improvements and modifications also fall within the scope of the claims of the present invention.
Claims (16)
1. A graph neural network compression method, comprising:
acquiring a trained graph neural network and graph data used in training the graph neural network;
determining a degree distribution range corresponding to all graph vertices in the graph data, and dividing the degree distribution range into a plurality of degree intervals;
under the constraint of a preset resource limiting condition, determining an optimal interval quantization bit width corresponding to each degree interval and an optimal network quantization bit width corresponding to the graph neural network by using reinforcement learning and a hardware accelerator;
and quantizing and compressing, by using the optimal interval quantization bit width, the vertex features of the graph vertices whose degrees fall within the corresponding degree interval in the graph data, and quantizing and compressing the graph neural network by using the optimal network quantization bit width, to obtain optimal quantization graph data and an optimal quantization graph neural network.
2. The method according to claim 1, wherein the determining a degree distribution range corresponding to all graph vertices in the graph data and dividing the degree distribution range into a plurality of degree intervals comprises:
arranging all graph vertices in the graph data in ascending order of degree to obtain a graph vertex sequence;
dividing the degree distribution range by using the graph vertex sequence to obtain a plurality of degree intervals; wherein the number of graph vertices contained in each degree interval is the same, or the difference between the numbers is smaller than a preset threshold value.
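As a concrete illustration of this equal-frequency split, here is a small NumPy sketch; the function name and the choice to return (minimum degree, maximum degree) boundaries per interval are assumptions.

```python
import numpy as np

def split_degree_intervals(degrees, k):
    """Arrange graph vertices by degree (small to large) and cut the
    degree distribution range into k intervals that contain the same
    number of vertices, or counts differing by at most one."""
    order = np.argsort(degrees)              # graph vertex sequence
    buckets = np.array_split(order, k)       # bucket sizes differ by <= 1
    return [(int(degrees[b[0]]), int(degrees[b[-1]])) for b in buckets]

# e.g. degrees drawn from a skewed distribution, split into 4 intervals
degrees = np.array([1, 1, 2, 2, 3, 5, 8, 13, 40, 200])
print(split_degree_intervals(degrees, 4))    # [(1, 2), (2, 5), (8, 13), (40, 200)]
```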
3. The graph neural network compression method according to claim 1, further comprising, after obtaining the optimal quantization graph data and the optimal quantization graph neural network:
training the optimal quantization graph neural network by using the optimal quantization graph data to obtain a fine-tuned quantization graph neural network, so as to deploy the fine-tuned quantization graph neural network to an external service device.
4. The graph neural network compression method according to claim 1, wherein the hardware accelerator has a temporal structure of a reconfigurable bit-serial matrix multiplication overlay and a spatial structure of a BitFusion architecture.
5. The graph neural network compression method according to any one of claims 1 to 4, wherein the determining, by using reinforcement learning and a hardware accelerator under the constraint of a preset resource limiting condition, an optimal interval quantization bit width corresponding to each degree interval and an optimal network quantization bit width corresponding to the graph neural network comprises:
acquiring a reference accuracy corresponding to the graph neural network executing a specified task, and initializing an agent and a historical reward value used by the reinforcement learning; wherein the agent comprises an actor module and a critic module;
setting the policy count to 1, and initializing an action sequence and a historical state vector; the action sequence is used for storing the interval quantization bit width corresponding to each degree interval and the network quantization bit width corresponding to the graph neural network; the state vector is used for recording the memory occupation amount, the calculation amount and the corresponding accuracy when the quantization graph neural network processes the quantization graph data;
setting the time step to 1, determining a continuous action by using the actor module under the constraint of the preset resource limiting condition, updating the values of the action sequence by using the continuous action, and determining the memory occupation amount and the calculation amount corresponding to the updated action sequence;
performing quantization compression on the vertex features in the graph data and on the graph neural network by using the action sequence, and sending the obtained quantization graph data and quantization graph neural network to the hardware accelerator, so that the hardware accelerator trains the quantization graph neural network by using the quantization graph data and determines the current accuracy corresponding to the trained quantization graph neural network executing the specified task;
determining a current state vector by using the memory occupation amount, the calculation amount and the current accuracy corresponding to the action sequence, and determining a reward value by using the reference accuracy and the current accuracy;
when the reward value is determined to be larger than the historical reward value, updating the historical reward value by using the reward value, and updating the optimal interval quantization bit width and the optimal network quantization bit width by using the updated action sequence;
generating transition data by using the historical state vector, the continuous action, the reward value and the current state vector, and training the actor module and the critic module by using the transition data, so that the critic module updates the policy used by the actor module for value updating;
when it is determined that the time step has not reached the length of the action sequence, adding 1 to the time step, updating the historical state vector by using the current state vector, and returning to the step of determining a continuous action by using the actor module under the constraint of the preset resource limiting condition;
when it is determined that the time step has reached the length of the action sequence and the policy count has not reached a preset value, adding 1 to the policy count, and returning to the step of initializing the action sequence and the historical state vector;
and when it is determined that the policy count has reached the preset value, outputting the optimal interval quantization bit width and the optimal network quantization bit width.
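The overall control flow of this claim can be summarized in a short Python skeleton; every callable is injected by the caller, and the reward shape (current accuracy minus reference accuracy) is an assumption, so this is a sketch of the loop structure rather than the specification's exact procedure.

```python
def search_bit_widths(act, apply_action, evaluate, agent_train,
                      episodes, seq_len, ref_acc):
    """One episode per policy iteration; one time step per bit width slot."""
    best_reward, best_actions = float("-inf"), None
    for _ in range(episodes):                      # policy count loop
        actions, state = [None] * seq_len, None    # action sequence + state vector
        for t in range(seq_len):
            a = act(state)                         # continuous action from the actor
            actions, mem, flops = apply_action(actions, t, a)   # resource-clamped
            acc = evaluate(actions)                # accelerator: quantize, train, test
            next_state = (mem, flops, acc)         # current state vector
            reward = acc - ref_acc                 # assumed reward definition
            if reward > best_reward:               # keep the best action sequence
                best_reward, best_actions = reward, list(actions)
            agent_train((state, a, reward, next_state))  # transition data
            state = next_state
    return best_actions, best_reward
```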
6. The graph neural network compression method according to claim 5, wherein the determining a continuous action by using the actor module under the constraint of the preset resource limiting condition, updating the values of the action sequence by using the continuous action, and determining the memory occupation amount and the calculation amount corresponding to the updated action sequence comprises:
selecting the continuous action by using the actor module according to a Behavior policy, and discretizing the continuous action in the following manner to obtain a discrete action value:
$$d_i^{(t)} = \underset{q \in \mathcal{Q}}{\arg\min}\;\left|\,q - \mathrm{round}\!\left(b_{\min} + a_i^{(t)}\cdot\left(b_{\max}-b_{\min}\right)\right)\right|$$

wherein, $a_i^{(t)}$ represents the continuous action corresponding to the $i$-th quantization bit width in the action sequence of time step $t$, $d_i^{(t)}$ represents the discrete action value corresponding to $a_i^{(t)}$, $\mathcal{Q}$ comprises a plurality of preset quantization bit width values, $q$ represents a target preset quantization bit width value selected in $\mathcal{Q}$, $\mathrm{round}(\cdot)$ represents a rounding function, $b_{\min}$ and $b_{\max}$ represent a preset minimum quantization bit width and maximum quantization bit width, and the $\arg\min$ function is used to select the target preset quantization bit width value $q$ in $\mathcal{Q}$ such that $\left|q - \mathrm{round}\!\left(b_{\min} + a_i^{(t)}\cdot(b_{\max}-b_{\min})\right)\right|$ is minimum;
updating the values of the action sequence by using the discrete action value, determining the memory occupation amount, the calculation amount and the delay amount corresponding to the updated action sequence, and judging whether the memory occupation amount, the calculation amount and the delay amount satisfy the preset resource limiting condition;
if yes, performing quantization compression on the vertex features in the graph data and on the graph neural network by using the action sequence;
if not, sequentially reducing the quantization bit widths in the action sequence according to a preset order so as to update the action sequence again, and returning to the step of determining the memory occupation amount, the calculation amount and the delay amount corresponding to the updated action sequence each time a reduction is completed.
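A compact sketch of this discretization step follows; the linear map from a continuous action in [0, 1] onto the [b_min, b_max] range inside the rounding is an assumption consistent with the claim's round/argmin description.

```python
def discretize(a, q_set, b_min=2, b_max=8):
    """Map a continuous action a in [0, 1] to the nearest preset
    quantization bit width value in q_set."""
    target = round(b_min + a * (b_max - b_min))      # assumed linear mapping
    return min(q_set, key=lambda q: abs(q - target))  # argmin over preset values

# e.g. a = 0.7 maps toward 6 bits, and the nearest preset value wins
print(discretize(0.7, [2, 4, 8]))   # 4 (ties broken by list order)
```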
7. The graph neural network compression method according to claim 6, wherein the selecting the continuous action by using the actor module according to a Behavior policy comprises:
selecting, by using the actor module, the continuous action according to the following Behavior policy:

$$a^{(t)} = \mu\!\left(O^{(t)} \mid \theta^{\mu}\right) + \mathcal{N}^{(t)}$$

wherein $\mu$ represents the online actor network, $\theta^{\mu}$ represents the online actor network parameters, $O^{(t)}$ represents the historical state vector of time step $t$, and $\mathcal{N}^{(t)}$ represents random exploration noise.
8. The graph neural network compression method according to claim 6, wherein the hardware accelerator trains the quantization graph neural network by using the quantization graph data, comprising:
the hardware accelerator trains the quantization graph neural network by using the quantization graph data based on a mini-batch stochastic gradient descent method.
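A minimal PyTorch sketch of this accelerator-side training step, assuming a node-classification loss; the loader contents, epoch count and learning rate are illustrative.

```python
import torch

def fine_tune(model, loader, epochs=3, lr=1e-3):
    """Mini-batch stochastic gradient descent over the quantized graph data."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()        # assumed task loss
    model.train()
    for _ in range(epochs):
        for inputs, labels in loader:            # one quantized mini-batch at a time
            opt.zero_grad()
            loss_fn(model(inputs), labels).backward()
            opt.step()
```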
9. The graph neural network compression method according to claim 8, wherein the determining the memory occupation amount, the calculation amount and the delay amount corresponding to the updated action sequence comprises:
calculating the memory occupation amount by using the following formula:

$$M = n \cdot b_{v}^{\max} \cdot \sum_{l=1}^{L} d_l + \sum_{s=1}^{S} |W_s| \cdot b_s^{w}$$

wherein, $M$ represents the memory occupation amount, $n$ represents the number of graph vertices within a single mini-batch, $d_l$ represents the vertex dimension value corresponding to the $l$-th network layer of the quantization graph neural network, $l \in \{1, \dots, L\}$, $L$ represents the number of all network layers of the quantization graph neural network, $b_{v}^{\max}$ represents the maximum value of the interval quantization bit widths to which all graph vertices within a single mini-batch are assigned, $S$ represents the total number of convolution kernels, and $W_s$ and $b_s^{w}$ respectively represent the weight matrix of each network layer of the quantization graph neural network and the network quantization bit width corresponding to the $s$-th convolution kernel, $|W_s|$ denoting the number of elements of $W_s$;
calculating the calculation amount by using the following formula:

$$C = \sum_{s=1}^{S} \mathrm{MAC}_s \cdot b_s^{w} \cdot b_s^{a}$$

wherein, $C$ represents the calculation amount, $b_s^{a}$ represents the network quantization bit width corresponding to the activation matrix of each network layer of the quantization graph neural network, and $\mathrm{MAC}_s$ represents the total number of multiply-accumulate operations of the $s$-th layer of the quantization graph neural network;
the delay amount is calculated using the following formula:
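A sketch of the two cost estimates above follows (the delay model is not reproduced here); the grouping of terms follows the reconstructed formulas and the `weights` layout is an assumption.

```python
def memory_bits(n, dims, b_feat_max, weights):
    """Vertex-feature storage for one mini-batch plus quantized weight
    storage; `weights` is a list of (num_parameters, bit_width) pairs."""
    feature_bits = n * b_feat_max * sum(dims)          # n vertices, per-layer dims
    weight_bits = sum(size * b_w for size, b_w in weights)
    return feature_bits + weight_bits

def compute_cost(macs, b_w, b_a):
    """Bit-serial cost model: per-layer MAC counts weighted by the
    product of weight and activation bit widths."""
    return sum(m * bw * ba for m, bw, ba in zip(macs, b_w, b_a))
```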
10. The graph neural network compression method of claim 5, wherein the performing quantization compression on the vertex features in the graph data by using the action sequence comprises:
truncating the vertex features of each graph vertex in the graph data into the range $[-c, c]$ ($c > 0$), and performing quantization compression on the truncated vertex features by using the interval quantization bit width corresponding to the degree of the graph vertex in the action sequence:
$$\mathrm{quant}\!\left(h_v^{j}\right) = \mathrm{round}\!\left(\frac{\mathrm{clip}\!\left(h_v^{j}, c\right)}{s}\right)\cdot s, \qquad s = \frac{c}{2^{b-1}-1}$$

wherein, $\mathrm{quant}(\cdot)$ represents the quantization compression function, $\mathrm{round}(\cdot)$ represents a rounding function, $\mathrm{clip}(x, c)$ represents a truncation function that truncates $x$ to $[-c, c]$, $h_v$ represents the vertex feature, $h_v^{j}$ represents the $j$-th component of the vertex feature, $s$ represents a scaling factor, and $b$ represents the interval quantization bit width in the action sequence corresponding to the degree of the graph vertex.
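In code, the clip/round/scale form of this claim looks as follows; a minimal NumPy sketch.

```python
import numpy as np

def quantize_features(h, b, c):
    """Linearly quantize a vertex feature vector h to b bits after
    truncating it to [-c, c]; returns the dequantized (compressed) values."""
    s = c / (2 ** (b - 1) - 1)           # scaling factor
    clipped = np.clip(h, -c, c)          # truncation
    return np.round(clipped / s) * s     # round to the quantization grid

h = np.array([-3.2, -0.6, 0.1, 2.7])
print(quantize_features(h, b=4, c=2.0))  # values snapped to a 4-bit grid in [-2, 2]
```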
11. The graph neural network compression method according to claim 10, further comprising, before performing quantization compression on the vertex features in the graph data by using the action sequence:
determining the value of c in the following manner:
$$c = \underset{x}{\arg\min}\; D_{\mathrm{KL}}\!\left(\Phi\!\left(h_v\right) \,\Big\|\, \Phi\!\left(\mathrm{quant}\!\left(\mathrm{clip}\!\left(h_v, x\right)\right)\right)\right)$$

wherein, the $\arg\min$ function is used to select the value of $x$ such that $D_{\mathrm{KL}}$ is minimum, and $D_{\mathrm{KL}}$ represents the KL divergence between the feature distribution $\Phi(h_v)$ of the original vertex features and the feature distribution of the truncated and quantized vertex features; the feature distribution is a maximum value, a minimum value, a mean value, a variance, a sharpness or a kurtosis.
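A sketch of this threshold search using histogram-estimated distributions; the claim only names the KL divergence, so the histogram estimator, the candidate grid, and the reuse of `quantize_features` from the previous sketch are all assumptions.

```python
import numpy as np

def best_clip_threshold(h, b, candidates, bins=128):
    """Return the c in `candidates` whose quantized feature distribution
    is closest (in KL divergence) to the original feature distribution."""
    lo, hi = float(h.min()), float(h.max())
    p_cnt, _ = np.histogram(h, bins=bins, range=(lo, hi))
    p = p_cnt / p_cnt.sum()
    eps = 1e-10                                    # avoid log(0)
    best_c, best_kl = None, float("inf")
    for c in candidates:
        q_cnt, _ = np.histogram(quantize_features(h, b, c),
                                bins=bins, range=(lo, hi))
        q = q_cnt / q_cnt.sum()
        kl = float(np.sum((p + eps) * np.log((p + eps) / (q + eps))))
        if kl < best_kl:
            best_c, best_kl = c, kl
    return best_c
```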
12. The graph neural network compression method according to claim 5, wherein the actor module comprises an online actor network and a target actor network, the critic module comprises an online critic network and a target critic network, and the initializing an agent used by the reinforcement learning comprises:
initializing the online actor network parameters of the online actor network, and setting the target actor network parameters of the target actor network to the same values as the online actor network parameters;
initializing the online critic network parameters of the online critic network, and setting the target critic network parameters of the target critic network to the same values as the online critic network parameters.
13. The graph neural network compression method according to claim 12, wherein the training the actor module and the critic module by using the transition data comprises:
adding the transition data to an experience replay pool, and randomly sampling a preset number of transitions from the experience replay pool as training data;
determining a first gradient of the online critic network parameters by using the training data, the target actor network, the target critic network, the online critic network and the following loss function:
$$L = \frac{1}{N}\sum_{i=1}^{N}\left(y_i - Q\!\left(O_i, a_i \mid \theta^{Q}\right)\right)^2$$

wherein, $L$ represents the loss function, $a_i$ represents the continuous action, $O_i$ represents the historical state vector corresponding to the $i$-th time step, $Q$ represents the online critic network, $\theta^{Q}$ represents the online critic network parameters, and $N$ represents the preset number; $y_i$ represents the estimate of the target critic network, $y_i = r_i + \gamma Q'\!\left(O_{i+1}, \mu'\!\left(O_{i+1} \mid \theta^{\mu'}\right) \mid \theta^{Q'}\right)$, $r_i$ represents the reward value corresponding to the $i$-th time step, $\gamma$ represents a preset discount factor, $Q'$ represents the target critic network, $\theta^{Q'}$ represents the target critic network parameters, $\mu'$ represents the target actor network, $\theta^{\mu'}$ represents the target actor network parameters, and $O_{i+1}$ represents the current state vector corresponding to the $i$-th time step;
updating the online critic network parameters according to the first gradient;
determining a performance objective by using the training data, the updated online critic network, the online actor network and the following objective function, and determining a second gradient of the performance objective with respect to the online actor network parameters:
$$J = \mathbb{E}_{O \sim \rho^{\beta}}\!\left[Q\!\left(O, \mu(O \mid \theta^{\mu}) \mid \theta^{Q}\right)\right], \qquad \nabla_{\theta^{\mu}} J \approx \frac{1}{N}\sum_{i=1}^{N}\nabla_{a}Q\!\left(O, a \mid \theta^{Q}\right)\Big|_{O=O_i,\,a=\mu(O_i)}\,\nabla_{\theta^{\mu}}\mu\!\left(O \mid \theta^{\mu}\right)\Big|_{O=O_i}$$

wherein, $\mathbb{E}_{O \sim \rho^{\beta}}[\cdot]$ denotes the expected value when the environmental state $O$ obeys the distribution $\rho^{\beta}$; $\theta^{\mu}$ represents the online actor network parameters, and $\nabla_{\theta^{\mu}} J$ represents the second gradient;
updating the online actor network parameters based on the second gradient;
updating the target critic network parameters and the target actor network parameters by using the updated online critic network parameters and online actor network parameters in the following manner:

$$\theta^{Q'} \leftarrow \tau\theta^{Q} + (1-\tau)\,\theta^{Q'}, \qquad \theta^{\mu'} \leftarrow \tau\theta^{\mu} + (1-\tau)\,\theta^{\mu'}$$

wherein $\tau$ represents a preset soft update coefficient.
14. A graph neural network compression apparatus, comprising:
the acquisition module is used for acquiring the trained graph neural network and graph data used in the training process;
the interval determining module is used for determining degree distribution ranges corresponding to all graph vertexes in the graph data and dividing the degree distribution ranges into a plurality of degree intervals;
a quantization bit width determining module, configured to determine, by using reinforcement learning and a hardware accelerator under the constraint of a preset resource limiting condition, an optimal interval quantization bit width corresponding to each degree interval and an optimal network quantization bit width corresponding to the graph neural network;
and the quantization compression module is used for quantizing and compressing, by using the optimal interval quantization bit width, the vertex features of the graph vertices whose degrees fall within the corresponding degree interval in the graph data, and quantizing and compressing the graph neural network by using the optimal network quantization bit width, to obtain optimal quantization graph data and an optimal quantization graph neural network.
15. An electronic device, comprising:
a memory for storing a computer program;
a processor for implementing the graph neural network compression method as claimed in any one of claims 1 to 13 when executing the computer program.
16. A computer-readable storage medium having stored thereon computer-executable instructions that, when loaded and executed by a processor, carry out a method of graph neural network compression as claimed in any one of claims 1 to 13.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211299256.8A CN115357554B (en) | 2022-10-24 | 2022-10-24 | Graph neural network compression method and device, electronic equipment and storage medium |
PCT/CN2023/085970 WO2024087512A1 (en) | 2022-10-24 | 2023-04-03 | Graph neural network compression method and apparatus, and electronic device and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211299256.8A CN115357554B (en) | 2022-10-24 | 2022-10-24 | Graph neural network compression method and device, electronic equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115357554A CN115357554A (en) | 2022-11-18 |
CN115357554B (en) | 2023-02-24
Family
ID=84007819
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211299256.8A Active CN115357554B (en) | 2022-10-24 | 2022-10-24 | Graph neural network compression method and device, electronic equipment and storage medium |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN115357554B (en) |
WO (1) | WO2024087512A1 (en) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115357554B (en) * | 2022-10-24 | 2023-02-24 | Inspur Electronic Information Industry Co., Ltd. | Graph neural network compression method and device, electronic equipment and storage medium |
CN116011551B (en) * | 2022-12-01 | 2023-08-29 | University of Science and Technology of China | Graph sampling training method, system, equipment and storage medium for optimizing data loading |
CN115934661B (en) * | 2023-03-02 | 2023-07-14 | Inspur Electronic Information Industry Co., Ltd. | Method and device for compressing graphic neural network, electronic equipment and storage medium |
CN116341633B (en) * | 2023-05-29 | 2023-09-01 | Shandong Inspur Science Research Institute Co., Ltd. | Model deployment method, device, equipment and storage medium |
CN118296359B (en) * | 2024-06-05 | 2024-08-06 | Shandong Deyuan Electric Power Technology Co., Ltd. | Electric energy meter with intelligent acquisition system for concentrator terminal |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108962393A (en) * | 2018-05-12 | 2018-12-07 | Ludong University | Automatic arrhythmia analysis method based on compressed graph neural network |
CN112100286A (en) * | 2020-08-14 | 2020-12-18 | South China University of Technology | Computer-aided decision-making method, device and system based on multi-dimensional data and server |
Family Cites Families (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10747433B2 (en) * | 2018-02-21 | 2020-08-18 | Wisconsin Alumni Research Foundation | Computer architecture for high-speed, graph-traversal |
US11645493B2 (en) * | 2018-05-04 | 2023-05-09 | Microsoft Technology Licensing, Llc | Flow for quantized neural networks |
CN110852439B (en) * | 2019-11-20 | 2024-02-02 | ByteDance Ltd. | Data processing method and device and storage medium |
CN111563589B (en) * | 2020-04-14 | 2024-01-16 | Zhongke Wuqi (Nanjing) Technology Co., Ltd. | Quantification method and device for neural network model |
CN113570037A (en) * | 2021-07-13 | 2021-10-29 | Tsinghua University | Neural network compression method and device |
CN113762489A (en) * | 2021-08-12 | 2021-12-07 | Beijing Jiaotong University | Method for multi-bit-width quantization of deep convolutional neural networks |
CN113902108A (en) * | 2021-11-24 | 2022-01-07 | Guizhou Power Grid Co., Ltd. | Neural network acceleration hardware architecture and method with dynamic quantization bit width selection |
US20220092391A1 (en) * | 2021-12-07 | 2022-03-24 | Santiago Miret | System and method of using neuroevolution-enhanced multi-objective optimization for mixed-precision quantization of deep neural networks |
CN114781615A (en) * | 2022-04-24 | 2022-07-22 | Shanghai University | Two-stage quantization implementation method and device based on compressed neural network |
CN115357554B (en) * | 2022-10-24 | 2023-02-24 | Inspur Electronic Information Industry Co., Ltd. | Graph neural network compression method and device, electronic equipment and storage medium |
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108962393A (en) * | 2018-05-12 | 2018-12-07 | Ludong University | Automatic arrhythmia analysis method based on compressed graph neural network |
CN112100286A (en) * | 2020-08-14 | 2020-12-18 | South China University of Technology | Computer-aided decision-making method, device and system based on multi-dimensional data and server |
Non-Patent Citations (2)
Title |
---|
Fully nested neural network for adaptive compression and quantization; Yufei Cui et al.; Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence; 2021-01-07; pp. 2080-2087 *
An improved BP neural network image compression method based on classification; Ma Yide et al.; Journal of Lanzhou University (Natural Sciences); 2005-08-30; Vol. 41, No. 4; pp. 70-72 *
Also Published As
Publication number | Publication date |
---|---|
CN115357554A (en) | 2022-11-18 |
WO2024087512A1 (en) | 2024-05-02 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN115357554B (en) | Graph neural network compression method and device, electronic equipment and storage medium | |
Jin et al. | Data-driven evolutionary optimization | |
US11861474B2 (en) | Dynamic placement of computation sub-graphs | |
Li et al. | Deep reinforcement learning: Framework, applications, and embedded implementations | |
Gupta et al. | Resource usage prediction of cloud workloads using deep bidirectional long short term memory networks | |
Praveen et al. | Low cost PSO using metamodels and inexact pre-evaluation: Application to aerodynamic shape design | |
CN105488563A (en) | Deep learning oriented sparse self-adaptive neural network, algorithm and implementation device | |
US11263513B2 (en) | Method and system for bit quantization of artificial neural network | |
CN112513886A (en) | Information processing method, information processing apparatus, and information processing program | |
Mammadli et al. | The art of getting deep neural networks in shape | |
WO2019006976A1 (en) | Neural network weight discretizing method, system and device, and readable storage medium | |
CN112529069A (en) | Semi-supervised node classification method, system, computer equipment and storage medium | |
US10410140B1 (en) | Categorical to numeric conversion of features for machine learning models | |
CN114692552A (en) | Layout method and device of three-dimensional chip and terminal equipment | |
Ortega-Zamorano et al. | FPGA implementation of neurocomputational models: comparison between standard back-propagation and C-Mantec constructive algorithm | |
Lyu et al. | Efficient factorisation-based Gaussian process approaches for online tracking | |
Park et al. | Continual learning with speculative backpropagation and activation history | |
Hosseini et al. | The evolutionary convergent algorithm: A guiding path of neural network advancement | |
Wang et al. | Balancing memory-accessing and computing over sparse DNN accelerator via efficient data packaging | |
US12061988B1 (en) | Decomposition of ternary weight tensors | |
Chen et al. | An effective surrogate model assisted algorithm for multi-objective optimization: application to wind farm layout design | |
CN112651492A (en) | Self-connection width graph convolution neural network model and training method thereof | |
JP2021140493A (en) | Information processing apparatus, information processing method, and program | |
Li et al. | Forecasting shipping index using CEEMD-PSO-BiLSTM model | |
CN115934661B (en) | Method and device for compressing graphic neural network, electronic equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||