CN115357554B - Graph neural network compression method and device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN115357554B
CN115357554B
Authority
CN
China
Prior art keywords
network
graph
quantization
neural network
bit width
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211299256.8A
Other languages
Chinese (zh)
Other versions
CN115357554A
Inventor
胡克坤
董刚
赵雅倩
李仁刚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inspur Electronic Information Industry Co Ltd
Original Assignee
Inspur Electronic Information Industry Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inspur Electronic Information Industry Co Ltd filed Critical Inspur Electronic Information Industry Co Ltd
Priority to CN202211299256.8A
Publication of CN115357554A
Application granted
Publication of CN115357554B
Priority to PCT/CN2023/085970 (WO2024087512A1)
Legal status: Active

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10File systems; File servers
    • G06F16/17Details of further file system functions
    • G06F16/174Redundancy elimination performed by the file system
    • G06F16/1744Redundancy elimination performed by the file system using compression, e.g. sparse files
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5011Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
    • G06F9/5016Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals the resource being the memory
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a graph neural network compression method and apparatus, an electronic device, and a storage medium, relating to the field of neural networks. The method comprises the following steps: acquiring a trained graph neural network and the graph data used in training it; determining the degree distribution range corresponding to all graph vertices in the graph data, and dividing the degree distribution range into a plurality of degree intervals; under the constraint of a preset resource limitation condition, determining an optimal interval quantization bit width corresponding to each degree interval and an optimal network quantization bit width corresponding to the graph neural network by using reinforcement learning and a hardware accelerator; and quantizing and compressing the vertex features of graph vertices of corresponding degrees in the graph data with the optimal interval quantization bit widths, and quantizing and compressing the graph neural network with the optimal network quantization bit width, to obtain optimal quantization graph data and an optimal quantization graph neural network. By determining optimal quantization bit widths for the graph neural network and the graph vertex features through reinforcement learning, the method ensures that the quantization graph neural network has both high accuracy and a low resource consumption rate.

Description

Graph neural network compression method and device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of neural networks, and in particular, to a graph neural network compression method and apparatus, an electronic device, and a storage medium.
Background
In recent years, Graph Neural Networks (GNNs) have received much attention because they can model irregularly structured data. GNNs are widely used in various fields such as graph-based vertex classification, molecular interactions, social networks, recommendation systems, and program understanding. Although GNN models usually have few parameters, GNNs exhibit high memory usage and heavy computation (manifested as long training or inference times), because the storage and computation requirements of each application are closely related to the size of the input graph data. This makes GNNs impractical for most resource-constrained devices, such as embedded systems and Internet of Things devices. There are two main reasons behind this awkward situation. First, the input to a GNN consists of two types of data: the graph structure (edge list) and the vertex features (embeddings). When the graph becomes large, its storage size grows sharply, which puts great strain on small devices with very limited memory budgets. Second, larger-scale graph data requires more data operations (e.g., additions and multiplications) and data movement (e.g., memory transactions), which consume a large amount of energy and exhaust the limited power budget of these miniature devices.
To address the above challenges, quantization compression emerges as a "kill two birds with one stone" solution for resource-constrained devices: it can (1) effectively reduce the storage size of the vertex features, thereby reducing memory use; and (2) minimize operand size, thereby reducing power consumption. However, existing quantization methods have the following three problems: (1) choosing simple but aggressive uniform quantization of all data to minimize memory and power costs, resulting in high accuracy loss; (2) choosing very conservative quantization to maintain accuracy, resulting in suboptimal memory and power savings; (3) ignoring differences in hardware architectures and quantizing all layers of the GNN in a uniform way.
As such, how to perform quantization compression on the graph neural network and the corresponding graph data is a technical problem to be solved by those skilled in the art.
Disclosure of Invention
The invention aims to provide a graph neural network compression method, a graph neural network compression device, electronic equipment and a storage medium, which can automatically determine the optimal quantization bit width for a graph neural network and vertex characteristics in graph data by utilizing reinforcement learning under the constraint of a preset resource limiting condition so as to ensure that the obtained quantization graph neural network has higher precision and lower resource consumption rate.
In order to solve the above technical problem, the present invention provides a graph neural network compression method, including:
acquiring a trained graph neural network and graph data used in training the graph neural network;
determining degree distribution ranges corresponding to all graph vertexes in the graph data, and dividing the degree distribution ranges into a plurality of degree intervals;
under the constraint of a preset resource limitation condition, determining an optimal interval quantization bit width corresponding to each degree interval and an optimal network quantization bit width corresponding to the graph neural network by using reinforcement learning and a hardware accelerator;
and quantizing and compressing the vertex characteristics of the graph vertex corresponding to the degree in the graph data by using the optimal interval quantization bit width, and quantizing and compressing the graph neural network by using the optimal network quantization bit width to obtain optimal quantization graph data and an optimal quantization graph neural network.
Optionally, the determining a degree distribution range corresponding to vertices of all graphs in the graph data, and dividing the degree distribution range into a plurality of degree intervals includes:
arranging all graph vertexes in the graph data from small to large according to degrees to obtain a graph vertex sequence;
dividing the degree distribution range by using the graph vertex sequence to obtain a plurality of degree intervals; the number of graph vertices contained in each degree interval is the same, or the difference is smaller than a preset threshold value.
Optionally, after obtaining the optimal quantization map data and the optimal quantization map neural network, the method further includes:
and training the optimal quantization graph neural network by using the optimal quantization graph data to obtain a fine tuning quantization graph neural network so as to deploy the fine tuning quantization graph neural network to external service equipment.
Optionally, the temporal architecture of the hardware accelerator is a reconfigurable bit-serial matrix multiplication overlay, and the spatial architecture is a BitFusion architecture.
Optionally, the determining, by using reinforcement learning and a hardware accelerator, an optimal interval quantization bit width corresponding to each degree interval and an optimal network quantization bit width corresponding to the graph neural network under the constraint of a preset resource limitation condition includes:
acquiring the reference accuracy corresponding to the graph neural network executing a specified task, and initializing the agent and a historical reward value used by the reinforcement learning; the agent comprises an actor module and a critic module;
setting the strategy frequency to be 1, and initializing an action sequence and a historical state vector; the action sequence is used for storing an interval quantization bit width corresponding to each degree interval and a network quantization bit width corresponding to the graph neural network; the state vector is used for recording the corresponding memory occupation amount, the calculated amount and the corresponding accuracy when the quantitative map neural network processes the quantitative map data;
setting the time step to be 1, determining continuous actions by using the actor module under the constraint of the preset resource limiting condition, performing numerical updating on the action sequence by using the continuous actions, and determining the memory occupation amount and the calculated amount corresponding to the action sequence after updating;
performing quantization compression on the vertex features in the graph data and the graph neural network by using the action sequence, and sending the obtained quantization graph data and the quantization graph neural network to the hardware accelerator, so that the hardware accelerator trains the quantization graph neural network by using the quantization graph data and determines the current accuracy corresponding to the designated task executed by the trained quantization graph neural network;
determining a current state vector by using the memory occupation amount, the calculated amount and the accuracy corresponding to the action sequence, and determining a reward value by using the reference accuracy and the current accuracy;
when the reward value is determined to be larger than the historical reward value, updating the historical reward value by using the reward value, and updating the optimal interval quantization bit width and the optimal network quantization bit width by using the updated action sequence;
generating conversion data by using the historical state vector, the continuous action, the reward value and the current state vector, and training the actor module and the critic module by using the conversion data so that the critic module updates a strategy used by the actor module when the numerical value is updated;
when the time step is determined not to reach the length of the action sequence, adding 1 to the time step, updating the historical state vector by using the current state vector, and entering the step of determining continuous actions by using the actor module under the constraint of the preset resource limit condition;
when the time step is determined to reach the length of the action sequence and the strategy times do not reach a preset value, adding 1 to the strategy times, and entering the steps of initializing the action sequence and the historical state vector;
and when the strategy times are determined to reach the preset value, outputting the optimal interval quantization bit width and the optimal network quantization bit width.
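For illustration only, the overall search loop described in the steps above might be sketched as follows; `act`, `quantize_and_eval`, and `store_and_train` are hypothetical stand-ins for the DDPG agent's action selection, the hardware-accelerator evaluation, and the agent's training step, not names from the patent:

```python
def search_bitwidths(act, quantize_and_eval, store_and_train,
                     n_policies: int, seq_len: int, baseline_acc: float):
    """Skeleton of the bit-width search loop, under the assumptions above."""
    best_reward, best_actions = float("-inf"), None
    for _ in range(n_policies):               # one episode per policy round
        actions = [0] * seq_len               # k interval + network bit widths
        state = (0.0, 0.0, 0.0)               # (memory, compute, accuracy)
        for t in range(seq_len):
            actions[t] = act(state)           # resource-constrained action
            acc, mem, comp = quantize_and_eval(actions)
            reward = acc - baseline_acc       # accuracy-difference reward
            if reward > best_reward:
                best_reward, best_actions = reward, list(actions)
            next_state = (mem, comp, acc)
            store_and_train(state, actions[t], reward, next_state)
            state = next_state
    return best_actions
```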
Optionally, the determining, by the actor module, a continuous action under the constraint of the preset resource limitation condition, performing numerical update on the action sequence by using the continuous action, and determining the memory occupation amount and the calculation amount corresponding to the action sequence after the update includes:
selecting the continuous action with the actor module according to a Behavior policy, and discretizing the continuous action as follows to obtain a discrete action value:

$$a_i' = \arg\min_{q \in Q} \left| q - \operatorname{round}\!\bigl( b_{\min} + a_i \cdot (b_{\max} - b_{\min}) \bigr) \right|$$

where $a_i$ denotes the continuous action corresponding to the $i$-th quantization bit width in the action sequence of the $t$-th time step, $a_i'$ denotes the discrete action value corresponding to $a_i$, $Q$ comprises a plurality of preset quantization bit width values, $\operatorname{round}(\cdot)$ denotes a rounding function, $b_{\min}$ and $b_{\max}$ denote the preset minimum and maximum quantization bit widths, and the $\arg\min$ function selects the target preset quantization bit width value $q$ in $Q$ that minimizes $\bigl| q - \operatorname{round}\bigl( b_{\min} + a_i \cdot (b_{\max} - b_{\min}) \bigr) \bigr|$;
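As an illustration only, this discretization step might be sketched as follows; the bit-width set Q and the bounds B_MIN/B_MAX are assumed example values, not taken from the patent:

```python
import numpy as np

# Assumed example values for the preset bit-width set Q and its bounds.
Q = np.array([2, 4, 6, 8])
B_MIN, B_MAX = 2, 8

def discretize(a_continuous: float) -> int:
    """Map a continuous action in [0, 1] to the nearest preset bit width,
    following the arg-min formula above."""
    target = round(B_MIN + a_continuous * (B_MAX - B_MIN))
    return int(Q[np.argmin(np.abs(Q - target))])
```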
performing numerical value updating on the action sequence by using the action value, determining the memory occupation amount, the calculated amount and the delay amount corresponding to the updated action sequence, and judging whether the memory occupation amount, the calculated amount and the delay amount meet the limitation of the preset resource limitation condition;
if yes, performing quantitative compression on the vertex features in the graph data and the graph neural network by using the action sequence;
if not, sequentially reducing the quantization bit width in the action sequence according to a preset sequence so as to update the action sequence again, and entering the step of determining the memory occupation amount, the calculated amount and the delay amount corresponding to the updated action sequence when each reduction action is completed.
Optionally, the selecting, with the actor module, a continuous action according to a Behavior policy includes:
selecting, with the actor module, a continuous action according to a Behavior policy as follows:
$$a_t = \mu\!\left(O_t \mid \theta^{\mu}\right) + \mathcal{N}_t$$

where $\mathcal{N}_t$ denotes the random OU (Ornstein-Uhlenbeck) noise corresponding to the $t$-th time step, $O_t$ denotes the historical state vector corresponding to the $t$-th time step, $\mu$ denotes the online actor network in the actor module, and $\theta^{\mu}$ denotes the online actor network parameters.
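A minimal sketch of this behavior policy, assuming a callable actor network and standard OU-noise parameters (theta and sigma are illustrative defaults, not values from the patent):

```python
import numpy as np

class OUNoise:
    """Ornstein-Uhlenbeck exploration noise, commonly paired with DDPG."""
    def __init__(self, dim: int, mu=0.0, theta=0.15, sigma=0.2):
        self.mu, self.theta, self.sigma = mu, theta, sigma
        self.state = np.full(dim, mu)

    def sample(self) -> np.ndarray:
        dx = self.theta * (self.mu - self.state) \
             + self.sigma * np.random.randn(len(self.state))
        self.state = self.state + dx
        return self.state

def behavior_action(actor, O_t, noise: OUNoise):
    """Behavior policy above: a_t = mu(O_t | theta_mu) + N_t, where `actor`
    stands in for the online actor network."""
    return actor(O_t) + noise.sample()
```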
Optionally, the hardware accelerator trains the quantization map neural network with the quantization map data, including:
and the hardware accelerator trains the quantization map neural network by utilizing the quantization map data based on a small batch stochastic gradient descent method.
Optionally, the determining the memory occupation amount, the calculation amount, and the delay amount corresponding to the updated action sequence includes:
calculating the memory footprint using the following formula:

$$\mathrm{Mem} = \sum_{l=1}^{L} N_{mb}\, d_l\, b_{v}^{\max} \;+\; \sum_{l=1}^{L} \sum_{s=1}^{S} \Bigl( \lVert W_s^{(l)} \rVert\, b_{W}^{(l)} + \lVert F_s^{(l)} \rVert\, b_{F}^{(l)} \Bigr)$$

where $\mathrm{Mem}$ denotes the memory footprint, $N_{mb}$ denotes the number of graph vertices within a single mini-batch, $d_l$ denotes the vertex feature dimension corresponding to the $l$-th network layer of the quantization graph neural network, $l \in \{1, \dots, L\}$, $L$ denotes the number of all network layers of the quantization graph neural network, $b_{v}^{\max}$ denotes the maximum of the interval quantization bit widths to which all graph vertices within a single mini-batch are assigned, $S$ denotes the total number of convolution kernels, $\lVert \cdot \rVert$ denotes the number of elements of a matrix, and $b_{W}^{(l)}$ and $b_{F}^{(l)}$ respectively denote the network quantization bit widths corresponding to the weight matrix and the convolution kernel of each network layer of the quantization graph neural network;

calculating the calculation amount using the following formula:

$$\mathrm{Comp} = \sum_{l=1}^{L} \mathrm{MAC}_l \cdot b_{W}^{(l)} \cdot b_{a}^{(l)}$$

where $\mathrm{Comp}$ denotes the calculation amount, $b_{a}^{(l)}$ denotes the network quantization bit width corresponding to the activation matrix of the $l$-th network layer of the quantization graph neural network, and $\mathrm{MAC}_l$ denotes the total number of multiply-accumulate operations of the $l$-th layer of the quantization graph neural network;

calculating the delay amount using the following formula:

$$\mathrm{Lat} = \sum_{l=1}^{L} \mathrm{lat}_l$$

where $\mathrm{Lat}$ denotes the delay amount, and $\mathrm{lat}_l$ denotes the latency of the $l$-th network layer of the quantization graph neural network processing a mini-batch of graph data.
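For illustration, the three cost terms might be computed as follows; the per-layer dimensions, weight shapes, MAC counts, and latencies are caller-supplied assumptions, and the weight-memory term is a simplification of the formula above:

```python
def resource_costs(out_dims, weight_shapes, macs, layer_latency,
                   b_v_max, b_w, b_a, n_mb):
    """out_dims[l]: vertex feature dim of layer l's output; weight_shapes[l]:
    (rows, cols) of layer l's weight matrix; macs[l]: multiply-accumulate
    count of layer l; layer_latency[l]: measured per-layer latency on the
    accelerator. All example inputs are assumptions, not patent values."""
    mem = sum(n_mb * d * b_v_max for d in out_dims)                 # features
    mem += sum(r * c * b for (r, c), b in zip(weight_shapes, b_w))  # weights
    comp = sum(m * bw * ba for m, bw, ba in zip(macs, b_w, b_a))    # bit-ops
    lat = sum(layer_latency)
    return mem, comp, lat
```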
Optionally, the performing quantization compression on the vertex features in the graph data by using the action sequence includes:
truncating the vertex features of each graph vertex in the graph data into the range $[-c, c]$ $(c > 0)$, and performing quantization compression on the truncated vertex features using the interval quantization bit width corresponding to the degree of the graph vertex in the action sequence:

$$\operatorname{quantize}(x_j, b, c) = \operatorname{round}\!\left( \frac{\operatorname{clip}(x_j, c)}{s} \right) \cdot s$$

where $\operatorname{quantize}(\cdot)$ denotes the quantization compression function, $\operatorname{round}(\cdot)$ denotes a rounding function, $\operatorname{clip}(x, c)$ denotes a truncation function that truncates $x$ into $[-c, c]$, $x$ denotes the vertex feature, $x_j$ denotes the $j$-th component of the vertex feature, $s$ denotes a scaling factor, $s = c / (2^{b-1} - 1)$, and $b$ denotes the interval quantization bit width in the action sequence corresponding to the degree of the graph vertex.
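A minimal sketch of this quantization function, assuming NumPy arrays for vertex features:

```python
import numpy as np

def quantize(x: np.ndarray, b: int, c: float) -> np.ndarray:
    """Linear quantization of a vertex-feature vector x to b bits,
    after clipping into [-c, c]; mirrors the formula above."""
    s = c / (2 ** (b - 1) - 1)          # scaling factor
    return np.round(np.clip(x, -c, c) / s) * s
```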
Optionally, before performing quantization compression on vertex features in the graph data by using the action sequence, the method further includes:
the c value is determined by:
Figure DEST_PATH_IMAGE044
wherein,
Figure 100002_DEST_PATH_IMAGE045
the function is used to select the value of x such that
Figure DEST_PATH_IMAGE046
At the minimum, the temperature of the mixture is controlled,
Figure 100002_DEST_PATH_IMAGE047
to represent
Figure DEST_PATH_IMAGE048
Characteristic distribution of
Figure 100002_DEST_PATH_IMAGE049
KL divergence between feature distributions of (a); the characteristic distribution is a maximum value, a minimum value, a mean value, a variance, a sharpness or a kurtosis.
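For illustration, the threshold c could be searched by comparing histogram-based feature distributions, reusing the quantize sketch above; the bin and grid counts are assumed values:

```python
import numpy as np
from scipy.stats import entropy  # entropy(p, q) computes KL(p || q)

def find_clip_threshold(X: np.ndarray, b: int, n_bins=128, n_grid=50):
    """Grid-search the clipping threshold c minimizing the KL divergence
    between the histograms of X and quantize(X, b, c)."""
    hi = float(np.max(np.abs(X)))
    bins = np.linspace(-hi, hi, n_bins + 1)
    p, _ = np.histogram(X, bins=bins, density=True)
    best_c, best_kl = hi, np.inf
    for c in np.linspace(hi / n_grid, hi, n_grid):
        q, _ = np.histogram(quantize(X, b, c), bins=bins, density=True)
        kl = entropy(p + 1e-10, q + 1e-10)
        if kl < best_kl:
            best_c, best_kl = c, kl
    return best_c
```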
Optionally, the actor module includes an online actor network and a target actor network, the critic module includes an online critic network and a target critic network, and the initializing the agent used for reinforcement learning includes:
initializing the online actor network parameters of the online actor network, and setting the target actor network parameters of the target actor network to the same values as the online actor network parameters;
initializing the online critic network parameters of the online critic network, and setting the target critic network parameters of the target critic network to the same values as the online critic network parameters.
Optionally, the training the actor module and the critic module using the transformation data comprises:
adding the conversion data to an experience replay pool, and randomly sampling a preset number of pieces of conversion data from the experience replay pool as training data;
determining a first gradient of the online critic network parameters using the training data, the target actor network, the target critic network, the online critic network, and the following loss function:

$$\mathrm{Loss} = \frac{1}{N_s} \sum_{i} \Bigl( y_i - Q\bigl(O_i, a_i \mid \theta^{Q}\bigr) \Bigr)^{2}$$

where $\mathrm{Loss}$ denotes the loss function, $a_i$ denotes the action corresponding to the $i$-th time step, $O_i$ denotes the historical state vector corresponding to the $i$-th time step, $Q$ denotes the online critic network, $\theta^{Q}$ denotes the online critic network parameters, and $N_s$ denotes the preset number; $y_i$ denotes the estimate of the target critic network,

$$y_i = r_i + \gamma\, Q'\Bigl(O_{i+1}, \mu'\bigl(O_{i+1} \mid \theta^{\mu'}\bigr) \Bigm| \theta^{Q'}\Bigr)$$

where $r_i$ denotes the reward value corresponding to the $i$-th time step, $\gamma$ denotes a preset discount factor, $Q'$ denotes the target critic network, $\theta^{Q'}$ denotes the target critic network parameters, $\mu'$ denotes the target actor network, $\theta^{\mu'}$ denotes the target actor network parameters, and $O_{i+1}$ denotes the current state vector corresponding to the $i$-th time step;
updating the online critic network parameters according to the first gradient;
determining a performance objective using the training data, the updated online critic network, the online actor network, and the following objective function, and determining a second gradient of the performance objective with respect to the online actor network parameters:

$$J = \mathbb{E}_{O \sim \rho^{\beta}}\Bigl[ Q\bigl(O, \mu(O \mid \theta^{\mu}) \mid \theta^{Q}\bigr) \Bigr]$$

where $\mathbb{E}_{O \sim \rho^{\beta}}[\cdot]$ denotes the expected value of $Q\bigl(O, \mu(O \mid \theta^{\mu}) \mid \theta^{Q}\bigr)$ when the environment state $O$ obeys the distribution function $\rho^{\beta}$, $\theta^{\mu}$ denotes the online actor network parameters, and $\nabla_{\theta^{\mu}} J$ denotes the second gradient;
updating the online actor network parameters based on the second gradient;
updating the target critic network parameters and the target actor network parameters with the updated online critic network parameters and the updated online actor network parameters as follows:

$$\theta^{Q'} \leftarrow \tau\, \theta^{Q} + (1 - \tau)\, \theta^{Q'}, \qquad \theta^{\mu'} \leftarrow \tau\, \theta^{\mu} + (1 - \tau)\, \theta^{\mu'}$$

where $\tau$ is a preset value.
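A condensed sketch of one such update step in PyTorch, assuming critic networks that take (state, action) pairs; the hyperparameters are illustrative defaults, not values from the patent:

```python
import torch
import torch.nn.functional as F

def ddpg_update(actor, critic, actor_t, critic_t,
                actor_opt, critic_opt, batch, gamma=0.99, tau=0.01):
    """One DDPG update over a sampled batch of transitions (O, a, r, O_next).
    actor/critic and their target copies actor_t/critic_t are torch Modules;
    gamma and tau correspond to the discount factor and the preset
    soft-update value above."""
    O, a, r, O_next = batch

    # Critic step: minimize (y_i - Q(O_i, a_i))^2 with targets held fixed.
    with torch.no_grad():
        y = r + gamma * critic_t(O_next, actor_t(O_next))
    critic_loss = F.mse_loss(critic(O, a), y)
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    # Actor step: ascend J = E[Q(O, mu(O))] by descending its negation.
    actor_loss = -critic(O, actor(O)).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()

    # Soft-update the target networks: theta' <- tau*theta + (1-tau)*theta'.
    for t_net, net in ((critic_t, critic), (actor_t, actor)):
        for tp, p in zip(t_net.parameters(), net.parameters()):
            tp.data.mul_(1 - tau).add_(tau * p.data)
```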
The invention also provides a graph neural network compression device, comprising:
the acquisition module is used for acquiring the trained graph neural network and graph data used in the training process;
the interval determining module is used for determining degree distribution ranges corresponding to all graph vertexes in the graph data and dividing the degree distribution ranges into a plurality of degree intervals;
a quantization bit width determining module, configured to determine, by using a reinforcement learning and a hardware accelerator, an optimal interval quantization bit width corresponding to each degree interval and an optimal network quantization bit width corresponding to the graph neural network under a constraint of a preset resource restriction condition;
and the quantization compression module is used for performing quantization compression on the vertex characteristics of the graph vertex corresponding to the degree in the graph data by using the optimal interval quantization bit width and performing quantization compression on the graph neural network by using the optimal network quantization bit width to obtain optimal quantization graph data and an optimal quantization graph neural network.
The present invention also provides an electronic device, comprising:
a memory for storing a computer program;
a processor for implementing the graph neural network compression method as described above when executing the computer program.
The present invention also provides a computer-readable storage medium having stored thereon computer-executable instructions that, when loaded and executed by a processor, implement the graph neural network compression method as described above.
The invention provides a graph neural network compression method, which comprises the following steps: acquiring a trained graph neural network and graph data used in training the graph neural network; determining degree distribution ranges corresponding to all graph vertexes in the graph data, and dividing the degree distribution ranges into a plurality of degree intervals; under the constraint of a preset resource limiting condition, determining an optimal interval quantization bit width corresponding to each degree interval and an optimal network quantization bit width corresponding to the graph neural network by using a reinforcement learning and hardware accelerator; and quantizing and compressing the vertex characteristics of the graph vertex corresponding to the degree in the graph data by using the optimal interval quantization bit width, and quantizing and compressing the graph neural network by using the optimal network quantization bit width to obtain optimal quantization graph data and an optimal quantization graph neural network.
It can be seen that, when the trained graph neural network and the graph data used in its training are obtained, the degree distribution range corresponding to all graph vertices in the graph data is counted first, and this range is divided into a plurality of degree intervals; subsequently, under the constraint of a preset resource limitation condition, reinforcement learning and a hardware accelerator are used to determine the optimal interval quantization bit width corresponding to each degree interval and the optimal network quantization bit width corresponding to the graph neural network, and these two kinds of quantization bit widths are used to quantize and compress the vertex features of the graph data and the graph neural network. Reinforcement learning can automatically search the optimal quantization bit width allocation strategy corresponding to each degree interval and the graph neural network according to the feedback of the hardware accelerator, i.e., the automatic search of the optimal interval quantization bit width and the optimal network quantization bit width can be realized. Meanwhile, the automatic search of reinforcement learning is limited by the preset resource limitation condition, i.e., the finally obtained optimal interval quantization bit width and optimal network quantization bit width are guaranteed to be suitable for resource-limited devices. Finally, the degree distribution range of the graph vertices is divided into a plurality of degree intervals, and a corresponding optimal interval quantization bit width is determined for each interval, i.e., the vertex features of graph vertices with different degrees can be quantized and compressed to different extents, which effectively avoids the high accuracy loss easily caused by the simple but aggressive uniform quantization of all data in existing schemes. In brief, because the invention adopts reinforcement learning to determine the optimal quantization bit widths for the graph neural network and the graph data used in its training, the quantization bit widths can be determined automatically, and the trade-off between performance and network model accuracy can be balanced effectively, so that the finally obtained quantization graph data and quantization graph neural network not only have high accuracy but are also suitable for resource-limited devices. The invention also provides a graph neural network compression apparatus, an electronic device, and a computer-readable storage medium, which have the above beneficial effects.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the embodiments or the prior art descriptions will be briefly described below, it is obvious that the drawings in the following description are only embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.
FIG. 1 is a flowchart of a graph neural network compression method according to an embodiment of the present invention;
FIG. 2 is a diagram illustrating an exemplary architecture of a graph neural network according to an embodiment of the present invention;
FIG. 3 is a block diagram of a graph neural network compression system according to an embodiment of the present invention;
FIG. 4 is a block diagram of a graph neural network compression apparatus according to an embodiment of the present invention;
fig. 5 is a block diagram of an electronic device according to an embodiment of the present invention;
fig. 6 is a block diagram of a computer-readable storage medium according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, belong to the protection scope of the present invention.
In order to quantize and compress the graph neural network model more effectively, so that the compressed quantization graph neural network has high accuracy and a low resource consumption rate, embodiments of the present invention provide a graph neural network compression method, which can automatically determine the optimal quantization bit widths for the graph neural network and the graph data by using reinforcement learning under the constraint of a preset resource limitation condition. Referring to fig. 1, fig. 1 is a flowchart of a graph neural network compression method according to an embodiment of the present invention, where the method includes:
s100, obtaining the trained graph neural network and graph data used in training.
It should be noted that the graph neural network obtained in this step is an original, full-precision graph neural network, and the graph data is the training data of the graph neural network. Parameters such as the weights and convolution kernels contained in the graph neural network, as well as the graph data, are floating-point data, mostly represented as FP32. Floating-point data is highly accurate, but correspondingly the memory space required to store it is large. The invention aims to find suitable quantization bit widths for the weights, convolution kernel parameters, and the like of each layer of the graph neural network, and for the graph data, on the premise of preserving the inference accuracy of the graph neural network model, so as to reduce the storage space requirement. The quantization bit width here is generally a low-precision integer type, such as int4 or int8.
For ease of understanding, the graph data and the graph neural network will first be briefly described. Graph data is the basic input of a graph neural network. Consider a graph $G = (V, E)$ with $n$ vertices and $m$ edges, i.e., $|V| = n$ and $|E| = m$, with an average vertex degree of $d = m/n$. Connectivity in the graph is represented by an adjacency matrix $A \in \{0, 1\}^{n \times n}$, whose element $A_{ij} = 1$ indicates that vertices $v_i$ and $v_j$ are adjacent, and $A_{ij} = 0$ indicates no adjacency. The degree matrix $D$ is a diagonal matrix whose $n$ main-diagonal elements are the degrees of the $n$ vertices; all remaining elements are zero. Each vertex $v_i$ has a feature vector of length $f$, and the feature vectors of all graph vertices form a feature matrix $X \in \mathbb{R}^{n \times f}$. In the embodiment of the invention, the part of the graph data to be compressed is the feature matrix $X$ formed by the feature vectors of all graph vertices; this matrix is floating-point data.
Further, a graph neural network is a special neural network that can handle irregularly structured data. Although the structure of a graph neural network can be designed following different guidelines, almost all graph neural networks can be interpreted as performing message passing on vertex features, followed by feature transformation and activation. FIG. 2 illustrates the structure of a typical graph neural network: it consists of an input layer, $L$ graph convolution layers, and an output layer. The input layer is responsible for reading the adjacency matrix $A$ or adjacency list AdjList that characterizes the graph topology, together with the vertex feature matrix $X = H^{(0)}$. The graph convolution layers are responsible for extracting vertex features: each graph convolution layer $l \in \{1, \dots, L\}$ reads in the adjacency matrix $A$ or adjacency list AdjList and the vertex feature matrix $H^{(l-1)}$, and outputs a new vertex feature matrix $H^{(l)}$ through a graph convolution operation and a nonlinear transformation. The output layer can be freely configured for different tasks; for example, vertex classification can be realized through a softmax function. Typically, in a graph neural network consisting of $L$ graph convolution layers, the graph convolution operation of the $l$-th layer can generally be written in the form:

$$H^{(l)} = \sigma\!\left( \sum_{s=1}^{S} F_s^{(l)}\, H^{(l-1)}\, W_s^{(l)} \right)$$

where $F_s^{(l)}$ denotes the graph convolution kernel defined by the message passing operator, $\sigma$ denotes a nonlinear activation function, $W_s^{(l)}$ is the learnable linear weight matrix corresponding to the $s$-th convolution kernel of the $l$-th layer, and $d_{l-1}$ denotes the vertex feature dimension of the $l$-th graph convolution layer's input. Within this general framework, the main difference among graph neural networks lies in the choice of the graph convolution kernel $F_s^{(l)}$. Whether the vertex feature matrix $X$, the graph convolution kernel $F$, or the weight $W$, they are typically floating-point data. It should be noted that only the graph convolution layers have convolution kernels and activations; the input and output layers have weights only.
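For illustration, one layer of this general form might be sketched as follows, with the simplifying assumption that every convolution kernel F_s equals a single pre-normalized operator A_hat and the nonlinearity is ReLU; real GNN variants differ precisely in the choice of kernel:

```python
import numpy as np

def gcn_layer(A_hat: np.ndarray, H: np.ndarray, Ws: list) -> np.ndarray:
    """One graph convolution layer of the general form
    H_out = sigma(sum_s F_s @ H @ W_s), under the assumptions above."""
    Z = sum(A_hat @ H @ W for W in Ws)
    return np.maximum(Z, 0.0)  # ReLU as the nonlinear activation sigma
```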
It should be noted that the embodiments of the present invention are not limited to specific graph neural networks and graph data. As described above, the structure of the graph neural network can be designed following different guidelines; meanwhile, it can be understood that the specific content of the graph data, even the complexity thereof, may be different for different tasks, and thus the specific graph neural network and the graph data may be selected according to the actual application requirements. The compression method provided by the embodiment of the invention is suitable for various graph neural networks because the embodiment of the invention adopts reinforcement learning to determine the optimal quantization bit width corresponding to the graph neural networks and the graph data, and the reinforcement learning has stronger adaptability to various environments.
S200, determining the degree distribution range corresponding to all graph vertexes in the graph data, and dividing the degree distribution range into a plurality of degree intervals.
In the prior art, quantization compression of the vertex features of each graph vertex in graph data is generally performed with a uniform quantization bit width. Although this effectively reduces the complexity and storage scale of the graph data, such indiscriminate quantization brings significant accuracy loss to the graph neural network model. Therefore, in the embodiment of the present invention, graph vertices with different degrees may be compressed with different quantization bit widths, so as to alleviate the accuracy loss of the graph neural network model caused by quantizing the graph data. Specifically, in graph neural network computation, vertices with higher degrees typically gather richer information from neighboring vertices, which makes them more robust to low quantization bit widths, since the random error of quantization can usually be averaged toward 0 through a large number of aggregation operations. In particular, given a quantization bit width $q$, the quantization error $\varepsilon_i$ of a vertex $v_i$ is a random variable that follows a uniform distribution. For a vertex with a larger degree, the aggregation over $v_i$ and its adjacent vertices $u \in N(v_i)$ accumulates a large number of error terms $\varepsilon_i$ and $\varepsilon_u$, and the averaged result converges to 0 according to the law of large numbers. Thus, vertices with large degrees are more robust to quantization error: smaller quantization bit widths may be used for high-degree vertices, while larger quantization bit widths may be used for low-degree vertices.
Further, since the vertex degrees of real-world graphs mostly follow a power-law distribution, assigning a quantization bit width to every distinct vertex degree would cause a state-space explosion. For example, even for the modest graph dataset com-LiveJournal, a significant portion of the vertex degrees are spread between 1 and $10^4$. If the quantization space is 8, the state space would reach an astonishing $8^{10000}$. Obviously, such a huge state space cannot meet application requirements. Therefore, to reduce the complexity of the state space, the embodiment of the present invention may first count the degree of each graph vertex in the graph data to obtain the degree distribution range corresponding to the graph data, and then divide this range into a plurality of degree intervals, determining an optimal interval quantization bit width for each interval. This greatly reduces the size of the state space and eases the search for the optimal quantization bit widths. From the above description, the distribution rule of the resulting optimal interval quantization bit widths should be: the larger the degree values covered by a degree interval, the smaller the corresponding optimal interval quantization bit width. It should be noted that the embodiment of the present invention does not limit the method for dividing the degree distribution range; for example, the range may be divided equally, or divided according to the distribution of graph vertices within the range, e.g., ensuring that the number of graph vertices corresponding to each degree interval is the same or close. To further reduce accuracy loss, in the embodiment of the present invention, the degree distribution range may be divided according to the distribution of the graph vertices within the range, so as to ensure that each interval contains the same number of graph vertices.
In one possible case, determining a degree distribution range corresponding to all graph vertices in the graph data, and dividing the degree distribution range into a plurality of degree intervals may include:
step S201, arranging all graph vertexes in graph data from small to large according to degrees to obtain a graph vertex sequence;
step S202, dividing the degree distribution range by using the graph vertex sequence to obtain a plurality of degree intervals; the number of graph vertices contained in each degree interval is the same, or the difference is smaller than a preset threshold value.
It should be noted that the embodiment of the present invention does not limit the specific value of the preset threshold, which may be set according to actual application requirements. In order to reduce the data difference between the graph vertices within a degree interval, the value of the preset threshold should be as small as possible. Specifically, for graph data $G = (V, E)$, the vertex degree distribution may first be counted, and all vertices in the graph $G$ sorted by degree from small to large. A list of vertex-degree segmentation points $d_0 < d_1 < \dots < d_k$ is then found in the sequence to divide all vertices into $k$ intervals

$$[d_0, d_1), [d_1, d_2), \dots, [d_{k-1}, d_k]$$

such that the number of vertices falling in each interval is equal or close, where $d_0 = d_{\min}$ and $d_k = d_{\max}$ respectively denote the minimum and maximum of all vertex degrees in the graph data. On this basis, a vertex degree-to-quantization-bit-width allocation table is established, and graph vertices in the same interval are assigned the same quantization bit width: if a vertex degree falls within the $j$-th interval, the vertex is allocated the bit width $b_j$.
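A minimal sketch of this equal-frequency interval split, assuming a NumPy array of vertex degrees:

```python
import numpy as np

def degree_intervals(degrees: np.ndarray, k: int) -> list:
    """Return segmentation points d_0 <= d_1 <= ... <= d_k such that each of
    the k intervals holds (roughly) the same number of vertices. For very
    skewed degree data, duplicate boundaries can occur and would need
    merging in practice."""
    sorted_deg = np.sort(degrees)
    chunks = np.array_split(sorted_deg, k)
    return [int(sorted_deg[0])] + [int(c[-1]) for c in chunks]
```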
S300, under the constraint of a preset resource limiting condition, determining the optimal interval quantization bit width corresponding to each degree interval and the optimal network quantization bit width corresponding to the graph neural network by using reinforcement learning and a hardware accelerator.
After the division of the degree intervals is completed, the embodiment of the invention determines the optimal interval quantization bit width corresponding to each degree interval and the optimal network quantization bit width corresponding to the graph neural network model by using reinforcement learning and a hardware accelerator under the constraint of a preset resource limiting condition. It should be noted here that the optimal network quantization bit width corresponding to the graph neural network specifically refers to an optimal network quantization bit width corresponding to a graph convolution kernel matrix, a weight matrix, and an activation matrix of the graph neural network, and the optimal network quantization bit widths corresponding to the three matrices may be the same or different; in addition, the optimal network quantization bit widths corresponding to the convolution kernel matrix, the weight matrix and the activation matrix of each network layer of the graph neural network can be the same or different, and can be selected according to the practical application requirements, wherein the input layer and the output layer do not have the convolution kernel matrix and the activation matrix, and the convolution layer has the convolution kernel matrix and the activation matrix. It can be understood that although different optimal network quantization bit widths can bring higher network model accuracy, the search computation amount of the optimal network quantization bit width is easily increased, and therefore, the setting of the optimal network quantization bit widths of the three matrices can be set as needed after balancing the network model accuracy and the search computation amount. Of course, it should be noted that each network layer in the graph neural network does not have the graph convolution kernel matrix, the weight matrix, and the activation matrix, for example, the convolution layer has three matrices, but the input layer and the output layer do not have the graph convolution kernel matrix and the activation matrix. Therefore, when setting the network quantization bit width for the graph neural network, the setting can be further performed according to the specific structure of the graph neural network.
Further, the preset resource limitation condition is used to limit the computation resources consumed for processing the quantized graph data and the quantized graph neural network (such as training, executing a specified task, and the like), which is because the graph neural network consumes more computation resources, and if the quantized compression is performed arbitrarily without considering a specific hardware framework, the quantized graph data and the quantized graph neural network obtained finally may have a larger processing computation amount, a larger memory occupation amount, and a longer processing delay, and are not favorable for deployment and application. Therefore, the embodiment of the invention limits the reinforcement learning by adopting the preset resource limiting condition. It should be noted that the embodiment of the present invention does not limit the specific preset resource limiting conditions, and may include a calculated amount threshold, a memory occupied amount threshold, and a delay amount threshold, for example, and each threshold is provided with a corresponding calculation formula for calculating the calculated amount, the memory occupied amount, and the delay amount corresponding to the quantized graph data and the quantized graph neural network. It can be understood that the calculated amount, the memory occupied amount and the delay amount corresponding to the quantized graph data and the quantized graph neural network should be less than or equal to the corresponding calculated amount threshold value, the memory occupied amount threshold value and the delay amount threshold value. The threshold and the corresponding formula are determined by direct feedback of a hardware accelerator, wherein the hardware accelerator is used for verifying the quantization effect of the graph data and the graph neural network, such as verifying the consumption of the quantization compression network on the computing resources and the corresponding accuracy of the network when executing the specified task. It should be noted that, the embodiment of the present invention does not limit the specific calculated amount threshold, the memory occupied amount threshold, and the delay amount threshold, nor the specific corresponding calculation formula of the above thresholds, and may be set according to the actual application requirements, or refer to the description in the following embodiments. The embodiment of the present invention also does not limit the specific structure of the hardware accelerator, for example, the time sequence structure of the hardware accelerator may be a reconfigurable Bit-Serial Matrix Multiplication Overlay (BISMO), and the space structure may be a BitFusion architecture. A preferred hardware accelerator configuration can be found in the table below.
TABLE 1 Configuration of hardware accelerators

Hardware accelerator model | Batch size | PE array | AXI port | Block RAM
Zynq-7020                  | 1          | 8×8      | 4×64 b   | 140×36 Kb
F37X                       | 16         | 16×16    | 4×256 b  | 2160×36 Kb
Further, reinforcement learning is one of the paradigms and methodologies of machine learning; it describes and solves the problem of an agent learning strategies to maximize a return or achieve a specific goal during its interaction with the environment. The problem to be solved by reinforcement learning is: let the agent learn how to perform actions in an environment so as to obtain the maximum total reward. This reward value is typically associated with a task goal defined for the agent. The main things the agent learns are: first, the action policy; second, planning. The learning goal of the action policy is the optimal policy, i.e., the policy under which the agent's behavior in a specific environment obtains the maximum reward value, thereby achieving the agent's task goal. Actions can be simply divided into: (1) continuous, such as the steering wheel angle, throttle, and brake control signals in racing games, or the joint servo motor control signals of robots; and (2) discrete, such as in Go or the Snake game.
Embodiments of the present invention specifically use a reinforcement learning method based on both value and policy, also referred to as the Actor-Critic method. The Actor-Critic method combines the advantages of value-based and policy-based methods: it improves sampling efficiency by learning a Q-value function or state-value function V with a value-based method (the part handled by the critic), and learns a policy function with a policy-based method (the part handled by the actor), making it applicable to continuous or high-dimensional action spaces. The Actor-Critic method can be regarded as an extension of value-based methods to continuous action spaces, and also as an improvement of policy-based methods in terms of reducing sample variance and improving sampling efficiency.
Specifically, referring to fig. 3, fig. 3 is a block diagram of a graph neural network compression system according to an embodiment of the present invention, where the system includes four parts: a DDPG (Deep Deterministic Policy Gradient) agent based on the Actor-Critic framework, the policy, the quantization implementation, and the hardware accelerator. According to the current environment state O, and on the premise of satisfying the hardware accelerator resource constraint (i.e., the preset resource limitation condition), the DDPG agent gives an action according to a specific policy: appropriate quantization bit widths are assigned to the features of the vertices of each degree interval and to the graph convolution kernels (if any), weights, and activations (if any) of all layers of the graph neural network. The host computer quantizes the trained floating-point graph neural network model and graph data according to the quantization bit width allocation scheme provided by the DDPG agent, obtaining a quantization graph neural network model and quantization graph data. Then, the quantized data and the quantized network are mapped or distributed to the hardware accelerator; the hardware accelerator trains the quantization graph neural network with the quantization graph data, executes a specified task with the trained quantization graph neural network, and feeds the accuracy difference between the quantization graph neural network and the original graph neural network back to the DDPG agent as a reward. The DDPG agent adjusts its policy according to the information fed back by the environment and outputs a new action, until an optimal policy is obtained. Of course, the system also involves other workflows; to avoid redundancy, please refer to the description in the following embodiments for the specific workflow of the system.
S400, quantizing and compressing the vertex characteristics of the graph vertex corresponding to the degree in the graph data by using the optimal interval quantization bit width, and quantizing and compressing the graph neural network by using the optimal network quantization bit width to obtain the optimal quantization graph data and the optimal quantization graph neural network.
After the optimal interval quantization bit width and the optimal network quantization bit width are obtained, the vertex features of each graph vertex in the corresponding graph data and the graph neural network can be quantized and compressed to obtain the optimal quantization graph data and the optimal quantization graph neural network. The embodiment of the present invention does not limit the specific steps of the quantitative compression, and may be set according to the actual application requirements, or refer to the description in the following embodiments. It should be noted that although embodiments of the present invention have endeavored to improve the accuracy of the optimal quantization map neural network, quantization compression itself may negatively impact the accuracy with which the optimal quantization map neural network performs the specified task. In this regard, after the quantization compression is finished, the optimal quantization map data is used again to train the quantization map neural network so as to recover the accuracy of the optimal quantization map neural network in executing the designated task, so that the finally obtained fine tuning quantization map neural network is deployed to external service equipment for external service.
In one possible case, after obtaining the optimal quantization map data and the optimal quantization map neural network, the method may further include:
S500, training the optimal quantization graph neural network by using the optimal quantization graph data to obtain a fine-tuned quantization graph neural network, and deploying the fine-tuned quantization graph neural network into external service equipment.
It should be noted that, the embodiment of the present invention does not limit the training process of the optimal quantization graph neural network, and reference may be made to the related art of the graph neural network.
Based on the above embodiment, when the trained graph neural network and the graph data used in its training are obtained, the degree distribution range covering all graph vertices in the graph data is first counted and divided into several degree intervals; then, under the constraint of the preset resource limitation condition, the optimal interval quantization bit width corresponding to each degree interval and the optimal network quantization bit width corresponding to the graph neural network are determined using reinforcement learning and the hardware accelerator, and the vertex features of each graph vertex in the graph data and the graph neural network are quantization-compressed with these two kinds of quantization bit widths. Reinforcement learning can automatically search for the optimal quantization bit width allocation strategy corresponding to each degree interval and the graph neural network according to the feedback of the hardware accelerator, i.e., the automatic search of the optimal interval quantization bit width and the optimal network quantization bit width can be realized; meanwhile, the automatic search actions of reinforcement learning are limited by the preset resource limitation condition, which ensures that the finally obtained optimal interval quantization bit width and optimal network quantization bit width are suitable for resource-limited devices; finally, since the degree distribution range of the graph vertices is divided into several degree intervals and a corresponding optimal interval quantization bit width is determined for each interval, vertex features of graph vertices with different degrees can be quantization-compressed to different extents, which effectively avoids the severe precision loss easily caused by existing schemes that simply apply a single, uniform quantization to all data in advance. In brief, because the invention adopts reinforcement learning to determine optimal quantization bit widths for the graph neural network and the graph data used in its training, the quantization bit widths can be determined automatically and the trade-off between performance and network model precision can be balanced effectively, so that the finally obtained quantized graph data and quantized graph neural network not only have high precision but are also suitable for resource-limited devices.
Based on the above embodiments, the specific workflow of the graph neural network compression system is described below. For ease of understanding, the action sequence, strategy, time step, reward value and conversion data that appear hereinafter are described first. The action sequence is used for storing the interval quantization bit width corresponding to each degree interval and the network quantization bit width corresponding to the graph neural network; for example, for given graph data G = (V, E), the vertex degree distribution range is counted first and divided into k intervals according to a certain strategy, and for the k degree intervals and the three matrices of the graph neural network, the length of the action sequence may then be k + 3. The process of determining one complete action sequence is called one strategy pass (episode), which contains N time steps (steps), where the value of N equals the length of the action sequence. It should be noted that the action sequence is updated once per time step, so one episode typically produces N different action sequences. Further, it can be understood that the action sequences can be used for quantization compression; since the previous action sequence differs from the next one, the corresponding compression effects also differ. In other words, the resource consumption (such as memory footprint and computation amount) of the quantized graph data and quantized graph neural network generated with the two action sequences is different, and so is the accuracy of executing the specified task. Therefore, in the embodiment of the present invention, state vectors may be used to record the changes in resource consumption and accuracy: specifically, for the quantized graph data and quantized graph neural network compressed by the previous action sequence, the historical state vector records the memory footprint, the computation amount and the accuracy of executing the specified task, while for those compressed by the next action sequence, the current state vector records the same quantities. Further, the reward value can be determined from the reference accuracy of the original graph neural network executing the specified task and the accuracy of the quantized graph neural network executing the same task, where the reference accuracy specifically refers to the inference accuracy of the original graph neural network after it has been trained with the original graph data, such as the classification accuracy in a classification task. Thereafter, the historical state vector, the action sequence, the reward value and the current state vector corresponding to each time step constitute one conversion datum (transition); obviously, this datum contains the action, the reward and the state transition of the quantization compression, through which the agent can perceive the execution effect of its actions. In other words, the agent may be trained with the conversion data to update the strategy it employs in determining actions; a sketch of these data structures follows.
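For illustration only, the action sequence and conversion data described above can be sketched as Python data structures; the field names and example values are ours, not part of the embodiment:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Transition:
    """One conversion datum: (historical state, action sequence, reward, current state)."""
    prev_state: List[float]   # historical state vector (acc, store, comp)
    action: List[int]         # quantization bit widths, length k + 3
    reward: float             # derived from the accelerator's accuracy feedback
    next_state: List[float]   # current state vector (acc, store, comp)

# Example: k = 4 degree intervals plus three network-level bit widths
# (graph convolution kernels, weights, activations).
k = 4
action_sequence = [8, 8, 6, 4, 8, 8, 8]   # length k + 3
assert len(action_sequence) == k + 3
```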
Based on the above description, the specific workflow of the graph neural network compression system is described in detail below. In one possible case, under the constraint of a preset resource limitation condition, determining the optimal interval quantization bit width corresponding to each degree interval and the optimal network quantization bit width corresponding to the graph neural network by using reinforcement learning and a hardware accelerator may include:
S310, acquiring the reference accuracy corresponding to the graph neural network executing the specified task, and initializing an agent and a historical reward value used for reinforcement learning; the agent includes an actor module and a critic module.
It should be noted that the embodiment of the present invention does not limit the specific task performed by the graph neural network, which can be set according to actual application requirements. The accuracy with which the original graph neural network performs the task is set as the reference accuracy. The embodiment of the invention likewise does not limit the way the accuracy is calculated, which can also be set according to actual application requirements. In one possible case, for a multi-classification task, let the test graph dataset be $G_{test}$, where each vertex carries exactly one class label and there are $M$ class labels in total; let $r_m$ denote the ratio of the number of vertices with category label $m$ to the total number of vertices, with $\sum_{m=1}^{M} r_m = 1$. Considering each class in turn as the "positive class" and the rest as the "negative class", and borrowing the definition of the corresponding index in the classical binary classification problem, the classification accuracy of the multi-classification problem can be defined as:

$$acc = \sum_{m=1}^{M} r_m \cdot acc_m$$

where $acc_m$ is the binary classification accuracy obtained when class $m$ is treated as the positive class.
Further, in order to determine the optimal interval quantization bit width and the optimal network quantization bit width during the agent's search process, the embodiment of the present invention also specifically sets a historical reward value for recording the highest reward value that has occurred during the search. Whenever a new highest reward value appears, the embodiment of the present invention updates the historical reward value, the optimal interval quantization bit width and the optimal network quantization bit width. Of course, it can be understood that the historical reward value should also have an initial value, and the initialization here sets that initial value. The embodiment of the present invention does not limit the specific initial value of the historical reward value; it only needs to be as small as possible.
Further, the embodiment of the present invention does not limit the specific process of initializing the agent, where the initialization mainly initializes the parameters in the agent, and reference may be made to the related technology of the DDPG agent.
S320, setting the strategy count to 1, and initializing an action sequence and a historical state vector; the action sequence is used for storing the interval quantization bit width corresponding to each degree interval and the network quantization bit width corresponding to the graph neural network; the state vector is used for recording the memory footprint, the computation amount and the accuracy corresponding to the quantized graph neural network processing the quantized graph data.
Specifically, the action sequence may be represented as:

$$a = \left( b_1, b_2, \ldots, b_k, b_{gk}, b_w, b_{act} \right)$$

wherein $b_i$ ($1 \le i \le k$) denotes the quantization bit width allocated to the vertex features belonging to the $i$-th degree interval, and $b_{gk}$, $b_w$ and $b_{act}$ denote the quantization bit widths set for the graph convolution kernels (if any), the weights and the activations (if any) of all layers of the graph neural network, respectively. Of course, if different quantization bit widths are assigned to the graph convolution kernels (or weights, or activations) of different layers of the graph neural network, the length of the action sequence a of the DDPG agent becomes k + 3L + 2, where L represents the number of graph convolution layers, that is:

$$a = \left( b_1, \ldots, b_k, b_{gk}^{(1)}, b_w^{(1)}, b_{act}^{(1)}, \ldots \right)$$
Further, the state vector may be represented as:

$$O = (acc, store, comp)$$

where acc represents the accuracy, store represents the memory footprint, and comp represents the computation amount. The determination of the memory footprint and the computation amount can refer to the description in the subsequent embodiments.
S330, setting the time step to be 1, determining continuous actions by using an actor module under the constraint of a preset resource limiting condition, updating numerical values of the action sequence by using the continuous actions, and determining the memory occupation amount and the calculated amount corresponding to the action sequence after updating.
It can be understood that the actor module numerically updating the action sequence corresponds to the actor module giving an action based on the current state and strategy. Note that the actor module first determines a continuous action and then uses it to update the values of the action sequence. However, since quantization bit widths are usually discrete values (for example, the conventional quantization bit widths are 2, 4, 8, 16 and 32 bits), after the continuous action is obtained it must first be discretized into a discrete action value, and the action sequence is then updated with that value. This process is described in detail below.
In a possible case, under the constraint of a preset resource limitation condition, determining continuous actions by using an actor module, performing numerical update on an action sequence by using the continuous actions, and determining the memory occupation amount and the calculated amount corresponding to the action sequence after the update, the method comprises the following steps:
step S331, selecting a continuous action with the actor module according to a Behavior strategy, and discretizing the continuous action as follows to obtain a discrete action value:

$$\hat{a}_i^{(t)} = \arg\min_{q \in Q} \left| q - \mathrm{round}\!\left( b_{\min} + a_i^{(t)} \cdot \left( b_{\max} - b_{\min} \right) \right) \right|$$

wherein $a_i^{(t)}$ denotes the continuous action corresponding to the i-th quantization bit width in the action sequence at the t-th time step, $\hat{a}_i^{(t)}$ denotes the discrete action value corresponding to $a_i^{(t)}$, $Q$ comprises a plurality of preset quantization bit width values, $\mathrm{round}(\cdot)$ denotes the rounding function, $b_{\min}$ and $b_{\max}$ denote the preset minimum and maximum quantization bit widths, and the $\arg\min$ function is used to select the target preset quantization bit width value $q$ in $Q$ that minimizes $\left| q - \mathrm{round}\!\left( b_{\min} + a_i^{(t)} (b_{\max} - b_{\min}) \right) \right|$;
step S332, performing a numerical update on the action sequence using the discrete action values, determining the memory footprint, the computation amount and the delay amount corresponding to the updated action sequence, and judging whether they satisfy the preset resource limitation condition; if yes, proceeding to step S333; otherwise, proceeding to step S334;

step S333, performing quantization compression on the vertex features in the graph data and on the graph neural network by using the action sequence;

step S334, sequentially reducing the quantization bit widths in the action sequence according to a preset order so as to update the action sequence again, and, each time a reduction is completed, re-determining the memory footprint, the computation amount and the delay amount corresponding to the updated action sequence.
Specifically, for an action sequence of length k + 3, at the t-th time step the DDPG agent takes a continuous action $a^{(t)} = \left( a_1^{(t)}, \ldots, a_{k+3}^{(t)} \right)$ satisfying $a_i^{(t)} \in [0, 1]$, and each component is rounded by the formula $\mathrm{round}\!\left( b_{\min} + a_i^{(t)} \cdot (b_{\max} - b_{\min}) \right)$ to the bit width value in $Q$ nearest to it, i.e. the $\hat{a}_i^{(t)}$ that minimizes $\left| q - \mathrm{round}\!\left( b_{\min} + a_i^{(t)} (b_{\max} - b_{\min}) \right) \right|$, where $b_{\min} = 2$ and $b_{\max} = 32$. For example, when $\mathrm{round}\!\left( b_{\min} + a_i^{(t)} (b_{\max} - b_{\min}) \right) = 5$, calculation with the above formula shows that selecting q = 4 makes this difference smaller than for any other preset quantization bit width, so the corresponding $\hat{a}_i^{(t)}$ should be set to 4.
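A minimal Python sketch of this discretization, assuming the conventional preset bit-width set Q = {2, 4, 8, 16, 32} and the b_min = 2, b_max = 32 values given above (the function name is ours):

```python
def discretize(a_cont, q_set=(2, 4, 8, 16, 32), b_min=2, b_max=32):
    """Map a continuous action a_cont in [0, 1] to the nearest preset bit width.

    Implements round(b_min + a_cont * (b_max - b_min)) followed by an
    arg-min search over the preset bit-width set Q.
    """
    target = round(b_min + a_cont * (b_max - b_min))
    return min(q_set, key=lambda q: abs(q - target))

# Example: a continuous action of 0.1 rounds to 5, whose nearest preset width is 4.
print(discretize(0.1))  # -> 4
```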
Further, in practical applications, due to the limited computation budget (i.e., computation amount, latency and memory footprint), the embodiment of the present invention seeks the quantization bit width allocation scheme with the best inference performance under the given constraints, and encourages the agent to meet the computation budget by limiting the action space. Specifically, each time the agent issues an action $a^{(t)}$, the embodiment of the present invention estimates the amount of hardware resources that the quantized graph neural network will use. If the current allocation scheme exceeds the hardware accelerator resource budget, the bit widths of the vertices of each degree interval and of the graph convolution kernels (if any), weights and activations (if any) of all layers of the graph neural network are sequentially reduced until the hardware accelerator resource budget constraint is finally met; a sketch of this procedure follows. The bit width values may also be decreased in other orders, for example from large to small, which the embodiment of the present invention does not limit.
Further, the Behavior strategy $\beta$ is a random process generated from the current strategy of the actor module and random UO (Ornstein-Uhlenbeck) noise $\mathcal{N}_t$, and may specifically be as follows:
in one possible scenario, selecting a continuous action according to the Behavior policy with the actor module includes:
step S3311, selecting a continuous action according to the Behavior strategy by using the actor module as follows:

$$a^{(t)} = \mu\!\left( O^{(t)} \mid \theta^{\mu} \right) + \mathcal{N}_t$$

wherein $\mathcal{N}_t$ denotes the random UO noise corresponding to the t-th time step, $O^{(t)}$ denotes the historical state vector corresponding to the t-th time step, $\mu$ denotes the online actor network in the actor module, and $\theta^{\mu}$ denotes the online actor network parameters.
It should be noted here that a strategy of the actor module is concretely embodied by the model parameters of that module; in other words, a policy update of the actor module is in fact a parameter update of the module.
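For reference, a sketch of the Behavior strategy: an Ornstein-Uhlenbeck noise process added to the online actor's output. The theta and sigma values are common defaults, not values from the embodiment:

```python
import numpy as np

class OUNoise:
    """Ornstein-Uhlenbeck process for temporally correlated exploration noise."""
    def __init__(self, dim, theta=0.15, sigma=0.2, dt=1.0):
        self.theta, self.sigma, self.dt = theta, sigma, dt
        self.x = np.zeros(dim)

    def sample(self):
        # dx = -theta * x * dt + sigma * sqrt(dt) * N(0, 1)
        self.x += -self.theta * self.x * self.dt \
                  + self.sigma * np.sqrt(self.dt) * np.random.randn(*self.x.shape)
        return self.x

def behavior_action(actor, state, noise):
    """a_t = mu(O_t | theta_mu) + N_t, clipped to the valid action range [0, 1]."""
    return np.clip(actor(state) + noise.sample(), 0.0, 1.0)
```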
And S340, performing quantization compression on the vertex features and the graph neural network in the graph data by using the action sequence, and sending the obtained quantization graph data and the quantization graph neural network to the hardware accelerator, so that the hardware accelerator trains the quantization graph neural network by using the quantization graph data, and determines the current accuracy corresponding to the execution of the specified task by the trained quantization graph neural network.
S350, determining a current state vector by using the memory occupation amount, the calculated amount and the accuracy corresponding to the action sequence, and determining an award value by using the reference accuracy and the current accuracy;
Specifically, the reward value may be calculated as follows:

$$r = \lambda \cdot \left( acc_{quant} - acc_{origin} \right)$$

wherein $acc_{origin}$ is the reference accuracy of the original graph neural network after it has been trained with the original training set, $acc_{quant}$ is the accuracy of the trained quantization graph neural network, and $\lambda$ is a scaling factor, which may preferably take the value 0.1.
S360, when the reward value is determined to be larger than the historical reward value, updating the historical reward value by using the reward value, and updating the optimal interval quantization bit width and the optimal network quantization bit width by using the updated action sequence;
S370, generating conversion data by using the historical state vector, the continuous action, the reward value and the current state vector, and training the actor module and the critic module with the conversion data, so that the critic module updates the strategy used by the actor module when performing numerical updates;
it should be noted that, the embodiment of the present invention does not limit the specific process of training the actor module and the critic module, and reference may be made to the description in the following embodiments. The significance of the training is to update the model parameters of the actor module so that it can use new strategies to determine the next action.
S380, when the time step is determined not to reach the length of the action sequence, adding 1 to the time step, updating the historical state vector by using the current state vector, and entering the step of determining continuous action by using the actor module under the constraint of a preset resource limiting condition;
S390, when it is determined that the time step has reached the length of the action sequence but the strategy count has not reached a preset value, adding 1 to the strategy count and entering the step of initializing the action sequence and the historical state vector;

and S3100, outputting the optimal interval quantization bit width and the optimal network quantization bit width when the strategy count is determined to reach the preset value.

It should be noted that the embodiment of the present invention does not limit the specific preset value, which can be set according to actual application requirements. It can be understood that the larger the preset value, the more fully the agent perceives the environment and the more appropriate the resulting optimal interval quantization bit width and optimal network quantization bit width, but the longer the corresponding computation time and the larger the computation amount; the preset upper limit of the strategy count can therefore be set as needed after balancing precision against computational resources.
Based on the above embodiment, the manner of calculating the memory footprint, the computation amount and the delay amount is described below. Of course, considering that the thresholds and calculation formulas of these three quantities are determined by direct feedback from the hardware accelerator, the way the hardware accelerator processes the quantized graph data and the quantized graph neural network is also described. Specifically, the main processing the hardware accelerator performs on the quantized graph data and the quantized graph neural network is to train the latter with the former, and the training process may be optimized in various ways, for example with full-batch, mini-batch or one-example stochastic gradient descent (SGD) strategies. In the embodiment of the present invention, in order to improve training efficiency, the hardware accelerator may optimize the training process of the quantization graph neural network with the mini-batch stochastic gradient descent method.
In one possible scenario, the hardware accelerator training the quantization map neural network with quantization map data may include:
and S341, the hardware accelerator training the quantization graph neural network with the quantization graph data based on the mini-batch stochastic gradient descent method.
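A minimal PyTorch-style sketch of such a mini-batch SGD training loop; the batch layout and hyperparameters are assumptions, not part of the embodiment:

```python
import torch

def train_minibatch_sgd(model, batches, epochs=10, lr=0.01):
    """Mini-batch SGD over a quantized GNN; `batches` is assumed to yield
    (features, adjacency, labels) triples, one per small batch."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()
    for _ in range(epochs):
        for feats, adj, labels in batches:
            opt.zero_grad()
            loss = loss_fn(model(feats, adj), labels)  # forward pass on one batch
            loss.backward()                            # gradients from this batch only
            opt.step()                                 # stochastic update
    return model
```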
Based on the above training method, the calculation of the memory footprint, the computation amount and the delay amount is described next. In one possible case, determining the memory footprint, the computation amount and the delay amount corresponding to the updated action sequence includes:
S3321, calculating the memory footprint with the following formula:

$$store = \sum_{l=0}^{L} n \cdot d_l \cdot b_v \;+\; \sum_{l=1}^{L} d_{l-1} \cdot d_l \cdot b_w^{(l)} \;+\; \sum_{l=1}^{L} S \cdot b_{gk}^{(l)}$$

wherein store denotes the memory footprint, n denotes the number of graph vertices within a single mini-batch, $d_l$ denotes the vertex dimension value corresponding to the $l$-th network layer of the quantization graph neural network, $0 \le l \le L$, L denotes the number of all network layers of the quantization graph neural network, $b_v$ denotes the maximum value among the interval quantization bit widths allocated to all graph vertices within a single mini-batch, S denotes the total number of convolution kernels, and $b_w^{(l)}$ and $b_{gk}^{(l)}$ denote the network quantization bit widths corresponding to the weight matrix and the convolution kernels of each network layer of the quantization graph neural network, respectively;
S3322, calculating the computation amount with the following formula:

$$comp = \sum_{l=1}^{L} MAC_l \cdot b_w^{(l)} \cdot b_{act}^{(l)}$$

wherein comp denotes the computation amount, $b_{act}^{(l)}$ denotes the network quantization bit width corresponding to the activation matrix of each network layer of the quantization graph neural network, and $MAC_l$ denotes the total number of multiply-accumulate operations of the $l$-th layer of the quantization graph neural network;
S3323, calculating the delay amount with the following formula:

$$lat = \sum_{l=1}^{L} lat_l$$

wherein lat denotes the delay amount and $lat_l$ denotes the delay of the $l$-th network layer of the quantization graph neural network in processing a small batch of graph data.
It should be noted that after the memory footprint, the computation amount and the delay amount are obtained, corresponding thresholds may be used to judge whether the three quantities meet the requirements. Thresholds $store_{limit}$, $comp_{limit}$ and $lat_{limit}$ may be adopted to represent the memory footprint threshold, the computation threshold and the latency threshold, where $store_{limit}$ is the storage capacity that the hardware acceleration device can provide, $comp_{limit}$ represents the upper limit of the total number of bit operations the hardware accelerator can perform per second, and $lat_{limit}$ refers to the delay characteristic of the hardware accelerator. All three are determined by the characteristics of the hardware accelerator and can be obtained directly or through measurement.
Based on the above embodiments, the following describes a specific process of quantization compression. The embodiment of the present invention will be described by taking quantization and compression of graph data as an example. In one possible case, performing quantization compression on vertex features in graph data by using a sequence of actions may include:
S341, truncating the vertex features of each graph vertex in the graph data to the range $[-c, c]$ ($c > 0$), and performing quantization compression on the truncated vertex features with the interval quantization bit width corresponding to the degree of the graph vertex in the action sequence:

$$quantize(x_j) = \mathrm{round}\!\left( \frac{\mathrm{clamp}(x_j, -c, c)}{s} \right) \cdot s, \qquad s = \frac{c}{2^{\,b_i - 1} - 1}$$

wherein $quantize(\cdot)$ denotes the quantization compression function, $\mathrm{round}(\cdot)$ denotes the rounding function, $\mathrm{clamp}(x_j, -c, c)$ denotes the truncation function that truncates $x_j$ to $[-c, c]$, $x$ denotes the vertex feature, $x_j$ denotes the j-th component of the vertex feature, $s$ denotes the scaling factor, and $b_i$ denotes the interval quantization bit width in the action sequence corresponding to the degree of the graph vertex.
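A minimal NumPy sketch of this quantization function, assuming the symmetric scaling factor $s = c / (2^{b-1} - 1)$ given above:

```python
import numpy as np

def quantize_features(x, bits, c):
    """Linear quantization of a truncated vertex-feature vector: clamp to
    [-c, c], divide by the scale s, round, and multiply back."""
    s = c / (2 ** (bits - 1) - 1)          # scaling factor
    clipped = np.clip(x, -c, c)            # truncation to [-c, c]
    return np.round(clipped / s) * s       # round onto the quantization grid

# Example: 4-bit quantization of a feature vector with cutoff c = 1.0
x = np.array([-1.7, -0.33, 0.05, 0.8, 2.4])
print(quantize_features(x, bits=4, c=1.0))
```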
Of course, in order to further reduce the accuracy loss of the quantized graph data caused by the choice of the cutoff value c, the embodiment of the present invention also determines a suitable value of c by minimizing the feature distribution distance between the data before and after quantization. Specifically, before performing quantization compression on the vertex features in the graph data with the action sequence, the method may further include:
and S342, determining the value of c in the following way:

$$c = \arg\min_x D_{KL}\!\left( P_X \,\middle\|\, P_{quantize(X;\,x)} \right)$$

wherein the $\arg\min_x$ function is used to select the value of x such that $D_{KL}\!\left( P_X \,\|\, P_{quantize(X;x)} \right)$ is minimal, and $D_{KL}\!\left( P_X \,\|\, P_{quantize(X;x)} \right)$ denotes the KL divergence between the feature distribution of the data X before quantization and the feature distribution of the data after quantization with cutoff value x; the feature distribution is a maximum, minimum, mean, variance, sharpness or kurtosis.
It should be noted that the embodiment of the present invention does not limit the calculation manner of the KL divergence (Kullback-Leibler divergence); of course, the distance between the two feature distributions may also be determined in other ways, for example with the JS distance (Jensen-Shannon divergence) or mutual information, as the actual application requires. The embodiment of the present invention likewise does not limit the specific acquisition manner of the feature distribution data; for example, the maximum, minimum, mean and variance can be obtained directly from the target data, while the sharpness and kurtosis are obtained by constructing a histogram of the target data. As for the graph convolution kernels (if any), weights and activations (if any) of the different layers of the graph neural network, the embodiment of the present invention quantizes them similarly, with the difference that activations are truncated to the range [0, c] rather than [-c, c], because the activation values (i.e., the outputs of the ReLU (rectified linear unit) layer) are non-negative.
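One simple realization of this c-selection rule, sketched with histogram-based distributions and reusing the quantize_features sketch above; the bin count and candidate grid are assumptions:

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL divergence between two (unnormalized) histograms."""
    p = p / p.sum(); q = q / q.sum()
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

def choose_cutoff(x, bits, candidates, n_bins=128):
    """Pick the cutoff c minimizing the KL divergence between histograms of
    the data before and after quantization; other statistics or distances
    (JS distance, mutual information) could be substituted."""
    ref, edges = np.histogram(x, bins=n_bins, range=(x.min(), x.max()))
    best_c, best_d = None, float("inf")
    for c in candidates:
        xq = quantize_features(x, bits, c)   # sketch defined earlier
        hist, _ = np.histogram(xq, bins=edges)
        d = kl_divergence(ref.astype(float), hist.astype(float))
        if d < best_d:
            best_c, best_d = c, d
    return best_c
```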
Based on the above embodiment, the initialization and training process of the actor module and the critic module is described in detail below. First, the structure of the DDPG agent is briefly described. The Actor-Critic framework consists of an Actor (which may also be referred to as the policy network μ) and a Critic (which may also be referred to as the Q network or value network). The Actor is responsible for interacting with the environment and learning a better strategy by the policy gradient method under the guidance of the Critic's value function; the Critic's task is to learn a value function Q from the data collected through the Actor's interaction with the environment, whose role is to judge whether the current state-action pair is good, thereby assisting the Actor in updating its strategy. Both the Actor and the Critic contain two networks, one called online and one called target, which gives the four networks of the DDPG algorithm: the online Actor network, the target Actor network, the online Critic network and the target Critic network. The online Actor network and the target Actor network have the same structure but different parameters, and the same holds for the online Critic network and the target Critic network. During network training, the DDPG algorithm adopts the technique of freezing the target networks: the online network parameters are updated in real time, while the target network parameters are temporarily frozen. While the target networks are frozen, the online networks try and explore; the target networks summarize the experience from the samples generated by the online networks, the online network parameters are then assigned to the target networks, and the process repeats.
In addition, the DDPG algorithm also employs an experience replay mechanism to remove data correlation and improve sample utilization efficiency. Specifically, an experience replay pool is maintained; each conversion data quadruple (state, action, reward, next state) sampled from the environment is stored into the pool, and when the policy network and the Q network are trained, a number of data are randomly sampled from the replay buffer. Doing so serves two functions: (1) it makes the samples satisfy the independence assumption, since experience replay breaks the correlation between samples; and (2) it improves sample utilization.
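A minimal sketch of such an experience replay pool (capacity and interface are ours):

```python
import random
from collections import deque

class ReplayPool:
    """Fixed-capacity experience replay pool of (state, action, reward, next_state)."""
    def __init__(self, capacity=10000):
        self.pool = deque(maxlen=capacity)   # old transitions drop out automatically

    def store(self, transition):
        self.pool.append(transition)

    def sample(self, n):
        # Uniform random sampling breaks the temporal correlation between samples
        return random.sample(self.pool, n)

    def __len__(self):
        return len(self.pool)
```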
The functions of the four networks of the DDPG agent are as follows:
online Actor network: responsible for the iterative update of the policy network parameters $\theta^{\mu}$, for selecting the current optimal action $a^{(t)}$ according to the current environment state $O^{(t)}$, and for interacting with the environment to generate the next state $O^{(t+1)}$ and the reward r;

target Actor network: responsible for selecting the next optimal action $a^{(t+1)}$ based on the next state $O^{(t+1)}$ sampled from the experience replay pool, and for periodically updating the target Actor network parameters $\theta^{\mu'}$ from the online Actor parameters $\theta^{\mu}$ using the exponential moving average method;

online Critic network: responsible for the iterative update of the value network parameters $\theta^{Q}$, for calculating the online Q value $Q\!\left( O^{(t)}, a^{(t)} \mid \theta^{Q} \right)$ of the current state-action pair, and for calculating the target estimate $y^{(t)} = r^{(t)} + \gamma\, Q'\!\left( O^{(t+1)}, \mu'\!\left( O^{(t+1)} \mid \theta^{\mu'} \right) \mid \theta^{Q'} \right)$;

target Critic network: responsible for calculating the term $Q'\!\left( O^{(t+1)}, a^{(t+1)} \mid \theta^{Q'} \right)$ in the target estimate $y^{(t)}$, and for periodically updating the target Critic network parameters $\theta^{Q'}$ from the online Critic parameters $\theta^{Q}$ using the exponential moving average method.
In one possible scenario, where the actor module comprises an online actor network and a target actor network, the critic module comprises an online critic network and a target critic network, initializing an agent used for reinforcement learning may include:
S311, initializing the online actor network parameters of the online actor network, and setting the target actor network parameters of the target actor network to the same values as the online actor network parameters;

S312, initializing the online critic network parameters of the online critic network, and setting the target critic network parameters of the target critic network to the same values as the online critic network parameters.
In particular, the parameters $\theta^{\mu}$ and $\theta^{Q}$ of the online actor network and the online critic network may be initialized first, and the parameters of the online networks are then copied to the corresponding target network parameters:

$$\theta^{\mu'} \leftarrow \theta^{\mu}, \qquad \theta^{Q'} \leftarrow \theta^{Q}$$
in one possible scenario, training the actor module and the critic module with the transformed data may include:
S371, adding the conversion data to an experience replay pool, and randomly sampling a preset number of conversion data from the pool as training data;
S372, determining a first gradient of the online critic network parameters by utilizing the training data, the target actor network, the target critic network, the online critic network and the following loss function:

$$L = \frac{1}{N} \sum_{t} \left( y^{(t)} - Q\!\left( O^{(t)}, a^{(t)} \mid \theta^{Q} \right) \right)^{2}$$

wherein L denotes the loss function, $a^{(t)}$ denotes the continuous action, $O^{(t)}$ denotes the historical state vector corresponding to the t-th time step, $Q$ denotes the online critic network, $\theta^{Q}$ denotes the online critic network parameters, and N denotes the preset number; $y^{(t)}$ denotes the estimate of the target critic network,

$$y^{(t)} = r^{(t)} + \gamma\, Q'\!\left( O^{(t+1)}, \mu'\!\left( O^{(t+1)} \mid \theta^{\mu'} \right) \mid \theta^{Q'} \right)$$

where $r^{(t)}$ denotes the reward value corresponding to the t-th time step, $\gamma$ denotes a preset discount factor, $Q'$ denotes the target critic network, $\theta^{Q'}$ denotes the target critic network parameters, $\mu'$ denotes the target actor network, $\theta^{\mu'}$ denotes the target actor network parameters, and $O^{(t+1)}$ denotes the current state vector corresponding to the t-th time step;
S373, updating the online critic network parameters according to the first gradient;
S374, determining the performance objective using the training data, the updated online critic network, the online actor network and the objective function, and determining a second gradient of the performance objective with respect to the online actor network parameters:

$$\nabla_{\theta^{\mu}} J = \mathbb{E}_{O \sim \rho^{\beta}}\!\left[ \nabla_{a}\, Q\!\left( O, a \mid \theta^{Q} \right)\Big|_{a=\mu(O)} \cdot \nabla_{\theta^{\mu}}\, \mu\!\left( O \mid \theta^{\mu} \right) \right]$$

wherein $\mathbb{E}_{O \sim \rho^{\beta}}[\cdot]$ denotes the expected value taken when the environment state O obeys the distribution $\rho^{\beta}$, $\theta^{\mu}$ denotes the online actor network parameters, and $\nabla_{\theta^{\mu}} J$ denotes the second gradient.
For the calculation of the second gradient, it should be noted that the object of the embodiment of the present invention is to find optimal policy network parameters $\theta^{\mu}$ such that the DDPG agent acts according to the optimal strategy corresponding to these parameters and maximizes the expected cumulative reward obtained in the environment. To evaluate the performance of a policy μ, the present invention defines an objective function J called the performance objective:

$$J(\mu) = \mathbb{E}_{O \sim \rho^{\beta}}\!\left[ Q\!\left( O, \mu(O) \right) \right]$$

wherein $Q(O, \mu(O))$ denotes the Q value generated in each state O if the action is selected according to the policy μ.
$J(\mu)$ is the expected value of $Q(O, \mu(O))$ when the environment state O obeys the distribution $\rho^{\beta}$. The gradient of the objective function $J(\mu)$ with respect to the policy network parameters $\theta^{\mu}$ (the policy gradient for short) can be calculated by the following formula:

$$\nabla_{\theta^{\mu}} J = \mathbb{E}_{O \sim \rho^{\beta}}\!\left[ \nabla_{a}\, Q\!\left( O, a \mid \theta^{Q} \right)\Big|_{a=\mu(O)} \cdot \nabla_{\theta^{\mu}}\, \mu\!\left( O \mid \theta^{\mu} \right) \right]$$

The calculation of the policy gradient uses the chain rule: the derivative is first taken with respect to the action a, and then with respect to the policy network parameters $\theta^{\mu}$. The function Q is then maximized by gradient ascent, yielding the action with the largest value.
The expected value can be estimated by the Monte-Carlo method. The state transitions $\left( O^{(t)}, a^{(t)}, r^{(t)}, O^{(t+1)} \right)$ are stored in the experience replay pool P, where $a^{(t)}$ is generated by the DDPG agent according to the Behavior strategy $\beta$ and is converted into discrete action values by the method provided in the above embodiment. When N conversion data are randomly sampled from the experience replay pool P to form a single batch, the single-batch data can, according to the Monte-Carlo method, be substituted into the policy gradient formula as an unbiased estimate of the expected value, so the policy gradient can be rewritten as:

$$\nabla_{\theta^{\mu}} J \approx \frac{1}{N} \sum_{t} \nabla_{a}\, Q\!\left( O, a \mid \theta^{Q} \right)\Big|_{O=O^{(t)},\, a=\mu(O^{(t)})} \cdot \nabla_{\theta^{\mu}}\, \mu\!\left( O \mid \theta^{\mu} \right)\Big|_{O=O^{(t)}}$$
s375, updating the network parameters of the online actors based on the second gradient;
S376, updating the target critic network parameters and the target actor network parameters with the updated online critic network parameters and online actor network parameters in the following way:

$$\theta^{Q'} \leftarrow \tau\, \theta^{Q} + (1 - \tau)\, \theta^{Q'}, \qquad \theta^{\mu'} \leftarrow \tau\, \theta^{\mu} + (1 - \tau)\, \theta^{\mu'}$$

wherein $\tau$ is a preset value.
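For reference, steps S371 to S376 map onto a compact PyTorch training step roughly as follows; the network interfaces (a critic taking a state-action pair), the optimizers and the gamma/tau values are our assumptions (the worked example later in this document uses the Adam optimizer for both online networks):

```python
import torch
import torch.nn.functional as F

def ddpg_update(batch, actor, critic, target_actor, target_critic,
                actor_opt, critic_opt, gamma=0.99, tau=0.01):
    """One training step over a sampled batch of conversion data (S371-S376).

    `batch` holds tensors O, a, r, O2 = states, actions, rewards, next states.
    gamma (discount factor) and tau (soft-update coefficient) are illustrative.
    """
    O, a, r, O2 = batch

    # S372/S373: critic loss L = mean((y - Q(O, a))^2), with the target
    # y = r + gamma * Q'(O2, mu'(O2)) computed by the frozen target networks.
    with torch.no_grad():
        y = r + gamma * target_critic(O2, target_actor(O2))
    critic_loss = F.mse_loss(critic(O, a), y)
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    # S374/S375: deterministic policy gradient; ascend Q(O, mu(O)) by
    # descending its negation with respect to the actor parameters.
    actor_loss = -critic(O, actor(O)).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()

    # S376: soft-update the target networks by exponential moving average.
    with torch.no_grad():
        for net, tgt in ((critic, target_critic), (actor, target_actor)):
            for p, tp in zip(net.parameters(), tgt.parameters()):
                tp.mul_(1 - tau).add_(tau * p)
```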
The neural network compression method of the above-described figure is described in detail below based on a specific example.
(a) A heterogeneous parallel computing system consisting of a host (namely, the upper computer) and a hardware accelerator is built, using a Xilinx Zynq-7020 FPGA or an Inspur F37X FPGA as the GNN inference hardware accelerator. For the temporal architecture, reconfigurable bit-serial matrix multiplication overlay (BISMO) is utilized; for the spatial architecture, the BitFusion architecture is adopted. The computation, storage and delay characteristic data of the hardware accelerator are obtained.
(b) GCN (Graph Convolutional Network) is selected as the graph neural network, a graph dataset is constructed with PubMed (an abstract database), vertex classification is selected as the graph learning task, and an objective function and evaluation criteria matching the learning task are designed. A GNN instance containing L graph convolution layers is constructed, and the GNN model is trained on the upper computer with a CPU or GPU by the mini-batch stochastic gradient descent method, obtaining the trained floating-point GNN model. The graph data and the trained floating-point GNN model are the objects to be quantized by the present invention.
(c) The DDPG reinforcement learning environment is constructed and initialization is completed. 1) The Actor (policy network) and the Critic (value network) are constructed, each with one copy: one online network and one target network. 2) The online network parameters $\theta^{\mu}$ and $\theta^{Q}$ of the Actor and the Critic are initialized, and the parameters of the online networks are copied to the corresponding target network parameters: $\theta^{\mu'} \leftarrow \theta^{\mu}$, $\theta^{Q'} \leftarrow \theta^{Q}$. 3) The environment state is initialized. 4) The experience replay pool (replay buffer) P and a sampling threshold are initialized. 5) The maximum reward r_best and the optimal action a_best are initialized.
(d) An optimal quantization bit width allocation strategy is found using the DDPG algorithm. All steps are performed on the upper computer unless explicitly stated otherwise. The specific steps are as follows:
The following training process is repeated once per episode, for a preset number of episodes:

(1) initializing the UO random process;

(2) receiving a random initial state $O^{(1)}$;

(3) repeatedly executing T time steps, performing the following operations in sequence at each time step t:
a. the Actor selects an action according to the Behavior strategy: $a^{(t)} = \mu\!\left( O^{(t)} \mid \theta^{\mu} \right) + \mathcal{N}_t$, where $\mathcal{N}_t$ is random UO (Ornstein-Uhlenbeck) noise; $a^{(t)}$ is then converted into the discrete action $\hat{a}^{(t)}$;
b. according to the quantization bit widths specified by $\hat{a}^{(t)}$, the upper computer quantizes the features of all graph vertices and the graph convolution kernels (if any), weights and activations (if any) of all GNN layers, using the quantization method based on minimizing the data feature distribution distance before and after quantization; the quantized graph vertex feature data and the quantized GNN model are obtained, and the GNN model is mapped to the hardware accelerator;
c. the hardware accelerator reads the quantized graph vertex features and the adjacency matrix from the upper computer, trains the GNN model by the mini-batch stochastic gradient descent method, tests the classification accuracy, calculates the value $r^{(t)}$ of the reward function and outputs the next state $O^{(t+1)}$, returning $r^{(t)}$ and $O^{(t+1)}$ to the upper computer;
d. the upper computer updates r_best and a_best: it compares the returned $r^{(t)}$ with r_best, and if $r^{(t)} >$ r_best, sets r_best $= r^{(t)}$ and a_best $= \hat{a}^{(t)}$;
e. the upper computer stores the state transition $\left( O^{(t)}, \hat{a}^{(t)}, r^{(t)}, O^{(t+1)} \right)$ into the experience replay pool P;
f. when the number of transitions in the experience replay pool P exceeds the sampling threshold, sampling is performed: the upper computer randomly samples N transition data from the experience replay pool P as batch training data for the online Actor and online Critic networks;
g. the upper computer updates the gradients of the online Actor network and the online Critic network: it computes the gradient of the loss function L with respect to $\theta^{Q}$ and computes the policy gradient, and updates the online Critic network parameters $\theta^{Q}$ and the online Actor network parameters $\theta^{\mu}$ with the Adam optimizer;
h. the upper computer soft-updates the parameters of the target Actor network and the target Critic network, applying the moving average method to the corresponding online network parameters:

$$\theta^{Q'} \leftarrow \tau\, \theta^{Q} + (1 - \tau)\, \theta^{Q'}, \qquad \theta^{\mu'} \leftarrow \tau\, \theta^{\mu} + (1 - \tau)\, \theta^{\mu'}$$
(4) the upper computer outputs r_best and a_best.
(e) According to a_best, the hardware accelerator retrains the quantized model for one epoch to recover its performance, obtaining the final fixed-point GNN quantization model and the quantized graph vertex feature data.
The following describes a graph neural network compression device, an electronic device and a computer-readable storage medium according to embodiments of the present invention; the device, electronic device and storage medium described below and the graph neural network compression method described above may be referred to correspondingly.

Referring to fig. 4, fig. 4 is a block diagram of a graph neural network compression device according to an embodiment of the present invention. The device may include:
an obtaining module 401, configured to obtain a trained graph neural network and graph data used in training the graph neural network;
an interval determining module 402, configured to determine degree distribution ranges corresponding to vertices of all graphs in graph data, and divide the degree distribution ranges into a plurality of degree intervals;
a quantization bit width determining module 403, configured to determine, by using a reinforcement learning and hardware accelerator, an optimal interval quantization bit width corresponding to each degree interval and an optimal network quantization bit width corresponding to the graph neural network under the constraint of a preset resource limitation condition;
and the quantization compression module 404 is configured to perform quantization compression on vertex features of graph vertices corresponding to degrees in the graph data by using the optimal interval quantization bit width, and perform quantization compression on the graph neural network by using the optimal network quantization bit width to obtain optimal quantization graph data and an optimal quantization graph neural network.
Optionally, the interval determining module 402 may include:
the arrangement submodule is used for arranging all graph vertexes in the graph data from small to large according to degrees to obtain a graph vertex sequence;
the dividing submodule is used for dividing the degree distribution range by using the graph vertex sequence to obtain a plurality of degree intervals; the number of the chart vertexes contained in each degree interval is the same or the difference value is smaller than a preset threshold value.
Optionally, the apparatus may further include:
and the training module is used for training the optimal quantization map neural network by using the optimal quantization map data to obtain a fine tuning quantization map neural network so as to deploy the fine tuning quantization map neural network to external service equipment.
Optionally, the time sequence structure of the hardware accelerator is reconfigurable bit-serial matrix multiplication superposition, and the space structure is a BitFusion architecture.
Optionally, the quantization bit width determining module 403 includes:
the initialization submodule is used for acquiring the reference accuracy corresponding to the execution of the designated task by the graph neural network and initializing an agent and a historical reward value used for reinforcement learning; the intelligent agent comprises an actor module and a critic module;
the first setting submodule is used for setting the strategy frequency to be 1 and initializing an action sequence and a historical state vector; the action sequence is used for storing the interval quantization bit width corresponding to each degree interval and the network quantization bit width corresponding to the graph neural network; the state vector is used for recording the corresponding memory occupation amount, the calculated amount and the corresponding accuracy when the quantized graph neural network processes the quantized graph data;
the second setting submodule is used for setting the time step to be 1, determining continuous actions by using the actor module under the constraint of a preset resource limiting condition, updating the numerical values of the action sequence by using the continuous actions, and determining the memory occupation amount and the calculated amount corresponding to the action sequence after updating;
the compression and training submodule is used for carrying out quantization compression on the vertex features and the graph neural network in the graph data by using the action sequence and sending the obtained quantization graph data and the quantization graph neural network to the hardware accelerator so that the hardware accelerator trains the quantization graph neural network by using the quantization graph data and determines the current accuracy corresponding to the execution of the designated task by the trained quantization graph neural network;
the calculation submodule is used for determining a current state vector by utilizing the memory occupation amount, the calculated amount and the accuracy corresponding to the action sequence and determining a reward value by utilizing the reference accuracy and the current accuracy;
the refinement submodule is used for updating the historical reward value by using the reward value when the reward value is determined to be larger than the historical reward value, and updating the optimal interval quantization bit width and the optimal network quantization bit width by using the updated action sequence;
the intelligent agent training submodule is used for generating conversion data by utilizing the historical state vector, the continuous action, the reward value and the current state vector, and training the actor module and the critic module by utilizing the conversion data so as to update the strategy used by the critic module when the actor module is subjected to numerical value updating;
a third setting submodule, configured to add 1 to the time step when it is determined that the time step does not reach the length of the action sequence, update the historical state vector using the current state vector, and enter the step of determining continuous actions with the actor module under the constraint of the preset resource limitation condition;
a fourth setting submodule, configured to add 1 to the strategy count when it is determined that the time step has reached the length of the action sequence but the strategy count has not reached a preset value, and enter the step of initializing the action sequence and the historical state vector;

and the output submodule, configured to output the optimal interval quantization bit width and the optimal network quantization bit width when the strategy count is determined to reach the preset value.
Optionally, the second setting submodule may include:
the discrete action determining unit is used for selecting continuous action according to the Behavior strategy by utilizing the actor module, and discretizing the continuous action in the following mode to obtain a discrete action value:
Figure DEST_PATH_IMAGE210
wherein,
Figure 827944DEST_PATH_IMAGE002
denotes the first
Figure 598454DEST_PATH_IMAGE003
The i-th quantization bit width in the action sequence of each time step corresponds to a successive action,
Figure 631132DEST_PATH_IMAGE004
is represented by
Figure 814989DEST_PATH_IMAGE002
Corresponding discrete action values, Q comprises a plurality of predefined quantization bit width values,
Figure 73539DEST_PATH_IMAGE005
a function of rounding off is represented by,
Figure 760872DEST_PATH_IMAGE006
and
Figure 292347DEST_PATH_IMAGE007
representing a preset minimum quantization bit width and a maximum quantization bit width,
Figure 307708DEST_PATH_IMAGE008
the function is used to select a target preset quantization bit width value Q in Q such that
Figure 277938DEST_PATH_IMAGE131
Minimum;
the updating unit is used for carrying out numerical value updating on the action sequence by utilizing the action value, determining the memory occupation amount, the calculated amount and the delay amount corresponding to the updated action sequence, and judging whether the memory occupation amount, the calculated amount and the delay amount meet the limitation of the preset resource limitation condition or not;
the first processing unit is used for performing quantization compression on the vertex features and the graph neural network in the graph data by utilizing the action sequence if the vertex features and the graph neural network are in the same state;
and if not, sequentially reducing the quantization bit width in the action sequence according to a preset sequence so as to update the action sequence again, and entering the step of determining the memory occupation amount, the calculated amount and the delay amount corresponding to the updated action sequence when each reduction action is completed.
Optionally, the discrete motion determination unit may include:
a continuous action determining subunit for selecting a continuous action according to the Behavior strategy with the actor module in the following way:

$$a^{(t)} = \mu\!\left( O^{(t)} \mid \theta^{\mu} \right) + \mathcal{N}_t$$

wherein $\mathcal{N}_t$ denotes the random UO noise corresponding to the t-th time step, $O^{(t)}$ denotes the historical state vector corresponding to the t-th time step, $\mu$ denotes the online actor network in the actor module, and $\theta^{\mu}$ denotes the online actor network parameters.
Optionally, the compression and training submodule may include:
and the hardware accelerator unit is used for training the quantization map neural network by using the quantization map data based on a small batch stochastic gradient descent method.
Optionally, the updating unit may include:
the first calculating subunit is configured to calculate the memory usage amount by using the following formula:
Figure DEST_PATH_IMAGE213
wherein,
Figure 871358DEST_PATH_IMAGE016
the amount of memory usage is indicated,
Figure 652233DEST_PATH_IMAGE017
representing the number of graph vertices within a single mini-batch,
Figure 293298DEST_PATH_IMAGE018
representation of quantization map neural network
Figure 134215DEST_PATH_IMAGE019
The vertex dimension values corresponding to the individual network layers,
Figure DEST_PATH_IMAGE214
Figure DEST_PATH_IMAGE215
representing the number of all network layers of the neural network of the quantization map,
Figure 877044DEST_PATH_IMAGE022
represents the maximum value of the interval quantization bit widths to which all graph vertices within a single mini-batch are assigned, S represents the total number of convolution kernels,
Figure 84778DEST_PATH_IMAGE023
and
Figure 455716DEST_PATH_IMAGE024
respectively representing the network quantization bit width corresponding to the weight matrix and the convolution kernel of each network layer of the neural network of the quantization diagram;
a second calculating subunit, configured to calculate the calculated amount using the following formula:
Figure 608480DEST_PATH_IMAGE025
wherein,
Figure 432079DEST_PATH_IMAGE026
the amount of calculation is represented by the amount of calculation,
Figure 944969DEST_PATH_IMAGE027
representing the network quantization bit width corresponding to the activation matrix of each network layer of the neural network of the quantization diagram,
Figure DEST_PATH_IMAGE216
representation of quantization map neural network
Figure 904835DEST_PATH_IMAGE019
A total number of multiply-accumulate operations for a layer;
a third calculating subunit for calculating the delay amount using the following formula:
Figure DEST_PATH_IMAGE217
wherein,
Figure DEST_PATH_IMAGE218
it is indicated that the amount of delay,
Figure DEST_PATH_IMAGE219
representation of quantization map neural network
Figure 448074DEST_PATH_IMAGE019
The network layer handles the delay of small batches of graph data.
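To make the three cost formulas concrete, here is a hedged sketch under the reconstruction above; the inputs (`dims`, `macs`, `layer_delay`) and the example values are illustrative stand-ins, not values from the patent:

```python
def memory_cost(n, dims, b_max, b_w, b_c, S):
    """Feature storage (n * d_l * b_max per layer) plus parameter storage."""
    feats = sum(n * d * b_max for d in dims[1:])
    params = sum(dims[l - 1] * dims[l] * b_w[l - 1] + S * b_c[l - 1]
                 for l in range(1, len(dims)))
    return feats + params

def compute_cost(macs, b_w, b_a):
    """BitFusion-style cost: per-layer MACs weighted by weight and activation bits."""
    return sum(m * bw * ba for m, bw, ba in zip(macs, b_w, b_a))

def latency_cost(layer_delay):
    """Total delay: sum of per-layer mini-batch processing delays t_l."""
    return sum(layer_delay)

# Example: a 2-layer network with feature dims 32 -> 16 -> 8 and 1024 vertices
print(memory_cost(n=1024, dims=[32, 16, 8], b_max=8, b_w=[4, 4], b_c=[4, 4], S=2))
print(compute_cost(macs=[5e5, 2e5], b_w=[4, 4], b_a=[6, 6]))
print(latency_cost([0.8, 0.5]))  # per-layer delays in arbitrary time units
```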
Optionally, the compression and training submodule comprises:
a compression unit, configured to truncate the vertex features of each graph vertex in the graph data into the range [-c, c] (c > 0), and perform quantization compression on the truncated vertex features by using the interval quantization bits corresponding to the degree of the graph vertex in the action sequence:

$\mathrm{quantize}(x_j, b, c) = \mathrm{round}\!\left( \frac{\mathrm{clip}(x_j, -c, c)}{s} \right) \cdot s$

wherein $\mathrm{quantize}(\cdot)$ denotes the quantized compression function; $\mathrm{round}(\cdot)$ denotes the rounding function; $\mathrm{clip}(x, -c, c)$ denotes the truncation function that truncates x into [-c, c]; x denotes the vertex feature, and $x_j$ denotes the j-th component in the vertex feature; s denotes the scaling factor, $s = c / (2^{b-1} - 1)$; and b denotes the interval quantization bits in the action sequence corresponding to the degree of the graph vertex.
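A short NumPy sketch of this clip-and-round quantizer, offered as one plausible reading of the reconstructed formula rather than the patent's own code:

```python
import numpy as np

def quantize(x, bits, c):
    """Truncate x to [-c, c], then linearly quantize with the given bit width."""
    s = c / (2 ** (bits - 1) - 1)     # scaling factor s = c / (2^(b-1) - 1)
    clipped = np.clip(x, -c, c)       # truncation to [-c, c]
    return np.round(clipped / s) * s  # snap to the nearest quantization level

# Example: quantize a vertex feature vector to 4 bits with cutoff c = 1.0
x = np.array([-1.7, -0.3, 0.05, 0.9, 2.2])
print(quantize(x, bits=4, c=1.0))  # components outside [-1, 1] saturate at +/-1
```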
Optionally, the compression and training submodule further comprises:
a cutoff value determining unit, configured to determine the value of c in the following manner:

$c = \arg\min_{x} D_{\mathrm{KL}}\!\left( \mathcal{P}(X) \,\|\, \mathcal{P}(\mathrm{quantize}(X, b, x)) \right)$

wherein $\arg\min_{x}(\cdot)$ selects the value of x such that $D_{\mathrm{KL}}(\cdot)$ is minimal, and $D_{\mathrm{KL}}(\cdot)$ denotes the KL divergence between the feature distribution of the vertex features X and the feature distribution of $\mathrm{quantize}(X, b, x)$; the feature distribution is a maximum, a minimum, a mean, a variance, a skewness, or a kurtosis.
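One common way to realize such a calibration is to search candidate cutoffs and compare feature histograms with a KL divergence, as in the hedged sketch below; the histogram binning and the candidate grid are assumptions, and `quantize` is the sketch given above:

```python
import numpy as np

def kl_divergence(p, q, eps=1e-10):
    """KL(p || q) between two (unnormalized) histograms."""
    p = p / p.sum()
    q = q / q.sum()
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

def calibrate_cutoff(x, bits, candidates):
    """Pick the cutoff c minimizing KL between original and quantized histograms."""
    bins = np.linspace(x.min(), x.max(), 129)  # 128 histogram buckets
    p, _ = np.histogram(x, bins=bins)
    best_c, best_kl = None, float("inf")
    for c in candidates:
        q_hist, _ = np.histogram(quantize(x, bits, c), bins=bins)
        kl = kl_divergence(p.astype(float), q_hist.astype(float))
        if kl < best_kl:
            best_c, best_kl = c, kl
    return best_c

# Example: search a small grid of cutoffs over the feature magnitude range
x = np.random.randn(10_000)
print(calibrate_cutoff(x, bits=4, candidates=np.linspace(0.5, 4.0, 8)))
```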
Optionally, the actor module includes an online actor network and a target actor network, the critic module includes an online critic network and a target critic network, and the initialization submodule includes:
a first initializing unit, configured to initialize the online actor network parameters of the online actor network, and set the target actor network parameters of the target actor network to the same values as the online actor network parameters;
a second initializing unit, configured to initialize the online critic network parameters of the online critic network, and set the target critic network parameters of the target critic network to the same values as the online critic network parameters.
Optionally, the agent training submodule may include:
a training data extraction unit, configured to add the conversion data to an experience replay pool, and randomly sample a preset number of pieces of conversion data from the experience replay pool as training data;
a first gradient calculating unit, configured to determine a first gradient of the online critic network parameters by using the training data, the target actor network, the target critic network, the online critic network, and the following loss function:

$L = \frac{1}{N} \sum_{t} \left( y_t - q(s_t, a_t \mid \theta^{q}) \right)^2$

wherein L denotes the loss function; $a_t$ denotes the continuous action; $s_t$ denotes the historical state vector corresponding to the t-th time step; $q(\cdot)$ denotes the online critic network; $\theta^{q}$ denotes the online critic network parameters; N denotes the preset number; $y_t$ denotes the estimate of the target critic network, $y_t = r_t + \gamma \, q'\!\left( s_{t+1}, \mu'(s_{t+1} \mid \theta^{\mu'}) \mid \theta^{q'} \right)$; $r_t$ denotes the reward value corresponding to the t-th time step; $\gamma$ denotes a preset discount factor; $q'(\cdot)$ denotes the target critic network; $\theta^{q'}$ denotes the target critic network parameters; $\mu'(\cdot)$ denotes the target actor network; $\theta^{\mu'}$ denotes the target actor network parameters; and $s_{t+1}$ denotes the current state vector corresponding to the t-th time step;

a first updating unit, configured to update the online critic network parameters according to the first gradient;
a second gradient calculating unit, configured to determine a performance objective by using the training data, the updated online critic network, the online actor network, and an objective function, and determine a second gradient of the performance objective with respect to the online actor network parameters:

$\nabla_{\theta^{\mu}} J \approx \mathbb{E}_{O \sim \rho^{\beta}} \left[ \left. \nabla_{a} \, q(s, a \mid \theta^{q}) \right|_{a = \mu(s)} \cdot \nabla_{\theta^{\mu}} \mu(s \mid \theta^{\mu}) \right]$

wherein $\mathbb{E}_{O \sim \rho^{\beta}}[\cdot]$ denotes the expected value when the environment state O obeys the distribution $\rho^{\beta}$; J denotes the performance objective; $\theta^{\mu}$ denotes the online actor network parameters; and $\nabla_{\theta^{\mu}} J$ denotes the second gradient;

a second updating unit, configured to update the online actor network parameters based on the second gradient;
a third updating unit, configured to update the target critic network parameters and the target actor network parameters by using the updated online critic network parameters and online actor network parameters in the following manner:

$\theta^{q'} \leftarrow \tau \theta^{q} + (1 - \tau) \, \theta^{q'}, \qquad \theta^{\mu'} \leftarrow \tau \theta^{\mu} + (1 - \tau) \, \theta^{\mu'}$

wherein $\tau$ is a preset value.
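As a compact, hedged sketch of these three updates (critic regression, deterministic policy gradient, soft target update), assuming the actor and critic are ordinary PyTorch modules; the function and argument names are illustrative, not from the patent:

```python
import torch
import torch.nn.functional as F

def ddpg_update(batch, actor, actor_t, critic, critic_t,
                actor_opt, critic_opt, gamma=0.99, tau=0.01):
    """One DDPG-style step over a sampled mini-batch of conversion data."""
    s, a, r, s_next = batch  # tensors sampled from the experience replay pool

    # Critic: L = (1/N) * sum_t (y_t - q(s_t, a_t))^2,
    # with y_t = r_t + gamma * q'(s_{t+1}, mu'(s_{t+1}))
    with torch.no_grad():
        y = r + gamma * critic_t(s_next, actor_t(s_next))
    critic_loss = F.mse_loss(critic(s, a), y)
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()

    # Actor: ascend q(s, mu(s)); autograd realizes the chained policy gradient
    actor_loss = -critic(s, actor(s)).mean()
    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()

    # Soft target updates: theta' <- tau * theta + (1 - tau) * theta'
    with torch.no_grad():
        for net, net_t in ((critic, critic_t), (actor, actor_t)):
            for p, p_t in zip(net.parameters(), net_t.parameters()):
                p_t.mul_(1 - tau).add_(tau * p)
```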
Referring to fig. 5, fig. 5 is a block diagram of an electronic device according to an embodiment of the present invention, where the embodiment of the present invention further provides an electronic device, including:
a memory 501 for storing a computer program;
a processor 502, configured to implement the steps of the graph neural network compression method described above when executing the computer program.
Since the embodiment of the electronic device portion corresponds to the embodiment of the graph neural network compression method portion, reference may be made to the description of the method embodiment for the electronic device embodiment; details are not repeated here.
Referring to fig. 6, fig. 6 is a block diagram of a computer-readable storage medium according to an embodiment of the present invention, and the embodiment of the present invention further provides a computer-readable storage medium 601 having a computer program stored thereon, where the computer program is executed by a processor to implement the steps of the graph neural network compression method according to any of the embodiments.
Since the embodiment of the computer-readable storage medium portion corresponds to the embodiment of the graph neural network compression method portion, reference may be made to the description of the method embodiment for the storage medium embodiment; details are not repeated here.
The embodiments are described in a progressive manner in the specification, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. The device disclosed by the embodiment corresponds to the method disclosed by the embodiment, so that the description is simple, and the relevant points can be referred to the method part for description.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in Random Access Memory (RAM), flash memory, Read-Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The graph neural network compression method, apparatus, electronic device, and storage medium provided by the present invention are described in detail above. The principles and embodiments of the present invention are explained herein using specific examples, which are presented only to assist in understanding the method and its core concepts. It should be noted that those skilled in the art can make various improvements and modifications to the present invention without departing from its principle, and such improvements and modifications also fall within the scope of the claims of the present invention.

Claims (16)

1. A graph neural network compression method, comprising:
acquiring a trained graph neural network and graph data used in training the graph neural network;
determining degree distribution ranges corresponding to all graph vertexes in the graph data, and dividing the degree distribution ranges into a plurality of degree intervals;
under the constraint of a preset resource limiting condition, determining an optimal interval quantization bit width corresponding to each degree interval and an optimal network quantization bit width corresponding to the graph neural network by using reinforcement learning and a hardware accelerator;
and quantizing and compressing the vertex characteristics of the graph vertex corresponding to the degree in the graph data by using the optimal interval quantization bit width, and quantizing and compressing the graph neural network by using the optimal network quantization bit width to obtain optimal quantization graph data and an optimal quantization graph neural network.
2. The method according to claim 1, wherein the determining a degree distribution range corresponding to all graph vertices in the graph data and dividing the degree distribution range into a plurality of degree intervals comprises:
arranging all graph vertexes in the graph data from small to large according to degrees to obtain a graph vertex sequence;
dividing the degree distribution range by using the graph vertex sequence to obtain a plurality of degree intervals; the number of graph vertices contained in each degree interval is the same, or the difference is smaller than a preset threshold value.
3. The graph neural network compression method according to claim 1, further comprising, after obtaining the optimal quantization graph data and the optimal quantization graph neural network:
training the optimal quantization graph neural network by using the optimal quantization graph data to obtain a fine-tuned quantization graph neural network, so as to deploy the fine-tuned quantization graph neural network to an external service device.
4. The graph neural network compression method of claim 1, wherein the hardware accelerator has a temporal structure of a reconfigurable bit-serial matrix multiplication overlay and a spatial structure of a BitFusion architecture.
5. The graph neural network compression method according to any one of claims 1 to 4, wherein the determining, by using reinforcement learning and a hardware accelerator under the constraint of a preset resource limiting condition, an optimal interval quantization bit width corresponding to each degree interval and an optimal network quantization bit width corresponding to the graph neural network includes:
acquiring a reference accuracy corresponding to the graph neural network executing a specified task, and initializing an agent and a historical reward value used by the reinforcement learning; the agent comprises an actor module and a critic module;
setting the strategy times to 1, and initializing an action sequence and a historical state vector; the action sequence is used for storing an interval quantization bit width corresponding to each degree interval and a network quantization bit width corresponding to the graph neural network; the state vector is used for recording the memory occupation amount, the calculated amount, and the accuracy corresponding to the quantization graph neural network processing the quantization graph data;
setting the time step to be 1, determining continuous actions by using the actor module under the constraint of the preset resource limiting condition, performing numerical updating on the action sequence by using the continuous actions, and determining the memory occupation amount and the calculated amount corresponding to the action sequence after updating;
performing quantization compression on the vertex features in the graph data and the graph neural network by using the action sequence, and sending the obtained quantization graph data and the quantization graph neural network to the hardware accelerator, so that the hardware accelerator trains the quantization graph neural network by using the quantization graph data and determines the current accuracy corresponding to the designated task executed by the trained quantization graph neural network;
determining a current state vector by using the memory occupation amount, the calculated amount and the accuracy corresponding to the action sequence, and determining a reward value by using the reference accuracy and the current accuracy;
when the reward value is determined to be larger than the historical reward value, updating the historical reward value by using the reward value, and updating the optimal interval quantization bit width and the optimal network quantization bit width by using the updated action sequence;
generating conversion data by using the historical state vector, the continuous action, the reward value, and the current state vector, and training the actor module and the critic module by using the conversion data, so that the critic module updates the strategy used by the actor module for numerical updating;
when the time step is determined not to reach the length of the action sequence, adding 1 to the time step, updating the historical state vector by using the current state vector, and entering the step of determining continuous actions by using the actor module under the constraint of the preset resource limit condition;
when the time step is determined to reach the length of the action sequence and the strategy times do not reach a preset value, adding 1 to the strategy times, and entering the steps of initializing the action sequence and the historical state vector;
and when the strategy times are determined to reach the preset value, outputting the optimal interval quantization bit width and the optimal network quantization bit width.
6. The graph neural network compression method of claim 5, wherein the determining continuous actions by using the actor module under the constraint of the preset resource limiting condition, performing numerical updating on the action sequence by using the continuous actions, and determining the memory occupation amount and the calculated amount corresponding to the updated action sequence comprises:
selecting the continuous action according to a Behavior policy by using the actor module, and discretizing the continuous action in the following manner to obtain a discrete action value:

$q_i^{(t)} = \arg\min_{q \in Q} \left| \mathrm{round}\!\left( q_{\min} + a_i^{(t)} \cdot (q_{\max} - q_{\min}) \right) - q \right|$

wherein $a_i^{(t)}$ denotes the continuous action corresponding to the i-th quantization bit width in the action sequence of the t-th time step; $q_i^{(t)}$ denotes the discrete action value corresponding to $a_i^{(t)}$; Q comprises a plurality of preset quantization bit width values, and q denotes the target preset quantization bit width value selected in Q; $\mathrm{round}(\cdot)$ denotes a rounding function; $q_{\min}$ and $q_{\max}$ denote the preset minimum quantization bit width and maximum quantization bit width; and $\arg\min_{q \in Q}(\cdot)$ selects the target preset quantization bit width value q in Q such that $\left| \mathrm{round}\!\left( q_{\min} + a_i^{(t)} \cdot (q_{\max} - q_{\min}) \right) - q \right|$ is minimal;

performing numerical updating on the action sequence by using the discrete action values, determining the memory occupation amount, the calculated amount, and the delay amount corresponding to the updated action sequence, and judging whether the memory occupation amount, the calculated amount, and the delay amount satisfy the preset resource limiting condition;
if yes, performing quantization compression on the vertex features in the graph data and the graph neural network by using the action sequence;
if not, sequentially reducing the quantization bit widths in the action sequence according to a preset order so as to update the action sequence again, and entering the step of determining the memory occupation amount, the calculated amount, and the delay amount corresponding to the updated action sequence each time a reduction action is completed.
7. The graph neural network compression method of claim 6, wherein the selecting the continuous action according to a Behavior policy by using the actor module comprises:
selecting, by the actor module, a continuous action according to the Behavior policy in the following manner:

$a_t = \mu(s_t \mid \theta^{\mu}) + \mathcal{N}_t$

wherein $\mathcal{N}_t$ denotes the random Ornstein–Uhlenbeck (OU) noise corresponding to the t-th time step; $s_t$ denotes the historical state vector corresponding to the t-th time step; $\mu(\cdot)$ denotes the online actor network in the actor module; and $\theta^{\mu}$ denotes the online actor network parameters.
8. The graph neural network compression method of claim 6, wherein the hardware accelerator trains the quantization graph neural network with the quantization graph data, comprising:
training, by the hardware accelerator, the quantization graph neural network with the quantization graph data based on a mini-batch stochastic gradient descent method.
9. The graph neural network compression method of claim 8, wherein the determining the memory occupation amount, the calculated amount, and the delay amount corresponding to the updated action sequence comprises:
calculating the memory occupation amount by using the following formula:

$\mathrm{Memory} = \sum_{l=1}^{L} n \, d_l \, b_{\max} + \sum_{l=1}^{L} \left( d_{l-1} \, d_l \, b_{w}^{(l)} + S \, b_{c}^{(l)} \right)$

wherein Memory denotes the memory occupation amount; n denotes the number of graph vertices within a single mini-batch; $d_l$ denotes the vertex dimension value corresponding to the l-th network layer of the quantization graph neural network ($d_0$ being the input dimension), $l = 1, 2, \ldots, L$; L denotes the number of all network layers of the quantization graph neural network; $b_{\max}$ denotes the maximum value of the interval quantization bit widths to which all graph vertices within a single mini-batch are assigned; S denotes the total number of convolution kernels; and $b_{w}^{(l)}$ and $b_{c}^{(l)}$ respectively denote the network quantization bit widths corresponding to the weight matrix and the convolution kernel of each network layer of the quantization graph neural network;

calculating the calculated amount by using the following formula:

$\mathrm{Compute} = \sum_{l=1}^{L} \mathrm{MACs}_l \cdot b_{w}^{(l)} \cdot b_{a}^{(l)}$

wherein Compute denotes the calculated amount; $b_{a}^{(l)}$ denotes the network quantization bit width corresponding to the activation matrix of each network layer of the quantization graph neural network; and $\mathrm{MACs}_l$ denotes the total number of multiply-accumulate operations of the l-th layer of the quantization graph neural network;

calculating the delay amount by using the following formula:

$\mathrm{Latency} = \sum_{l=1}^{L} t_l$

wherein Latency denotes the delay amount, and $t_l$ denotes the delay of the l-th network layer of the quantization graph neural network in processing a mini-batch of graph data.
10. The graph neural network compression method of claim 5, wherein the performing quantization compression on the vertex features in the graph data by using the action sequence comprises:
truncating the vertex features of each graph vertex in the graph data into the range [-c, c] (c > 0), and performing quantization compression on the truncated vertex features by using the interval quantization bits corresponding to the degree of the graph vertex in the action sequence:

$\mathrm{quantize}(x_j, b, c) = \mathrm{round}\!\left( \frac{\mathrm{clip}(x_j, -c, c)}{s} \right) \cdot s$

wherein $\mathrm{quantize}(\cdot)$ denotes the quantized compression function; $\mathrm{round}(\cdot)$ denotes the rounding function; $\mathrm{clip}(x, -c, c)$ denotes the truncation function that truncates x into [-c, c]; x denotes the vertex feature, and $x_j$ denotes the j-th component in the vertex feature; s denotes the scaling factor, $s = c / (2^{b-1} - 1)$; and b denotes the interval quantization bits in the action sequence corresponding to the degree of the graph vertex.
11. The graph neural network compression method of claim 10, further comprising, before performing quantization compression on the vertex features in the graph data by using the action sequence:
determining the value of c in the following manner:

$c = \arg\min_{x} D_{\mathrm{KL}}\!\left( \mathcal{P}(X) \,\|\, \mathcal{P}(\mathrm{quantize}(X, b, x)) \right)$

wherein $\arg\min_{x}(\cdot)$ selects the value of x such that $D_{\mathrm{KL}}(\cdot)$ is minimal, and $D_{\mathrm{KL}}(\cdot)$ denotes the KL divergence between the feature distribution of the vertex features X and the feature distribution of $\mathrm{quantize}(X, b, x)$; the feature distribution is a maximum value, a minimum value, a mean value, a variance, a skewness, or a kurtosis.
12. The graph neural network compression method of claim 5, wherein the actor module comprises an online actor network and a target actor network, the critic module comprises an online critic network and a target critic network, and the initializing the agent used by the reinforcement learning comprises:
initializing the online actor network parameters of the online actor network, and setting the target actor network parameters of the target actor network to the same values as the online actor network parameters;
initializing the online critic network parameters of the online critic network, and setting the target critic network parameters of the target critic network to the same values as the online critic network parameters.
13. The graph neural network compression method of claim 12, wherein the training the actor module and the critic module by using the conversion data comprises:
adding the conversion data to an experience replay pool, and randomly sampling a preset number of pieces of conversion data from the experience replay pool as training data;
determining a first gradient of the online critic network parameters by using the training data, the target actor network, the target critic network, the online critic network, and the following loss function:

$L = \frac{1}{N} \sum_{t} \left( y_t - q(s_t, a_t \mid \theta^{q}) \right)^2$

wherein L denotes the loss function; $a_t$ denotes the continuous action; $s_t$ denotes the historical state vector corresponding to the t-th time step; $q(\cdot)$ denotes the online critic network; $\theta^{q}$ denotes the online critic network parameters; N denotes the preset number; $y_t$ denotes the estimate of the target critic network, $y_t = r_t + \gamma \, q'\!\left( s_{t+1}, \mu'(s_{t+1} \mid \theta^{\mu'}) \mid \theta^{q'} \right)$; $r_t$ denotes the reward value corresponding to the t-th time step; $\gamma$ denotes a preset discount factor; $q'(\cdot)$ denotes the target critic network; $\theta^{q'}$ denotes the target critic network parameters; $\mu'(\cdot)$ denotes the target actor network; $\theta^{\mu'}$ denotes the target actor network parameters; and $s_{t+1}$ denotes the current state vector corresponding to the t-th time step;

updating the online critic network parameters according to the first gradient;
determining a performance objective by using the training data, the updated online critic network, the online actor network, and an objective function, and determining a second gradient of the performance objective with respect to the online actor network parameters:

$\nabla_{\theta^{\mu}} J \approx \mathbb{E}_{O \sim \rho^{\beta}} \left[ \left. \nabla_{a} \, q(s, a \mid \theta^{q}) \right|_{a = \mu(s)} \cdot \nabla_{\theta^{\mu}} \mu(s \mid \theta^{\mu}) \right]$

wherein $\mathbb{E}_{O \sim \rho^{\beta}}[\cdot]$ denotes the expected value when the environment state O obeys the distribution $\rho^{\beta}$; J denotes the performance objective; $\theta^{\mu}$ denotes the online actor network parameters; and $\nabla_{\theta^{\mu}} J$ denotes the second gradient;

updating the online actor network parameters based on the second gradient;
updating the target critic network parameters and the target actor network parameters by using the updated online critic network parameters and online actor network parameters in the following manner:

$\theta^{q'} \leftarrow \tau \theta^{q} + (1 - \tau) \, \theta^{q'}, \qquad \theta^{\mu'} \leftarrow \tau \theta^{\mu} + (1 - \tau) \, \theta^{\mu'}$

wherein $\tau$ is a preset value.
14. A graph neural network compression apparatus, comprising:
the acquisition module is used for acquiring the trained graph neural network and graph data used in the training process;
the interval determining module is used for determining degree distribution ranges corresponding to all graph vertexes in the graph data and dividing the degree distribution ranges into a plurality of degree intervals;
a quantization bit width determining module, configured to determine, by using reinforcement learning and a hardware accelerator under the constraint of a preset resource limiting condition, an optimal interval quantization bit width corresponding to each degree interval and an optimal network quantization bit width corresponding to the graph neural network;
and the quantization compression module is used for performing quantization compression on the vertex characteristics of the graph vertex corresponding to the degree in the graph data by using the optimal interval quantization bit width and performing quantization compression on the graph neural network by using the optimal network quantization bit width to obtain optimal quantization graph data and an optimal quantization graph neural network.
15. An electronic device, comprising:
a memory for storing a computer program;
a processor, configured to implement the graph neural network compression method as claimed in any one of claims 1 to 13 when executing the computer program.
16. A computer-readable storage medium having stored thereon computer-executable instructions that, when loaded and executed by a processor, carry out a method of graph neural network compression as claimed in any one of claims 1 to 13.
CN202211299256.8A 2022-10-24 2022-10-24 Graph neural network compression method and device, electronic equipment and storage medium Active CN115357554B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202211299256.8A CN115357554B (en) 2022-10-24 2022-10-24 Graph neural network compression method and device, electronic equipment and storage medium
PCT/CN2023/085970 WO2024087512A1 (en) 2022-10-24 2023-04-03 Graph neural network compression method and apparatus, and electronic device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211299256.8A CN115357554B (en) 2022-10-24 2022-10-24 Graph neural network compression method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN115357554A CN115357554A (en) 2022-11-18
CN115357554B true CN115357554B (en) 2023-02-24

Family

ID=84007819

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211299256.8A Active CN115357554B (en) 2022-10-24 2022-10-24 Graph neural network compression method and device, electronic equipment and storage medium

Country Status (2)

Country Link
CN (1) CN115357554B (en)
WO (1) WO2024087512A1 (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115357554B (en) * 2022-10-24 2023-02-24 浪潮电子信息产业股份有限公司 Graph neural network compression method and device, electronic equipment and storage medium
CN116011551B (en) * 2022-12-01 2023-08-29 中国科学技术大学 Graph sampling training method, system, equipment and storage medium for optimizing data loading
CN115934661B (en) * 2023-03-02 2023-07-14 浪潮电子信息产业股份有限公司 Method and device for compressing graphic neural network, electronic equipment and storage medium
CN116341633B (en) * 2023-05-29 2023-09-01 山东浪潮科学研究院有限公司 Model deployment method, device, equipment and storage medium
CN118296359B (en) * 2024-06-05 2024-08-06 山东德源电力科技股份有限公司 Electric energy meter with intelligent acquisition system for concentrator terminal

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108962393A (en) * 2018-05-12 2018-12-07 鲁东大学 Automatic arrhythmia analysis method based on compression figure neural network
CN112100286A (en) * 2020-08-14 2020-12-18 华南理工大学 Computer-aided decision-making method, device and system based on multi-dimensional data and server

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10747433B2 (en) * 2018-02-21 2020-08-18 Wisconsin Alumni Research Foundation Computer architecture for high-speed, graph-traversal
US11645493B2 (en) * 2018-05-04 2023-05-09 Microsoft Technology Licensing, Llc Flow for quantized neural networks
CN110852439B (en) * 2019-11-20 2024-02-02 字节跳动有限公司 Data processing method and device and storage medium
CN111563589B (en) * 2020-04-14 2024-01-16 中科物栖(南京)科技有限公司 Quantification method and device for neural network model
CN113570037A (en) * 2021-07-13 2021-10-29 清华大学 Neural network compression method and device
CN113762489A (en) * 2021-08-12 2021-12-07 北京交通大学 Method for carrying out multi-bit width quantization on deep convolutional neural network
CN113902108A (en) * 2021-11-24 2022-01-07 贵州电网有限责任公司 Neural network acceleration hardware architecture and method for quantizing bit width dynamic selection
US20220092391A1 (en) * 2021-12-07 2022-03-24 Santiago Miret System and method of using neuroevolution-enhanced multi-objective optimization for mixed-precision quantization of deep neural networks
CN114781615A (en) * 2022-04-24 2022-07-22 上海大学 Two-stage quantization implementation method and device based on compressed neural network
CN115357554B (en) * 2022-10-24 2023-02-24 浪潮电子信息产业股份有限公司 Graph neural network compression method and device, electronic equipment and storage medium

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108962393A (en) * 2018-05-12 2018-12-07 鲁东大学 Automatic arrhythmia analysis method based on compression figure neural network
CN112100286A (en) * 2020-08-14 2020-12-18 华南理工大学 Computer-aided decision-making method, device and system based on multi-dimensional data and server

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Yufei Cui et al., "Fully nested neural network for adaptive compression and quantization," Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, 2021-01-07, pp. 2080–2087. *
Ma Yide et al., "An improved BP neural network image compression method based on classification," Journal of Lanzhou University (Natural Sciences), 2005-08-30, Vol. 41, No. 4, pp. 70–72. *

Also Published As

Publication number Publication date
CN115357554A (en) 2022-11-18
WO2024087512A1 (en) 2024-05-02

Similar Documents

Publication Publication Date Title
CN115357554B (en) Graph neural network compression method and device, electronic equipment and storage medium
Jin et al. Data-driven evolutionary optimization
US11861474B2 (en) Dynamic placement of computation sub-graphs
Li et al. Deep reinforcement learning: Framework, applications, and embedded implementations
Gupta et al. Resource usage prediction of cloud workloads using deep bidirectional long short term memory networks
Praveen et al. Low cost PSO using metamodels and inexact pre-evaluation: Application to aerodynamic shape design
CN105488563A (en) Deep learning oriented sparse self-adaptive neural network, algorithm and implementation device
US11263513B2 (en) Method and system for bit quantization of artificial neural network
CN112513886A (en) Information processing method, information processing apparatus, and information processing program
Mammadli et al. The art of getting deep neural networks in shape
WO2019006976A1 (en) Neural network weight discretizing method, system and device, and readable storage medium
CN112529069A (en) Semi-supervised node classification method, system, computer equipment and storage medium
US10410140B1 (en) Categorical to numeric conversion of features for machine learning models
CN114692552A (en) Layout method and device of three-dimensional chip and terminal equipment
Ortega-Zamorano et al. FPGA implementation of neurocomputational models: comparison between standard back-propagation and C-Mantec constructive algorithm
Lyu et al. Efficient factorisation-based Gaussian process approaches for online tracking
Park et al. Continual learning with speculative backpropagation and activation history
Hosseini et al. The evolutionary convergent algorithm: A guiding path of neural network advancement
Wang et al. Balancing memory-accessing and computing over sparse DNN accelerator via efficient data packaging
US12061988B1 (en) Decomposition of ternary weight tensors
Chen et al. An effective surrogate model assisted algorithm for multi-objective optimization: application to wind farm layout design
CN112651492A (en) Self-connection width graph convolution neural network model and training method thereof
JP2021140493A (en) Information processing apparatus, information processing method, and program
Li et al. Forecasting shipping index using CEEMD-PSO-BiLSTM model
CN115934661B (en) Method and device for compressing graphic neural network, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant