CN112101538B - Graph neural network hardware computing system and method based on in-memory computing - Google Patents
- Publication number
- CN112101538B (application CN202011011776.5A)
- Authority
- CN
- China
- Prior art keywords
- calculation
- module
- memory
- data
- control unit
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
- G06N3/063—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention discloses a graph neural network hardware computing system and method based on in-memory computing, relating to the technical field of in-memory computing hardware. The system comprises two in-memory computing modules, two MUX modules, one DMUX module, one control unit and one ReLU module.
Description
Technical Field
The invention relates to the technical field of in-memory computing hardware, and in particular to a graph neural network hardware computing system and method based on in-memory computing.
Background
Over the last decade, with the rapid growth of computing resources and the availability of large amounts of training data, deep learning and neural networks have rapidly emerged and evolved. Many machine learning tasks (such as object detection, machine translation, and speech recognition) that once relied heavily on manually extracted features have today been thoroughly replaced by various end-to-end deep neural network models, e.g. convolutional neural networks (CNNs), long short-term memory networks (LSTMs), and autoencoders.
Although conventional neural network methods have achieved great success in extracting features from Euclidean-space data, data in many practical application scenarios are generated in non-Euclidean spaces, where the performance of conventional neural networks remains unsatisfactory. For example, in e-commerce, a graph-based learning system can make very accurate recommendations by exploiting the interactions between users and products, but the complexity of the graph makes it very challenging for existing deep learning algorithms to handle. Graphs are irregular: each graph has a variable number of unordered nodes, and each node has a different number of neighbors, so important operations such as convolution, which are easy to compute on images, are difficult on graphs. Furthermore, a core assumption of existing deep learning algorithms is that data samples are independent of one another. This does not hold for graphs, where each data sample (node) has edges connecting it to other data samples (nodes), and this relational information can be used to capture the interdependencies between instances. In recent years, drawing on the ideas of convolutional networks, recurrent networks, and deep autoencoders, researchers have defined and designed graph neural networks specifically for processing graph data. This architecture realizes convolution on a topological graph by means of spectral graph theory: the graph structure in the spatial domain is mapped into the frequency domain by a graph Fourier transform and convolved there, and the inverse transform then maps the result back to the spatial domain, completing the graph convolution operation.
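As an illustration of the spectral approach described above, the following sketch (not part of the patent; NumPy-based, with an assumed symmetric normalized Laplacian and a hypothetical diagonal spectral filter `g`) performs a graph convolution by transforming a node signal into the Laplacian eigenbasis, filtering there, and transforming back:

```python
import numpy as np

def normalized_laplacian(A):
    """Symmetric normalized Laplacian L = I - D^{-1/2} A D^{-1/2}."""
    d = A.sum(axis=1)
    d_inv_sqrt = np.where(d > 0, 1.0 / np.sqrt(d), 0.0)
    return np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]

def spectral_conv(A, x, g):
    """Filter node signal x with spectral coefficients g: map to the
    frequency domain (eigenbasis of L), scale, then invert."""
    L = normalized_laplacian(A)
    lam, U = np.linalg.eigh(L)   # graph Fourier basis (orthonormal)
    x_hat = U.T @ x              # graph Fourier transform
    return U @ (g * x_hat)       # filter, then inverse transform

# toy 3-node path graph
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
x = np.array([1.0, 2.0, 3.0])
y = spectral_conv(A, x, np.ones(3))  # identity filter returns x unchanged
```

With the all-ones (identity) filter, the forward and inverse transforms cancel, which is a convenient sanity check on the eigenbasis.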
Conventional graph neural network computation is performed on an ordinary PC: data are stored on disk, the required data are loaded into memory at computation time, and the computation is completed by the CPU. In this process, both the Laplacian matrix computation and the convolution computation require a large number of memory access operations, so the memory access speed becomes the bottleneck. In addition, since all computation and the control of the entire computation flow are handled by the external CPU, a large amount of computing resources is occupied, preventing the CPU from completing other tasks. How to increase the memory access speed and free up the external computing unit has therefore become an urgent problem.
Disclosure of Invention
The invention aims to solve the bottleneck in conventional graph neural network computation. The proposed computing system maps the graph convolution operation onto hardware by means of in-memory computing modules, unifying the computation and storage of data: all operations are completed directly in the storage units, and the entire computation flow is controlled by an internal control unit. This completely eliminates memory access operations, frees up the computing power of the external ALU, greatly improves the overall computational efficiency of the system, and reduces operating power consumption.
The technical scheme adopted by the invention is as follows:
The invention provides a graph neural network hardware computing system based on in-memory computing, comprising two in-memory computing modules, two MUX modules, one DMUX module, one control unit and one ReLU module:
the two in-memory computing modules are used to complete the Laplacian matrix calculation and the neural network convolution calculation, respectively;
the two MUX modules select the input data of the two in-memory computing modules according to the instructions of the control unit;
the DMUX module selects the output direction of the data according to the instruction of the control unit, either outputting the data externally or passing it back to a MUX module for the next round of computation;
the ReLU module applies the activation function to the result of an in-memory computing module, yielding the final result of one neural network layer, and then sends that result to the subsequent DMUX module;
the control unit generates the control signals of all other modules and controls the entire computation flow.
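The routing role of the MUX and DMUX modules can be sketched behaviorally as follows. This is a hypothetical software model, not the patented circuit, and the select-signal encoding (0/1) is an assumption:

```python
def mux(sel, external_in, feedback_in):
    """2-to-1 MUX: the control unit's select signal chooses between
    fresh external data and the DMUX feedback path."""
    return external_in if sel == 0 else feedback_in

def dmux(sel, data):
    """1-to-2 DMUX: route the result to the output port (computation
    finished) or back to the MUX for the next round."""
    return (data, None) if sel == 0 else (None, data)

# one feedback round: the layer result loops back, then is emitted
result = "layer-1 output"
out_port, fb = dmux(1, result)   # not finished: feed back
next_in = mux(1, "unused", fb)   # MUX picks the feedback path
out_port, fb = dmux(0, next_in)  # finished: emit on the output port
```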
Further, the two in-memory computing modules have the same internal structure; each comprises a storage cell array, a mode control module and a read-write transmission module;
the storage cell array is composed of basic storage cells and is used both to store data and to compute on data;
the mode control module receives instructions from the external control unit, activates the corresponding storage cells, and switches the array between its two working modes, namely the Laplacian matrix calculation mode and the convolution calculation mode;
the read-write transmission module buffers the input and output data of the storage cells: it writes external input data into the cells, feeds input data into the cells for computation, outputs computation results to subsequent modules, and reads data out of the cells.
Further, the storage cells are RRAM, MRAM or Flash.
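How a storage cell array of this kind (RRAM, MRAM or Flash) multiplies a matrix in place can be modeled as follows. This is a simplified sketch, not the patented circuit: matrix entries are assumed to be mapped to cell conductances, the multiply-accumulate happens via Ohm's and Kirchhoff's laws, and `g_max` and `noise_std` are hypothetical device parameters:

```python
import numpy as np

def crossbar_matvec(G, v, g_max=1.0, noise_std=0.0):
    """Analog crossbar matrix-vector multiply: matrix entries are stored
    as cell conductances G (clipped to the device range [0, g_max]);
    applying the input vector v as row voltages yields column currents
    I_j = sum_i v_i * G_ij (Ohm's law plus Kirchhoff's current law)."""
    G = np.clip(G, 0.0, g_max)
    I = v @ G
    if noise_std > 0.0:  # optional model of analog device noise
        I = I + np.random.normal(0.0, noise_std, I.shape)
    return I

G = np.array([[0.2, 0.5],
              [0.4, 0.1],
              [0.3, 0.3]])
v = np.array([1.0, 0.5, 2.0])
out = crossbar_matvec(G, v)  # equals v @ G in the ideal noiseless case
```

The key property this models is that the multiply-accumulate happens where the data are stored, so no operands travel across a memory bus.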
The graph neural network hardware computing method based on in-memory computing uses the computing system described above and comprises the following workflow:
Step 1: obtain the external graph data in the form of a graph adjacency matrix, together with the feature vectors of the graph nodes;
Step 2: perform a global reset and initialize all modules in the system;
Step 3: write the neural network weights into an in-memory computing module;
Step 4: input the graph adjacency matrix into the system; the control unit generates a control signal directing a MUX module to feed the input data into the read-write transmission module of the in-memory computing module used for Laplacian matrix calculation;
Step 5: the control unit generates the control signals of the in-memory computing module of step 4; its mode control module sets the storage cells to Laplacian matrix calculation mode and activates the corresponding cells, so that the Laplacian matrix is computed and the result is stored;
Step 6: input the feature vectors of the graph nodes into the system; the control unit generates a control signal directing a MUX module to feed the input data into the read-write transmission module of the in-memory computing module used for Laplacian matrix calculation;
Step 7: the control unit generates the control signals of the in-memory computing module of step 6; its mode control module sets the storage cells to matrix multiplication mode, so that the Laplacian matrix is multiplied with the input data, and the result is output through the read-write transmission module to the subsequent MUX module;
Step 8: the control unit directs the MUX module of step 7 to feed its data into the read-write transmission module of the in-memory computing module used for convolution calculation;
Step 9: the control unit generates the control signals of the in-memory computing module of step 8; its mode control module sets the storage cells to matrix multiplication mode, so that the weight matrix is convolved with the input data, and the result is output to the subsequent ReLU module;
Step 10: the ReLU module applies the ReLU activation function to the convolution result of step 9 and outputs the activated result to the subsequent DMUX module;
Step 11: the control unit generates the DMUX control signal and selects the destination of the result: if the computation is not finished, the DMUX passes the result back to the MUX module and the flow jumps to step 6; otherwise step 12 is executed;
Step 12: after the computation is completed, the system outputs the final result, all buffered data are cleared, all modules in the system are reset, and the entire operation flow ends.
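Steps 4-12 above amount to the forward pass of a graph convolutional network. The following software model (an illustrative sketch, not the hardware; it assumes the unnormalized Laplacian L = D - A) mirrors the workflow:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def laplacian(A):
    """Steps 4-5: Laplacian of the input adjacency matrix, L = D - A."""
    return np.diag(A.sum(axis=1)) - A

def gnn_forward(A, X, weights):
    """Steps 4-12 as a software model: compute L once, then per layer
    multiply by L (first in-memory module), convolve with the layer
    weights (second module), apply ReLU, and loop via the DMUX."""
    L = laplacian(A)        # steps 4-5
    H = X                   # step 6: node feature vectors
    for W in weights:       # control unit repeats steps 6-11 per layer
        H = relu(L @ H @ W) # steps 7-10
    return H                # step 12: final output

A = np.array([[0., 1.], [1., 0.]])  # toy 2-node graph
X = np.eye(2)
W1 = np.ones((2, 2))                # hypothetical layer weights
out = gnn_forward(A, X, [W1])
```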
In summary, the technical scheme of the invention has the following beneficial effect:
1. Compared with a conventional computer architecture, the proposed hardware computing system moves all operations into the storage cells, and the entire computation flow is completed by the internal control unit. This completely eliminates memory access operations, frees up the computing power of the external computing unit, greatly improves the overall computational efficiency of the system, and reduces operating power consumption.
Drawings
For a clearer description of the technical solutions of the embodiments of the present invention, the drawings needed in the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the invention and should not be considered as limiting its scope; those skilled in the art may obtain other related drawings from them without creative effort. The proportions of the components in the drawings do not represent actual proportions in material selection or design; the drawings are merely structural or positional schematics.
FIG. 1 is a schematic diagram of a hardware computing system architecture of the present invention;
FIG. 2 is a schematic diagram illustrating an internal structure of a memory computing module according to the present invention;
FIG. 3 is a schematic diagram of the neural network architecture of the present invention for computation;
FIG. 4 is a schematic diagram of the data flow of the present invention in practice;
FIG. 5 is a flowchart of the computing method of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples in order to make its objects, technical solutions and advantages more apparent. It should be understood that the particular embodiments described herein are illustrative only and are not intended to limit the invention; the embodiments described are merely some, not all, of the embodiments of the invention. The components of the embodiments, as generally described and illustrated in the figures, may be arranged and designed in a wide variety of different configurations.
The present invention will be described in detail with reference to the accompanying drawings.
Example 1
As shown in FIG. 1, the present invention is a graph neural network hardware computing system based on in-memory computing, comprising two in-memory computing modules, two MUX modules, one DMUX module, one control unit and one ReLU module:
the two in-memory computing modules are used to complete the Laplacian matrix calculation and the neural network convolution calculation, respectively;
the two MUX modules select the input data of the two in-memory computing modules according to the instructions of the control unit;
the DMUX module selects the output direction of the data according to the instruction of the control unit, either outputting the data externally or passing it back to a MUX module for the next round of computation;
the ReLU module applies the activation function to the result of an in-memory computing module, yielding the final result of one neural network layer, and then sends that result to the subsequent DMUX module;
the control unit generates the control signals of all other modules and controls the entire computation flow.
As shown in FIG. 2, the two in-memory computing modules have the same internal structure; each comprises a storage cell array, a mode control module and a read-write transmission module;
the storage cell array is composed of basic storage cells and is used both to store data and to compute on data;
the mode control module receives instructions from the external control unit, activates the corresponding storage cells, and switches the array between its two working modes, namely the Laplacian matrix calculation mode and the convolution calculation mode;
the read-write transmission module buffers the input and output data of the storage cells: it writes external input data into the cells, feeds input data into the cells for computation, outputs computation results to subsequent modules, and reads data out of the cells.
The storage cells are RRAM, MRAM or Flash.
In this embodiment, the two in-memory computing modules are used to complete the Laplacian matrix calculation and the neural network convolution calculation, respectively. Each module has two computation modes. Before computation starts, the Laplacian module computes the Laplacian matrix of the input graph; during operation it multiplies the input data with the Laplacian matrix to produce the data used for convolution. The convolution module stores the neural network weight data before computation starts; during computation it multiplies that data with the weights, finally producing the convolution result of the input data.
FIG. 3 is a schematic diagram of the neural network structure used for computation. The network is divided into two parts: hidden layers and activation function layers (ReLU). A hidden layer convolves the output of the previous layer with that layer's weights, and the activation layer applies the activation function to the convolution result. The input of the graph neural network is graph information; one convolution operation followed by one activation completes the computation of one network layer, and the final graph information is output after all layers have been computed.
FIG. 4 is a schematic diagram of the data flow in a practical application. The proposed hardware computing system can process any data that can be represented as a graph, such as user-item graphs, scene graphs, point clouds, traffic road graphs, and molecular structure graphs. However, a graph is abstract data and cannot be computed on directly, so the adjacency matrix of the graph and the feature vectors of its nodes are obtained first and then fed into the hardware computing system. After computation, final graph information such as user-item relation prediction, scene generation, object recognition, action recognition, or semantic segmentation is obtained.
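The conversion from an abstract graph to the adjacency matrix and node feature matrix the system consumes can be sketched as follows (an illustrative helper with hypothetical names; undirected, unweighted edges are assumed):

```python
import numpy as np

def graph_to_tensors(num_nodes, edges, features):
    """Turn an abstract graph (edge list plus a per-node feature dict)
    into the adjacency matrix A and feature matrix X fed to the system."""
    A = np.zeros((num_nodes, num_nodes))
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0  # undirected edge
    X = np.array([features[n] for n in range(num_nodes)])
    return A, X

# hypothetical 3-node graph in a user-item style application
edges = [(0, 1), (1, 2)]
feats = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
A, X = graph_to_tensors(3, edges, feats)
```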
Example two
This example further illustrates the invention.
The graph neural network hardware computing method based on in-memory computing uses the computing system of the first embodiment and comprises the following workflow:
Step 1: obtain the external graph data in the form of a graph adjacency matrix, together with the feature vectors of the graph nodes;
Step 2: perform a global reset and initialize all modules in the system;
Step 3: write the neural network weights into an in-memory computing module;
Step 4: input the graph adjacency matrix into the system; the control unit generates a control signal directing a MUX module to feed the input data into the read-write transmission module of the in-memory computing module used for Laplacian matrix calculation;
Step 5: the control unit generates the control signals of the in-memory computing module of step 4; its mode control module sets the storage cells to Laplacian matrix calculation mode and activates the corresponding cells, so that the Laplacian matrix is computed and the result is stored;
Step 6: input the feature vectors of the graph nodes into the system; the control unit generates a control signal directing a MUX module to feed the input data into the read-write transmission module of the in-memory computing module used for Laplacian matrix calculation;
Step 7: the control unit generates the control signals of the in-memory computing module of step 6; its mode control module sets the storage cells to matrix multiplication mode, so that the Laplacian matrix is multiplied with the input data, and the result is output through the read-write transmission module to the subsequent MUX module;
Step 8: the control unit directs the MUX module of step 7 to feed its data into the read-write transmission module of the in-memory computing module used for convolution calculation;
Step 9: the control unit generates the control signals of the in-memory computing module of step 8; its mode control module sets the storage cells to matrix multiplication mode, so that the weight matrix is convolved with the input data, and the result is output to the subsequent ReLU module;
Step 10: the ReLU module applies the ReLU activation function to the convolution result of step 9 and outputs the activated result to the subsequent DMUX module;
Step 11: the control unit generates the DMUX control signal and selects the destination of the result: if the computation is not finished, the DMUX passes the result back to the MUX module and the flow jumps to step 6; otherwise step 12 is executed;
Step 12: after the computation is completed, the system outputs the final result, all buffered data are cleared, all modules in the system are reset, and the entire operation flow ends.
As shown in FIG. 5, the whole computation process of the hardware computing system can be divided into five simple steps:
First, the weights of the graph neural network are written into the in-memory computing module. The input data are then converted into graph form (e.g., a user-item relation graph, scene graph, point cloud graph, traffic road graph, or molecular structure graph), from which the graph adjacency matrix and node feature vectors are obtained. These data are input into the in-memory computing system for computation, and finally the result is obtained, such as user-item relation prediction, scene generation, object recognition, action recognition, or semantic segmentation.
The above description covers only a preferred embodiment of the present invention, but the scope of the invention is not limited thereto; any change or substitution that can be readily conceived by those skilled in the art within the technical scope of the invention should be included in its scope. The protection scope of the invention is therefore defined by the claims.
Claims (3)
1. A graph neural network hardware computing system based on in-memory computing, comprising two in-memory computing modules, two MUX modules, one DMUX module, one control unit and one ReLU module, characterized in that:
the two in-memory computing modules are used to complete the Laplacian matrix calculation and the neural network convolution calculation, respectively;
the two MUX modules select the input data of the two in-memory computing modules according to the instructions of the control unit;
the DMUX module selects the output direction of the data according to the instruction of the control unit, either outputting the data externally or passing it back to a MUX module for the next round of computation;
the ReLU module applies the activation function to the result of an in-memory computing module, yielding the final result of one neural network layer, and then sends that result to the subsequent DMUX module;
the control unit generates the control signals of all other modules and controls the entire computation flow;
the two in-memory computing modules have the same internal structure, each comprising a storage cell array, a mode control module and a read-write transmission module;
the storage cell array is composed of basic storage cells and is used both to store data and to compute on data;
the mode control module receives instructions from the external control unit, activates the corresponding storage cells, and switches the array between its two working modes, namely the Laplacian matrix calculation mode and the convolution calculation mode;
the read-write transmission module buffers the input and output data of the storage cells: it writes external input data into the cells, feeds input data into the cells for computation, outputs computation results to subsequent modules, and reads data out of the cells.
2. The in-memory-computing-based graph neural network hardware computing system of claim 1, wherein the storage cells are RRAM, MRAM or Flash.
3. A graph neural network hardware computing method based on in-memory computing, using a computing system according to any one of claims 1-2 and comprising the following workflow:
Step 1: obtain the external graph data in the form of a graph adjacency matrix, together with the feature vectors of the graph nodes;
Step 2: perform a global reset and initialize all modules in the system;
Step 3: write the neural network weights into an in-memory computing module;
Step 4: input the graph adjacency matrix into the system; the control unit generates a control signal directing a MUX module to feed the input data into the read-write transmission module of the in-memory computing module used for Laplacian matrix calculation;
Step 5: the control unit generates the control signals of the in-memory computing module of step 4; its mode control module sets the storage cells to Laplacian matrix calculation mode and activates the corresponding cells, so that the Laplacian matrix is computed and the result is stored;
Step 6: input the feature vectors of the graph nodes into the system; the control unit generates a control signal directing a MUX module to feed the input data into the read-write transmission module of the in-memory computing module used for Laplacian matrix calculation;
Step 7: the control unit generates the control signals of the in-memory computing module of step 6; its mode control module sets the storage cells to matrix multiplication mode, so that the Laplacian matrix is multiplied with the input data, and the result is output through the read-write transmission module to the subsequent MUX module;
Step 8: the control unit directs the MUX module of step 7 to feed its data into the read-write transmission module of the in-memory computing module used for convolution calculation;
Step 9: the control unit generates the control signals of the in-memory computing module of step 8; its mode control module sets the storage cells to matrix multiplication mode, so that the weight matrix is convolved with the input data, and the result is output to the subsequent ReLU module;
Step 10: the ReLU module applies the ReLU activation function to the convolution result of step 9 and outputs the activated result to the subsequent DMUX module;
Step 11: the control unit generates the DMUX control signal and selects the destination of the result: if the computation is not finished, the DMUX passes the result back to the MUX module and the flow jumps to step 6; otherwise step 12 is executed;
Step 12: after the computation is completed, the system outputs the final result, all buffered data are cleared, all modules in the system are reset, and the entire operation flow ends.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011011776.5A CN112101538B (en) | 2020-09-23 | 2020-09-23 | Graphic neural network hardware computing system and method based on memory computing |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011011776.5A CN112101538B (en) | 2020-09-23 | 2020-09-23 | Graphic neural network hardware computing system and method based on memory computing |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112101538A CN112101538A (en) | 2020-12-18 |
CN112101538B true CN112101538B (en) | 2023-11-17 |
Family
ID=73756004
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011011776.5A Active CN112101538B (en) | 2020-09-23 | 2020-09-23 | Graphic neural network hardware computing system and method based on memory computing |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112101538B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113657577B (en) * | 2021-07-21 | 2023-08-18 | 阿里巴巴达摩院(杭州)科技有限公司 | Model training method and computing system |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106210593A (en) * | 2016-08-19 | 2016-12-07 | 京东方科技集团股份有限公司 | Display control unit, display control method and display device |
CN109032781A (en) * | 2018-07-13 | 2018-12-18 | 重庆邮电大学 | A kind of FPGA parallel system of convolutional neural networks algorithm |
CN109102065A (en) * | 2018-06-28 | 2018-12-28 | 广东工业大学 | A kind of convolutional neural networks accelerator based on PSoC |
CN209231976U (en) * | 2018-12-29 | 2019-08-09 | 南京宁麒智能计算芯片研究院有限公司 | A kind of accelerator of restructural neural network algorithm |
CN111126668A (en) * | 2019-11-28 | 2020-05-08 | 中国人民解放军国防科技大学 | Spark operation time prediction method and device based on graph convolution network |
Non-Patent Citations (1)
Title |
---|
Research on GPU-based deep neural network optimization methods; Chen Yiming; China Master's Theses Full-text Database; I138-1193 *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2018171717A1 (en) | Automated design method and system for neural network processor | |
CN110633153A (en) | Method for realizing neural network model splitting by using multi-core processor and related product | |
WO2022027937A1 (en) | Neural network compression method, apparatus and device, and storage medium | |
CN112651438A (en) | Multi-class image classification method and device, terminal equipment and storage medium | |
CN110826708B (en) | Method for realizing neural network model splitting by using multi-core processor and related product | |
JP2020187736A (en) | Learning data generation method for classifier learning having regional features, and system thereof | |
CN112163601A (en) | Image classification method, system, computer device and storage medium | |
CN116450486B (en) | Modeling method, device, equipment and medium for nodes in multi-element heterogeneous computing system | |
TW202138999A (en) | Data dividing method and processor for convolution operation | |
CN115457492A (en) | Target detection method and device, computer equipment and storage medium | |
CN112101538B (en) | Graphic neural network hardware computing system and method based on memory computing | |
CN116401552A (en) | Classification model training method and related device | |
CN112990433B (en) | Model time consumption prediction method and device, electronic equipment and storage medium | |
CN114330686A (en) | Configurable convolution processing device and convolution calculation method | |
CN109389215B (en) | Network structure determination method and device of deep learning network | |
US20220101120A1 (en) | Interpretable visualization system for graph neural network | |
CN112200310B (en) | Intelligent processor, data processing method and storage medium | |
Chen et al. | Research on object detection algorithm based on improved Yolov5 | |
CN116797973A (en) | Data mining method and system applied to sanitation intelligent management platform | |
Ying et al. | EdgePC: Efficient deep learning analytics for point clouds on edge devices | |
Gaihua et al. | Instance segmentation convolutional neural network based on multi-scale attention mechanism | |
JP7488375B2 (en) | Method, device and computer-readable storage medium for generating neural networks | |
KR20200061154A (en) | Method and apparatus of analyzing diagram containing visual and textual information | |
CN108388886A (en) | Method, apparatus, terminal and the computer readable storage medium of image scene identification | |
CN115294361A (en) | Feature extraction method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||