CN109918951B - Artificial intelligence processor side channel defense system based on interlayer fusion - Google Patents
- Publication number: CN109918951B
- Application number: CN201910183870.XA
- Authority
- CN
- China
- Prior art keywords
- fusion
- artificial intelligence
- neural network
- intelligence processor
- instruction
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Landscapes
- Image Analysis (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention discloses an artificial intelligence processor side channel defense system based on interlayer fusion, composed of a general artificial intelligence processor architecture, a fusion control unit, a global on-chip cache unit and a stripe fusion unit. A fusion control unit and a global on-chip cache are added to the general architecture, and the neural network model is fused across layers by combining a stripe fusion method with customized fusion instructions, so that the artificial intelligence processor achieves both higher performance and stronger security. The invention is novel in structure, highly adaptable, high-performing and secure; it can be applied to security protection of existing artificial intelligence processors, protection of neural network models and similar settings, and has broad practical value and application prospects.
Description
Technical Field
The invention relates to an artificial intelligence processor side channel defense system based on interlayer fusion, which defends the off-chip DRAM side channel of an artificial intelligence processor and belongs to the field of artificial intelligence processor security.
Background
In recent years, artificial intelligence techniques have been widely used in many commercial fields, such as image recognition, speech recognition and image retrieval. Because deep learning algorithms demand substantial computing power, more and more researchers have turned to research on deep learning accelerators. To design high-performance, low-power, real-time deep learning accelerators, research has been conducted on micro-architecture, circuits, materials and other aspects. In 2014, researchers at the Institute of Computing Technology of the Chinese Academy of Sciences designed DianNao, the first deep learning accelerator, composed of a computing unit, a control unit and a storage unit; to build a more general artificial intelligence processor, they also released the first neural network instruction set, whose compilation support made a variety of deep learning algorithms compatible and achieved better acceleration. In 2017, the Massachusetts Institute of Technology proposed the Eyeriss deep learning accelerator, which accelerates deep learning with a dataflow approach. Google likewise introduced its neural network tensor processor, the TPU, and deployed it in its internal servers. In May 2018, Google unveiled TPU 3.0, whose computing performance is eight times that of TPU 2.0 and can reach 1,000 trillion floating-point operations.
Since inference of a neural network model requires high-performance, low-power dedicated hardware, more and more neural network models are deployed on artificial intelligence processors to improve operating efficiency and real-time performance. Moreover, in many application scenarios the neural network model, including its structure and weights, needs to be protected, for example when a company relies on a neural network to provide valuable value-added services or offers the functionality of the model as a service. The neural network model is then important intellectual property of the company.
Recent literature indicates that artificial intelligence processors are subject to side channel attacks, including memory side channel attacks and timing side channel attacks. Such attacks can recover the structure of the neural network model; by exploiting the vulnerability that pruning introduces into a model, they can also steal its weights; and on instruction-driven artificial intelligence processors an attacker can recover the model structure from captured instructions and learn where the model is stored. Because the artificial intelligence processor faces these threats, a secure defense method is needed to protect it.
At present, fusion processing between neural network layers has been studied both in China and abroad, but only to improve the performance of artificial intelligence processors; these methods are not designed from a security perspective.
Disclosure of Invention
The invention solves the following problem: it overcomes the defects of the prior art by providing an artificial intelligence processor side channel defense system based on interlayer fusion, which reduces the leakage of memory side channel information from the artificial intelligence processor, reduces the data interaction between the processor and the external DRAM, and improves the security of the artificial intelligence processor, while offering high performance, high security and convenience.
The technical scheme of the invention is as follows:
An artificial intelligence processor side channel defense system based on interlayer fusion comprises a general artificial intelligence processor architecture, a fusion control unit, a global on-chip cache unit and a stripe fusion unit. A fusion control unit is added to the general artificial intelligence processor architecture, interlayer fusion instructions are customized for the artificial intelligence processor, and the fusion control unit, driven by these instructions, realizes the fusion processing of the layers of the neural network. A global on-chip cache unit is also added to the general architecture to cache the intermediate data processed by the artificial intelligence processor, namely the input and output feature maps. The stripe fusion method cooperates with the fusion control unit to fuse the layers of the neural network, which reduces memory side channel information leakage, confuses an attacker trying to deduce the structure of the neural network model, and improves the security of the artificial intelligence processor.
The fusion control unit consists of fusion control logic and fusion instruction parsing logic: the fusion control logic controls the artificial intelligence processor's fusion of intermediate data, while the fusion instruction parsing logic parses each fusion instruction and forwards it to the fusion control logic.
The stripe fusion unit is realized with the stripe fusion method and the fusion instructions. Specifically, the input feature map is divided into stripes by an ordinary stripe division method, with the overlapping part determined by the convolution kernel size; the neural network model is then fused across layers, stripe by stripe, according to the stripe fusion method and the customized interlayer fusion instructions, enhancing the security of the artificial intelligence processor while improving its performance.
When the stripe fusion unit applies the stripe fusion method, adjacent stripes must overlap in part of their data. The number of overlapping rows is determined by the size of the convolution kernel:
D = K - 1
where D is the number of overlapping rows and K is the size of the convolution kernel.
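This relation can be checked with a small arithmetic sketch (illustrative only, not code from the patent): a "valid" K x K convolution over S input rows yields S - K + 1 output rows, so adjacent stripes must share D = K - 1 rows for their outputs to tile the full feature map without gaps.

```python
def conv_output_rows(input_rows: int, k: int) -> int:
    """Rows produced by a 'valid' KxK convolution over input_rows rows."""
    return input_rows - k + 1

def overlap_rows(k: int) -> int:
    """Rows that adjacent stripes must share: D = K - 1."""
    return k - 1

# Example: a 6-row input split into two 4-row stripes with a 3x3 kernel.
k = 3
full_out = conv_output_rows(6, k)    # 4 output rows in total
stripe_out = conv_output_rows(4, k)  # 2 output rows per stripe
# With D = K - 1 = 2 overlapping rows, the stripes cover input rows
# 0-3 and 2-5, and their outputs exactly tile the full result.
assert overlap_rows(k) == 2
assert 2 * stripe_out == full_out
```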
Compared with the prior art, the invention has the advantages that:
(1) By adopting an interlayer fusion strategy, the invention reduces memory side channel information leakage from the artificial intelligence processor and the data interaction between the processor and the external DRAM, hides the boundaries between layers of the deep neural network, and effectively defends against leakage of side channel information over the processor's external DRAM interface, improving the security of the artificial intelligence processor while remaining high-performance, secure and convenient.
(2) The invention effectively reduces data transfer between the artificial intelligence processor and the off-chip DRAM and eliminates the boundaries between the layers inside a fused group, so that an attacker cannot deduce the structure of the neural network model from memory side channel information leakage. The invention can be widely used for security protection of artificial intelligence processors, AIoT secure terminals and similar fields, with substantial market benefit and good application prospects. It can also be applied in the design of other artificial intelligence processors to improve their security and protect the models running on them.
(3) The stripe fusion method provided by the invention can be applied to any artificial intelligence processor with an on-chip cache of sufficient size, enhancing the security of the processor without changing the hardware architecture of the existing artificial intelligence processor.
Drawings
FIG. 1 is a diagram of a general artificial intelligence processor architecture;
the symbols in the figures are as follows:
SoC: system on chip; PE: processing element; IFmap: input feature map; OFmap: output feature map.
FIG. 2 is an artificial intelligence processor side channel defense system based on inter-layer fusion;
the symbols in the figures are as follows: SoC is a system on chip; PE is a processing unit; IFmap, inputting a feature map; OFmap, output characteristic diagram, Psum: cumulative sum, SNin: weight cache, NBin: input profile cache, NBout: output profile cache, Pool: pooling operation, Relu: and (4) nonlinear activation.
FIG. 3 is a graph of AlexNet model off-chip DRAM access cycle versus on-chip cache size.
FIG. 4 is a graph of the relationship between the off-chip DRAM access cycle and the on-chip cache size of the VGG network model.
Fig. 5 is a schematic diagram of a stripe fusion method for two convolutional layers, the left graph is an input feature graph, the middle graph is intermediate data processed by a neural network processor, and the right graph is an output feature graph after fusion.
Detailed Description
The present invention will be described in detail below with reference to the accompanying drawings and examples.
As shown in FIG. 1, the hardware architecture of a general artificial intelligence processor comprises a CPU (on which the runtime environment of the artificial intelligence processor runs), an off-chip DRAM, and the artificial intelligence processor itself, which contains a PCIe controller, a memory controller, a processing element array, an on-chip cache and an on-chip interconnect bus. The artificial intelligence processor also needs the support of a software stack, including the neural network model, the processor's compiler and the CPU runtime environment.
To process one layer of the neural network model, the artificial intelligence processor receives the layer's instructions from the CPU, reads the input feature map and the corresponding weight data from the off-chip DRAM according to those instructions, moves them into the on-chip cache, performs the multiply-accumulate operations in the processing element array, completes the nonlinear and pooling operations, and writes the processed data back to the off-chip DRAM, finishing the current layer. After all layers of the model have been processed, the probability of each class of the neural network model is produced.
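The per-layer loop above can be made explicit with a toy model (an illustrative sketch; the tensor names and the two-layer network are invented for the example). It records only the off-chip DRAM events, the part of the processing an outside observer can see:

```python
# Toy model of layer-by-layer processing on a generic AI processor.
# Tensor names and the two-layer network are invented for illustration.
tensors = ["input", "fmap1", "output"]   # fmap1 is the intermediate feature map
trace = []                               # observable off-chip DRAM events

for i, layer in enumerate(["conv1", "conv2"]):
    trace.append(("read", tensors[i]))       # load IFmap (and weights) from DRAM
    # ... multiply-accumulate in the PE array, ReLU and pooling run on-chip ...
    trace.append(("write", tensors[i + 1]))  # spill the OFmap back to off-chip DRAM

# "fmap1" is written by conv1 and then read back by conv2: a read-after-write
# pair on the DRAM bus that reveals the boundary between the two layers.
assert ("write", "fmap1") in trace and ("read", "fmap1") in trace
```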
The artificial intelligence processor generates a large amount of intermediate data, i.e., feature maps, while performing neural network inference. However, the on-chip cache of the artificial intelligence processor is too small to hold this intermediate data, so the intermediate data is transferred to the off-chip DRAM during processing, and the repeated movement of intermediate data between the processor and the off-chip DRAM leaks memory side channel information. An attacker can observe the addresses and read/write types of the interaction between the artificial intelligence processor and the off-chip DRAM, either by probing the bus with a physical probe or by inserting a hardware trojan, and infer the structure of the neural network layers from the observed memory access pattern (read-after-write dependences).
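A minimal sketch of that inference step (the trace format and addresses below are invented for illustration): scanning an observed DRAM trace for addresses that are written and later read back recovers the read-after-write pairs, and hence the number of internal layer boundaries.

```python
def count_layer_boundaries(trace):
    """Count read-after-write (RAW) addresses in an observed DRAM trace.

    trace: list of (op, addr) with op in {"read", "write"}.
    Each address written and later read back marks an intermediate
    feature map crossing the chip boundary, i.e. a layer boundary.
    """
    written = set()
    raw_addrs = set()
    for op, addr in trace:
        if op == "write":
            written.add(addr)
        elif op == "read" and addr in written:
            raw_addrs.add(addr)
    return len(raw_addrs)

# Hypothetical trace of a 3-layer network run layer by layer:
trace = [("read", 0x1000), ("write", 0x2000),   # layer 1
         ("read", 0x2000), ("write", 0x3000),   # layer 2
         ("read", 0x3000), ("write", 0x4000)]   # layer 3
assert count_layer_boundaries(trace) == 2  # two internal boundaries leak
```

Interlayer fusion removes these RAW pairs from the bus by keeping the intermediate feature maps in the global on-chip cache.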
To reduce memory side channel information leakage, the interlayer fusion processing method for neural networks is realized with modest modifications to the general artificial intelligence processor architecture. As shown in FIG. 2, the system of the invention adds a global on-chip cache unit of a certain size and a fusion control unit to the original general artificial intelligence processor structure.
First, the artificial intelligence processor receives fusion instructions from the CPU; then it performs the fusion processing of the designated layers according to those instructions; finally it completes the fusion processing of the whole network and outputs the class probabilities of the neural network model. The fusion control unit defines new fusion instructions within the instruction set of the artificial intelligence processor and performs interlayer fusion according to them, so the control logic of the original artificial intelligence processor can remain unmodified. If the instruction set of the artificial intelligence processor cannot express the interlayer fusion strategy, new fusion control logic must be added to support the new fusion control instructions.
The on-chip cache required by the interlayer fusion strategies of each neural network model (including LeNet, AlexNet, GoogLeNet, VGG and ResNet) was analyzed and corresponding statistics collected. The global on-chip cache size required by the artificial intelligence processor is chosen from the number of feasible fusion strategies at each cache size (a cache size that supports more fusion strategies offers better security; for example, in FIG. 4 a 100-150 KB on-chip cache supports 6 fusion strategies). Although a larger global on-chip cache permits more interlayer fusion strategies, and hence higher performance and security, the global on-chip cache on the artificial intelligence processor cannot be arbitrarily large. Through extensive theoretical analysis and experimental statistics, the invention determines an appropriate global on-chip cache size for the artificial intelligence processor.
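One way to make this selection concrete (an illustrative sketch; the patent does not give this algorithm, and the layer sizes below are invented): enumerate every partition of the network into contiguous fusion groups and count the partitions whose fused intermediate feature maps all fit in a candidate cache. A cache size admitting many feasible strategies is preferred.

```python
from itertools import product

def count_fusion_strategies(inter_sizes, cache_bytes):
    """Count contiguous-layer fusion strategies feasible under a cache budget.

    inter_sizes[i] is the size of the feature map between layer i and i+1.
    A partition of the layers into contiguous groups is feasible when every
    feature map *inside* a fused group fits in the on-chip cache (maps at
    group boundaries go to DRAM anyway).
    """
    n = len(inter_sizes)                     # n cut points between n+1 layers
    feasible = 0
    for cuts in product([0, 1], repeat=n):   # 1 = cut (to DRAM), 0 = fused
        if all(cut == 1 or size <= cache_bytes
               for cut, size in zip(cuts, inter_sizes)):
            feasible += 1
    return feasible

# Invented intermediate feature-map sizes (in KB) for a 5-layer network:
sizes = [120, 80, 200, 60]
assert count_fusion_strategies(sizes, cache_bytes=100) == 4   # only 80 and 60 fusible
assert count_fusion_strategies(sizes, cache_bytes=250) == 16  # every grouping fits
```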
At the same time, a fusion control unit and the stripe fusion method are added: the fusion control unit adds fusion control logic (comprising fusion execution logic and fusion instruction parsing logic) to the control processor of the general artificial intelligence processor and customizes the fusion instructions, while the stripe fusion method divides the input feature map into stripes and performs fusion processing stripe by stripe.
The overall workflow is as follows:
(1) The artificial intelligence processor generates fusion instructions according to the selected fusion strategy.
(2) The artificial intelligence processor receives the fusion instructions, parses them in the fusion control unit, and then fuses the processing of the neural network model according to the stripe fusion method.
(3) After the artificial intelligence processor finishes the current stripe, it writes the fused result to the off-chip DRAM and continues with the remaining stripes according to steps (1) and (2) until the fusion processing of the whole feature map is finished, finally placing the result in the corresponding off-chip DRAM locations.
(4) The layers of the neural network model to be fused are processed according to steps (1) to (3) until all layers of the model are completed, and the probability value of the corresponding class of the neural network model is output.
The fusion control unit receives and parses the fusion control instructions from the CPU, controls the processing element array to process the fused layers of the neural network accordingly, and allocates storage locations in the global on-chip cache for the input and output feature maps.
The global on-chip cache holds the intermediate data processed by the artificial intelligence processor, including the input and output feature maps, thereby reducing data interaction between the processor and the off-chip DRAM. Its size is determined by the various existing neural network models and their fusion strategies; because the area and power budget of the artificial intelligence processor limits on-chip storage, the on-chip cache of a conventional general neural network accelerator is usually small. FIGS. 3 and 4 plot off-chip DRAM access cycles against on-chip cache size for the AlexNet and VGG models, where each fusion strategy determines a cache size and an access cycle count. As shown in the figures, with a cache between 100 KB and 250 KB the artificial intelligence processor can choose among more fusion strategies, that is, it enjoys better security. The size of the on-chip cache can thus be selected reasonably, on the premise that the processor has many fusion strategies available at that cache size.
As shown in FIG. 5, which illustrates the stripe fusion method for two convolutional layers, a stripe fusion method is provided to fuse the layers of the neural network. The left graph is the input feature map, of size 6 x 6; the middle graph is the intermediate data produced by the neural network processor, of size 4 x 4, which must be held in the global on-chip cache; the right graph is the fused output feature map, of size 2 x 2. Both convolutions use a 3 x 3 kernel. As FIG. 5 shows, the input feature map is divided into two stripes: convolving the dashed rectangle of the left graph with the kernel yields the upper stripe of the middle graph, and convolving that intermediate data with the 3 x 3 kernel yields the first row of data in the right graph. The same operations on the lower stripe yield the second row of data in the right graph.
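The figure's arithmetic can be checked directly (a sketch using NumPy, not code from the patent; the random input and kernels are stand-ins for real weights): fusing two valid 3 x 3 convolutions turns a 6 x 6 input into a 4 x 4 intermediate and a 2 x 2 output, and processing the input as two overlapping stripes reproduces the unfused result row by row.

```python
import numpy as np

def conv2d_valid(x, k):
    """Plain 'valid' 2-D convolution (cross-correlation, as in CNNs)."""
    kh, kw = k.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((6, 6))   # input feature map
k1 = rng.standard_normal((3, 3))  # conv-layer-1 kernel
k2 = rng.standard_normal((3, 3))  # conv-layer-2 kernel

# Reference: layer-by-layer processing (intermediate spilled off-chip).
mid = conv2d_valid(x, k1)         # 4x4 intermediate feature map
ref = conv2d_valid(mid, k2)       # 2x2 output feature map
assert mid.shape == (4, 4) and ref.shape == (2, 2)

# Fused: two overlapping input stripes. Each layer needs D = K - 1 = 2
# shared rows, so across the two fused layers the input stripes overlap
# by 2 * (K - 1) = 4 rows; the intermediate stripe never leaves the chip.
row0 = conv2d_valid(conv2d_valid(x[0:5], k1), k2)  # upper stripe -> output row 0
row1 = conv2d_valid(conv2d_valid(x[1:6], k1), k2)  # lower stripe -> output row 1
fused = np.vstack([row0, row1])
assert np.allclose(fused, ref)    # stripes reproduce the unfused result
```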
The stripe size is chosen by the user according to the global on-chip cache size of the artificial intelligence processor and the fusion strategy. Adjacent stripes, however, must overlap in part of their data, such as the shared rows in the middle of the figure. The number of overlapping rows is determined by the size of the convolution kernel:
D = K - 1
where D is the number of overlapping rows and K is the size of the convolution kernel.
To quantify the security of the interlayer fusion method, the current fusion method is considered secure when the chosen fixed global on-chip cache size of the artificial intelligence processor admits many fusion strategies.
The stripe fusion method lets the user flexibly choose the stripe size and fuse as many layers as the on-chip cache allows, so the global on-chip cache is fully utilized.
Claims (2)
1. An artificial intelligence processor side channel defense system based on interlayer fusion, comprising: a general artificial intelligence processor architecture, a fusion control unit, a global on-chip cache unit and a stripe fusion unit; wherein a fusion control unit is added on the basis of the general artificial intelligence processor architecture, fusion instructions between the layers of the neural network model are customized for the artificial intelligence processor, and the fusion control unit combines with the fusion instructions to realize the fusion processing of each layer of the neural network model; a global on-chip cache unit is added to the general artificial intelligence processor architecture for caching intermediate data processed by the artificial intelligence processor, the intermediate data comprising an input feature map and an output feature map; and the stripe fusion method cooperates with the fusion control unit and the fusion instructions to perform fusion processing on each layer of the neural network, thereby reducing memory side channel information leakage, confusing an attacker deducing the structure of the neural network model, and improving the security of the artificial intelligence processor;
the stripe fusion unit is realized with the stripe fusion method and the fusion instructions, the fusion control unit being responsible for parsing and executing the fusion instructions; the specific process is as follows: the input feature map is divided into stripes by an ordinary stripe division method, the overlapping part of the stripes being determined by the convolution kernel size, and the neural network model is then fused across layers, stripe by stripe, according to the stripe fusion method and the customized interlayer fusion instructions, so as to enhance the security of the artificial intelligence processor and improve its performance;
the overall workflow is as follows:
(1) the artificial intelligence processor generates fusion instructions according to the selected fusion strategy;
(2) the artificial intelligence processor receives the fusion instructions, parses them in the fusion control unit, and then fuses the processing of the neural network model according to the stripe fusion method;
(3) after the artificial intelligence processor finishes the current stripe, the fused result is written to the off-chip DRAM, and the remaining stripes are processed according to steps (1) and (2) until the fusion processing of the whole feature map is finished, the final result being placed in the corresponding off-chip DRAM;
(4) the layers of the neural network model to be fused are processed according to steps (1) to (3) until all layers of the model are completed, and the probability value of the corresponding class of the neural network model is output.
2. The system of claim 1, wherein when the stripe fusion unit applies the stripe fusion method, adjacent stripes must overlap in part of their data, the number of overlapping rows being determined by the size of the convolution kernel according to:
D = K - 1
where D is the number of overlapping rows and K is the size of the convolution kernel.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910183870.XA CN109918951B (en) | 2019-03-12 | 2019-03-12 | Artificial intelligence processor side channel defense system based on interlayer fusion |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109918951A CN109918951A (en) | 2019-06-21 |
CN109918951B true CN109918951B (en) | 2020-09-01 |
Family
ID=66964329
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910183870.XA Active CN109918951B (en) | 2019-03-12 | 2019-03-12 | Artificial intelligence processor side channel defense system based on interlayer fusion |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109918951B (en) |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112269992B (en) * | 2020-06-01 | 2023-10-20 | 中国科学院信息工程研究所 | Real-time malicious sample detection method based on artificial intelligent processor and electronic device |
CN112380534B (en) * | 2020-11-12 | 2022-12-09 | 上海电力大学 | Hardware Trojan horse detection method based on circuit structure analysis |
US20240289617A1 (en) * | 2020-12-25 | 2024-08-29 | Cambricon Technologies Corporation Limited | Device, board and method for merging branch structures, and readable storage medium |
CN112767269B (en) * | 2021-01-18 | 2022-11-01 | 北京航空航天大学 | Panoramic image defogging method and device |
US12032690B2 (en) | 2022-07-01 | 2024-07-09 | Nxp B.V. | Method for protecting a machine learning model from a side channel attack |
US12086246B2 (en) | 2022-07-01 | 2024-09-10 | Nxp B.V. | Method for protecting a machine learning model from a side channel attack |
Family Cites Families (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107316058A (en) * | 2017-06-15 | 2017-11-03 | 国家新闻出版广电总局广播科学研究院 | Improve the method for target detection performance by improving target classification and positional accuracy |
US10733755B2 (en) * | 2017-07-18 | 2020-08-04 | Qualcomm Incorporated | Learning geometric differentials for matching 3D models to objects in a 2D image |
US11748607B2 (en) * | 2017-07-31 | 2023-09-05 | Syntiant | Systems and methods for partial digital retraining |
CN108509978B (en) * | 2018-02-28 | 2022-06-07 | 中南大学 | Multi-class target detection method and model based on CNN (CNN) multi-level feature fusion |
CN108764228A (en) * | 2018-05-28 | 2018-11-06 | 嘉兴善索智能科技有限公司 | Word object detection method in a kind of image |
CN109190532A (en) * | 2018-08-21 | 2019-01-11 | 北京深瞐科技有限公司 | It is a kind of based on cloud side fusion face identification method, apparatus and system |
CN109285200B (en) * | 2018-08-23 | 2023-03-14 | 上海连叶智能科技有限公司 | Multimode medical image conversion method based on artificial intelligence |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||