CN112348174A - Fault-tolerant recurrent neural network architecture searching method and system - Google Patents

Fault-tolerant recurrent neural network architecture searching method and system

Info

Publication number
CN112348174A
Authority
CN
China
Prior art keywords
neural network
recurrent neural
fault
network architecture
representing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011356453.XA
Other languages
Chinese (zh)
Inventor
王蕾
胡凯
丁东
田烁
冯权友
周理
郑重
励楠
冯超超
王永文
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National University of Defense Technology
Original Assignee
National University of Defense Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National University of Defense Technology
Priority to CN202011356453.XA
Publication of CN112348174A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a fault-tolerant recurrent neural network architecture searching method and system. For each new candidate recurrent neural network architecture, the method calculates the perplexity ppl_c of normal evaluation and the perplexity ppl_f after fault injection, computes the reward R of each new candidate recurrent neural network architecture as a weighted combination of ppl_c and ppl_f, and updates the population based on the reward R, so that the search algorithm is continuously encouraged by the reward to find an optimal recurrent neural network architecture with high performance and high fault tolerance. The obtained optimal recurrent neural network architecture therefore has very strong computation fault tolerance and weight-storage fault tolerance, and the found optimal recurrent neural network architecture has better robustness when deployed on FPGA (field programmable gate array) and ASIC (application specific integrated circuit) hardware and on RRAM (resistive random access memory) based accelerators.

Description

Fault-tolerant recurrent neural network architecture searching method and system
Technical Field
The invention relates to an architecture optimization technology of a Recurrent Neural Network (RNN) on a hardware platform, in particular to a fault-tolerant recurrent neural network architecture searching method and system.
Background
A recurrent neural network (RNN) is a deep neural network that mainly takes sequence data as input, and has important applications in natural language processing fields such as speech recognition, language modeling and machine translation. Its variants include the long short-term memory network (LSTM), the gated recurrent unit network (GRU), and others. Neural architecture search is a sub-field of AutoML. It was first proposed by Google, and Amazon, Microsoft and others followed; in China, Huawei's Noah's Ark Lab and the Xiaomi AI Lab have also carried out a great deal of research. Neural architecture search aims to replace the work of human experts in designing neural network architectures with large amounts of computing power, and has achieved top performance in tasks such as image recognition, text prediction and object detection. Fault tolerance is a concept created to cope with hardware faults. Faults have many causes, including weak solder joints, radiation, temperature and humidity changes, device aging, and so on. The hardware of concern herein is mainly the accelerators used to deploy neural networks. The main neural network accelerator devices at present include FPGAs, ASICs, and RRAM-based accelerators. FPGAs and ASICs are hardware devices based on CMOS circuits, which become increasingly sensitive to environmental noise as CMOS fabrication processes shrink. RRAM-based accelerators are prone to hard faults such as SAFs (stuck-at faults) because their fabrication technology is still immature. The most traditional way of achieving fault tolerance is hardware redundancy, but this increases area and power consumption. Sensitivity analysis is another fault-tolerance approach, which detects the fault-prone parts of a circuit and applies redundancy protection to them. Fault-tolerant training is a fault-tolerance approach for neural networks: by injecting faults during the training of the network weights, the trained neural network acquires a certain degree of fault tolerance.
Faults that may be encountered when deploying an RNN network on a hardware device can be classified into two categories: computation faults and weight-storage faults. Computation faults occur mainly on CMOS-based hardware platforms such as FPGAs and ASICs, where the probability of soft errors is much higher than that of hard errors. Weight-storage faults occur mainly on RRAM-based accelerators. Therefore, for the soft errors and hard errors that may occur when deploying a neural network on edge devices, how to achieve fault tolerance has become a key technical problem that urgently needs to be solved.
Disclosure of Invention
The technical problem to be solved by the invention is as follows: aiming at the problems in the prior art, the invention provides a fault-tolerant recurrent neural network architecture searching method and system, which improve fault tolerance through architecture optimization, so that a recurrent neural network architecture with high performance and high fault tolerance is finally found, effectively addressing the soft errors and hard errors encountered when a neural network is deployed on edge devices.
In order to solve the technical problems, the invention adopts the technical scheme that:
a fault-tolerant recurrent neural network architecture search method comprises the following steps:
1) for a target recurrent neural network assigned preset shared weights, randomly initializing a population containing a plurality of candidate recurrent neural network architectures as the current population;
2) calculating, for each candidate recurrent neural network architecture in the current population, the perplexity ppl_c of normal evaluation and the perplexity ppl_f after fault injection, and calculating the reward R of each candidate recurrent neural network architecture as a weighted combination of ppl_c and ppl_f;
3) generating new candidate recurrent neural network architectures by taking each candidate recurrent neural network architecture in the current population as a parent architecture, calculating, for each new candidate recurrent neural network architecture, the perplexity ppl_c of normal evaluation and the perplexity ppl_f after fault injection, and calculating the reward R of each new candidate recurrent neural network architecture as a weighted combination of ppl_c and ppl_f;
4) if the reward R of a new candidate recurrent neural network architecture is better (i.e., lower) than the optimal reward in the current population, adding the new candidate recurrent neural network architecture to the current population and deleting the oldest candidate recurrent neural network architecture in the current population, so that the total population size remains unchanged;
5) judging whether the number of iteration rounds R_rounds is greater than or equal to a preset threshold R_max_rounds; if not, jumping back to step 3) to continue the iterative search; otherwise, outputting the candidate recurrent neural network architecture corresponding to the optimal reward in the current population as the optimal recurrent neural network architecture obtained by the search.
Optionally, the calculation function expression of the reward R is as follows:
R = (1 – α_r)*ppl_c + α_r*ppl_f
In the above formula, α_r is a preset weight coefficient with a value range of [0,1], ppl_c represents the perplexity of normal evaluation of the candidate recurrent neural network architecture, and ppl_f represents the perplexity of the candidate recurrent neural network architecture after fault injection.
Optionally, step 2) is preceded by a step of training the target recurrent neural network to generate a preset shared weight, and the calculation function expression of the loss function for the weight in the training process is as follows:
L = (1 – α_l)*CE_c + α_l*CE_f
In the above formula, L represents the loss function for the weights, α_l is a preset weight coefficient with a value range of [0,1], CE_c represents the cross-entropy loss of the searched recurrent neural network architecture under normal training, and CE_f represents the cross-entropy loss of the searched recurrent neural network architecture after fault injection.
Optionally, the functional expression of the target recurrent neural network is as follows:
c_0 = sigmoid(W_xc x_t + W_hc h_prev),
h_0 = c_0 tanh(W_xh x + W_hh h_prev) + (1 − c_0) h_prev,
c_t = sigmoid(W_c c_(t−1)),
h_t = c_t f(W_h h_(t−1)) + (1 − c_t) h_(t−1)
In the above formula, W_xc represents the input-to-initial-cell-state weight matrix, W_hc represents the hidden-layer-to-cell-state weight matrix, W_xh represents the input-to-hidden-layer weight matrix, W_hh represents the weight matrix from the previous hidden layer to the current hidden layer, W_c represents the cell state weight matrix, W_h represents the hidden layer weight matrix, h_prev represents the output of the previous hidden layer, c_0 and h_0 respectively represent the cell state vector and the hidden state vector at the initial time, c_t and h_t respectively represent the cell state vector and the hidden state vector at time t, c_(t−1) and h_(t−1) respectively represent the cell state vector and the hidden state vector at time t−1, the function f() represents the activation function selected at each node, x_t represents the initial input vector, and x represents the current input vector.
Optionally, step 5) is followed by a step of constructing computation faults and weight faults to verify the obtained optimal recurrent neural network architecture.
Optionally, the functional expression of the optimal recurrent neural network architecture after a computation fault is constructed is as follows:
b = θ 2^(α−1) (−1)^β,
c_0 = sigmoid(W_xc x_t + W_hc h_prev + b_1),
h_0 = c_0 tanh(W_xh x + W_hh h_prev + b_2) + (1 − c_0) h_prev,
c_t = sigmoid(W_c c_(t−1)),
h_t = c_t f(W_h h_(t−1)) + (1 − c_t) h_(t−1) + b_3
In the above formula, b represents the bit flip of a random bit, θ indicates which weight fails and follows a Bernoulli distribution, α represents the bit position of the bias and follows a uniform distribution, β determines whether the bias is positive or negative and follows a uniform distribution, and b_1, b_2 and b_3 are independently generated bit flips b; W_xc represents the input-to-initial-cell-state weight matrix, W_hc represents the hidden-layer-to-cell-state weight matrix, W_xh represents the input-to-hidden-layer weight matrix, W_hh represents the weight matrix from the previous hidden layer to the current hidden layer, W_c represents the cell state weight matrix, W_h represents the hidden layer weight matrix, h_prev represents the output of the previous hidden layer, c_0 and h_0 respectively represent the cell state vector and the hidden state vector at the initial time, c_t and h_t respectively represent the cell state vector and the hidden state vector at time t, c_(t−1) and h_(t−1) respectively represent the cell state vector and the hidden state vector at time t−1, the function f() represents the activation function selected at each node, x_t represents the initial input vector, x represents the current input vector, and sigmoid represents the sigmoid activation function.
Optionally, constructing the weight fault comprises performing, on a target matrix among the input-to-initial-cell-state weight matrix W_xc, the hidden-layer-to-cell-state weight matrix W_hc, the input-to-hidden-layer weight matrix W_xh, the weight matrix W_hh from the previous hidden layer to the current hidden layer, the cell state weight matrix W_c and the hidden layer weight matrix W_h, the processing shown by the following equations:
W_1 = (1 – θ) W_0 + θ e,
θ ~ Bernoulli(p_0 + p_1)^(H×W),
m ~ Bernoulli(p_1/(p_0 + p_1))^(H×W),
e = R_w sgn(W_0) m
In the above formula, W_1 is the target matrix after the weight fault is constructed, W_0 is the target matrix before the weight fault is constructed, θ indicates which weights fail and follows a Bernoulli distribution, e represents the fault target value, p_0 represents the probability of a high-resistance-state fault among hard faults, p_1 represents the probability of a low-resistance-state fault among hard faults, H and W respectively represent the height and width of the target matrix, m indicates the type of fault when a fault occurs and follows a Bernoulli distribution, Bernoulli denotes the Bernoulli distribution, and R_w represents the maximum of the data range that the hardware can represent.
In addition, the invention also provides a fault-tolerant recurrent neural network architecture search system, which comprises a computer device, wherein the computer device comprises a microprocessor and a memory which are connected with each other, and the microprocessor is programmed or configured to execute the steps of the fault-tolerant recurrent neural network architecture search method.
In addition, the invention also provides a fault-tolerant recurrent neural network architecture search system, which comprises a computer device, wherein the computer device comprises a microprocessor and a memory which are connected with each other, and a computer program which is programmed or configured to execute the fault-tolerant recurrent neural network architecture search method is stored in the memory.
Furthermore, the present invention also provides a computer-readable storage medium having stored therein a computer program programmed or configured to perform the fault-tolerant recurrent neural network architecture search method.
Compared with the prior art, the invention has the following advantages: the invention adopts an evolutionary algorithm to search the recurrent neural network architecture, calculates the perplexity ppl_c of normal evaluation and the perplexity ppl_f after fault injection of each new candidate recurrent neural network architecture, computes the reward R of each new candidate recurrent neural network architecture as a weighted combination of ppl_c and ppl_f, and updates the population based on the reward R, thereby encouraging the search algorithm to continuously search for candidate architectures with high performance and high fault tolerance according to the reward. The obtained optimal recurrent neural network architecture has strong computation fault tolerance and weight-storage fault tolerance, and the found optimal recurrent neural network architecture has better robustness when deployed on FPGA and ASIC hardware and on RRAM-based accelerators.
Drawings
FIG. 1 is a schematic diagram of a basic flow of a method according to an embodiment of the present invention.
FIG. 2 is a schematic overall flow chart of a method according to an embodiment of the present invention.
Fig. 3 shows two schematic structural diagrams of a target recurrent neural network in an embodiment of the present invention.
Fig. 4 is an experimental result of a calculation fault experiment in the embodiment of the present invention.
Fig. 5 is an experimental result of a weight storage failure experiment in the embodiment of the present invention.
Detailed Description
As shown in fig. 1, the method for searching a fault-tolerant recurrent neural network architecture of the present embodiment includes:
1) for a target recurrent neural network assigned preset shared weights, randomly initializing a population containing a plurality of candidate recurrent neural network architectures as the current population;
2) calculating, for each candidate recurrent neural network architecture in the current population, the perplexity ppl_c of normal evaluation and the perplexity ppl_f after fault injection, and calculating the reward R of each candidate recurrent neural network architecture as a weighted combination of ppl_c and ppl_f;
3) generating new candidate recurrent neural network architectures by taking each candidate recurrent neural network architecture in the current population as a parent architecture, calculating, for each new candidate recurrent neural network architecture, the perplexity ppl_c of normal evaluation and the perplexity ppl_f after fault injection, and calculating the reward R of each new candidate recurrent neural network architecture as a weighted combination of ppl_c and ppl_f;
4) if the reward R of a new candidate recurrent neural network architecture is better (i.e., lower) than the optimal reward in the current population, adding the new candidate recurrent neural network architecture to the current population and deleting the oldest candidate recurrent neural network architecture in the current population, so that the total population size remains unchanged;
5) judging whether the number of iteration rounds R_rounds is greater than or equal to a preset threshold R_max_rounds; if not, jumping back to step 3) to continue the iterative search; otherwise, outputting the candidate recurrent neural network architecture corresponding to the optimal reward in the current population as the optimal recurrent neural network architecture obtained by the search.
In order to search this huge search space for an RNN architecture with both high performance and high fault tolerance, the fault-tolerant recurrent neural network architecture search method of this embodiment uses two reward signals to achieve this goal. The first is the perplexity ppl_c of normal evaluation of a candidate recurrent neural network architecture: ppl_c is the perplexity of the candidate architecture (perplexity is a measure for evaluating the quality of a language model; the lower the perplexity, the better the model). The second is the perplexity ppl_f of the candidate recurrent neural network architecture after fault injection: ppl_f is the perplexity of the candidate architecture after faults have been injected into it. It should be noted that perplexity is a well-known metric for recurrent neural networks, and ppl_c and ppl_f are simply the perplexities computed in two states of the candidate recurrent neural network architecture; no improvement to the perplexity computation itself is involved, so its calculation details are not described here. The search strategy of this embodiment employs an evolutionary algorithm. A population of candidate architectures is first randomly initialized, the ppl_c and ppl_f of the candidate architectures in the population are continuously evaluated, and the two are balanced by a constant α_r to obtain a reward (which, following the characteristics of perplexity, is also the smaller the better); the reward encourages the search algorithm to continuously search for candidate architectures with high performance and high fault tolerance. The weighted calculation of the reward R of each new candidate recurrent neural network architecture can be implemented in various ways; as an alternative implementation, the calculation function of the reward R in this embodiment is expressed as follows:
R = (1 – α_r)*ppl_c + α_r*ppl_f
In the above formula, α_r is a preset weight coefficient with a value range of [0,1], ppl_c represents the perplexity of normal evaluation of the candidate recurrent neural network architecture, and ppl_f represents the perplexity of the candidate recurrent neural network architecture after fault injection. If the reward of a new architecture is smaller (better) than that of the current optimal architecture, the new architecture is marked as a parent architecture and added to the population, the oldest candidate architecture in the population is deleted, and the total population size remains unchanged. This process iterates until the algorithm finishes running.
As shown in fig. 2, before step 2), this embodiment includes a step of training the target recurrent neural network to generate the preset shared weights; the loss function for the weights during training is calculated as follows:
L = (1 – α_l)*CE_c + α_l*CE_f
In the above formula, L represents the loss function for the weights, α_l is a preset weight coefficient with a value range of [0,1], CE_c represents the cross-entropy loss of the searched recurrent neural network architecture under normal training, and CE_f represents the cross-entropy loss of the searched recurrent neural network architecture after fault injection. In order to accelerate the evaluation of candidate architectures, the fault-tolerant recurrent neural network architecture search method of this embodiment adopts a weight sharing technique during evaluation, that is, all candidate architectures share one set of weights. In order to give this set of weights a certain fault-tolerant characteristic, the back-propagated cross-entropy loss during the training of the shared weights is the weighted sum of the two terms above. In this embodiment, the set of shared weights is trained for 150 rounds; the resulting shared weights have a certain fault-tolerant capability, which further improves the search quality of the fault-tolerant architecture. After training is completed, the method can proceed to step 1), in which a population containing a plurality of candidate recurrent neural network architectures is randomly initialized as the current population. Referring to fig. 2, after the optimal recurrent neural network architecture is obtained by the search in step 5), it can be deployed on hardware.
In this embodiment, the functional expression of the target recurrent neural network is shown as follows:
c_0 = sigmoid(W_xc x_t + W_hc h_prev),
h_0 = c_0 tanh(W_xh x + W_hh h_prev) + (1 − c_0) h_prev,
c_t = sigmoid(W_c c_(t−1)),
h_t = c_t f(W_h h_(t−1)) + (1 − c_t) h_(t−1)
In the above formula, W_xc represents the input-to-initial-cell-state weight matrix, W_hc represents the hidden-layer-to-cell-state weight matrix, W_xh represents the input-to-hidden-layer weight matrix, W_hh represents the weight matrix from the previous hidden layer to the current hidden layer, W_c represents the cell state weight matrix, W_h represents the hidden layer weight matrix, h_prev represents the output of the previous hidden layer, c_0 and h_0 respectively represent the cell state vector and the hidden state vector at the initial time, c_t and h_t respectively represent the cell state vector and the hidden state vector at time t, c_(t−1) and h_(t−1) respectively represent the cell state vector and the hidden state vector at time t−1, the function f() represents the activation function selected at each node, x_t represents the initial input vector, and x represents the current input vector. The target recurrent neural network shown above is an RNN network enhanced by the highway bypass technique; compared with an ordinary RNN network, it effectively avoids the gradient explosion and gradient vanishing phenomena that may occur in the original RNN network. In this embodiment, the target recurrent neural network cell is a directed acyclic graph with 12 nodes, and each node can select one of two activation functions, ReLU or tanh; sub-graph (1) and sub-graph (2) of fig. 3 show two forms of the target recurrent neural network. The size of the search space of the optimal recurrent neural network architecture in this embodiment is therefore 2^12 * 12! ≈ 2×10^12.
As an optional implementation manner, in this embodiment step 5) is followed by a step of constructing computation faults and weight faults to verify the obtained optimal recurrent neural network architecture; through this step, the faults that may be encountered when deploying an RNN network on a hardware device can be simulated to test the optimal recurrent neural network architecture.
Computation faults occur mainly on CMOS-based hardware platforms such as FPGAs and ASICs, where the probability of soft errors is much higher than that of hard errors. In this embodiment, it is assumed that all data in the hardware platform are stored in 8-bit quantized form. As an optional implementation manner, in this embodiment a random bit flip is applied to each multiply-accumulate unit to simulate a computation error, and the functional expression of the optimal recurrent neural network architecture after the computation fault is constructed is as follows:
b = θ 2^(α−1) (−1)^β,
c_0 = sigmoid(W_xc x_t + W_hc h_prev + b_1),
h_0 = c_0 tanh(W_xh x + W_hh h_prev + b_2) + (1 − c_0) h_prev,
c_t = sigmoid(W_c c_(t−1)),
h_t = c_t f(W_h h_(t−1)) + (1 − c_t) h_(t−1) + b_3
In the above formula, b represents the bit flip of a random bit, θ indicates which weight fails and follows a Bernoulli distribution, α represents the bit position of the bias and follows a uniform distribution, β determines whether the bias is positive or negative and follows a uniform distribution, and b_1, b_2 and b_3 are independently generated bit flips b; W_xc represents the input-to-initial-cell-state weight matrix, W_hc represents the hidden-layer-to-cell-state weight matrix, W_xh represents the input-to-hidden-layer weight matrix, W_hh represents the weight matrix from the previous hidden layer to the current hidden layer, W_c represents the cell state weight matrix, W_h represents the hidden layer weight matrix, h_prev represents the output of the previous hidden layer, c_0 and h_0 respectively represent the cell state vector and the hidden state vector at the initial time, c_t and h_t respectively represent the cell state vector and the hidden state vector at time t, c_(t−1) and h_(t−1) respectively represent the cell state vector and the hidden state vector at time t−1, the function f() represents the activation function selected at each node, x_t represents the initial input vector, x represents the current input vector, and sigmoid represents the sigmoid activation function. The dimension of the intermediate result tensor is denoted as (H, W).
Weight storage failures occur mainly on RRAM-based accelerators. Assuming that all data in the hardware platform is 8-bit quantized storage, the data range that can be represented is:
[−R_w, R_w] = [−2^(−l)(2^(Q+1) − 1), 2^(−l)(2^(Q+1) − 1)]
In the above formula, R_w represents the maximum of the data range that the given hardware can represent, Q represents the bit width, and l represents the length of the fractional part. Because the accelerator fabrication technology is not yet mature, stuck-at faults are likely to occur: in the high resistance state, i.e., SAF0, the corresponding weight is set to zero; in the low resistance state, i.e., SAF1, if the weight is negative the corresponding weight is set to the lower bound −R_w of the representable range, and if the weight is positive the corresponding weight is set to the upper bound R_w of the representable range.
As an alternative implementation, constructing the weight fault in this embodiment comprises performing, on a target matrix among the input-to-initial-cell-state weight matrix W_xc, the hidden-layer-to-cell-state weight matrix W_hc, the input-to-hidden-layer weight matrix W_xh, the weight matrix W_hh from the previous hidden layer to the current hidden layer, the cell state weight matrix W_c and the hidden layer weight matrix W_h, the processing shown by the following equations:
W_1 = (1 – θ) W_0 + θ e,
θ ~ Bernoulli(p_0 + p_1)^(H×W),
m ~ Bernoulli(p_1/(p_0 + p_1))^(H×W),
e = R_w sgn(W_0) m
In the above formula, W_1 is the target matrix after the weight fault is constructed, W_0 is the target matrix before the weight fault is constructed, θ indicates which weights fail and follows a Bernoulli distribution, e represents the fault target value, p_0 represents the probability of a high-resistance-state fault among hard faults, p_1 represents the probability of a low-resistance-state fault among hard faults, H and W respectively represent the height and width of the target matrix, m indicates the type of fault when a fault occurs and follows a Bernoulli distribution, Bernoulli denotes the Bernoulli distribution, and R_w represents the maximum of the data range that the hardware can represent.
For example, taking the cell state weight matrix W_c in c_t = sigmoid(W_c c_(t−1)) as an example, the cell state weight matrix W_c is used as the target matrix and the processing shown by the following equations is performed:
W_c1 = (1 – θ) W_c + θ e,
θ ~ Bernoulli(p_0 + p_1)^(H×W),
m ~ Bernoulli(p_1/(p_0 + p_1))^(H×W),
e = R_w sgn(W_c) m
In the above formula, W_c1 is the cell state weight matrix after the weight fault is constructed, W_c is the cell state weight matrix before the fault, θ indicates which weights fail and follows a Bernoulli distribution, e represents the fault target value, p_0 represents the probability of a high-resistance-state fault among hard faults, p_1 represents the probability of a low-resistance-state fault among hard faults, H and W respectively represent the height and width of the target matrix, m indicates the type of fault when a fault occurs and follows a Bernoulli distribution, Bernoulli denotes the Bernoulli distribution, and R_w represents the maximum of the data range that the hardware can represent. When a fault occurs, the probability of the high-resistance state is p_0 and the probability of the low-resistance state is p_1; θ indicates which position's weight fails; m follows a Bernoulli distribution and indicates whether an SAF0 or an SAF1 fault occurs; e represents the fault target value (0 or ±R_w). Besides the weight matrix W_c, weight-storage faults can be injected into all six weight matrices (W_xc, W_hc, W_xh, W_hh, W_c, W_h).
In order to verify the fault-tolerant recurrent neural network architecture searching method, a high-performance and high-fault-tolerance RNN architecture is searched on the PTB (Penn Treebank) dataset, and the architecture migrates well to the larger WikiText-2 dataset, which further demonstrates the effectiveness of the method. In this embodiment, the above experiments were performed in the PyTorch framework on a Tesla V100 GPU. All data in the experiments were quantized to an 8-bit fixed-point representation.
First, the computation fault experiment.
In this experiment, the two balance parameters α_r and α_l determined through grid search are both set to 0.5. The final results are shown in fig. 4, where the optimal architectures found by the ENAS and DARTS works are used as baselines for experimental comparison. C-FT-RNN is the optimal architecture resistant to computation faults obtained by this system without fault-tolerant training, and C-FTT-RNN is the optimal architecture resistant to computation faults obtained by this system after fault-tolerant training. Experiments were performed on the PTB and WT2 datasets respectively.
For the architecture found on the PTB dataset, the perplexity ppl_c of normal evaluation in the absence of faults is 57.8. When the fault rate is 3×10^-5, the perplexity ppl_f after fault injection is 67.6; when the fault rate is 1×10^-4, ppl_f is 72.3; when the fault rate is 3×10^-4, ppl_f is 85.3. The found architecture is then migrated to the WikiText-2 dataset, where the perplexity ppl_c of normal evaluation in the absence of faults is 69.1. When the fault rate is 3×10^-5, the perplexity ppl_f after fault injection is 76.4; when the fault rate is 1×10^-4, ppl_f is 80.6; when the fault rate is 3×10^-4, ppl_f is 94.3. This shows that the optimal recurrent neural network architecture obtained by the fault-tolerant recurrent neural network architecture searching method has strong tolerance to computation faults, and the found optimal recurrent neural network architecture has better robustness when deployed on FPGA and ASIC hardware.
Second, the weight storage fault experiment.
In this experiment, the two balance parameters α_r and α_l determined through grid search are set to 0.4 and 0.6 respectively. The final results are shown in fig. 5, where the optimal architectures found by the ENAS and DARTS works are used as baselines for experimental comparison. W-FT-RNN is the optimal architecture resistant to weight-storage faults obtained by this system after fault-tolerant training. Experiments were performed on the PTB and WT2 datasets respectively. According to published measurement data, when stuck-at faults occur, the high-resistance fault rate is 83.7% and the low-resistance fault rate is 16.3%; these values are used in this experiment. For the architecture found on the PTB dataset, the perplexity ppl_c of normal evaluation in the absence of faults is 57.9. When the fault rate is 8%, the perplexity ppl_f after fault injection is 71.3; when the fault rate is 10%, ppl_f is 73.9; when the fault rate is 12%, ppl_f is 90.2. The found architecture is then migrated to the WikiText-2 dataset, where the perplexity ppl_c of normal evaluation in the absence of faults is 69.1. When the fault rate is 8%, the perplexity ppl_f after fault injection is 81.8; when the fault rate is 10%, ppl_f is 83.1; when the fault rate is 12%, ppl_f is 97.5. This shows that the optimal recurrent neural network architecture obtained by the fault-tolerant recurrent neural network architecture searching method of this embodiment has strong tolerance to weight-storage faults, and the found optimal recurrent neural network architecture has better robustness when deployed on RRAM-based accelerators.
In addition, the present embodiment also provides a fault-tolerant recurrent neural network architecture search system, which includes a computer device including a microprocessor and a memory connected to each other, wherein the microprocessor is programmed or configured to execute the steps of the fault-tolerant recurrent neural network architecture search method.
In addition, the present embodiment also provides a fault-tolerant recurrent neural network architecture search system, which includes a computer device including a microprocessor and a memory connected to each other, wherein the memory stores therein a computer program programmed or configured to execute the above fault-tolerant recurrent neural network architecture search method.
Furthermore, the present embodiment also provides a computer-readable storage medium having stored therein a computer program programmed or configured to execute the aforementioned fault-tolerant recurrent neural network architecture search method.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-readable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) containing computer-usable program code. The present application is described with reference to flowcharts and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application; it should be understood that each flow and/or block in the flowcharts and/or block diagrams can be implemented by computer program instructions, which may be provided to a processor of a computer or other programmable data processing apparatus such that the instructions executed by the processor create means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams. These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams. These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer-implemented process, such that the instructions executed on the computer or other programmable apparatus provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
The above description is only a preferred embodiment of the present invention, and the protection scope of the present invention is not limited to the above embodiments, and all technical solutions belonging to the idea of the present invention belong to the protection scope of the present invention. It should be noted that modifications and embellishments within the scope of the invention may occur to those skilled in the art without departing from the principle of the invention, and are considered to be within the scope of the invention.

Claims (10)

1. A fault-tolerant recurrent neural network architecture search method is characterized by comprising the following steps:
1) for a target recurrent neural network assigned preset shared weights, randomly initializing a population containing a plurality of candidate recurrent neural network architectures as the current population;
2) calculating, for each candidate recurrent neural network architecture in the current population, the perplexity ppl_c of normal evaluation and the perplexity ppl_f after fault injection, and calculating the reward R of each candidate recurrent neural network architecture as a weighted combination of ppl_c and ppl_f;
3) generating new candidate recurrent neural network architectures by taking each candidate recurrent neural network architecture in the current population as a parent architecture, calculating, for each new candidate recurrent neural network architecture, the perplexity ppl_c of normal evaluation and the perplexity ppl_f after fault injection, and calculating the reward R of each new candidate recurrent neural network architecture as a weighted combination of ppl_c and ppl_f;
4) if the reward R of a new candidate recurrent neural network architecture is better (i.e., lower) than the optimal reward in the current population, adding the new candidate recurrent neural network architecture to the current population and deleting the oldest candidate recurrent neural network architecture in the current population, so that the total population size remains unchanged;
5) judging whether the number of iteration rounds R_rounds is greater than or equal to a preset threshold R_max_rounds; if not, jumping back to step 3) to continue the iterative search; otherwise, outputting the candidate recurrent neural network architecture corresponding to the optimal reward in the current population as the optimal recurrent neural network architecture obtained by the search.
2. The method of claim 1, wherein the reward R is calculated as a function of the following expression:
R = (1 – α_r)*ppl_c + α_r*ppl_f
In the above formula, α_r is a preset weight coefficient with a value range of [0,1], ppl_c represents the perplexity of normal evaluation of the candidate recurrent neural network architecture, and ppl_f represents the perplexity of the candidate recurrent neural network architecture after fault injection.
3. The method according to claim 1, wherein step 2) is preceded by a step of training the target recurrent neural network to generate the preset shared weight, and the calculation function of the loss function of the weight in the training process is expressed as follows:
L = (1 – α_l)*CE_c + α_l*CE_f
In the above formula, L represents the loss function for the weights, α_l is a preset weight coefficient with a value range of [0,1], CE_c represents the cross-entropy loss of the searched recurrent neural network architecture under normal training, and CE_f represents the cross-entropy loss of the searched recurrent neural network architecture after fault injection.
4. The method according to claim 1, wherein the functional expression of the target recurrent neural network is as follows:
c_0 = sigmoid(W_xc x_t + W_hc h_prev),
h_0 = c_0 tanh(W_xh x + W_hh h_prev) + (1 − c_0) h_prev,
c_t = sigmoid(W_c c_(t−1)),
h_t = c_t f(W_h h_(t−1)) + (1 − c_t) h_(t−1)
In the above formula, W_xc represents the input-to-initial-cell-state weight matrix, W_hc represents the hidden-layer-to-cell-state weight matrix, W_xh represents the input-to-hidden-layer weight matrix, W_hh represents the weight matrix from the previous hidden layer to the current hidden layer, W_c represents the cell state weight matrix, W_h represents the hidden layer weight matrix, h_prev represents the output of the previous hidden layer, c_0 and h_0 respectively represent the cell state vector and the hidden state vector at the initial time, c_t and h_t respectively represent the cell state vector and the hidden state vector at time t, c_(t−1) and h_(t−1) respectively represent the cell state vector and the hidden state vector at time t−1, the function f() represents the activation function selected at each node, x_t represents the initial input vector, and x represents the current input vector.
5. The method for searching for a fault-tolerant recurrent neural network architecture of claim 4, wherein step 5) is followed by a step of constructing computation faults and weight faults to verify the obtained optimal recurrent neural network architecture.
6. The method according to claim 5, wherein the functional expression of the optimal recurrent neural network architecture after a computation fault is constructed is as follows:
b = θ 2^(α−1) (−1)^β,
c_0 = sigmoid(W_xc x_t + W_hc h_prev + b_1),
h_0 = c_0 tanh(W_xh x + W_hh h_prev + b_2) + (1 − c_0) h_prev,
c_t = sigmoid(W_c c_(t−1)),
h_t = c_t f(W_h h_(t−1)) + (1 − c_t) h_(t−1) + b_3
In the above formula, b represents the bit flip of a random bit, θ indicates which weight fails and follows a Bernoulli distribution, α represents the bit position of the bias and follows a uniform distribution, β determines whether the bias is positive or negative and follows a uniform distribution, and b_1, b_2 and b_3 are independently generated bit flips b; W_xc represents the input-to-initial-cell-state weight matrix, W_hc represents the hidden-layer-to-cell-state weight matrix, W_xh represents the input-to-hidden-layer weight matrix, W_hh represents the weight matrix from the previous hidden layer to the current hidden layer, W_c represents the cell state weight matrix, W_h represents the hidden layer weight matrix, h_prev represents the output of the previous hidden layer, c_0 and h_0 respectively represent the cell state vector and the hidden state vector at the initial time, c_t and h_t respectively represent the cell state vector and the hidden state vector at time t, c_(t−1) and h_(t−1) respectively represent the cell state vector and the hidden state vector at time t−1, the function f() represents the activation function selected at each node, x_t represents the initial input vector, x represents the current input vector, and sigmoid represents the sigmoid activation function.
7. The method of claim 5, wherein constructing the weight fault comprises performing, on a target matrix among the input-to-initial-cell-state weight matrix W_xc, the hidden-layer-to-cell-state weight matrix W_hc, the input-to-hidden-layer weight matrix W_xh, the weight matrix W_hh from the previous hidden layer to the current hidden layer, the cell state weight matrix W_c and the hidden layer weight matrix W_h, the processing shown by the following equations:
W_1 = (1 – θ) W_0 + θ e,
θ ~ Bernoulli(p_0 + p_1)^(H×W),
m ~ Bernoulli(p_1/(p_0 + p_1))^(H×W),
e = R_w sgn(W_0) m
In the above formula, W_1 is the target matrix after the weight fault is constructed, W_0 is the target matrix before the weight fault is constructed, θ indicates which weights fail and follows a Bernoulli distribution, e represents the fault target value, p_0 represents the probability of a high-resistance-state fault among hard faults, p_1 represents the probability of a low-resistance-state fault among hard faults, H and W respectively represent the height and width of the target matrix, m indicates the type of fault when a fault occurs and follows a Bernoulli distribution, Bernoulli denotes the Bernoulli distribution, and R_w represents the maximum of the data range that the hardware can represent.
8. A fault-tolerant recurrent neural network architecture search system, comprising a computer device including an interconnected microprocessor and memory, wherein the microprocessor is programmed or configured to perform the steps of the fault-tolerant recurrent neural network architecture search method of any one of claims 1 to 7.
9. A fault-tolerant recurrent neural network architecture search system, comprising a computer device comprising an interconnected microprocessor and memory, wherein the memory has stored therein a computer program programmed or configured to perform the fault-tolerant recurrent neural network architecture search method of any one of claims 1 to 7.
10. A computer-readable storage medium having stored thereon a computer program programmed or configured to perform the fault-tolerant recurrent neural network architecture search method of any one of claims 1-7.
CN202011356453.XA 2020-11-26 2020-11-26 Fault-tolerant recurrent neural network architecture searching method and system Pending CN112348174A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011356453.XA CN112348174A (en) 2020-11-26 2020-11-26 Fault-tolerant recurrent neural network architecture searching method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011356453.XA CN112348174A (en) 2020-11-26 2020-11-26 Fault-tolerant recurrent neural network architecture searching method and system

Publications (1)

Publication Number Publication Date
CN112348174A true CN112348174A (en) 2021-02-09

Family

ID=74365977

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011356453.XA Pending CN112348174A (en) 2020-11-26 2020-11-26 Fault-tolerant recurrent neural network architecture searching method and system

Country Status (1)

Country Link
CN (1) CN112348174A (en)


Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111144555A (en) * 2019-12-31 2020-05-12 中国人民解放军国防科技大学 Recurrent neural network architecture search method, system and medium based on improved evolutionary algorithm

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
KAI HU ET AL: "FTR-NAS: Fault-Tolerant Recurrent Neural Architecture Search", Springer *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112906887A (en) * 2021-02-20 2021-06-04 上海大学 Sparse GRU neural network acceleration realization method and device
CN112906887B (en) * 2021-02-20 2023-03-24 上海大学 Sparse GRU neural network acceleration realization method and device
CN116092646A (en) * 2023-04-10 2023-05-09 北京师范大学 Method and device for analyzing brain functions of pregnant alcohol-exposed women

Similar Documents

Publication Publication Date Title
Goswami et al. A physics-informed variational DeepONet for predicting crack path in quasi-brittle materials
Tran et al. Star-based reachability analysis of deep neural networks
CN110321603B (en) Depth calculation model for gas path fault diagnosis of aircraft engine
Salami et al. On the resilience of rtl nn accelerators: Fault characterization and mitigation
Li et al. FTT-NAS: Discovering fault-tolerant neural architecture
US10339447B2 (en) Configuring sparse neuronal networks
US9886663B2 (en) Compiling network descriptions to multiple platforms
US20150269481A1 (en) Differential encoding in neural networks
US20150170027A1 (en) Neuronal diversity in spiking neural networks and pattern classification
CN106030622A (en) In situ neural network co-processing
CN112348174A (en) Fault-tolerant recurrent neural network architecture searching method and system
Chaudhuri et al. Efficient fault-criticality analysis for AI accelerators using a neural twin
Ning et al. FTT-NAS: Discovering fault-tolerant convolutional neural architecture
Thompson et al. Building lego using deep generative models of graphs
Stratigopoulos et al. Testing and reliability of spiking neural networks: A review of the state-of-the-art
Eldebiky et al. Correctnet: Robustness enhancement of analog in-memory computing for neural networks by error suppression and compensation
US20150262061A1 (en) Contextual real-time feedback for neuromorphic model development
Xu et al. Reliability-driven memristive crossbar design in neuromorphic computing systems
Shen et al. A fast learning algorithm of neural network with tunable activation function
Kestor et al. Understanding scale-dependent soft-error behavior of scientific applications
Anker et al. Flag gadgets based on classical codes
Velazco et al. SEU fault tolerance in artificial neural networks
Krithivasan et al. Efficiency attacks on spiking neural networks
Coleman et al. A comparison of soft-fault error models in the parallel preconditioned flexible GMRES
Bose A scalable sparse distributed neural memory model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20210209