CN112348174A - Fault-tolerant recurrent neural network architecture searching method and system - Google Patents

Fault-tolerant recurrent neural network architecture searching method and system

Info

Publication number
CN112348174A
Authority
CN
China
Prior art keywords
neural network
recurrent neural
fault
network architecture
representing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011356453.XA
Other languages
Chinese (zh)
Inventor
王蕾
胡凯
丁东
田烁
冯权友
周理
郑重
励楠
冯超超
王永文
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National University of Defense Technology
Original Assignee
National University of Defense Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National University of Defense Technology
Priority to CN202011356453.XA
Publication of CN112348174A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a fault-tolerant recurrent neural network architecture searching method and system. For each new candidate recurrent neural network architecture, the method calculates the perplexity ppl_c of normal evaluation and the perplexity ppl_f after fault injection, computes the reward R of each new candidate recurrent neural network architecture as a weighted combination of ppl_c and ppl_f, and updates the population based on the reward R, so that the search algorithm is continuously encouraged by the reward to find an optimal recurrent neural network architecture with high performance and high fault tolerance. The obtained optimal recurrent neural network architecture therefore has very strong computation fault tolerance and weight-storage fault tolerance, and the found optimal recurrent neural network architecture has better robustness when deployed on FPGA (field programmable gate array) and ASIC (application specific integrated circuit) hardware and on RRAM (resistive random access memory) based accelerators.

Description

Fault-tolerant recurrent neural network architecture searching method and system
Technical Field
The invention relates to an architecture optimization technology of a Recurrent Neural Network (RNN) on a hardware platform, in particular to a fault-tolerant recurrent neural network architecture searching method and system.
Background
A recurrent neural network (RNN) is a deep neural network that mainly takes sequence data as input, and has important applications in natural language processing fields such as speech recognition, language modeling and machine translation. Its variants include the long short-term memory network (LSTM), the gated recurrent unit network (GRU), and others. Neural architecture search is a sub-field of AutoML. It was first proposed by Google, and Amazon, Microsoft and others followed; in China, Huawei's Noah's Ark Lab and the Xiaomi AI Lab have also carried out a great deal of research. Neural architecture search aims to replace the work of human experts in designing neural network architectures with large amounts of computing power, and has achieved top performance in tasks such as image recognition, text prediction and object detection. Fault tolerance is a concept created to cope with hardware faults. Faults have many causes, including weak solder joints, radiation, temperature and humidity changes, device aging, and so on. The hardware of concern herein is mainly the accelerators used to deploy neural networks. The main neural network accelerator devices at present include FPGAs, ASICs, and RRAM-based accelerators. FPGAs and ASICs are hardware devices based on CMOS circuits, which become increasingly sensitive to environmental noise as CMOS fabrication processes shrink. RRAM-based accelerators are prone to hard faults such as SAFs (stuck-at faults) because their fabrication technology is still immature. The most traditional way of achieving fault tolerance is hardware redundancy, but this increases area and power consumption. Sensitivity analysis is another fault-tolerance approach, which detects the fault-prone parts of a circuit and applies redundancy protection to them. Fault-tolerant training is a fault-tolerance approach for neural networks: by injecting faults during the training of the network weights, the trained neural network acquires a certain degree of fault tolerance.
Faults that may be encountered when deploying an RNN network on a hardware device can be classified into two categories: computation faults and weight-storage faults. Computation faults occur mainly on CMOS-based hardware platforms such as FPGAs and ASICs, where the probability of soft errors is much higher than that of hard errors. Weight-storage faults occur mainly on RRAM-based accelerators. Therefore, for the soft errors and hard errors that may occur when deploying a neural network on edge devices, how to achieve fault tolerance has become a key technical problem that urgently needs to be solved.
Disclosure of Invention
The technical problem to be solved by the invention is as follows: aiming at the problems in the prior art, the invention provides a fault-tolerant recurrent neural network architecture searching method and system, which improve fault tolerance through architecture optimization, so that a recurrent neural network architecture with high performance and high fault tolerance is finally found, effectively addressing the soft errors and hard errors encountered when a neural network is deployed on edge devices.
In order to solve the technical problems, the invention adopts the technical scheme that:
a fault-tolerant recurrent neural network architecture search method comprises the following steps:
1) for a target recurrent neural network assigned preset shared weights, randomly initializing a population containing a plurality of candidate recurrent neural network architectures as the current population;
2) calculating, for each candidate recurrent neural network architecture in the current population, the perplexity ppl_c of normal evaluation and the perplexity ppl_f after fault injection, and calculating the reward R of each candidate recurrent neural network architecture as a weighted combination of ppl_c and ppl_f;
3) generating new candidate recurrent neural network architectures by taking each candidate recurrent neural network architecture in the current population as a parent architecture, calculating, for each new candidate recurrent neural network architecture, the perplexity ppl_c of normal evaluation and the perplexity ppl_f after fault injection, and calculating the reward R of each new candidate recurrent neural network architecture as a weighted combination of ppl_c and ppl_f;
4) if the reward R of a new candidate recurrent neural network architecture is better (i.e., lower) than the optimal reward in the current population, adding the new candidate recurrent neural network architecture to the current population and deleting the oldest candidate recurrent neural network architecture in the current population, so that the total population size remains unchanged;
5) judging whether the number of iteration rounds R_rounds is greater than or equal to a preset threshold R_max_rounds; if not, jumping back to step 3) to continue the iterative search; otherwise, outputting the candidate recurrent neural network architecture corresponding to the optimal reward in the current population as the optimal recurrent neural network architecture obtained by the search.
Optionally, the calculation function expression of the reward R is as follows:
R = (1 – α_r)*ppl_c + α_r*ppl_f
In the above formula, α_r is a preset weight coefficient with a value range of [0,1], ppl_c represents the perplexity of normal evaluation of the candidate recurrent neural network architecture, and ppl_f represents the perplexity of the candidate recurrent neural network architecture after fault injection.
Optionally, step 2) is preceded by a step of training the target recurrent neural network to generate a preset shared weight, and the calculation function expression of the loss function for the weight in the training process is as follows:
L = (1 – α_l)*CE_c + α_l*CE_f
In the above formula, L represents the loss function for the weights, α_l is a preset weight coefficient with a value range of [0,1], CE_c represents the cross-entropy loss of the searched recurrent neural network architecture under normal training, and CE_f represents the cross-entropy loss of the searched recurrent neural network architecture after fault injection.
Optionally, the functional expression of the target recurrent neural network is as follows:
c_0 = sigmoid(W_xc x_t + W_hc h_prev),
h_0 = c_0 tanh(W_xh x + W_hh h_prev) + (1 − c_0) h_prev,
c_t = sigmoid(W_c c_(t−1)),
h_t = c_t f(W_h h_(t−1)) + (1 − c_t) h_(t−1)
In the above formula, W_xc represents the input-to-initial-cell-state weight matrix, W_hc represents the hidden-layer-to-cell-state weight matrix, W_xh represents the input-to-hidden-layer weight matrix, W_hh represents the weight matrix from the previous hidden layer to the current hidden layer, W_c represents the cell state weight matrix, W_h represents the hidden layer weight matrix, h_prev represents the output of the previous hidden layer, c_0 and h_0 respectively represent the cell state vector and the hidden state vector at the initial time, c_t and h_t respectively represent the cell state vector and the hidden state vector at time t, c_(t−1) and h_(t−1) respectively represent the cell state vector and the hidden state vector at time t−1, the function f() represents the activation function selected at each node, x_t represents the initial input vector, and x represents the current input vector.
Optionally, step 5) is followed by a step of constructing computation faults and weight faults to verify the obtained optimal recurrent neural network architecture.
Optionally, the functional expression of the optimal recurrent neural network architecture after a computation fault is constructed is as follows:
b = θ 2^(α−1) (−1)^β,
c_0 = sigmoid(W_xc x_t + W_hc h_prev + b_1),
h_0 = c_0 tanh(W_xh x + W_hh h_prev + b_2) + (1 − c_0) h_prev,
c_t = sigmoid(W_c c_(t−1)),
h_t = c_t f(W_h h_(t−1)) + (1 − c_t) h_(t−1) + b_3
In the above formula, b represents the bit flip of a random bit, θ indicates which weight fails and follows a Bernoulli distribution, α represents the bit position of the bias and follows a uniform distribution, β determines whether the bias is positive or negative and follows a uniform distribution, and b_1, b_2 and b_3 are independently generated bit flips b; W_xc represents the input-to-initial-cell-state weight matrix, W_hc represents the hidden-layer-to-cell-state weight matrix, W_xh represents the input-to-hidden-layer weight matrix, W_hh represents the weight matrix from the previous hidden layer to the current hidden layer, W_c represents the cell state weight matrix, W_h represents the hidden layer weight matrix, h_prev represents the output of the previous hidden layer, c_0 and h_0 respectively represent the cell state vector and the hidden state vector at the initial time, c_t and h_t respectively represent the cell state vector and the hidden state vector at time t, c_(t−1) and h_(t−1) respectively represent the cell state vector and the hidden state vector at time t−1, the function f() represents the activation function selected at each node, x_t represents the initial input vector, x represents the current input vector, and sigmoid represents the sigmoid activation function.
Optionally, constructing the weight fault comprises performing, on a target matrix among the input-to-initial-cell-state weight matrix W_xc, the hidden-layer-to-cell-state weight matrix W_hc, the input-to-hidden-layer weight matrix W_xh, the weight matrix W_hh from the previous hidden layer to the current hidden layer, the cell state weight matrix W_c and the hidden layer weight matrix W_h, the processing shown by the following equations:
W_1 = (1 – θ) W_0 + θ e,
θ ~ Bernoulli(p_0 + p_1)^(H×W),
m ~ Bernoulli(p_1/(p_0 + p_1))^(H×W),
e = R_w sgn(W_0) m
In the above formula, W_1 is the target matrix after the weight fault is constructed, W_0 is the target matrix before the weight fault is constructed, θ indicates which weights fail and follows a Bernoulli distribution, e represents the fault target value, p_0 represents the probability of a high-resistance-state fault among hard faults, p_1 represents the probability of a low-resistance-state fault among hard faults, H and W respectively represent the height and width of the target matrix, m indicates the type of fault when a fault occurs and follows a Bernoulli distribution, Bernoulli denotes the Bernoulli distribution, and R_w represents the maximum of the data range that the hardware can represent.
In addition, the invention also provides a fault-tolerant recurrent neural network architecture search system, which comprises a computer device, wherein the computer device comprises a microprocessor and a memory which are connected with each other, and the microprocessor is programmed or configured to execute the steps of the fault-tolerant recurrent neural network architecture search method.
In addition, the invention also provides a fault-tolerant recurrent neural network architecture search system, which comprises a computer device, wherein the computer device comprises a microprocessor and a memory which are connected with each other, and a computer program which is programmed or configured to execute the fault-tolerant recurrent neural network architecture search method is stored in the memory.
Furthermore, the present invention also provides a computer-readable storage medium having stored therein a computer program programmed or configured to perform the fault-tolerant recurrent neural network architecture search method.
Compared with the prior art, the invention has the following advantages: the invention adopts an evolutionary algorithm to search the recurrent neural network architecture, calculates the perplexity ppl_c of normal evaluation and the perplexity ppl_f after fault injection of each new candidate recurrent neural network architecture, computes the reward R of each new candidate recurrent neural network architecture as a weighted combination of ppl_c and ppl_f, and updates the population based on the reward R, thereby encouraging the search algorithm to continuously search for candidate architectures with high performance and high fault tolerance according to the reward. The obtained optimal recurrent neural network architecture has strong computation fault tolerance and weight-storage fault tolerance, and the found optimal recurrent neural network architecture has better robustness when deployed on FPGA and ASIC hardware and on RRAM-based accelerators.
Drawings
FIG. 1 is a schematic diagram of a basic flow of a method according to an embodiment of the present invention.
FIG. 2 is a schematic overall flow chart of a method according to an embodiment of the present invention.
Fig. 3 shows two schematic structural diagrams of a target recurrent neural network in an embodiment of the present invention.
Fig. 4 is an experimental result of a calculation fault experiment in the embodiment of the present invention.
Fig. 5 is an experimental result of a weight storage failure experiment in the embodiment of the present invention.
Detailed Description
As shown in fig. 1, the method for searching a fault-tolerant recurrent neural network architecture of the present embodiment includes:
1) for a target recurrent neural network assigned preset shared weights, randomly initializing a population containing a plurality of candidate recurrent neural network architectures as the current population;
2) calculating, for each candidate recurrent neural network architecture in the current population, the perplexity ppl_c of normal evaluation and the perplexity ppl_f after fault injection, and calculating the reward R of each candidate recurrent neural network architecture as a weighted combination of ppl_c and ppl_f;
3) generating new candidate recurrent neural network architectures by taking each candidate recurrent neural network architecture in the current population as a parent architecture, calculating, for each new candidate recurrent neural network architecture, the perplexity ppl_c of normal evaluation and the perplexity ppl_f after fault injection, and calculating the reward R of each new candidate recurrent neural network architecture as a weighted combination of ppl_c and ppl_f;
4) if the reward R of a new candidate recurrent neural network architecture is better (i.e., lower) than the optimal reward in the current population, adding the new candidate recurrent neural network architecture to the current population and deleting the oldest candidate recurrent neural network architecture in the current population, so that the total population size remains unchanged;
5) judging whether the number of iteration rounds R_rounds is greater than or equal to a preset threshold R_max_rounds; if not, jumping back to step 3) to continue the iterative search; otherwise, outputting the candidate recurrent neural network architecture corresponding to the optimal reward in the current population as the optimal recurrent neural network architecture obtained by the search.
In order to search this huge search space for an RNN architecture with both high performance and high fault tolerance, the fault-tolerant recurrent neural network architecture search method of this embodiment uses two reward signals to achieve this goal. The first is the perplexity ppl_c of normal evaluation of a candidate recurrent neural network architecture: ppl_c is the perplexity of the candidate architecture (perplexity is a measure for evaluating the quality of a language model; the lower the perplexity, the better the model). The second is the perplexity ppl_f of the candidate recurrent neural network architecture after fault injection: ppl_f is the perplexity of the candidate architecture after faults have been injected into it. It should be noted that perplexity is a well-known metric for recurrent neural networks, and ppl_c and ppl_f are simply the perplexities computed in two states of the candidate recurrent neural network architecture; no improvement to the perplexity computation itself is involved, so its calculation details are not described here. The search strategy of this embodiment employs an evolutionary algorithm. A population of candidate architectures is first randomly initialized, the ppl_c and ppl_f of the candidate architectures in the population are continuously evaluated, and the two are balanced by a constant α_r to obtain a reward (which, following the characteristics of perplexity, is also the smaller the better); the reward encourages the search algorithm to continuously search for candidate architectures with high performance and high fault tolerance. The weighted calculation of the reward R of each new candidate recurrent neural network architecture can be implemented in various ways; as an alternative implementation, the calculation function of the reward R in this embodiment is expressed as follows:
R = (1 – α_r)*ppl_c + α_r*ppl_f
In the above formula, α_r is a preset weight coefficient with a value range of [0,1], ppl_c represents the perplexity of normal evaluation of the candidate recurrent neural network architecture, and ppl_f represents the perplexity of the candidate recurrent neural network architecture after fault injection. If the reward of a new architecture is smaller (better) than that of the current optimal architecture, the new architecture is marked as a parent architecture and added to the population, the oldest candidate architecture in the population is deleted, and the total population size remains unchanged. This process iterates until the algorithm finishes running.
As shown in fig. 2, before step 2), this embodiment includes a step of training the target recurrent neural network to generate the preset shared weights; the loss function for the weights during training is calculated as follows:
L = (1 – α_l)*CE_c + α_l*CE_f
In the above formula, L represents the loss function for the weights, α_l is a preset weight coefficient with a value range of [0,1], CE_c represents the cross-entropy loss of the searched recurrent neural network architecture under normal training, and CE_f represents the cross-entropy loss of the searched recurrent neural network architecture after fault injection. In order to accelerate the evaluation of candidate architectures, the fault-tolerant recurrent neural network architecture search method of this embodiment adopts a weight sharing technique during evaluation, that is, all candidate architectures share one set of weights. In order to give this set of weights a certain fault-tolerant characteristic, the back-propagated cross-entropy loss during the training of the shared weights is the weighted sum of the two terms above. In this embodiment, the set of shared weights is trained for 150 rounds; the resulting shared weights have a certain fault-tolerant capability, which further improves the search quality of the fault-tolerant architecture. After training is completed, the method can proceed to step 1), in which a population containing a plurality of candidate recurrent neural network architectures is randomly initialized as the current population. Referring to fig. 2, after the optimal recurrent neural network architecture is obtained by the search in step 5), it can be deployed on hardware.
In this embodiment, the functional expression of the target recurrent neural network is shown as follows:
c_0 = sigmoid(W_xc x_t + W_hc h_prev),
h_0 = c_0 tanh(W_xh x + W_hh h_prev) + (1 − c_0) h_prev,
c_t = sigmoid(W_c c_(t−1)),
h_t = c_t f(W_h h_(t−1)) + (1 − c_t) h_(t−1)
In the above formula, W_xc represents the input-to-initial-cell-state weight matrix, W_hc represents the hidden-layer-to-cell-state weight matrix, W_xh represents the input-to-hidden-layer weight matrix, W_hh represents the weight matrix from the previous hidden layer to the current hidden layer, W_c represents the cell state weight matrix, W_h represents the hidden layer weight matrix, h_prev represents the output of the previous hidden layer, c_0 and h_0 respectively represent the cell state vector and the hidden state vector at the initial time, c_t and h_t respectively represent the cell state vector and the hidden state vector at time t, c_(t−1) and h_(t−1) respectively represent the cell state vector and the hidden state vector at time t−1, the function f() represents the activation function selected at each node, x_t represents the initial input vector, and x represents the current input vector. The target recurrent neural network shown above is an RNN network enhanced by the highway bypass technique; compared with an ordinary RNN network, it effectively avoids the gradient explosion and gradient vanishing phenomena that may occur in the original RNN network. In this embodiment, the target recurrent neural network cell is a directed acyclic graph with 12 nodes, and each node can select one of two activation functions, ReLU or tanh; sub-graph (1) and sub-graph (2) of fig. 3 show two forms of the target recurrent neural network. The size of the search space of the optimal recurrent neural network architecture in this embodiment is therefore 2^12 * 12! ≈ 2×10^12.
As an optional implementation manner, in this embodiment step 5) is followed by a step of constructing computation faults and weight faults to verify the obtained optimal recurrent neural network architecture; through this step, the faults that may be encountered when deploying an RNN network on a hardware device can be simulated to test the optimal recurrent neural network architecture.
Computation faults occur mainly on CMOS-based hardware platforms such as FPGAs and ASICs, where the probability of soft errors is much higher than that of hard errors. In this embodiment, it is assumed that all data in the hardware platform are stored in 8-bit quantized form. As an optional implementation manner, in this embodiment a random bit flip is applied to each multiply-accumulate unit to simulate a computation error, and the functional expression of the optimal recurrent neural network architecture after the computation fault is constructed is as follows:
b = θ 2^(α−1) (−1)^β,
c_0 = sigmoid(W_xc x_t + W_hc h_prev + b_1),
h_0 = c_0 tanh(W_xh x + W_hh h_prev + b_2) + (1 − c_0) h_prev,
c_t = sigmoid(W_c c_(t−1)),
h_t = c_t f(W_h h_(t−1)) + (1 − c_t) h_(t−1) + b_3
In the above formula, b represents the bit flip of a random bit, θ indicates which weight fails and follows a Bernoulli distribution, α represents the bit position of the bias and follows a uniform distribution, β determines whether the bias is positive or negative and follows a uniform distribution, and b_1, b_2 and b_3 are independently generated bit flips b; W_xc represents the input-to-initial-cell-state weight matrix, W_hc represents the hidden-layer-to-cell-state weight matrix, W_xh represents the input-to-hidden-layer weight matrix, W_hh represents the weight matrix from the previous hidden layer to the current hidden layer, W_c represents the cell state weight matrix, W_h represents the hidden layer weight matrix, h_prev represents the output of the previous hidden layer, c_0 and h_0 respectively represent the cell state vector and the hidden state vector at the initial time, c_t and h_t respectively represent the cell state vector and the hidden state vector at time t, c_(t−1) and h_(t−1) respectively represent the cell state vector and the hidden state vector at time t−1, the function f() represents the activation function selected at each node, x_t represents the initial input vector, x represents the current input vector, and sigmoid represents the sigmoid activation function. The dimension of the intermediate result tensor is denoted as (H, W).
Weight storage failures occur mainly on RRAM-based accelerators. Assuming that all data in the hardware platform is 8-bit quantized storage, the data range that can be represented is:
[−R_w, R_w] = [−2^(−l)(2^(Q+1) − 1), 2^(−l)(2^(Q+1) − 1)]
In the above formula, R_w represents the maximum of the data range that the given hardware can represent, Q represents the bit width, and l represents the length of the fractional part. Because the accelerator fabrication technology is not yet mature, stuck-at faults are likely to occur: in the high resistance state, i.e., SAF0, the corresponding weight is set to zero; in the low resistance state, i.e., SAF1, if the weight is negative the corresponding weight is set to the lower bound −R_w of the representable range, and if the weight is positive the corresponding weight is set to the upper bound R_w of the representable range.
As an alternative implementation, constructing the weight fault in this embodiment comprises performing, on a target matrix among the input-to-initial-cell-state weight matrix W_xc, the hidden-layer-to-cell-state weight matrix W_hc, the input-to-hidden-layer weight matrix W_xh, the weight matrix W_hh from the previous hidden layer to the current hidden layer, the cell state weight matrix W_c and the hidden layer weight matrix W_h, the processing shown by the following equations:
W_1 = (1 – θ) W_0 + θ e,
θ ~ Bernoulli(p_0 + p_1)^(H×W),
m ~ Bernoulli(p_1/(p_0 + p_1))^(H×W),
e = R_w sgn(W_0) m
In the above formula, W_1 is the target matrix after the weight fault is constructed, W_0 is the target matrix before the weight fault is constructed, θ indicates which weights fail and follows a Bernoulli distribution, e represents the fault target value, p_0 represents the probability of a high-resistance-state fault among hard faults, p_1 represents the probability of a low-resistance-state fault among hard faults, H and W respectively represent the height and width of the target matrix, m indicates the type of fault when a fault occurs and follows a Bernoulli distribution, Bernoulli denotes the Bernoulli distribution, and R_w represents the maximum of the data range that the hardware can represent.
For example, taking the cell state weight matrix W_c in c_t = sigmoid(W_c c_(t−1)) as an example, the cell state weight matrix W_c is used as the target matrix and the processing shown by the following equations is performed:
W_c1 = (1 – θ) W_c + θ e,
θ ~ Bernoulli(p_0 + p_1)^(H×W),
m ~ Bernoulli(p_1/(p_0 + p_1))^(H×W),
e = R_w sgn(W_c) m
In the above formula, W_c1 is the cell state weight matrix after the weight fault is constructed, W_c is the cell state weight matrix before the fault, θ indicates which weights fail and follows a Bernoulli distribution, e represents the fault target value, p_0 represents the probability of a high-resistance-state fault among hard faults, p_1 represents the probability of a low-resistance-state fault among hard faults, H and W respectively represent the height and width of the target matrix, m indicates the type of fault when a fault occurs and follows a Bernoulli distribution, Bernoulli denotes the Bernoulli distribution, and R_w represents the maximum of the data range that the hardware can represent. When a fault occurs, the probability of the high-resistance state is p_0 and the probability of the low-resistance state is p_1; θ indicates which position's weight fails; m follows a Bernoulli distribution and indicates whether an SAF0 or an SAF1 fault occurs; e represents the fault target value (0 or ±R_w). Besides the weight matrix W_c, weight-storage faults can be injected into all six weight matrices (W_xc, W_hc, W_xh, W_hh, W_c, W_h).
In order to verify the fault-tolerant recurrent neural network architecture searching method, a high-performance and high-fault-tolerance RNN architecture is searched on the PTB (Penn Treebank) dataset, and the architecture migrates well to the larger WikiText-2 dataset, which further demonstrates the effectiveness of the method. In this embodiment, the above experiments were performed in the PyTorch framework on a Tesla V100 GPU. All data in the experiments were quantized to an 8-bit fixed-point representation.
First, the computation fault experiment.
In this experiment, the two balance parameters α_r and α_l determined through grid search are both set to 0.5. The final results are shown in fig. 4, where the optimal architectures found by the ENAS and DARTS works are used as baselines for experimental comparison. C-FT-RNN is the optimal architecture resistant to computation faults obtained by this system without fault-tolerant training, and C-FTT-RNN is the optimal architecture resistant to computation faults obtained by this system after fault-tolerant training. Experiments were performed on the PTB and WT2 datasets respectively.
For the architecture found on the PTB dataset, the perplexity ppl_c of normal evaluation in the absence of faults is 57.8. When the fault rate is 3×10^-5, the perplexity ppl_f after fault injection is 67.6; when the fault rate is 1×10^-4, ppl_f is 72.3; when the fault rate is 3×10^-4, ppl_f is 85.3. The found architecture is then migrated to the WikiText-2 dataset, where the perplexity ppl_c of normal evaluation in the absence of faults is 69.1. When the fault rate is 3×10^-5, the perplexity ppl_f after fault injection is 76.4; when the fault rate is 1×10^-4, ppl_f is 80.6; when the fault rate is 3×10^-4, ppl_f is 94.3. This shows that the optimal recurrent neural network architecture obtained by the fault-tolerant recurrent neural network architecture searching method has strong tolerance to computation faults, and the found optimal recurrent neural network architecture has better robustness when deployed on FPGA and ASIC hardware.
Second, the weight storage fault experiment.
In this experiment, the two balance parameters α_r and α_l determined through grid search are set to 0.4 and 0.6 respectively. The final results are shown in fig. 5, where the optimal architectures found by the ENAS and DARTS works are used as baselines for experimental comparison. W-FT-RNN is the optimal architecture resistant to weight-storage faults obtained by this system after fault-tolerant training. Experiments were performed on the PTB and WT2 datasets respectively. According to published measurement data, when stuck-at faults occur, the high-resistance fault rate is 83.7% and the low-resistance fault rate is 16.3%; these values are used in this experiment. For the architecture found on the PTB dataset, the perplexity ppl_c of normal evaluation in the absence of faults is 57.9. When the fault rate is 8%, the perplexity ppl_f after fault injection is 71.3; when the fault rate is 10%, ppl_f is 73.9; when the fault rate is 12%, ppl_f is 90.2. The found architecture is then migrated to the WikiText-2 dataset, where the perplexity ppl_c of normal evaluation in the absence of faults is 69.1. When the fault rate is 8%, the perplexity ppl_f after fault injection is 81.8; when the fault rate is 10%, ppl_f is 83.1; when the fault rate is 12%, ppl_f is 97.5. This shows that the optimal recurrent neural network architecture obtained by the fault-tolerant recurrent neural network architecture searching method of this embodiment has strong tolerance to weight-storage faults, and the found optimal recurrent neural network architecture has better robustness when deployed on RRAM-based accelerators.
In addition, the present embodiment also provides a fault-tolerant recurrent neural network architecture search system, which includes a computer device including a microprocessor and a memory connected to each other, wherein the microprocessor is programmed or configured to execute the steps of the fault-tolerant recurrent neural network architecture search method.
In addition, the present embodiment also provides a fault-tolerant recurrent neural network architecture search system, which includes a computer device including a microprocessor and a memory connected to each other, wherein the memory stores therein a computer program programmed or configured to execute the above fault-tolerant recurrent neural network architecture search method.
Furthermore, the present embodiment also provides a computer-readable storage medium having stored therein a computer program programmed or configured to execute the aforementioned fault-tolerant recurrent neural network architecture search method.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-readable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) containing computer-usable program code. The present application is described with reference to flowcharts and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application; it should be understood that each flow and/or block in the flowcharts and/or block diagrams can be implemented by computer program instructions, which may be provided to a processor of a computer or other programmable data processing apparatus such that the instructions executed by the processor create means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams. These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams. These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer-implemented process, such that the instructions executed on the computer or other programmable apparatus provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
The above description is only a preferred embodiment of the present invention, and the protection scope of the present invention is not limited to the above embodiments, and all technical solutions belonging to the idea of the present invention belong to the protection scope of the present invention. It should be noted that modifications and embellishments within the scope of the invention may occur to those skilled in the art without departing from the principle of the invention, and are considered to be within the scope of the invention.

Claims (10)

1. A fault-tolerant recurrent neural network architecture search method is characterized by comprising the following steps:
1) for a target recurrent neural network assigned preset shared weights, randomly initializing a population containing a plurality of candidate recurrent neural network architectures as the current population;
2) calculating, for each candidate recurrent neural network architecture in the current population, the perplexity ppl_c of normal evaluation and the perplexity ppl_f after fault injection, and calculating the reward R of each candidate recurrent neural network architecture as a weighted combination of ppl_c and ppl_f;
3) generating new candidate recurrent neural network architectures by taking each candidate recurrent neural network architecture in the current population as a parent architecture, calculating, for each new candidate recurrent neural network architecture, the perplexity ppl_c of normal evaluation and the perplexity ppl_f after fault injection, and calculating the reward R of each new candidate recurrent neural network architecture as a weighted combination of ppl_c and ppl_f;
4) if the reward R of a new candidate recurrent neural network architecture is better (i.e., lower) than the optimal reward in the current population, adding the new candidate recurrent neural network architecture to the current population and deleting the oldest candidate recurrent neural network architecture in the current population, so that the total population size remains unchanged;
5) judging whether the number of iteration rounds R_rounds is greater than or equal to a preset threshold R_max_rounds; if not, jumping back to step 3) to continue the iterative search; otherwise, outputting the candidate recurrent neural network architecture corresponding to the optimal reward in the current population as the optimal recurrent neural network architecture obtained by the search.
2. The method of claim 1, wherein the reward R is calculated as a function of the following expression:
R = (1 – α_r)*ppl_c + α_r*ppl_f
In the above formula, α_r is a preset weight coefficient with a value range of [0,1], ppl_c represents the perplexity of normal evaluation of the candidate recurrent neural network architecture, and ppl_f represents the perplexity of the candidate recurrent neural network architecture after fault injection.
3. The method according to claim 1, wherein step 2) is preceded by a step of training the target recurrent neural network to generate the preset shared weight, and the calculation function of the loss function of the weight in the training process is expressed as follows:
L = (1 – α_l)*CE_c + α_l*CE_f
In the above formula, L represents the loss function for the weights, α_l is a preset weight coefficient with a value range of [0,1], CE_c represents the cross-entropy loss of the searched recurrent neural network architecture under normal training, and CE_f represents the cross-entropy loss of the searched recurrent neural network architecture after fault injection.
4. The method according to claim 1, wherein the functional expression of the target recurrent neural network is as follows:
c_0 = sigmoid(W_xc x_t + W_hc h_prev),
h_0 = c_0 tanh(W_xh x + W_hh h_prev) + (1 − c_0) h_prev,
c_t = sigmoid(W_c c_(t−1)),
h_t = c_t f(W_h h_(t−1)) + (1 − c_t) h_(t−1)
In the above formula, W_xc represents the input-to-initial-cell-state weight matrix, W_hc represents the hidden-layer-to-cell-state weight matrix, W_xh represents the input-to-hidden-layer weight matrix, W_hh represents the weight matrix from the previous hidden layer to the current hidden layer, W_c represents the cell state weight matrix, W_h represents the hidden layer weight matrix, h_prev represents the output of the previous hidden layer, c_0 and h_0 respectively represent the cell state vector and the hidden state vector at the initial time, c_t and h_t respectively represent the cell state vector and the hidden state vector at time t, c_(t−1) and h_(t−1) respectively represent the cell state vector and the hidden state vector at time t−1, the function f() represents the activation function selected at each node, x_t represents the initial input vector, and x represents the current input vector.
5. The method for searching for a fault-tolerant recurrent neural network architecture of claim 4, wherein step 5) is followed by a step of constructing computation faults and weight faults to verify the obtained optimal recurrent neural network architecture.
6. The method according to claim 5, wherein the functional expression of the optimal recurrent neural network architecture after a computation fault is constructed is as follows:
b = θ 2^(α−1) (−1)^β,
c_0 = sigmoid(W_xc x_t + W_hc h_prev + b_1),
h_0 = c_0 tanh(W_xh x + W_hh h_prev + b_2) + (1 − c_0) h_prev,
c_t = sigmoid(W_c c_(t−1)),
h_t = c_t f(W_h h_(t−1)) + (1 − c_t) h_(t−1) + b_3
In the above formula, b represents the bit flip of a random bit, θ indicates which weight fails and follows a Bernoulli distribution, α represents the bit position of the bias and follows a uniform distribution, β determines whether the bias is positive or negative and follows a uniform distribution, and b_1, b_2 and b_3 are independently generated bit flips b; W_xc represents the input-to-initial-cell-state weight matrix, W_hc represents the hidden-layer-to-cell-state weight matrix, W_xh represents the input-to-hidden-layer weight matrix, W_hh represents the weight matrix from the previous hidden layer to the current hidden layer, W_c represents the cell state weight matrix, W_h represents the hidden layer weight matrix, h_prev represents the output of the previous hidden layer, c_0 and h_0 respectively represent the cell state vector and the hidden state vector at the initial time, c_t and h_t respectively represent the cell state vector and the hidden state vector at time t, c_(t−1) and h_(t−1) respectively represent the cell state vector and the hidden state vector at time t−1, the function f() represents the activation function selected at each node, x_t represents the initial input vector, x represents the current input vector, and sigmoid represents the sigmoid activation function.
7. The method of claim 5, wherein constructing the weight fault comprises performing, on a target matrix among the input-to-initial-cell-state weight matrix W_xc, the hidden-layer-to-cell-state weight matrix W_hc, the input-to-hidden-layer weight matrix W_xh, the weight matrix W_hh from the previous hidden layer to the current hidden layer, the cell state weight matrix W_c and the hidden layer weight matrix W_h, the processing shown by the following equations:
W_1 = (1 – θ) W_0 + θ e,
θ ~ Bernoulli(p_0 + p_1)^(H×W),
m ~ Bernoulli(p_1/(p_0 + p_1))^(H×W),
e = R_w sgn(W_0) m
In the above formula, W_1 is the target matrix after the weight fault is constructed, W_0 is the target matrix before the weight fault is constructed, θ indicates which weights fail and follows a Bernoulli distribution, e represents the fault target value, p_0 represents the probability of a high-resistance-state fault among hard faults, p_1 represents the probability of a low-resistance-state fault among hard faults, H and W respectively represent the height and width of the target matrix, m indicates the type of fault when a fault occurs and follows a Bernoulli distribution, Bernoulli denotes the Bernoulli distribution, and R_w represents the maximum of the data range that the hardware can represent.
8. A fault-tolerant recurrent neural network architecture search system, comprising a computer device including an interconnected microprocessor and memory, wherein the microprocessor is programmed or configured to perform the steps of the fault-tolerant recurrent neural network architecture search method of any one of claims 1 to 7.
9. A fault-tolerant recurrent neural network architecture search system, comprising a computer device comprising an interconnected microprocessor and memory, wherein the memory has stored therein a computer program programmed or configured to perform the fault-tolerant recurrent neural network architecture search method of any one of claims 1 to 7.
10. A computer-readable storage medium having stored thereon a computer program programmed or configured to perform the fault-tolerant recurrent neural network architecture search method of any one of claims 1-7.
CN202011356453.XA 2020-11-26 2020-11-26 Fault-tolerant recurrent neural network architecture searching method and system Pending CN112348174A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011356453.XA CN112348174A (en) 2020-11-26 2020-11-26 Fault-tolerant recurrent neural network architecture searching method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011356453.XA CN112348174A (en) 2020-11-26 2020-11-26 Fault-tolerant recurrent neural network architecture searching method and system

Publications (1)

Publication Number Publication Date
CN112348174A true CN112348174A (en) 2021-02-09

Family

ID=74365977

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011356453.XA Pending CN112348174A (en) 2020-11-26 2020-11-26 Fault-tolerant recurrent neural network architecture searching method and system

Country Status (1)

Country Link
CN (1) CN112348174A (en)


Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111144555A (en) * 2019-12-31 2020-05-12 中国人民解放军国防科技大学 Recurrent neural network architecture search method, system and medium based on improved evolutionary algorithm

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
KAI HU ET AL: "FTR-NAS: Fault-Tolerant Recurrent Neural Architecture Search", Springer *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112906887A (en) * 2021-02-20 2021-06-04 上海大学 Sparse GRU neural network acceleration realization method and device
CN112906887B (en) * 2021-02-20 2023-03-24 上海大学 Sparse GRU neural network acceleration realization method and device
CN116092646A (en) * 2023-04-10 2023-05-09 北京师范大学 Method and device for analyzing brain functions of pregnant alcohol-exposed women

Similar Documents

Publication Publication Date Title
Goswami et al. A physics-informed variational DeepONet for predicting crack path in quasi-brittle materials
Tran et al. Star-based reachability analysis of deep neural networks
CN110321603B (en) Depth calculation model for gas path fault diagnosis of aircraft engine
Salami et al. On the resilience of rtl nn accelerators: Fault characterization and mitigation
Li et al. FTT-NAS: Discovering fault-tolerant neural architecture
US10339447B2 (en) Configuring sparse neuronal networks
US9886663B2 (en) Compiling network descriptions to multiple platforms
US20150269481A1 (en) Differential encoding in neural networks
US20150170027A1 (en) Neuronal diversity in spiking neural networks and pattern classification
CN106030622A (en) In situ neural network co-processing
CN112348174A (en) Fault-tolerant recurrent neural network architecture searching method and system
Chaudhuri et al. Efficient fault-criticality analysis for AI accelerators using a neural twin
Ning et al. FTT-NAS: Discovering fault-tolerant convolutional neural architecture
Thompson et al. Building lego using deep generative models of graphs
Stratigopoulos et al. Testing and reliability of spiking neural networks: A review of the state-of-the-art
Eldebiky et al. Correctnet: Robustness enhancement of analog in-memory computing for neural networks by error suppression and compensation
US20150262061A1 (en) Contextual real-time feedback for neuromorphic model development
Xu et al. Reliability-driven memristive crossbar design in neuromorphic computing systems
Shen et al. A fast learning algorithm of neural network with tunable activation function
Kestor et al. Understanding scale-dependent soft-error behavior of scientific applications
Anker et al. Flag gadgets based on classical codes
Velazco et al. SEU fault tolerance in artificial neural networks
Krithivasan et al. Efficiency attacks on spiking neural networks
Coleman et al. A comparison of soft-fault error models in the parallel preconditioned flexible GMRES
Bose A scalable sparse distributed neural memory model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20210209