CN117872097A - Method and device for generating automatic test vector of digital circuit based on reinforcement learning - Google Patents

Method and device for generating automatic test vector of digital circuit based on reinforcement learning

Info

Publication number
CN117872097A
Authority
CN
China
Prior art keywords
test, rollback, PODEM, fault, path
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410263521.XA
Other languages
Chinese (zh)
Inventor
李文星
叶靖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhongke Jianxin Beijing Technology Co., Ltd.
Original Assignee
Zhongke Jianxin Beijing Technology Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhongke Jianxin Beijing Technology Co., Ltd.
Priority to CN202410263521.XA
Publication of CN117872097A

Landscapes

  • Tests Of Electronic Circuits (AREA)

Abstract

The application discloses a method and a device for reinforcement-learning-based automatic test vector generation for digital circuits. The method comprises the following steps: in the process of generating test vectors for a chip to be tested using the automatic test vector generation algorithm ATPG, acquiring historical data of the path-oriented decision-making test generation algorithm PODEM, namely the data formed while a first round of fault sensitization, fault propagation and line value confirmation on the chip to be tested activates the test pattern of a first fault, and learning a rollback strategy from the historical data using a reinforcement learning algorithm; then, in the process of performing an N-th round of fault sensitization, fault propagation and line value confirmation on the chip to be tested to activate the test pattern of a second fault, using the rollback strategy to guide PODEM's selection of the rollback path, thereby generating test vectors for the chip to be tested. The method and device solve the technical problem in the related art of excessive backtracking during test vector generation.

Description

Method and device for generating automatic test vector of digital circuit based on reinforcement learning
Technical Field
The application relates to the field of chip testing, in particular to a method and a device for generating an automatic test vector of a digital circuit based on reinforcement learning.
Background
This section is intended to provide background or context for the subject matter recited in the claims and specification; the content described here is not admitted to be prior art merely by its inclusion in this section.
The steady advance of semiconductor technology increases the complexity of chip design, and the resulting increase in chip defects makes the testing of integrated circuits challenging. To address this, an automatic test vector generation (Automatic Test Pattern Generation, ATPG) algorithm can be adopted: it searches for effective test vectors so as to detect, as completely as possible, all possible faults in the circuit. The core goal of ATPG is to obtain a test vector set with high fault coverage. Obtaining a compact test vector set with high fault coverage is critical for efficient fault detection, since it shortens the test time of production testing and reduces test cost.
For a combinational logic circuit with an n-bit input, exhaustive testing requires 2^n input vectors. Although exhaustive testing can achieve high fault coverage, in practical production applications conventional ATPG mainly uses path sensitization methods, in which fault detection is completed through fault sensitization, fault propagation and line value confirmation. Classical path sensitization methods include the D algorithm (which uses the idea of multidimensional sensitization and simultaneously sensitizes all paths from the fault location to all outputs of the circuit), the PODEM algorithm (Path-Oriented DEcision Making, i.e., path-oriented decision making, which optimizes the search process using implicit enumeration and makes decisions only at the primary inputs, thus greatly reducing the search space), and the FAN algorithm (fan-out-oriented test generation, which introduces and exploits fan-out nodes in the circuit to reduce the complexity of ATPG).
Conventional automatic test vector generation methods adopt heuristic strategies to search for effective test vectors; however, completing the detection of single stuck-at faults in a circuit causes a large number of backtracking operations, which affects the efficiency of test generation. PODEM is a relatively mature path sensitization algorithm that optimizes the search process using implicit enumeration, thereby improving test efficiency. When PODEM rolls back, the rollback path may be selected randomly, or relevant heuristic information may be combined to guide the rollback. Common heuristic information includes the logic level of the circuit and testability measures; combining such heuristic information for rollback can effectively reduce the number of backtracks and thus improve the efficiency of test generation. However, even when combining logic levels and testability measures, PODEM can still incur a large number of backtracking operations when completing single stuck-at fault detection. When heuristic ATPG methods process large-scale circuits with complex search spaces, the execution time becomes very long, the performance suffers, and the requirement of rapid test generation for large-scale circuits cannot be met.
In view of the above problems, no effective solution has been proposed at present.
Disclosure of Invention
The embodiments of the present application provide a method and a device for reinforcement-learning-based automatic test vector generation for digital circuits, which are used to solve at least the technical problem in the related art of excessive backtracking during test vector generation.
According to one aspect of the embodiments of the present application, there is provided a method for reinforcement-learning-based automatic test vector generation for digital circuits, including: in the process of generating test vectors for a chip to be tested using the automatic test vector generation algorithm ATPG, acquiring historical data of the path-oriented decision-making test generation algorithm PODEM, where the historical data are the data formed while the PODEM algorithm performs a first round of fault sensitization, fault propagation and line value confirmation on the chip to be tested to activate the test pattern of a first fault, and where generating test vectors for the chip to be tested with ATPG requires activating the test patterns of a plurality of faults and therefore performing multiple rounds of fault sensitization, fault propagation and line value confirmation; learning a rollback strategy from the historical data using a reinforcement learning algorithm; and in the process of performing an N-th round of fault sensitization, fault propagation and line value confirmation on the chip to be tested with the PODEM algorithm to activate the test pattern of a second fault, using the rollback strategy to guide PODEM's selection of the rollback path, thereby generating test vectors for the chip to be tested, where N is an integer greater than 1.
Optionally, learning the rollback strategy from the historical data using a reinforcement learning algorithm includes: constructing a Markov decision process MDP from the historical data, where the Markov decision process MDP is a mathematical model describing the interaction between an agent and an environment in the reinforcement learning algorithm; and using the reinforcement learning algorithm to let the agent learn a search strategy from the historical data, obtaining a reinforcement learning table that represents the rollback strategy. Guiding the path-oriented decision-making test generation algorithm PODEM to select a rollback path using the rollback strategy includes: embedding the reinforcement learning table into the PODEM algorithm, and selecting the rollback path with the PODEM algorithm in which the reinforcement learning table is embedded, so as to reduce the number of backtracks and shorten the test generation time.
Optionally, acquiring historical data of the path-oriented decision-making test generation algorithm PODEM includes: acquiring the rollback paths, primary input reachability and frequency of backtracking while the PODEM algorithm performs the first round of fault sensitization, fault propagation and line value confirmation on the chip to be tested to activate the test pattern of the first fault, the historical data including the rollback paths, the primary input reachability and the frequency of backtracking.
Optionally, constructing the Markov decision process MDP using the historical data includes: building, from the historical data, the Markov decision process MDP [s, a, r, s'] comprising the current circuit state s of the chip to be tested, the executed action a, the immediate reward r and the next circuit state s'.
Optionally, in learning the rollback strategy from the historical data using the reinforcement learning algorithm, the method further comprises: defining each line traversed by the PODEM algorithm during rollback as a circuit state, the circuit state including a line ID and a target value, where the line ID is the unique identifier of a line in the chip to be tested and the target value is the objective that the PODEM algorithm needs to satisfy during rollback.
Optionally, in learning the rollback strategy from the historical data using the reinforcement learning algorithm, the method further comprises: taking the fan-in path selected by the PODEM algorithm during rollback as the currently executed action.
Optionally, in learning the rollback strategy from the historical data using the reinforcement learning algorithm, the method further comprises: assigning the PODEM algorithm a negative reward for each rollback step during rollback and a positive reward when a primary input is reached.
According to another aspect of the embodiments of the present application, there is also provided a reinforcement-learning-based automatic test vector generation apparatus for digital circuits, including: an acquisition unit, configured to acquire, in the process of generating test vectors for a chip to be tested using the automatic test vector generation algorithm ATPG, historical data of the path-oriented decision-making test generation algorithm PODEM, where the historical data are the data formed while the PODEM algorithm performs a first round of fault sensitization, fault propagation and line value confirmation on the chip to be tested to activate the test pattern of a first fault, and where generating test vectors for the chip to be tested with ATPG requires activating the test patterns of a plurality of faults and therefore performing multiple rounds of fault sensitization, fault propagation and line value confirmation; a learning unit, configured to learn a rollback strategy from the historical data using a reinforcement learning algorithm; and a generating unit, configured to guide, in the process of performing an N-th round of fault sensitization, fault propagation and line value confirmation on the chip to be tested with the PODEM algorithm to activate the test pattern of a second fault, PODEM's selection of the rollback path using the rollback strategy, so as to generate test vectors for the chip to be tested, where N is an integer greater than 1.
Optionally, the learning unit is further configured to: construct a Markov decision process MDP from the historical data, where the Markov decision process MDP is a mathematical model describing the interaction between an agent and an environment in the reinforcement learning algorithm; and use the reinforcement learning algorithm to let the agent learn a search strategy from the historical data, obtaining a reinforcement learning table that represents the rollback strategy. The generating unit is further configured to: embed the reinforcement learning table into the PODEM algorithm, and select the rollback path with the PODEM algorithm in which the reinforcement learning table is embedded, so as to reduce the number of backtracks and shorten the test generation time.
Optionally, the learning unit is further configured to: acquire the rollback paths, primary input reachability and frequency of backtracking while the PODEM algorithm performs the first round of fault sensitization, fault propagation and line value confirmation on the chip to be tested to activate the test pattern of the first fault, the historical data including the rollback paths, the primary input reachability and the frequency of backtracking.
Optionally, the learning unit is further configured to: build, from the historical data, the Markov decision process MDP [s, a, r, s'] comprising the current circuit state s of the chip to be tested, the executed action a, the immediate reward r and the next circuit state s'.
Optionally, the learning unit is further configured to: in learning the rollback strategy from the historical data with the reinforcement learning algorithm, define each line traversed by the PODEM algorithm during rollback as a circuit state, the circuit state including a line ID and a target value, where the line ID is the unique identifier of a line in the chip to be tested and the target value is the objective that the PODEM algorithm needs to satisfy during rollback.
Optionally, the learning unit is further configured to: in learning the rollback strategy from the historical data with the reinforcement learning algorithm, take the fan-in path selected by the PODEM algorithm during rollback as the currently executed action.
Optionally, the learning unit is further configured to: in learning the rollback strategy from the historical data with the reinforcement learning algorithm, assign the PODEM algorithm a negative reward for each rollback step during rollback and a positive reward when a primary input is reached.
According to another aspect of the embodiments of the present application, there is also provided a storage medium including a stored program that when executed performs the above-described method.
According to another aspect of the embodiments of the present application, there is also provided an electronic device including a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor executing the method described above by the computer program.
According to one aspect of the present application, there is provided a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer readable storage medium, and the processor executes the computer instructions to cause the computer device to perform the steps of any of the embodiments of the method described above.
In the embodiments of the present application, in the process of generating test vectors for a chip to be tested using the ATPG algorithm, historical data of the PODEM algorithm are acquired, where the historical data are the data formed while the PODEM algorithm performs a first round of fault sensitization, fault propagation and line value confirmation on the chip to be tested to activate the test pattern of a first fault, and where ATPG activates the test patterns of a plurality of faults through multiple rounds of fault sensitization, fault propagation and line value confirmation on the chip to be tested; a rollback strategy is learned from the historical data using a reinforcement learning algorithm; and in the process of performing an N-th round of fault sensitization, fault propagation and line value confirmation on the chip to be tested with the PODEM algorithm to activate the test pattern of a second fault, the rollback strategy is used to guide the PODEM algorithm's selection of the rollback path, thereby generating test vectors for the chip to be tested, where N is an integer greater than 1. In this scheme, the Q-learning algorithm from reinforcement learning is introduced into the PODEM algorithm: Q-learning learns a rollback strategy from the ATPG data generated by the PODEM algorithm, and the learned model guides the PODEM algorithm's selection of rollback paths in place of traditional heuristic strategies, so that effective decision points can be reached as early as possible. This solves the technical problem in the related art of excessive backtracking during test vector generation and reduces the number of backtracks.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute an undue limitation to the application. In the drawings:
FIG. 1 is a flow chart of a reinforcement learning based digital circuit automatic test vector generation method according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a reinforcement learning based automatic test vector generation framework according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a state transition of an MDP in a circuit according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a process flow of a Benchmark circuit according to an embodiment of the present application;
FIG. 5 is a schematic diagram of a reinforcement learning based digital circuit automatic test vector generation apparatus according to an embodiment of the present application;
FIG. 6 is a block diagram of a terminal according to an embodiment of the present application.
Detailed Description
In order to enable those skilled in the art to better understand the solutions of the present application, the technical solutions in the embodiments of the present application will be described below clearly and completely with reference to the accompanying drawings. It is apparent that the described embodiments are only some, rather than all, of the embodiments of the present application. All other embodiments obtained by one of ordinary skill in the art based on the embodiments herein without inventive effort shall fall within the scope of protection of the present application.
It should be noted that the terms "first," "second," and the like in the description and claims of the present application and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that embodiments of the present application described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
In recent years, machine learning has played an important role in the field of electronic design automation; in test technology, important breakthroughs have been made in wafer diagnosis, scan chain diagnosis, circuit defect identification, test compression, and other directions. Academia and industry have attempted to improve the performance of ATPG with machine-learning-based methods, for example using deep learning (Deep Learning, DL) algorithms to learn key features of a circuit and generate high-quality test vectors, thereby reducing the number of test vectors and optimizing the execution time. However, research on machine-learning-based automatic test vector generation technology is still at an early stage.
As noted above, PODEM can incur a large number of backtracking operations when completing single stuck-at fault detection, and when heuristic ATPG methods process large-scale circuits with complex search spaces the execution time becomes very long and the performance suffers, failing to meet the requirement of rapid test generation for large-scale circuits. In view of these limitations of heuristic strategies, deep-learning-based methods can be adopted to improve the performance of ATPG, for example an automatic test vector generation method based on an artificial neural network, in which the network learns the structural data and testability measurement information of the circuit to guide PODEM's selection of rollback paths. Its performance on some circuits is clearly superior to traditional heuristic strategies, but such a solution depends heavily on the collected data and, because it cannot capture changes in the circuit state, suffers from scalability problems.
In the course of the inventors' research, it was found that automatic test vector generation methods based on heuristic strategies exhibit a significant performance bottleneck when processing large-scale complex circuits, especially in execution time: as the circuit scale keeps growing, the internal complexity of the circuit increases correspondingly, accompanied by an expansion of the search space, which leads to excessively long ATPG execution times. The applicability of such methods in large-scale circuit test scenarios is therefore greatly limited, and because of these challenges, automatic test vector generation methods based on traditional heuristic strategies cannot meet the requirement of fast and effective testing of large-scale circuits.
It was also found during this research that, in deep-learning-based automatic test vector generation methods, line nodes that lead to a generated test vector are labeled "success" and line nodes that lead to backtracking are labeled "failure", so that the optimization objective of ATPG is abstracted into a classification problem. However, this strategy depends heavily on data collected in advance and, because it cannot capture dynamic changes in the circuit state, may suffer from scalability problems when dealing with complex circuits. In fact, ATPG is better regarded as an optimization problem whose aim is to find the optimal solution according to the state characteristics of the circuit. In this scenario, reinforcement learning (Reinforcement Learning, RL) techniques are a promising solution, because reinforcement learning can effectively capture circuit characteristics and interact with the environment through trial and error and feedback, thereby quickly exploring the search space and effectively finding an optimal or near-optimal search strategy.
In order to reduce the number of backtracks in automatic test vector generation (ATPG) for digital circuits and improve its performance, according to an aspect of the embodiments of the present application, a method embodiment of a reinforcement-learning-based automatic test vector generation method for digital circuits is provided. The Q-learning algorithm from reinforcement learning is introduced into the PODEM test generation algorithm; a rollback strategy is learned from the historical data generated by the PODEM algorithm via Q-learning, and the learned model helps PODEM select rollback paths more intelligently, so as to reach effective decision points as early as possible, reduce the number of backtracks and improve performance.
FIG. 1 is a flow chart of a reinforcement learning based digital circuit automatic test vector generation method according to an embodiment of the present application, as shown in FIG. 1, the method may include the steps of:
step S102, in the process of generating test vectors for chips to be tested by utilizing an automatic test vector generation algorithm ATPG, historical data of a test generation algorithm PODEM facing path decision is obtained.
The historical data are the data, such as the rollback paths, primary input reachability and frequency of backtracking, formed while the path-oriented decision-making test generation algorithm PODEM performs the first round of fault sensitization, fault propagation and line value confirmation on the chip to be tested to activate the test pattern of a first fault; generating test vectors for the chip to be tested with the automatic test vector generation algorithm ATPG requires activating the test patterns of a plurality of faults.
Step S104, learning the rollback strategy from the historical data by using a reinforcement learning algorithm.
The historical data may be used to construct a Markov decision process MDP [s, a, r, s'] (comprising the current circuit state s, the executed action a, the immediate reward r and the next circuit state s'), which is a mathematical model describing the interaction of the agent with the environment in a reinforcement learning algorithm; the reinforcement learning algorithm then lets the agent learn a search strategy from the historical data, yielding a reinforcement learning table that represents the rollback strategy.
In the above embodiment, 1) each line traversed by the path-oriented decision-making test generation algorithm PODEM during rollback may be defined as a circuit state (either the current circuit state or the next circuit state), where the circuit state includes a line ID and a target value, the line ID being the unique identifier of a line in the chip to be tested and the target value being the objective that the PODEM algorithm needs to satisfy during rollback; 2) the fan-in path selected by the PODEM algorithm during rollback may be taken as the currently executed action; and 3) the PODEM algorithm may be assigned a negative reward for each rollback step during rollback and a positive reward when a primary input is reached.
Step S106, in the process of performing the N-th round of fault sensitization, fault propagation and line value confirmation on the chip to be tested with the path-oriented decision-making test generation algorithm PODEM to activate the test pattern of a second fault, the rollback strategy is used to guide PODEM's selection of the rollback path, thereby generating the test vectors of the chip to be tested; N is an integer greater than 1, its specific value depending on the number of faults.
Specifically, the reinforcement learning table can be embedded into the path-oriented decision-making test generation algorithm PODEM, and the rollback path is then selected by the PODEM algorithm in which the reinforcement learning table is embedded, so as to reduce the number of backtracks and shorten the test generation time.
Through the above steps, in the process of generating test vectors for the chip to be tested using the automatic test vector generation algorithm ATPG, historical data of the path-oriented decision-making test generation algorithm PODEM are acquired, where the historical data are the data formed while the PODEM algorithm performs a first round of fault sensitization, fault propagation and line value confirmation on the chip to be tested to activate the test pattern of a first fault, and where generating test vectors for the chip to be tested with ATPG requires activating the test patterns of a plurality of faults and therefore performing multiple rounds of fault sensitization, fault propagation and line value confirmation; a rollback strategy is learned from the historical data using a reinforcement learning algorithm; and in the process of performing an N-th round of fault sensitization, fault propagation and line value confirmation on the chip to be tested with the PODEM algorithm to activate the test pattern of a second fault, the rollback strategy is used to guide PODEM's selection of the rollback path, thereby generating test vectors for the chip to be tested, where N is an integer greater than 1. In this scheme, the Q-learning algorithm from reinforcement learning is introduced into the PODEM test generation algorithm: Q-learning learns a rollback strategy from the ATPG data generated by the PODEM algorithm, and the learned model guides the PODEM algorithm's selection of rollback paths in place of traditional heuristic strategies, so that effective decision points can be reached as early as possible. This solves the technical problem in the related art of excessive backtracking during test vector generation and reduces the number of backtracks.
The application provides a reinforcement-learning-based automatic test vector generation method for digital circuits that applies the Q-learning algorithm from reinforcement learning to automatic test vector generation. In practical production applications, to ensure test quality, multiple rounds of ATPG often need to be run on the same circuit. However, for complex circuits, multiple rounds of ATPG are inefficient because of the lengthy test generation time. To improve the fault detection efficiency of the circuit, a new strategy is adopted in the present application: in the first round of ATPG, the test generation results of PODEM are collected and a Markov decision process (Markov Decision Process, MDP) is constructed from them. Then, the Q-learning algorithm lets the agent learn a search strategy from these collected empirical data, finally forming a fully trained Q-table, which is then integrated into the ATPG algorithm. In this way, the PODEM algorithm with the embedded Q-table can select rollback paths more intelligently, effectively reducing the number of backtracks and shortening the test generation time. As an alternative embodiment, the general framework of the invention is described below with the example in FIG. 2, with the following detailed steps:
Step 1, data acquisition: the PODEM algorithm is used to detect faults in the circuit and to collect the critical data required by the reinforcement learning algorithm. These critical data include the rollback paths, primary input (Primary Input, PI) reachability, and the frequency of backtracking; they are collected because they play an important role in the MDP, affecting the changes of state, action and reward values. A rollback path comprises a current state, the action taken, and the next state reached after the action is performed in the current state; primary input reachability and backtracking frequency correspond to the designed reward mechanism.
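For illustration only, the collected critical data might be recorded as transition tuples like the following minimal Python sketch; the record layout and all identifiers (BacktraceRecord, reached_pi, and so on) are assumptions of this sketch, not part of the disclosure:

```python
from dataclasses import dataclass
from typing import List, Tuple

# A circuit state as defined in the design rules below: (line ID, target value).
State = Tuple[int, int]

@dataclass
class BacktraceRecord:
    """One transition observed while PODEM rolls back toward the primary inputs."""
    state: State            # current line and the value it must take
    action: int             # index of the fan-in path chosen
    next_state: State       # line reached after following that fan-in
    reached_pi: bool        # primary-input reachability, used by the reward design
    caused_backtrack: bool  # whether this step later triggered a backtrack

history: List[BacktraceRecord] = []  # filled during the first round of ATPG
```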
Step 2, Q-table creation: the Q-table stores and updates the value estimate of each state-action pair in a simple and efficient way. It is a two-dimensional table, one dimension representing the possible environment states and the other the possible actions taken in those states. In this two-dimensional table, the value of each cell, i.e. the Q-value, represents the return expected from performing a particular action in a particular state, thereby providing the algorithm with an explicit indicator that guides it toward the optimal decision in each case. Using the pre-collected data set, the path search process of PODEM is abstracted into an MDP (comprising state space, action space, state transitions and a reward function): the state space consists of all possible lines encountered during rollback; the action space is determined by the number of fan-ins of each node; and in the state transition process, given a line (the current state), selecting any one fan-in (an action) takes the agent to another line (the next state). The goal of the agent should be compatible with the goal of ATPG: the agent should reach the primary inputs quickly and assign values correctly during rollback. Following this design concept, and since the goal of the agent is to maximize the cumulative reward, the design of the reward function should be consistent with the optimization goal of ATPG. Establishing the MDP allows the Q-learning algorithm to be used to solve the ATPG problem.
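As a non-limiting sketch of Step 2, an offline tabular Q-learning pass over the transitions collected in Step 1 (reusing the BacktraceRecord fields sketched above) could look as follows; the reward magnitudes anticipate the values reported in the experiments below, and the function and parameter names are illustrative assumptions:

```python
from collections import defaultdict

def train_q_table(history, max_fanins, lr=0.01, gamma=0.99, epochs=50):
    """Offline tabular Q-learning over the first-round PODEM history (Step 2)."""
    q = defaultdict(float)  # (state, action) -> Q-value; unseen pairs start at 0
    for _ in range(epochs):
        for rec in history:
            # Reward mechanism of the disclosure: a small cost per rollback step,
            # a bonus on reaching a primary input, and a heavy penalty when the
            # step led to a backtrack.
            r = -0.1
            if rec.reached_pi:
                r += 10.0
            if rec.caused_backtrack:
                r -= 100.0
            best_next = max(q[(rec.next_state, a)] for a in range(max_fanins))
            q[(rec.state, rec.action)] += lr * (
                r + gamma * best_next - q[(rec.state, rec.action)]
            )
    return dict(q)
```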
Step 3, Q-table deployment: by integrating the trained Q-table into PODEM, PODEM can make more intelligent decisions when selecting the rollback path. Specifically, it uses the Q-table to identify the optimal rollback path in the current line state and thereby generates test vectors effectively. This significantly reduces the number of backtracks in the ATPG process and improves fault detection efficiency.
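Deployment at Step 3 then amounts to an argmax over the valid actions of the current line state. A minimal sketch, including the fallback for states absent from the Q-table that is described in the action-design subsection below:

```python
def select_fanin(q, state, num_fanins):
    """Pick the fan-in (action) with the highest learned Q-value for this state."""
    known = [(q[(state, a)], a) for a in range(num_fanins) if (state, a) in q]
    if not known:
        return 0  # state not covered by the training data: default to fan-in 0
    return max(known)[1]
```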
1) Design rules:
As described above, the present application converts the PODEM path search process into an MDP. This transformation enables the ATPG problem to be reconstructed according to RL principles. Next, the sequence [s, a, r, s'] in the MDP (corresponding to the current state, the executed action, the immediate reward and the next state, respectively) is studied in depth, and detailed design rules for states, actions and rewards in the circuit are provided.
2) State design:
Each line traversed during rollback is defined as a state, represented by a line ID and a target value. The line ID is the unique identifier of each line in the circuit, and the target value is the objective that PODEM needs to satisfy during rollback. For example, the state of line f in FIG. 3 is S_f = (f, 0) (see (1) in FIG. 3).
The reason for using the line ID as one of the features is that the Q-learning algorithm looks up the corresponding Q-value by indexing state-action pairs in the Q-table; taking the line ID as a feature therefore helps the model quickly locate the current state while the Q-table is updated. In addition, this feature representation effectively reduces the dimensionality of the state, simplifying the computational complexity of Q-learning and the storage requirements of the Q-table.
One of the reasons for using the target value as a feature is that the circuit simultaneously contains two types of faults, stuck-at-0 (Stuck-at-0, SA0) and stuck-at-1 (Stuck-at-1, SA1), and the rollback strategies for testing the two types differ. For example, in FIG. 3, for an SA1 fault on line f, since G2 is an AND gate, setting f=0 amounts to a single objective: a=0 or d=0. However, if there is an SA0 fault on line f, two objectives need to be met: both a and d need to be 1. Thus, the target value helps Q-learning localize more accurately when both kinds of fault are present in the circuit.
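The FIG. 3 example can be made concrete with a small sketch of how the target value changes the rollback objective on the two-input AND gate G2; the data representation is an assumption chosen for illustration:

```python
def and_gate_objectives(target_value, fanins=("a", "d")):
    """Rollback objectives for a 2-input AND gate whose output must equal
    target_value (line f driven by G2 in the FIG. 3 example)."""
    if target_value == 0:
        # SA1 fault on f: f must be set to 0, so ANY single fan-in at 0 suffices.
        return {"any_of": [(line, 0) for line in fanins]}
    # SA0 fault on f: f must be set to 1, so ALL fan-ins must be 1.
    return {"all_of": [(line, 1) for line in fanins]}
```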
3) Action design:
During rollback, the selected fan-in path corresponds to the current action. In FIG. 3, line f has two fan-in paths, which means that in state S_f = (f, 0) the agent has two possible actions; for example, action 1 (see (2) in FIG. 3) may be selected, its reward evaluated, and the agent moved to the next state (see (4) in FIG. 3).
However, the number of fan-ins of different line nodes in an actual circuit may differ, which causes the action space of the agent to vary dynamically across states; for example, the action space in state S_f = (f, 0) is 2, but in state S_c = (c, 0) it is 1. A specific method is therefore used to help the agent make more accurate action selections: a mask vector of equal length is created for each state, the maximum fan-in count among the logic gates in the circuit determines the length of the mask vector, and the mask vector is set according to the number of fan-ins of each line node, with valid actions set to 1 and invalid actions set to 0. For example, in a circuit whose maximum fan-in count is 4, the mask vector of a node with 3 fan-ins is [1, 1, 1, 0]. By masking the action space, the agent restricts its action selection to the allowed range, improving the efficiency of algorithm execution. It should be noted that, in the subsequent ATPG phases, if a state encountered during rollback is not in the pre-trained Q-table, the data set used during training did not cover that state; in this case, the agent defaults to rolling back along the first action.
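The mask-vector construction described above admits a one-line sketch (the function name is an assumption):

```python
def action_mask(num_fanins, max_fanins):
    """1 marks a selectable fan-in, 0 an invalid action slot; with a circuit
    maximum of 4 fan-ins, a node with 3 fan-ins yields [1, 1, 1, 0] as in the text."""
    return [1 if a < num_fanins else 0 for a in range(max_fanins)]
```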
4) Reward design:
Using a reward signal to specify the goal is one of the most distinctive features of reinforcement learning, and whether the reward mechanism is designed reasonably directly determines whether the final goal of the reinforcement learning task can be achieved. To optimize the search path, the present application assigns a negative reward to each rollback step and a positive reward when a PI is reached (see (3) in FIG. 3). Since backtracking increases computation and time costs and thus reduces the performance of ATPG, the present application assigns a larger negative reward to backtracking, so that the algorithm "bypasses" paths that are likely to cause backtracks.
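A non-limiting sketch of this reward shaping; the default magnitudes are the ones reported in the experiments below, and all identifiers are assumptions of the sketch:

```python
def reward(reached_pi, caused_backtrack,
           step_cost=-0.1, pi_bonus=10.0, backtrack_penalty=-100.0):
    """Per-step reward: every rollback step costs a little, reaching a primary
    input pays off, and a step that leads to a backtrack is penalized heavily."""
    r = step_cost
    if reached_pi:
        r += pi_bonus
    if caused_backtrack:
        r += backtrack_penalty
    return r
```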
Experiment and result analysis:
In order to approximate an industrial-grade chip design flow, the benchmark circuits used in the invention are synthesized into netlists with commercial EDA tools and a standard process library; the processing flow is shown in FIG. 4. First, the register transfer level (Register Transfer Level, RTL) Verilog files and the Smic180 process library are read, and a gate-level netlist (Gate Level Netlist) is obtained through synthesis with the DC (Design Compiler) tool of Synopsys. Then the gate-level netlist is read by the test synthesis tool ICtest to build the internal data structures, and test generation is performed on this basis using ICtest-ATPG.
The ATPG-based PODEM algorithm is tested: traditional heuristic strategies (Distance, SCOAP) and machine-learning heuristic strategies (DL and RL) are used to guide PODEM's rollback in the circuits under test, verifying the effectiveness of the RL heuristic strategy of the present application. For PODEM with the RL heuristic strategy, the data set required for training is obtained in the first round of ATPG, the Q-learning algorithm then learns a rollback strategy from this data set, and the trained model is embedded into PODEM to enhance the performance of subsequent ATPG rounds. The hyperparameters of the RL heuristic are set as follows: the learning rate is 0.01 and the discount factor is 0.99 (closer to 1 means the agent gives more weight to future rewards). The reward mechanism is set as follows: each step in the rollback process is rewarded -0.1, reaching a PI is rewarded +10, and causing a backtrack is rewarded -100.
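With these settings, a single tabular update follows the standard Q-learning rule Q(s, a) <- Q(s, a) + lr * (r + gamma * max_a' Q(s', a') - Q(s, a)). A worked numeric illustration, under the assumption that all Q-values start at zero:

```python
lr, gamma = 0.01, 0.99                 # hyperparameters stated above
q_sa, r, max_q_next = 0.0, -0.1, 0.0   # one ordinary rollback step, no PI reached
q_sa += lr * (r + gamma * max_q_next - q_sa)
print(q_sa)  # -0.001: repeated fruitless steps slowly depress that path's value
```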
For PODEM with the DL heuristic strategy, an ANN algorithm is used to guide the rollback. To improve the learning efficiency of the ANN, "complex circuits" are used as training circuits to produce high-quality sample data. The complexity of a circuit is measured by its fault coverage, logic depth and logic-gate fan-out count; circuits with lower fault coverage, deeper logic depth and higher fan-out counts are considered more "complex". Accordingly, c499, s9234 and b05 are selected as training circuits for the ANN. The training features comprise: logic gate type, line level, logic-gate fan-out count, and SCOAP (0- and 1-controllability, and observability). The ANN uses a three-layer network structure comprising an input layer, a hidden layer and an output layer; the activation function is ReLU, and the Adam optimization algorithm is used during training. The sample size of the training data set is limited to between 0 and 1,000,000. Experimental results show that the training error of the ANN model is smallest when the data set size is 800,000 and the number of hidden-layer nodes is 25.
Three experiments were carried out in this application:
(1) Effect of data set size on the RL strategy. First, five PODEM runs (using a random heuristic strategy) were performed on the c499 circuit, producing five data sets of different sizes, each covering different circuit states. Next, a strategy was learned from each data set with the Q-learning algorithm. Finally, the trained models were used to assist PODEM in retesting c499. The numbers of backtracks under the different strategies are shown in Table 1.
TABLE 1 influence of dataset size on RL strategy in c499 circuits
(2) Effect of heuristic strategies on PODEM performance. The traditional heuristic strategies (Distance and SCOAP) and the machine-learning heuristic strategies (DL and RL) are applied to the six benchmark circuits to guide PODEM's rollback path selection. Using the number of backtracks, the number of rollback steps, the running time and the fault coverage, metrics commonly used in test generation, the performance of PODEM under the different heuristic strategies is observed. The results are shown in Table 2.
TABLE 2 comparison of PODEM Performance under different heuristic strategies
(3) Effect of training data set size on strategy learning for the two machine learning methods. Considering that an effective rollback strategy helps PODEM reduce the number of backtracks and improve its performance, the number of backtracks is used to measure strategy effectiveness. The results are shown in Table 3.
TABLE 3 Training data set sizes for the two machine learning methods
Analysis of the data in Table 1 shows that, when RL-based PODEM is used to detect faults in the c499 circuit, the more circuit states the data set covers, the more effective the Q-learning strategy and the fewer the backtracks. This result shows that rich circuit state information enhances Q-learning's understanding of the environment, enabling it to learn a more effective rollback strategy.
As can be seen from Table 2, the proposed method is superior to the traditional heuristics and the DL heuristic on most circuits. The method effectively reduces the number of backtracks during test generation, showing that Q-learning can learn a considerable rollback strategy from the data set. The reduction in running time indicates that the method improves the performance of ATPG. Fault coverage is an important ATPG metric; since the benchmark circuits used in current research are not large, the differences in fault coverage among the heuristic strategies are not obvious. Notably, the RL heuristic proposed in the present application is nonetheless superior to the other heuristics on some circuits. However, on the s13207 circuit the RL heuristic is not as good as Distance and DL, which may be due to poor data set quality.
As shown in Table 2, the RL heuristic performs remarkably on the c6288 circuit. As Table 3 shows, in the data set used for c6288 the state coverage of the circuit reaches 94.68%, which means that rich circuit environment information helps the agent learn the optimal strategy. In contrast, the state coverage of the s13207 circuit is 47.04%, consistent with the RL heuristic performing poorly on that circuit. However, as the data set grows, the computational overhead of model training also increases; balancing state coverage against data set size therefore becomes critical. The RL heuristic requires training and testing on the same circuit, whereas the DL heuristic only requires training on a few circuits and can then be tested on others. However, observing the metrics of the c6288 circuit under the DL heuristic in Table 2, the generalization capability of the ANN model still has certain limitations.
The application is verified on a subset of the ISCAS85, ISCAS89 and ITC99 benchmark circuits; the number of backtracks, the number of rollback steps, the running time and the fault coverage of automatic test vector generation are comprehensively evaluated, and the results show the effectiveness of the application relative to the traditional heuristic strategies and to a rollback path selection strategy based on an artificial neural network (ANN).
In summary, automatic test vector generation (ATPG) is a key technology in digital circuit testing, and excessive backtracking in the ATPG process consumes a great deal of computing resources and seriously affects performance. The reinforcement-learning-based automatic test vector generation method for digital circuits of the present application applies the Q-learning algorithm from reinforcement learning to the PODEM test generation algorithm, so that PODEM can be guided to select the correct rollback path as far as possible, effectively reducing the number of backtracks in the test generation process and improving its performance. Comprehensive comparisons with the traditional heuristic strategies (Distance, SCOAP) and a deep-learning heuristic strategy (ANN) verify the effectiveness of the invention through experimental results.
It should be noted that, for simplicity of description, the foregoing method embodiments are all expressed as a series of action combinations, but it should be understood by those skilled in the art that the present application is not limited by the order of actions described, as some steps may be performed in other order or simultaneously in accordance with the present application. Further, those skilled in the art will also appreciate that the embodiments described in the specification are all preferred embodiments, and that the acts and modules referred to are not necessarily required in the present application.
From the description of the above embodiments, it will be clear to a person skilled in the art that the method according to the above embodiments may be implemented by means of software plus the necessary general hardware platform, but of course also by means of hardware, but in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art in the form of a software product stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk), comprising several instructions for causing a terminal device (which may be a mobile phone, a computer, a server, or a network device, etc.) to perform the method described in the embodiments of the present application.
According to another aspect of the embodiments of the present application, there is also provided a reinforcement learning-based digital circuit automatic test vector generation apparatus for implementing the reinforcement learning-based digital circuit automatic test vector generation method. FIG. 5 is a schematic diagram of a reinforcement learning based digital circuit automatic test vector generation apparatus according to an embodiment of the present application, as shown in FIG. 5, the apparatus may include:
An acquisition unit 51, configured to acquire, in the process of generating test vectors for a chip to be tested using the automatic test vector generation algorithm ATPG, historical data of the path-oriented decision-making test generation algorithm PODEM, where the historical data are the data formed while the PODEM algorithm performs a first round of fault sensitization, fault propagation and line value confirmation on the chip to be tested to activate the test pattern of a first fault, and where generating test vectors with ATPG requires activating the test patterns of a plurality of faults through multiple rounds of fault sensitization, fault propagation and line value confirmation;
a learning unit 53, configured to learn a rollback strategy from the historical data using a reinforcement learning algorithm;
a generating unit 55, configured to guide, in the process of performing the N-th round of fault sensitization, fault propagation and line value confirmation on the chip to be tested with the PODEM algorithm to activate the test pattern of a second fault, PODEM's selection of the rollback path using the rollback strategy, so as to generate the test vectors of the chip to be tested, where N is an integer greater than 1.
Through the above units, in the process of generating test vectors for the chip to be tested using the automatic test vector generation algorithm ATPG, historical data of the path-oriented decision-making test generation algorithm PODEM are acquired, where the historical data are the data formed while the PODEM algorithm performs a first round of fault sensitization, fault propagation and line value confirmation on the chip to be tested to activate the test pattern of a first fault, and where generating test vectors with ATPG requires activating the test patterns of a plurality of faults; a rollback strategy is learned from the historical data using a reinforcement learning algorithm; and in the process of performing an N-th round of fault sensitization, fault propagation and line value confirmation on the chip to be tested with the PODEM algorithm to activate the test pattern of a second fault, the rollback strategy is used to guide PODEM's selection of the rollback path, thereby generating test vectors for the chip to be tested, where N is an integer greater than 1. In this scheme, the Q-learning algorithm from reinforcement learning is introduced into the PODEM test generation algorithm: Q-learning learns a rollback strategy from the ATPG data generated by the PODEM algorithm, and the learned model guides PODEM's selection of rollback paths in place of traditional heuristic strategies, so that effective decision points can be reached as early as possible. This solves the technical problem in the related art of excessive backtracking during test vector generation and reduces the number of backtracks.
Optionally, the learning unit is further configured to: construct a Markov decision process MDP from the historical data, where the Markov decision process MDP is a mathematical model describing the interaction between an agent and an environment in the reinforcement learning algorithm; and use the reinforcement learning algorithm to let the agent learn a search strategy from the historical data, obtaining a reinforcement learning table that represents the rollback strategy. The generating unit is further configured to: embed the reinforcement learning table into the PODEM algorithm, and select the rollback path with the PODEM algorithm in which the reinforcement learning table is embedded, so as to reduce the number of backtracks and shorten the test generation time.
Optionally, the learning unit is further configured to: acquire the rollback paths, primary input reachability and frequency of backtracking while the PODEM algorithm performs the first round of fault sensitization, fault propagation and line value confirmation on the chip to be tested to activate the test pattern of the first fault, the historical data including the rollback paths, the primary input reachability and the frequency of backtracking.
Optionally, the learning unit is further configured to: build, from the historical data, the Markov decision process MDP [s, a, r, s'] comprising the current circuit state s of the chip to be tested, the executed action a, the immediate reward r and the next circuit state s'.
Optionally, the learning unit is further configured to: in learning the rollback strategy from the historical data with the reinforcement learning algorithm, define each line traversed by the PODEM algorithm during rollback as a circuit state, the circuit state including a line ID and a target value, where the line ID is the unique identifier of a line in the chip to be tested and the target value is the objective that the PODEM algorithm needs to satisfy during rollback.
Optionally, the learning unit is further configured to: in learning the rollback strategy from the historical data with the reinforcement learning algorithm, take the fan-in path selected by the PODEM algorithm during rollback as the currently executed action.
Optionally, the learning unit is further configured to: in learning the rollback strategy from the historical data with the reinforcement learning algorithm, assign the PODEM algorithm a negative reward for each rollback step during rollback and a positive reward when a primary input is reached.
It should be noted that the above modules correspond to the same examples and application scenarios as their corresponding method steps, but are not limited to what is disclosed in the above embodiments. The above modules may run, as part of the apparatus, in the corresponding hardware environment, and may be implemented in software or in hardware, where the hardware environment includes a network environment.
According to another aspect of the embodiments of the present application, there is also provided a server or a terminal for implementing the above-mentioned reinforcement learning-based digital circuit automatic test vector generation method.
Fig. 6 is a block diagram of a terminal according to an embodiment of the present application. As shown in fig. 6, the terminal may include: one or more processors 601 (only one is shown in the figure), a memory 603 and a transmission device 605; the terminal may further include an input/output device 607.
The memory 603 may be configured to store software programs and modules, such as the program instructions/modules corresponding to the reinforcement learning-based digital circuit automatic test vector generation method and apparatus in the embodiments of the present application. The processor 601 executes the software programs and modules stored in the memory 603 to perform various functional applications and data processing, that is, to implement the above-mentioned reinforcement learning-based digital circuit automatic test vector generation method. The memory 603 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 603 may further include memory located remotely from the processor 601, and such remote memory may be connected to the terminal through a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission device 605 is used to receive or transmit data via a network, and may also be used for data transmission between the processor and the memory. Specific examples of the network described above may include wired networks and wireless networks. In one example, the transmission device 605 includes a network adapter (Network Interface Controller, NIC) that may be connected to other network devices and routers via a network cable to communicate with the internet or a local area network. In one example, the transmission device 605 is a Radio Frequency (RF) module that is configured to communicate wirelessly with the internet.
In particular, the memory 603 is used to store applications.
The processor 601 may call the application program stored in the memory 603 through the transmission device 605 to perform the following steps:
in the process of generating test vectors for a chip to be tested using the automatic test vector generation algorithm ATPG, acquiring historical data of the path decision oriented test generation algorithm PODEM, wherein the historical data are the data formed while the path decision oriented test generation algorithm PODEM performs the first round of fault sensitization, fault propagation and line value confirmation on the chip to be tested to activate the test mode of a first fault, and the automatic test vector generation algorithm ATPG activates the test modes of a plurality of faults to generate the test vectors for the chip to be tested; learning a rollback strategy from the historical data using a reinforcement learning algorithm; and in the process of performing the N-th round of fault sensitization, fault propagation and line value confirmation on the chip to be tested using the path decision oriented test generation algorithm PODEM to activate the test mode of a second fault, guiding the path decision oriented test generation algorithm PODEM to select a rollback path using the rollback strategy, so as to generate a test vector of the chip to be tested, wherein N is an integer greater than 1.
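For orientation only, the three steps above can be summarized as the following top-level flow; run_podem_round and train_q_learning are hypothetical helper names standing in for the first-round PODEM run and the reinforcement learning step, and are not defined by the patent:

    # Illustrative end-to-end flow: collect history in round 1, learn a rollback
    # strategy, then reuse it to guide rollback in every later round.
    def rl_guided_atpg(circuit, fault_list):
        history = run_podem_round(circuit, fault_list[0], policy=None)  # round 1
        policy = train_q_learning(history)                              # learn strategy
        test_vectors = []
        for fault in fault_list[1:]:                                    # rounds N > 1
            test_vectors.append(run_podem_round(circuit, fault, policy=policy))
        return test_vectors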
Alternatively, for specific examples in this embodiment, reference may be made to the examples described in the foregoing embodiments, and details are not repeated here.
It will be appreciated by those skilled in the art that the structure shown in fig. 6 is only illustrative. The terminal may be a smart phone (such as an Android phone or an iOS phone), a tablet computer, a palmtop computer, a mobile Internet device (Mobile Internet Devices, MID), a PAD, or other terminal device. Fig. 6 does not limit the structure of the above electronic device. For example, the terminal may include more or fewer components (such as a network interface or a display device) than shown in fig. 6, or may have a configuration different from that shown in fig. 6.
Those of ordinary skill in the art will appreciate that all or part of the steps in the various methods of the above embodiments may be completed by a program instructing the relevant hardware of a terminal device. The program may be stored in a computer-readable storage medium, and the storage medium may include: a flash disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and the like.
Embodiments of the present application also provide a storage medium. Alternatively, in this embodiment, the above storage medium may be used to store program code for executing the reinforcement learning-based digital circuit automatic test vector generation method.
Alternatively, in this embodiment, the storage medium may be located on at least one network device of the plurality of network devices in the network shown in the above embodiment.
Alternatively, in the present embodiment, the storage medium is configured to store program code for performing the steps of:
in the process of generating test vectors for a chip to be tested using the automatic test vector generation algorithm ATPG, acquiring historical data of the path decision oriented test generation algorithm PODEM, wherein the historical data are the data formed while the path decision oriented test generation algorithm PODEM performs the first round of fault sensitization, fault propagation and line value confirmation on the chip to be tested to activate the test mode of a first fault, and the automatic test vector generation algorithm ATPG activates the test modes of a plurality of faults to generate the test vectors for the chip to be tested; learning a rollback strategy from the historical data using a reinforcement learning algorithm; and in the process of performing the N-th round of fault sensitization, fault propagation and line value confirmation on the chip to be tested using the path decision oriented test generation algorithm PODEM to activate the test mode of a second fault, guiding the path decision oriented test generation algorithm PODEM to select a rollback path using the rollback strategy, so as to generate a test vector of the chip to be tested, wherein N is an integer greater than 1.
Alternatively, for specific examples in this embodiment, reference may be made to the examples described in the foregoing embodiments, and details are not repeated here.
Alternatively, in this embodiment, the storage medium may include, but is not limited to: a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, an optical disk, or any other medium capable of storing program code.
The above embodiment numbers of the present application are for description only and do not represent the superiority or inferiority of the embodiments.
The integrated units in the above embodiments, if implemented in the form of software functional units and sold or used as independent products, may be stored in the above computer-readable storage medium. Based on this understanding, the technical solution of the present application, in essence or in the part contributing to the prior art, or in whole or in part, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions for causing one or more computer devices (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present application.
In the above embodiments of the present application, the description of each embodiment has its own emphasis. For any part not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed client may be implemented in other manners. The apparatus embodiments described above are merely exemplary; for example, the division of the units is merely a logical function division, and other divisions are possible in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be through some interfaces, units or modules, and may be in electrical or other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present application may be integrated in one processing unit, or each unit may exist physically alone, or two or more units may be integrated in one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
The foregoing is merely a preferred embodiment of the present application. It should be noted that those of ordinary skill in the art may make several improvements and modifications without departing from the principles of the present application, and such improvements and modifications shall also fall within the protection scope of the present application.

Claims (10)

1. A method for generating automatic test vectors for a digital circuit based on reinforcement learning, characterized by comprising:
in the process of generating test vectors for a chip to be tested using the automatic test vector generation algorithm ATPG, acquiring historical data of the path decision oriented test generation algorithm PODEM, wherein the historical data are the data formed while the path decision oriented test generation algorithm PODEM performs the first round of fault sensitization, fault propagation and line value confirmation on the chip to be tested to activate the test mode of a first fault, and the automatic test vector generation algorithm ATPG activates the test modes of a plurality of faults to generate the test vectors for the chip to be tested;
learning a rollback strategy from the historical data using a reinforcement learning algorithm;
and in the process of performing the N-th round of fault sensitization, fault propagation and line value confirmation on the chip to be tested using the path decision oriented test generation algorithm PODEM to activate the test mode of a second fault, guiding the path decision oriented test generation algorithm PODEM to select a rollback path using the rollback strategy, so as to generate a test vector of the chip to be tested, wherein N is an integer greater than 1.
2. The method according to claim 1, wherein
learning the rollback strategy from the historical data using the reinforcement learning algorithm comprises: constructing a Markov decision process MDP from the historical data, wherein the Markov decision process MDP is a mathematical model describing the interaction between an agent and an environment in the reinforcement learning algorithm; and using the reinforcement learning algorithm to have the agent learn a search strategy from the historical data, obtaining a reinforcement learning table representing the rollback strategy;
and guiding the path decision oriented test generation algorithm PODEM to select a rollback path using the rollback strategy comprises: embedding the reinforcement learning table into the path decision oriented test generation algorithm PODEM, and selecting a rollback path using the path decision oriented test generation algorithm PODEM embedded with the reinforcement learning table, so as to reduce the number of backtracks and shorten the test generation time.
3. The method according to claim 2, wherein acquiring historical data of the path decision oriented test generation algorithm PODEM comprises:
acquiring the rollback paths, the primary input reachability and the backtracking frequency observed while the path decision oriented test generation algorithm PODEM performs the first round of fault sensitization, fault propagation and line value confirmation on the chip to be tested to activate the test mode of the first fault, wherein the historical data comprise the rollback paths, the primary input reachability and the backtracking frequency.
4. The method according to claim 2, wherein constructing a markov decision process MDP using the historical data comprises:
establishing, from the historical data, the Markov decision process MDP [s, a, r, s'] comprising the current circuit state s of the chip to be tested, the executed action a, the immediate reward r and the next circuit state s'.
5. The method according to claim 4, wherein, in the process of learning the rollback strategy from the historical data using the reinforcement learning algorithm, the method further comprises:
defining each line traversed by the path decision oriented test generation algorithm PODEM during rollback as a circuit state, wherein the circuit state comprises a line ID and a target value, the line ID is the unique identifier of a line in the chip to be tested, and the target value is the objective that the path decision oriented test generation algorithm PODEM must satisfy during rollback.
6. The method according to claim 4, wherein, in the process of learning the rollback strategy from the historical data using the reinforcement learning algorithm, the method further comprises:
taking the fan-in path selected by the path decision oriented test generation algorithm PODEM during rollback as the currently executed action.
7. The method according to claim 4, wherein, in the process of learning the rollback strategy from the historical data using the reinforcement learning algorithm, the method further comprises:
assigning a negative reward to each rollback step taken by the path decision oriented test generation algorithm PODEM during rollback, and assigning a positive reward when a primary input is reached.
8. An automatic test vector generation device for a digital circuit based on reinforcement learning, characterized by comprising:
the system comprises an acquisition unit, a path decision-oriented test generation algorithm PODEM and a test control unit, wherein the acquisition unit is used for acquiring historical data of the path decision-oriented test generation algorithm PODEM in the process of generating a test vector for a chip to be tested by utilizing an automatic test vector generation algorithm ATPG, wherein the historical data are data formed in the process of activating a test mode of a first fault by utilizing the path decision-oriented test generation algorithm PODEM to perform first round fault sensitization, fault propagation and line value confirmation on the chip to be tested, and the test mode of a plurality of faults is activated by utilizing the automatic test vector generation algorithm ATPG to generate the test vector for the chip to be tested;
a learning unit, configured to learn a rollback strategy from the historical data using a reinforcement learning algorithm;
and a generating unit, configured to guide the path decision oriented test generation algorithm PODEM to select a rollback path using the rollback strategy in the process of performing the N-th round of fault sensitization, fault propagation and line value confirmation on the chip to be tested to activate the test mode of a second fault, so as to generate a test vector of the chip to be tested, wherein N is an integer greater than 1.
9. A computer-readable storage medium, characterized in that the storage medium comprises a stored program, wherein the program, when run, performs the method of any one of claims 1 to 7.
10. An electronic device, comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor performs the method of any one of claims 1 to 7 by means of the computer program.
CN202410263521.XA 2024-03-08 2024-03-08 Method and device for generating automatic test vector of digital circuit based on reinforcement learning Pending CN117872097A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410263521.XA CN117872097A (en) 2024-03-08 2024-03-08 Method and device for generating automatic test vector of digital circuit based on reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410263521.XA CN117872097A (en) 2024-03-08 2024-03-08 Method and device for generating automatic test vector of digital circuit based on reinforcement learning

Publications (1)

Publication Number Publication Date
CN117872097A true CN117872097A (en) 2024-04-12

Family

ID=90581548

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410263521.XA Pending CN117872097A (en) 2024-03-08 2024-03-08 Method and device for generating automatic test vector of digital circuit based on reinforcement learning

Country Status (1)

Country Link
CN (1) CN117872097A (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070094556A1 (en) * 2005-10-20 2007-04-26 Jon Udell Methods for distributing programs for generating test data
US20190012256A1 (en) * 2016-01-27 2019-01-10 Advantest Corporation Deterministic concurrent test program executor for an automated test equipment
US20200372214A1 (en) * 2019-05-21 2020-11-26 Royal Bank Of Canada System and method for machine learning architecture with variational autoencoder pooling
CN110687433A (en) * 2019-10-23 2020-01-14 吉林大学 Method for reducing integrated circuit test mode set by combining PMS technology
CN115016534A (en) * 2022-06-02 2022-09-06 之江实验室 Unmanned aerial vehicle autonomous obstacle avoidance navigation method based on memory reinforcement learning
CN117592975A (en) * 2024-01-18 2024-02-23 山东通维信息工程有限公司 Operation and maintenance decision processing method and system for electromechanical equipment of expressway based on cloud computing

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZHAN WENFA: "A method of generating generalized folding sets for guiding automatic test vector generation", Journal of Southeast University, vol. 48, no. 2, 20 March 2018 (2018-03-20), pages 265-269 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination