CN117872097B - Method and device for generating automatic test vector of digital circuit based on reinforcement learning - Google Patents

Method and device for generating automatic test vector of digital circuit based on reinforcement learning

Info

Publication number
CN117872097B
CN117872097B (application CN202410263521A)
Authority
CN
China
Prior art keywords
podem
test
algorithm
rollback
path
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202410263521.XA
Other languages
Chinese (zh)
Other versions
CN117872097A (en)
Inventor
李文星
叶靖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhongke Jianxin Beijing Technology Co ltd
Original Assignee
Zhongke Jianxin Beijing Technology Co ltd
Filing date
Publication date
Application filed by Zhongke Jianxin Beijing Technology Co ltd
Priority to CN202410263521.XA
Publication of CN117872097A
Application granted
Publication of CN117872097B
Legal status: Active


Abstract

The application discloses a method and a device for generating automatic test vectors for a digital circuit based on reinforcement learning. The method comprises the following steps: in the process of generating a test vector for a chip to be tested by utilizing the automatic test vector generation algorithm ATPG, obtaining historical data of the path-decision-oriented test generation algorithm PODEM, namely the data formed while a first round of fault sensitization, fault propagation and line value confirmation on the chip to be tested activates the test mode of a first fault, and learning a rollback strategy from the historical data by utilizing a reinforcement learning algorithm; and, in the process of activating the test mode of a second fault by carrying out an N-th round of fault sensitization, fault propagation and line value confirmation on the chip to be tested by utilizing PODEM, guiding PODEM to select a rollback path by utilizing the rollback strategy, so as to generate the test vector of the chip to be tested. The application solves the technical problem of excessive backtracking in the test vector generation process in the related art.

Description

Method and device for generating automatic test vector of digital circuit based on reinforcement learning
Technical Field
The application relates to the field of chip testing, in particular to a method and a device for generating an automatic test vector of a digital circuit based on reinforcement learning.
Background
This section is intended to provide a background or context for the matter recited in the claims or specification; its inclusion in this section is not an admission that it constitutes prior art.
The continual advance of semiconductor technology increases the complexity of chip design, leading to more chip defects in manufacturing and confronting integrated circuit testing with unprecedented challenges. Automatic Test Pattern Generation (ATPG) algorithms address this problem by searching for effective test vectors that detect all possible faults in a circuit as completely as possible. The core goal of ATPG is to obtain a test vector set with high fault coverage; obtaining a compact set with high fault coverage is critical for efficient fault detection, since it shortens production test time and reduces test cost.
For a combinational logic circuit with n-bit inputs, exhaustive testing requires 2^n input vectors. Although exhaustive testing can achieve high fault coverage, in practical production applications conventional ATPG mainly uses path sensitization, which completes fault detection in a circuit through fault sensitization, fault propagation and line value confirmation. Classical path sensitization methods include the D algorithm (which adopts multidimensional sensitization, sensitizing all paths from the fault site to all circuit outputs simultaneously), the PODEM algorithm (short for Path Oriented Decision Making, which optimizes the search with implicit enumeration and makes decisions only at the primary inputs, greatly reducing the search space), and the FAN algorithm (short for FAN-out-oriented, which exploits fan-out nodes in the circuit to reduce the complexity of ATPG).
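To make the growth concrete, a minimal Python sketch (illustrative only, not part of the patent) enumerates the 2^n input vectors of an n-input combinational circuit:

    # Illustrative only: the number of exhaustive test vectors doubles with each
    # added input, which is why ATPG searches for a compact vector set instead.
    from itertools import product

    def exhaustive_vectors(n_inputs):
        """Yield every possible input vector for an n-input combinational circuit."""
        yield from product((0, 1), repeat=n_inputs)

    print(sum(1 for _ in exhaustive_vectors(3)))  # 8 vectors for n = 3
    print(2 ** 64)  # a 64-input cone already needs 18446744073709551616 vectors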
Conventional automatic test vector generation adopts heuristic strategies to search for effective test vectors; however, detecting the single stuck-at faults in a circuit triggers a large number of backtracking operations, which hurts test generation efficiency. PODEM is a relatively mature path sensitization algorithm that optimizes the search process with implicit enumeration, improving test efficiency. During rollback, PODEM may select a rollback path at random or by combining heuristic information. Common heuristics include the logic level of the circuit and testability measures, and combining them can effectively reduce the number of backtracks and thus improve test generation efficiency. Nevertheless, even when combining logic levels and testability measures, PODEM still incurs a large number of backtracking operations when completing single stuck-at fault detection. When such heuristic ATPG methods process large-scale circuits with complex search spaces, the execution time becomes very long and the performance is worrying, failing to meet the demand for fast test generation on large circuits.
In view of the above problems, no effective solution has been proposed at present.
Disclosure of Invention
The embodiments of the application provide a method and a device for generating automatic test vectors for a digital circuit based on reinforcement learning, which at least solve the technical problem of excessive backtracking in the test vector generation process in the related art.
According to an aspect of the embodiments of the present application, there is provided a method for generating automatic test vectors for a digital circuit based on reinforcement learning, including: in the process of generating a test vector for a chip to be tested by utilizing the automatic test vector generation algorithm ATPG, obtaining historical data of the path-decision-oriented test generation algorithm PODEM, wherein the historical data is the data formed in the process of activating the test mode of a first fault by performing a first round of fault sensitization, fault propagation and line value confirmation on the chip to be tested by utilizing PODEM, and generating the test vector for the chip to be tested by utilizing ATPG requires multiple rounds of fault sensitization, fault propagation and line value confirmation to activate the test modes of a plurality of faults; learning a rollback strategy from the historical data by utilizing a reinforcement learning algorithm; and, in the process of activating the test mode of a second fault by performing an N-th round of fault sensitization, fault propagation and line value confirmation on the chip to be tested by utilizing PODEM, guiding PODEM to select a rollback path by utilizing the rollback strategy, so as to generate the test vector of the chip to be tested, where N is an integer greater than 1.
Optionally, learning the rollback policy from the historical data using a reinforcement learning algorithm includes: constructing a Markov decision process MDP using the historical data, wherein the Markov decision process MDP is a mathematical model describing the interaction between the agent and the environment in the reinforcement learning algorithm; and using the reinforcement learning algorithm to let the agent learn a search strategy from the historical data, obtaining a reinforcement learning table representing the rollback strategy. Guiding the path-decision-oriented test generation algorithm PODEM to select a rollback path using the rollback strategy includes: embedding the reinforcement learning table into PODEM and selecting the rollback path with the PODEM algorithm in which the reinforcement learning table is embedded, thereby reducing the number of backtracks and shortening test generation time.
Optionally, obtaining historical data of the path decision oriented test generation algorithm PODEM includes: in the process of activating a test mode of a first fault by using a path decision-oriented test generation algorithm PODEM to perform first-round fault sensitization, fault propagation and line value confirmation on a chip to be tested, a rollback path, primary input reachability and frequency of occurrence of backtracking are obtained, and historical data comprise the rollback path, primary input reachability and frequency of occurrence of backtracking.
Optionally, constructing the Markov decision process MDP using the historical data includes: using the historical data to build the Markov decision process MDP [s, a, r, s'] comprising the current circuit state s, the executed action a, the immediate reward r and the next circuit state s' of the chip to be tested.
Optionally, in the learning of the rollback policy from the historical data using the reinforcement learning algorithm, the method further comprises: each line traversed by the path decision-oriented test generation algorithm PODEM in the rollback process is defined as a circuit state, the circuit state includes a line ID and a target value, the line ID is a unique identifier of a line in the chip to be tested, and the target value is a target that the path decision-oriented test generation algorithm PODEM needs to satisfy in the rollback process.
Optionally, in the learning of the rollback policy from the historical data using the reinforcement learning algorithm, the method further comprises: and taking the fan-in path selected by the path decision-oriented test generation algorithm PODEM in the rollback process as the current execution action.
Optionally, in the learning of the rollback policy from the historical data using the reinforcement learning algorithm, the method further comprises: assigning the path-decision-oriented test generation algorithm PODEM a negative reward for each rollback step in the rollback process and a positive reward when a primary input is reached.
According to another aspect of the embodiments of the present application, there is also provided a device for generating automatic test vectors for a digital circuit based on reinforcement learning, including: an acquisition unit, configured to obtain, in the process of generating a test vector for a chip to be tested by utilizing the automatic test vector generation algorithm ATPG, historical data of the path-decision-oriented test generation algorithm PODEM, wherein the historical data is the data formed in the process of activating the test mode of a first fault by performing a first round of fault sensitization, fault propagation and line value confirmation on the chip to be tested by utilizing PODEM, and generating the test vector for the chip to be tested by utilizing ATPG requires multiple rounds of fault sensitization, fault propagation and line value confirmation to activate the test modes of a plurality of faults; a learning unit, configured to learn a rollback strategy from the historical data by utilizing a reinforcement learning algorithm; and a generating unit, configured to, in the process of activating the test mode of a second fault by performing an N-th round of fault sensitization, fault propagation and line value confirmation on the chip to be tested by utilizing PODEM, guide PODEM to select a rollback path by utilizing the rollback strategy, so as to generate the test vector of the chip to be tested, where N is an integer greater than 1.
Optionally, the learning unit is further configured to: construct a Markov decision process MDP using the historical data, wherein the Markov decision process MDP is a mathematical model describing the interaction between the agent and the environment in the reinforcement learning algorithm; and use the reinforcement learning algorithm to let the agent learn a search strategy from the historical data, obtaining a reinforcement learning table representing the rollback strategy. The generating unit is further configured to: embed the reinforcement learning table into the path-decision-oriented test generation algorithm PODEM, and select the rollback path with the PODEM algorithm in which the reinforcement learning table is embedded, thereby reducing the number of backtracks and shortening test generation time.
Optionally, the learning unit is further configured to: in the process of activating a test mode of a first fault by using a path decision-oriented test generation algorithm PODEM to perform first-round fault sensitization, fault propagation and line value confirmation on a chip to be tested, a rollback path, primary input reachability and frequency of occurrence of backtracking are obtained, and historical data comprise the rollback path, primary input reachability and frequency of occurrence of backtracking.
Optionally, the learning unit is further configured to: use the historical data to build the Markov decision process MDP [s, a, r, s'] comprising the current circuit state s, the executed action a, the immediate reward r and the next circuit state s' of the chip to be tested.
Optionally, the learning unit is further configured to: in the process of learning the rollback policy from the historical data by using the reinforcement learning algorithm, each line traversed by the path decision-oriented test generation algorithm PODEM in the rollback process is defined as a circuit state, the circuit state comprises a line ID and a target value, the line ID is a unique identifier of a line in the chip to be tested, and the target value is a target that the path decision-oriented test generation algorithm PODEM needs to meet in the rollback process.
Optionally, the learning unit is further configured to: in the process of learning the rollback policy from the historical data by using the reinforcement learning algorithm, the fanin path selected by the test generation algorithm PODEM for the path decision in the rollback process is used as the current execution action.
Optionally, the learning unit is further configured to: in learning the rollback policy from the historical data using the reinforcement learning algorithm, assign the path-decision-oriented test generation algorithm PODEM a negative reward for each rollback step in the rollback process and a positive reward when a primary input is reached.
According to another aspect of the embodiments of the present application, there is also provided a storage medium including a stored program that executes the above-described method when running.
According to another aspect of the embodiments of the present application, there is also provided an electronic device including a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor executing the method described above by the computer program.
According to one aspect of the present application, there is provided a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer readable storage medium, and the processor executes the computer instructions to cause the computer device to perform the steps of any of the embodiments of the method described above.
In the embodiments of the present application, in the process of generating a test vector for a chip to be tested by utilizing the ATPG algorithm, historical data of the PODEM algorithm is obtained, wherein the historical data is formed in the process of activating the test mode of a first fault by performing a first round of fault sensitization, fault propagation and line value confirmation on the chip to be tested, and generating the test vector for the chip to be tested requires multiple rounds of fault sensitization, fault propagation and line value confirmation to activate the test modes of a plurality of faults; a rollback strategy is learned from the historical data by utilizing a reinforcement learning algorithm; and, in the process of activating the test mode of a second fault by carrying out an N-th round of fault sensitization, fault propagation and line value confirmation on the chip to be tested by utilizing the PODEM algorithm, the PODEM algorithm is guided to select a rollback path by utilizing the rollback strategy, so as to generate the test vector of the chip to be tested, where N is an integer greater than 1. In this scheme, the Q-learning algorithm from reinforcement learning is introduced into the PODEM algorithm: Q-learning learns a rollback strategy from the ATPG data generated by the PODEM algorithm, and the learned model guides the PODEM algorithm's rollback path selection in place of the traditional heuristic strategy, so that an effective decision point can be reached as soon as possible. This addresses the technical problem of excessive backtracking in the test vector generation process in the related art and reduces the number of backtracks.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute a limitation on the application. In the drawings:
FIG. 1 is a flow chart of a reinforcement learning based digital circuit automatic test vector generation method according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a reinforcement learning based automatic test vector generation framework in accordance with an embodiment of the present application;
FIG. 3 is a schematic diagram of a state transition of an MDP in a circuit according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a process flow of a Benchmark circuit according to an embodiment of the present application;
FIG. 5 is a schematic diagram of a reinforcement learning based digital circuit automatic test vector generation apparatus according to an embodiment of the present application;
fig. 6 is a block diagram of a structure of a terminal according to an embodiment of the present application.
Detailed Description
In order that those skilled in the art will better understand the present application, the technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the accompanying drawings. It is apparent that the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by those skilled in the art based on the embodiments of the present application without inventive effort shall fall within the scope of the present application.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present application and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the application described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
In recent years, machine learning has played an important role in the field of electronic design automation; in test technology, important breakthroughs have been made in wafer diagnosis, scan chain diagnosis, circuit defect identification, test compression, and other directions. Academia and industry have attempted to improve the performance of ATPG through machine learning, for example by using deep learning (Deep Learning, DL) algorithms to learn key features of a circuit and generate high-quality test vectors, thereby reducing the number of test vectors and optimizing execution time.
PODEM incurs a large number of backtracking operations when completing single stuck-at fault detection. When processing large-scale circuits with complex search spaces, heuristic ATPG methods have very long execution times and worrying performance, and cannot meet the demand for fast test generation on large circuits. Given the limitations of heuristic strategies, deep-learning-based methods can be adopted to improve ATPG performance, such as an automatic test vector generation method based on an artificial neural network, which learns the structural data and testability measures of the circuit to guide PODEM in selecting a rollback path; its performance on some circuits is clearly superior to that of traditional heuristic strategies. However, such a solution depends heavily on the collected data and, because it cannot capture changes in circuit state, suffers from scalability problems.
During the inventors' research, it was found that automatic test vector generation methods based on heuristic strategies exhibit a marked performance bottleneck when processing large-scale complex circuits, especially in execution time: as circuit scale keeps growing, the internal complexity of the circuit grows correspondingly, and the search space expands with it, making the execution time of the ATPG algorithm excessively long. The applicability of such methods in large-scale circuit test scenarios is therefore greatly limited, and automatic test vector generation based on traditional heuristic strategies cannot meet the demand for fast and effective testing of large circuits.
It has also been found during research that in the automatic test vector generation method based on deep learning, line nodes that generate test vectors are marked as "successful", and trace-back line nodes are marked as "failure", so that the optimization target of ATPG is abstracted as a classification problem, however, this strategy is severely dependent on data collected in advance, and may suffer from scalability problems when dealing with complex circuits due to the inability to capture dynamic changes in circuit states. In fact, ATPG is more suitable to be regarded as an optimization problem, with the aim of finding the optimal solution according to the state characteristics of the circuit. In this scenario, reinforcement learning (Reinforcement Learning, RL) techniques may be a promising solution. This is because reinforcement learning can effectively capture circuit characteristics and interact with the environment through trial and error and feedback, thereby quickly exploring search space, effectively finding optimal or near optimal search strategies.
In order to reduce the number of backtracks in automatic test pattern generation (ATPG) for digital circuits and improve ATPG performance, according to one aspect of the embodiments of the application, a method embodiment of a reinforcement-learning-based digital circuit automatic test vector generation method is provided. The Q-learning algorithm in reinforcement learning is introduced into the PODEM test generation algorithm: a rollback strategy is learned from the historical data generated by the PODEM algorithm via Q-learning, and the learned model helps PODEM select rollback paths more intelligently, so as to reach effective decision points as soon as possible, reduce the number of backtracks, and improve performance.
FIG. 1 is a flow chart of a reinforcement learning based digital circuit automatic test vector generation method according to an embodiment of the present application, as shown in FIG. 1, the method may include the steps of:
in step S102, in the process of generating a test vector for the chip to be tested by using the automatic test vector generation algorithm ATPG, historical data of the path decision oriented test generation algorithm PODEM is obtained.
The above-mentioned historical data is the data formed in the process of activating the test mode of the first fault by performing a first round of fault sensitization, fault propagation and line value confirmation on the chip to be tested with the path-decision-oriented test generation algorithm PODEM, such as the rollback paths, primary input reachability and the frequency of backtracking; generating a test vector for the chip to be tested with the automatic test vector generation algorithm ATPG requires multiple rounds of fault sensitization, fault propagation and line value confirmation to activate the test modes of multiple faults.
Step S104, learning the rollback strategy from the historical data by using a reinforcement learning algorithm.
The historical data may be used to construct a Markov decision process MDP [s, a, r, s'] (comprising the current circuit state s, the executed action a, the immediate reward r, and the next circuit state s'), which is a mathematical model describing the interaction of the agent with the environment in a reinforcement learning algorithm; the reinforcement learning algorithm then lets the agent learn a search strategy from the historical data, producing a reinforcement learning table that represents the rollback strategy.
In the above embodiment: 1) each line traversed by the path-decision-oriented test generation algorithm PODEM in the rollback process may be defined as a circuit state (the current circuit state or the next circuit state), where the circuit state includes a line ID and a target value, the line ID being the unique identifier of a line in the chip to be tested and the target value being the objective PODEM needs to meet during rollback; 2) the fan-in path selected by PODEM during rollback can be taken as the current executed action; 3) PODEM may be assigned a negative reward for each rollback step in the rollback process and a positive reward when the primary input is reached.
Step S106, in the process of activating the test mode of the second fault by using the path-decision-oriented test generation algorithm PODEM to perform the N-th round of fault sensitization, fault propagation and line value confirmation on the chip to be tested, the rollback strategy is used to guide PODEM's selection of the rollback path, so as to generate the test vector of the chip to be tested, where N is an integer greater than 1 whose specific value depends on the number of faults.
Specifically, the reinforcement learning table may be embedded in the path decision-oriented test generation algorithm PODEM, and the path decision-oriented test generation algorithm PODEM embedded with the reinforcement learning table is used to select the fallback path, so as to reduce the backtracking times and shorten the test generation time.
Through the steps above, in the process of generating a test vector for a chip to be tested by utilizing the automatic test vector generation algorithm ATPG, historical data of the path-decision-oriented test generation algorithm PODEM is obtained, wherein the historical data is the data formed in the process of activating the test mode of a first fault by performing a first round of fault sensitization, fault propagation and line value confirmation on the chip to be tested by utilizing PODEM, and generating the test vector for the chip to be tested by utilizing ATPG requires multiple rounds of fault sensitization, fault propagation and line value confirmation to activate the test modes of a plurality of faults; a rollback strategy is learned from the historical data by utilizing a reinforcement learning algorithm; and, in the process of activating the test mode of a second fault by performing an N-th round of fault sensitization, fault propagation and line value confirmation on the chip to be tested by utilizing PODEM, the rollback strategy is used to guide PODEM's selection of the rollback path, so as to generate the test vector of the chip to be tested, where N is an integer greater than 1. In this scheme, the Q-learning algorithm from reinforcement learning is introduced into the PODEM test generation algorithm: Q-learning learns a rollback strategy from the ATPG data generated by the PODEM algorithm, and the learned model guides PODEM's rollback path selection in place of the traditional heuristic strategy, so that an effective decision point can be reached as soon as possible. This addresses the technical problem of excessive backtracking in the test vector generation process in the related art and reduces the number of backtracks.
The application provides a method for generating automatic test vectors for digital circuits based on reinforcement learning, which applies the Q-learning algorithm in reinforcement learning to automatic test vector generation. In practical production applications, to ensure test quality, multiple rounds of ATPG are often required on the same circuit. However, for complex circuits, multiple rounds of ATPG are inefficient due to lengthy test generation times. To improve fault detection efficiency, the application adopts a new strategy: in the first round of ATPG, the test generation results of PODEM are collected and a Markov decision process (Markov Decision Process, MDP) is constructed from them. Then the Q-learning algorithm lets the agent learn a search strategy from this collected experience, finally forming a fully trained Q-table, which is then integrated into the ATPG algorithm. In this way, the PODEM algorithm with the embedded Q-table can select rollback paths more intelligently, effectively reducing the number of backtracks and shortening test generation time. As an alternative embodiment, the overall framework of the application is described below with the example in Fig. 2; the detailed steps are as follows:
Step 1, data acquisition: the PODEM algorithm is used to detect faults in the circuit and to collect the critical data required by the reinforcement learning algorithm. These critical data include the rollback path, Primary Input (PI) reachability, and the frequency of backtracking; they are collected because they play an important role in the MDP, affecting the changes of state, action and reward values. The rollback path includes the current state, the action taken, and the next state reached after the action is performed in the current state; primary input reachability and backtracking frequency correspond to the designed reward mechanism.
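As a hedged illustration of what such a collected record might look like, the sketch below logs one rollback episode in Python; the field and function names are assumptions for illustration, not taken from the patent:

    # A hedged sketch of one logged rollback episode from the first ATPG round.
    from dataclasses import dataclass, field

    @dataclass
    class RollbackEpisode:
        # (state, action, next_state) triples; a state is a (line ID, target value)
        # pair and an action is the index of the chosen fan-in path
        path: list = field(default_factory=list)
        reached_pi: bool = False   # primary-input reachability of this episode
        backtracks: int = 0        # how many backtracks this episode triggered

    def log_step(episode, state, action, next_state):
        """Record one rollback step of PODEM for later MDP construction."""
        episode.path.append((state, action, next_state))

    episode = RollbackEpisode()
    log_step(episode, ("f", 0), 0, ("a", 0))  # line f, target 0 -> fan-in 0 -> line a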
Step 2, Q-table creation: the Q-table stores and updates the value estimate for each state-action pair in a simple and efficient way. It is a two-dimensional table, with one dimension representing the possible environment states and the other the possible actions taken in those states. The value of each cell, the Q-value, represents the return expected from performing a particular action in a particular state, giving the algorithm an explicit indicator for making the optimal decision in each case. Using the pre-collected dataset, the PODEM path search process is abstracted into an MDP (comprising state space, action space, state transitions and a reward function): the state space consists of all possible lines encountered during rollback; the action space is determined by the fan-in count of each node; for state transitions, given one line (the current state), selecting any one of its fan-ins (an action) moves the agent to another line (the next state). The agent's goal should be compatible with ATPG's goal: the agent should reach the primary inputs quickly and assign values correctly during rollback. Following this design concept, and since the agent's goal is to maximize the cumulative reward, the design of the reward function should be consistent with ATPG's optimization goal. Establishing the MDP allows the Q-learning algorithm to be used to solve the ATPG problem.
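A minimal tabular Q-learning sketch over such transitions might look as follows, with states as (line ID, target value) pairs and actions as fan-in indices. The learning rate and discount factor reuse the values reported in the experiment section; everything else is an illustrative assumption, not the patent's reference implementation:

    # Sketch of tabular Q-learning on the collected [s, a, r, s'] transitions.
    # q_table[state][action] holds the expected return of taking `action` in `state`.
    from collections import defaultdict

    ALPHA, GAMMA = 0.01, 0.99  # learning rate / discount factor (experiment section)

    q_table = defaultdict(lambda: defaultdict(float))

    def q_update(s, a, r, s_next, valid_next_actions):
        """Bellman update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
        best_next = max((q_table[s_next][a2] for a2 in valid_next_actions), default=0.0)
        q_table[s][a] += ALPHA * (r + GAMMA * best_next - q_table[s][a])

    # Example transition: backtracing from line f (target 0) through fan-in 0
    # to line a (target 0), costing the per-step reward of -0.1.
    q_update(("f", 0), 0, -0.1, ("a", 0), valid_next_actions=[0])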
Step 3, Q-table deployment: by integrating the trained Q-table into PODEM, PODEM can make more informed decisions when choosing a rollback path. Specifically, it generates test vectors efficiently by using the Q-table to identify the optimal rollback path in the current line state. This significantly reduces the number of backtracks in the ATPG process, thereby improving fault detection efficiency.
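A sketch of this deployment step, assuming a table with the shape trained above and the default-to-first-fan-in fallback described later in the action design; the function name is an assumption:

    # Sketch: Q-table lookup inside PODEM's rollback step. States absent from the
    # table were not covered by the training data, so the first fan-in is used.
    q_table = {}  # trained table: {(line_id, target): {fanin_index: q_value}}

    def choose_fanin(state, fanin_count):
        """Pick the fan-in index with the highest learned Q-value for this state."""
        if state not in q_table:
            return 0  # uncovered state: default to the first fan-in path
        return max(range(fanin_count), key=lambda a: q_table[state].get(a, 0.0))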
1) Design rules:
As described above, the present application converts PODEM's path search process into an MDP. This conversion enables the application to recast the ATPG problem according to RL principles. Next, the [s, a, r, s'] sequence in the MDP (corresponding to the current state, the executed action, the immediate reward and the next state, respectively) is studied in depth, and detailed design rules for states, actions and rewards in the circuit are provided.
2) State design:
Each line traversed during rollback is defined as a state, represented by a line ID and a target value. The line ID is the unique identifier of each line in the circuit, and the target value is the objective PODEM needs to meet during rollback. For example, the state of line f in Fig. 3 is S_f = (f, 0) (see ① in Fig. 3).
The reason for using the line ID as one of the features is that the Q-learning algorithm looks up the corresponding Q-value by indexing state-action pairs in the Q-table. Thus, characterizing the line ID can help the model quickly locate to the current state during the update of the Q-table. In addition, the feature representation mode can effectively reduce the dimension of the state, so that the calculation complexity of Q-learning and the storage requirement of Q-table are simplified.
One reason for using the target value as a feature is that a circuit simultaneously contains two fault types, stuck-at-0 (SA0) and stuck-at-1 (SA1), and the rollback strategies for testing them differ. For example, in Fig. 3, for an SA1 fault on line f, since G2 is an AND gate, setting f=0 amounts to a single objective: a=0 or d=0. But for an SA0 fault on line f, two objectives must be met: both a and d must be 1. Thus, the target value helps Q-learning localize more accurately when both fault types are present in the circuit, as the sketch below illustrates.
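The AND-gate example can be made concrete with a small illustrative sketch, assuming the two-input gate G2 of Fig. 3; the helper name is hypothetical:

    # Sketch: the target value changes the backtrace objectives. For f = AND(a, d),
    # the objective f=0 offers a choice of fan-in, while f=1 forces both fan-ins.
    def and_backtrace_objectives(target):
        if target == 0:
            return [{"a": 0}, {"d": 0}]  # SA1 test on f: either input at 0 suffices
        return [{"a": 1, "d": 1}]        # SA0 test on f: both inputs must be 1

    print(and_backtrace_objectives(0))  # two alternatives -> a genuine decision point
    print(and_backtrace_objectives(1))  # one conjunctive objective -> no choice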
3) Action design:
During rollback, the selected fan-in path corresponds to the current action. In Fig. 3, line f has two fan-in paths, which means the agent has two possible actions in state S_f = (f, 0); for example, action 1 (see ② in Fig. 3) may be selected to evaluate the reward and bring the agent into the next state (see ④ in Fig. 3).
However, the number of fan-ins differs across circuit nodes, so the agent's action space changes dynamically across states; for example, the action space in state S_f = (f, 0) has size 2, but in state S_c = (c, 0) it has size 1. The application therefore uses a specific mechanism to help the agent make valid action selections: a mask vector of equal length is created for each state, with the maximum fan-in count of the logic gates in the circuit determining the vector's length. The mask vector is set according to the fan-in count of each circuit node, with valid actions set to 1 and invalid actions set to 0. For example, in a circuit whose maximum fan-in count is 4, the mask vector for a node with 3 fan-ins is [1, 1, 1, 0]. By masking the action space, the agent restricts its action selection to the allowable range, improving execution efficiency (a sketch follows this paragraph). Note that in the subsequent ATPG phase, if a state encountered during rollback is not in the pre-trained Q-table, the dataset used during training did not cover that state; in this case, the agent defaults to rolling back with the first action.
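A minimal sketch of the mask construction, using the numbers from the example above; the function name is an assumption:

    # Sketch: one mask entry per potential action; 1 marks a usable fan-in path.
    def action_mask(fanin_count, max_fanin):
        """Mask of length max_fanin with the first `fanin_count` actions enabled."""
        return [1] * fanin_count + [0] * (max_fanin - fanin_count)

    print(action_mask(3, 4))  # [1, 1, 1, 0], the 3-fan-in node from the text
    print(action_mask(1, 4))  # [1, 0, 0, 0], a single-fan-in node such as c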
4) Reward design:
Using a reward signal to specify the goal is one of the most distinctive features of reinforcement learning, and whether the reward mechanism is designed reasonably directly determines whether the final goal of the reinforcement learning task can be achieved. To optimize the search path, the application assigns a negative reward to each rollback step and a positive reward when a PI is reached (see ③ in Fig. 3). Since backtracking increases computation and time costs and thus reduces ATPG performance, the application assigns a large negative reward to backtracking, so that the algorithm "bypasses" paths likely to cause backtracking.
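A sketch of this reward shaping; the concrete values are those reported in the experiment section below (-0.1 per rollback step, +10 on reaching a PI, -100 for a backtrack), while the function signature is an assumption:

    # Sketch: reward shaping that penalizes every step slightly, penalizes
    # backtracks heavily, and pays out only on reaching a primary input.
    def reward(reached_pi: bool, caused_backtrack: bool) -> float:
        if caused_backtrack:
            return -100.0  # large penalty steers the agent around backtrack-prone paths
        if reached_pi:
            return 10.0    # the goal state: a primary input was reached
        return -0.1        # small per-step cost keeps rollback paths short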
Experiment and result analysis:
To stay close to an industrial chip design flow, the benchmark circuits used by the invention are synthesized into netlists with a commercial EDA tool and a standard process library; the processing flow is shown in Fig. 4: first, the Register Transfer Level (RTL) Verilog files and the SMIC 180 process library are read and synthesized with Synopsys' DC (Design Compiler) tool to obtain a gate-level netlist; the gate-level netlist is then read by the test synthesis tool ICTest to build an internal data structure, on which ICTest-ATPG performs test generation.
Experiments were based on the PODEM algorithm of ATPG; rollback in the circuit under test was guided by traditional heuristic strategies (Distance, SCOAP) and machine learning heuristic strategies (DL, RL) respectively, to verify the validity of the application's RL heuristic strategy. For PODEM adopting the RL heuristic strategy, the dataset required for training is obtained in the first ATPG round; the Q-learning algorithm then learns a rollback strategy from the dataset, and the trained model is embedded into PODEM to enhance the performance of subsequent ATPG rounds. The hyperparameters of the RL heuristic strategy are set as follows: the learning rate is 0.01 and the discount factor is 0.99 (closer to 1 means the agent places more emphasis on future rewards). The reward mechanism is set as follows: each step in the rollback process is rewarded -0.1, reaching a PI is rewarded +10, and causing a backtrack is rewarded -100.
For PODEM employing the DL heuristic strategy, the application uses an ANN algorithm to guide PODEM's rollback. To improve the ANN's learning efficiency, "complex circuits" are used as training circuits to produce high-quality sample data. Circuit complexity is measured by the circuit's fault coverage, logic depth and logic-gate fan-out count: circuits with lower fault coverage, deeper logic and higher fan-out counts are considered more "complex". Accordingly, c499, s9234 and b05 are selected as the ANN's training circuits. The training features include: logic gate type, line level, logic-gate fan-out count, and SCOAP measures (0- and 1-controllability, observability). The ANN uses a three-layer neural network structure comprising an input layer, a hidden layer and an output layer; the activation function is ReLU, and the Adam optimization algorithm is used during training. The sample size of the training dataset is limited to between 0 and 1,000,000. Experimental results show that the ANN model's training error is smallest when the dataset size is 800,000 and the hidden layer has 25 nodes.
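For reference, a hedged PyTorch sketch of a baseline with the stated shape (one 25-node hidden layer, ReLU, Adam). The six-feature input and the binary "success/failure" label follow the text, while the output dimension and the loss function are assumptions:

    # Sketch of the described DL baseline: 6 features -> 25 hidden units -> score.
    import torch
    import torch.nn as nn

    model = nn.Sequential(
        nn.Linear(6, 25),   # gate type, line level, fan-out count, SCOAP CC0/CC1/CO
        nn.ReLU(),
        nn.Linear(25, 1),   # score used to rank candidate rollback paths
    )
    optimizer = torch.optim.Adam(model.parameters())
    loss_fn = nn.BCEWithLogitsLoss()  # binary "success"/"failure" label per line

    def train_step(features, labels):  # features: (batch, 6), labels: (batch, 1)
        optimizer.zero_grad()
        loss = loss_fn(model(features), labels)
        loss.backward()
        optimizer.step()
        return loss.item()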
Three groups of experiments were performed in the present application, respectively:
(1) Effect of dataset size on the RL strategy. First, PODEM is run five times on the c499 circuit (using a random heuristic strategy), producing five datasets of different sizes, each covering different circuit states. Next, a strategy is learned from each dataset with the Q-learning algorithm. Finally, c499 is retested with the trained models assisting PODEM. The backtracking counts under the different strategies are shown in Table 1.
TABLE 1 influence of dataset size on RL strategy in c499 circuits
(2) Influence of heuristic strategies on PODEM performance. The traditional heuristic strategies (Distance and SCOAP) and the machine learning heuristic strategies (DL and RL) are applied to six benchmark circuits to guide PODEM's rollback path selection. The backtracking count, rollback count, running time and fault coverage commonly used in test generation serve as metrics for observing PODEM's performance under the different heuristic strategies. The results are shown in Table 2.
TABLE 2 comparison of performance of PODEM under different heuristic strategies
(3) Impact of training dataset size on strategy learning for the two machine learning methods. Since an effective rollback strategy helps PODEM reduce the number of backtracks and improve its performance, the "backtracking count" is used to measure strategy effectiveness. The results are shown in Table 3.
Table 3 training data set size for two machine learning methods
From the data in Table 1, it can be seen that when RL-based PODEM is used to detect faults in the c499 circuit, the more circuit states the dataset covers, the more effective the Q-learning strategy and the fewer the backtracks. This result shows that rich circuit state information can enhance Q-learning's understanding of the environment, enabling it to learn a more effective rollback strategy.
As can be seen from Table 2, the proposed method is superior to the traditional heuristic strategies and the DL heuristic strategy on most circuits. The method effectively reduces the number of backtracks produced during test generation, showing that Q-learning can learn a respectable rollback strategy from the dataset. The reduction in running time indicates that the method improves ATPG performance. Fault coverage is an important ATPG metric; since the benchmark circuits used in current research are not large, the differences in fault coverage under the various heuristic strategies are not obvious. Note that the RL heuristic strategy proposed by the application is nevertheless still superior to the other heuristic strategies on some circuits. However, for the s13207 circuit the RL heuristic strategy underperforms Distance and DL, possibly due to poor dataset quality.
As shown in Table 2, the RL heuristic strategy performs outstandingly on the c6288 circuit. Looking at Table 3, the state coverage of the dataset used for c6288 reaches 94.68%, which means the rich circuit environment information helps the agent learn the optimal strategy. In contrast, the state coverage of the s13207 circuit is 47.04%, consistent with the RL heuristic strategy's poor performance on that circuit. However, as dataset size increases, the computational overhead of model training also increases, so balancing state coverage against dataset size becomes critical. The RL heuristic strategy requires training and testing on the same circuit, while the DL heuristic strategy requires training on only a few circuits and can then be tested on others. Yet, observing the c6288 metrics under the DL heuristic strategy in Table 2, the ANN model's generalization capability is still limited.
The application is verified on some benchmark circuits from ISCAS'85, ISCAS'89 and ITC99, comprehensively evaluating the backtracking count, rollback count, running time and fault coverage of automatic test vector generation; the results show the effectiveness of the application relative to traditional heuristic strategies and a rollback path selection strategy based on an artificial neural network (ANN).
In summary, automatic test pattern generation (ATPG) is a key technology in digital circuit testing, and excessive backtracking in the ATPG process consumes large amounts of computing resources and severely affects performance. The application provides a method for generating automatic test vectors for digital circuits based on reinforcement learning, which applies the Q-learning algorithm in reinforcement learning to the PODEM test generation algorithm, so that PODEM can be guided to select the correct rollback path as far as possible, effectively reducing the number of backtracks during test generation and improving its performance. The traditional heuristic strategies (Distance, SCOAP) and a deep learning heuristic strategy (ANN) are comprehensively compared, and the experimental results verify the effectiveness of the application.
It should be noted that, for simplicity of description, the foregoing method embodiments are all described as a series of acts, but it should be understood by those skilled in the art that the present application is not limited by the order of acts described, as some steps may be performed in other orders or concurrently in accordance with the present application. Further, those skilled in the art will also appreciate that the embodiments described in the specification are all preferred embodiments, and that the acts and modules referred to are not necessarily required for the present application.
From the description of the above embodiments, it will be clear to a person skilled in the art that the method according to the above embodiments may be implemented by means of software plus the necessary general hardware platform, but of course also by means of hardware, but in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art in the form of a software product stored in a storage medium (e.g. ROM/RAM, magnetic disk, optical disk) comprising instructions for causing a terminal device (which may be a mobile phone, a computer, a server, or a network device, etc.) to perform the method according to the embodiments of the present application.
According to another aspect of the embodiment of the present application, there is also provided a reinforcement learning-based digital circuit automatic test vector generation apparatus for implementing the reinforcement learning-based digital circuit automatic test vector generation method. FIG. 5 is a schematic diagram of a reinforcement learning based digital circuit automatic test vector generation apparatus according to an embodiment of the present application, as shown in FIG. 5, the apparatus may include:
An obtaining unit 51, configured to obtain, in the process of generating a test vector for a chip to be tested by utilizing the automatic test vector generation algorithm ATPG, historical data of the path-decision-oriented test generation algorithm PODEM, wherein the historical data is the data formed in the process of activating the test mode of a first fault by performing a first round of fault sensitization, fault propagation and line value confirmation on the chip to be tested by utilizing PODEM, and generating the test vector for the chip to be tested by utilizing ATPG requires multiple rounds of fault sensitization, fault propagation and line value confirmation to activate the test modes of a plurality of faults;
a learning unit 53 for learning a rollback policy from the history data using a reinforcement learning algorithm;
the generating unit 55 is configured to instruct the path decision oriented test generating algorithm PODEM to select a rollback path by using the rollback policy in a process of performing an nth round of fault sensitization, fault propagation and line value confirmation on the chip to be tested to activate a test mode of a second fault by using the path decision oriented test generating algorithm PODEM, so as to generate a test vector of the chip to be tested, where N is an integer greater than 1.
Through the modules above, in the process of generating a test vector for a chip to be tested by utilizing the automatic test vector generation algorithm ATPG, historical data of the path-decision-oriented test generation algorithm PODEM is obtained, wherein the historical data is the data formed in the process of activating the test mode of a first fault by performing a first round of fault sensitization, fault propagation and line value confirmation on the chip to be tested by utilizing PODEM, and generating the test vector for the chip to be tested by utilizing ATPG requires multiple rounds of fault sensitization, fault propagation and line value confirmation to activate the test modes of a plurality of faults; a rollback strategy is learned from the historical data using a reinforcement learning algorithm; and, in the process of activating the test mode of a second fault by performing an N-th round of fault sensitization, fault propagation and line value confirmation on the chip to be tested by utilizing PODEM, the rollback strategy is used to guide PODEM's selection of the rollback path, so as to generate the test vector of the chip to be tested, where N is an integer greater than 1. In this scheme, the Q-learning algorithm from reinforcement learning is introduced into the PODEM test generation algorithm: Q-learning learns a rollback strategy from the ATPG data generated by the PODEM algorithm, and the learned model guides PODEM's rollback path selection in place of the traditional heuristic strategy, so that an effective decision point can be reached as soon as possible. This addresses the technical problem of excessive backtracking in the test vector generation process in the related art and reduces the number of backtracks.
Optionally, the learning unit is further configured to: constructing a Markov decision process MDP by utilizing the historical data, wherein the Markov decision process MDP is a mathematical model for describing interaction of an agent and an environment in the reinforcement learning algorithm; and using the reinforcement learning algorithm to enable the agent to learn a search strategy from the historical data, and obtaining a reinforcement learning table for representing the rollback strategy. The generating unit is further configured to: the reinforcement learning table is embedded into the path decision-oriented test generation algorithm PODEM, and the path decision-oriented test generation algorithm PODEM embedded with the reinforcement learning table is utilized to select a rollback path, so that the backtracking times are reduced and the test generation time is shortened.
Optionally, the learning unit is further configured to: and acquiring a rollback path, primary input reachability and frequency of backtracking occurrence in the process of activating a test mode of a first fault by using the path decision-oriented test generation algorithm PODEM to perform first-round fault sensitization, fault propagation and line value confirmation on the chip to be tested, wherein the historical data comprises the rollback path, the primary input reachability and the frequency of backtracking occurrence.
Optionally, the learning unit is further configured to: establish, using the historical data, the Markov decision process MDP [s, a, r, s'] comprising the current circuit state s, the executed action a, the immediate reward r and the next circuit state s' of the chip to be tested.
Optionally, the learning unit is further configured to: in the process of learning the rollback policy from the historical data by using the reinforcement learning algorithm, each line traversed by the path decision-oriented test generation algorithm PODEM in the rollback process is defined as a circuit state, and the circuit state includes a line ID and a target value, wherein the line ID is a unique identifier of a line in the chip to be tested, and the target value is a target that needs to be met by the path decision-oriented test generation algorithm PODEM in the rollback process.
Optionally, the learning unit is further configured to: and in the process of learning the rollback strategy from the historical data by using a reinforcement learning algorithm, taking the fanin path selected by the test generation algorithm PODEM facing the path decision in the rollback process as the current execution action.
Optionally, the learning unit is further configured to: in learning the rollback strategy from the historical data using the reinforcement learning algorithm, assign the path-decision-oriented test generation algorithm PODEM a negative reward for each rollback step in the rollback process and a positive reward when a primary input is reached.
It should be noted that the above modules are the same as examples and application scenarios implemented by the corresponding steps, but are not limited to what is disclosed in the above embodiments. It should be noted that, the above modules may be implemented in a corresponding hardware environment as part of the apparatus, and may be implemented in software, or may be implemented in hardware, where the hardware environment includes a network environment.
According to another aspect of the embodiments of the present application, there is also provided a server or terminal for implementing the above method for generating automatic test vectors for a digital circuit based on reinforcement learning.
Fig. 6 is a block diagram of a terminal according to an embodiment of the present application. As shown in Fig. 6, the terminal may include one or more processors 601 (only one is shown in the figure), a memory 603 and a transmission device 605; the terminal may further comprise an input/output device 607.
The memory 603 may be configured to store software programs and modules, such as program instructions/modules corresponding to the method and apparatus for generating an automatic test vector for a digital circuit based on reinforcement learning in the embodiment of the present application, and the processor 601 executes the software programs and modules stored in the memory 603 to perform various functional applications and data processing, that is, to implement the method for generating an automatic test vector for a digital circuit based on reinforcement learning. Memory 603 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid state memory. In some examples, the memory 603 may further include memory remotely located with respect to the processor 601, which may be connected to the terminal through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission device 605 is used to receive or transmit data via a network, and may also be used for data transmission between the processor and the memory. Specific examples of the network described above may include wired networks and wireless networks. In one example, the transmission device 605 includes a network adapter (Network Interface Controller, NIC) that may be connected to other network devices and routers via a network cable to communicate with the internet or a local area network. In one example, the transmission device 605 is a Radio Frequency (RF) module that is configured to communicate wirelessly with the internet.
Specifically, the memory 603 is used to store application programs.
The processor 601 may call the application program stored in the memory 603 via the transmission device 605 to perform the following steps:
In the process of generating test vectors for a chip to be tested with the automatic test pattern generation algorithm ATPG, obtaining historical data of the path decision-oriented test generation algorithm PODEM, wherein the historical data are the data produced while the path decision-oriented test generation algorithm PODEM performs the first round of fault sensitization, fault propagation and line value justification on the chip to be tested to activate the test pattern of a first fault, and generating test vectors for the chip to be tested with the automatic test pattern generation algorithm ATPG requires activating the test patterns of a plurality of faults; learning a rollback policy from the historical data using a reinforcement learning algorithm; and in the process of activating the test pattern of a second fault by performing the N-th round of fault sensitization, fault propagation and line value justification on the chip to be tested with the path decision-oriented test generation algorithm PODEM, using the rollback policy to guide the path decision-oriented test generation algorithm PODEM in selecting rollback paths, thereby generating the test vectors of the chip to be tested, wherein N is an integer greater than 1.
Optionally, for specific examples in this embodiment, reference may be made to the examples described in the foregoing embodiments, which are not repeated here.
It will be appreciated by those skilled in the art that the structure shown in Fig. 6 is only illustrative; the terminal may be a smart phone (such as an Android phone or an iOS phone), a tablet computer, a palmtop computer, a mobile Internet device (Mobile Internet Devices, MID), a PAD, or the like. Fig. 6 does not limit the structure of the electronic device. For example, the terminal may also include more or fewer components (e.g., network interfaces, display devices) than shown in Fig. 6, or have a different configuration from that shown in Fig. 6.
Those of ordinary skill in the art will appreciate that all or part of the steps in the various methods of the above embodiments may be implemented by a program instructing the relevant hardware of a terminal device; the program may be stored in a computer-readable storage medium, and the storage medium may include: a flash disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, and the like.
The embodiment of the present application also provides a storage medium. Optionally, in this embodiment, the storage medium may be used to store program code for executing the above reinforcement-learning-based method for generating automatic test vectors for a digital circuit.
Optionally, in this embodiment, the storage medium may be located on at least one of the plurality of network devices in the network shown in the above embodiment.
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps:
In the process of generating test vectors for a chip to be tested with the automatic test pattern generation algorithm ATPG, obtaining historical data of the path decision-oriented test generation algorithm PODEM, wherein the historical data are the data produced while the path decision-oriented test generation algorithm PODEM performs the first round of fault sensitization, fault propagation and line value justification on the chip to be tested to activate the test pattern of a first fault, and generating test vectors for the chip to be tested with the automatic test pattern generation algorithm ATPG requires activating the test patterns of a plurality of faults; learning a rollback policy from the historical data using a reinforcement learning algorithm; and in the process of activating the test pattern of a second fault by performing the N-th round of fault sensitization, fault propagation and line value justification on the chip to be tested with the path decision-oriented test generation algorithm PODEM, using the rollback policy to guide the path decision-oriented test generation algorithm PODEM in selecting rollback paths, thereby generating the test vectors of the chip to be tested, wherein N is an integer greater than 1.
Optionally, for specific examples in this embodiment, reference may be made to the examples described in the foregoing embodiments, which are not repeated here.
Optionally, in this embodiment, the storage medium may include, but is not limited to: a USB flash disk, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, an optical disk, or any other medium capable of storing program code.
The foregoing embodiment numbers of the present application are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.
The integrated units in the above embodiments, if implemented in the form of software functional units and sold or used as independent products, may be stored in the above computer-readable storage medium. Based on such understanding, the technical solution of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium, including several instructions for causing one or more computer devices (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present application.
In the foregoing embodiments of the present application, the description of each embodiment has its own emphasis; for any part not detailed in one embodiment, reference may be made to the related descriptions of the other embodiments.
In the several embodiments provided by the present application, it should be understood that the disclosed client may be implemented in other manners. The apparatus embodiments described above are merely illustrative; for example, the division of the units is merely a logical function division, and there may be other divisions in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be indirect coupling or communication connection through some interfaces, units or modules, and may be in electrical or other forms.
The units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The foregoing is merely a preferred embodiment of the present application. It should be noted that several improvements and modifications may be made by those of ordinary skill in the art without departing from the principles of the present application, and such improvements and modifications shall also fall within the scope of protection of the present application.

Claims (9)

1. A method for generating automatic test vectors for a digital circuit based on reinforcement learning, characterized by comprising the following steps:
in the process of generating test vectors for a chip to be tested with the automatic test pattern generation algorithm ATPG, obtaining historical data of the path decision-oriented test generation algorithm PODEM, wherein the historical data are the data produced while the path decision-oriented test generation algorithm PODEM performs the first round of fault sensitization, fault propagation and line value justification on the chip to be tested to activate the test pattern of a first fault, and generating test vectors for the chip to be tested with the automatic test pattern generation algorithm ATPG requires activating the test patterns of a plurality of faults;
learning a rollback policy from the historical data using a reinforcement learning algorithm: constructing a Markov decision process MDP from the historical data, wherein the Markov decision process MDP is the mathematical model describing the interaction between the agent and the environment in the reinforcement learning algorithm; and having the agent learn a search strategy from the historical data using the reinforcement learning algorithm, obtaining a reinforcement learning table representing the rollback policy;
in the process of activating the test pattern of a second fault by performing the N-th round of fault sensitization, fault propagation and line value justification on the chip to be tested with the path decision-oriented test generation algorithm PODEM, using the rollback policy to guide the path decision-oriented test generation algorithm PODEM in selecting rollback paths, thereby generating the test vectors of the chip to be tested: embedding the reinforcement learning table into the path decision-oriented test generation algorithm PODEM, and selecting rollback paths with the path decision-oriented test generation algorithm PODEM into which the reinforcement learning table is embedded, so as to reduce the number of backtracks and shorten the test generation time, wherein N is an integer greater than 1.
2. The method according to claim 1, wherein obtaining historical data of the path decision-oriented test generation algorithm PODEM comprises:
acquiring the rollback paths, the primary input reachability and the frequency of backtracking observed while the path decision-oriented test generation algorithm PODEM performs the first round of fault sensitization, fault propagation and line value justification on the chip to be tested to activate the test pattern of the first fault, wherein the historical data comprise the rollback paths, the primary input reachability and the frequency of backtracking.
3. The method according to claim 1, wherein constructing a Markov decision process MDP from the historical data comprises:
establishing, from the historical data, the Markov decision process MDP [s, a, r, s'] comprising the current circuit state s of the chip to be tested, the executed action a, the immediate reward r and the next circuit state s'.
4. The method according to claim 3, wherein, in learning the rollback policy from the historical data using the reinforcement learning algorithm, the method further comprises:
defining each line traversed by the path decision-oriented test generation algorithm PODEM during the rollback process as a circuit state, wherein the circuit state comprises a line ID and a target value, the line ID being the unique identifier of a line in the chip to be tested, and the target value being the value that the path decision-oriented test generation algorithm PODEM needs to satisfy during the rollback process.
5. The method according to claim 3, wherein, in learning the rollback policy from the historical data using the reinforcement learning algorithm, the method further comprises:
taking the fan-in path selected by the path decision-oriented test generation algorithm PODEM during the rollback process as the currently executed action.
6. The method according to claim 3, wherein, in learning the rollback policy from the historical data using the reinforcement learning algorithm, the method further comprises:
assigning the path decision-oriented test generation algorithm PODEM a negative reward for each rollback step during the rollback process, and a positive reward when a primary input is reached.
7. An apparatus for generating automatic test vectors for a digital circuit based on reinforcement learning, characterized by comprising:
an acquisition unit, configured to obtain, in the process of generating test vectors for a chip to be tested with the automatic test pattern generation algorithm ATPG, historical data of the path decision-oriented test generation algorithm PODEM, wherein the historical data are the data produced while the path decision-oriented test generation algorithm PODEM performs the first round of fault sensitization, fault propagation and line value justification on the chip to be tested to activate the test pattern of a first fault, and generating test vectors for the chip to be tested with the automatic test pattern generation algorithm ATPG requires activating the test patterns of a plurality of faults;
a learning unit, configured to learn a rollback policy from the historical data using a reinforcement learning algorithm: constructing a Markov decision process MDP from the historical data, wherein the Markov decision process MDP is the mathematical model describing the interaction between the agent and the environment in the reinforcement learning algorithm; and having the agent learn a search strategy from the historical data using the reinforcement learning algorithm, obtaining a reinforcement learning table representing the rollback policy;
a generating unit, configured to, in the process of activating the test pattern of a second fault by performing the N-th round of fault sensitization, fault propagation and line value justification on the chip to be tested with the path decision-oriented test generation algorithm PODEM, use the rollback policy to guide the path decision-oriented test generation algorithm PODEM in selecting rollback paths, thereby generating the test vectors of the chip to be tested: embedding the reinforcement learning table into the path decision-oriented test generation algorithm PODEM, and selecting rollback paths with the path decision-oriented test generation algorithm PODEM into which the reinforcement learning table is embedded, so as to reduce the number of backtracks and shorten the test generation time, wherein N is an integer greater than 1.
8. A computer-readable storage medium, characterized in that the storage medium comprises a stored program, wherein the program, when run, performs the method according to any one of claims 1 to 6.
9. An electronic device, comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor performs the method according to any one of claims 1 to 6 by means of the computer program.
CN202410263521.XA 2024-03-08 Method and device for generating automatic test vector of digital circuit based on reinforcement learning Active CN117872097B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410263521.XA CN117872097B (en) 2024-03-08 Method and device for generating automatic test vector of digital circuit based on reinforcement learning


Publications (2)

Publication Number Publication Date
CN117872097A (en) 2024-04-12
CN117872097B (en) 2024-07-09


Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110687433A (en) * 2019-10-23 2020-01-14 Jilin University Method for reducing integrated circuit test pattern set by combining PMS technology
CN115016534A (en) * 2022-06-02 2022-09-06 Zhejiang Lab Unmanned aerial vehicle autonomous obstacle avoidance navigation method based on memory reinforcement learning


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant