CN115048248A

CN115048248A - Failure positioning method applied to FPGA

Info

Publication number: CN115048248A
Application number: CN202210502701.XA
Authority: CN
Inventors: 郑赫男; 袁智皓
Original assignee: Shanghai Anlu Information Technology Co ltd
Current assignee: Shanghai Anlu Information Technology Co ltd
Priority date: 2022-05-10
Filing date: 2022-05-10
Publication date: 2022-09-13

Abstract

The invention discloses a failure positioning method applied to an FPGA. The method comprises: determining a code block range for failure analysis according to the bottom-layer physical netlist and the original code of a tested chip; performing a secondary wiring operation on the bottom-layer circuit corresponding to the code block range, so that intermediate node data of a plurality of fabric circuits of the bottom-layer circuit are led out of the chip; analyzing the bottom-layer physical netlist according to the intermediate node data to generate a plurality of test sets in units of code blocks, each test set comprising a plurality of test patterns; testing the chip under test with the plurality of test sets and collecting failure information; and locating the physical position of the chip failure according to the failure information. By keeping the original physical netlist and reproducing the failure, the technical scheme narrows the failure range and locates the failure position accurately.

Description

Failure positioning method applied to FPGA

Technical Field

The invention relates to the technical field of chip failure positioning, in particular to a failure positioning method applied to an FPGA (field programmable gate array).

Background

A failure may occur at any stage from production to application of a chip. When a complex digital circuit running on a logic chip fails, the chip resources involved are huge (thousands to tens of thousands of basic resources), and the number of patterns required for positioning keeps growing with the complexity of the user code and the integration level of the chip. The manpower, material resources and time spent on the whole positioning process grow accordingly.

The prior art generally offers two kinds of solutions. First, the failing pattern is split: the complex design is repeatedly refined and broken down into small, simple functional blocks, an attempt is made to reproduce the failure on them, and any functional block that reproduces the failure is split further, the process repeating until the root cause of the failure is found. Second, an online debugging tool in the software is used to bring the signals instantiated by the user code out of the chip, and their data streams are observed for analysis.

Failure positioning on a logic chip is difficult because the chip lacks sufficient DFT circuits, and whether the failing project is split or an online debugging tool is used, one problem is unavoidable: the physical netlist is regenerated. Even if only one point in the code is changed, or only the compilation environment is changed, the software algorithm may decide during synthesis to use other resources because of that slight change and implement the previous function with other structures. In the actual failure positioning process, the situation shown in fig. 2 then occurs: the failure cannot be reproduced, and the failure positioning work cannot proceed.

Disclosure of Invention

The invention provides a failure positioning method applied to an FPGA (field programmable gate array), which realizes the reduction of a failure range and the accurate positioning of a failure position by keeping an original physical netlist and reproducing failures.

An embodiment of the present invention provides a failure location method applied to an FPGA, including the following steps:

determining a code block range for failure analysis according to a bottom layer physical netlist and an original code of a tested chip;

performing secondary wiring operation on the bottom layer circuit corresponding to the code block range, and leading out intermediate node data of a plurality of fabric circuits of the bottom layer circuit to the outside of a chip;

analyzing the bottom-layer physical netlist according to the intermediate node data to generate a plurality of sets of test sets with the code blocks as units, wherein each set of test set comprises a plurality of test patterns;

actually measuring the tested chip according to the plurality of test sets, and collecting failure information;

and positioning to the physical position of the chip failure according to the failure information.

Furthermore, the failed fabric circuit is located according to the failure information corresponding to the test pattern, and the corresponding physical position of the chip failure is then located according to the failed fabric circuit.

Further, the secondary wiring operation is performed on the bottom layer circuit corresponding to the code block range, and the method comprises the following steps:

acquiring idle output IO resources and idle fabric circuits of a chip to be tested;

for first code blocks in the code block range, analyzing resources used by the first code blocks in the bottom-layer physical netlist one by one to obtain one or more resource output ports of the first code blocks;

and connecting the resource output port of one first code block at a time to the idle fabric circuit, and through the idle fabric circuit to the idle output IO resource.

Further, analyzing the bottom-layer physical netlist according to the intermediate node data to generate a plurality of sets of test sets with the code block as a unit, specifically:

an automation program analyzes the bottom-layer physical netlist according to the intermediate node data to obtain the connection relation of the fabric circuits;

and obtaining expected input values and expected output values of each fabric circuit through circuit simulation, and generating a plurality of sets of test sets in units of code blocks.

Furthermore, each set of test set comprises 1-999 test patterns.

Further, the process of actually measuring the failed chip according to the plurality of test sets is an automatic test process.

The embodiment of the invention has the following beneficial effects:

The invention provides a failure positioning method applied to an FPGA. On the premise that the original physical netlist synthesized by the user is kept unchanged, other resources in the chip are used to lead a bypass out of an intermediate node of the original circuit. The bypass transmits the signal of the intermediate node to the outside of the chip, and analyzing that signal allows the failure range to be narrowed further at a later stage of failure analysis. Because the user's physical netlist is preserved, there is no concern that the failure cannot be reproduced, and the signal intercepted at the intermediate node faithfully reflects the internal data of the circuit. When a failure occurs, it is therefore possible to quickly locate which code block's data flow is in error.

Drawings

FIG. 1 is a schematic flow diagram of a prior art failure location method;

fig. 2 is a schematic flowchart of a failure location method applied to an FPGA according to an embodiment of the present invention;

fig. 3 is a schematic structural diagram of an overall circuit of a failure location method applied to an FPGA according to an embodiment of the present invention;

fig. 4 is a schematic structural diagram of a single code block of the failure location method applied to an FPGA according to an embodiment of the present invention;

fig. 5 is a schematic diagram of secondary wiring applied to the failure location method of the FPGA according to an embodiment of the present invention;

fig. 6 is a schematic structural diagram of a test set applied to a failure location method of an FPGA according to an embodiment of the present invention;

fig. 7 is a schematic diagram of constructing a test set for a single code block according to the failure positioning method applied to an FPGA according to an embodiment of the present invention.

Detailed Description

The technical solutions in the present invention will be described clearly and completely with reference to the accompanying drawings, and it is obvious that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

As shown in fig. 1, a failure positioning method applied to an FPGA according to an embodiment of the present invention includes the following steps:

Step S101: determining a code block range for failure analysis according to the bottom-layer physical netlist and the original code of the tested chip. Specifically, the bottom-layer physical netlist and the original code of the tested chip are input into a failure analysis program, and the code block range for failure analysis is defined by the user.

As shown in fig. 3, the whole circuit is formed by connecting block modules (i.e., code blocks). Data is exchanged and processed between the block modules to realize the functions expected by the user code. In earlier analysis, therefore, whether the code is split or an online debugging tool is used, the smallest functional subset that can be reached is the block module. Viewed from the bottom layer of the chip, however, as shown in fig. 4, each block module is essentially a combination of a large number of LUT resources, DFF resources and routing resources, i.e., fabric resources. On this basis the accuracy of failure location can be pushed further. Previously, a failure could only be located to one block module; yet however simple its function, a block module contains at least several fabric circuits, and the actual cause of failure may be physical damage to just one or a few of them. Because the fabric circuits have a mapping relation to the physical circuits on the actual chip, re-analyzing and re-locating these resources allows the failure to be pinned directly to an exact position in the chip, where other personnel can then analyze the located failure position further.
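The mapping from block modules to fabric circuits to physical positions described above can be pictured with a small data model. The following is a minimal Python sketch, not the implementation of the invention: the class names, instance names and (x, y) die coordinates are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class FabricCircuit:
    name: str              # hypothetical netlist instance name
    kind: str              # "LUT", "DFF" or "ROUTING"
    site: Tuple[int, int]  # hypothetical (x, y) position on the die


@dataclass
class BlockModule:
    name: str
    circuits: List[FabricCircuit]


def physical_sites(block: BlockModule, failed: List[str]) -> Dict[str, Tuple[int, int]]:
    """Translate failed fabric-circuit names into die positions."""
    by_name = {c.name: c for c in block.circuits}
    return {n: by_name[n].site for n in failed if n in by_name}


# One block module made of three fabric circuits (all values are made up).
block = BlockModule("blk_a", [
    FabricCircuit("lut_0", "LUT", (3, 7)),
    FabricCircuit("dff_0", "DFF", (3, 8)),
    FabricCircuit("rt_0", "ROUTING", (4, 8)),
])
sites = physical_sites(block, ["dff_0"])
print(sites)
```

The point of the sketch is only that a failure narrowed to one fabric circuit resolves, through a lookup of this kind, to one position on the chip rather than to a whole block module.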

Step S102: performing a secondary wiring operation on the bottom-layer circuit corresponding to the code block range, and leading the intermediate node data of the plurality of fabric circuits of the bottom-layer circuit out of the chip.

The secondary wiring operation on the bottom-layer circuit corresponding to the code block range comprises the following steps:

obtaining idle output IO resources and idle fabric circuits of a chip to be tested;

As shown in fig. 5, because the chip contains idle wiring resources that can be called, the bottom-layer circuit synthesized from the user's code (i.e., the code block range for failure analysis) can be rewired: without destroying the original circuit configuration, secondary wiring creates one or more bypasses that are led out of the chip for off-chip analysis. A bypass is an additional data output path led from a module used in the original physical netlist through an idle fabric circuit and an idle IO resource; the intermediate node data is output through the idle IO resource to the outside of the chip for analysis.
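The bypass planning described above can be sketched as a simple resource assignment. This is a hedged illustration in Python under stated assumptions: the function name `plan_bypasses`, the resource names and the one-fabric-circuit-per-bypass simplification are all hypothetical, and a real router would also have to check that a physical path exists. The only property taken from the text is that resources already used by the original netlist are never reassigned.

```python
from typing import List, Set, Tuple


def plan_bypasses(
    output_ports: List[str],
    used_fabric: Set[str],
    idle_fabric: List[str],
    idle_ios: List[str],
) -> Tuple[List[Tuple[str, str, str]], List[str]]:
    """Pair each observed port with one idle fabric circuit and one idle IO.

    Returns (plans, unrouted). Anything in used_fabric is skipped so the
    original physical netlist stays untouched.
    """
    candidates = [f for f in idle_fabric if f not in used_fabric]
    plans = list(zip(output_ports, candidates, idle_ios))
    unrouted = output_ports[len(plans):]  # ports left for a later pass
    return plans, unrouted


# Hypothetical resources: three ports to observe, but only two idle IOs.
plans, unrouted = plan_bypasses(
    output_ports=["blk_a.o0", "blk_a.o1", "blk_b.o0"],
    used_fabric={"f1"},
    idle_fabric=["f1", "f2", "f3"],
    idle_ios=["io5", "io9"],
)
print(plans)
print(unrouted)
```

Ports that cannot be served in one pass are returned separately, which matches the idea of connecting output ports to idle resources a few at a time rather than all at once.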

Step S103: analyzing the bottom-layer physical netlist according to the intermediate node data to generate a plurality of test sets in units of code blocks, wherein each test set comprises a plurality of test patterns. Specifically, an automation program analyzes the bottom-layer physical netlist according to the intermediate node data to obtain the connection relation of the fabric circuits.

As shown in fig. 6, the expected input value and the expected output value of each fabric circuit are obtained by circuit simulation, and a plurality of sets of test sets in units of code blocks are generated. Each set of test set comprises 1-999 test patterns.
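How expected values might be obtained by simulation can be illustrated with a toy model. In the Python sketch below, each fabric circuit is assumed to be a k-input LUT described by an integer truth table (`init`), and patterns are produced by exhaustive enumeration; both are assumptions made for illustration, since the text does not specify the simulator or the pattern-selection strategy. Only the cap of 999 patterns per test set comes from the text.

```python
from itertools import product
from typing import Dict, List, Sequence, Tuple


def lut_eval(init: int, inputs: Sequence[int]) -> int:
    """Simulate a LUT: bit i of init is the output for input index i."""
    index = sum(bit << pos for pos, bit in enumerate(inputs))
    return (init >> index) & 1


def build_test_set(luts: Dict[str, Tuple[int, int]], max_patterns: int = 999) -> List[dict]:
    """Build one test set for one code block.

    luts maps a (hypothetical) circuit name to (number of inputs, init).
    """
    patterns: List[dict] = []
    for name, (k, init) in luts.items():
        for inputs in product((0, 1), repeat=k):
            if len(patterns) >= max_patterns:
                return patterns
            patterns.append({
                "circuit": name,
                "inputs": inputs,
                "expected": lut_eval(init, inputs),
            })
    return patterns


# A 2-input AND gate as a LUT: only input index 3 (both inputs 1) gives 1.
test_set = build_test_set({"and2": (2, 0b1000)})
for pattern in test_set:
    print(pattern)
```

Each pattern carries the stimulus and the simulated response together, so the later measurement step only needs to compare one measured bit against one stored bit.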

Step S104: testing the chip under test according to the plurality of test sets, and collecting failure information. The process of testing the failed chip according to the plurality of test sets is an automated test.

Step S105: locating the physical position of the chip failure according to the failure information. Specifically, the failed fabric circuit is located according to the failure information corresponding to the test pattern, and the corresponding physical position of the chip failure is then located according to the failed fabric circuit.

If the bottom-layer circuit synthesized from the user's code is complex, it can be handled by an automation program: the connection relation of the fabric circuits is determined by analyzing the bottom-layer physical netlist, the expected input value and expected output value of each fabric circuit are obtained through circuit simulation, a plurality of test sets are generated, and the failed chip is then tested in an automated pipeline according to those test sets. Because the expected input and output values are obtained through simulation in advance, whenever a measured value differs from the simulated value the program automatically retains the fail information corresponding to that pattern. From the fail information the user can directly locate the failed fabric circuit and, from it, the physical position of the chip failure.
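The compare-and-retain step can be sketched as follows. This Python example is illustrative only: `measure` stands in for whatever tester interface reads the bypassed node from the real chip, and the pattern format, circuit names and site table are hypothetical. Here the chip is faked by a function in which one circuit is stuck at 0.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def collect_failures(test_set: List[dict], measure: Callable[[str, Sequence[int]], int]) -> List[dict]:
    """Keep every pattern whose measured value differs from the simulated one."""
    fails = []
    for pattern in test_set:
        got = measure(pattern["circuit"], pattern["inputs"])
        if got != pattern["expected"]:
            fails.append({**pattern, "measured": got})
    return fails


def locate(fails: List[dict], site_of: Dict[str, Tuple[int, int]]) -> List[Tuple[str, Tuple[int, int]]]:
    """Map failing fabric circuits to their (hypothetical) die positions."""
    return sorted({(f["circuit"], site_of[f["circuit"]]) for f in fails})


test_set = [
    {"circuit": "lut_a", "inputs": (1, 1), "expected": 1},
    {"circuit": "lut_a", "inputs": (0, 1), "expected": 0},
    {"circuit": "lut_b", "inputs": (1,), "expected": 1},
]


def fake_chip(circuit: str, inputs: Sequence[int]) -> int:
    """Stand-in for a real measurement: lut_a is stuck at 0, lut_b is healthy."""
    return 0 if circuit == "lut_a" else inputs[0]


fails = collect_failures(test_set, fake_chip)
print(locate(fails, {"lut_a": (3, 7), "lut_b": (5, 2)}))
```

Note that the stuck-at-0 circuit passes the pattern whose expected value is 0, which is one reason a test set needs several patterns per circuit.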

As shown in fig. 7, if the user has already located a certain block module at an earlier stage, for example by splitting the project, or already knows that a certain code block fails, the scope of the automation program can be adjusted so that only the specific block module, or the circuit synthesized from the specific code, is analyzed and a test pattern set is generated for that block module alone. This avoids analyzing the whole project when the range of the problem is already known, greatly reduces repeated and meaningless analysis, and improves analysis efficiency.
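Restricting the analysis scope amounts to filtering the netlist before any test set is generated. The Python sketch below assumes, purely for illustration, that the netlist has been reduced to a dictionary from block-module name to the fabric circuits it uses; the function name `select_scope` and the sample names are hypothetical.

```python
from typing import Dict, Iterable, List, Optional


def select_scope(
    netlist: Dict[str, List[str]],
    blocks: Optional[Iterable[str]] = None,
) -> Dict[str, List[str]]:
    """Return only the block modules to analyze; None means the whole project."""
    if blocks is None:
        return dict(netlist)
    wanted = list(blocks)
    missing = [b for b in wanted if b not in netlist]
    if missing:
        raise KeyError(f"unknown block modules: {missing}")
    return {b: netlist[b] for b in wanted}


# Hypothetical project with three block modules.
netlist = {
    "blk_a": ["lut_0", "dff_0"],
    "blk_b": ["lut_1", "lut_2", "dff_1"],
    "blk_c": ["lut_3"],
}
scope = select_scope(netlist, ["blk_b"])
print(scope)
```

With the scope reduced to one block module, test sets are generated and measured only for its fabric circuits, so the work scales with the suspected block rather than with the whole project.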

The terms involved in the embodiments of the present invention are explained as follows:

Logic chip: a chip used to implement the user's digital logic; its function is typically determined by user programming.

Failure positioning: when the logic chip cannot realize its function, the chip is determined to have failed; the process of narrowing the range of the failure is called failure positioning.

Physical netlist: a file generated by processing the user code with a software algorithm; in a narrow sense it can be understood as the circuit built to implement the function of the code.

FPGA: Field Programmable Gate Array.

Pattern: chip test vectors.

DFT: Design For Test. Hardware logic for improving the testability of the chip (including controllability and observability) is inserted at the original design stage of the chip, and test vectors are generated through this logic so that a large-scale chip can be tested.

LUT: Look-Up Table, the smallest unit in a logic chip for implementing basic logic operations.

DFF: D-type flip-flop, a flip-flop triggered by a clock signal.

Routing: the wiring resources of the chip, used for transmitting signals.

Fabric: the underlying infrastructure of the logic chip.

The method has the following advantages. On the premise that the original physical netlist synthesized by the user is not changed, other resources in the chip are used to lead a bypass out of an intermediate node of the original circuit; the bypass transmits the signal of the intermediate node to the outside of the chip, and analyzing that signal allows the failure range to be narrowed further at a later stage of failure analysis. Because the user's physical netlist is preserved, there is no concern that the failure cannot be reproduced, and the signal intercepted at the intermediate node faithfully reflects the internal data of the circuit. When a failure occurs, it can therefore be seen which block's data flow is wrong.

Illustratively, the computer program may be partitioned into one or more modules that are stored in the memory and executed by the processor to implement the invention. The one or more modules may be a series of computer program instruction segments capable of performing specific functions, which are used for describing the execution process of the computer program in the terminal device.

The terminal device can be a desktop computer, a notebook, a palm computer, a cloud server and other computing devices. The terminal device may include, but is not limited to, a processor, a memory.

The processor may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, etc. The general-purpose processor may be a microprocessor or any conventional processor; the processor is the control center of the terminal device and connects the various parts of the whole terminal device using various interfaces and lines.

The memory may be used to store the computer programs and/or modules, and the processor implements various functions of the terminal device by running or executing the computer programs and/or modules stored in the memory and calling data stored in the memory. The memory may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required by at least one function (such as a sound playing function, an image playing function, etc.), and the like; the data storage area may store data created according to the use of the terminal device (such as audio data, a phonebook, etc.). In addition, the memory may include high-speed random access memory, and may also include non-volatile memory, such as a hard disk, an internal memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash memory card (Flash Card), at least one magnetic disk storage device, a flash memory device, or other volatile solid-state storage device.

The integrated modules/units of the terminal device, if implemented in the form of software functional units and sold or used as stand-alone products, can be stored in a computer-readable storage medium (i.e., the above readable storage medium). Based on such understanding, all or part of the flow of the method according to the embodiments of the present invention may also be implemented by a computer program, which may be stored in a computer-readable storage medium; when the computer program is executed by a processor, the steps of the method embodiments are implemented. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file or some intermediate form, etc. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier wave signal, a telecommunications signal, a software distribution medium, etc.

It should be noted that the above-described device embodiments are merely illustrative, where the units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. In addition, in the drawings of the embodiment of the apparatus provided by the present invention, the connection relationship between the modules indicates that there is a communication connection between them, and may be specifically implemented as one or more communication buses or signal lines. One of ordinary skill in the art can understand and implement it without inventive effort.

While the foregoing is directed to the preferred embodiment of the present invention, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention.

It will be understood by those skilled in the art that all or part of the processes of the above embodiments may be implemented by hardware related to instructions of a computer program, and the computer program may be stored in a computer readable storage medium, and when executed, may include the processes of the above embodiments. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.

Claims

1. The failure positioning method applied to the FPGA is characterized by comprising the following steps of:

2. The failure locating method applied to FPGA of claim 1,

wherein a failed fabric circuit is located according to the failure information corresponding to the test pattern, and the corresponding physical position of the chip failure is located according to the failed fabric circuit.

3. The failure positioning method applied to the FPGA according to claim 2, wherein performing a secondary wiring operation on the underlying circuit corresponding to the code block range comprises:

4. The failure positioning method applied to the FPGA of claim 3, wherein the bottom-layer physical netlist is analyzed according to the intermediate node data to generate a plurality of sets of test sets with a code block as a unit, specifically:

5. The failure positioning method applied to the FPGA according to claim 4, wherein each set of test set comprises 1-999 test patterns.

6. The failure positioning method applied to FPGA according to any one of claims 1 to 5, characterized in that the process of actually measuring the failed chip according to the plurality of test sets is an automated test process.