CN114461472A

CN114461472A - ATE (Automatic Test Equipment)-Based Full-Speed Functional Test Method for a GPU (Graphics Processing Unit) Core

Info

Publication number: CN114461472A
Application number: CN202210073984.0A
Authority: CN
Inventors: 樊石; 秦泰; 秦信刚; 程振洪
Original assignee: 709th Research Institute of CSIC
Current assignee: 709th Research Institute of CSIC
Priority date: 2022-01-21
Filing date: 2022-01-21
Publication date: 2022-05-10

Abstract

The invention discloses an ATE (automatic test equipment)-based full-speed functional test method for a GPU (graphics processing unit) core, and relates to the field of GPU chip development and testing. First, the ATE places the GPU chip in a GPU core full-speed functional test mode through a mode selection interface. Next, the ATE configures the clock circuit inside the GPU chip through a control interface. The ATE then loads test data into the internal test memory of the GPU chip through a data interface and a dedicated data path, and starts the GPU core through the control interface; the GPU core reads the test data, performs its computation and writes the result back to the internal test memory. Finally, the ATE queries the working state of the GPU core through the control interface and, once the GPU core is idle, reads the test result out of the internal test memory and compares it with a standard result. The invention is fast, flexible and low in cost, and can be widely applied to GPU chip development and testing.

Description

ATE (Automatic Test Equipment)-Based Full-Speed Functional Test Method for a GPU (Graphics Processing Unit) Core

Technical Field

The invention belongs to the field of development and testing of GPU chips, and particularly relates to a full-speed function testing method of a GPU core based on ATE.

Background

With the development of semiconductor technology and the improvement of the process level, the size, complexity and operating frequency of the GPU core in the GPU chip are also continuously increased. With more advanced process nodes, the GPU core can accommodate more logic and memory cells, enabling higher operating frequencies and more complex system functions. The above variations present challenges to volume production testing and design for testability of GPU cores.

Traditional ATE-based mass-production testing is mainly used to ensure consistency between production and design, i.e., it focuses on physical defects introduced during chip manufacturing. Static testing based on the stuck-at fault model is still in use, and full-speed (at-speed) testing based on the transition fault model has been introduced at advanced process nodes. At present, however, full-speed testing is still mainly applied to structural tests such as internal logic scan-chain testing (Scan), memory built-in self-test (MBIST) and boundary-scan testing (Boundary Scan).

Beyond these structural tests, GPU chips now also require full-speed functional testing of the GPU core. Doing this conventionally requires peripheral devices such as DDR memory chips on the GPU test board and a PCIe host connected to the board, which makes the test hardware expensive; in addition, lengthy PCIe and DDR initialization must be performed before testing can begin, which makes the test time long.

In order to meet the requirement of full-speed functional test of the GPU core and avoid introducing too high test hardware cost and too long test time, a novel full-speed functional test method of the GPU core needs to be developed.

Disclosure of Invention

In view of the above shortcomings of the prior art, the invention provides an ATE-based GPU core full-speed functional test method that reuses (multiplexes) the clock circuit inside the GPU chip and the chip's existing firmware memories. It aims to solve the high test-hardware cost and long test time of existing GPU core full-speed functional testing, and is flexible, fast and low in cost.

In order to achieve the above object, the present invention provides a method for testing full-speed functions of a GPU core based on ATE, which comprises the following steps:

S1: the ATE places the GPU chip in the GPU core full-speed functional test mode through the mode selection interface;

S2: the ATE configures the clock circuit inside the GPU chip through the control interface to provide the required full-speed test clock for the GPU core, the system bus and the internal test memory;

S3: the ATE loads the test data into the internal test memory of the GPU chip through the data interface and the dedicated data path;

S4: the ATE starts the GPU core through the control interface, so that the GPU core reads the test data from the internal test memory of the GPU chip, performs its computation, and writes the test result back to the internal test memory;

S5: the ATE queries the working state of the GPU core through the control interface;

S6: determine whether the GPU core is idle; if it is idle, proceed to step S7, otherwise return to step S5;

S7: the ATE reads the test result from the internal test memory of the GPU chip through the data interface and the dedicated data path, and compares it with the standard result.
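The S1 to S7 flow can be sketched from the ATE side as a short Python routine. This is a behavioral illustration under stated assumptions, not the patent's implementation or any real ATE API: the SimulatedGpuChip class and its method names, the word offsets IN_BASE and OUT_BASE, the word-addressed memory, the clock value and the "add 1 to every word" stand-in for the GPU workload are all hypothetical. The max_polls limit is an added safeguard; the S5/S6 loop as described has no timeout.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

IN_BASE, OUT_BASE = 0, 32   # hypothetical word offsets for test data and results


@dataclass
class SimulatedGpuChip:
    """Hypothetical behavioral stand-in for the GPU chip (not a real ATE API).

    Its "GPU core" simply adds 1 to each input word when it finishes.
    """
    busy_polls: int = 3                                   # polls answered "busy" before idle
    memory: List[int] = field(default_factory=lambda: [0] * 64)
    test_mode: bool = False
    clock_mhz: int = 0
    _job: Optional[Tuple[int, int, int]] = None           # (n_words, in_base, out_base)
    _remaining: int = 0

    def enter_test_mode(self) -> None:                     # mode selection interface
        self.test_mode = True

    def configure_clock(self, mhz: int) -> None:           # control interface
        self.clock_mhz = mhz

    def start_core(self, n_words: int, in_base: int, out_base: int) -> None:
        self._job, self._remaining = (n_words, in_base, out_base), self.busy_polls

    def core_idle(self) -> bool:                           # control interface status query
        if self._job is None:
            return True
        if self._remaining > 0:
            self._remaining -= 1
            return False
        n, src, dst = self._job                            # the simulated core finishes
        for i in range(n):
            self.memory[dst + i] = (self.memory[src + i] + 1) & 0xFFFF_FFFF
        self._job = None
        return True

    def write_word(self, addr: int, data: int) -> None:    # data interface + dedicated path
        self.memory[addr] = data

    def read_word(self, addr: int) -> int:
        return self.memory[addr]


def run_full_speed_test(chip: SimulatedGpuChip, test_data: List[int], golden: List[int],
                        clock_mhz: int, max_polls: int = 1000) -> bool:
    chip.enter_test_mode()                                 # S1
    chip.configure_clock(clock_mhz)                        # S2
    for i, word in enumerate(test_data):                   # S3
        chip.write_word(IN_BASE + i, word)
    chip.start_core(len(test_data), IN_BASE, OUT_BASE)     # S4
    for _ in range(max_polls):                             # S5: query working state
        if chip.core_idle():                               # S6: idle -> go to S7
            break
    else:
        raise TimeoutError("GPU core did not become idle")
    results = [chip.read_word(OUT_BASE + i) for i in range(len(golden))]  # S7
    return results == golden
```

In this toy model, run_full_speed_test(SimulatedGpuChip(), [1, 2, 3], [2, 3, 4], clock_mhz=1000) returns True after the simulated core answers "busy" three times and then "idle".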

In the above technical solution, the full-speed test clock is generated by the clock circuit inside the GPU chip, so the ATE only needs to supply a low-frequency reference clock to the GPU chip. Because the frequency of the full-speed test clock is configurable by the ATE, the method can meet the full-speed functional test requirements of the GPU core under different operating conditions, such as nominal voltage and overdrive voltage, and can also meet the frequency-performance screening requirements of GPU cores from different chip batches.

In the above technical solution, the internal test memory that stores test data and test results is formed by reusing the PCIe PHY firmware memories and DDR PHY firmware memories that the GPU chip uses in normal operating mode. The discrete memory spaces that these firmware memories occupy in normal operating mode are remapped, in the GPU core full-speed functional test mode, into one continuous internal test memory space used to store test data and test results.

In the above technical solution, the data interface for loading test data and reading test results uses a dedicated non-blocking interface protocol instead of a standard interface protocol including a complex handshake mechanism.

In the above technical solution, the dedicated non-blocking interface protocol comprises the following signals: (1) t_clk: an input signal of the GPU chip that provides the reference clock for all other signals on the data interface; (2) t_address: a 32-bit input signal of the GPU chip that can address up to 4 GB of internal test memory space; (3) t_write: an input signal of the GPU chip that identifies the transfer direction of the test data; a value of 1 indicates that data is written into the internal test memory of the GPU chip; (4) t_read: an input signal of the GPU chip that identifies the transfer direction of the data; a value of 1 indicates that data is read from the internal test memory of the GPU chip; (5) t_wdata: a 32-bit input signal of the GPU chip that carries the data written into the internal test memory of the GPU chip; (6) t_rdata: a 32-bit output signal of the GPU chip that carries the data read from the internal test memory of the GPU chip; (7) t_finish: an output signal of the GPU chip; a value of 1 indicates that a previously issued write or read operation has completed.

In the above technical solution, when the ATE loads test data into the internal test memory of the GPU chip and reads test results from it, a dedicated data path is used instead of the system bus, which involves intermediate nodes and a handshake protocol.

In the above technical solution, the dedicated data path for loading test data and reading test results has a non-blocking pipeline structure comprising: (1) input pipeline stages: the dedicated data path contains multiple input pipeline stages that buffer the address, control and data signals arriving from the data interface, such as t_address, t_write, t_read and t_wdata; (2) output pipeline stages: the dedicated data path contains multiple output pipeline stages that buffer the data and status signals sent to the data interface, such as t_rdata and t_finish.

In this technical solution, no handshake protocol is needed between adjacent input pipeline stages; data flows through the stages in one direction without stalling, which effectively improves data-loading efficiency. Because the input pipeline stages register the test data from the ATE over several stages, no single on-chip path has to be traversed within one clock cycle, which avoids excessive single-cycle transmission delay inside the chip and allows a higher data-loading frequency.

In this technical solution, no handshake protocol is needed between adjacent output pipeline stages; data flows through the stages in one direction without stalling, which effectively improves data-reading efficiency. Because the output pipeline stages register the test results sent to the ATE over several stages, no single on-chip path has to be traversed within one clock cycle, which avoids excessive single-cycle transmission delay inside the chip and allows a higher data-reading frequency.
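The efficiency argument can be made concrete with a simple cycle-count model. The sketch below rests on assumptions that the text does not state: every transfer has a fixed end-to-end latency L = N_in + memory latency + N_out cycles, the non-blocking path accepts one request per t_clk cycle, and the handshake-based alternative allows only one outstanding transfer at a time. The stage counts and the 1-cycle memory latency are hypothetical.

```python
def path_latency(n_in: int, n_out: int, mem_latency: int = 1) -> int:
    """Assumed fixed cycles from a request on the data interface to its t_finish."""
    return n_in + mem_latency + n_out


def nonblocking_cycles(k: int, latency: int) -> int:
    """k back-to-back requests, one per t_clk: the last one completes after latency + k - 1."""
    return 0 if k <= 0 else latency + k - 1


def blocking_cycles(k: int, latency: int) -> int:
    """Assumed handshake model: each request waits for the previous one to complete."""
    return max(k, 0) * latency


if __name__ == "__main__":
    lat = path_latency(n_in=4, n_out=4)        # hypothetical: 4 + 1 + 4 = 9 cycles
    words = 256 * 1024 // 4                    # 65,536 32-bit words
    print(nonblocking_cycles(words, lat))      # 65544
    print(blocking_cycles(words, lat))         # 589824
```

Under this model, filling the 256 KB internal test memory of the embodiment (65,536 32-bit words) takes 65,544 cycles on the non-blocking path versus 589,824 cycles with one transfer outstanding at a time. The pipeline depth only adds a fixed L - 1 cycles to the total, which is why adding stages to raise the t_clk frequency costs little.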

In the above technical solution, the number and physical location of the input pipeline stage and the output pipeline stage are determined by the physical layout of the PCIe PHY firmware memory and the DDR PHY firmware memory inside the GPU chip.

Generally, compared with the prior art, the above technical solution conceived by the present invention has the following beneficial effects:

In this method, the clock circuit that the GPU chip uses in normal operating mode is reused to provide the full-speed test clock for the GPU core full-speed functional test. The discrete memory spaces that the PCIe PHY firmware memories and DDR PHY firmware memories occupy in normal operating mode are remapped into a continuous internal test memory space in the GPU core full-speed functional test mode; in other words, these firmware memories are reused to provide the memory space for the test. The method thus meets the requirements of GPU core full-speed functional testing and frequency-performance screening on ATE during GPU chip mass-production testing. Compared with the conventional scheme, DDR memory chips and a PCIe host are no longer needed, which lowers test hardware cost, and PCIe link and DDR initialization are avoided, which shortens test time.

Drawings

FIG. 1 is a schematic diagram of the hardware architecture supporting an ATE-based GPU core full-speed functional test method according to an embodiment of the present invention;

FIG. 2 is a schematic flowchart of an ATE-based GPU core full-speed functional test method according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of an internal test memory mapping of an ATE-based GPU core full-speed functional test method according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of an internal test memory architecture of an ATE-based GPU core full-speed functional test method according to an embodiment of the present invention;

FIG. 5 is a write timing diagram of the data interface in an ATE-based GPU core full-speed functional test method according to an embodiment of the present invention;

FIG. 6 is a read timing diagram of the data interface in an ATE-based GPU core full-speed functional test method according to an embodiment of the present invention;

FIG. 7 is a diagram illustrating a dedicated data path structure of a method for full-speed functional testing of an ATE-based GPU core according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

Fig. 1 is a schematic diagram of the hardware architecture supporting the ATE-based GPU core full-speed functional test method in an embodiment of the present invention. Referring to Fig. 1, the hardware platform required by the method comprises the following components: a mode selection interface, a control interface, a data interface, a dedicated data path, a GPU core, an internal test memory, a system bus and a clock circuit.

Specifically, the mode selection interface 101 is connected to both the control interface 102 and the data interface 103, and is used to place the GPU chip in the GPU core full-speed functional test mode. The control interface 102 is also connected to the system bus 107; it is used to configure the clock circuit inside the GPU chip so that it provides the required full-speed test clock to modules such as the GPU core, the internal test memory and the system bus, and also to configure and query the GPU core, so that the ATE can start the GPU core and learn its working state. The data interface 103 is also connected to the dedicated data path 104, and is used to load test data into the internal test memory of the GPU chip and to read test results from it. The dedicated data path 104 is also connected to the internal test memory 106, and provides a high-bandwidth, low-latency transfer path between the data interface and the chip's internal test memory. The GPU core 105 is connected only to the system bus 107; it is the graphics processing core of the GPU chip and is the object under test in the present invention. The internal test memory 106 is also connected to the system bus 107, and provides a continuous on-chip memory space for the full-speed functional test of the GPU core. The system bus 107 is also connected to the clock circuit 108, and provides the path for data exchange among the modules inside the GPU chip.
The ATE can configure the clock circuit 108 and the GPU core 105 inside the GPU chip through the control interface 102 and the system bus 107, and query the working state of the GPU core 105; after the GPU core 105 is started, the test data can be read from the internal test memory 106 through the system bus 107 to perform calculation, and the test result can be written to the internal test memory 106 through the system bus 107. The clock circuit 108 is connected to the system bus 107, and the clock circuit 108 provides the required full-speed test clock for the GPU core, the internal test memory, the system bus, and the like.

Fig. 2 is a schematic flowchart of an ATE-based GPU core full-speed function testing method according to an embodiment of the present invention, and referring to fig. 2, a preferred embodiment of the ATE-based GPU core full-speed function testing method according to the present invention includes the following steps:

S1: the ATE places the GPU chip in the GPU core full-speed functional test mode through the mode selection interface;

S2: the ATE configures the clock circuit inside the GPU chip through the control interface to provide the required full-speed test clock for the GPU core, the system bus and the internal test memory;

S3: the ATE loads test data into the internal test memory of the GPU chip through the data interface and the dedicated data path;

Fig. 3 is a schematic diagram of the internal test memory mapping of the ATE-based GPU core full-speed functional test method in an embodiment of the present invention. Referring to Fig. 3, internal test memory map 301 shows that when the GPU chip is in normal operating mode, the 4 PCIe PHY firmware memories inside the GPU chip are each 32 KB, with start addresses in the discrete regions 0x00A0_0000, 0x00A1_0000, 0x00A2_0000 and 0x00A3_0000; the 4 DDR PHY firmware memories are also each 32 KB, with start addresses in the discrete regions 0x00B0_0000, 0x00B1_0000, 0x00B2_0000 and 0x00B3_0000; and the free address regions between the PCIe PHY firmware memories and the DDR PHY firmware memories are used by other modules inside the GPU chip.

Internal test memory map 302 shows that when the GPU chip is in the GPU core full-speed functional test mode, the 4 PCIe PHY firmware memories and the 4 DDR PHY firmware memories inside the GPU chip are remapped into a 256 KB (8 × 32 KB) continuous memory space that forms the internal test memory; when the GPU core accesses the internal test memory, its start address is 0x00F0_0000.

Internal test memory map 303 shows that when the GPU chip is in the GPU core full-speed functional test mode, the 4 PCIe PHY firmware memories and the 4 DDR PHY firmware memories inside the GPU chip are remapped into a 256 KB continuous memory space that forms the internal test memory; when the ATE accesses the internal test memory through the data interface and the dedicated data path, its start address is 0x0000_0000.
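The remapping in Fig. 3 amounts to splitting a test-mode offset into a bank index and an offset within a 32 KB bank. The Python sketch below illustrates this; the sizes and base addresses come from maps 301 to 303, but two details are assumptions not given in the text: that PHYn corresponds to the n-th listed start address, and that the banks are stacked in the 256 KB window in the order PCIe PHY0 to PHY3 followed by DDR PHY0 to PHY3. The function name remap is hypothetical.

```python
from typing import List, Tuple

BANK_SIZE = 32 * 1024            # each PHY firmware memory is 32 KB (Fig. 3)
TEST_MEM_SIZE = 8 * BANK_SIZE    # 256 KB continuous internal test memory
GPU_CORE_BASE = 0x00F0_0000      # start address seen by the GPU core (map 302)
ATE_BASE = 0x0000_0000           # start address seen by the ATE (map 303)

# Normal-mode start addresses from map 301. ASSUMPTIONS: PHYn uses the n-th address,
# and the banks are stacked PCIe PHY0..3 then DDR PHY0..3 inside the test window.
BANKS: List[Tuple[str, int]] = [
    ("PCIe PHY0", 0x00A0_0000), ("PCIe PHY1", 0x00A1_0000),
    ("PCIe PHY2", 0x00A2_0000), ("PCIe PHY3", 0x00A3_0000),
    ("DDR PHY0", 0x00B0_0000), ("DDR PHY1", 0x00B1_0000),
    ("DDR PHY2", 0x00B2_0000), ("DDR PHY3", 0x00B3_0000),
]


def remap(addr: int, view_base: int) -> Tuple[str, int, int]:
    """Translate a test-mode byte address into (bank, offset in bank, normal-mode address)."""
    offset = addr - view_base
    if not 0 <= offset < TEST_MEM_SIZE:
        raise ValueError(f"{addr:#010x} is outside the 256 KB test memory window")
    index, bank_offset = divmod(offset, BANK_SIZE)
    name, normal_base = BANKS[index]
    return name, bank_offset, normal_base + bank_offset
```

For example, remap(0x0000_8000, ATE_BASE) gives ("PCIe PHY1", 0, 0x00A1_0000), and remap(0x00F2_0004, GPU_CORE_BASE) gives ("DDR PHY0", 4, 0x00B0_0004). Since the two views differ only in their base address, the GPU core and the ATE reach the same physical word at the same offset.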

Fig. 4 is a schematic diagram of the internal test memory architecture of the ATE-based GPU core full-speed functional test method in an embodiment of the present invention. Referring to Fig. 4, the internal test memory architecture comprises 4 PCIe PHY firmware memories, 4 DDR PHY firmware memories and 9 multiplexers. The 4 PCIe PHY firmware memories are PCIe PHY3 firmware memory 405, PCIe PHY2 firmware memory 406, PCIe PHY1 firmware memory 407 and PCIe PHY0 firmware memory 408, and the 4 DDR PHY firmware memories are DDR PHY3 firmware memory 401, DDR PHY2 firmware memory 402, DDR PHY1 firmware memory 403 and DDR PHY0 firmware memory 404. In the GPU core full-speed functional test mode, these 8 firmware memories together form the internal test memory. The first 8 multiplexers are the first multiplexer 411, second multiplexer 412, third multiplexer 413, fourth multiplexer 414, fifth multiplexer 415, sixth multiplexer 416, seventh multiplexer 417 and eighth multiplexer 418. In normal operating mode, these eight multiplexers route the 8 data streams from the system bus to the 4 PCIe PHY firmware memories and the 4 DDR PHY firmware memories, respectively.

In the GPU core full-speed functional test mode, the eight multiplexers 411 to 418 instead select the data stream coming from the ninth multiplexer 419 and pass it to the internal test memory; in this mode, the ninth multiplexer 419 selects either the data stream from the GPU core or the data stream from the ATE. In normal operating mode, the 8 data streams from the system bus, namely the first data stream 421, second data stream 422, third data stream 423, fourth data stream 424, fifth data stream 425, sixth data stream 426, seventh data stream 427 and eighth data stream 428, access the 4 PCIe PHY firmware memories and the 4 DDR PHY firmware memories, respectively. In the GPU core full-speed functional test mode, data stream 429 from the GPU core carries test data that the GPU core reads from the internal test memory through the system bus, or test results that the GPU core writes into the internal test memory through the system bus; data stream 430 from the ATE carries test data that the ATE loads into the internal test memory through the data interface and the dedicated data path, or test results that the ATE reads from the internal test memory through the same path; and data stream 431, from either the GPU core or the ATE, is sent to the internal test memory after passing through the ninth multiplexer 419.
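The multiplexer arrangement of Fig. 4 can be summarized as a two-level selection, sketched below in Python. This is a functional illustration only: the Mode and SourceSelect enums, the select input of multiplexer 419, and the function name are hypothetical, and the sketch assumes (the text does not say) that in test mode the individual bank actually accessed is chosen by address decoding, so every bank simply sees stream 431.

```python
from enum import Enum
from typing import Sequence, TypeVar

T = TypeVar("T")


class Mode(Enum):
    NORMAL = 0
    CORE_FULL_SPEED_TEST = 1


class SourceSelect(Enum):        # hypothetical select input of multiplexer 419
    GPU_CORE = 0                 # data stream 429
    ATE = 1                      # data stream 430


def stream_into_bank(bank: int, mode: Mode, source: SourceSelect,
                     bus_streams: Sequence[T], gpu_stream: T, ate_stream: T) -> T:
    """Return the data stream that reaches firmware memory `bank` (0..7)."""
    if mode is Mode.NORMAL:
        return bus_streams[bank]             # multiplexers 411..418 pass streams 421..428
    # Test mode: multiplexer 419 chooses the GPU core or the ATE, producing stream 431 ...
    stream_431 = gpu_stream if source is SourceSelect.GPU_CORE else ate_stream
    return stream_431                        # ... and multiplexers 411..418 all select it
```

Keeping the ATE and GPU core behind a single shared multiplexer means the eight per-bank multiplexers only need two inputs each, regardless of how many agents can reach the test memory.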

Fig. 5 is a write timing diagram of the data interface in the ATE-based GPU core full-speed functional test method according to an embodiment of the present invention. Referring to Fig. 5, a write proceeds as follows: (1) on a rising edge of t_clk, t_address, t_write and t_wdata are driven simultaneously, sending the write address, write enable and write data to the GPU chip; (2) after several t_clk cycles, the write address, write enable and write data reach the internal test memory of the GPU chip; (3) once the data has been written into the internal test memory, the corresponding write-completion status is generated; (4) after several more t_clk cycles, the write-completion status is output on t_finish. The write timing follows the dedicated non-blocking interface protocol: on every rising edge of t_clk, the ATE may send a new write request to the GPU chip through the data interface without waiting for any handshake signal from the data interface; after several t_clk cycles, the write requests reach the internal test memory of the GPU chip in order and the writes are completed; and after several further t_clk cycles, the corresponding write-completion statuses are output in order on t_finish of the data interface.
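Because the protocol is non-blocking, the ATE-side write sequence can be prepared in advance as a fixed per-cycle pattern rather than reacting to the chip. The Python sketch below builds such a pattern under the assumption that the end-to-end latency from request to t_finish is a known constant (for example, the number of input and output pipeline stages plus the memory access); the text only says "several t_clk cycles". The WriteCycle structure, the function name and the latency value are hypothetical, and byte addresses step by 4 for 32-bit words.

```python
from dataclasses import dataclass
from typing import List, Tuple

MASK32 = 0xFFFF_FFFF


@dataclass
class WriteCycle:
    """Signals the ATE drives, plus the t_finish value it expects, for one t_clk cycle."""
    t_address: int = 0
    t_write: int = 0
    t_wdata: int = 0
    expect_t_finish: int = 0


def build_write_pattern(writes: List[Tuple[int, int]], latency: int) -> List[WriteCycle]:
    """Issue one (address, data) write per t_clk edge without waiting for a handshake.

    `latency` is an ASSUMED fixed number of cycles from a request to its t_finish pulse.
    """
    pattern = [WriteCycle() for _ in range(len(writes) + latency)]
    for cycle, (addr, data) in enumerate(writes):
        pattern[cycle].t_address = addr & MASK32
        pattern[cycle].t_write = 1
        pattern[cycle].t_wdata = data & MASK32
        pattern[cycle + latency].expect_t_finish = 1   # completions return in issue order
    return pattern
```

With three writes and an assumed latency of 5, t_write is asserted on cycles 0 to 2 and t_finish is expected high on cycles 5 to 7, so the whole burst occupies 3 + 5 cycles instead of waiting out the latency for each word.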

Fig. 6 is a read timing diagram of the data interface in the ATE-based GPU core full-speed functional test method according to an embodiment of the present invention. Referring to Fig. 6, a read proceeds as follows: (1) on a rising edge of t_clk, t_address and t_read are driven simultaneously, sending the read address and read enable to the GPU chip; (2) after several t_clk cycles, the read address and read enable reach the internal test memory of the GPU chip; (3) once the data has been read out of the internal test memory, the corresponding read-completion status is generated; (4) after several more t_clk cycles, the read data and read-completion status are output on t_rdata and t_finish. The read timing follows the dedicated non-blocking interface protocol: on every rising edge of t_clk, the ATE may send a new read request to the data interface without waiting for any handshake signal from it; after several t_clk cycles, the read requests reach the internal test memory of the GPU chip in order and the reads are completed; and after several further t_clk cycles, the read data and read-completion statuses are output in order on t_rdata and t_finish.
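The read side can be handled the same way, and the captured t_rdata and t_finish values can then be compared cycle by cycle with the standard result, which is one possible way to carry out the comparison in step S7. The Python sketch below again assumes a fixed, known latency; ReadCycle, build_read_pattern and failing_cycles are hypothetical names, and None marks cycles on which t_rdata is not checked.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ReadCycle:
    t_address: int = 0
    t_read: int = 0
    expect_t_finish: int = 0
    expect_t_rdata: Optional[int] = None      # None means "don't care" on this cycle


def build_read_pattern(reads: List[Tuple[int, int]], latency: int) -> List[ReadCycle]:
    """reads = [(address, expected word)]; one read request per t_clk edge.

    `latency` (request to t_rdata/t_finish) is an ASSUMED fixed value.
    """
    pattern = [ReadCycle() for _ in range(len(reads) + latency)]
    for cycle, (addr, expected) in enumerate(reads):
        pattern[cycle].t_address = addr
        pattern[cycle].t_read = 1
        pattern[cycle + latency].expect_t_finish = 1
        pattern[cycle + latency].expect_t_rdata = expected
    return pattern


def failing_cycles(pattern: List[ReadCycle], t_rdata: List[int],
                   t_finish: List[int]) -> List[int]:
    """Compare captured outputs with the pattern; return indices of mismatching cycles."""
    bad = []
    for cycle, exp in enumerate(pattern):
        if t_finish[cycle] != exp.expect_t_finish:
            bad.append(cycle)
        elif exp.expect_t_rdata is not None and t_rdata[cycle] != exp.expect_t_rdata:
            bad.append(cycle)
    return bad
```

For two reads with an assumed latency of 3, a capture of t_rdata = [0, 0, 0, 0x11, 0x23] against expected words 0x11 and 0x22 reports cycle 4 as the only failure.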

The signals of the dedicated non-blocking interface protocol are as follows: (1) t_clk: an input signal of the GPU chip that provides the reference clock for all other signals on the data interface; (2) t_address: a 32-bit input signal of the GPU chip that can address up to 4 GB of internal test memory space; (3) t_write: an input signal of the GPU chip that identifies the transfer direction of the test data; a value of 1 indicates that data is written into the internal test memory of the GPU chip; (4) t_read: an input signal of the GPU chip that identifies the transfer direction of the data; a value of 1 indicates that data is read from the internal test memory of the GPU chip; (5) t_wdata: a 32-bit input signal of the GPU chip that carries the data written into the internal test memory of the GPU chip; (6) t_rdata: a 32-bit output signal of the GPU chip that carries the data read from the internal test memory of the GPU chip; (7) t_finish: an output signal of the GPU chip; a value of 1 indicates that a previously issued write or read operation has completed.

Fig. 7 is a schematic diagram of the dedicated data path structure of the ATE-based GPU core full-speed functional test method according to an embodiment of the present invention, which comprises multiple input pipeline stages and multiple output pipeline stages. There are N input pipeline stages: reference numeral 701 denotes input pipeline stage 1, 702 denotes input pipeline stage 2, and so on up to 70N, input pipeline stage N. These N input pipeline stages buffer input information such as t_address, t_write, t_read and t_wdata; on every t_clk cycle, the input information moves from each input pipeline stage to the next without waiting for any handshake signal from the next stage back to the previous one. There are likewise N output pipeline stages: 711 denotes output pipeline stage 1, 712 denotes output pipeline stage 2, and so on up to 71N, output pipeline stage N. These N output pipeline stages buffer output information such as t_rdata and t_finish; on every t_clk cycle, the output information moves from each output pipeline stage to the next without waiting for any handshake signal from the next stage back to the previous one.
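The structure of Fig. 7 can be modeled at cycle level as two shift registers with the test memory between them. The Python sketch below is a simplified behavioral model under stated assumptions, not the chip's RTL: the memory access is treated as taking zero cycles, the ("W", addr, data) and ("R", addr) request tuples and all names are hypothetical, and write completions report t_rdata as 0.

```python
from collections import deque
from typing import Any, Deque, Dict, List, Optional, Tuple


class NonBlockingPipeline:
    """N register stages that shift on every t_clk edge with no valid/ready handshake."""

    def __init__(self, n_stages: int) -> None:
        if n_stages < 1:
            raise ValueError("n_stages must be >= 1")
        self.stages: Deque[Optional[Any]] = deque([None] * n_stages, maxlen=n_stages)

    def tick(self, item: Optional[Any]) -> Optional[Any]:
        """One t_clk edge: stage N's content leaves, every stage moves on, item enters stage 1."""
        leaving = self.stages[-1]
        self.stages.appendleft(item)          # maxlen discards the old last-stage entry
        return leaving


def run_data_path(requests: List[tuple], n_stages: int,
                  memory: Dict[int, int]) -> List[Tuple[int, int]]:
    """Drive requests, one per cycle, through N input stages -> test memory -> N output
    stages; return the (t_rdata, t_finish) pair seen at the data interface each cycle."""
    in_pipe, out_pipe = NonBlockingPipeline(n_stages), NonBlockingPipeline(n_stages)
    trace: List[Tuple[int, int]] = []
    for req in list(requests) + [None] * (2 * n_stages):   # idle cycles to drain the pipes
        arrived = in_pipe.tick(req)
        response = None
        if arrived is not None:                            # memory access modeled as 0 cycles
            if arrived[0] == "W":
                memory[arrived[1]] = arrived[2]
                response = (0, 1)
            else:
                response = (memory.get(arrived[1], 0), 1)
        out = out_pipe.tick(response)
        trace.append(out if out is not None else (0, 0))
    return trace
```

With n_stages=2, a write to 0x0 issued on cycle 0 reports t_finish on cycle 4, and a read of 0x0 issued on cycle 2 returns the written value on cycle 6: each request sees a latency of 2N cycles, yet a new request is accepted on every cycle because no stage ever stalls.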

It will be understood by those skilled in the art that the foregoing is only a preferred embodiment of the present invention, and is not intended to limit the invention, and that any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims

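1. An ATE-based GPU core full-speed functional test method, characterized by comprising the following steps:

S1: the ATE places the GPU chip in a GPU core full-speed functional test mode through a mode selection interface;

S2: the ATE configures a clock circuit inside the GPU chip through a control interface to provide the required full-speed test clock for the GPU core, the system bus and the internal test memory;

S3: the ATE loads test data into the internal test memory of the GPU chip through a data interface and a dedicated data path;

S4: the ATE starts the GPU core through the control interface, so that the GPU core reads the test data from the internal test memory of the GPU chip, performs its computation, and writes the test result back to the internal test memory;

S5: the ATE queries the working state of the GPU core through the control interface;

S6: determining whether the working state of the GPU core is idle; if it is idle, proceeding to step S7, and if it is not idle, returning to step S5;

S7: the ATE reads the test result from the internal test memory of the GPU chip through the data interface and the dedicated data path, and compares the test result with a standard result.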

2. The method as claimed in claim 1, wherein in step S2, the full-speed test clock is generated by an internal clock circuit of the GPU chip, the ATE only needs to provide a low-frequency reference clock to the GPU chip, the frequency of the full-speed test clock is the same as the frequency of the GPU chip during normal operation, and the frequency of the full-speed test clock can be configured by the ATE, so as to meet the requirements of full-speed functional test of the GPU core under the operating conditions of normal voltage and overdrive voltage, and also meet the requirements of frequency performance screening of GPU cores of different batches of chips.

3. The method as claimed in claim 2, wherein the PCIe PHY firmware memory and the DDR PHY firmware memory used inside the GPU chip in the normal operating mode are multiplexed as the internal test memory of the GPU chip; specifically, in the GPU core full-speed functional test mode, the PCIe PHY firmware memory and the DDR PHY firmware memory, which occupy discrete memory spaces, are remapped into a continuous memory space of the internal test memory for storing the test data and the test result.

4. The method according to claim 3, wherein the data interface used for loading data into and reading data from the internal test memory of the GPU chip uses a dedicated non-blocking interface protocol that requires no handshake mechanism, the dedicated non-blocking interface protocol comprising the following signals:

(1) t_clk: an input signal of the GPU chip that provides the reference clock for all other signals on the data interface;

(2) t_address: a 32-bit input signal of the GPU chip that can address up to 4 GB of internal test memory space;

(3) t_write: an input signal of the GPU chip that identifies the transfer direction of the test data; a value of 1 indicates that data is written into the internal test memory of the GPU chip;

(4) t_read: an input signal of the GPU chip that identifies the transfer direction of the data; a value of 1 indicates that data is read from the internal test memory of the GPU chip;

(5) t_wdata: a 32-bit input signal of the GPU chip that carries the data written into the internal test memory of the GPU chip;

(6) t_rdata: a 32-bit output signal of the GPU chip that carries the data read from the internal test memory of the GPU chip;

(7) t_finish: an output signal of the GPU chip; a value of 1 indicates that a previously issued write or read operation has completed.

5. The ATE-based GPU core full-speed functional test method of claim 4, wherein a dedicated data path is used when the ATE loads test data into the internal test memory of the GPU chip or reads test results from it, and the dedicated data path has a non-blocking pipeline structure.

6. The ATE-based GPU core full-speed functional test method of claim 5, wherein the non-blocking pipeline structure comprises input pipeline stages and output pipeline stages: in the input direction, the dedicated data path comprises multiple input pipeline stages for buffering the address, control and data signals input by the data interface, including t_address, t_write, t_read and t_wdata; and in the output direction, the dedicated data path comprises multiple output pipeline stages for buffering the data and status signals output to the data interface, including t_rdata and t_finish.

7. The ATE-based GPU core full-speed functional test method of claim 6, wherein no handshake protocol is used between adjacent input pipeline stages, and data flows between them in one direction without blocking, thereby improving data-loading efficiency;

the input pipeline stages buffer the test data input by the ATE over multiple stages, so as to avoid excessive transmission delay within a single clock cycle inside the GPU chip and thereby increase the data-loading frequency.

8. The ATE-based GPU core full-speed functional test method of claim 7, wherein no handshake protocol is used between adjacent output pipeline stages, and data flows between them in one direction without blocking, thereby improving data-reading efficiency;

the output pipeline stages buffer the test results sent to the ATE over multiple stages, so as to avoid excessive transmission delay within a single clock cycle inside the GPU chip and thereby increase the data-reading frequency.

9. The ATE-based method for full-speed functional testing of a GPU core as recited in claim 8, wherein the number and physical location of the input pipeline stages and the output pipeline stages are arranged according to the physical layout of the PCIe PHY firmware memory and the DDR PHY firmware memory within the GPU chip.