CN116302756B - Performance test system and method based on FPGA (field programmable Gate array) accelerator card - Google Patents
- Publication number
- CN116302756B CN202310305022.8A
- Authority
- CN
- China
- Prior art keywords
- data
- performance
- fpga
- test
- card
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/22—Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing
- G06F11/2205—Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing using arrangements specific to the hardware being tested
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/22—Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing
- G06F11/2247—Verification or detection of system hardware configuration
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/22—Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing
- G06F11/2273—Test methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F13/00—Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
- G06F13/38—Information transfer, e.g. on bus
- G06F13/40—Bus structure
- G06F13/4004—Coupling between buses
- G06F13/4022—Coupling between buses using switching circuits, e.g. switching matrix, connection or expansion network
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformation of program code
- G06F8/41—Compilation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2213/00—Indexing scheme relating to interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
- G06F2213/0026—PCI express
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Computer Hardware Design (AREA)
- Quality & Reliability (AREA)
- Software Systems (AREA)
- Mathematical Physics (AREA)
- Debugging And Monitoring (AREA)
Abstract
The application discloses a performance test system and method based on an FPGA (field programmable gate array) accelerator card, and relates to the technical field of computer accelerator cards. The method comprises the following steps: step 1, constructing a test environment based on the OpenCL (Open Computing Language) framework; step 2, measuring the time consumed in transmitting a data packet from the host side to the device side to obtain the basic response time of the test environment; step 3, executing a read operation, acquiring the time at which the host side sends the data and the time at which the data packet is read, and obtaining the read performance of the FPGA accelerator card; step 4, executing a write operation, acquiring the execution time of the DMA engine write operation, and obtaining the write performance of the FPGA accelerator card; step 5, testing the transmission speed and access frequency from the host side to the off-chip memory and from the off-chip memory to the on-chip random access memory, and obtaining the transmission performance of the FPGA accelerator card; step 6, executing an encoding operation on data to obtain the encoding performance of the FPGA accelerator card, and executing a decoding operation to obtain its decoding performance; and step 7, comprehensively evaluating the performance of the FPGA accelerator card.
Description
Technical Field
The application relates to the technical field of computer accelerator cards, in particular to a performance test system and method based on an FPGA accelerator card.
Background
An FPGA (field programmable gate array) is a mainstream hardware platform for algorithm optimization and acceleration. It offers high performance, low power consumption, and high flexibility, together with a short development cycle and low cost. FPGAs can be programmed through the OpenCL framework. OpenCL, short for Open Computing Language, is an open standard for writing programs for heterogeneous parallel computing platforms, and heterogeneous computations can be mapped onto FPGA devices. OpenCL provides an abstract model of the underlying hardware structure, with the aim of offering a generic, open calling interface. A developer can write a general-purpose computing program that runs on a computing device without having to map its algorithms onto the device's own calling interface.
An FPGA accelerator card is obtained through hardware programming. It can provide efficient compression and decompression, video encoding and decoding, encryption and decryption, big data analysis, text search and analysis, machine learning, and algorithm verification.
The FPGA accelerator card implements algorithm acceleration on top of the OpenCL model (framework). The model is task parallel and host controlled: the host side exercises control through a command queue, and each task is itself data parallel. By optimizing the program design so that storage space is allocated sensibly, an OpenCL program can greatly improve its running efficiency.
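The host-controlled, command-queue model described above can be illustrated with a toy sketch. This is not the OpenCL API: `CommandQueue`, `enqueue`, and `finish` are hypothetical names chosen for illustration, and the data-parallel work is executed sequentially on the CPU rather than on an accelerator.

```python
from collections import deque
from typing import Callable, Deque, List, Tuple

# Hypothetical toy model of a host-side command queue (not the OpenCL API).
Kernel = Callable[[int], int]
Command = Tuple[Kernel, List[int]]


class CommandQueue:
    """FIFO of (kernel, data) commands submitted by the host."""

    def __init__(self) -> None:
        self._pending: Deque[Command] = deque()

    def enqueue(self, kernel: Kernel, data: List[int]) -> None:
        # Task parallelism: each enqueued command is an independent task.
        self._pending.append((kernel, data))

    def finish(self) -> List[List[int]]:
        # Drain the queue in submission order and collect each task's output.
        results: List[List[int]] = []
        while self._pending:
            kernel, data = self._pending.popleft()
            # Data parallelism: the same kernel is applied independently to
            # every element (run one after another here, for simplicity).
            results.append([kernel(x) for x in data])
        return results


queue = CommandQueue()
queue.enqueue(lambda x: x * 2, [1, 2, 3])
queue.enqueue(lambda x: x + 10, [4, 5])
outputs = queue.finish()
```

In this sketch the host decides what runs and in which order, while each task only describes a per-element operation, which is the split between control and computation that the paragraph describes.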
Existing methods for testing accelerator card performance mainly cover CPU computing capacity, memory, and network. These tests do not consider how the lower-level physical structure in the FPGA accelerator card architecture affects the card, so their results cannot reflect the real performance of the FPGA accelerator card and their accuracy is low.
Disclosure of Invention
To overcome the defects of the prior art, embodiments of the application provide a performance test system and method based on an FPGA (field programmable gate array) accelerator card. The accuracy of the performance test is improved by building a test environment for the FPGA accelerator card, testing data transmission performance and data read speed, and evaluating the comprehensive performance of the card.
To achieve the above purpose, the application provides the following technical solution: a system comprising a test environment building module, a data response performance test module, a data read-write performance test module, a data transmission performance test module, a data encoding and decoding performance test module, and a comprehensive performance evaluation module. The test environment building module builds an environment suitable for performance testing of the FPGA accelerator card; the test environment comprises a host side and a device side based on the OpenCL framework, the host side runs on a host processor, the device side comprises the FPGA accelerator card, and the host side and the device side exchange data over a bus. The data response performance test module tests the performance of compiling the FPGA accelerator card executable file on the device side, measures the time consumed in transmitting a data packet from the host side to the device side, and calculates the basic response time B. The data read-write performance test module tests the speed at which the execution unit of the FPGA accelerator card reads and writes data. The data transmission performance test module acquires the data transmission speed and data of the FPGA accelerator card. The data encoding and decoding performance test module tests the performance of the FPGA accelerator card in encoding and decoding data: after the device side obtains the binary data and commands, the FPGA accelerator card performs the encoding and decoding operations. The data response performance test, which measures the time consumed in transmitting a data packet from the host side to the device side and calculates the basic response time, comprises the following steps:
Step S01, the host side submits kernel program code to the device side as a command; the kernel program code is compiled into an XO file, and the XO file is compiled with the V++ compiler into a binary file executable by the FPGA accelerator card;
Step S02, recording the time t0 at which the host side sends the kernel code, and recording the time t1 at which the binary file is generated;
Step S03, verifying whether the resulting binary file can be executed by the device side; if so, the test is judged valid, otherwise it is judged invalid;
Step S04, repeating steps S01 to S03 until the number of valid tests reaches the specified number n;
Step S05, for all valid tests, recording the sizes of the compiled binary files as size1, size2, …, sizen, the times at which the host side sent the command as t01, t02, …, t0n, and the generation times of the binary files as t11, t12, …, t1n, and calculating the data response performance from the host side to the device side, which satisfies the formula
In a preferred embodiment, the FPGA accelerator card comprises an interface module, a power conversion module, a clock module, and a storage module. The FPGA accelerator card is connected to the host device through a PCIe bus. The interface module comprises an optical fiber interface, and a lit optical fiber interface indicator shows that a data transmission channel has been established between the FPGA accelerator card and the host device. The power conversion module eliminates interference signals, the clock module provides an independent reference clock, and the storage module comprises an off-chip memory and an on-chip memory, the on-chip memory being a low-latency memory.
In a preferred embodiment, the OpenCL framework comprises an OpenCL protocol layer, a hardware abstraction layer, and a PCIe device driver layer. The OpenCL protocol layer manages the command queues; the hardware abstraction layer provides device access handles to the protocol layer and a device control and access interface to the device driver layer; and the driver layer drives the device side.
In a preferred embodiment, the data read-write performance test module comprises a data read performance test unit and a data write performance test unit, and the data read performance test unit performs the following steps:
Step S11, transmitting a data packet of size `size` to the memory of the device side through the DMA engine on the host side, to complete the DMA read operation on the data packet;
Step S12, acquiring, through the index space, the time t2 at which the host side sends the data and the time t3 at which the FPGA accelerator card reads the data packet;
Step S13, comparing the data received in the FPGA off-chip memory with the data packet on the host side; if they are identical, the test is judged valid, otherwise it is judged invalid;
Step S14, repeating steps S11 to S13 until the number of valid tests reaches the specified number n;
Step S15, for all valid tests, recording the data packet sizes as size1, size2, …, sizen, the times at which the host side sent the data as t21, t22, …, t2n, and the times at which the FPGA accelerator card read the data packets as t31, t32, …, t3n, and calculating the read performance of the FPGA accelerator card, which satisfies the formula
In a preferred embodiment, the data write performance test unit performs the following steps:
Step S21, initializing a data packet of size `size` in the on-chip memory of the FPGA accelerator card, and transmitting the data packet from the virtual machine memory to the device side through the DMA engine in the FPGA accelerator card, to complete the DMA write operation on the data packet;
Step S22, acquiring, through the index space, the execution time of the DMA engine write operation, denoted t4;
Step S23, comparing the contents of the on-chip memory of the FPGA accelerator card with the data written into the FPGA device; if they are identical, the write operation is judged valid, otherwise the test is judged invalid;
Step S24, repeating steps S21 to S23 until the number of valid tests reaches the preset number n;
Step S25, obtaining the data transmission time of each valid test, denoted t1, t2, …, tn, and calculating the write performance of the FPGA accelerator card, which satisfies the formula
In a preferred embodiment, the OpenCL framework has two data transmission channels: one from the host to the off-chip memory, and the other from the off-chip memory to the on-chip random access memory. During computation, host-side data is first copied to the off-chip memory over PCIe; when the kernel program actually needs the data, the off-chip data is copied to the on-chip random access memory. The data transmission performance test comprises the following steps:
Step S31, testing the transmission speed from the host side to the off-chip memory: with the start time of the transfer from the host device to the off-chip memory denoted T1, the end time denoted T2, and the data block size denoted D1, the PCIe transmission speed is V1 = D1/(T2 - T1), which gives the transmission speed V1 from the host side to the off-chip memory;
Step S32, testing the transmission speed from the off-chip memory to the on-chip random access memory: with the start time of the transfer from the off-chip memory to the on-chip memory denoted T3, the end time denoted T4, and the data block size denoted D2, the transmission speed is V2 = D2/(T4 - T3), which gives the transmission speed V2 from the off-chip memory to the on-chip random access memory;
Step S33, obtaining the access frequency Npl of the on-chip memory, calculating the access frequency Wpl of the off-chip memory, and calculating the transmission performance of the FPGA accelerator card, which satisfies Cs = V1 × Npl + V2 × Wpl.
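The calculation in steps S31 to S33 can be sketched as follows. The function names and all numeric inputs are illustrative assumptions; in particular, the access frequencies Npl and Wpl are treated here as plain weights, since the text does not fix their units.

```python
def transfer_speed(block_size: float, t_start: float, t_end: float) -> float:
    """V = D / (T_end - T_start), as in steps S31 and S32."""
    if t_end <= t_start:
        raise ValueError("end time must be later than start time")
    return block_size / (t_end - t_start)


def transmission_performance(v1: float, npl: float, v2: float, wpl: float) -> float:
    """Cs = V1 * Npl + V2 * Wpl, as in step S33."""
    return v1 * npl + v2 * wpl


# Illustrative numbers only (block sizes in MB, times in seconds).
v1 = transfer_speed(block_size=256.0, t_start=0.0, t_end=0.5)     # host -> off-chip
v2 = transfer_speed(block_size=64.0, t_start=1.0, t_end=1.0625)   # off-chip -> on-chip
cs = transmission_performance(v1, npl=0.75, v2=v2, wpl=0.25)
```

With these inputs V1 is 512 MB/s, V2 is 1024 MB/s, and Cs evaluates to 640.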
In a preferred embodiment, the data encoding and decoding performance test module comprises a data encoding performance test unit and a data decoding performance test unit, and the data encoding performance test unit performs the following steps:
Step S41, sending the data to be encoded and an encoding command from the host side; the FPGA accelerator card invokes an encoding algorithm to execute the encoding task on the data and complete the encoding operation, the size of the data to be encoded being `size`;
Step S42, obtaining the data encoding time of each valid test, denoted t1, t2, …, tn; obtaining the read and write operations on the off-chip memory and the on-chip memory from the index space; calculating the proportion of accesses that go to the on-chip memory; and obtaining the frequency of on-chip memory accesses in each encoding operation, denoted Npl1, Npl2, …, Npln;
Step S43, reading the encoded data and comparing the result with the data to be encoded sent by the host side; if they are identical, the encoding test operation is judged invalid, otherwise it is judged valid;
Step S44, repeating steps S41 to S43 until the number of valid tests reaches the preset number n;
Step S45, calculating the encoding performance of the FPGA accelerator card, which satisfies the formula
In a preferred embodiment, the data decoding performance test unit performs the following steps:
Step S51, sending the data to be decoded and a decoding command from the host side; the FPGA accelerator card invokes a decoding algorithm to perform the decoding operation;
Step S52, acquiring, through the index space, the execution time of the data decoding operation, denoted t; obtaining the read and write operations on the off-chip memory and the on-chip memory from the index space; calculating the proportion of accesses that go to the on-chip memory; and obtaining the frequency of on-chip memory accesses in each decoding operation, denoted Wpl1, Wpl2, …, Wpln;
Step S53, comparing the decoding result of the FPGA accelerator card with the data to be encoded sent by the host; if they are identical, the decoding test operation is judged valid, otherwise the test is judged invalid;
Step S54, repeating steps S51 to S53 n times, and recording the number of valid tests n1;
Step S55, calculating the decoding performance of the FPGA accelerator card, which satisfies the formula, where k1 and k2 are influence factor constants.
In a preferred embodiment, the comprehensive performance evaluation module comprehensively evaluates the performance of the FPGA accelerator card on the basis of its read-write performance, data transmission performance, and encoding and decoding performance, and the evaluation satisfies the formula
To achieve the above purpose, the application further provides the following technical solution: a performance test method based on an FPGA accelerator card, comprising the following steps:
Step S001, constructing a test environment based on the OpenCL framework, and connecting the FPGA accelerator card to the host program through an optical fiber interface;
Step S002, measuring the time consumed in transmitting a data packet from the host side to the device side to obtain the basic response time of the test environment;
Step S003, transmitting a data packet to the memory of the device side through the DMA engine on the host side to complete the read operation, acquiring the time at which the host side sends the data and the time at which the data packet is read, and obtaining the read performance of the FPGA accelerator card;
Step S004, initializing a data packet in the on-chip memory of the FPGA accelerator card, completing the DMA write operation on the data packet, acquiring the execution time of the DMA engine write operation, and obtaining the write performance of the FPGA accelerator card;
Step S005, testing the transmission speed and access frequency from the host side to the off-chip memory, testing the transmission speed and access frequency from the off-chip memory to the on-chip random access memory, and obtaining the transmission performance of the FPGA accelerator card;
Step S006, executing an encoding operation on data to obtain the encoding performance of the FPGA accelerator card, and executing a decoding operation to obtain its decoding performance;
Step S007, comprehensively evaluating the performance of the FPGA accelerator card on the basis of its read-write performance, data transmission performance, and encoding and decoding performance.
The technical effects and advantages of the application are as follows:
The test environment for the FPGA accelerator card is built on the OpenCL framework, which is convenient and easy to implement. The data response performance test module obtains the response time attributable to the test environment. The data read-write performance test obtains the read and write performance of the FPGA accelerator card. The data transmission performance test module measures the transmission speed from the host side to the off-chip memory and from the off-chip memory to the on-chip random access memory, obtains the access frequency of the on-chip memory, and calculates the access frequency of the off-chip memory, so that the transmission performance of the FPGA accelerator card is obtained accurately. The data encoding and decoding performance test module obtains the performance of the FPGA in invoking the encoding and decoding algorithms. Finally, the performance of the FPGA accelerator card is evaluated on the basis of its read-write performance, data transmission performance, and encoding and decoding performance, which solves the problem that conventional performance test results for FPGA accelerator cards cannot reflect the real performance of the card and are of low accuracy.
Drawings
Fig. 1 is a block diagram of the system architecture of the present application.
Fig. 2 is a flow chart of the data response performance test of the present application.
Fig. 3 is a flow chart of the data read performance test of the present application.
Fig. 4 is a flow chart of the method of the present application.
Detailed Description
The embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. The described embodiments are evidently only some, not all, of the embodiments of the application. All other embodiments obtained by those skilled in the art on the basis of these embodiments without inventive effort fall within the scope of protection of the application.
The terms "module," "system," and the like as used herein are intended to include a computer-related entity, either hardware, firmware, a combination of hardware and software, or software in execution. For example, a module may be, but is not limited to: a process running on a processor, an object, an executable, a thread of execution, a program, and/or a computer. By way of example, both an application running on a computing device and the computing device can be a module. One or more modules may be located in one process and/or thread of execution, and one module may be located on one computer and/or distributed between two or more computers.
Example 1
This embodiment provides a performance test system based on an FPGA (field programmable gate array) accelerator card, as shown in Fig. 1. The system comprises a test environment building module, a data response performance test module, a data read-write performance test module, a data transmission performance test module, a data encoding and decoding performance test module, and a comprehensive performance evaluation module. The test environment building module builds an environment suitable for performance testing of the FPGA accelerator card; the test environment comprises a host side and a device side based on the OpenCL framework, the host side runs on a host processor, the device side comprises the FPGA accelerator card, and the host side and the device side exchange data over a bus. The data response performance test module tests the performance of compiling the FPGA accelerator card executable file on the device side, measures the time consumed in transmitting a data packet from the host side to the device side, and calculates the basic response time B. The data read-write performance test module tests the speed at which the execution unit of the FPGA accelerator card reads and writes data. The data transmission performance test module acquires the data transmission speed and data of the FPGA accelerator card. The data encoding and decoding performance test module tests the performance of the FPGA accelerator card in encoding and decoding data: after the device side obtains the binary data and commands, the FPGA accelerator card performs the encoding and decoding operations.
As shown in Fig. 2, the data response performance test, which measures the time consumed in transmitting a data packet from the host side to the device side and calculates the basic response time, comprises the following steps:
Step S01, the host side submits kernel program code to the device side as a command; the kernel program code is compiled into an XO file, and the XO file is compiled with the V++ compiler into a binary file executable by the FPGA accelerator card;
Step S02, recording the time t0 at which the host side sends the kernel code, and recording the time t1 at which the binary file is generated;
Step S03, verifying whether the resulting binary file can be executed by the device side; if so, the test is judged valid, otherwise it is judged invalid;
Step S04, repeating steps S01 to S03 until the number of valid tests reaches the specified number n;
Step S05, for all valid tests, recording the sizes of the compiled binary files as size1, size2, …, sizen, the times at which the host side sent the command as t01, t02, …, t0n, and the generation times of the binary files as t11, t12, …, t1n, and calculating the data response performance from the host side to the device side, which satisfies the formula
Further, the FPGA accelerator card comprises an interface module, a power conversion module, a clock module, and a storage module. The FPGA accelerator card is connected to the host device through a PCIe bus. The interface module comprises an optical fiber interface, and a lit optical fiber interface indicator shows that a data transmission channel has been established between the FPGA accelerator card and the host device. The power conversion module eliminates interference signals, the clock module provides an independent reference clock, and the storage module comprises an off-chip memory and an on-chip memory, the on-chip memory being a low-latency memory.
Further, the OpenCL framework comprises an OpenCL protocol layer, a hardware abstraction layer, and a PCIe device driver layer. The OpenCL protocol layer manages the command queues; the hardware abstraction layer provides device access handles to the protocol layer and a device control and access interface to the device driver layer; and the driver layer drives the device side.
As shown in Fig. 3, the data read-write performance test module comprises a data read performance test unit and a data write performance test unit, and the data read performance test unit performs the following steps:
Step S11, transmitting a data packet of size `size` to the memory of the device side through the DMA engine on the host side, to complete the DMA read operation on the data packet;
Step S12, acquiring, through the index space, the time t2 at which the host side sends the data and the time t3 at which the FPGA accelerator card reads the data packet;
Step S13, comparing the data received in the FPGA off-chip memory with the data packet on the host side; if they are identical, the test is judged valid, otherwise it is judged invalid;
Step S14, repeating steps S11 to S13 until the number of valid tests reaches the specified number n;
Step S15, for all valid tests, recording the data packet sizes as size1, size2, …, sizen, the times at which the host side sent the data as t21, t22, …, t2n, and the times at which the FPGA accelerator card read the data packets as t31, t32, …, t3n, and calculating the read performance of the FPGA accelerator card, which satisfies the formula
Further, the data writing performance test unit includes the following steps:
step S21, initializing a data packet, whose size is denoted size, in the on-chip memory of the FPGA accelerator card, and transmitting the data packet from the virtual machine memory to the device end through the DMA engine in the FPGA accelerator card, so as to complete a DMA write operation on the data packet;
step S22, acquiring, through the index space, the execution time of the DMA engine write operation, denoted t4;
step S23, comparing the data in the on-chip memory of the FPGA accelerator card with the data written into the FPGA device; if they are identical, the write operation is judged valid, otherwise the test is judged invalid;
step S24, repeating steps S21 to S23 until the number of valid tests reaches the preset number of tests n;
step S25, obtaining the data transmission time of each valid test, denoted t1, t2, …, tn, so as to calculate that the write performance of the FPGA accelerator card satisfies the formula
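The pattern shared by steps S11 to S15 and S21 to S25 (transfer, compare, repeat until n valid tests, then aggregate) can be sketched in Python as follows. This is only an illustrative sketch: the `transfer` callable is a hypothetical stand-in for the DMA engine, host-side wall-clock timing replaces the timestamps read from the index space, `max_attempts` is an added safety limit, and `mean_throughput` (total bytes over total time) is an assumed metric rather than the formula of the application.

```python
import time
from typing import Callable, List


def run_valid_trials(transfer: Callable[[bytes], bytes], payload: bytes,
                     n: int, max_attempts: int = 1000) -> List[float]:
    """Repeat a transfer until n trials pass the data comparison.

    Returns the durations (in seconds) of the valid trials only.
    `transfer` is a hypothetical stand-in for the DMA read or write path.
    """
    durations: List[float] = []
    attempts = 0
    while len(durations) < n and attempts < max_attempts:
        attempts += 1
        start = time.perf_counter()
        received = transfer(payload)
        elapsed = time.perf_counter() - start
        # A trial counts only if the received data matches what was sent.
        if received == payload:
            durations.append(elapsed)
    if len(durations) < n:
        raise RuntimeError("too few valid trials within max_attempts")
    return durations


def mean_throughput(size_bytes: int, durations: List[float]) -> float:
    """Assumed metric: total bytes moved divided by total transfer time."""
    total = sum(durations)
    if total <= 0:
        raise ValueError("total duration must be positive")
    return size_bytes * len(durations) / total
```

For example, `mean_throughput(1000, [0.5, 0.5])` evaluates to 2000.0 bytes per second. Invalid trials are discarded rather than averaged in, which mirrors the rule that only valid tests count toward n.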
Furthermore, in the data transmission performance test, data in the OpenCL framework travels over two transmission channels: from the host to the off-chip memory, and from the off-chip memory to the on-chip random access memory. When data is to be processed, the host-end data is first copied to the off-chip memory over PCIe; when the kernel program actually needs the data, it is copied from the off-chip memory to the on-chip random access memory. The data transmission performance test includes the following steps:
step S31, testing the transmission speed from the host end to the off-chip memory: the start time of data transmission from the host end to the off-chip memory is denoted T1, the end time T2, and the data block size D1, so that the PCIe transmission speed from the host end to the off-chip memory is V1 = D1/(T2 - T1);
step S32, testing the transmission speed from the off-chip memory to the on-chip random access memory: the start time of data transmission from the off-chip memory to the on-chip memory is denoted T3, the end time T4, and the data block size D2, so that the transmission speed from the off-chip memory to the on-chip random access memory is V2 = D2/(T4 - T3);
step S33, obtaining the access frequency Npl of the on-chip memory and calculating the access frequency Wpl of the off-chip memory, the transmission performance of the FPGA accelerator card satisfying CS = V1 × Npl + V2 × Wpl.
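As a worked illustration of steps S31 to S33, the following Python sketch computes V1, V2 and CS from sample measurements. The function names and all numeric values are hypothetical, and the units of Npl and Wpl are not specified in the text, so CS is reported without a unit.

```python
def transfer_speed(block_size: float, t_start: float, t_end: float) -> float:
    """Speed of one transfer: V = D / (T_end - T_start)."""
    if t_end <= t_start:
        raise ValueError("end time must be later than start time")
    return block_size / (t_end - t_start)


def transmission_performance(v1: float, v2: float,
                             npl: float, wpl: float) -> float:
    """CS = V1 * Npl + V2 * Wpl, as stated in step S33."""
    return v1 * npl + v2 * wpl


# Hypothetical measurements: 1024-byte blocks on both channels.
v1 = transfer_speed(1024, 0.0, 2.0)   # host end -> off-chip memory
v2 = transfer_speed(1024, 0.0, 0.5)   # off-chip -> on-chip memory
cs = transmission_performance(v1, v2, npl=3, wpl=2)
print(v1, v2, cs)  # 512.0 2048.0 5632.0
```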
Further, the data coding and decoding performance test module comprises a data coding performance test unit and a data decoding performance test unit, and the data coding performance test unit comprises the following steps:
step S41, data to be encoded and an encoding command are sent from a host end, and an FPGA acceleration card invokes an encoding algorithm to execute an encoding task on the data so as to finish the encoding operation of the data, wherein the size of the data to be encoded is size;
step S42, obtaining the data encoding time of each valid test, denoted t1, t2, …, tn; obtaining the read and write operations on the off-chip memory and the on-chip memory from the index space; calculating the proportion of accesses that go to the on-chip memory; and obtaining the frequency of on-chip memory accesses in each encoding operation, denoted Npl1, Npl2, …, Npln;
step S43, performing read operation on the decoded data, comparing the read result with data to be encoded sent by a host end, if the read result is the same as the data to be encoded, judging that the encoding test operation is invalid, otherwise, judging that the encoding test operation is valid;
step S44, repeatedly executing the steps S41-S43 until the effective test times reach the preset test times n;
step S45, calculating the coding performance formula of the FPGA acceleration card to meet the requirement of
Further, the data decoding performance test unit includes the following steps:
step S51, data to be decoded and a decoding command are sent from a host end, and a decoding algorithm is called through an FPGA acceleration card to perform decoding operation;
step S52, acquiring, through the index space, the execution time of the data decoding operation, denoted t; obtaining the read and write operations on the off-chip memory and the on-chip memory from the index space; calculating the proportion of accesses that go to the on-chip memory; and obtaining the frequency of off-chip memory accesses in each decoding operation, denoted Wpl1, Wpl2, …, Wpln;
step S53, comparing the decoding result of the FPGA accelerator card with the data to be encoded sent by the host, if the decoding result is the same as the data to be encoded, judging that the decoding test operation is effective, otherwise, judging that the test is ineffective;
step S54, repeatedly executing the steps S51 to S53 n times, and recording the effective test times n1;
step S55, calculating the decoding performance of the FPGA acceleration card, wherein the decoding performance formula of the FPGA acceleration card satisfies the following requirement, where k1 and k2 are influence factor constants.
Further, the encoding process of the FPGA acceleration card according to the embodiment of the present application includes the following steps:
step 1, receiving, at the FPGA acceleration card, 16-bit parallel data sent by the host end; the FPGA determines the control words according to a lookup table sequence, the control words then generate pulses, the data is compressed using an FSAT protocol, and the FPGA detects whether encoding has started and, if so, generates an encoding synchronization head;
step 2, at the same time, generating an odd parity bit from the incoming data;
step 3, after the synchronization head and the odd parity bit have been generated, concatenating them with the data to produce the encoded data, and verifying the validity of the encoding operation;
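The framing described in steps 1 to 3 (synchronization head, 16 data bits, odd parity) can be illustrated with a short Python sketch. The text does not give the synchronization-head bit patterns, the bit order, or the position of the parity bit, so the 3-bit patterns `DATA_SYNC` and `COMMAND_SYNC`, the MSB-first order, and the trailing parity bit are all assumptions made for illustration; the lookup-table and compression stages are omitted.

```python
from typing import List

# Hypothetical sync-head patterns; the text does not specify them.
DATA_SYNC = [0, 0, 1]
COMMAND_SYNC = [1, 1, 0]


def odd_parity_bit(word: int) -> int:
    """Return the bit that makes the count of 1s (data + parity) odd."""
    ones = bin(word & 0xFFFF).count("1")
    return 0 if ones % 2 == 1 else 1


def encode_word(word: int, is_command: bool = False) -> List[int]:
    """Build one frame: sync head + 16 data bits (MSB first) + odd parity."""
    head = COMMAND_SYNC if is_command else DATA_SYNC
    data_bits = [(word >> i) & 1 for i in range(15, -1, -1)]
    return head + data_bits + [odd_parity_bit(word)]
```

With these assumptions, `encode_word(0xA5A5)` yields a 20-bit frame whose data and parity bits together contain an odd number of ones, which is the property a receiver would check.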
Further, the decoding process of the FPGA acceleration card in the embodiment of the present application includes the following steps:
step 1, detecting the synchronization head, that is, judging whether the input is data or a command. The inputs comprise the decoding clock, a reset signal, and the input data. Once the FPGA starts receiving, it samples the input data continuously under the decoding clock and shifts the data into a shift register. When the input data shows a transition, detection of the synchronization head begins: the data in the shift register is read to determine the synchronization mode, and the synchronization type is output. When the shift register shows that the synchronization-head transition is a falling edge, the signal format determines the synchronization head type to be command synchronization;
step 2, decoding the input encoded data: once the synchronization head is confirmed valid, a counter starts and the data is sampled; the shift register collects the sampled data to complete decoding, and when the counter cnt reaches the check bit, the last sampled bit is used to complete the parity check;
step 3, converting the serial data into parallel data for output, wherein the outputs comprise the decoded signal, a decoding-complete enable, the command synchronization output, the data synchronization output, the check bit output, and the shift register.
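A software analogue of these three steps is sketched below in Python. It assumes a 20-bit frame made of a 3-bit synchronization head, 16 data bits sent MSB first, and one trailing odd parity bit; the patterns `COMMAND_SYNC` (ending in a falling edge) and `DATA_SYNC` are hypothetical, since the text gives no bit-level format. Clocking, reset, and edge detection on a live signal are not modelled.

```python
from typing import List, Tuple

# Hypothetical sync-head patterns; the text does not specify them.
COMMAND_SYNC = [1, 1, 0]  # ends with a falling edge
DATA_SYNC = [0, 0, 1]     # ends with a rising edge


def decode_frame(bits: List[int]) -> Tuple[str, int, bool]:
    """Decode one frame into (sync type, 16-bit word, parity_ok)."""
    if len(bits) != 20:
        raise ValueError("expected a 20-bit frame")
    head = bits[:3]
    if head == COMMAND_SYNC:
        kind = "command"
    elif head == DATA_SYNC:
        kind = "data"
    else:
        raise ValueError("no valid synchronization head")
    # Serial-to-parallel conversion: shift each data bit into the word.
    word = 0
    for bit in bits[3:19]:
        word = (word << 1) | bit
    # Odd parity: data bits plus the check bit must contain an odd number of 1s.
    parity_ok = sum(bits[3:20]) % 2 == 1
    return kind, word, parity_ok
```

The loop plays the role of the shift register, and the final bit of the frame is consumed only by the parity check, as in step 2.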
Furthermore, the comprehensive performance evaluation module comprehensively evaluates the performance of the FPGA acceleration card based on the read-write performance, the data transmission performance and the encoding and decoding performance of the FPGA acceleration card, and satisfies the formula
As shown in fig. 4, the embodiment of the application provides the following technical solution: a performance test method based on an FPGA acceleration card, including the following steps:
step S001, building a performance test environment: setting up a test environment based on an OpenCL framework, and enabling the FPGA acceleration card to be connected with a host program through an optical fiber interface;
step S002, data response performance test: the time consumed by transmitting the data packet from the host end to the equipment end is acquired as the basic response time B of the test environment;
step S003, testing data reading performance: transmitting a data packet to a memory of an equipment end through a DMA engine at a host end to finish DMA reading operation of the data packet, acquiring data time sent by the host end and time for reading the data packet through an index space, and acquiring reading performance D of an FPGA acceleration card;
step S004, testing data writing performance: initializing a data packet in an on-chip memory of an FPGA acceleration card, transmitting the data packet from a virtual machine memory to an equipment end through a DMA engine in the FPGA acceleration card so as to finish DMA writing operation of the data packet, and acquiring the execution time of the DMA engine writing operation through an index space so as to acquire the writing performance X of the FPGA acceleration card;
step S005, data transmission performance test: testing the transmission speed and access frequency from the host end to the off-chip memory and from the off-chip memory to the on-chip random access memory, and obtaining the transmission performance of the FPGA acceleration card;
step S006, data encoding and decoding performance test: the method comprises the steps that data to be encoded and an encoding command are sent from a host end, an FPGA acceleration card invokes an encoding algorithm to execute an encoding task on the data, the encoding performance BM of the FPGA acceleration card is obtained, the data to be decoded and a decoding command are sent from the host end, a decoding operation is executed through the FPGA acceleration card invoking a decoding algorithm, and the decoding performance JM of the FPGA acceleration card is obtained;
step S007, comprehensive performance evaluation: and comprehensively evaluating the performance of the FPGA acceleration card based on the read-write performance, the data transmission performance and the encoding and decoding performance of the FPGA acceleration card.
The present embodiment provides only one implementation and does not specifically limit the protection scope of the present application.
Finally: the foregoing description of the preferred embodiments of the application is not intended to limit the application to the precise form disclosed, and any such modifications, equivalents, and alternatives falling within the spirit and principles of the application are intended to be included within the scope of the application.
Claims (4)
1. A performance test system based on an FPGA acceleration card, characterized in that: the system comprises a test environment building module, a data response performance test module, a data read-write performance test module, a data transmission performance test module, a data encoding and decoding performance test module, and a comprehensive performance evaluation module; the test environment building module is used for building an environment suitable for performance testing of an FPGA (field programmable gate array) accelerator card, the test environment comprising a host end and a device end based on an OpenCL (Open Computing Language) framework, wherein the host end runs on a host processor, the device end comprises the FPGA accelerator card, and the host end and the device end exchange data over a bus; the data response performance test module is used for testing the performance of compiling the FPGA acceleration card executable file at the device end, testing the time consumed in transmitting a data packet from the host end to the device end, and calculating the basic response time B; the data read-write performance test module is used for testing the read speed and the write speed of the execution unit of the FPGA acceleration card; the data transmission performance test module is used for acquiring the data transmission speed and data of the FPGA accelerator card; the data encoding and decoding performance test module is used for testing the data encoding and decoding performance of the FPGA acceleration card, the FPGA acceleration card performing the data encoding and decoding operations after the device end obtains the binary data and commands; the data response performance test module is used for testing the data transmission performance of the FPGA accelerator card, testing the time consumed in transmitting a data packet from the host end to the device end, and calculating the basic response time, and comprises the following steps:
step S01, the host end submitting kernel program code to the device end as a command, compiling the kernel program code into an XO file, and compiling the XO file into a binary file executable by the FPGA acceleration card using the V++ compiler;
step S02, recording time t0 of sending kernel codes by a host end, and recording generation time t1 of binary files;
step S03, verifying whether the obtained binary file can be executed by the equipment end, if so, judging that the test is valid, otherwise, judging that the test is invalid;
step S04, repeatedly executing the steps S01-S03 until the effective test times reach the appointed test times n;
step S05, obtaining the sizes of the compiled binary files of all valid tests, recording the binary file sizes as size1, size2, …, sizen, the times at which the host end sends the command as t01, t02, …, t0n, and the generation times of the binary files as t11, t12, …, t1n, and calculating the data response performance from the host end to the device end, which satisfies
The data read-write performance test module comprises a data read-performance test unit and a data write-performance test unit, wherein the data read-performance test unit comprises the following steps:
step S11, transmitting the data packet with the size of size to the internal memory of the equipment end through a DMA engine at the host end so as to finish DMA read operation of the data packet;
step S12, acquiring data time t2 sent by a host end through an index space, and acquiring time t3 for reading a data packet by an FPGA acceleration card;
s13, comparing the data received in the FPGA off-chip memory with the data packet of the host end, if the data are the same, judging that the test is effective, otherwise, judging that the test is ineffective;
step S14, repeatedly executing the steps S11 to S13 until the effective test times reach the appointed test times n;
step S15, obtaining the data transmission times of all valid tests, recording the data packet sizes as size1, size2, …, sizen, the host-end send times as t21, t22, …, t2n, and the times at which the FPGA accelerator card reads the data packets as t31, t32, …, t3n, and calculating that the read performance of the FPGA acceleration card satisfies the formula
The data write performance test unit includes the steps of:
step S21, initializing a data packet, whose size is denoted size, in the on-chip memory of the FPGA accelerator card, and transmitting the data packet from the virtual machine memory to the device end through the DMA engine in the FPGA accelerator card, so as to complete a DMA write operation on the data packet;
step S22, acquiring the execution time of the DMA engine write operation through an index space, and marking as t4;
s23, comparing an on-chip memory of the FPGA accelerator card with data written into the FPGA device, if the on-chip memory of the FPGA accelerator card is the same as the data written into the FPGA device, judging that the writing operation is effective, otherwise, judging that the test is ineffective;
step S24, repeatedly executing the steps S21 to S23 until the effective test times reach the preset test times n;
step S25, obtaining the data transmission time of each valid test, denoted t1, t2, …, tn, and calculating that the write performance of the FPGA acceleration card satisfies the formula
The data transmission performance test comprises the following steps:
step S31, testing the transmission speed from the host end to the off-chip memory: the start time of data transmission from the host end to the off-chip memory is denoted T1, the end time T2, and the data block size D1, so that the PCIe transmission speed from the host end to the off-chip memory is V1 = D1/(T2 - T1);
step S32, testing the transmission speed from the off-chip memory to the on-chip random access memory: the start time of data transmission from the off-chip memory to the on-chip memory is denoted T3, the end time T4, and the data block size D2, so that the transmission speed from the off-chip memory to the on-chip random access memory is V2 = D2/(T4 - T3);
step S33, obtaining the access frequency Npl of the on-chip memory and calculating the access frequency Wpl of the off-chip memory, the transmission performance of the FPGA accelerator card satisfying CS = V1 × Npl + V2 × Wpl;
the data coding and decoding performance test module comprises a data coding performance test unit and a data decoding performance test unit, wherein the data coding performance test unit comprises the following steps:
step S41, data to be encoded and an encoding command are sent from a host end, and an FPGA acceleration card invokes an encoding algorithm to execute an encoding task on the data so as to finish the encoding operation of the data, wherein the size of the data to be encoded is size;
step S42, obtaining the data encoding time of each valid test, denoted t1, t2, …, tn; obtaining the read and write operations on the off-chip memory and the on-chip memory from the index space; calculating the proportion of accesses that go to the on-chip memory; and obtaining the frequency of on-chip memory accesses in each encoding operation, denoted Npl1, Npl2, …, Npln;
step S43, performing read operation on the decoded data, comparing the read result with data to be encoded sent by a host end, if the read result is the same as the data to be encoded, judging that the encoding test operation is invalid, otherwise, judging that the encoding test operation is valid;
step S44, repeatedly executing the steps S41-S43 until the effective test times reach the preset test times n;
step S45, calculating the coding performance formula of the FPGA acceleration card to meet the requirement of
The data decoding performance test unit includes the steps of:
step S51, data to be decoded and a decoding command are sent from a host end, and a decoding algorithm is called through an FPGA acceleration card to perform decoding operation;
step S52, acquiring, through the index space, the execution time of the data decoding operation, denoted t; obtaining the read and write operations on the off-chip memory and the on-chip memory from the index space; calculating the proportion of accesses that go to the on-chip memory; and obtaining the frequency of off-chip memory accesses in each decoding operation, denoted Wpl1, Wpl2, …, Wpln;
step S53, comparing the decoding result of the FPGA accelerator card with data to be decoded sent by a host end, if the decoding result is the same as the data to be decoded, judging that the decoding test operation is effective at the time, otherwise, judging that the test is ineffective at the time;
step S54, repeatedly executing steps S51-S53 n times, and recording the number of valid tests n1;
step S55, calculating the decoding performance of the FPGA acceleration card, wherein the decoding performance formula of the FPGA acceleration card satisfies the following requirement, wherein k1 and k2 are influence factor constants;
the comprehensive performance evaluation module comprehensively evaluates the performance of the FPGA acceleration card based on the read-write performance, the data transmission performance and the encoding and decoding performance of the FPGA acceleration card.
2. The performance test system based on the FPGA accelerator card according to claim 1, wherein: the FPGA acceleration card comprises an interface module, a power conversion module, a clock module and a storage module, wherein the FPGA acceleration card is connected with a host equipment end through a PCI-e bus, the interface module comprises an optical fiber interface, and an optical fiber interface indicator lamp is lighted to indicate that the FPGA acceleration card and the host equipment establish a data transmission channel; the power conversion module is used for eliminating interference signals, the clock module is used for providing independent reference clocks, the storage module comprises an off-chip memory and an on-chip memory, and the on-chip memory is a low-delay memory.
3. The performance test system based on the FPGA accelerator card according to claim 1, wherein: the OpenCL framework comprises an OpenCL protocol layer, a hardware abstraction layer and a PCIe device driving layer, wherein the OpenCL protocol layer is used for realizing management command queues, the hardware abstraction layer provides access device handles for the protocol layer and provides device control access interfaces for the device driving layer, and the driving layer is used for driving a device end.
4. A performance test method based on an FPGA accelerator card according to any one of claims 1-3, characterized by comprising the following steps:
step S001, building a performance test environment: setting up a testing environment based on an OpenCL framework, enabling the FPGA acceleration card to be connected with a host program through an optical fiber interface, and indicating that the FPGA acceleration card and the host establish a data transmission channel when an optical fiber interface indicator light is lighted;
step S002, data response performance test: the time consumed by transmitting the data packet from the host end to the equipment end is acquired as the basic response time B of the test environment;
step S003, testing data reading performance: transmitting a data packet to a memory of an equipment end through a DMA engine at a host end to finish DMA reading operation of the data packet, acquiring data time sent by the host end and time for reading the data packet through an index space, and acquiring reading performance D of an FPGA acceleration card;
step S004, testing data writing performance: initializing a data packet with size in an on-chip memory of an FPGA acceleration card, transmitting the data packet from a virtual machine memory to an equipment end through a DMA engine in the FPGA acceleration card so as to finish DMA writing operation on the data packet, and acquiring the execution time of the DMA engine writing operation through an index space so as to acquire the writing performance X of the FPGA acceleration card;
step S005, data transmission performance test: testing the transmission speed V1 and the access frequency Npl from the host end to the off-chip memory, testing the transmission speed V2 and the access frequency Wpl from the off-chip memory to the on-chip random access memory, and obtaining the transmission performance CS of the FPGA accelerator card;
step S006, data encoding and decoding performance test: the method comprises the steps that data to be encoded and an encoding command are sent from a host end, an FPGA acceleration card invokes an encoding algorithm to execute encoding tasks on the data so as to complete encoding operation on the data, the encoding performance BM of the FPGA acceleration card is obtained, the data to be decoded and a decoding command are sent from the host end, and a decoding algorithm is invoked by the FPGA acceleration card to perform decoding operation, so that the decoding performance JM of the FPGA acceleration card is obtained;
step S007, comprehensive performance evaluation: and comprehensively evaluating the performance of the FPGA acceleration card based on the read-write performance, the data transmission performance and the encoding and decoding performance of the FPGA acceleration card.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310305022.8A CN116302756B (en) | 2023-03-22 | 2023-03-22 | Performance test system and method based on FPGA (field programmable Gate array) accelerator card |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116302756A CN116302756A (en) | 2023-06-23 |
CN116302756B true CN116302756B (en) | 2023-10-31 |
Family
ID=86820405
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310305022.8A Active CN116302756B (en) | 2023-03-22 | 2023-03-22 | Performance test system and method based on FPGA (field programmable Gate array) accelerator card |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116302756B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2004077211A2 (en) * | 2003-02-28 | 2004-09-10 | Tilmon Systems Ltd. | Method and apparatus for increasing file server performance by offloading data path processing |
CN109739712A (en) * | 2019-01-08 | 2019-05-10 | 郑州云海信息技术有限公司 | FPGA accelerator card transmission performance test method, device and equipment and medium |
CN109885438A (en) * | 2019-02-27 | 2019-06-14 | 苏州浪潮智能科技有限公司 | A kind of FPGA method for testing reliability, system, terminal and storage medium |
CN113886162A (en) * | 2021-10-21 | 2022-01-04 | 统信软件技术有限公司 | Computing equipment performance test method, computing equipment and storage medium |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9952276B2 (en) * | 2013-02-21 | 2018-04-24 | Advantest Corporation | Tester with mixed protocol engine in a FPGA block |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |