CN117271402A - FPGA-based low-latency PCIe DMA data transmission method - Google Patents

Publication number
CN117271402A
Authority
CN
China
Prior art keywords
data
fpga
pcie dma
transmission
pcie
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202311558032.9A
Other languages
Chinese (zh)
Other versions
CN117271402B (en
Inventor
鲁翔
乔晓冬
魏育成
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ehiway Microelectronic Science And Technology Suzhou Co ltd
Original Assignee
Ehiway Microelectronic Science And Technology Suzhou Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ehiway Microelectronic Science And Technology Suzhou Co ltd filed Critical Ehiway Microelectronic Science And Technology Suzhou Co ltd
Priority to CN202311558032.9A priority Critical patent/CN117271402B/en
Publication of CN117271402A publication Critical patent/CN117271402A/en
Application granted granted Critical
Publication of CN117271402B publication Critical patent/CN117271402B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/14 Handling requests for interconnection or transfer
    • G06F13/20 Handling requests for interconnection or transfer for access to input/output bus
    • G06F13/28 Handling requests for interconnection or transfer for access to input/output bus using burst mode transfer, e.g. direct memory access DMA, cycle steal
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0877 Cache access modes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/38 Information transfer, e.g. on bus
    • G06F13/40 Bus structure
    • G06F13/4004 Coupling between buses
    • G06F13/4022 Coupling between buses using switching circuits, e.g. switching matrix, connection or expansion network
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2213/00 Indexing scheme relating to interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F2213/0026 PCI express
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Computer Hardware Design (AREA)
  • Communication Control (AREA)

Abstract

The invention provides an FPGA-based low-latency PCIe DMA data transmission method. The method comprises a data transmission initialization step, which includes applying for large page (huge page) memory, after which a host-side driver configures the head address of the large page memory to the FPGA; and a data transmission step, which includes starting the PCIe DMA transfer operation, forming transmission data, and reading data. Forming the transmission data includes parsing the market data, the FPGA prepending a sequence number to each parsed market data result, and the FPGA carrying the parsed results into the large page memory for storage. In the data reading step, software locates and reads the market data in the large page memory according to the sequence numbers. The beneficial effects are that the number of interactions between the host and the FPGA is reduced: the FPGA is responsible for processing and carrying data, the host is responsible for reading data, and the two do not interfere with each other, so transmission latency is greatly reduced and transmission efficiency is improved.

Description

FPGA-based low-latency PCIe DMA data transmission method
Technical Field
The invention belongs to the field of financial technology, and particularly relates to an FPGA-based low-latency PCIe DMA data transmission method.
Background
With the rapid development of society and technology, the securities trading industry is growing at an unprecedented pace. Trading volume and transaction value increase year after year, so the scale of market data is growing explosively. Meanwhile, with the rise of intelligent trading modes such as programmed trading and strategy trading, the demand for real-time, accurate market data has become more urgent.
To meet the growing trading volume and market data scale, the financial technology field continues to improve market data distribution technology. High-performance data transmission channels have been developed to meet real-time requirements. By employing low-latency communication protocols and high-speed network devices, market data can flow between systems at very high rates.
The real-time performance of market data distribution matters more than ever in modern financial trading: both high-frequency trading and event-driven trading depend on immediate market data updates. Against this background, financial institutions adopt advanced data transmission technology and efficient data decoding to ensure the timeliness, accuracy and stability of market data. Several measures are taken to achieve low-latency transmission. First, at the physical level, the network infrastructure is optimized and bandwidth is increased; in some cases, particularly in high-frequency trading, servers are placed directly in a machine room near the exchange or data source to shorten the transmission distance and time. Second, at the transport protocol level, financial institutions choose a low-latency protocol such as UDP (User Datagram Protocol) instead of conventional TCP (Transmission Control Protocol). UDP has less transmission overhead and lower latency, but it carries a risk of packet loss. To ensure data integrity and reliability, financial institutions therefore often add compensating measures such as FEC (forward error correction) or real-time retransmission, which in turn make the transmission process more complex and less stable.
To improve the efficiency and speed of data decoding, the financial technology field has also developed and applied hardware acceleration such as FPGAs (field programmable gate arrays) and ASICs (application specific integrated circuits). In the prior art, the FPGA takes part in financial data transmission as follows: step one, the host initiates a DMA read operation; step two, the kernel driver creates a DMA read descriptor; step three, the kernel driver configures the DMA read descriptor to the FPGA; step four, the FPGA carries the data to host memory according to the descriptor table; step five, the FPGA sends an interrupt when the transfer is complete; step six, the host detects the interrupt and takes the data out of memory. Only after one group of data has been parsed and extracted by the host is the next group transmitted. Because every transfer in the existing FPGA-accelerated DMA scheme requires the participation of server-side software, latency increases and data transmission is inefficient. A new FPGA-based PCIe DMA data transmission method is therefore needed that reduces software participation in the transmission process and thereby increases transmission speed.
Disclosure of Invention
The invention provides an FPGA-based low-latency PCIe DMA data transmission method that reduces the involvement of server software and increases data transmission speed by reducing the number of interactions between the FPGA and the CPU.
Other objects and advantages of the present invention will be further appreciated from the technical features disclosed in the present invention.
To achieve one, some, or all of the above objects or other objects, a technical solution of the present invention provides an FPGA-based low-latency PCIe DMA data transmission method, characterized in that the method comprises the following steps: a data transmission initialization step, which includes applying for large page memory and a host-side driver configuring the head address of the large page memory to the FPGA; a data transmission step, which includes starting the PCIe DMA transfer operation, forming transmission data, and reading data; wherein forming the transmission data includes parsing the market data, the FPGA prepending a sequence number to each parsed market data result, and the FPGA carrying the parsed market data result into the large page memory for storage; and a data reading step, in which software locates and reads the market data in the large page memory according to the sequence numbers. The advantage of this technical scheme is that the FPGA and the host interact only once, when the host configures the head address of the large page memory to the FPGA at the start of data transmission. During subsequent processing and transmission, the FPGA is responsible for processing and carrying data and the host software is responsible only for reading data, so the two do not interfere with each other. The time overhead caused by interaction during transmission is therefore reduced, and transmission latency is lowered.
The FPGA board card is installed in a PCIe slot of the server: the PCIe physical interface on the FPGA board card plugs into the server's PCIe slot. Installing the board card directly in the server's PCIe slot maximizes the data throughput between the FPGA board card and the server CPU and reduces transmission latency.
The large page memory is requested from the server kernel by the host-side driver, and is allocated by the host-side driver calling the corresponding kernel API function.
The head address of the large page memory is configured to the FPGA through a PCIe user BAR space address. The head address keeps the write address used by the FPGA synchronized with the read address used by the host software, and configuring it through the BAR space address simplifies system setup.
The PCIe DMA transfer operation is started by the host-side driver notifying the FPGA board card through register configuration. After receiving the command, the FPGA chip executes the PCIe DMA transfer operation according to the configuration information in the registers. This approach is flexible and starts quickly.
In the data transmission step, market data enters the FPGA board card through an Ethernet interface; the FPGA processes the market data according to the protocol provided by the exchange and assembles each contract obtained into a parsing result packet, in order. The network protocol employed here is UDP.
Each parsing result packet has a fixed size. Outputting every parsing result as a fixed-size packet simplifies data processing, improves efficiency, and reduces the cost of sending and receiving the packets.
The FPGA dynamically parses out a varying number of fixed-size parsing result packets according to the network traffic. Because network traffic fluctuates from moment to moment and the transmission speed differs at different bandwidths, parsing a number of result packets that tracks the traffic saves time and improves efficiency.
Forming the transmission data further includes the FPGA dynamically calculating the PCIe DMA length and PCIe DMA address from the size and number of the parsing result packets, and writing the market data result into the large page memory according to that length and address. This dynamic transfer control in the FPGA logic ensures that data is transmitted immediately and, when the bandwidths match, achieves zero buffering of data at the FPGA local end.
In forming the transmission data, prepending a sequence number to the parsed market data result means prepending a consecutive sequence number to each parsing result packet. When the software sees that the sequence number has increased, it knows that new data has been added to the large page memory and can start reading. Because the sequence numbers increase consecutively, the host driver can recognize them quickly and the host does not need to process them further.
The parsing result packets are arranged in the large page memory in increasing order of sequence number. This matches the consecutive numbering scheme, so software can search for data starting from the head address of the large page memory in increasing sequence order, which makes the data easy to locate.
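The packet layout described above can be sketched as follows. This is a minimal illustration rather than the patented logic: the 192-byte size and 4-byte sequence number are taken from Example 1, while the little-endian byte order, the choice to count the sequence number inside the 192 bytes, numbering from 1, and the helper names `build_result_packet` and `place_packets` are all assumptions. A `bytearray` stands in for the large page memory.

```python
import struct

PACKET_SIZE = 192  # fixed result-packet size, from Example 1
SEQ_SIZE = 4       # 4-byte sequence number at the head of each packet


def build_result_packet(seq: int, body: bytes) -> bytes:
    """Prepend a 4-byte sequence number and zero-pad to the fixed size."""
    max_body = PACKET_SIZE - SEQ_SIZE
    if len(body) > max_body:
        raise ValueError("decoded contract does not fit in one result packet")
    # "<I" is little-endian unsigned 32-bit; the byte order is an assumption.
    return struct.pack("<I", seq & 0xFFFFFFFF) + body.ljust(max_body, b"\x00")


def place_packets(memory: bytearray, first_seq: int, bodies: list) -> None:
    """Store packets in sequence order; sequence s (from 1) goes in slot s - 1."""
    for i, body in enumerate(bodies):
        seq = first_seq + i
        offset = (seq - 1) * PACKET_SIZE
        memory[offset:offset + PACKET_SIZE] = build_result_packet(seq, body)
```

Because every packet has the same size, the slot for a given sequence number is a simple multiplication, which is what lets the reader start from the head address and step forward.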
The FPGA is also provided with a market data receiving module, a market decoding processing logic module, and a market decoding result data buffer module. After the market data receiving module receives market data, the market decoding processing logic module parses it to obtain a parsing result packet with a sequence number added, together with a PCIe DMA transfer length and a PCIe DMA transfer address; the market decoding processing logic module then temporarily stores the parsing result in the market decoding result data buffer module.
The FPGA is also provided with a PCIe DMA descriptor calculation module and a parsing result framing module. The PCIe DMA descriptor calculation module reads the parsing result packet temporarily stored in the market decoding result data buffer module and generates a PCIe DMA descriptor according to the head address of the large page memory configured by the host-side driver. The parsing result framing module reads the same temporarily stored parsing result packet and outputs a PCIe DMA transfer data frame. The FPGA uses the generated PCIe DMA descriptor and PCIe DMA transfer data frame to carry data to the large page memory.
After the host-side driver recognizes that the sequence number has changed, the server-side software starts reading from the head address of the large page memory and finds the corresponding parsing result packet according to the sequence number. Because the sequence numbers are consecutive, the host only needs to recognize that the sequence number has increased to know that the data has been updated; the software can then search the large page memory for the corresponding sequence number and extract the data.
After the PCIe DMA transfer operation is started, new data keeps entering the FPGA board card through the Ethernet interface and the FPGA parses it repeatedly; the resulting transmission data is stored continuously in the large page memory, and the server software extracts data whenever it needs to. Data arriving over Ethernet is received by the market data receiving module on the FPGA, decoded and processed by the FPGA, and then carried into the large page memory. The FPGA processes data directly and without interruption, and the host extracts it on demand, which greatly improves transmission efficiency.
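The reading side can be sketched in the same spirit. The sketch assumes the layout suggested by the description (fixed 192-byte packets, a 4-byte little-endian sequence number at the head, numbering from 1, slot s - 1 for sequence s); the function name `read_new_packets` is hypothetical, and a `bytearray` stands in for the mapped large page memory. It ignores memory-ordering and wrap-around concerns that real software polling DMA-written memory would have to handle, since the source does not describe them.

```python
import struct

PACKET_SIZE = 192  # fixed result-packet size, from Example 1
SEQ_SIZE = 4       # 4-byte sequence number at the head of each packet


def read_new_packets(memory, last_seq: int):
    """Return (payloads, new_last_seq) for every packet newer than last_seq.

    A slot counts as "new" when it holds exactly the next expected
    sequence number; no interrupt or other host/FPGA handshake is used.
    """
    payloads = []
    next_seq = last_seq + 1
    while True:
        offset = (next_seq - 1) * PACKET_SIZE
        if offset + PACKET_SIZE > len(memory):
            break  # end of the region; wrap-around is not described in the source
        (seq,) = struct.unpack_from("<I", memory, offset)
        if seq != next_seq:
            break  # this slot has not been written yet
        payloads.append(bytes(memory[offset + SEQ_SIZE:offset + PACKET_SIZE]))
        next_seq += 1
    return payloads, next_seq - 1
```

The caller keeps `new_last_seq` and passes it back on the next poll, so each call returns only packets that arrived since the previous one.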
Compared with the prior art, the invention has the beneficial effects that:
1. In the invention, the FPGA parses the market data; after the market decoding logic in the FPGA parses the received data, it adds a consecutive sequence number, the FPGA carries the data into the large page memory, and software locates the parsing result packets according to the sequence numbers. Throughout the process, the host interacts with the FPGA only at the stage of applying for the large page memory. After that, the FPGA is responsible for parsing and carrying data and the software is responsible for reading it; the FPGA does not interact with the host, and the two do not interfere with each other. The time overhead caused by interaction during transmission is therefore reduced, and transmission latency is lowered.
2. The FPGA dynamically determines the parsing result packets, the PCIe DMA length and the PCIe DMA address according to the network traffic and the size and number of packets, so that zero buffering of data at the FPGA end can be achieved, the operating speed of the FPGA is improved, and transmission latency is reduced.
The foregoing and other objects, features and advantages of the invention will be apparent from the following more particular description of preferred embodiments, as illustrated in the accompanying drawings.
Drawings
In order to more clearly illustrate the technical solutions of specific embodiments of the present invention, the drawings that are needed in the description of the embodiments will be briefly described below, and it is obvious that the drawings described below are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort to a person of ordinary skill in the art.
FIG. 1 is a diagram illustrating DMA transfer in the prior art.
FIG. 2 is a diagram illustrating DMA transfer according to the present invention.
FIG. 3 is a schematic diagram of the data transmission between the FPGA and the server according to the present invention.
Fig. 4 is a schematic diagram of analyzing market data of the FPGA according to the present invention.
Detailed Description
The foregoing and other features, aspects, and advantages of the present invention will become more apparent from the following detailed description of a preferred embodiment, which proceeds with reference to the accompanying drawings. The directional terms mentioned in the following embodiments are, for example: upper, lower, left, right, front or rear, etc., are merely references to the directions of the attached drawings. Thus, the directional terminology is used for purposes of illustration and is not intended to be limiting of the invention.
The FPGA-based low-latency PCIe DMA data transmission method of the present invention is explained in detail below with reference to the accompanying drawings. Referring to fig. 1 and fig. 2, fig. 1 shows how DMA data transfer is implemented in the prior art. DMA data transfer copies data from one address space to another, providing high-speed transfer between a peripheral and memory or between memory and memory, and the transfer itself does not require CPU involvement. In the prior art, although the FPGA hardware accelerates the transfer, the FPGA must send an interrupt to the host after each transfer, and the host extracts the data only after detecting the interrupt; because the host takes part in every transfer, the overall transmission speed is inevitably reduced. The invention addresses this defect, and the improved scheme is shown in fig. 2. The technical scheme of the application includes a data initialization step, which includes applying for large page memory: the host-side driver applies to the server kernel, and the large page memory is allocated by the host-side driver calling the corresponding kernel API function. The host-side driver then configures the head address of the large page memory to the FPGA through a PCIe user BAR space address, which allows the host-side driver to exchange data directly with the FPGA.
The head address of the large page memory configured to the FPGA serves as the starting position at which the FPGA places data when carrying it, and also makes the data easier to locate.
After the data initialization step is completed, the PCIe DMA transfer operation can be started. The host-side driver notifies the FPGA board card to start the PCIe DMA transfer through register configuration, which comprises the following steps: determining the register address, which is defined by a header file or documentation in the driver package; establishing a connection, in which the host-side driver opens the device file corresponding to the FPGA board card to connect to the device; configuring the registers, in which the host-side driver writes configuration values into the FPGA board card registers using the corresponding API functions or instructions; and triggering the transfer, in which the host-side driver writes a trigger command or the relevant flag bit into a register to tell the FPGA board card to execute the PCIe DMA transfer operation. Using register configuration allows the PCIe DMA transfer operation to be controlled flexibly, while also simplifying the design and improving extensibility.
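The register sequence can be illustrated with a small simulation. Everything concrete here is hypothetical: the register offsets, the split of the head address into two 32-bit registers, and the `DMA_START` flag are invented for illustration (real values would come from the driver package's header file or documentation), and a `bytearray` stands in for the memory-mapped user BAR that a real driver would access.

```python
import struct

# Hypothetical register map inside the user BAR (illustrative only).
REG_HUGEPAGE_ADDR_LO = 0x00  # low 32 bits of the large page head address
REG_HUGEPAGE_ADDR_HI = 0x04  # high 32 bits of the large page head address
REG_DMA_CONTROL = 0x08       # control register
DMA_START = 0x1              # hypothetical "start transfer" flag bit


def write_reg(bar: bytearray, offset: int, value: int) -> None:
    """Write one 32-bit little-endian register in the simulated BAR."""
    struct.pack_into("<I", bar, offset, value & 0xFFFFFFFF)


def read_reg(bar, offset: int) -> int:
    """Read one 32-bit little-endian register from the simulated BAR."""
    return struct.unpack_from("<I", bar, offset)[0]


def start_dma(bar: bytearray, hugepage_addr: int) -> None:
    """Configure the head address first, then set the trigger flag."""
    write_reg(bar, REG_HUGEPAGE_ADDR_LO, hugepage_addr & 0xFFFFFFFF)
    write_reg(bar, REG_HUGEPAGE_ADDR_HI, hugepage_addr >> 32)
    control = read_reg(bar, REG_DMA_CONTROL)
    write_reg(bar, REG_DMA_CONTROL, control | DMA_START)
```

Writing the trigger flag last mirrors the order in the text: the configuration values are in place before the board card is told to begin.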
After the PCIe DMA transfer operation is started, the FPGA processes the market data to form transmission data, as shown in fig. 3 and fig. 4. The FPGA chip contains a market data receiving module and market decoding processing logic. When external futures or other financial market data continuously enters the FPGA board card through Ethernet, the market data receiving module receives it and the market decoding logic in the FPGA decodes it according to an existing protocol such as UDP. The market decoding processing logic module assembles each contract obtained into a parsing result packet, in order. The parsing result packet has a fixed size to make the data easier to process. The market decoding processing logic module also adds a sequence number to each parsing result packet, which in a specific application may be a SEQ sequence number; the sequence number makes it easier for software to recognize and read the data. The resulting parsing result packets are stored temporarily in the market decoding result buffer module.
In the above embodiment, the sequence numbers added at the head of the result packets should be consecutive. The packets are placed into the large page memory in sequence order, with the first packet at the head address. Because the sequence numbers are consecutive, software does not need to compute or analyze them when reading; it knows that new data has been stored in the large page memory as soon as it sees the sequence number increase. The head address of the large page memory gives the software a reference point for locating data: each search starts from the head address and proceeds by sequence number.
To achieve real-time data transmission, the market decoding processing logic in the FPGA parses result packets dynamically according to the network traffic. The amount of data that can be transmitted differs with the traffic, so the number of fixed-size result packets the FPGA parses at a time also differs. If the FPGA parses result packets in real time as traffic arrives, it can work continuously, which improves efficiency. The market decoding processing logic also dynamically calculates the PCIe DMA transfer size and PCIe DMA transfer address from the number and size of the parsed result packets. The PCIe DMA descriptor calculation module generates the corresponding PCIe DMA descriptors from that transfer size and address, which allows data to be carried in real time. The specific handling is given in Example 1 and is not explained in detail here. Calculating the PCIe DMA transfer size and address dynamically, according to actual demand, reduces adaptation time and improves transmission efficiency and response time.
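The dynamic length and address calculation can be sketched as plain arithmetic. The 192-byte packet size comes from Example 1; the function name `dma_transfer` is hypothetical, and so is the assumption that the write address advances by the number of packets already stored, which the source does not spell out.

```python
PACKET_SIZE = 192  # bytes per result packet, from Example 1


def dma_transfer(head_addr: int, packets_written: int, n: int):
    """Return (address, length) for a burst carrying n new result packets.

    packets_written is the count already stored in the large page memory;
    using it as the running offset is an assumption, not taken from the source.
    """
    if n <= 0:
        raise ValueError("a transfer must carry at least one packet")
    length = n * PACKET_SIZE
    address = head_addr + packets_written * PACKET_SIZE
    return address, length
```

For instance, `dma_transfer(0x1000, 0, 3)` describes a 576-byte burst starting at the head address, and a following burst of two packets would start 576 bytes later.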
The PCIe DMA descriptor calculation module reads the parsing result packet temporarily stored in the market decoding result buffer module and outputs a PCIe DMA descriptor based on that packet and on the head address of the large page memory that the host driver configured to the FPGA. The PCIe DMA descriptor is a data structure containing information about the DMA transfer, such as data length, start address and direction. Because the descriptor is derived from the head address of the large page memory, the data stored in the large page memory has a reference position, and the head address keeps the address written by the FPGA synchronized with the address read by the host driver. The parsing result framing module in the FPGA reads the parsing result packet temporarily stored in the market decoding result buffer module, processes it, and outputs a PCIe DMA transfer data frame, which is a data format used to organize and transmit DMA data. After the PCIe receiving end receives the PCIe DMA descriptor and the PCIe DMA transfer data frame, the PCIe controller parses and restores the DMA data so that the transmission data can be carried into the large page memory.
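A descriptor of the kind described, carrying length, start address and direction, might be modelled as below. The field names, the `Direction` values and the `make_descriptor` helper are illustrative assumptions; the source does not give the actual descriptor layout used by the FPGA.

```python
from dataclasses import dataclass
from enum import Enum


class Direction(Enum):
    CARD_TO_HOST = 0  # FPGA writes into host memory (the case described here)
    HOST_TO_CARD = 1


@dataclass(frozen=True)
class DmaDescriptor:
    start_address: int  # absolute address in the large page memory
    length: int         # transfer length in bytes
    direction: Direction


def make_descriptor(head_addr: int, byte_offset: int, length: int) -> DmaDescriptor:
    """Build a card-to-host descriptor relative to the configured head address."""
    if length <= 0 or byte_offset < 0:
        raise ValueError("length must be positive and offset non-negative")
    return DmaDescriptor(head_addr + byte_offset, length, Direction.CARD_TO_HOST)
```

Expressing the start address as head address plus offset reflects the role of the head address as the shared reference point between the FPGA writer and the host reader.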
In the invention, the FPGA board card is installed in a PCIe slot of the server: the PCIe physical interface on the FPGA board card plugs into the server's PCIe slot. Referring to fig. 3, the PCIe root complex on the server side is the root complex, the PCIe end point x8 in the FPGA is the endpoint, and the two are connected through the PCIe bus. The PCIe bus is a high-speed serial bus, and connecting the PCIe root complex to the PCIe end point x8 requires a PCIe slot and the connector on the slot. The PCIe slot is located on the motherboard and accepts PCIe devices. Installing the FPGA board card directly in the server and using it directly to process data increases the data processing speed. The PCIe DMA logic on the FPGA side of fig. 3 is a logic module used mainly for DMA transfer; it generally includes the PCIe DMA descriptor calculation module, the parsing result framing module, and the other modules in fig. 4. In fig. 3, after the PCIe DMA logic module carries data with a SEQ sequence number into the large page memory in the server, the server-side driver reads the data and passes it to a library (lib), and the application (app software) obtains the data from the library through compile-time linking, API calls, or runtime loading, thereby obtaining real-time market data results.
As an example of this scheme, the above description constitutes one complete parsing and transmission cycle. When data keeps arriving, the FPGA repeats the decoding, parsing and carrying operations, and the host software only needs to extract data from the large page memory according to sequence number changes and its own needs, and send the data on to the terminal in time. To guarantee the timeliness of the market data, the data should be moved immediately; in that case no data is buffered in the FPGA, and the data is transferred efficiently.
To better explain the invention, its procedure is described below with a specific example, Embodiment 1.
This embodiment describes the parsing of specific market data; the main operation steps are as follows:
Step one: The FPGA board card is inserted into a PCIe slot of the server.
Step two: The driver applies to the server for the large page memory. The head address of the large page memory is configured to the FPGA through the PCIe user BAR space address.
Step three: The FPGA board card receives market data from the exchange through a 10 Gigabit Ethernet interface; the network protocol is UDP.
Step four: The internal logic of the FPGA decodes the UDP protocol. The UDP payload varies from 48 bytes to 1400 bytes.
Step five: The internal logic of the FPGA decodes each contract according to the exchange's market data format to obtain the market data, whose length varies from tens of bytes to hundreds of bytes. Most of the market data are offsets, such as a rise in a contract's price, a drop in its price, or its traded price.
Step six: The FPGA logic performs addition, subtraction, multiplication, and division on the market data according to the protocol provided by the exchange to obtain the latest price and the latest transaction amount, and then, for each decoded contract in order, forms a latest market information result packet with a fixed length of 192 bytes. At the same time, a 4-byte sequence number is placed at the head of each 192-byte market information result packet.
Step seven: Because the length of the UDP packets received from the network in real time varies, the number of contracts decoded also varies. For example, if the FPGA obtains n contracts after decoding one UDP packet, the current DMA transfer length is n × 192 bytes, and the current DMA transfer address is n × 192 plus the head address of the large page memory.
Step eight: The internal logic of the FPGA board card writes the decoded data into the memory of the server according to the calculated DMA transfer length and DMA transfer address.
Step nine: The server software starts reading data from the head address of the large page memory. Every 192 bytes there is a sequence number, and an incremented sequence number indicates a new data update. The sequence numbers are consecutive, which ensures the correctness of data transmission.
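Steps six to eight can be sketched in Python as a software simulation; this is an illustration under stated assumptions, not the FPGA logic of the invention. The sketch assumes that the 4-byte sequence number occupies the first 4 bytes of the 192-byte packet (little-endian), that shorter results are zero-padded, and that a `bytearray` stands in for the large page memory. The description gives the transfer address as an offset of n × 192 from the head address; the sketch further assumes that this offset counts the packets already written, so that consecutive transfers do not overlap. All function names are hypothetical. As a worked case, decoding n = 3 contracts gives a transfer length of 3 × 192 = 576 bytes.

```python
import struct

PACKET_SIZE = 192   # fixed result-packet size from step six
SEQ_SIZE = 4        # 4-byte sequence number at the head of each packet
SEQ_FORMAT = "<I"   # assumption: little-endian unsigned 32-bit integer


def frame_packets(results, next_seq):
    """Frame each decoded contract result into one fixed 192-byte packet.

    Returns (concatenated packets, sequence number for the next packet).
    """
    frame = bytearray()
    for body in results:
        if len(body) > PACKET_SIZE - SEQ_SIZE:
            raise ValueError("result does not fit in one packet")
        # Sequence number first, then the result padded with zero bytes.
        frame += struct.pack(SEQ_FORMAT, next_seq)
        frame += body.ljust(PACKET_SIZE - SEQ_SIZE, b"\x00")
        next_seq += 1
    return bytes(frame), next_seq


def dma_descriptor(head_address, packets_written, n):
    """Return (address, length) for a transfer of n packets.

    Assumption: the offset is the number of packets already written times 192.
    """
    length = n * PACKET_SIZE
    address = head_address + packets_written * PACKET_SIZE
    return address, length


def dma_write(memory, head_address, address, data):
    """Simulate the DMA write into a bytearray standing in for host memory."""
    start = address - head_address
    if start < 0 or start + len(data) > len(memory):
        raise ValueError("transfer falls outside the large page memory")
    memory[start:start + len(data)] = data
```

In this sketch one UDP packet that decodes to three contracts produces a single 576-byte transfer at the head address, and a following packet with two contracts produces a 384-byte transfer starting 576 bytes later.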
The FPGA-based low-latency PCIe DMA data transmission method provided by the invention has been described in detail above. Specific examples are used herein to illustrate the structure and working principle of the invention, and the description of the above embodiments is intended only to help in understanding the method and its core idea. It should be noted that those skilled in the art can make various improvements and modifications to the invention without departing from its principles, and such improvements and modifications also fall within the scope of the appended claims.

Claims (15)

1. An FPGA-based low-latency PCIe DMA data transmission method, characterized in that the method comprises the following steps: a data transmission initialization step, which comprises applying for a large page memory, a host side driver configuring the head address of the large page memory to the FPGA;
a data transmission step, which comprises starting a PCIe DMA transmission operation, forming transmission data, and reading data;
the step of forming transmission data comprises parsing the market data, the FPGA adding a serial number before the market data result obtained by parsing, and the FPGA carrying the parsed market data result to the large page memory for storage;
and a data reading step, in which software searches for and reads the market data in the large page memory according to the serial numbers.
2. The FPGA-based low latency PCIe DMA data transfer method of claim 1, wherein: the FPGA board card is inserted into a server PCIe slot;
and the PCIe physical interface on the FPGA board card is inserted into the PCIe slot of the server.
3. The FPGA-based low latency PCIe DMA data transfer method of claim 1, wherein: the large page memory is applied for from the server kernel by the host side driver,
and the allocation of the large page memory is realized by the host side driver calling a corresponding kernel API function.
4. The FPGA-based low latency PCIe DMA data transfer method of claim 1, wherein: and the head address of the large page memory is configured to the FPGA through a PCIe user BAR space address.
5. The FPGA-based low latency PCIe DMA data transfer method of claim 1, wherein: the starting of the PCIe DMA transmission operation means that the host side driver notifies the FPGA board card to start PCIe DMA transmission by configuring a register.
6. The FPGA-based low latency PCIe DMA data transfer method of claim 1, wherein: in the data transmission step, market data enter an FPGA board card through an Ethernet interface;
the FPGA processes the market data according to the protocol provided by the exchange, and each contract obtained by processing is formed, in order, into an analysis result packet.
7. The FPGA-based low latency PCIe DMA data transfer method of claim 6, wherein: each parsing result packet is fixed in size.
8. The FPGA-based low latency PCIe DMA data transfer method of claim 7, wherein: the FPGA dynamically parses out a plurality of fixed-size analysis result packets according to the volume of network traffic.
9. The FPGA-based low latency PCIe DMA data transfer method of claim 8, wherein: the forming of the transmission data further comprises the steps that the FPGA dynamically calculates PCIe DMA length and PCIe DMA addresses according to the size and the number of the analysis result packets, and the FPGA writes the market data result into the large page memory according to the PCIe DMA length and the PCIe DMA addresses.
10. The FPGA-based low latency PCIe DMA data transfer method of claim 7, wherein: adding a serial number before the analysis result of the quotation data means adding a continuous serial number before each analysis result packet.
11. The FPGA-based low latency PCIe DMA data transfer method of claim 10, wherein: and the analysis result packets are arranged in the large page memory according to the sequence number increasing order.
12. The FPGA-based low latency PCIe DMA data transfer method of claim 2, wherein: the FPGA is also provided with a market data receiving module, a market decoding processing logic module, and a market decoding result data cache module;
after the market data receiving module receives the market data, the market decoding processing logic module parses and processes the market data to obtain an analysis result packet with a serial number added, a PCIe DMA transmission length, and a PCIe DMA transmission address;
and the market decoding processing logic module temporarily stores the analysis result in the market decoding result data cache module.
13. The FPGA-based low latency PCIe DMA data transfer method of claim 12, wherein: the FPGA is also provided with a PCIe DMA descriptor calculation module and a parsing result framing module;
the PCIe DMA descriptor calculation module reads the analysis result packet temporarily stored in the market decoding result data cache module and generates a PCIe DMA descriptor according to the head address of the large page memory configured by the host side driver;
the parsing result framing module reads the analysis result packet temporarily stored in the market decoding result data cache module and then outputs a PCIe DMA transmission data frame;
the generated PCIe DMA descriptor and PCIe DMA transmission data frame are used by the FPGA to carry data to the large page memory.
14. The FPGA-based low latency PCIe DMA data transfer method of claim 11, wherein: after the host side driver recognizes that the sequence number has changed, the server side software starts to read data from the head address of the large page memory and finds the corresponding analysis result packet according to the sequence number.
15. The FPGA-based low latency PCIe DMA data transfer method of claim 9, wherein: after the PCIe DMA transmission operation is started, new data enters the FPGA board card through the Ethernet interface, and the FPGA repeatedly parses the data; the transmission data thus formed continuously enters the large page memory for storage;
and the server software side extracts data at any time as needed.
CN202311558032.9A 2023-11-22 2023-11-22 FPGA-based low-latency PCIe DMA data transmission method Active CN117271402B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311558032.9A CN117271402B (en) 2023-11-22 2023-11-22 FPGA-based low-latency PCIe DMA data transmission method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311558032.9A CN117271402B (en) 2023-11-22 2023-11-22 FPGA-based low-latency PCIe DMA data transmission method

Publications (2)

Publication Number Publication Date
CN117271402A true CN117271402A (en) 2023-12-22
CN117271402B CN117271402B (en) 2024-01-30

Family

ID=89218199

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311558032.9A Active CN117271402B (en) 2023-11-22 2023-11-22 FPGA-based low-latency PCIe DMA data transmission method

Country Status (1)

Country Link
CN (1) CN117271402B (en)

Also Published As

Publication number Publication date
CN117271402B (en) 2024-01-30

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant