CN112650499A - System for realizing hardware decoding processing of exchange level-2 FAST market based on OpenCL platform - Google Patents

System for realizing hardware decoding processing of exchange level-2 FAST market based on OpenCL platform

Info

Publication number
CN112650499A
Authority
CN
China
Prior art keywords
market
data
module
decoding
quotation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011605658.7A
Other languages
Chinese (zh)
Inventor
俞枫
曾宏祥
金亭姝
马辉
邹经纬
周正鑫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guotai Junan Securities Co Ltd
Original Assignee
Guotai Junan Securities Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guotai Junan Securities Co Ltd
Priority to CN202011605658.7A
Publication of CN112650499A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/40 Transformation of program code
    • G06F8/41 Compilation
    • G06F8/44 Encoding
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/40 Transformation of program code
    • G06F8/41 Compilation
    • G06F8/44 Encoding
    • G06F8/443 Optimisation
    • G06F8/4441 Reducing the execution time required by the program code
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L69/00 Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L69/16 Implementation or adaptation of Internet protocol [IP], of transmission control protocol [TCP] or of user datagram protocol [UDP]
    • H04L69/165 Combined use of TCP and UDP protocols; selection criteria therefor
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L69/00 Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L69/26 Special purpose or proprietary protocols or architectures

Abstract

The invention relates to a system for hardware decoding of exchange Level-2 FAST market data based on an OpenCL platform. The system comprises a TCP data input module, which receives the exchange market data stream and parses the TCP/IP protocol layer; a market data category filtering module, which classifies and filters the market data; a market data decoding module, which parses the several categories of FAST market data in parallel; a merging and distribution module, which aggregates and schedules the decoded messages; and a UDP data output module, which multicasts the market data packets. The system shortens the development cycle by exploiting the high-level design flow of the heterogeneous platform: compared with a traditional pure-hardware, register-level design, development time can be shortened by at least half a year on average. By optimizing the FPGA-side OpenCL logic according to the hardware characteristics and the compilation principles of the platform compiler, an ultra-low-latency heterogeneous market data link with efficient decoding can be realized, with decoding latency as low as about 1 µs.

Description

System for realizing hardware decoding processing of exchange level-2 FAST market based on OpenCL platform
Technical Field
The invention relates to the field of information technology, in particular to market data decoding, and specifically to a system for hardware decoding of exchange Level-2 FAST market data based on an OpenCL platform.
Background
With the rapid development of capital markets in recent years, investors' need for ultra-low-latency market data processing and forwarding systems has become increasingly urgent. For an algorithmic trading investor, receiving market data faster than competitors makes it possible to time buying and selling more accurately in a rapidly changing stock market and so obtain better investment returns. The FPGA (Field Programmable Gate Array), being programmable, stable and low-latency, is an ideal technology for building low-latency market data and trading systems.
Among the Shanghai and Shenzhen market data feeds, the Shanghai Stock Exchange Level-2 feed is comparatively difficult to decode. The industry currently uses two main decoding schemes: pure software decoding on a general-purpose CPU, and customized pure hardware decoding on an FPGA.
Pure software decoding uses high-level programming languages such as C/C++. The development and test environment is friendly, the development cycle is short, and iterating on updates is easy. In terms of decoding efficiency, however, although the CPU clock frequency is much higher than that of an FPGA, a development environment built on a high-level language and a general-purpose CPU cannot fully avoid the uncontrollable latency introduced when the compiler translates the high-level language into machine code. In addition, the software decoder runs on an operating system, whose interrupt mechanism and thread scheduling make the overall decoding latency highly unstable.
Pure hardware decoding on an FPGA describes the decoding logic in a low-level hardware description language such as Verilog or VHDL. It allows the designer to achieve strictly the shortest latency needed for a given function, but the low-level development style also greatly lengthens the time needed for development and for maintenance changes (usually measured in units of half a year).
To combine the advantages of software and hardware decoding, the invention uses the heterogeneous-acceleration OpenCL platform recently introduced by mainstream FPGA vendors to develop a decoder for Shanghai Stock Exchange Level-2 market data. On the one hand, the high-level design flow of the heterogeneous platform shortens the development cycle; on the other hand, the FPGA-side OpenCL logic is specifically optimized according to the hardware characteristics and the compilation principles of the platform compiler. The result is a heterogeneous accelerated market data link that is both quick to develop and efficient at decoding.
Level-2 market data is encapsulated in the STEP securities exchange protocol, that is, it is transmitted in the standard tag=value FIX message format. Because the STEP format is highly redundant, the Level-2 system compresses the real-time STEP market data with FAST encoding and embeds the compressed data in tag 96 of the STEP message.
For a single STEP packet, tag number 35 indicates the message type of the market data in the STEP message, tag number 95 indicates the total length of the FAST message, and tag number 96 is the actual FAST encoded message data, which may include one or more FAST messages of the same message type.
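The role of tags 35, 95 and 96 can be illustrated with a minimal Python sketch. It assumes the usual FIX-style framing, with fields written as tag=value and separated by the SOH byte (0x01). The function name and the three-field sample message are hypothetical; a real STEP message carries further header and trailer tags that are omitted here. Because tag 96 holds raw binary data, the sketch locates its end from the length in tag 95 rather than by searching for the next delimiter.

```python
SOH = b"\x01"  # field delimiter of FIX-style tag=value messages (assumed framing)


def extract_fast_payload(step_msg: bytes):
    """Return (msg_type, fast_payload) from one simplified STEP message.

    Hypothetical helper: tag 35 gives the message type, tag 95 the length
    of the FAST data, and tag 96 the FAST data itself.
    """
    msg_type = None
    raw_len = None
    pos = 0
    while pos < len(step_msg):
        eq = step_msg.index(b"=", pos)
        tag = int(step_msg[pos:eq])
        if tag == 96:
            if raw_len is None:
                raise ValueError("tag 96 seen before tag 95")
            # Binary payload: slice by length, it may itself contain 0x01.
            return msg_type, step_msg[eq + 1 : eq + 1 + raw_len]
        end = step_msg.index(SOH, eq)
        value = step_msg[eq + 1 : end]
        if tag == 35:
            msg_type = value.decode("ascii")
        elif tag == 95:
            raw_len = int(value)
        pos = end + 1
    raise ValueError("no tag 96 in message")


# Made-up sample: a snapshot message whose 4-byte payload contains a 0x01 byte.
example = b"35=UA3202\x01" + b"95=4\x01" + b"96=\x81\x01\x02\x83\x01"
msg_type, payload = extract_fast_payload(example)
```

In the sample, the payload byte 0x01 would be mistaken for a field delimiter if the length from tag 95 were ignored.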
For Shanghai Stock Exchange Level-2 market data, the FAST-compressed payload embedded in the STEP protocol and the incremental transmission of snapshot data pose considerable challenges for the design of a hardware decoder, mainly in the following respects:
Variable-length encoding
The FAST compression protocol uses stop-bit encoding, so the number of bytes a given field occupies in the binary stream is not fixed. Because of this variable-length encoding, the value of a field cannot be obtained by memory mapping at byte granularity; instead, the decoder must go into the binary stream and parse the market data at bit granularity.
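As an illustration of stop-bit encoding, the following Python sketch decodes one unsigned integer. In FAST, each byte carries seven data bits, and the most significant bit is set only on the last byte of a field. The function name is illustrative, and the sketch covers only unsigned integers, not signed values, nullable fields or strings.

```python
def decode_stop_bit_uint(buf: bytes, pos: int = 0):
    """Decode one stop-bit encoded unsigned integer starting at buf[pos].

    Returns (value, next_pos). Each byte contributes its low 7 bits;
    a set high bit (0x80) marks the final byte of the field.
    """
    value = 0
    while True:
        byte = buf[pos]
        pos += 1
        value = (value << 7) | (byte & 0x7F)
        if byte & 0x80:  # stop bit: field ends here
            return value, pos


# Three bytes encode one field; only the last byte has the stop bit set.
value, next_pos = decode_stop_bit_uint(b"\x39\x45\xa3")
```

Since the field length is known only after the stop bit has been seen, the start of the next field cannot be computed in advance, which is the difficulty described above.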
Strong coupling
The combination of data types and field operators in the FAST protocol strongly couples the FAST messages within one STEP message: a field in one FAST message may depend on the dictionary value left after the previous FAST message has been decoded. This coupling between data streams makes it harder to design parallel decoding tasks on the FPGA.
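The coupling can be seen in a minimal Python sketch of the FAST "copy" operator for a mandatory field. It is a simplification that ignores initial values and nullable fields. The text does not state which operators the exchange templates use, so the operator choice, the field name `LastPx` and the sample values are assumptions made for illustration.

```python
def decode_copy_field(present: bool, stream_value, dictionary: dict, key: str):
    """Simplified FAST 'copy' operator for a mandatory field.

    If the presence-map bit is set, the value is read from the stream and
    becomes the new dictionary entry; otherwise the previous dictionary
    value is reused.
    """
    if present:
        dictionary[key] = stream_value
    elif key not in dictionary:
        raise ValueError(f"no previous value for {key}")
    return dictionary[key]


dictionary = {}
# First FAST message carries the price explicitly.
first = decode_copy_field(True, 1050, dictionary, "LastPx")
# Second message omits it, so it can only be decoded after the first.
second = decode_copy_field(False, None, dictionary, "LastPx")
```

The second call cannot run before the first has updated the dictionary, which is why messages inside one STEP packet cannot simply be handed to independent parallel decoders.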
Incremental transmission
Snapshot market data is transmitted incrementally, so the decoder must not only decode the FAST-compressed data correctly but also cache all snapshot fields of the target securities and update them correctly. This places higher demands on the correctness, storage space and read/write efficiency of FAST decoding.
Disclosure of Invention
The invention aims to overcome the above shortcomings of the prior art and to provide a system for hardware decoding of exchange Level-2 FAST market data based on an OpenCL platform that is quick to develop, decodes efficiently and has low latency.
To achieve this object, the system of the invention for hardware decoding of exchange Level-2 FAST market data based on an OpenCL platform is as follows:
The system is mainly characterized in that it comprises:
a TCP data input module, which receives the exchange market data stream, parses the TCP/IP protocol layer and extracts the exchange's raw application-layer market data;
a market data category filtering module, connected to the TCP data input module, which classifies and filters the market data to obtain the FAST-encoded data corresponding to transaction data, index data and snapshot data respectively;
a market data decoding module, connected to the market data category filtering module, which parses the several categories of FAST market data in parallel and implements a pipelined decoding core;
a merging and distribution module, connected to the market data decoding module, which aggregates and schedules the decoded messages;
and a UDP data output module, connected to the merging and distribution module, which multicasts the market data packets.
Preferably, the market data decoding module comprises a UA3201 decoding module, a UA3202 decoding module and a UA3113 decoding module, which are connected to the three FAST protocol data output channels respectively, decode the FAST market data of their respective categories in parallel, and output the corresponding decoded market data messages.
Preferably, the TCP data input module establishes a TCP connection with the upstream exchange market data gateway (VDE) and obtains the TCP payload market data stream by parsing the TCP protocol stack.
Preferably, the market data category filtering module classifies and filters the market data by category according to the rules of the exchange's STEP protocol, and outputs the three categories of market data on three separate paths.
Preferably, the market data category filtering module runs a tag35 filter and a tag96 filter in parallel. The tag35 filter searches for tag 35 in the STEP message to determine the market data category of the current data stream. The tag96 filter parses the content of tag 96 in the STEP message to extract the valid FAST-encoded market data; the data it extracts is output, according to the message category found by the tag35 filter, to the corresponding one of the three parallel FAST protocol data channels (UA3201, UA3202 and UA3113).
Preferably, the merging and distribution module is a finite state machine whose inputs are the three decoded market data message streams produced by the decoding module. The state machine is in a standby state by default and continuously checks whether the three input channels hold valid data; data that meets the sending condition is output to the UDP data output module in the priority order UA3202, then UA3201, then UA3113.
Preferably, the UDP data output module is connected through a buffered channel, and its output data conforms to the downstream multicast sending rules, producing the UDP output data stream.
Preferably, the UA3201 decoding module and the UA3113 decoding module split the decoding mechanism into two pipeline stages that run in parallel, field segmentation and state machine parsing:
the field segmentation pipeline receives the FAST-encoded binary market data stream, segments fields according to the FAST stop-bit encoding rule at a rate of four bytes of data per clock cycle, processing the four bytes in parallel, and sends the segmented fields to the state machine parsing pipeline;
the state machine parsing pipeline receives the fields output by the field segmentation pipeline, organizes the data field by field and stores it in a data buffer; it then parses the buffered data in real time according to the market data message format to complete the decoding of the FAST message.
Preferably, the UA3202 market data decoding module splits the whole decoding mechanism into three pipeline stages that run in parallel: field segmentation, doubly nested state machine parsing, and quantity/price order operations:
the doubly nested state machine parsing pipeline receives the fields output by the field segmentation pipeline, organizes the data field by field and stores it in a data buffer, and performs real-time state transitions and field parsing on the buffered data according to the market data message format. This yields the three kinds of operation (add, delete, update) on the prices and quantities of the ten bid and ask price levels and on the fifty order quantities at the best bid and ask, and the decoded business fields and their operation information are passed to the next pipeline stage in real time;
the quantity/price order operation pipeline receives the business fields and operation information produced by the doubly nested state machine parsing pipeline, applies the incremental operations to the current price queue and order queue of the security identified by the security code, and stores and manages the resulting full market data in a hash structure.
With the system of the invention for hardware decoding of exchange Level-2 FAST market data based on an OpenCL platform, the development cycle is shortened by exploiting the high-level design flow of the heterogeneous platform: compared with a traditional pure-hardware, register-level design, development time can be shortened by at least half a year on average. By optimizing the FPGA-side OpenCL logic according to the hardware characteristics and the compilation principles of the platform compiler, an ultra-low-latency heterogeneous market data link with efficient decoding can be realized, with decoding latency as low as about 1 µs.
Drawings
Fig. 1 is a schematic diagram of the overall design of the decoding device of the system of the invention for hardware decoding of exchange Level-2 FAST market data based on an OpenCL platform.
Fig. 2 is a schematic diagram of the three-stage pipeline of the UA3202 snapshot market data decoding module of the system of the invention.
Fig. 3 is a schematic diagram of the functions of the five modules of the decoding device of the system of the invention, their overall connections and their internal architecture.
Fig. 4 is a schematic diagram of independent data RAM storage access in the system of the invention.
Fig. 5 is a schematic diagram of the hash mapping of the discrete storage space in the system of the invention.
Detailed Description
To describe the technical content of the invention more clearly, it is further described below with reference to specific embodiments.
The system of the invention for hardware decoding of exchange Level-2 FAST market data based on an OpenCL platform comprises:
a TCP data input module, which receives the exchange market data stream, parses the TCP/IP protocol layer and extracts the exchange's raw application-layer market data;
a market data category filtering module, connected to the TCP data input module, which classifies and filters the market data to obtain the FAST-encoded data corresponding to transaction data, index data and snapshot data respectively;
a market data decoding module, connected to the market data category filtering module, which parses the several categories of FAST market data in parallel and implements a pipelined decoding core;
a merging and distribution module, connected to the market data decoding module, which aggregates and schedules the decoded messages;
and a UDP data output module, connected to the merging and distribution module, which multicasts the market data packets.
In a preferred embodiment of the invention, the market data decoding module comprises a UA3201 decoding module, a UA3202 decoding module and a UA3113 decoding module, which are connected to the three FAST protocol data output channels respectively, decode the FAST market data of their respective categories in parallel, and output the corresponding decoded market data messages.
In a preferred embodiment of the invention, the TCP data input module establishes a TCP connection with the upstream exchange market data gateway (VDE) and obtains the TCP payload market data stream by parsing the TCP protocol stack.
In a preferred embodiment of the invention, the market data category filtering module classifies and filters the market data by category according to the rules of the exchange's STEP protocol, and outputs the three categories of market data on three separate paths.
In a preferred embodiment of the invention, the market data category filtering module runs a tag35 filter and a tag96 filter in parallel. The tag35 filter searches for tag 35 in the STEP message to determine the market data category of the current data stream. The tag96 filter parses the content of tag 96 in the STEP message to extract the valid FAST-encoded market data; the data it extracts is output, according to the message category found by the tag35 filter, to the corresponding one of the three parallel FAST protocol data channels (UA3201, UA3202 and UA3113).
In a preferred embodiment of the invention, the merging and distribution module is a finite state machine whose inputs are the three decoded market data message streams produced by the decoding module. The state machine is in a standby state by default and continuously checks whether the three input channels hold valid data; data that meets the sending condition is output to the UDP data output module in the priority order UA3202, then UA3201, then UA3113.
In a preferred embodiment of the invention, the UDP data output module is connected through a buffered channel, and its output data conforms to the downstream multicast sending rules, producing the UDP output data stream.
In a preferred embodiment of the invention, the UA3201 decoding module and the UA3113 decoding module split the decoding mechanism into two pipeline stages that run in parallel, field segmentation and state machine parsing:
the field segmentation pipeline receives the FAST-encoded binary market data stream, segments fields according to the FAST stop-bit encoding rule at a rate of four bytes of data per clock cycle, processing the four bytes in parallel, and sends the segmented fields to the state machine parsing pipeline;
the state machine parsing pipeline receives the fields output by the field segmentation pipeline, organizes the data field by field and stores it in a data buffer; it then parses the buffered data in real time according to the market data message format to complete the decoding of the FAST message.
In a preferred embodiment of the invention, the UA3202 market data decoding module splits the whole decoding mechanism into three pipeline stages that run in parallel: field segmentation, doubly nested state machine parsing, and quantity/price order operations:
the doubly nested state machine parsing pipeline receives the fields output by the field segmentation pipeline, organizes the data field by field and stores it in a data buffer, and performs real-time state transitions and field parsing on the buffered data according to the market data message format. This yields the three kinds of operation (add, delete, update) on the prices and quantities of the ten bid and ask price levels and on the fifty order quantities at the best bid and ask, and the decoded business fields and their operation information are passed to the next pipeline stage in real time;
the quantity/price order operation pipeline receives the business fields and operation information produced by the doubly nested state machine parsing pipeline, applies the incremental operations to the current price queue and order queue of the security identified by the security code, and stores and manages the resulting full market data in a hash structure.
In a specific embodiment of the invention, the system consists of five parts: the TCP data input module, the market data category filtering module, the market data decoding module, the merging and distribution module and the UDP data output module. The five modules are connected in that order and pass data or control information to one another through channels. The exchange market data stream enters through the TCP data input module and passes in turn through the market data category filtering module, the market data decoding module and the merging and distribution module; finally, the UDP data output module forwards the decoded market data packets by UDP multicast. Data buffering and transfer between modules use OpenCL platform channels. The TCP data input module parses the TCP/IP protocol layer and extracts the exchange's raw application-layer market data. The market data category filtering module parses and filters the application-layer STEP messages and extracts the FAST-encoded data corresponding to transaction data, index data and snapshot data. The market data decoding module parses the several categories of FAST market data in parallel and implements a pipelined decoding core internally. Finally, the merging and distribution module and the UDP data output module aggregate and schedule the decoded messages and multicast them to clients. The invention thus realizes an ultra-low-latency heterogeneous accelerated market data link that is quick to develop and efficient at decoding, combines the respective advantages of the FPGA and the CPU, and supports acceleration technology in the financial field.
Of the five modules of the decoding device, the TCP data input module establishes a TCP connection with the upstream exchange market data gateway (VDE), extracts the application-layer STEP market data by parsing the TCP/IP protocol stack, and sends the data to the filtering module.
The TCP data input module is built on a TCP Offload Engine (TOE) IP core, which provides TCP network communication. The module establishes a TCP connection with the upstream exchange market data gateway (VDE) and obtains the TCP payload market data stream by fully parsing the TCP protocol stack.
The market data category filtering module parses the STEP message, uses application-layer tag 35 to pick out UA3201 transaction messages, UA3202 snapshot messages and UA3113 index messages, and sends the tag 96 content of each of the three message types to the corresponding decoding module for FAST-layer decoding.
The market data category filtering module is responsible for classifying and filtering market data. After receiving the output of the TCP data input module, it classifies and filters the data by category according to the rules of the exchange's STEP protocol and outputs the three categories of market data on three separate paths. Given the tag=value encoding of the STEP protocol, the implementation runs a tag35 filter and a tag96 filter in parallel. The tag35 filter searches for tag 35 in the STEP message to determine the market data category of the current data stream: 35=UA3201 is a transaction message, 35=UA3202 is a snapshot message and 35=UA3113 is an index message. The tag96 filter parses the content of tag 96 in the STEP message to extract the valid FAST-encoded market data. Each batch of data extracted by the tag96 filter is output, according to the message category found by the tag35 filter, to the corresponding one of the three parallel FAST protocol data channels (UA3201, UA3202 and UA3113), which feed the downstream decoding modules.
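The routing step can be sketched in Python as follows. This is a software model only: the queues stand in for the three FAST protocol data channels, and the function names are hypothetical. It assumes that the message type from tag 35 and the payload from tag 96 have already been extracted, and that messages of any other type are simply dropped, which the text does not state explicitly.

```python
from collections import deque


def make_channels():
    """One FIFO per market data category, modelling the three FAST channels."""
    return {"UA3201": deque(), "UA3202": deque(), "UA3113": deque()}


def route(msg_type: str, fast_payload: bytes, channels: dict) -> bool:
    """Push the tag 96 payload onto the channel selected by tag 35.

    Returns False when the category is not one of the three decoded types
    (assumed here to be filtered out).
    """
    queue = channels.get(msg_type)
    if queue is None:
        return False
    queue.append(fast_payload)
    return True


channels = make_channels()
route("UA3202", b"\x81\x82", channels)   # snapshot payload
route("UA3113", b"\x83", channels)       # index payload
dropped = not route("UA9999", b"\x84", channels)  # unknown category
```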
The market data decoding module consists of three decoding modules: a UA3201 decoding module, a UA3202 decoding module and a UA3113 decoding module. They are connected to the UA3201, UA3202 and UA3113 FAST protocol data output channels of the market data category filtering module respectively, decode the FAST market data of their respective categories in parallel, and output the corresponding decoded market data messages.
The market data decoding module is the core of market data decoding. Its three decoding modules receive the FAST-encoded market data output on the UA3201, UA3202 and UA3113 channels and decode their respective categories in parallel. Their main tasks are parsing the FAST presence map (pmap), separating fields and applying field operators, maintaining the dictionary, mapping market data fields and encapsulating the forwarding protocol; each outputs the corresponding decoded market data messages. The decoded messages are gathered in the merging and distribution module, which continuously checks whether the three input channels hold valid data. Once data has arrived and the sending condition is met, the messages are sent to the UDP data output module in the order UA3202, UA3201, UA3113, and that module finally generates the UDP output data stream.
The UA3201 and UA3113 market data decoding modules execute two pipeline stages in parallel: field segmentation and state machine parsing. The field segmentation pipeline receives the FAST-encoded binary market data stream, segments several bytes in parallel according to the FAST stop-bit encoding rule, and sends the segmented fields to the finite state machine decoding pipeline. The finite state machine decoding pipeline organizes the data field by field and stores it in a data buffer, and parses the buffered data in real time according to the message format "pmap + ordinary fields" to complete the decoding of the FAST message.
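The field segmentation stage can be modelled in Python as below. The model consumes the stream in words of four bytes, matching the four-byte data width per clock cycle stated for this pipeline, and carries a partially received field over to the next word. It is a behavioural sketch only: on the FPGA the four stop bits of a word would be examined simultaneously, whereas the inner loop here visits them one after another. The function name is illustrative.

```python
def split_fields(stream: bytes, width: int = 4):
    """Split a FAST byte stream into fields, consuming `width` bytes per step.

    Returns (fields, leftover), where leftover holds the bytes of a field
    whose stop bit has not arrived yet.
    """
    fields = []
    partial = bytearray()
    for start in range(0, len(stream), width):
        word = stream[start : start + width]  # data of one modelled clock cycle
        for byte in word:
            partial.append(byte)
            if byte & 0x80:  # stop bit closes the current field
                fields.append(bytes(partial))
                partial.clear()
    return fields, bytes(partial)


# The first word closes two fields; the third field ends in the second word.
fields, leftover = split_fields(b"\x39\x45\xa3\x81\x01\x82")
```

A field can end at any byte position inside a word, so the stage must keep the unfinished bytes (`partial` in the sketch) between cycles.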
The UA3202 market data decoding module executes three pipeline stages in parallel: field segmentation, nested finite state machine decoding, and quantity/price operations. The field segmentation pipeline works in the same way as that of the UA3201 and UA3113 modules: it segments several bytes in parallel and outputs the segmented fields. The doubly nested state machine parsing pipeline organizes the data field by field and stores it in a data buffer; then, according to the message format "pmap + ordinary fields + first-level nested fields + second-level nested fields", it performs real-time state transitions and field parsing on the buffered data, obtains the operations (add, delete, update) on the prices and quantities of the ten price levels and on the fifty order quantities at the best price level, and outputs the decoded business fields and their operation information in real time. The quantity/price operation pipeline applies the incremental operations to the current price queue and order queue of the security identified by the security code, and stores and manages the resulting full market data by hashing in the RAM built into the FPGA board.
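The quantity/price operation stage can be sketched in Python as an incremental update of a per-security price book held in a hash map. The data layout is an assumption: price levels are keyed by price, a Python dict stands in for the hashed on-chip RAM, and the security code and the values are made up. How the exchange increments actually address a level is not specified in the text.

```python
ADD, DELETE, UPDATE = "add", "delete", "update"


def apply_price_op(books, security_id, side, op, price, qty=0):
    """Apply one incremental operation to the cached snapshot of a security.

    `books` maps security code -> {"bid": {price: qty}, "ask": {price: qty}}.
    """
    levels = books.setdefault(security_id, {"bid": {}, "ask": {}})[side]
    if op == DELETE:
        levels.pop(price, None)
    elif op in (ADD, UPDATE):
        levels[price] = qty
    else:
        raise ValueError(f"unknown operation {op!r}")
    return levels


def top_levels(levels, side, depth=10):
    """Best `depth` levels: highest prices first for bids, lowest for asks."""
    prices = sorted(levels, reverse=(side == "bid"))
    return [(p, levels[p]) for p in prices[:depth]]


books = {}
apply_price_op(books, "600000", "bid", ADD, 1050, 300)
apply_price_op(books, "600000", "bid", ADD, 1049, 200)
apply_price_op(books, "600000", "bid", UPDATE, 1050, 500)
apply_price_op(books, "600000", "bid", DELETE, 1049)
best_bids = top_levels(books["600000"]["bid"], "bid")
```

Because every increment modifies cached state, the full snapshot of each security must stay resident and quickly addressable, hence the hashed storage by security code.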
The merging and distribution module is implemented as a finite state machine whose inputs are the three decoded market data message streams produced by the decoding module. The state machine is in a standby state by default and continuously checks whether the three input channels hold valid data. Once data has arrived and the sending condition is met, it outputs the corresponding messages in the priority order UA3202, UA3201, UA3113. The output data width is 16 bytes per clock cycle, and the output is sent to the UDP data output module.
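The priority scheduling can be modelled with a short Python sketch. It assumes that the sending condition is simply a complete message waiting in an input queue, which the text does not spell out, and it models the 16-byte output width by cutting each message into 16-byte beats. The names are illustrative.

```python
from collections import deque

PRIORITY = ("UA3202", "UA3201", "UA3113")  # order stated for the merge module
BEAT_BYTES = 16                            # output width per clock cycle


def arbiter_step(inputs):
    """One pass of the merge state machine.

    Returns (channel, beats) for the highest-priority channel holding a
    message, or None when all channels are empty (standby state).
    """
    for name in PRIORITY:
        if inputs[name]:
            msg = inputs[name].popleft()
            beats = [msg[i : i + BEAT_BYTES] for i in range(0, len(msg), BEAT_BYTES)]
            return name, beats
    return None


inputs = {
    "UA3201": deque(),
    "UA3202": deque([b"A" * 16]),
    "UA3113": deque([b"B" * 20]),
}
first = arbiter_step(inputs)   # snapshot message wins
second = arbiter_step(inputs)  # then the index message, sent as two beats
third = arbiter_step(inputs)   # nothing left: back to standby
```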
The UDP data output module is built on a UDP Offload Engine (UOE) IP core and handles UDP network communication. Its input is the output of the merging and distributing module, to which it is connected through a buffered channel; its output data conforms to the downstream multicast sending rules and forms the UDP output data stream.
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in detail below.
First, the overall architecture for market decoding and forwarding shown in fig. 1 is designed around the characteristics of the Shanghai Stock Exchange Level-2 market data transmission protocol. The TCP data input, market category filtering, market decoding, merging and distributing, and UDP data output modules are connected in sequence, and data or control information is passed between the modules through channels.
Second, the core market decoding module adopts a multi-category parallel processing strategy: the three streams of FAST market data filtered out of the STEP messages are each fed to the corresponding decoding module and decoded in parallel. This three-way parallel arrangement keeps the decoding of each market data category independent and uses the FPGA's spatial parallelism to reduce the time that serial processing would otherwise cost.
Third, the core market decoding module adopts a multi-stage pipeline design in which the task of each pipeline stage is clearly delimited, which keeps the decoding logic clear and flexible. Taking UA3202 message decoding as an example, as shown in fig. 2/3, the decoding module is internally divided into three pipeline stages: FAST field segmentation, state machine double-nested parsing, and volume/price order operation. The field segmentation pipeline segments the FAST-encoded content of tag 96 sent by the filtering module according to the stop-bit encoding rule, and organizes and stores the data stream field by field. The state machine, based on the segmented fields and the current decoding state, reads and maps the pmap or unrolls the two levels of nested loops, and matches the fields taken from the previous stage against the template operators to recover their meaning at the exchange service layer. Once the FAST-decoded service fields are obtained, the third stage performs incremental operations on the current price queue and order queue according to the security code and stores the resulting full quotation.
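The stop-bit rule used by the field segmentation stage can be illustrated with a short sketch. In FAST, each byte carries seven data bits, and the most significant bit is set on the last byte of a field. The function names `split_fields` and `decode_uint` are hypothetical, and the code processes one byte at a time, whereas the hardware pipeline described here handles several bytes per clock cycle in parallel.

```python
def split_fields(data: bytes):
    """Split a FAST byte stream into fields.

    A byte whose high bit (0x80) is set is the last byte of a field.
    Returns the complete fields and any trailing bytes that have not
    yet been terminated by a stop bit.
    """
    fields, current = [], bytearray()
    for byte in data:
        current.append(byte)
        if byte & 0x80:  # stop bit: the field ends with this byte
            fields.append(bytes(current))
            current = bytearray()
    return fields, bytes(current)


def decode_uint(field: bytes) -> int:
    """Decode one stop-bit-encoded field as an unsigned integer (7 data bits per byte)."""
    value = 0
    for byte in field:
        value = (value << 7) | (byte & 0x7F)
    return value
```

For example, the bytes `0x81 0x01 0x82` split into the two fields `0x81` and `0x01 0x82`, which decode to the unsigned values 1 and 130.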
For UA3201 tick-by-tick trade messages and UA3113 index messages, the corresponding FAST templates involve neither two-level nesting nor incremental maintenance of application-layer price or order queues, so the decoded service fields can be obtained in the core decoding module by stop-bit field segmentation and state machine parsing alone.
For UA3202 snapshot quotations, the application-layer protocol includes operations (add/delete/update) on the prices and quantities of the ten bid/ask price levels and on the fifty orders at the first price level, so this information must be stored, indexed by security code, while the snapshot is decoded. Specifically, after the decoding module has decompressed the FAST fields, it uses the security code to look up the associated price or quantity, performs the incremental operation, and writes the result back to storage. Snapshot decoding therefore involves frequent loads from and stores to on-chip RAM, and how RAM is used on the OpenCL platform becomes key to both the correctness and the efficiency of decoding. Three aspects of the design, RAM creation, RAM reading and writing, and RAM space utilization, are described in detail below:
a) RAM creation
When the RAM that stores the market information fields of each security is created, accesses to the different market information fields of any given security code must be independent of one another. On the OpenCL platform, this means allocating separate Load/Store logic units to the market fields that need to be read and written at the same time. As shown in fig. 4, after the kernel compute unit is compiled by the platform compiler there are three sets of Load/Store logic units, through which four RAM banks are read and written. Bank_0 and Bank_1 can each be loaded or stored independently at the same time (stall-free), whereas Bank_2 and Bank_3 share a Load/Store unit; if the logic needs to access Bank_2 and Bank_3 simultaneously, the RAM access stalls and access efficiency drops.
b) RAM read-write
For incremental snapshot quotations, the application-layer update logic first reads the quotation fields of the previous time slice from RAM and writes them back to RAM after updating. The platform compiler, however, imposes default constraints on the order of RAM reads and writes: to guarantee that data is valid when RAM is read, compilation of the OpenCL code automatically generates protection circuitry that ensures a read takes place only after the preceding write. This default behavior both consumes additional logic resources and lowers overall decoding efficiency. In snapshot decoding, the ordering of market data arrival times already guarantees that data is valid when RAM is read. A suitable compiler directive can therefore be used to stop the compiler from automatically enforcing this read/write ordering, which raises the board's operating frequency and reduces decoding latency.
c) RAM space utilization
The market information fields needed for incremental snapshot operations require a large amount of RAM, which is a considerable challenge for the board's RAM resources. Taking the mid-range Intel development board used in the invention as an example, the board has 5.25 MB of RAM in total. After subtracting the fixed storage occupied by the OpenCL platform and the storage required by the TOE/UOE IP cores, less than 2 MB remains, while storing the quotation of one security takes about 800 bytes, so in theory a board can hold snapshot quotations for at most about two code segments. Stocks alone span five code segments (600/602/603/605/688), which means that 2 boards would be needed just to decode the exchange's stock snapshots, and more to support snapshots of funds and bonds. Optimizing RAM space utilization is therefore very important for cost, system deployment, and ease of operation and maintenance.
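The capacity estimate above can be reproduced with a few lines of arithmetic. The sketch takes 2 MB as an upper bound for the remaining RAM (the text says "less than 2 MB") and assumes that one code segment, such as 600xxx, spans 1000 security codes; that segment size is an assumption and is not stated in the description.

```python
import math

AVAILABLE_RAM_BYTES = 2 * 1024 * 1024  # upper bound; the text says "less than 2 MB"
BYTES_PER_SECURITY = 800               # approximate storage per security
CODES_PER_SEGMENT = 1000               # assumption: one segment covers codes xxx000 to xxx999
STOCK_SEGMENTS = 5                     # 600/602/603/605/688

# Number of securities whose quotation fits on one board.
securities_per_board = AVAILABLE_RAM_BYTES // BYTES_PER_SECURITY

# Whole code segments that fit on one board.
segments_per_board = securities_per_board // CODES_PER_SEGMENT

# Boards needed for all stock segments if codes are packed densely.
boards_needed = math.ceil(STOCK_SEGMENTS * CODES_PER_SEGMENT / securities_per_board)
```

Under these assumptions one board holds about 2600 securities, or two full code segments, and two boards are needed for the five stock segments when the codes are packed densely.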
Exploiting the segmented allocation of security codes, the invention selects a suitable hash mapping function that converts security codes scattered over discrete addresses into a mapping onto contiguous storage addresses (as shown in fig. 5), so that the board's available storage is fully used and market information for more code segments can be held. With this optimization, snapshot quotations on the board in use currently cover all of the exchange's stocks, ETF funds, convertible bonds, and some LOF funds.
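One possible form of such a mapping is sketched below. The description does not disclose the actual hash function, so this is only an illustration of the idea: each supported three-digit code prefix is assigned a contiguous block of slots. The names `SEGMENT_PREFIXES`, `SEGMENT_BASE`, and `ram_slot`, and the segment size of 1000 codes, are assumptions; only the five stock prefixes are taken from the text.

```python
# Stock code prefixes named in the description; fund and bond prefixes
# would be appended in the same way (they are not listed in the text).
SEGMENT_PREFIXES = (600, 602, 603, 605, 688)
CODES_PER_SEGMENT = 1000  # assumption: the last three digits select a code within a segment

# Base slot of each segment, so that supported segments are packed back to back.
SEGMENT_BASE = {prefix: i * CODES_PER_SEGMENT for i, prefix in enumerate(SEGMENT_PREFIXES)}


def ram_slot(security_code: str) -> int:
    """Map a six-digit security code to a dense slot index.

    Raises KeyError for a code whose prefix is not in a supported segment.
    """
    prefix, offset = int(security_code[:3]), int(security_code[3:])
    return SEGMENT_BASE[prefix] + offset
```

Indexing RAM directly by the numeric code would leave the address range between 600000 and 688999 mostly unused, whereas the dense mapping needs only 5000 slots for the five stock segments.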
With the system of the invention for hardware decoding of exchange Level-2 FAST market data on the OpenCL platform, the high-level programming model of the heterogeneous platform shortens the development cycle: compared with a traditional register-level, pure-hardware design, development time is shortened by at least half a year on average. With targeted optimization of the FPGA-side OpenCL logic, based on the hardware characteristics and on how the platform compiler works, an ultra-low-latency heterogeneous market data link with efficient decoding can be realized, with decoding latency as low as about 1 µs.
In this specification, the invention has been described with reference to specific embodiments thereof. It will, however, be evident that various modifications and changes may be made thereto without departing from the broader spirit and scope of the invention. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense.

Claims (9)

1. A system for realizing hardware decoding processing aiming at exchange level-2FAST market based on an OpenCL platform is characterized by comprising:
the TCP data input module is used for receiving the exchange market data stream, analyzing a TCP or IP protocol layer and extracting the original market application layer data of the exchange;
the market classification filtering module is connected with the TCP data input module and is used for classifying and filtering market data to acquire FAST coded data corresponding to the transaction market, the index market and the snapshot market respectively;
the market decoding module is connected with the market category filtering module and is used for parsing the FAST market data of each category in parallel, implementing a pipelined decoding core;
the merging and distributing module is connected with the market decoding module and is used for aggregating and scheduling messages;
and the UDP data output module is connected with the merging and distributing module and is used for carrying out multicast sending of the market information packet.
2. The system for implementing hardware decoding processing for level-2FAST quotation of an exchange based on an OpenCL platform according to claim 1, wherein the quotation decoding module comprises a UA3201 decoding module, a UA3202 decoding module, and a UA3113 decoding module, which are respectively connected to the three FAST protocol data output channels, decode the FAST protocol quotation of each category in parallel, and output the corresponding decoded quotation messages.
3. The system for realizing hardware decoding processing for exchange level-2FAST quotation based on the OpenCL platform as claimed in claim 1, wherein the TCP data input module establishes a TCP connection with an upstream exchange quotation gateway VDE, and obtains a TCP load quotation data stream by parsing a TCP protocol stack.
4. The system for realizing hardware decoding processing for exchange level-2FAST quotation based on the OpenCL platform as claimed in claim 1, wherein the quotation category filtering module performs classification filtering of quotation categories according to exchange quotation STEP protocol rules, and outputs three categories of quotation data in three ways.
5. The system for realizing hardware decoding processing for exchange level-2FAST market based on the OpenCL platform according to claim 1, wherein the market category filtering module adopts a tag35 filter and a tag96 filter working in parallel; the tag35 filter searches for tag 35 in the STEP message to determine the specific market category of the current market data stream; the tag96 filter parses the content of tag 96 in the STEP message to extract the valid FAST-encoded market data in the market message, and the market data filtered by the tag96 filter is output in parallel to the three FAST protocol data channels UA3201, UA3202 and UA3113 according to the message category found by the tag35 filter.
6. The system for implementing hardware decoding processing for level-2FAST quotation of a trading exchange based on an OpenCL platform according to claim 1, wherein the merging and distributing module employs a finite state machine, the input of the state machine is a three-way decoded quotation message generated by the decoding module, the state machine is in a standby state by default, the validity of channel data of the three-way quotation message input channels is continuously detected, and data meeting the sending conditions are output to the UDP data output module according to the priority order of UA3202 message sending, UA3201 message sending and UA3113 message sending.
7. The system for realizing hardware decoding processing for exchange level-2FAST market quotation based on the OpenCL platform according to claim 1, wherein the UDP data output module performs cache connection through a channel, and the output data meets a downstream multicast sending data rule to generate a UDP output data stream.
8. The system according to claim 2, wherein the UA3201 and UA3113 decoding modules split the decoding mechanism into two pipeline stages executing in parallel, field segmentation and state machine parsing,
the field segmentation pipeline receives FAST-encoded binary market data and, at a rate of four bytes of data per clock cycle, segments the four bytes in parallel according to the FAST protocol stop-bit encoding rule, outputs the segmented fields, and sends them to the state machine parsing pipeline;
the state machine parsing pipeline receives the fields output by the field segmentation pipeline, organizes the data field by field, and stores it in a data cache space; and, according to the market message format, parses the data in the data cache space in real time to complete the decoding of the FAST message.
9. The system for implementing hardware decoding processing for exchange level-2FAST market based on OpenCL platform according to claim 2, wherein said UA3202 market decoding module splits the whole decoding mechanism into three pipeline stages executing in parallel: field segmentation, state machine double-nested parsing, and volume/price order operation,
the state machine double-nested parsing pipeline receives the fields output by the field segmentation pipeline, organizes the data field by field, stores it in a data cache space, performs real-time state transitions and field parsing on the data in the data cache space according to the market message format to obtain the add, delete, or update operations on the prices and quantities of the ten bid/ask price levels and on the fifty orders at the first price level, and passes the decoded service fields and the corresponding operation information to the next pipeline stage in real time;
the volume/price order operation pipeline receives the service fields and the corresponding operation information generated by the state machine double-nested parsing pipeline, performs incremental operations on the current price queue and order queue according to the security code, and stores and manages the resulting full quotation by hashing.
CN202011605658.7A 2020-12-29 2020-12-29 System for realizing hardware decoding processing of exchange level-2FAST market based on OpenCL platform Pending CN112650499A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011605658.7A CN112650499A (en) 2020-12-29 2020-12-29 System for realizing hardware decoding processing of exchange level-2FAST market based on OpenCL platform

Publications (1)

Publication Number Publication Date
CN112650499A true CN112650499A (en) 2021-04-13

Family

ID=75363993

Country Status (1)

Country Link
CN (1) CN112650499A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110633493A (en) * 2019-08-06 2019-12-31 杨涛 OpenCL transaction data processing method based on Intel FPGA
CN111147462A (en) * 2019-12-17 2020-05-12 苏州浪潮智能科技有限公司 FPGA-based step protocol analysis method, system, terminal and storage medium
CN111600731A (en) * 2020-07-27 2020-08-28 南京艾科朗克信息科技有限公司 System and method for rapidly processing futures market gears
CN111967244A (en) * 2020-07-30 2020-11-20 浪潮(北京)电子信息产业有限公司 FAST protocol decoding method, device and equipment based on FPGA

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
CHQS52: "上海证券交易所Level-2 行情发布系统接口说明书", pages 1 - 43, Retrieved from the Internet <URL:https://max.book118.com/html/2015/1101/28282548.shtm> *
邹经纬等: "OPENCL开发FPGA应用的路径探索", 《交易技术前沿》, no. 30, pages 51 - 55 *
邹经纬等: "基于OpenCL开发的深交所Binary协议行情解码", 《交易技术前沿》, no. 39, 17 September 2020 (2020-09-17), pages 59 - 66 *
金乐人等: "FPGA技术在极速交易场景的应用示范", 《交易技术前沿》, no. 39, pages 94 - 100 *

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113132478A (en) * 2021-04-16 2021-07-16 国泰君安证券股份有限公司 System for realizing Binary protocol market accelerated decoding in security trading system based on OpenCL
CN113193974A (en) * 2021-07-02 2021-07-30 深圳华云信息系统有限公司 Multicast-based market information pushing method, system, equipment and medium
CN114443566A (en) * 2022-01-24 2022-05-06 北京中科胜芯科技有限公司 Method for judging consistency of incremental snapshot market data
CN114443566B (en) * 2022-01-24 2023-03-24 北京中科胜芯科技有限公司 Method for judging consistency of incremental snapshot market data
CN114995813A (en) * 2022-06-28 2022-09-02 上海中汇亿达金融信息技术有限公司 Exchange API module and related exchange application platform
CN114995813B (en) * 2022-06-28 2023-12-19 上海中汇亿达金融信息技术有限公司 Exchange API module and related exchange application platform
CN115378847A (en) * 2022-08-23 2022-11-22 国联证券股份有限公司 Security market delay measurement system and method
CN115378847B (en) * 2022-08-23 2023-10-31 国联证券股份有限公司 Securities market time delay measuring system and method
CN115174698A (en) * 2022-09-07 2022-10-11 深圳华锐分布式技术股份有限公司 Market data decoding method, device, equipment and medium based on table entry index
CN115174698B (en) * 2022-09-07 2022-12-13 深圳华锐分布式技术股份有限公司 Market data decoding method, device, equipment and medium based on table entry index
CN117135231A (en) * 2023-10-26 2023-11-28 上海特高信息技术有限公司 Decompression method of FPGA-based low-delay financial big data stream
CN117135231B (en) * 2023-10-26 2023-12-29 上海特高信息技术有限公司 Decompression method of FPGA-based low-delay financial big data stream

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination