CN113312024A - Option pricing calculation hardware accelerator, accelerator card and computer equipment

Info

Publication number: CN113312024A
Application number: CN202110674306.5A
Authority: CN
Inventors: 黎渊; 戴艺; 陆平静; 欧洋; 常俊胜; 孙岩; 张建民; 徐金波; 罗章; 王子聪; 熊泽宇
Original assignee: National University of Defense Technology
Current assignee: National University of Defense Technology
Priority date: 2021-06-17
Filing date: 2021-06-17
Publication date: 2021-08-27
Anticipated expiration: 2041-06-17
Also published as: CN113312024B

Abstract

The invention discloses an option pricing calculation hardware accelerator, an accelerator card and computer equipment. The option pricing calculation hardware accelerator comprises a Gaussian random number generator, a multiplier m2, an adder a2, an EXP module, a multiplexer MUXA, a multiplexer MUXB, a multiplier m5, a subtracter s0, a comparator, a delay module delay, an accumulator, a shifter and a multiplication array. Through the combination of these components, one Monte Carlo iteration path can be simulated in M clock cycles, so that one option execution price SM can be predicted. To reduce hardware resource consumption and circuit complexity, 64-bit floating-point operations are converted to fixed-point operations. The accelerator card is a card comprising the hardware accelerator, and the computer equipment is provided with the option pricing calculation hardware accelerator. The invention provides hardware acceleration of the Monte Carlo option pricing calculation that runs without stalls and is fully pipelined throughout the calculation, and offers better performance and energy efficiency than CPU and GPU implementations in the same process technology.

Description

Option pricing calculation hardware accelerator, accelerator card and computer equipment

Technical Field

The invention relates to a hardware acceleration technology of Monte Carlo option pricing, in particular to an option pricing calculation hardware accelerator, an accelerator card and computer equipment.

Background

Monte Carlo option pricing is an existing software algorithm. As shown in FIG. 1, its calculation process mainly includes two loops: the inner loop (lines 8 to 11) simulates one random prediction path of the option price; the outer loop (lines 6 to 15) calculates and accumulates the proceeds from all paths, after which the sum of the proceeds is averaged and discounted (lines 16 to 17) to obtain the predicted option price. Implementing this software algorithm model as an FPGA-based hardware accelerator is expected to improve the calculation efficiency of Monte Carlo option pricing. However, current FPGA implementations convert the software algorithm directly into a hardware accelerator; accelerators generated in this way still leave considerable room for optimization and suffer from insufficient performance and energy efficiency.
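For orientation, the two-loop software algorithm described above can be sketched as follows. FIG. 1 is not reproduced in this text, so this is a minimal sketch that assumes the standard geometric Brownian motion formulation for a European call; the function name `mc_european_call` and the numeric inputs are illustrative assumptions, and Python's built-in generator stands in for whatever random source the original pseudo code uses.

```python
import math
import random

def mc_european_call(s0, strike, r, sigma, t, m, iterations, seed=0):
    """Monte Carlo price of a European call (assumed GBM model, illustrative only)."""
    rng = random.Random(seed)
    dt = t / m
    drift = (r - 0.5 * sigma * sigma) * dt   # deterministic part of each time slice
    sigqrdt = sigma * math.sqrt(dt)          # scale applied to each Gaussian draw
    sum_payoff = 0.0
    for _ in range(iterations):              # outer loop: one simulated path per pass
        s = s0
        for _ in range(m):                   # inner loop: M time slices of one path
            z = rng.gauss(0.0, 1.0)
            s *= math.exp(drift + sigqrdt * z)
        sum_payoff += max(s - strike, 0.0)   # proceeds of this path
    # average over all paths, then discount back to the present
    return math.exp(-r * t) * sum_payoff / iterations

price = mc_european_call(100.0, 100.0, 0.05, 0.2, 1.0, 16, 20000)
```

With these illustrative inputs the estimate should land near the closed-form Black-Scholes value of roughly 10.45, within Monte Carlo sampling error.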

Disclosure of Invention

The technical problem to be solved by the invention is as follows: in view of the problems in the prior art, the invention provides an option pricing calculation hardware accelerator, an accelerator card and computer equipment in which the hardware-accelerated Monte Carlo option pricing calculation runs without stalls and is fully pipelined throughout the calculation, and which have better energy efficiency and performance than CPU and GPU implementations in the same process technology.

In order to solve the technical problems, the invention adopts the technical scheme that:

an option pricing calculation hardware accelerator comprising a first circuit unit for completing, in M time slices, one Monte Carlo simulation of the option execution price SM, the first circuit unit comprising:

a Gaussian random number generator for generating a Gaussian random number z;

a multiplier m2 for multiplying an input parameter sigqrdt by the Gaussian random number z, where sigqrdt = sigma * sqrt(T/M), sigma is a preset option price volatility, T is a preset option validity period, and M is a preset number of time slices simulated in each Monte Carlo iteration;

an adder a2 for summing an input parameter drift and the output of the multiplier m2, where drift = (r - 0.5 * sigma * sigma) * (T/M) and r is a preset risk-free interest rate;

an EXP module for performing an exponential operation on the output of the adder a2;

a multiplexer MUXA for selecting among the option initial price S0, the constant 1 and the multiplication result acc_sm output by the multiplier m5; the multiplexer MUXA selects the option initial price S0 at the (dg + dm2 + da2 + de)-th clock cycle after reset to wait for the first Gaussian random number z0 to enter the multiplier m5, then selects the option initial price S0 in the next cycle, the constant 1 in the following 3 cycles, and the multiplication result acc_sm output by the multiplier m5 in the following M-4 cycles; these three selection operations are repeated I times to simulate all Monte Carlo simulation paths, where I is the preset number of Monte Carlo iterations, dg denotes the number of delay clock cycles of the Gaussian random number generator, dm2 that of the multiplier m2, da2 that of the adder a2, and de that of the EXP module;

a multiplier m5 for multiplying the output of the EXP module by the output of the multiplexer MUXA to obtain the multiplication result acc_sm, which is output to the multiplication array;

and a multiplication array for multiplying together the 4 intermediate products that remain distributed across the four-stage pipeline of the multiplier m5 after the M time slices of one random path have been simulated, and for outputting, after 3 + 2dma beats, the final option price SM predicted by this Monte Carlo iteration, where dma is the number of delay clock cycles of each multiplier in the array.
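The arithmetic performed by the components listed above for a single time slice can be sketched in software as follows. This is a floating-point sketch only, not the fixed-point circuit; it assumes the standard geometric Brownian motion step, in which drift equals (r - 0.5 * sigma * sigma) * (T/M), and the helper names `precompute` and `time_slice` are hypothetical.

```python
import math

def precompute(r, sigma, t, m):
    """Host-side computation of the two input parameters (assumed GBM form)."""
    dt = t / m
    sigqrdt = sigma * math.sqrt(dt)
    drift = (r - 0.5 * sigma * sigma) * dt
    return sigqrdt, drift

def time_slice(acc, z, sigqrdt, drift):
    """One time slice: multiplier m2, adder a2, EXP module, multiplier m5."""
    m2_out = sigqrdt * z          # multiplier m2
    a2_out = drift + m2_out       # adder a2
    exp_out = math.exp(a2_out)    # EXP module
    return exp_out * acc          # multiplier m5 (acc comes from MUXA)

# Illustrative values only: r = 5 %, sigma = 20 %, T = 1 year, M = 16 slices.
sigqrdt, drift = precompute(0.05, 0.2, 1.0, 16)
s1 = time_slice(100.0, 0.0, sigqrdt, drift)   # z = 0 isolates the drift term
```

Chaining `time_slice` M times, starting from S0, yields the option execution price SM of one simulated path.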

Optionally, the multiplication array comprises:

a multiplier ma0 for multiplying the output of the multiplier m5 by the same output delayed through a one-stage register;

a multiplier ma1 for multiplying the output of the multiplier m5 delayed through a two-stage register by the same output delayed through a three-stage register;

and a multiplier ma2 for multiplying the outputs of the multipliers ma0 and ma1 to obtain the final option price SM predicted by the Monte Carlo iteration.
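Why four intermediate products arise, and why multiplying them recovers SM, can be illustrated with a small behavioural model. The sketch below assumes that the multiplier m5 has a four-cycle latency, so that its feedback value acc_sm is the product issued four cycles earlier; the function name `simulate_path` and the input factors are illustrative and not taken from the patent.

```python
import math
from collections import deque

PIPE_DEPTH = 4  # assumed latency of multiplier m5, in clock cycles

def simulate_path(s0, exp_outputs):
    """Behavioural model of MUXA, the m5 pipeline and the multiplication array."""
    if len(exp_outputs) < PIPE_DEPTH:
        raise ValueError("need at least PIPE_DEPTH time slices")
    pipe = deque([1.0] * PIPE_DEPTH)       # products currently in flight in m5
    for t, exp_out in enumerate(exp_outputs):
        acc_sm = pipe.popleft()            # product leaving m5 in this cycle
        if t == 0:
            mux = s0                       # MUXA selects S0
        elif t < PIPE_DEPTH:
            mux = 1.0                      # MUXA selects the constant 1
        else:
            mux = acc_sm                   # MUXA selects the feedback acc_sm
        pipe.append(exp_out * mux)         # new product enters m5
    p0, p1, p2, p3 = pipe                  # four interleaved partial products
    ma0 = p3 * p2                          # newest output and its 1-stage delay
    ma1 = p1 * p0                          # 2-stage and 3-stage delays
    return ma0 * ma1                       # multiplier ma2

factors = [1.0 + 0.01 * k for k in range(10)]
sm = simulate_path(100.0, factors)
```

Because each of the four interleaved products only ever depends on a value issued four cycles earlier, the multiplier never has to wait for its own result, which is the property the text describes as running without stalls.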

Optionally, the multiplication array further includes an output terminal for an available signal valid. The available signal valid is asserted at the (dg + dm2 + da2 + de + M + 3 + 2dma)-th clock cycle after reset and thereafter for one beat every M time slices, where dg represents the number of delay clock cycles of the Gaussian random number generator, dm2 that of the multiplier m2, da2 that of the adder a2, and de that of the EXP module, M represents the number of time slices simulated in each Monte Carlo iteration, the multipliers ma0, ma1 and ma2 each have dma delay clock cycles, and 2dma denotes twice dma.
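The timing rule for the available signal valid can be written out as a short calculation. The latency values passed in below are made-up placeholders, since this text does not give the actual delay of each component; `valid_cycles` is a hypothetical helper name.

```python
def valid_cycles(dg, dm2, da2, de, m, dma, iterations):
    """Clock cycles (counted from reset) at which valid is asserted for one beat."""
    first = dg + dm2 + da2 + de + m + 3 + 2 * dma
    return [first + k * m for k in range(iterations)]

# Placeholder latencies, chosen only to make the arithmetic easy to follow.
cycles = valid_cycles(dg=10, dm2=2, da2=1, de=6, m=16, dma=2, iterations=3)
# first = 10 + 2 + 1 + 6 + 16 + 3 + 4 = 42, then one result every 16 cycles
```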

Optionally, the gaussian random number generator comprises:

a uniformly distributed random number generation module for generating uniformly distributed random numbers URNs using a WELL19937 method;

a Box-Muller module for converting the uniformly distributed random numbers URNs to Gaussian random numbers z using the Box-Muller method.
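The Box-Muller conversion itself is compact and can be sketched as below. Python's standard library does not provide WELL19937, so the built-in Mersenne Twister generator is used here purely as a stand-in for the uniform source; the function names are illustrative.

```python
import math
import random

def box_muller(u1, u2):
    """Map two uniform samples (u1 in (0, 1], u2 in [0, 1)) to two Gaussian samples."""
    radius = math.sqrt(-2.0 * math.log(u1))
    angle = 2.0 * math.pi * u2
    return radius * math.cos(angle), radius * math.sin(angle)

def gaussian_stream(count, seed=0):
    """Generate `count` Gaussian samples; the uniform source is a stand-in, not WELL19937."""
    rng = random.Random(seed)
    out = []
    while len(out) < count:
        u1 = 1.0 - rng.random()   # shift to (0, 1] so that log(u1) is defined
        u2 = rng.random()
        out.extend(box_muller(u1, u2))
    return out[:count]

samples = gaussian_stream(20000)
mean = sum(samples) / len(samples)
var = sum((x - mean) ** 2 for x in samples) / len(samples)
```

The transform contains no data-dependent branch or retry, which is why a hardware version can have a fixed-latency datapath.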

Optionally, the pipeline output end of the first circuit unit is further connected to a second circuit unit for processing the outer-loop data path of the Monte Carlo option pricing calculation, and the second circuit unit includes:

a subtraction unit s0 for subtracting the strike price strike of the option from the option price SM predicted by the current Monte Carlo iteration;

a comparator for comparing 0 with the output of the subtraction unit s0 and outputting a control signal;

a multiplexer MUXB for selecting 0 or the output of the subtraction unit s0 as its output according to the output of the comparator: if the output of the subtraction unit s0 is greater than or equal to 0, the comparator outputs 1 and the multiplexer MUXB selects the output of the subtraction unit s0; otherwise the multiplexer MUXB selects 0;

a delay unit delay for generating an enable signal en from the available signal valid of the multiplication array;

and an accumulator for accumulating the output of the multiplexer MUXB under the control of the enable signal en to obtain an accumulated value sum_payoff of the option proceeds predicted by all Monte Carlo simulation paths.
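The behaviour of the second circuit unit can be sketched as a simple loop. This is a behavioural illustration in floating point; the function name `accumulate_payoffs`, the sample prices and the per-sample enable flags are assumptions made for the example.

```python
def accumulate_payoffs(sm_values, enables, strike):
    """Model of s0, the comparator, MUXB and the accumulator gated by en."""
    sum_payoff = 0.0
    for sm, en in zip(sm_values, enables):
        diff = sm - strike                  # subtraction unit s0
        sel = 1 if diff >= 0 else 0         # comparator output
        muxb_out = diff if sel else 0.0     # multiplexer MUXB
        if en:                              # enable derived from the delayed valid
            sum_payoff += muxb_out          # accumulator
    return sum_payoff

# The last sample is ignored because its enable flag is low.
total = accumulate_payoffs([105.0, 98.0, 110.0, 120.0],
                           [True, True, True, False], 100.0)
```

Gating the accumulator with en keeps values that are still in flight in the multiplication array from being added before they represent a finished path.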

Optionally, the pipeline output end of the second circuit unit is further connected to a third circuit unit for generating a final option estimated price from the accumulated value sum_payoff of the option proceeds predicted by all Monte Carlo simulation paths.

Optionally, the third circuit unit includes:

a shift unit for shifting the accumulated value sum_payoff of the option proceeds predicted by all Monte Carlo simulation paths so as to implement the division;

and a multiplier m7 for multiplying the output of the shift unit by the discount rate ert to obtain the final proceeds payoff, where ert = exp(-r * T), r is a preset risk-free interest rate, and T is a preset option validity period.
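The averaging-by-shift and discounting steps can be illustrated with integer arithmetic. The sketch assumes that the number of iterations I is a power of two, as the detailed description states, and that sum_payoff is held as an integer with a known number of fractional bits; `final_price` and the numbers used are illustrative.

```python
import math

def final_price(sum_payoff_fixed, frac_bits, iterations, r, t):
    """Shift unit (divide by I) followed by multiplier m7 (discount by ert)."""
    if iterations <= 0 or iterations & (iterations - 1):
        raise ValueError("iterations must be a power of two")
    shift = iterations.bit_length() - 1        # log2(I)
    mean_fixed = sum_payoff_fixed >> shift     # shift unit: division by I
    ert = math.exp(-r * t)                     # discount rate ert = exp(-r * T)
    return (mean_fixed / (1 << frac_bits)) * ert   # multiplier m7

# 8192.0 accumulated over I = 1024 paths, stored with 4 fractional bits.
price = final_price(int(8192.0 * 16), 4, 1024, 0.05, 1.0)
```

A right shift truncates toward negative infinity, so a real design would need enough fractional bits for the truncation error to be acceptable.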

In addition, the invention also provides a hardware accelerator card, which comprises an accelerator card body and an accelerator chip arranged on the accelerator card body, wherein the accelerator chip is the option pricing calculation hardware accelerator.

In addition, the invention also provides computer equipment which comprises a mainboard provided with a microprocessor and a memory which are connected with each other, and is characterized by also comprising the option pricing calculation hardware accelerator, wherein the microprocessor and the option pricing calculation hardware accelerator are in communication connection.

Optionally, the option pricing calculation hardware accelerator is integrated on the motherboard, or the option pricing calculation hardware accelerator is installed on the motherboard as a plug-in card.

Compared with the prior art, the invention has the following advantages:

1. The invention comprises a Gaussian random number generator, a multiplier m2, an adder a2, an EXP module, a multiplexer MUXA, a multiplexer MUXB, a multiplier m5, a subtracter s0, a comparator, a delay module delay, an accumulator, a shifter and a multiplication array. Through the combination of these components, one Monte Carlo simulation of the option execution price SM is completed in M time slices, and 64-bit floating-point operations are converted into fixed-point operations, which effectively improves the efficiency of the accelerated calculation. The hardware-accelerated Monte Carlo option pricing calculation runs without stalls and is fully pipelined throughout the calculation, and has better energy efficiency and performance than CPU and GPU implementations in the same process technology.

2. After the M time slices of one random path have been simulated, 4 intermediate products remain distributed across the four-stage pipeline of the multiplier m5. By introducing the multiplication array, these four intermediate products enter the array in sequence, and the final option price SM predicted by the Monte Carlo iteration is obtained at the output of the array after 3 + 2dma beats, so that the simulation never stalls and the whole calculation process is fully pipelined.

3. The invention converts 64-bit floating point operation into fixed point operation, which can effectively improve the efficiency of accelerated calculation.

Drawings

Fig. 1 is a pseudo code diagram of a software implementation of a conventional monte carlo option pricing algorithm.

Fig. 2 is a schematic circuit diagram of an option pricing calculation hardware accelerator according to an embodiment of the present invention.

Fig. 3 is a diagram comparing the performance and power consumption of the option pricing calculation hardware accelerator according to an embodiment of the present invention with those of CPU and GPU implementations.

Detailed Description

As shown in fig. 2, the option pricing calculation hardware accelerator of the embodiment includes a first circuit unit for completing monte carlo simulation of the option execution price SM once in M time slices, and the first circuit unit includes:

a Gaussian random number generator for generating a Gaussian random number z;

It should be noted that the bold characters or numbers in fig. 2 indicate the number of delayed clock cycles of the corresponding components, and (x, y) indicate the bit width value of the corresponding operand, where x indicates the bit width of the integer part and y indicates the bit width of the fractional part.

As shown in fig. 2, the multiplication array includes:

As shown in fig. 2, the multiplication array further includes an output terminal for the available signal valid. The available signal valid is asserted at the (dg + dm2 + da2 + de + M + 3 + 2dma)-th clock cycle after reset and thereafter for one beat every M time slices, where dg represents the number of delay clock cycles of the Gaussian random number generator, dm2 that of the multiplier m2, da2 that of the adder a2, and de that of the EXP module, M represents the number of time slices simulated in each Monte Carlo iteration, the multipliers ma0, ma1 and ma2 each have dma delay clock cycles, and 2dma denotes twice dma.

As shown in fig. 2, the gaussian random number generator includes:

The Gaussian random number generator in this embodiment employs the WELL19937 algorithm as the uniformly distributed random number generator. WELL19937 has been shown to be among the most advanced uniform random number generators currently available: it generates high-quality random numbers with an extremely long period of about 2¹⁹⁹³⁷ and has very good distribution characteristics, which helps ensure the correctness of the simulation result. The Box-Muller (BM) method is selected in this embodiment because it generates exact Gaussian samples. Furthermore, unlike rejection methods that include if-else conditions in the datapath (e.g., Ziggurat and Monty Python), the BM method has a fixed datapath, which ensures that a Gaussian random number (GRN) is available every clock cycle. Together these properties ensure the quality of the converted GRNs and the correctness of the final system.

Referring to fig. 2, the execution of the first circuit unit in this embodiment covers the first and second stages of the fully pipelined hardware structure: the first stage is the execution stage of the Gaussian random number generator, and the second stage is the execution stage of the remaining components of the first circuit unit.

In the first stage, the Gaussian random number generator (GRNG) generates Gaussian random numbers (GRNs) to simulate the Wiener process. First, uniform random numbers (URNs) are generated using the WELL19937 method. They are then converted to Gaussian random numbers z by the Box-Muller (BM) method.

In the second stage, one Monte Carlo simulation of the option execution price SM is completed in M time slices, that is, the inner loop of the software algorithm (lines 8 to 11) is accelerated. To reduce complexity, the price volatility formula (line 10) is first expanded into a two-operand static single assignment intermediate representation containing only basic operations. These operations are mapped onto two multiplications, one addition and one EXP module. The parameter shown next to each module is the number of clock cycles that the module needs to complete its calculation. The multiplexer MUXA controls the computation flow: it selects the option initial price S0 at the (dg + dm2 + da2 + de)-th clock cycle after system start-up to wait for the first Gaussian variable z0 to enter the multiplier m5. The multiplexer MUXA then selects the option initial price S0 in the next cycle, the constant 1 in the following 3 cycles, and the multiplication result acc_sm output by the multiplier m5 in the following M-4 cycles. These three selection operations are repeated I times to simulate all Monte Carlo iteration paths. At the M-th clock cycle of each iteration, the accumulated SM is distributed across the four-stage pipeline of the multiplier m5 as four intermediate products. These four intermediate products are then forwarded in turn to the multiplication array to obtain the option price SM predicted by this Monte Carlo iteration. The entire logic of the second stage is fully pipelined, and once the pipeline is full, one simulated path of the option price is calculated every M time slices.
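The repeating selection pattern of the multiplexer MUXA can be expressed as a function of the clock cycle. The sketch assumes that `start` is the cycle in which the first EXP result of a path reaches the multiplier m5 (the text places this around dg + dm2 + da2 + de cycles after start-up); the exact off-by-one alignment is not specified here, and the string labels are illustrative.

```python
def muxa_select(cycle, start, m):
    """Return which MUXA input is selected in a given clock cycle."""
    if cycle < start:
        return "S0"                  # hold S0 while waiting for the first z0
    phase = (cycle - start) % m      # position within the current path
    if phase == 0:
        return "S0"                  # first time slice of a path
    if phase < 4:
        return "ONE"                 # next 3 slices seed the other pipeline slots
    return "ACC_SM"                  # remaining M-4 slices use the m5 feedback

# Two consecutive paths with M = 6 time slices, starting at cycle 2.
schedule = [muxa_select(c, start=2, m=6) for c in range(2, 14)]
```

Because the pattern restarts every M cycles without a gap, a new path begins immediately after the previous one, which matches the statement that one simulated path is produced every M time slices once the pipeline is full.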

As shown in fig. 2, in this embodiment, the pipeline output end of the first circuit unit is further connected to a second circuit unit for processing the outer-loop data path of the Monte Carlo option pricing calculation, and the second circuit unit includes:

the multiplexer MUXB, which selects 0 or the output of the subtraction unit s0 as its output according to the output of the comparator: if the output of the subtraction unit s0 is greater than or equal to 0, the comparator outputs 1 and the multiplexer MUXB selects the output of the subtraction unit s0; otherwise, the multiplexer MUXB selects 0;

Referring to fig. 2, the execution stage of the second circuit unit in this embodiment is the third stage of the hardware structure of the full pipeline.

The third stage processes the data path of the outer loop (lines 6 to 15), where the proceeds are calculated and accumulated from the option price SM predicted by each Monte Carlo iteration in the second stage. The subtraction unit s0 subtracts strike_price from the option price SM predicted by the current Monte Carlo iteration. If SM is greater than or equal to strike_price, the result is forwarded to the accumulator through the multiplexer MUXB; otherwise 0 is forwarded. The enable signal en is connected to the enable input of the accumulator, and a delay is added so that the available signal valid of the multiplication array reaches the accumulator at the same time as the output of the multiplexer MUXB.

As shown in fig. 2, the pipeline output end of the second circuit unit in this embodiment is further connected to a third circuit unit for generating a final option estimated price according to the accumulated value sum_payoff of the option proceeds predicted by all Monte Carlo simulation paths.

As shown in fig. 2, the third circuit unit in the present embodiment includes:

Referring to fig. 2, the execution stage of the third circuit unit in this embodiment is the fourth stage of the fully pipelined hardware structure. The fourth stage generates the final option forecast price. It consists of two operations, one division and one multiplication, which perform averaging and discounting respectively. To reduce complexity, the number of iterations I is set to a power of 2, so that a simple shift unit can implement the division.

In the embodiment, 64-bit floating point operation is converted into fixed point operation, so that the running performance of the hardware accelerator can be improved, hardware resources consumed by the hardware accelerator are reduced, and the design complexity is reduced. In addition, as an optional implementation manner, in this embodiment, operand bit width in the hardware accelerator structure is optimized, and a bit width search of the whole system is performed through a simulated annealing algorithm, so that an operand bit width optimization result (x, y) of each component is obtained as follows:

the Gaussian random number z input to the multiplier m2 has a bit width of (5,19), and the parameter sigqrdt has a bit width of (0,30);

the parameter drift input to the adder a2 has a bit width of (5,24), and the output of the multiplier m2 has a bit width of (5,24);

the output of the adder a2 input to the EXP module has a bit width of (5,24);

the output of the EXP module input to the multiplier m5 has a bit width of (2,14), and the output of the multiplexer MUXA has a bit width of (16,16);

the operands input to the multipliers ma0, ma1 and ma2 have a bit width of (12,12);

the operands input to the subtraction unit s0 have a bit width of (16,4), and its output has a bit width of (16,4);

the output of the accumulator has a bit width of (40,4);

the output of the shift unit has a bit width of (17,7);

the discount rate ert input to the multiplier m7 has a bit width of (2,6), and the output of the multiplier m7 has a bit width of (17,13).
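The effect of an (x, y) format on a value can be illustrated with a small quantization helper. The text does not state how signs, rounding or overflow are handled, so the sketch assumes a signed value with a separate sign bit, round-to-nearest and saturation; `quantize` is a hypothetical name.

```python
def quantize(value, int_bits, frac_bits):
    """Round `value` to an (int_bits, frac_bits) fixed-point grid with saturation.

    Assumptions (not from the source): signed range [-2**int_bits, 2**int_bits),
    round-to-nearest, and clamping instead of wrap-around on overflow.
    """
    scale = 1 << frac_bits
    lowest = -float(1 << int_bits)
    highest = (1 << int_bits) - 1.0 / scale
    clamped = min(max(value, lowest), highest)
    return round(clamped * scale) / scale

q_sig = quantize(0.05, 0, 30)      # e.g. sigqrdt in the (0,30) format
q_sat = quantize(100.0, 5, 19)     # out of range for (5,19): saturates
```

The fractional width y bounds the rounding error at half of 2^-y, while the integer width x determines the largest magnitude that can be represented before saturation.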

The bit-width-optimized structure reduces hardware area overhead and hardware complexity while preserving calculation accuracy. A traditional greedy algorithm accepts only a better solution as the next search state during the whole-system bit width search, whereas the simulated annealing algorithm also accepts a worse solution with a certain probability, so that it can escape a local optimum and approach the global optimum.
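The acceptance rule that distinguishes simulated annealing from a greedy search can be sketched as follows. The cost function, cooling schedule and parameter values are toy assumptions chosen for illustration; the text does not describe the cost model or schedule actually used for the bit width search.

```python
import math
import random

def anneal(cost, start, lo, hi, steps=2000, t0=1.0, alpha=0.995, seed=0):
    """Generic simulated annealing over a vector of integer bit widths."""
    rng = random.Random(seed)
    current, current_cost = list(start), cost(start)
    best, best_cost = list(current), current_cost
    temp = t0
    for _ in range(steps):
        cand = list(current)
        i = rng.randrange(len(cand))                     # perturb one operand width
        cand[i] = min(hi, max(lo, cand[i] + rng.choice((-1, 1))))
        cand_cost = cost(cand)
        delta = cand_cost - current_cost
        # Greedy would require delta <= 0; annealing also accepts a worse
        # candidate with probability exp(-delta / temp).
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, current_cost = cand, cand_cost
            if current_cost < best_cost:
                best, best_cost = list(current), current_cost
        temp *= alpha                                    # cooling schedule
    return best, best_cost

def toy_cost(widths):
    """Hypothetical trade-off: area grows with width, error shrinks with width."""
    area = sum(widths)
    error = sum(2.0 ** -w for w in widths)
    return area + 1000.0 * error

start = [24, 24, 24, 24]
best, best_cost = anneal(toy_cost, start, lo=1, hi=32)
```

As the temperature falls, the probability of accepting a worse candidate shrinks, so the search behaves more and more like the greedy algorithm near the end.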

It should be noted that the parameters involved in fig. 1 may either be calculated in advance and written into the corresponding registers, or be calculated on the fly after the raw parameters are input, as needed.

Fig. 3 shows the performance comparison between the option pricing calculation hardware accelerator of this embodiment and Monte Carlo option pricing calculations implemented on a CPU and a GPU in an equivalent process technology. As can be seen from fig. 3, the accelerator of this embodiment has significant advantages in throughput, power consumption and throughput-to-power ratio over the CPU and GPU implementations.

In addition, an embodiment further provides a hardware accelerator card, which includes an accelerator card body and an accelerator chip disposed on the accelerator card body, where the accelerator chip is the option pricing calculation hardware accelerator.

In addition, the embodiment also provides a computer device, which includes a main board installed with a microprocessor and a memory connected with each other, and also includes the option pricing calculation hardware accelerator, where the microprocessor and the option pricing calculation hardware accelerator are connected in communication.

The option pricing calculation hardware accelerator is integrated on the motherboard, or is installed on the motherboard as a plug-in card.

As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-readable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein. The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks. These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks. 
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

The above description is only a preferred embodiment of the present invention, and the protection scope of the present invention is not limited to the above embodiments, and all technical solutions belonging to the idea of the present invention belong to the protection scope of the present invention. It should be noted that modifications and embellishments within the scope of the invention may occur to those skilled in the art without departing from the principle of the invention, and are considered to be within the scope of the invention.

Claims

1. An option pricing computation hardware accelerator comprising a first circuit unit for completing, in M time slices, one Monte Carlo simulation of an option execution price SM, the first circuit unit comprising:

a Gaussian random number generator for generating a Gaussian random number z;

2. The option pricing computation hardware accelerator of claim 1, wherein the multiplier array comprises:

3. The option pricing computation hardware accelerator of claim 2, wherein the multiplier array further comprises an output for an available signal valid, and the available signal valid of the multiplier array is asserted at the (dg + dm2 + da2 + de + M + 3 + 2dma)-th clock cycle after reset and thereafter for one beat every M time slices, where dg represents the number of delay clock cycles of the Gaussian random number generator, dm2 represents the number of delay clock cycles of the multiplier m2, da2 represents the number of delay clock cycles of the adder a2, de represents the number of delay clock cycles of the EXP module, M represents the number of time slices simulated per Monte Carlo iteration, the numbers of delay clock cycles of the multipliers ma0, ma1 and ma2 are all dma, and 2dma represents twice dma.

4. The option pricing computation hardware accelerator of claim 3, wherein the Gaussian random number generator comprises:

5. The option pricing calculation hardware accelerator of claim 4, wherein the pipeline output of the first circuit unit is further coupled to a second circuit unit for processing an outer-loop data path of the Monte Carlo option pricing calculation, the second circuit unit comprising:

6. The option pricing calculation hardware accelerator according to claim 5, wherein the pipeline output end of the second circuit unit is further connected to a third circuit unit for generating a final option estimated price according to an accumulated value sum_payoff of option proceeds predicted by all Monte Carlo simulation paths.

7. The option pricing computation hardware accelerator of claim 6, wherein the third circuit unit comprises:

8. A hardware accelerator card, comprising an accelerator card body and an accelerator chip arranged on the accelerator card body, wherein the accelerator chip is the option pricing calculation hardware accelerator according to any one of claims 1 to 7.

9. A computer device comprising a motherboard on which a microprocessor and a memory are mounted, wherein the microprocessor and the memory are connected to each other, and further comprising an option pricing calculation hardware accelerator according to any one of claims 1 to 7, wherein the microprocessor and the option pricing calculation hardware accelerator are connected in communication.

10. The computer device of claim 9, wherein the option pricing calculation hardware accelerator is integrated on the motherboard, or the option pricing calculation hardware accelerator is mounted on the motherboard as a plug-in card.