CN113012760B

CN113012760B - FPGA-based gene sequence assembly algorithm calculation acceleration method

Info

Publication number: CN113012760B
Application number: CN202011484784.1A
Authority: CN
Inventors: 柳星; 张敏杰; 蔡晨冉; 叶晓艺
Original assignee: Wuhan University of Technology WUT
Current assignee: Wuhan University of Technology WUT
Priority date: 2020-12-16
Filing date: 2020-12-16
Publication date: 2022-07-05
Anticipated expiration: 2040-12-16
Also published as: CN113012760A

Abstract

The invention discloses an FPGA-based method for accelerating the computation of a gene sequence assembly algorithm on a CPU + FPGA heterogeneous computing platform. The method comprises the following steps: 1) in the filtering stage, the query sequence is split into k-mers on the CPU host to obtain a series of seeds, and the matching positions (hits) of all seeds within the reference sequence intervals are looked up in turn; the number of bases covered by the seeds in each hit interval is then counted, and positions whose count exceeds a threshold are selected as candidate positions; 2) in the extension stage, extension starts from each candidate position using an optimized Smith-Waterman algorithm, in which the CPU controls the partitioning of the matrix into blocks and the FPGA processes each block matrix to obtain a partial traceback path; 3) the CPU splices the partial traceback paths in sequence to obtain the complete traceback path. By adopting the optimized Smith-Waterman algorithm, the method greatly increases the speed of sequence alignment.

Description

FPGA-based gene sequence assembly algorithm calculation acceleration method

Technical Field

The invention relates to gene sequence alignment computing technology, and in particular to an FPGA (field programmable gate array)-based method for accelerating the computation of a gene sequence assembly algorithm.

Background

In recent years, with the rapid development of sequencing technology, genome data have grown far faster than Moore's law, so that existing computing resources can no longer keep up with the demand for processing such massive data. Genome assembly is the first step in processing these data, and how to optimize or accelerate the assembly process is currently a hot research topic. Sequence alignment is one of the key steps of genome assembly and plays an important role in precision medicine.

Most existing sequence alignment algorithms follow a seed-and-extend strategy. Compared with the original alignment algorithms, they first use filtering to screen out the regions where alignment results are likely to appear and then perform the alignment computation only in those regions, which avoids the large waste of time and space caused by computing over the whole region. Under this strategy, the main current research directions are: optimization of the filtering technique, optimization of seed indexing, optimization of the alignment algorithm, and hardware acceleration of the alignment algorithm.

Despite the enormous computational burden of genome assembly, today's large-scale assembly tools are still developed for traditional CPU platforms. Because the CPU is a general-purpose processor whose hardware is not designed specifically for genomic algorithms, running the assembly algorithm on a CPU becomes a bottleneck when facing the massive genome data of the big-data era.

Compared with traditional CPU parallelization or GPU acceleration, hardware acceleration on an FPGA can further reduce computation time at lower energy consumption. The invention accelerates sequence alignment by means of a CPU + FPGA heterogeneous computing platform and was funded by the National College Student Innovation and Entrepreneurship Training Program (project No. 202010497040).

Disclosure of Invention

The technical problem to be solved by the invention is to provide, in view of the shortcomings of the prior art, an FPGA-based method for accelerating the computation of a gene sequence assembly algorithm.

The technical solution adopted by the invention is as follows: an FPGA-based method for accelerating the computation of a gene sequence assembly algorithm, carried out on a CPU + FPGA heterogeneous computing platform, comprises the following steps:

1) in the filtering stage, the query sequence is split into k-mers on the CPU host to obtain a series of seeds, and the matching positions of all seeds within the reference sequence intervals are looked up in turn (i.e., the seed hits); the number of bases covered by the seeds in each hit interval is then counted, and positions whose count exceeds a threshold are selected as candidate positions;

2) in the extension stage, extension starts from each candidate position using an optimized Smith-Waterman algorithm, the CPU controls the partitioning of the matrix into blocks, and the FPGA processes each block matrix to obtain a partial traceback path;

3) the CPU splices the partial traceback paths in sequence to obtain the complete traceback path.

According to the above scheme, searching for the matching positions of all seeds within the reference sequence intervals in step 1) specifically comprises:

1.1) obtaining a series of query sequence seeds with a sliding window of size K, and partitioning the reference sequence into intervals of a fixed size;

1.2) finding the hit positions of the seeds in the diagonal band of each reference sequence interval.

According to the above scheme, step 1.2) is performed with a hash-index data structure consisting of a seed pointer table and a seed position table, as follows:

the reference sequence seeds are obtained with the same sliding window of size K, and their positions are recorded in order in the seed position table, while the seed pointer table records, for each seed type, the starting offset of that type's entries in the seed position table;

the matching positions of each query sequence seed on the reference sequence are then found through the seed pointer table and seed position table, which are constructed and filled in advance.
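The two tables behave like a counting-sort index over all K-mers. The sketch below is a minimal illustration, assuming 2-bit packing of A/C/G/T as the hash (the source does not specify the hash function) and a pointer table with 4^K + 1 entries, so that the hits of seed type s are position_table[pointer_table[s]:pointer_table[s + 1]]. The names `kmer_codes`, `build_index` and `lookup` are hypothetical.

```python
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}  # assumed 2-bit encoding; other bases unsupported


def kmer_codes(seq, k):
    """Slide a window of size k over seq; return the 2-bit packed code of each k-mer."""
    mask = (1 << (2 * k)) - 1
    codes, code = [], 0
    for i, base in enumerate(seq):
        code = ((code << 2) | BASE_CODE[base]) & mask
        if i >= k - 1:  # the window is full
            codes.append(code)
    return codes


def build_index(reference, k):
    """Build (pointer_table, position_table) for every k-mer of the reference."""
    codes = kmer_codes(reference, k)
    pointer_table = [0] * (4 ** k + 1)
    for c in codes:  # count occurrences of each seed type
        pointer_table[c + 1] += 1
    for s in range(1, len(pointer_table)):  # prefix sums give start offsets
        pointer_table[s] += pointer_table[s - 1]
    position_table = [0] * len(codes)
    next_slot = pointer_table[:-1]  # slicing copies the list
    for pos, c in enumerate(codes):  # positions are stored in ascending order per seed type
        position_table[next_slot[c]] = pos
        next_slot[c] += 1
    return pointer_table, position_table


def lookup(seed_code, pointer_table, position_table):
    """Return all reference positions of one seed type."""
    return position_table[pointer_table[seed_code]:pointer_table[seed_code + 1]]
```

For the reference ACGTACGT with K = 2, looking up the code of the seed AC returns positions [0, 4], and a seed type that never occurs returns an empty slice.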

According to the above scheme, the optimized Smith-Waterman algorithm in step 2) is as follows:

2.1) taking the candidate position obtained in step 1) as the starting point, left extension toward the upper left is performed first;

2.2) the CPU determines a T×T matrix as the block matrix, which limits the range of each computation, and transmits the size of the current block matrix and its position within the initial matrix to the FPGA; the FPGA then performs scoring and traceback on the block with the Smith-Waterman algorithm and returns the traceback result to the CPU, and the CPU moves the block matrix according to the received result to determine the region of the next computation;

2.3) if the traceback path obtained for the current block matrix has length 0 or reaches the edge of the initial matrix, left extension ends, and the CPU splices the traceback paths of the block matrices in order to obtain the left half of the final traceback path;

2.4) starting from the same starting point, extension proceeds toward the lower right in the same way as step 2.2); when right extension ends, the right half of the final traceback path is obtained, and finally the left and right halves are spliced to obtain the final complete traceback path;

the FPGA implements the parallel Smith-Waterman computation with a hardware parallel algorithm based on a systolic array.
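The splicing in steps 2.3) and 2.4) can be sketched as below. This assumes, since the source does not fix a path representation, that each block's partial traceback is a list of moves ('M' match/mismatch, 'I' insertion, 'D' deletion) ordered in the direction that block travelled, away from the candidate position. The left half is then reversed so the final path reads from start to end. `splice` and `run_length` are hypothetical names, and the CIGAR-style summary is only a convenient illustration.

```python
from itertools import groupby


def splice(left_blocks, right_blocks):
    """Join per-block traceback moves into one path ordered start -> end.

    Assumption: every block path lists its moves in the direction that block
    travelled (toward the upper left for left blocks, toward the lower right
    for right blocks).
    """
    left = [move for block in left_blocks for move in block]
    right = [move for block in right_blocks for move in block]
    # The left half was traced away from the anchor, so reverse it to read forward.
    return left[::-1] + right


def run_length(path):
    """Compress a move list into a CIGAR-style string such as '3M1I2M'."""
    return "".join(f"{len(list(group))}{move}" for move, group in groupby(path))
```

For two left blocks [M, M, I] and [M] and two right blocks [M, D] and [M, M], the spliced path is M I M M M D M M, summarized as 1M1I3M1D2M.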

The invention has the following beneficial effects:

1. In the filtering stage, filtering is performed by a Filter algorithm that counts the bases covered by seed hits along each diagonal, and a hash index is introduced to speed up the lookup of seed matching positions.

2. In the extension stage, the quadratic space complexity of existing alignment algorithms prevents an FPGA with limited on-chip memory from processing long sequences. The invention therefore adopts an optimized Smith-Waterman algorithm that uses blocks to bound the space complexity by a constant, which makes it better suited to hardware acceleration; the alignment step is then implemented on the FPGA, greatly increasing the speed of sequence alignment through hardware acceleration.

Drawings

The invention will be further described with reference to the accompanying drawings and examples, in which:

FIG. 1 is a flow chart of a method of an embodiment of the present invention;

FIG. 2 is a schematic diagram of the FPGA-based computation acceleration of the gene sequence assembly algorithm according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of the Extend algorithm during left extension according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of the hardware framework for parallel acceleration of the Smith-Waterman algorithm on an FPGA according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

As shown in FIG. 1 and FIG. 2, an FPGA-based method for accelerating the computation of a gene sequence assembly algorithm is carried out on a CPU + FPGA heterogeneous computing platform and comprises the following steps:

1) in the Filter stage, the query sequence is split into k-mers on the CPU host to obtain a series of seeds, and the matching positions of all seeds within the reference sequence intervals are looked up in turn (i.e., the seed hits); the number of bases covered by the seeds in each hit interval is then counted, and positions whose count exceeds a threshold are selected as candidate positions;

Step 1) specifically comprises the following steps:

1.1) obtaining a series of query sequence seeds with a sliding window of size k, and partitioning the reference sequence into intervals of a fixed size of B bases;

1.2) constructing and filling a 'seed pointer table' and a 'seed position table': the reference sequence seeds are obtained with the same sliding window and their positions are recorded in order in the seed position table, while the seed pointer table records the starting offset of each seed type in the seed position table;

1.3) finding the matching positions of each query sequence seed on the reference sequence; this step is accelerated by the seed pointer table and seed position table constructed and filled in advance;

1.4) counting the bases covered by seeds in each reference sequence interval and screening against a threshold to obtain the candidate positions.
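Steps 1.1) to 1.4) can be illustrated with the following minimal sketch. It assumes (none of this is stated in the source) that a seed at query offset q hitting reference position r lies on diagonal r − q, that its band index is (r − q) // B, and that the count per band is the number of distinct query bases covered by seed hits in that band. A plain dict stands in for the seed pointer and position tables, and the anchor returned per band (its first seed hit) is a hypothetical choice of candidate position. `candidate_positions` is a hypothetical name.

```python
from collections import defaultdict


def candidate_positions(reference, query, k, band_size, threshold):
    """Return (band, covered_bases, anchor) for every diagonal band whose
    seed-covered query bases exceed threshold.
    """
    # A plain dict stands in for the seed pointer / seed position tables.
    ref_index = defaultdict(list)
    for r in range(len(reference) - k + 1):
        ref_index[reference[r:r + k]].append(r)

    covered = defaultdict(set)  # band -> query offsets covered by seed hits
    anchor = {}                 # band -> first (r, q) seed hit (hypothetical candidate)
    for q in range(len(query) - k + 1):
        for r in ref_index.get(query[q:q + k], []):
            band = (r - q) // band_size  # diagonal r - q falls into this band
            covered[band].update(range(q, q + k))
            anchor.setdefault(band, (r, q))
    return [(band, len(bases), anchor[band])
            for band, bases in sorted(covered.items())
            if len(bases) > threshold]
```

With reference TTTTACGTACGTTTTT, query ACGTACGT, k = 4, B = 4 and threshold 5, only band 1 survives: its seed hits cover all 8 query bases, while the spurious hits in bands 0 and 2 cover only 5 and 4 bases.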

2) In the Extend stage, extension starts from each candidate position using the optimized Smith-Waterman algorithm, the CPU controls the partitioning of the matrix into blocks, and the FPGA processes each block matrix to obtain a partial traceback path:

2.1) taking the candidate position obtained in step 1) as the starting point, the computation range of the whole matrix for the complete traceback path is determined;

2.2) the CPU determines a T×T matrix as the block matrix to limit the range of each computation, instructs the FPGA to compute the block and return the traceback result to the CPU, and then moves the block matrix and determines the region of the next computation according to that result;

the FPGA implements the parallel Smith-Waterman computation with a hardware parallel algorithm based on a systolic array.

In the extension stage of this application, the optimized Smith-Waterman algorithm is called the Extend algorithm. The Smith-Waterman algorithm computes the final score by filling in the F score matrix using the W substitution matrix and the gap penalty, and the whole computation follows the state transition equation below:

F(0,0) = 0
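For reference, the textbook Smith-Waterman recurrence with a linear gap penalty d is shown below, where x_i and y_j are the i-th and j-th bases of the two sequences and W(x_i, y_j) is their substitution score. This is an assumed standard form consistent with the initialization F(0,0) = 0; the gap model actually used in this embodiment is not stated here.

```latex
\begin{aligned}
F(i,0) &= 0, \qquad F(0,j) = 0,\\
F(i,j) &= \max\bigl\{\, 0,\; F(i-1,j-1) + W(x_i,y_j),\; F(i-1,j) - d,\; F(i,j-1) - d \,\bigr\}
\end{aligned}
```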

Because the 0 option is included in the state transition equation, any negative score is replaced by 0. The traceback phase then starts from the highest-scoring matrix element and continues until an element with a score of zero is encountered; this process yields the highest-scoring local alignment.
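The scoring and traceback just described can be sketched as follows. The match/mismatch scores and the linear gap penalty are illustrative assumptions, and `smith_waterman` is a hypothetical name. Cell F[i][j] scores ref[:i] against qry[:j]; when a cell ties with the 0 option it gets no pointer, so the traceback stops there, as in the text.

```python
def smith_waterman(ref, qry, match=2, mismatch=-1, gap=2):
    """Fill the score matrix with a linear gap penalty and trace back from the best cell.

    Returns (best_score, path), where path lists the (i, j) matrix cells of the
    local alignment from its first to its last aligned cell.
    """
    rows, cols = len(ref) + 1, len(qry) + 1
    F = [[0] * cols for _ in range(rows)]
    ptr = [[None] * cols for _ in range(rows)]  # 'D' diagonal, 'U' up, 'L' left
    best, best_cell = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            w = match if ref[i - 1] == qry[j - 1] else mismatch
            choices = [(0, None),  # the 0 option: a new local alignment starts here
                       (F[i - 1][j - 1] + w, "D"),
                       (F[i - 1][j] - gap, "U"),
                       (F[i][j - 1] - gap, "L")]
            F[i][j], ptr[i][j] = max(choices, key=lambda c: c[0])
            if F[i][j] > best:
                best, best_cell = F[i][j], (i, j)
    # Trace back from the highest-scoring cell until a zero score is reached.
    path = []
    i, j = best_cell
    while F[i][j] > 0:
        path.append((i, j))
        step = ptr[i][j]
        if step == "D":
            i, j = i - 1, j - 1
        elif step == "U":
            i -= 1
        else:
            j -= 1
    path.reverse()
    return best, path
```

For example, aligning the query ACG against the reference TTACGTT yields a score of 6 along the diagonal cells (3, 1), (4, 2), (5, 3).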

Scoring and traceback in the Extend algorithm are similar to those of the existing Smith-Waterman algorithm. The optimization is that the Extend algorithm uses a small square block of the matrix as the basic unit of scoring and traceback and covers the whole matrix by repeatedly moving the block matrix. Starting from a candidate position obtained in the Filter stage, the Extend algorithm performs left extension and right extension to find as long and complete an alignment path as possible, and several candidate positions can be extended at the same time in batch mode.

FIG. 3 shows the Extend algorithm during left extension. To reduce space complexity, the dynamic programming matrix is divided into several small matrices: taking the candidate position (i*, j*) as the starting point, small matrices of the given side length T are repeatedly constructed and moved, a partial traceback path is solved in each with the Smith-Waterman algorithm, and finally the traceback paths of the small matrices are spliced to obtain the final traceback path.

When a block matrix is constructed and moved, (i*, j*), i.e., the starting position (i_curr, j_curr) at the lower-right corner of the current block matrix, together with the side length T and the overlap threshold O, determines the upper-left corner of the first block matrix: (i_start, j_start) = (max(0, i_curr − T), max(0, j_curr − T)).

Because the block matrix has side length T, in actual operation a square block cannot always be constructed from the lower-right starting position: i* − T or j* − T may be negative. These values are therefore compared with 0, and if either is less than 0, a rectangular block matrix is constructed instead. During traceback of the first block matrix, the offsets of the block's traceback pointer along the i and j directions are obtained and denoted (i_off, j_off). The next block matrix is then computed: after the traceback path of the current block matrix has been calculated, the offsets (i_off, j_off) are subtracted from the lower-right position (i_curr, j_curr) of the current matrix to give the lower-right starting position of the next block matrix; (i_start, j_start) = (max(0, i_curr − T), max(0, j_curr − T)) is evaluated again, and the upper-left starting position of the next matrix is obtained according to the overlap threshold, which completes the construction of the next block matrix. The traceback computation continues in this way until the traceback yields (i_off, j_off) = (0, 0), at which point the whole process ends.
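The CPU-side block movement for left extension can be sketched as below. This rests on assumptions not stated in the source: each block is scored as an anchored extension from its lower-right corner (sequences reversed so the anchor becomes the DP origin, with no zero floor), the best-scoring cell in the block gives (i_off, j_off), and the overlap threshold O is omitted for brevity. `align_tile` stands in for the FPGA kernel call and `left_extend` for the host loop; both names are hypothetical.

```python
def align_tile(ref_tile, qry_tile, match=2, mismatch=-1, gap=2):
    """Score one block anchored at its lower-right corner; return (i_off, j_off).

    Both tiles are reversed so the anchor becomes the DP origin. There is no
    zero floor, so every path starts at the anchor; the best-scoring cell gives
    how many reference / query bases the partial path consumes toward the upper left.
    """
    a, b = ref_tile[::-1], qry_tile[::-1]
    rows, cols = len(a) + 1, len(b) + 1
    F = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        F[i][0] = -gap * i
    for j in range(cols):
        F[0][j] = -gap * j
    best, best_cell = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            w = match if a[i - 1] == b[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + w, F[i - 1][j] - gap, F[i][j - 1] - gap)
            if F[i][j] > best:
                best, best_cell = F[i][j], (i, j)
    return best_cell


def left_extend(ref, qry, i_star, j_star, T):
    """Host loop: move a T x T block toward the upper left from (i_star, j_star).

    Returns the final (i_curr, j_curr) and the (i_off, j_off) of every block.
    """
    i_curr, j_curr = i_star, j_star
    offsets = []
    while i_curr > 0 and j_curr > 0:  # stop at the edge of the initial matrix
        # Upper-left corner; near the edge the block becomes rectangular.
        i_start, j_start = max(0, i_curr - T), max(0, j_curr - T)
        i_off, j_off = align_tile(ref[i_start:i_curr], qry[j_start:j_curr])
        if (i_off, j_off) == (0, 0):  # empty traceback: extension ends
            break
        offsets.append((i_off, j_off))
        # Lower-right corner of the next block.
        i_curr, j_curr = i_curr - i_off, j_curr - j_off
    return (i_curr, j_curr), offsets
```

For example, left-extending from the ends of GGGGACGTACGTAC and TTACGTACGTAC with T = 4 moves the block three times, with offsets (4, 4), (4, 4) and (2, 2), and stops at (4, 2), the start of their shared suffix ACGTACGTAC. The final block there is rectangular (4×2).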

The parallel acceleration of the Smith-Waterman algorithm is implemented on the FPGA with the hardware framework shown in FIG. 4. The input reference sequence and query sequence are first stored in two separate BRAMs. Each PE (processing element) in the systolic array is responsible for the scoring computation of one row of the matrix; in each clock cycle, one base of the reference sequence is fed into the leftmost PE, and each PE passes the base it held in the previous cycle on to the next PE, so that multiple rows are computed in parallel. During the computation, each PE stores the traceback pointers it generates in SRAM. After all rows have been computed, the traceback logic module performs traceback according to the pointers in SRAM and outputs the final traceback path of each block matrix.
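A cycle-level behavioural model may help show why one PE per row works. It assumes that PE p holds query base p (one matrix row) and that reference bases enter the leftmost PE one per cycle, so in cycle t PE p scores column t − p. Scoring parameters are illustrative, traceback pointers are not modelled, and `systolic_sw` is a hypothetical name. Every value a PE reads was produced in an earlier cycle, so the PEs within one cycle are independent and could fire simultaneously in hardware.

```python
def systolic_sw(query, ref, match=2, mismatch=-1, gap=2):
    """Behavioural model of a linear systolic array computing Smith-Waterman scores.

    PE p owns query[p] (matrix row p + 1). In clock cycle t it scores reference
    column j = t - p, so all PEs together sweep one anti-diagonal per cycle.
    Returns the full score matrix so it can be checked against plain DP.
    """
    n, m = len(query), len(ref)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for t in range(n + m - 1):  # one iteration = one clock cycle
        for p in range(n):      # in hardware, all PEs fire in parallel
            j = t - p           # reference base currently held by PE p
            if 0 <= j < m:
                i, col = p + 1, j + 1
                w = match if query[p] == ref[j] else mismatch
                # Inputs come from PE p-1 (cycles t-1, t-2) and from PE p itself (cycle t-1).
                F[i][col] = max(0, F[i - 1][col - 1] + w,
                                F[i - 1][col] - gap, F[i][col - 1] - gap)
    return F
```

With n query bases and m reference bases the model finishes in n + m − 1 cycles instead of the n × m steps of a sequential fill, which is the source of the speed-up claimed for the array.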

3) The CPU splices the traceback paths of the block matrices in sequence to obtain the complete traceback path.

The invention uses the OpenCL cross-platform computing framework standard to build a complete CPU + FPGA heterogeneous parallel computing platform. The CPU executes the filtering algorithm and controls the Smith-Waterman blocking, the FPGA provides hardware acceleration for the computation-intensive steps of the algorithm, and a PCIe interface carries the data exchange between the CPU and the FPGA; the final results are tested and the acceleration performance is evaluated. The invention was funded by the National College Student Innovation and Entrepreneurship Training Program (project No. 202010497040).

The invention adds a Filter stage that counts the bases covered by seed hits within diagonal bands and uses a hash index structure, consisting of the seed position table and the seed pointer table, to accelerate filtering; the extension stage is then implemented with the Extend algorithm, which optimizes the original Smith-Waterman algorithm to reduce its space complexity and thus makes it better suited to running on an FPGA.

The applicant implemented a pure software version of the Filter algorithm + Extend algorithm in C++ on a CPU and compared its output with that of the original Smith-Waterman algorithm on the same test data; the results were identical, which verifies the correctness of the scheme.

Using the Xilinx SDAccel platform, the host program was modified to interact with the FPGA through calls to the OpenCL API, and the hardware code was compiled into a binary file and programmed onto the FPGA board, completing the CPU + FPGA heterogeneous computing platform. Tests were run in the Software-Emulation mode (CPU only) and the System mode (CPU + FPGA) provided by SDAccel to evaluate the final acceleration performance. As shown in Table 1, five tests were performed on the same 16384 sequence pairs (each pair consisting of a reference sequence of length 256 and a query sequence of length 128), and the time consumed was recorded. The results show that the scheme does accelerate the gene alignment algorithm, with a significant speed-up.

Table 1 Comparison of run results

It will be understood that modifications and variations can be made by persons skilled in the art in light of the above teachings and all such modifications and variations are intended to be included within the scope of the invention as defined in the appended claims.

Claims

1. An FPGA-based method for accelerating the computation of a gene sequence assembly algorithm, carried out on a CPU + FPGA heterogeneous computing platform, characterized by comprising the following steps:

2) in the extension stage, extension starts from the candidate position using the optimized Smith-Waterman algorithm, the CPU controls the partitioning of the matrix into blocks, and the FPGA processes each block matrix to obtain a partial traceback path;

the optimized Smith-Waterman algorithm in step 2) is as follows:

2.4) starting from the same starting point, extension proceeds toward the lower right in the same way as step 2.2); when right extension ends, the right half of the final traceback path is obtained, and finally the left and right halves are spliced to obtain the final complete traceback path;

the FPGA implements the parallel Smith-Waterman computation with a hardware parallel algorithm based on a systolic array;

2. The FPGA-based method for accelerating the computation of a gene sequence assembly algorithm according to claim 1, wherein searching for the matching positions of all seeds within the reference sequence intervals in step 1) is performed as follows:

3. The FPGA-based method for accelerating the computation of a gene sequence assembly algorithm according to claim 2, wherein step 1.2) is performed using a hash-index data structure consisting of a seed pointer table and a seed position table, specifically as follows: