CN110427262B - Gene data analysis method and heterogeneous scheduling platform - Google Patents

Gene data analysis method and heterogeneous scheduling platform

Info

Publication number
CN110427262B
CN110427262B (application number CN201910918380.XA)
Authority
CN
China
Prior art keywords
task
controller
module
gpu
gene
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910918380.XA
Other languages
Chinese (zh)
Other versions
CN110427262A (en
Inventor
杨姣博
于闯
宋超
贺增泉
王今安
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
BGI Technology Solutions Co Ltd
Original Assignee
BGI Technology Solutions Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by BGI Technology Solutions Co Ltd filed Critical BGI Technology Solutions Co Ltd
Priority to CN201910918380.XA priority Critical patent/CN110427262B/en
Publication of CN110427262A publication Critical patent/CN110427262A/en
Application granted granted Critical
Publication of CN110427262B publication Critical patent/CN110427262B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806 Task transfer initiation or dispatching
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biotechnology (AREA)
  • Evolutionary Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Analytical Chemistry (AREA)
  • Chemical & Material Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention provides a gene data analysis method and a heterogeneous scheduling platform that use load balancing to perform pipelined management of pending tasks through interaction among a controller, a GPU, an FPGA and a CPU. The method specifically comprises the following steps: reading the gene sequencing file to be analyzed, processing it into gene data and storing the gene data in memory; reading the gene sequences from memory and, according to the load and running state of each processor, distributing them to processors that are currently idle and/or not overloaded; executing multi-task computational analysis; calibrating and converting the computation results; and generating an analysis report from the aggregated standard output information, completing the data analysis. Load balance is monitored in real time throughout the analysis: if the computing resources handling the current task are insufficient, they are increased to accelerate processing, and the computing resources of the preceding task are reduced accordingly. The efficiency of analyzing and interpreting multi-task gene data per unit time is thereby greatly improved.

Description

Gene data analysis method and heterogeneous scheduling platform
Technical Field
The invention relates to the technical field of biological information, in particular to a gene data analysis method and a heterogeneous scheduling platform.
Background
Gene detection technology is continuously improving and gene detection services are becoming increasingly widespread. Although the sample volume in the gene market grows year by year, the cost of gene testing keeps its price high, which strongly affects users' purchasing decisions and hinders product promotion. At present, the cost of gene detection consists mainly of sequencing cost and analysis cost; after the sequencer completes sequencing, matched gene data analysis software is used to analyze and interpret the gene data.
The conventional procedure for analyzing and interpreting gene data is as follows: a bioinformatics analyst analyzes and interprets the data on a central processing unit (CPU) server through the command line. However, this process is cumbersome, the computational efficiency is low, and the utilization of the CPU, memory and other resources is also low, so the number of tasks executed per unit time cannot be maximized and system resources are not fully used. Furthermore, if multiple users use the server simultaneously, or the requested resources exceed the load capacity of the CPU itself, the server is also prone to crash.
Disclosure of Invention
In view of this, embodiments of the present invention provide a gene data analysis method and a heterogeneous scheduling platform to solve the problems that arise when a bioinformatics analyst analyzes and interprets data on a CPU server through the command line: the process is cumbersome, the efficiency is low, the number of tasks executed per unit time cannot be maximized, and system resources cannot be fully utilized.
In order to achieve the above purpose, the embodiments of the present invention provide the following technical solutions:
the invention provides a gene data analysis method, which is suitable for a heterogeneous scheduling platform comprising a controller and a processor, wherein the processor at least comprises a GPU (graphics processing unit), a CPU (central processing unit) and an FPGA (field programmable gate array), and the method comprises the following steps:
the controller monitors load balance of each module in real time in the process of executing gene data analysis, if any module is overloaded, computing resources of the current module are increased, and computing resources of the front-end module are reduced; the process of each module of the controller for executing gene data analysis comprises the following steps:
a reading module in the controller reads a gene sequencing file to be analyzed and task parameters from the task queue for preprocessing to obtain a task to be executed;
an algorithm acceleration module in the controller detects the load and the running state of the GPU, the CPU and the FPGA in real time and obtains the computing resources of the GPU, the FPGA and the CPU;
an algorithm acceleration module in the controller sequentially distributes the tasks to be executed to processors which are currently in an idle state and/or not overloaded according to the load and the running state of the GPU, the CPU and the FPGA detected in real time from the head of the task queue, and the controller controls the corresponding processors to process the tasks to be executed;
a conversion module in the controller acquires a calculation result output by a processor for processing the task to be executed, and performs data calibration and format conversion on the calculation result to obtain standard output information;
and an output module in the controller counts the standard output information to generate an analysis report.
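For illustration only, and not as part of the claimed method, the following minimal Python sketch shows how the four controller modules listed above could be chained over a task queue; the function names, record fields and thresholds here are hypothetical.

```python
from collections import deque

def read_module(task_queue):
    """Pop a sequencing file and its task parameters from the head of the queue and preprocess them."""
    raw = task_queue.popleft()
    return {"sequences": raw["file"].split(), "params": raw["params"]}

def acceleration_module(task, processors):
    """Dispatch the task to the first processor that is idle and/or not overloaded."""
    for proc in processors:
        if proc["idle"] or proc["load"] < proc["capacity"]:
            return f"{proc['name']} processed {len(task['sequences'])} reads"
    raise RuntimeError("no processor available")

def conversion_module(result):
    """Calibrate the raw result and convert it into standard output information."""
    return {"standard_output": result.upper()}

def output_module(standard_infos):
    """Aggregate the standard output information into an analysis report."""
    return "\n".join(info["standard_output"] for info in standard_infos)

if __name__ == "__main__":
    queue = deque([{"file": "ACGT TTGA CCAT", "params": {"reference": "hg38"}}])
    processors = [
        {"name": "GPU", "idle": False, "load": 9, "capacity": 8},
        {"name": "FPGA", "idle": True, "load": 2, "capacity": 8},
        {"name": "CPU", "idle": False, "load": 3, "capacity": 8},
    ]
    task = read_module(queue)
    result = acceleration_module(task, processors)
    print(output_module([conversion_module(result)]))
```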
Preferably, the process of controlling the corresponding processor to execute the task by the controller includes:
the controller processes a gene sequencing file to be analyzed carried in the task to be executed to obtain a gene data sequence, and stores the gene data sequence in a memory database of the heterogeneous scheduling platform, wherein the gene sequencing file to be analyzed comprises a gene sequencing fragment to be analyzed, analysis parameters and the like;
under the condition of large-scale calculation, the controller reads the gene data sequence stored in the memory database and distributes the gene data sequence to the thread to be operated of the processor, wherein the GPU processes the thread to be operated in a heterogeneous parallel mode, and the FPGA processes the thread to be operated in a pipelined parallel mode.
Preferably, the reading module in the controller reads the gene sequencing file to be analyzed and the task parameters from the task queue for preprocessing, and before obtaining the task to be executed, the method further includes:
the controller determines whether the current load can receive a new task.
Preferably, if the loads of the GPU, the CPU, and the FPGA are not overloaded and/or in an idle state, the controller controls the corresponding processor to process the task to be executed, including:
the controller reads the gene data sequence stored in the memory database, distributes one part of the gene data sequence requiring large-scale calculation to N to-be-run threads executed by the GPU, distributes another part to M to-be-run threads executed by the FPGA, and distributes the remaining part to the CPU;
the values of M and N are each greater than or equal to 1, and the sum of M and N is less than or equal to the total number of threads that can run under the optimal load.
Preferably, the method further comprises:
and in the process that the controller controls the corresponding processor to process the task to be executed, if the controller detects that the processor is overloaded in real time, increasing the computing resources of the processor with the overloaded load.
Preferably, the GPU processes the thread to be run in a heterogeneous parallel manner, including:
the GPU executes threads to be operated in parallel, and for each thread to be operated, the GPU executes multi-core parallel analysis processing on an input queue input into the thread to be operated based on a multi-core structure to obtain a corresponding output result, wherein the input queue comprises a gene data sequence, and the gene data sequence comprises data needing efficient parallel data processing;
and the GPU converts the output result into a standard output file and outputs the standard output file.
Preferably, the FPGA processes the thread to be run in a pipelined parallel manner, including:
the FPGA executes threads to be operated in parallel, and for each thread to be operated, the FPGA sequentially carries out parallel processing on data input into an input queue of the thread to be operated in the n operator threads on the basis of n operator threads in the thread to be operated, wherein the value of n is a positive integer greater than 1;
and the FPGA outputs a processing result after all data input into the input queue of the thread to be operated are processed aiming at each thread to be operated.
The invention also provides a heterogeneous scheduling platform, comprising: the system comprises a controller and a processor, wherein the processor at least comprises a GPU, a CPU and an FPGA;
the controller comprises a reading module, an algorithm acceleration module, a conversion module and an output module, and is used for monitoring load balance of each module in real time in the process of executing gene data analysis, if any module is overloaded, the computing resources of the current module are increased, and the computing resources of the front-end module are reduced;
wherein:
the reading module is used for reading a gene sequencing file to be analyzed and task parameters from the task queue for preprocessing to obtain a task to be executed;
the algorithm acceleration module is used for detecting the load and the running state of the GPU, the CPU and the FPGA in real time, acquiring the computing resources of the GPU, the FPGA and the CPU, sequentially distributing, starting from the head of the task queue, the tasks to be executed to the processors which are currently in an idle state and/or not overloaded, according to the load and running state of the GPU, the CPU and the FPGA detected in real time, and controlling the corresponding processors to process the tasks to be executed;
the conversion module is used for acquiring a calculation result output by a processor for processing the task to be executed, and performing data calibration and format conversion on the calculation result to obtain standard output information;
and the output module is used for counting the standard output information and generating an analysis report.
Preferably, the reading module is further configured to determine whether the current load can receive a new task.
Compared with the prior art, the technical scheme provided by the invention has the following advantages:
the embodiment of the invention discloses a gene data analysis method and a heterogeneous scheduling platform, which are used for carrying out pipeline management on tasks to be executed by utilizing load balance through interaction among a controller, a GPU, an FPGA and a CPU. The method specifically comprises the following steps: reading a gene sequencing file to be analyzed, processing the gene sequencing file to obtain gene data, storing the gene data into a memory, combining the load and the running state of a processor, reading a gene sequence in the memory, distributing the gene sequence to the processor which is in an idle state and/or is not overloaded, executing multi-task calculation analysis, correcting and converting a calculation result, generating an analysis report by statistical standard output information, and completing data analysis. Monitoring load balance in real time in the analysis process, and improving the accelerated processing of computing resources if the load for processing the current task is insufficient; correspondingly reducing the pre-task computing resources to reduce the input; the efficiency of analyzing and reading the multitask gene data in unit time is greatly improved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.
Fig. 1 is a schematic structural diagram of a heterogeneous scheduling platform according to an embodiment of the present invention;
FIG. 2 is a flow chart of a method for analyzing genetic data according to an embodiment of the present invention;
fig. 3 is a flowchart illustrating a controller controlling a corresponding processor to process a task to be executed according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a controller according to an embodiment of the present invention;
FIG. 5 is a flow chart of another method for genetic data analysis according to an embodiment of the present invention;
fig. 6 is a flowchart illustrating that, if the loads of the GPU, the CPU, and the FPGA are not overloaded and/or are in an idle state, the controller controls the corresponding processor to process a task to be executed according to the embodiment of the present invention;
FIG. 7 is a flow chart of another method for genetic data analysis according to an embodiment of the present invention;
fig. 8 is a flowchart illustrating that a GPU processes a thread to be run in a heterogeneous parallel manner according to an embodiment of the present invention;
fig. 9 is a working schematic diagram of efficient parallel data processing performed by a GPU worker to be run in a GPU according to an embodiment of the present invention;
fig. 10 is a flowchart illustrating that an FPGA processes a thread to be run in a pipelined parallel manner according to an embodiment of the present invention;
fig. 11 is a working schematic diagram of parallel data processing performed by an FPGA worker pipeline to be operated in an FPGA according to an embodiment of the present invention;
fig. 12 is a schematic structural diagram of a heterogeneous scheduling platform according to an embodiment of the present invention.
Detailed Description
The invention provides a gene data analysis method and a heterogeneous scheduling platform, which achieve efficient processing of gene test data by using a controller to coordinate a central processing unit (CPU), a graphics processing unit (GPU) and a field-programmable gate array (FPGA).
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention aims to provide a pipelined, multithreaded scheduling scheme for input files and to improve analysis efficiency while maintaining high accuracy. As shown in fig. 1, the invention provides an architecture diagram of a heterogeneous scheduling platform for bioinformatics algorithms based on FPGA and GPU chips.
The heterogeneous scheduling platform comprises: a controller 11 and a processor.
The processor includes a GPU12, an FPGA13, and a CPU 14.
The controller 11 performs task management on delivered tasks (various offline data requiring analysis), including controlling the running, delivery, suspension, querying, deletion and clearing of queued tasks.
For queued tasks, the controller 11 allocates a corresponding processor to each queued task according to the idle conditions of the GPU 12, the FPGA 13 and the CPU 14.
The controller 11 cooperates with the GPU 12 and the FPGA 13 to perform heterogeneous computation, completing the monitoring, scheduling and log generation of the analysis process. The scheduling and coordination capability of the controller 11 is fully used for multi-task management, the computing power of the GPU 12, the FPGA 13 and the CPU 14 is exploited to the greatest extent, and multi-sample parallel efficiency is improved, so that analysts perform the fewest operations and obtain data more quickly. Furthermore, the controller 11 can perform intelligent task-delivery management, intelligently monitor server load, and intelligently allocate the computing resources of servers such as the GPU 12, the FPGA 13 and the CPU 14, thereby reducing analyst operations and the losses of intermediate steps.
The controller 11 serves as a control center and is responsible for task scheduling and data preparation; the GPU12 and the FPGA13 are used as an operation center and are responsible for parallel and pipelined calculation.
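As a rough illustration of the task-management functions described above (run, deliver, suspend, query, delete, clear), the following Python sketch uses hypothetical names and an in-memory queue; it is not taken from the patented implementation.

```python
class TaskManager:
    """Hypothetical sketch of the controller's management of queued tasks."""

    def __init__(self):
        self.queue = []      # queued task ids, head of queue at index 0
        self.status = {}     # task id -> "queued" | "running" | "suspended"

    def deliver(self, task_id):
        """Deliver a new offline-analysis task to the tail of the queue."""
        self.queue.append(task_id)
        self.status[task_id] = "queued"

    def run(self):
        """Start the task at the head of the queue."""
        task_id = self.queue.pop(0)
        self.status[task_id] = "running"
        return task_id

    def suspend(self, task_id):
        self.status[task_id] = "suspended"

    def query(self, task_id):
        return self.status.get(task_id, "unknown")

    def delete(self, task_id):
        self.queue = [t for t in self.queue if t != task_id]
        self.status.pop(task_id, None)

    def clear(self):
        """Clear all queued tasks."""
        self.queue.clear()
        self.status.clear()

manager = TaskManager()
manager.deliver("sample-1")
manager.deliver("sample-2")
print(manager.run(), manager.query("sample-2"))
```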
For ease of understanding, the controller 11, GPU12, and FPGA13 are explained herein.
The controller 11 provides platform support for the calculation of the whole heterogeneous cooperative CPU and provides a scheduling architecture for tasks.
The GPU12 is a GPU chip, also called a display core, a visual processor, and a display chip, and is a microprocessor that is specially used for image operation on a personal computer, a workstation, a game machine, and some mobile devices (e.g., a tablet computer, a smart phone, etc.).
GPU12 is designed specifically to perform complex mathematical and geometric calculations. GPU12 may provide tens or hundreds of times as much performance as a CPU, and the standards for general purpose computing of GPU12 are OpenCL, CUDA, and ATI STREAM.
The FPGA13 is an FPGA chip, i.e., a field programmable gate array, which is a product further developed on the basis of programmable devices such as PAL, GAL, CPLD, etc. The circuit is a semi-custom circuit in the field of Application Specific Integrated Circuits (ASIC), not only overcomes the defects of the custom circuit, but also overcomes the defect that the number of gate circuits of the original programmable device is limited.
Based on the heterogeneous scheduling platform disclosed in this embodiment of the invention, the controller reads and processes the gene sequencing file to be analyzed, stores the resulting gene data in memory, and, according to the load and running state of each processor, reads the gene sequences from memory and distributes them to processors that are currently idle and/or not overloaded; multi-task computational analysis is then executed, the computation results are calibrated and converted, and an analysis report is generated from the aggregated standard output information, completing the data analysis. Load balance is monitored in real time throughout the analysis: if the computing resources handling the current task are insufficient, they are increased to accelerate processing, and the computing resources of the preceding task are reduced accordingly. The efficiency of analyzing and interpreting multi-task gene data per unit time is thereby greatly improved.
With reference to the heterogeneous scheduling platform disclosed in the embodiment of the present invention, as shown in fig. 2, a flowchart of a method for analyzing gene data provided in the embodiment of the present invention is shown, and the method includes the following steps:
and S201, the controller monitors load balance of each module in real time in the process of executing gene data analysis, and if any module is overloaded, the computing resources of the current module are increased, and the computing resources of the front-end module are reduced.
In the process of executing S201, in order to avoid any module becoming overloaded while analyzing the gene data, the controller must monitor every running module; the running condition of each module is thus known in real time, and an overload in a module can be relieved by applying the corresponding adjustment to that module.
For example: when a module is overloaded, its computing resources need to be increased, and a module that is not overloaded can correspondingly release some of its computing resources, so that computing resources are allocated effectively and the overload is relieved.
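A minimal sketch of this reallocation idea, assuming each pipeline module has a numeric load and quota (the fields and the one-quantum policy are assumptions, not the patented algorithm):

```python
def rebalance(modules, quantum=1):
    """Shift one quantum of compute quota from an under-loaded upstream module
    to each overloaded module; a module is overloaded when load > quota."""
    for i, module in enumerate(modules):
        if module["load"] > module["quota"]:
            for upstream in reversed(modules[:i]):
                if upstream["load"] < upstream["quota"]:
                    upstream["quota"] -= quantum   # release resources upstream
                    module["quota"] += quantum     # grant them to the overloaded module
                    break
    return modules

pipeline = [
    {"name": "read", "load": 2, "quota": 4},
    {"name": "accelerate", "load": 9, "quota": 8},
    {"name": "convert", "load": 3, "quota": 4},
]
print(rebalance(pipeline))
```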
Step S202: and a reading module in the controller reads the gene sequencing file to be analyzed and the task parameters from the task queue for preprocessing to obtain the task to be executed.
In the process of executing step S202, the controller performs data preprocessing on the gene sequencing sequence file to be analyzed, the task parameters, and the like in the task queue through its own reading module, to obtain a task to be executed.
It should be noted that, specifically, how to pre-process the data may be implemented according to actual situations.
And S203, detecting the load and the running state of the GPU, the CPU and the FPGA in real time by an algorithm acceleration module in the controller, and acquiring the computing resources of the GPU, the FPGA and the CPU.
In the process of executing step S203, the algorithm acceleration module of the controller detects the load conditions of the GPU, the CPU and the FPGA in real time and determines whether any of them is overloaded or running abnormally, so that corrective action can be taken as the detected conditions require.
Besides detecting the loads of the GPU, the CPU and the FPGA, the algorithm acceleration module of the controller also acquires the computing resources of the GPU, the FPGA and the CPU, on the basis of which those processors carry out their computation.
And S204, an algorithm acceleration module in the controller sequentially distributes the tasks to be executed to the processors which are currently in an idle state and/or not overloaded according to the load and the running state of the GPU, the CPU and the FPGA detected in real time from the head of the task queue, and the controller controls the corresponding processors to process the tasks to be executed.
In the process of executing the step S204, the task to be executed is correspondingly allocated to the GPU, the CPU, and the FPGA for calculation processing according to the load and the operating state of the GPU, the CPU, and the FPGA detected in real time by the algorithm acceleration module, and then a corresponding calculation result is output.
For example: if the CPU is currently not overloaded and running normally while the GPU is overloaded, the controller distributes more tasks to the CPU and, for the time being, does not assign additional tasks to the GPU for calculation.
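The dispatch loop of step S204 could, for instance, look like the sketch below; the polling function, the overload threshold and the least-loaded selection rule are assumptions for illustration only.

```python
import random
import time

def poll(processor):
    """Stand-in for real-time load and running-state telemetry (hypothetical)."""
    return {"load": random.random(), "running_ok": True}

def dispatch_from_head(task_queue, processors, overload_threshold=0.8):
    """Assign tasks from the head of the queue to processors that are currently
    idle and/or not overloaded, waiting briefly when none is available."""
    assignments = []
    while task_queue:
        stats = {p: poll(p) for p in processors}
        eligible = [p for p, s in stats.items()
                    if s["running_ok"] and s["load"] < overload_threshold]
        if not eligible:
            time.sleep(0.01)                 # wait for a processor to free up
            continue
        target = min(eligible, key=lambda p: stats[p]["load"])
        assignments.append((task_queue.pop(0), target))
    return assignments

print(dispatch_from_head(["task-1", "task-2", "task-3"], ["GPU", "FPGA", "CPU"]))
```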
And S205, a conversion module in the controller acquires a calculation result output by a processor for processing the task to be executed, and performs data calibration and format conversion on the calculation result to obtain standard output information.
In the process of executing step S205, the conversion module in the controller is configured to perform calibration and format conversion on the acquired data, so as to convert the data into standard output information with a preset format.
And S206, an output module in the controller counts standard output information to generate an analysis report.
In the process of executing step S206, the output module of the controller obtains the standard output information, aggregates it, and then generates an analysis report. The analysis report presents the various gene data clearly and intuitively.
According to the gene data analysis method disclosed in this embodiment of the invention, load balancing is used to perform pipelined management of pending tasks through interaction among the controller, the GPU, the FPGA and the CPU. The method specifically comprises: reading the gene sequencing file to be analyzed, processing it into gene data and storing the gene data in memory; reading the gene sequences from memory and, according to the load and running state of each processor, distributing them to processors that are currently idle and/or not overloaded; executing multi-task computational analysis; calibrating and converting the computation results; and generating an analysis report from the aggregated standard output information, completing the data analysis. Load balance is monitored in real time throughout the analysis: if the computing resources handling the current task are insufficient, they are increased to accelerate processing, and the computing resources of the preceding task are reduced accordingly. The efficiency of analyzing and interpreting multi-task gene data per unit time is thereby greatly improved.
Based on the above gene data analysis method disclosed in fig. 2 of the embodiment of the present invention, the specific implementation process of the controller controlling the corresponding processor to process the task to be executed mainly includes, as shown in fig. 3:
and S301, the controller processes a gene sequencing file to be analyzed carried in a task to be executed to obtain a gene data sequence, and stores the gene data sequence in a memory database of the heterogeneous scheduling platform.
In step S301, the gene sequencing file to be analyzed includes the gene sequencing fragment to be analyzed, the analysis parameters, and the like.
In the process of executing step S301, the controller performs serialization processing on the obtained gene sequencing file to be analyzed, so as to obtain a corresponding gene data sequence. The gene sequencing file to be analyzed is subjected to serialization processing, and the obtained gene data sequence can be conveniently read in the heterogeneous scheduling platform. And the corresponding calculation and operation of the gene data sequence in the heterogeneous scheduling platform are facilitated.
For example, just as a C program must be written in the proper format before it can be compiled and produce the expected output, step S301 serializes the gene sequencing file to be analyzed into a gene data sequence, stores the resulting sequence in the heterogeneous scheduling platform, and calls it directly from the platform whenever it is needed.
It should be noted that the heterogeneous scheduling platform includes, but is not limited to, a controller, a GPU, an FPGA, and a CPU.
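A minimal sketch of this serialization step, assuming a FASTQ-like input and using a plain dictionary in place of the in-memory database (both are assumptions, not the patented format):

```python
import json

def serialize_sequencing_file(text):
    """Turn a FASTQ-like text blob into per-read records (assumes 4 lines per read)."""
    lines = [line for line in text.strip().splitlines() if line]
    reads = []
    for i in range(0, len(lines) - 3, 4):
        reads.append({"id": lines[i], "seq": lines[i + 1], "qual": lines[i + 3]})
    return reads

memory_db = {}  # stand-in for the platform's in-memory database

def store(task_id, reads):
    memory_db[task_id] = json.dumps(reads)      # serialized gene data sequence

def load(task_id):
    return json.loads(memory_db[task_id])

fastq = "@read1\nACGTACGT\n+\nFFFFFFFF\n@read2\nTTGACCAT\n+\nFFFFFFFF"
store("task-001", serialize_sequencing_file(fastq))
print(load("task-001")[0]["seq"])
```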
Step S302: under the condition of large-scale calculation, the controller reads the gene data sequence stored in the memory database and distributes the gene data sequence to the to-be-run thread executed by the processor.
In step S302, a thread is the smallest unit that the operating system can schedule. It is contained within a process and is the actual unit of execution in the process: a single sequential control flow. Multiple threads can run concurrently within one process, each executing a different task in parallel, and threads are the basic unit of independent scheduling and dispatch. A thread may be a kernel thread scheduled by the operating-system kernel; all threads in the same process share the process's system resources, such as the virtual address space, file descriptors and signal handling, but each thread has its own call stack, register context and thread-local storage.
The advantage of multithreaded programming on multi-core or multi-CPU machines, or on CPUs that support hyper-threading, is higher program throughput. Even on a single-core, single-CPU computer, multithreading allows the frequently blocked I/O and human-interaction parts of a process to be separated from the compute-intensive part, with a dedicated worker thread performing the intensive computation, thereby improving the program's execution efficiency.
In the case of large-scale calculation, step S302 is performed, and the gene data sequence can be distributed in three different distribution manners.
The first mode is as follows: and the controller reads the gene data sequence stored in the memory database and distributes the whole gene data sequence to the thread to be operated executed by the GPU.
The second mode is as follows: and the controller reads the gene data sequence stored in the memory database and distributes the whole gene data sequence to the to-be-run thread executed by the FPGA.
The third mode is as follows: and the controller reads the gene data sequence stored in the memory database, distributes part of the gene data sequence to N to-be-run threads executed by the GPU, and distributes the other part of the gene data sequence to M to-be-run threads executed by the FPGA.
The values of M and N are each greater than or equal to 1, and the sum of M and N is less than or equal to the total number of threads that can run under the optimal load.
And executing step S302 under the condition that large-scale calculation is needed, reading the gene data sequence stored in the memory database in the heterogeneous scheduling platform through the controller, and distributing the read gene data sequence to the thread to be run executed by the GPU and the FPGA so as to facilitate the GPU and the FPGA to process the thread to be run.
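A sketch of the third distribution mode, splitting one gene data sequence into N GPU chunks and M FPGA chunks with N, M >= 1 and N + M no larger than the optimal-load thread total (the chunking rule itself is an assumption):

```python
def split_for_processors(reads, n_gpu, m_fpga, optimal_total):
    """Split a gene data sequence into N GPU chunks, M FPGA chunks and a CPU remainder."""
    assert n_gpu >= 1 and m_fpga >= 1 and n_gpu + m_fpga <= optimal_total
    chunk = max(1, len(reads) // (n_gpu + m_fpga))
    parts = [reads[i:i + chunk] for i in range(0, len(reads), chunk)]
    return {
        "gpu": parts[:n_gpu],                        # N to-be-run GPU threads
        "fpga": parts[n_gpu:n_gpu + m_fpga],         # M to-be-run FPGA threads
        "cpu": parts[n_gpu + m_fpga:],               # any remainder stays on the CPU
    }

reads = [f"read{i}" for i in range(10)]
print(split_for_processors(reads, n_gpu=2, m_fpga=2, optimal_total=8))
```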
Fig. 4 shows a schematic diagram of the operation of the controller in the heterogeneous scheduling platform. The controller mainly ensures parallel execution of the pipelined tasks and controls the running, delivery, suspension, querying, deletion and clearing of queued tasks. When a task is delivered, the controller switches from the sleep state to the initialization state and then to the running state, continuously executes tasks and adjusts the load of its sub-modules; when task execution is terminated, the manager enters the cancelled state and stops executing the current task queue; when no task is delivered within a set time, the controller switches back to the sleep state to reduce the consumption of system resources.
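The state transitions described for fig. 4 (sleep, initialization, running, cancelled, and back to sleep after an idle timeout) could be sketched as follows; the timeout value and method names are assumptions.

```python
import time

class ControllerStateMachine:
    """Sketch of the controller lifecycle around the task queue."""

    def __init__(self, idle_timeout=60.0):
        self.state = "sleep"
        self.idle_timeout = idle_timeout
        self.last_delivery = None

    def deliver_task(self):
        """A delivered task wakes the controller: sleep -> initialization -> running."""
        if self.state == "sleep":
            self.state = "initialization"
        self.state = "running"
        self.last_delivery = time.monotonic()

    def cancel(self):
        """Terminate execution of the current task queue."""
        self.state = "cancelled"

    def tick(self):
        """Fall back to sleep when no task has been delivered within the set time."""
        idle = (self.last_delivery is None or
                time.monotonic() - self.last_delivery > self.idle_timeout)
        if self.state in ("running", "cancelled") and idle:
            self.state = "sleep"

controller = ControllerStateMachine(idle_timeout=0.0)
controller.deliver_task()
print(controller.state)   # running
time.sleep(0.01)
controller.tick()
print(controller.state)   # sleep, once the idle timeout has elapsed
```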
It should be noted that, the gene data sequences are allocated to the corresponding GPU and FPGA for processing according to the attributes of the gene data sequences, so that the processing efficiency is improved.
Step S303: and the GPU processes the N threads to be run in a heterogeneous parallel mode.
In the process of executing step S303, the GPU processes the N to-be-run threads in a single-instruction and multi-thread manner.
It should be noted that the number of to-be-run threads processed by the GPU is determined by the threads that actually need to be processed.
And S304, the FPGA processes the M threads to be run in a pipelined parallel mode.
In the process of executing step S304, the FPGA processes the M to-be-run threads in a pipelined parallel manner.
It should be noted that the number of to-be-run threads processed by the FPGA is likewise determined by the threads that actually need to be processed.
Step S303 and step S304 are executed in parallel.
According to the gene data analysis method disclosed in this embodiment of the invention, the GPU processes the N to-be-run threads in a heterogeneous parallel manner and the FPGA processes the M to-be-run threads in a pipelined parallel manner, so the efficiency of analyzing and interpreting gene data per unit time is greatly improved.
Based on the gene data analysis method disclosed in fig. 2 of the embodiment of the present invention, as shown in fig. 5, a flowchart of another gene data analysis method provided in the embodiment of the present invention mainly includes:
step S501, the controller judges whether the current load can receive a new task, if so, the step S502 is executed, and if not, the step S503 is executed.
In the process of executing step S501, the controller monitors the load balance of each module, and determines whether the current load can continue to receive a new task to be executed according to the monitored current load condition.
Step S502, a new task continues to be received.
In the process of performing step S502, if the load is not overloaded, new tasks continue to be received.
Step S503, stopping receiving new tasks.
In the process of executing step S503, if the load is overloaded, the platform stops receiving new tasks; once the load is no longer overloaded, it resumes receiving new tasks.
According to the gene data analysis method disclosed by the embodiment of the invention, whether the current load can receive the new task or not is judged by the controller, if the load is not overloaded, the new task is received, and if the load is overloaded, the new task is stopped being received, so that the function of protecting the load is realized.
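A compact sketch of this admission decision; the capacity figure and safety margin below are illustrative values, not taken from the patent.

```python
def can_accept_new_task(current_load, capacity, margin=0.1):
    """Return True when the monitored load leaves room for one more task."""
    return current_load < capacity * (1.0 - margin)

def on_new_task(task, queue, current_load, capacity):
    if can_accept_new_task(current_load, capacity):
        queue.append(task)        # step S502: keep receiving new tasks
        return "accepted"
    return "rejected"             # step S503: stop receiving until the load drops

pending = []
print(on_new_task("sample-42", pending, current_load=7.5, capacity=8.0))
```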
Based on the above gene data analysis method disclosed in fig. 2 of the embodiment of the present invention, if the loads of the GPU, the CPU, and the FPGA are not overloaded and/or are in an idle state, the controller controls the corresponding processor to process a specific implementation process of a task to be executed, as shown in fig. 6, the method mainly includes:
step S601, the controller reads the gene data sequence stored in the memory database and distributes the gene data sequence to the corresponding to-be-run thread of the processor.
In the process of executing step S601, the controller reads a gene data sequence stored in the memory database and then allocates it to the input queues of the corresponding to-be-run threads, which include to-be-run threads in the GPU processor and to-be-run threads in the FPGA processor. Specifically, the gene data sequence may be transferred to these threads over the PCI Express (PCIe) high-speed serial expansion bus.
It should be noted that the gene data sequence read from the memory database can be computed on any of the CPU, the GPU and the FPGA; in this scheme, to improve computational efficiency, when many threads can be dispatched, the data are preferentially assigned according to the load conditions of the CPU, the GPU and the FPGA and the kinds of computation each is best suited to.
It should also be noted that allocation of the gene data sequence to the to-be-run threads of the GPU and the FPGA is not limited to PCIe.
And step S602, distributing the gene data sequence part needing large-scale calculation to N threads to be run executed by the GPU.
In the process of executing step S602, the portions of the input queue that require efficient parallel data processing are sent to the to-be-run threads executed by the GPU; that is, the GPU performs the corresponding computation on this data within the to-be-run threads and then outputs the corresponding results.
It should be noted that, in many input queues, data that needs to be efficiently executed in parallel is screened out and sent to the GPU for processing, so that the efficiency of data calculation processing is improved.
And step S603, distributing the other part of the gene data sequence needing large-scale calculation to M to-be-run threads executed by the FPGA.
In the process of executing step S603, the portions of the input queue that require pipelined parallel data processing are sent to the to-be-run threads executed by the FPGA; that is, the FPGA performs the corresponding pipelined computation on this data within the to-be-run threads and then outputs the corresponding results.
In step S604, another part of the gene data sequence requiring large-scale calculation is distributed to the CPU.
It should be noted that, in many input queues, the data that needs to be executed by pipelines is screened out and sent to the FPGA for processing, so that the efficiency of data calculation processing is improved.
According to the gene data analysis method disclosed in this embodiment of the invention, the controller reads the gene data sequence stored in the memory database and distributes it to the to-be-run threads executed by the GPU, the FPGA and the CPU; the GPU handles the data requiring efficient parallel execution and the FPGA handles the data requiring pipelined parallel execution, so the efficiency of data computation is improved.
Based on the gene data analysis method disclosed in the above embodiment of the present invention, as shown in fig. 7, a flowchart of another gene data analysis method provided in the embodiment of the present invention mainly includes:
step S701, in the process that the controller controls the corresponding processor to process the task to be executed, whether the processor has overload is detected, if yes, step S702 is executed, and if not, step S703 is executed.
In the process of executing step S701, the controller monitors the running thread load in the processor, so as to know the running thread load, and the thread load is divided into two cases: one is that the running thread is loaded beyond the optimal load, i.e., cannot be within the machine's tolerance; alternatively, the running thread load is within the optimal load range, i.e., the range that the machine can tolerate.
The optimum load is set according to actual conditions. In addition to monitoring the load of the running threads of the processor, the controller further adjusts the load balance by utilizing the utilization rates of the I/O ports, MEMs, CPUs and GPUs.
Step S702, increasing the computing resource of the processor with overload.
In the process of executing step S702, if the controller finds that the load of the running threads exceeds the optimal load, it suspends part of the threads running in the processor so that the current running-thread load no longer exceeds the optimal load, i.e. the machine keeps operating within its tolerable range, which greatly reduces wear on the machine.
For example: when the I/O port is too heavy, the input of data is reduced, and the speed balance of input and calculation is adjusted. It should be noted that, when the load of the running thread exceeds the optimal load, only some threads running in the processor are suspended, and other threads continue to run.
It should be noted that, if the load of the running thread exceeds the optimal load, part of the running threads may be suspended according to the running sequence of the threads.
Step S703 of maintaining the computing resources of the current processor load.
According to the gene data analysis method disclosed in this embodiment of the invention, the controller monitors the running-thread load in the processor and judges whether it exceeds the optimal load; if it does, part of the running threads are suspended, which greatly reduces wear on the machine; if it does not, the gene data sequences read from the memory database are distributed to the processors, so processing of the stored gene data sequences is accelerated and the overall processing efficiency is markedly improved.
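The thread-suspension behaviour of step S702 could be sketched as below; suspending the most recently started threads first is an assumption made only for illustration.

```python
def enforce_optimal_load(running_threads, optimal_load):
    """Suspend running threads until their number is back within the optimal load."""
    suspended = []
    while len(running_threads) > optimal_load:
        suspended.append(running_threads.pop())   # park the most recently started thread
    return running_threads, suspended

running = [f"thread-{i}" for i in range(12)]
still_running, parked = enforce_optimal_load(running, optimal_load=8)
print(len(still_running), parked)
```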
Based on the above embodiments of the present invention, a specific implementation process in which the GPU processes the thread to be run in a heterogeneous parallel manner is mainly included, as shown in fig. 8:
step S801, the GPU executes N threads to be operated in parallel, and for each thread to be operated, the GPU executes multi-core parallel analysis processing on an input queue input into the thread to be operated based on a multi-core structure to obtain a corresponding output result.
In step S801, the multi-core structure refers to that two or more complete computing cores are integrated into the GPU, and the two or more complete computing cores can operate independently and perform computing processing on data without interference.
In the process of executing step S801, the GPU processes the N to-be-run threads in parallel. For each to-be-run thread, the GPU relies on its multi-core structure to process the input queue fed into that thread and to output the corresponding result. The input queue contains a gene data sequence, which comprises data that require efficient parallel data processing.
It should be noted that the data calculated by the GPU is stored in the memory database. And for the N threads to be run in the GPU, each thread to be run correspondingly processes one task.
Fig. 9 is a schematic diagram of the work of efficiently executing data processing in parallel by a to-be-run thread GPU worker in the GPU.
The specific working principle is as follows: the data input queue (FIFO) is dispatched to multiple CUDA threads inside the to-be-run GPU worker, which execute the data processing in parallel, and the processing results are output in the form of a data output queue (FIFO).
Specifically, the input queue fed into the to-be-run thread of the GPU processor is computed through CUDA (Compute Unified Device Architecture), a computing platform provided by the graphics-card vendor NVIDIA. CUDA is a general-purpose parallel computing architecture introduced by NVIDIA that enables the GPU to solve complex computational problems. It contains the CUDA instruction set architecture (ISA) and the parallel computing engine inside the GPU. Developers can write programs for the CUDA architecture in the C language, one of the most widely used high-level programming languages, and the resulting programs run at very high performance on processors that support CUDA. From CUDA 3.0 onwards, C++ and Fortran are also supported. The multi-core structure of the GPU thus provides an important guarantee for processing the N to-be-run threads efficiently.
It should be noted that GPU processors differ, and so do their multi-core structures; by analogy, mobile-phone CPU processors include quad-core, octa-core and deca-core parts, as well as the earlier single-core and dual-core parts.
And S802, converting the output result into a standard output file by the GPU for outputting.
In the process of executing step S802, the GPU processor converts the output results of the input queues in the to-be-run threads into a standard output file for output; that is, besides computing on the input queues fed into its to-be-run threads, the GPU processor must also convert the results of those computations into a standard output file and output it.
It should be noted that converting the results into a standard output file allows them to be displayed to the user on the computer more intuitively.
According to the gene data analysis method disclosed in this embodiment of the invention, the multi-core structure of the GPU provides an important guarantee for processing the N to-be-run threads efficiently.
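Since no kernel code is given in the patent, the following Python sketch only mimics the GPU worker pattern of fig. 9 on the CPU: a FIFO input queue is drained, the items are processed by many workers in parallel, and the results are emitted on a FIFO output queue. The placeholder analysis function and worker count are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor
from queue import Queue

def gpu_worker(input_fifo, output_fifo, n_parallel=8):
    """CPU-side analogy of the GPU worker: FIFO in, parallel processing, FIFO out."""
    def analyze(read):
        return read[::-1]        # placeholder for the real per-read computation

    batch = []
    while not input_fifo.empty():
        batch.append(input_fifo.get())
    with ThreadPoolExecutor(max_workers=n_parallel) as pool:
        for result in pool.map(analyze, batch):   # many workers run in parallel
            output_fifo.put(result)

fifo_in, fifo_out = Queue(), Queue()
for read in ["ACGT", "TTGA", "CCAT"]:
    fifo_in.put(read)
gpu_worker(fifo_in, fifo_out)
print([fifo_out.get() for _ in range(3)])
```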
Based on the above gene data analysis method disclosed in the embodiment of the present invention, the specific implementation process of processing the thread to be run by the FPGA in a pipelined parallel manner is mainly as shown in fig. 10, and includes:
and S1001, the FPGA executes the threads to be operated in parallel, and for each thread to be operated, the FPGA sequentially carries out parallel processing on the data input into the input queue of the thread to be operated in n operator threads on the basis of n operator threads in the thread to be operated.
In the process of executing step S1001, the FPGA processor executes M threads to be operated in parallel, and for each thread to be operated, the FPGA processor sequentially performs parallel processing on data input into the input queue of the thread to be operated in the n operator threads based on the n operator threads in the thread to be operated, that is, each thread to be operated includes the n operator threads, so as to improve the efficiency of data processing.
As shown in fig. 11, a working schematic diagram of parallel data processing is performed for one to-be-run FPGA worker pipeline in the FPGA.
As shown in fig. 11, data6 and data7 denote data in a queue, data1 to data5 denote sequential input data, pipeline parallel data processing is performed on the input data, and data0 denotes data output after the pipeline parallel data processing is performed.
Based on fig. 11, the specific working principle is as follows: the FPGA worker modularizes the parts of the computation suitable for pipelined execution; specifically, it is divided into six modules. In fig. 11, data6 and data7 represent data waiting in the queue and data1 to data5 represent sequentially input data. data1 to data5 move rightward through the six modules in order, and data lying on the same diagonal are computed at the same time, completing the pipelined work; data0 represents data output after passing through all pipeline modules. data6 and data7 are then fed as input data into the corresponding modules for pipelined processing, producing output data8 once the pipeline completes; data0 and data8 are the data in the output queue.
It should be noted that the number of the operator threads in the to-be-run threads of different FPGA processors is different. The data calculated by the FPGA processor can be stored in an internal memory database. For M to-be-run threads in the FPGA processor, each to-be-run thread correspondingly processes one task, namely the FPGA processor is provided with M processor nodes, and the FPGA processor nodes can sequentially run each calculation module in a pipelined manner.
And step S1002, the FPGA outputs a processing result after all data in the input queue of the input thread to be operated are processed for each thread to be operated.
According to the gene data analysis method disclosed by the embodiment of the invention, the to-be-operated thread of the FPGA processor comprises a plurality of operator threads, and the data is processed through the operator threads, so that the data processing efficiency is improved.
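By way of contrast with the GPU sketch above, the pipelined FPGA worker of fig. 11 can be caricatured as a chain of operator stages applied to each queued item in order; the three example stages are purely illustrative, and the true hardware overlap of stages is not modelled here.

```python
def fpga_worker(input_fifo, stages):
    """Sketch of the pipelined FPGA worker: every item flows through the n
    operator stages in order; in hardware, different items occupy different
    stages at the same time, which this sequential loop does not capture."""
    output_fifo = []
    for item in input_fifo:
        for stage in stages:          # stage 1 -> stage 2 -> ... -> stage n
            item = stage(item)
        output_fifo.append(item)
    return output_fifo

stages = [
    lambda s: s.lower(),              # operator thread 1
    lambda s: s.replace("t", "u"),    # operator thread 2
    lambda s: s[::-1],                # operator thread 3
]
print(fpga_worker(["ACGT", "TTGA"], stages))
```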
Corresponding to the foregoing method embodiment, an embodiment of the present invention further provides a structural schematic diagram of a heterogeneous scheduling platform, as shown in fig. 12, including: controller 1201, GPU1202, FPGA1203, and CPU 1204.
The controller 1201 is configured to monitor load balancing of each module in real time during execution of the gene data analysis, and if any module is overloaded, increase the computational resources of the current module and decrease the computational resources of the front-end module.
Further, the controller 1201 includes: the device comprises a reading module, an algorithm acceleration module, a conversion module and an output module.
And the reading module is used for reading the gene sequencing file to be analyzed and the task parameters from the task queue for preprocessing to obtain the task to be executed.
And the algorithm acceleration module is used for detecting the load and the running state of the GPU1202, the CPU1204 and the FPGA1203 in real time, acquiring the computing resources of the GPU1202, the FPGA1203 and the CPU1204, sequentially distributing, starting from the head of the task queue, the tasks to be executed to the processors which are currently in an idle state and/or not overloaded, according to the load and running state detected in real time, and controlling the corresponding processors to process the tasks to be executed.
And the conversion module is used for acquiring a calculation result output by the processor for processing the task to be executed, and performing data calibration and format conversion on the calculation result to obtain standard output information.
And the output module is used for counting the standard output information and generating an analysis report.
Further, the algorithm acceleration module is further configured to, in a process in which the controller 1201 controls the corresponding processor to process the task to be executed, increase the computational resource of the processor with an overload if the controller 1201 detects that the processor has an overload in real time.
Further, the reading module is further configured to determine whether the current load can receive a new task.
The GPU1202 is configured to process the N threads in a heterogeneous parallel manner.
Further, the GPU1202 is specifically configured to: and executing N threads to be operated in parallel, executing multi-core parallel analysis processing on an input queue in each thread to be operated based on the multi-core structure to obtain a corresponding output result, converting the output result into a standard output file to be output, wherein the input queue comprises a gene data sequence, and the gene data sequence comprises data which needs to be efficiently executed with data processing in parallel.
And the FPGA1203 is used for processing the M threads in a pipelined parallel manner.
Further, the FPGA1203 is specifically configured to: and executing M threads to be operated in parallel, sequentially carrying out parallel processing on data input into an input queue of the threads to be operated in the n operator threads on the basis of the n operator threads in the threads to be operated aiming at each thread to be operated, and outputting a processing result after all the data input into the input queue of the threads to be operated are processed.
And the CPU processor 1204 is configured to coordinate processing of data by the GPU processor 1202 and the FPGA processor 1203.
According to the gene data analysis method disclosed in this embodiment of the invention, pending tasks are managed in a pipelined manner through interaction among the controller, the GPU, the FPGA and the CPU. The method specifically comprises: reading the gene sequencing file to be analyzed, processing it into gene data and storing the gene data in memory; reading the gene sequences from memory and, according to the load and running state of each processor, distributing them to processors that are currently idle and/or not overloaded; executing multi-task computational analysis; calibrating and converting the computation results; and generating an analysis report from the aggregated standard output information, completing the data analysis. Load balance is monitored in real time throughout the analysis: if the computing resources handling the current task are insufficient, they are increased to accelerate processing, and the computing resources of the preceding task are reduced accordingly. The efficiency of analyzing and interpreting multi-task gene data per unit time is thereby greatly improved.
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In this application, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, the system or system embodiments are substantially similar to the method embodiments and therefore are described in a relatively simple manner, and reference may be made to some of the descriptions of the method embodiments for related points. The above-described system and system embodiments are only illustrative, wherein the units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (8)

1. A gene data analysis method, applicable to a heterogeneous scheduling platform comprising a controller and processors, wherein the processors at least comprise a GPU, a CPU and an FPGA, the method comprising the following steps:
the controller monitors the load balance of each module in real time during the execution of gene data analysis; if any module is overloaded, the computing resources of that module are increased and the computing resources of the preceding module are reduced; the process by which the modules of the controller execute gene data analysis comprises the following steps:
a reading module in the controller reads a gene sequencing file to be analyzed and task parameters from a task queue and preprocesses them to obtain a task to be executed;
an algorithm acceleration module in the controller detects the load and the running state of the GPU, the CPU and the FPGA in real time and obtains the computing resources of the GPU, the FPGA and the CPU;
an algorithm acceleration module in the controller sequentially distributes, from the head of the task queue, the tasks to be executed to processors that are currently idle and/or not overloaded, according to the load and running state of the GPU, the CPU and the FPGA detected in real time, and the controller controls the corresponding processors to process the tasks to be executed; while the controller controls a corresponding processor to process a task to be executed, if the controller detects in real time that the processor is overloaded, the computing resources of the overloaded processor are increased;
a conversion module in the controller acquires the calculation result output by the processor processing the task to be executed, and performs data calibration and format conversion on the calculation result to obtain standard output information;
and an output module in the controller collects statistics on the standard output information and generates an analysis report.
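The four controller modules recited in claim 1 form a read → accelerate → convert → output pipeline. The following minimal sketch walks a task from the head of a task queue through placeholder versions of these modules; the processor states, file name, and per-module functions are illustrative assumptions, not the claimed implementation.

```python
from collections import deque

# Simplified processor pool; 'busy' and 'overloaded' stand in for the load
# and running state the algorithm acceleration module detects in real time.
processors = [
    {"name": "GPU",  "busy": False, "overloaded": False},
    {"name": "FPGA", "busy": False, "overloaded": False},
    {"name": "CPU",  "busy": True,  "overloaded": False},
]

task_queue = deque([{"file": "sample_001.fastq", "params": {"ref": "hg38"}}])

def read_module(task):
    # Placeholder preprocessing: attach parsed reads to the task.
    task["reads"] = ["ACGT", "TTGA"]
    return task

def acceleration_module(task):
    # Dispatch to the first processor that is idle and/or not overloaded.
    for proc in processors:
        if not proc["busy"] and not proc["overloaded"]:
            return {"processor": proc["name"],
                    "result": [r.lower() for r in task["reads"]]}
    raise RuntimeError("no processor available; controller should wait or rebalance")

def conversion_module(raw):
    # Placeholder data calibration and format conversion.
    return {"processor": raw["processor"], "records": sorted(raw["result"])}

def output_module(std_out):
    return f"analysis report: {len(std_out['records'])} records via {std_out['processor']}"

while task_queue:                       # take tasks from the head of the queue
    task = read_module(task_queue.popleft())
    print(output_module(conversion_module(acceleration_module(task))))
```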
2. The method of claim 1, wherein the controller controlling the corresponding processor to process the task to be executed comprises:
the controller processes a gene sequencing file to be analyzed carried in the task to be executed to obtain a gene data sequence, and stores the gene data sequence in an in-memory database of the heterogeneous scheduling platform, wherein the gene sequencing file to be analyzed comprises a gene sequencing fragment to be analyzed and analysis parameters;
in the case of large-scale computation, the controller reads the gene data sequence stored in the in-memory database and distributes it to the threads to be run on the processors, wherein the GPU processes its threads to be run in a heterogeneous parallel manner and the FPGA processes its threads to be run in a pipelined parallel manner.
3. The method of claim 1, wherein before the reading module in the controller reads the gene sequencing file to be analyzed and the task parameters from the task queue and preprocesses them to obtain the task to be executed, the method further comprises:
the controller determines whether a new task can be accepted under the current load.
4. The method according to claim 1, wherein, if the GPU, the CPU and the FPGA are not overloaded and/or are in an idle state, the controller controlling the corresponding processor to process the task to be executed comprises:
the controller reads the gene data sequence stored in the in-memory database, distributes the part of the gene data sequence requiring large-scale computation to N threads to be run on the GPU, distributes another part to M threads to be run on the FPGA, and distributes the remaining part to the CPU;
wherein M and N are each greater than or equal to 1, and the sum of M and N is less than or equal to the total number of threads that can be run under the optimal load.
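The split recited in claim 4 — N chunks for the GPU, M chunks for the FPGA, the remainder for the CPU, with N + M bounded by the optimal load — can be sketched as follows. The sizing heuristic (roughly two thirds of the large-scale portion to the GPU) and the one-tenth CPU tail are invented for illustration and are not taken from the patent.

```python
def partition(sequence, n_gpu, m_fpga, max_threads):
    """Split a gene data sequence into N GPU chunks, M FPGA chunks and a
    CPU remainder, with N, M >= 1 and N + M bounded by the optimal load."""
    assert n_gpu >= 1 and m_fpga >= 1 and n_gpu + m_fpga <= max_threads

    # Assumed heuristic: the bulk (large-scale computation) goes to GPU/FPGA,
    # a small tail goes to the CPU.
    cpu_part = sequence[-len(sequence) // 10 or -1:]
    accel_part = sequence[:len(sequence) - len(cpu_part)]

    per_gpu = len(accel_part) * 2 // (3 * n_gpu)        # ~2/3 of the bulk to GPU
    gpu_chunks = [accel_part[i * per_gpu:(i + 1) * per_gpu] for i in range(n_gpu)]

    rest = accel_part[n_gpu * per_gpu:]
    per_fpga = max(1, len(rest) // m_fpga)
    fpga_chunks = [rest[i * per_fpga:(i + 1) * per_fpga] for i in range(m_fpga - 1)]
    fpga_chunks.append(rest[(m_fpga - 1) * per_fpga:])  # last chunk takes the tail

    return gpu_chunks, fpga_chunks, cpu_part

gpu, fpga, cpu = partition(list(range(100)), n_gpu=4, m_fpga=2, max_threads=8)
print(len(gpu), len(fpga), len(cpu))   # 4 GPU chunks, 2 FPGA chunks, 10-element CPU tail
```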
5. The method according to any of claims 2-4, wherein the GPU processes the threads to be run in a heterogeneous parallel manner, comprising:
the GPU executes the threads to be run in parallel, and for each thread to be run, the GPU performs, based on its multi-core architecture, multi-core parallel analysis on the input queue fed into that thread to obtain a corresponding output result, wherein the input queue comprises a gene data sequence, and the gene data sequence comprises data requiring efficient parallel processing;
and the GPU converts the output result into a standard output file and outputs the standard output file.
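The heterogeneous parallel mode of claim 5 — every element of an input queue analysed independently on many cores, with the gathered results written to a standard output file — can be emulated with a process pool. The per-read GC-content function, worker count, and output file name below are placeholders, not the claimed analysis.

```python
from concurrent.futures import ProcessPoolExecutor

def gc_content(read):
    """Placeholder per-read analysis standing in for a GPU kernel:
    fraction of G/C bases in one gene read."""
    return round((read.count("G") + read.count("C")) / len(read), 3)

def process_input_queue(reads, workers=4):
    """Data-parallel pass over one input queue: every read is analysed
    independently, then the results are gathered into one output."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(gc_content, reads, chunksize=64))
    # Convert the output into a standard, line-oriented output file.
    with open("gc_content.txt", "w") as out:
        out.writelines(f"{r}\n" for r in results)
    return results

if __name__ == "__main__":
    input_queue = ["ACGTGC", "TTTTAA", "GGGCCC"]
    print(process_input_queue(input_queue, workers=2))
```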
6. The method according to any one of claims 2-4, wherein the FPGA processes the threads to be run in a pipelined parallel manner, comprising:
the FPGA executes the threads to be run in parallel, and for each thread to be run, the FPGA sequentially performs parallel processing, in n operator threads within that thread to be run, on the data fed into the input queue of that thread, wherein n is an integer greater than 1;
and for each thread to be run, the FPGA outputs a processing result after all of the data fed into the input queue of that thread to be run has been processed.
7. A heterogeneous scheduling platform, comprising a controller and processors, wherein the processors at least comprise a GPU, a CPU and an FPGA;
the controller comprises a reading module, an algorithm acceleration module, a conversion module and an output module, and is configured to monitor the load balance of each module in real time during the execution of gene data analysis; if any module is overloaded, the computing resources of that module are increased and the computing resources of the preceding module are reduced;
wherein,
the reading module is configured to read a gene sequencing file to be analyzed and task parameters from the task queue and preprocess them to obtain a task to be executed;
the algorithm acceleration module is configured to detect the load and running state of the GPU, the CPU and the FPGA in real time, acquire the computing resources of the GPU, the FPGA and the CPU, sequentially distribute, from the head of the task queue, the tasks to be executed to processors that are currently idle and/or not overloaded according to the load and running state of the GPU, the CPU and the FPGA detected in real time, and control the corresponding processors to process the tasks to be executed;
the algorithm acceleration module is further configured to, while the controller controls the corresponding processor to process the task to be executed, increase the computing resources of the overloaded processor if the controller detects in real time that the processor is overloaded;
the conversion module is configured to acquire the calculation result output by the processor processing the task to be executed, and perform data calibration and format conversion on the calculation result to obtain standard output information;
and the output module is configured to collect statistics on the standard output information and generate an analysis report.
8. The heterogeneous scheduling platform of claim 7, wherein the reading module is further configured to determine whether a new task can be accepted under the current load.
CN201910918380.XA 2019-09-26 2019-09-26 Gene data analysis method and heterogeneous scheduling platform Active CN110427262B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910918380.XA CN110427262B (en) 2019-09-26 2019-09-26 Gene data analysis method and heterogeneous scheduling platform

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910918380.XA CN110427262B (en) 2019-09-26 2019-09-26 Gene data analysis method and heterogeneous scheduling platform

Publications (2)

Publication Number Publication Date
CN110427262A CN110427262A (en) 2019-11-08
CN110427262B true CN110427262B (en) 2020-05-15

Family

ID=68419077

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910918380.XA Active CN110427262B (en) 2019-09-26 2019-09-26 Gene data analysis method and heterogeneous scheduling platform

Country Status (1)

Country Link
CN (1) CN110427262B (en)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11527950B2 (en) * 2019-03-25 2022-12-13 Semiconductor Components Industries, Llc Power supply system for controlling load distribution across multiple converters for optimizing overall efficiency
CN110990063B (en) * 2019-11-28 2021-11-23 中国科学院计算技术研究所 Accelerating device and method for gene similarity analysis and computer equipment
CN113391887B (en) * 2020-03-11 2024-03-12 北京国电智深控制技术有限公司 Method and system for processing industrial data
CN112229989A (en) * 2020-10-19 2021-01-15 广州吉源生物科技有限公司 Biological sample identification equipment of GPU (graphics processing Unit) technology
CN112259168B (en) * 2020-10-22 2023-03-28 深圳华大基因科技服务有限公司 Gene sequencing data processing method and gene sequencing data processing device
CN112397147B (en) * 2020-12-03 2021-06-15 北京三维天地科技股份有限公司 Gene sequencing delivery method and system for realizing collinear production based on intelligent scheduling technology
CN113254104B (en) * 2021-06-07 2022-06-21 中科计算技术西部研究院 Accelerator and acceleration method for gene analysis
CN114490498B (en) * 2022-01-20 2023-12-19 山东大学 Simulation software simulation heterogeneous system based on VR technology and working method thereof
CN114334008B (en) * 2022-01-24 2022-08-02 广州明领基因科技有限公司 FPGA-based gene sequencing accelerated comparison method and device
CN115754413A (en) * 2022-10-24 2023-03-07 普源精电科技股份有限公司 Oscilloscope and data processing method
CN115662518A (en) * 2022-12-27 2023-01-31 四川大学华西医院 Gene sequencing and storage cooperation system, method and computer readable storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103617086A (en) * 2013-11-20 2014-03-05 东软集团股份有限公司 Parallel computation method and system
CN106886690A (en) * 2017-01-25 2017-06-23 人和未来生物科技(长沙)有限公司 It is a kind of that the heterogeneous platform understood is calculated towards gene data

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104504303B (en) * 2014-09-29 2018-09-28 肇庆学院 Sequence alignment method based on CPU+GPU heterogeneous systems
CN105718312B (en) * 2016-01-20 2018-10-30 华南理工大学 More queues that calculating task is sequenced towards biological gene backfill job scheduling method

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103617086A (en) * 2013-11-20 2014-03-05 东软集团股份有限公司 Parallel computation method and system
CN106886690A (en) * 2017-01-25 2017-06-23 人和未来生物科技(长沙)有限公司 It is a kind of that the heterogeneous platform understood is calculated towards gene data

Also Published As

Publication number Publication date
CN110427262A (en) 2019-11-08

Similar Documents

Publication Publication Date Title
CN110427262B (en) Gene data analysis method and heterogeneous scheduling platform
JP6437579B2 (en) Intelligent GPU scheduling in virtualized environment
EP2383648B1 (en) Technique for GPU command scheduling
US8707314B2 (en) Scheduling compute kernel workgroups to heterogeneous processors based on historical processor execution times and utilizations
CN105183539B (en) Dynamic task arrangement method
CN107577185B (en) A kind of task executing method and the controller based on multi-core processor
CN106030538B (en) System and method for split I/O execution support through compiler and OS
CN103262002B (en) Optimization system call request communicates
US20140052965A1 (en) Dynamic cpu gpu load balancing using power
US10970129B2 (en) Intelligent GPU scheduling in a virtualization environment
US20110010721A1 (en) Managing Virtualized Accelerators Using Admission Control, Load Balancing and Scheduling
US9569221B1 (en) Dynamic selection of hardware processors for stream processing
KR20120058605A (en) Hardware-based scheduling of gpu work
US20140189708A1 (en) Terminal and method for executing application in same
Wu et al. When FPGA-accelerator meets stream data processing in the edge
CN113342485A (en) Task scheduling method, device, graphics processor, computer system and storage medium
EP3779778A1 (en) Methods and apparatus to enable dynamic processing of a predefined workload
CN115904510B (en) Processing method of multi-operand instruction, graphic processor and storage medium
Rossbach et al. Operating Systems Must Support {GPU} Abstractions
Sajjapongse et al. A flexible scheduling framework for heterogeneous CPU-GPU clusters
Hwang et al. Hardware Interrupt and CPU Contention aware CPU/GPU Co-Scheduling on Multi-Cluster System
Liu et al. Spmario: Scale up mapreduce with i/o-oriented scheduling for the GPU
US11055137B2 (en) CPU scheduling methods based on relative time quantum for dual core environments
Yu et al. POSTER: A collaborative multi-factor scheduler for asymmetric multicore processors
CN116841809A (en) GPGPU (graphics processing Unit) link speed detection method

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant