CN110262887B - CPU-FPGA task scheduling method and device based on feature recognition - Google Patents


Info

Publication number
CN110262887B
CN110262887B (application CN201910563352.0A)
Authority
CN
China
Prior art keywords
processed
task
tasks
fpga
cpu
Prior art date
Legal status
Active
Application number
CN201910563352.0A
Other languages
Chinese (zh)
Other versions
CN110262887A (en)
Inventor
张海涛
杜沛伦
马华东
Current Assignee
Beijing University of Posts and Telecommunications
Original Assignee
Beijing University of Posts and Telecommunications
Priority date
Filing date
Publication date
Application filed by Beijing University of Posts and Telecommunications
Priority to CN201910563352.0A
Publication of CN110262887A
Application granted
Publication of CN110262887B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806 Task transfer initiation or dispatching
    • G06F9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881 Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources to service a request
    • G06F9/5027 Allocation of resources to service a request, the resource being a machine, e.g. CPUs, servers, terminals
    • G06F9/5038 Allocation of resources considering the execution order of a plurality of tasks, e.g. taking priority or time dependency constraints into consideration
    • G06F9/505 Allocation of resources considering the load

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Devices For Executing Special Programs (AREA)
  • Complex Calculations (AREA)

Abstract

Embodiments of the invention provide a CPU-FPGA task scheduling method and device based on feature recognition. The method comprises: acquiring a plurality of tasks to be processed and the data volume of each task to be processed; extracting feature information of each task to be processed; generating a feature vector for each task based on its CPU feature information, FPGA feature information, and the task's own feature information; inputting the generated feature vectors into a pre-trained classification model to obtain a classification result for each task to be processed; sorting the tasks to be processed according to the magnitude relation among their data volumes; and scheduling the sorted tasks to the CPU and the FPGA, respectively, according to a preset strategy. Embodiments of the invention can improve CPU-FPGA load balance during CPU-FPGA task scheduling.

Description

CPU-FPGA task scheduling method and device based on feature recognition
Technical Field
The invention relates to the technical field of computers, in particular to a CPU-FPGA task scheduling method and device based on feature recognition.
Background
The development of artificial intelligence, multimedia technology, and high-performance computing in recent years has stimulated extensive research interest in heterogeneous computing architectures. Traditional server platforms cannot bear massive, diversified data processing tasks, and heterogeneous computing emerged alongside the development of distributed and high-performance computing. A heterogeneous computing platform integrates heterogeneous computing and storage resources, provides elastic resource allocation for task data processing, improves resource utilization, reduces service cost, and offers fault tolerance and fault recovery, providing a safe and reliable platform for task data processing; as a result, more and more tasks are migrated to heterogeneous platforms for processing. For example, a task may be video data processing or image data processing, and a heterogeneous platform may be a heterogeneous server.
At present, for tasks that need to be processed on a CPU (Central Processing Unit)-FPGA (Field-Programmable Gate Array) heterogeneous platform, a commonly adopted processing method is as follows: for tasks received on the CPU-FPGA heterogeneous platform, the CPU preferentially distributes the received tasks to the FPGA for execution in the order in which they are received, and once the number of tasks to be executed on the FPGA reaches a certain threshold (at which point the FPGA memory is fully occupied), distributes subsequently received tasks to the CPU for execution.
However, this existing processing method for tasks on the CPU-FPGA heterogeneous platform allocates tasks in the order in which they are received. In practical applications, because different tasks require different amounts of data to be processed, order-of-arrival allocation easily assigns tasks with small data volumes or low computational complexity to the FPGA and tasks with large data volumes or high computational complexity to the CPU, causing CPU-FPGA load imbalance.
Disclosure of Invention
The embodiment of the invention aims to provide a CPU-FPGA task scheduling method and device based on feature recognition, which can further improve the load balance of a CPU-FPGA in the CPU-FPGA task scheduling process. The specific technical scheme is as follows:
in a first aspect, an embodiment of the present invention provides a CPU-FPGA task scheduling method based on feature recognition, where the method includes:
acquiring a plurality of tasks to be processed and data volume of each task to be processed, wherein the tasks to be processed are tasks to be processed in the CPU and the FPGA, and the data volume is used for representing data processing volume required by processing the tasks to be processed;
extracting feature information of each task to be processed in the plurality of tasks to be processed, wherein the feature information comprises: CPU characteristic information, FPGA characteristic information and task self characteristic information; the CPU characteristic information is used for representing characteristics of the CPU when the CPU processes the task to be processed, and the FPGA characteristic information is used for representing characteristics of the FPGA when the FPGA processes the task to be processed;
generating a feature vector of each task to be processed based on the CPU feature information, the FPGA feature information, and the task's own feature information of each task to be processed, wherein the feature vector is generated after removing the dimensions (units) from the feature information of the task to be processed;
inputting the generated feature vector of each task to be processed into a pre-trained classification model to obtain a classification result of each task to be processed, wherein the classification result comprises: the system comprises a first classification result and a second classification result, wherein a first task to be processed corresponding to the first classification result is used for processing in a CPU, and a second task to be processed corresponding to the second classification result is used for processing in an FPGA; the classification model is obtained by training according to the feature vector corresponding to the preset task and the class label corresponding to the preset task;
sequencing the first tasks to be processed according to the magnitude relation among the data volumes of the first tasks to be processed, and scheduling the sequenced first tasks to be processed into the CPU according to a preset strategy for processing;
and sequencing the second tasks to be processed according to the magnitude relation among the data volumes of the second tasks to be processed, and scheduling the sequenced second tasks to be processed into the FPGA according to a preset strategy for processing.
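The claimed steps above can be sketched end to end as follows. This is a minimal illustration, not the patent's implementation: it assumes some `classify()` function standing in for the pre-trained classification model, mapping a feature vector to 1 (first classification result, CPU) or 2 (second classification result, FPGA); all names and data are hypothetical.

```python
# Illustrative sketch of the claimed method: classify each task's feature
# vector, split into CPU and FPGA subsets, and sort each subset by data
# volume (ascending, one of the preset strategies). `classify` is a
# stand-in for the pre-trained model.
def schedule(tasks, classify):
    """tasks: list of (feature_vector, data_volume) pairs.
    Returns the CPU queue and the FPGA queue, each sorted ascending
    by data volume."""
    cpu, fpga = [], []
    for vec, volume in tasks:
        (cpu if classify(vec) == 1 else fpga).append((vec, volume))
    cpu_queue = sorted(cpu, key=lambda t: t[1])
    fpga_queue = sorted(fpga, key=lambda t: t[1])
    return cpu_queue, fpga_queue

# Trivial stand-in classifier: a small first feature means CPU-suited.
toy_classify = lambda vec: 1 if vec[0] < 5 else 2
tasks = [([1], 200), ([9], 50), ([2], 10)]
cpu_q, fpga_q = schedule(tasks, toy_classify)
print(cpu_q)   # [([2], 10), ([1], 200)]
print(fpga_q)  # [([9], 50)]
```

Each queue is then dispatched to its device in queue order, as the preset strategy prescribes.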
Optionally, the step of generating a feature vector of each to-be-processed task based on the CPU feature information, the FPGA feature information, and the task own feature information of each to-be-processed task includes:
removing dimensions from the CPU characteristic information, the FPGA characteristic information and the task self characteristic information of each task to be processed to obtain CPU characteristic data, FPGA characteristic data and task self characteristic data of each task to be processed;
and combining the CPU characteristic data, the FPGA characteristic data and the task self characteristic data of each task to be processed into a characteristic vector of each task to be processed according to a preset rule.
Optionally, the step of sorting the first to-be-processed tasks according to the size relationship between the data volumes of the first to-be-processed tasks includes:
sorting first tasks to be processed corresponding to the CPU task subsets in an ascending order according to the magnitude relation among the data volumes of the first tasks to be processed to obtain a first CPU task queue;
or sorting the first tasks to be processed corresponding to the CPU task subsets in a descending order according to the magnitude relation between the data volumes of the first tasks to be processed to obtain a second CPU task queue;
the step of sequencing the second tasks to be processed according to the magnitude relationship between the data volumes of the second tasks to be processed includes:
sorting second tasks to be processed corresponding to the FPGA task subset in a descending order according to the magnitude relation among the data volumes of the second tasks to be processed to obtain a first FPGA task queue;
or sorting the second tasks to be processed corresponding to the FPGA task subset in an ascending order according to the magnitude relation among the data volumes of the second tasks to be processed to obtain a second FPGA task queue.
Optionally, the step of scheduling each sequenced first task to be processed to the CPU according to a preset policy for processing includes:
according to the ordered first tasks to be processed, sequentially scheduling the first tasks to be processed in a first CPU task queue to a CPU for processing according to the sequence of the first tasks to be processed in the first CPU task queue;
or, for each sorted first task to be processed, each first task to be processed in the second CPU task queue is sequentially scheduled to the CPU for processing according to the reverse order of each first task to be processed in the second CPU task queue.
Optionally, the step of scheduling each of the sequenced second tasks to be processed into the FPGA according to a preset policy for processing includes:
according to the ordered second tasks to be processed, sequentially scheduling the second tasks to be processed in the first FPGA task queue to the FPGA for processing according to the sequence of the second tasks to be processed in the first FPGA task queue;
or, for each second to-be-processed task after sequencing, sequentially scheduling each second to-be-processed task in a second FPGA task queue to the FPGA for processing according to a reverse order of each second to-be-processed task in the second FPGA task queue.
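The optional sorting and dispatch strategies above (ascending or descending queues, dispatched in order or in reverse order) can be sketched in a few lines. This is a hedged illustration only; the task tuples (name, data volume) and function names are made up, not the patent's API.

```python
# Sorting a task subset by data volume into a queue, then dispatching
# the queue either in order or in reverse order. Note that dispatching
# the ascending queue in order and dispatching the descending queue in
# reverse order both process the smallest task first.
def ascending_queue(tasks):
    """First CPU/FPGA task queue: ascending by data volume."""
    return sorted(tasks, key=lambda t: t[1])

def descending_queue(tasks):
    """Second CPU/FPGA task queue: descending by data volume."""
    return sorted(tasks, key=lambda t: t[1], reverse=True)

def dispatch_in_order(queue):
    """Schedule tasks one by one in queue order."""
    return list(queue)

def dispatch_in_reverse(queue):
    """Schedule tasks one by one in reverse queue order."""
    return list(reversed(queue))

tasks = [("t1", 300), ("t2", 10), ("t3", 120)]
print(dispatch_in_order(ascending_queue(tasks)))
print(dispatch_in_reverse(descending_queue(tasks)))
# Both print [('t2', 10), ('t3', 120), ('t1', 300)]
```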
In a second aspect, an embodiment of the present invention provides a CPU-FPGA task scheduling device based on feature recognition, where the device includes:
the system comprises an acquisition module, a processing module and a processing module, wherein the acquisition module is used for acquiring a plurality of tasks to be processed and data volume of each task to be processed, the tasks to be processed are tasks to be processed in the CPU and the FPGA, and the data volume is used for representing data processing volume required by processing the tasks to be processed;
an extraction module, configured to extract feature information of each to-be-processed task in the multiple to-be-processed tasks, where the feature information includes: CPU characteristic information, FPGA characteristic information and task self characteristic information; the CPU characteristic information is used for representing characteristics of the CPU when the CPU processes the task to be processed, and the FPGA characteristic information is used for representing characteristics of the FPGA when the FPGA processes the task to be processed;
a generating module, configured to generate a feature vector of each task to be processed based on the CPU feature information, the FPGA feature information, and the task's own feature information, wherein the feature vector is generated after removing the dimensions from the feature information of the task to be processed;
a classification module, configured to input the generated feature vector of each to-be-processed task into a pre-trained classification model, to obtain a classification result of each to-be-processed task, where the classification result includes: the system comprises a first classification result and a second classification result, wherein a first task to be processed corresponding to the first classification result is used for processing in a CPU, and a second task to be processed corresponding to the second classification result is used for processing in an FPGA; the classification model is obtained by training according to the feature vector corresponding to the preset task and the class label corresponding to the preset task;
the first scheduling module is used for sequencing the first tasks to be processed according to the magnitude relation among the data volumes of the first tasks to be processed and scheduling the sequenced first tasks to be processed into a CPU (central processing unit) according to a preset strategy for processing;
and the second scheduling module is used for sequencing the second tasks to be processed according to the size relationship among the data volumes of the second tasks to be processed and scheduling the sequenced second tasks to be processed into the FPGA according to a preset strategy for processing.
Optionally, the generating module includes:
the dimensionless submodule is used for removing dimensions of the CPU characteristic information, the FPGA characteristic information and the task self characteristic information of each task to be processed to obtain CPU characteristic data, FPGA characteristic data and task self characteristic data of each task to be processed;
and the generation submodule is used for combining the CPU characteristic data, the FPGA characteristic data and the task self characteristic data of each task to be processed into a characteristic vector of each task to be processed according to a preset rule.
Optionally, the first scheduling module is specifically configured to:
sorting first tasks to be processed corresponding to the CPU task subsets in an ascending order according to the magnitude relation among the data volumes of the first tasks to be processed to obtain a first CPU task queue;
or sorting the first tasks to be processed corresponding to the CPU task subsets in a descending order according to the magnitude relation between the data volumes of the first tasks to be processed to obtain a second CPU task queue;
the second scheduling module is specifically configured to:
sorting second tasks to be processed corresponding to the FPGA task subset in a descending order according to the magnitude relation among the data volumes of the second tasks to be processed to obtain a first FPGA task queue;
or sorting the second tasks to be processed corresponding to the FPGA task subset in an ascending order according to the magnitude relation among the data volumes of the second tasks to be processed to obtain a second FPGA task queue.
Optionally, the first scheduling module is specifically configured to:
according to the ordered first tasks to be processed, sequentially scheduling the first tasks to be processed in the first CPU task queue to the CPU for processing according to the sequence of the first tasks to be processed in the first CPU task queue;
or, for each sorted first task to be processed, each first task to be processed in the second CPU task queue is sequentially scheduled to the CPU for processing according to the reverse order of each first task to be processed in the second CPU task queue.
Optionally, the second scheduling module is specifically configured to:
according to the ordered second tasks to be processed, sequentially scheduling the second tasks to be processed in the first FPGA task queue to the FPGA for processing according to the sequence of the second tasks to be processed in the first FPGA task queue;
or, for each second to-be-processed task after sequencing, sequentially scheduling each second to-be-processed task in the second FPGA task queue to the FPGA for processing according to the reverse order of each second to-be-processed task in the second FPGA task queue.
In a third aspect, an embodiment of the present invention further provides an electronic device, including a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory complete mutual communication through the communication bus;
a memory for storing a computer program;
and the processor is used for realizing the steps of the CPU-FPGA task scheduling method based on the feature recognition in the first aspect when executing the program stored in the memory.
In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium, where a computer program is stored in the computer-readable storage medium, and the computer program is executed by a processor to perform the steps of the CPU-FPGA task scheduling method based on feature recognition in the first aspect.
The embodiment of the invention has the following beneficial effects:
the embodiment of the invention provides a CPU-FPGA task scheduling method and a device based on feature recognition, since the classification of the tasks to be processed is performed based on the feature vectors generated based on the feature information of the respective tasks to be processed, before scheduling the tasks to be processed, comprehensively considering the CPU characteristic information, the FPGA characteristic information and the self characteristic information of the tasks, and then the tasks to be processed are classified, so that the first task to be processed obtained after classification is more suitable for processing in a CPU, the second task to be processed is more suitable for processing in an FPGA, the load balance of the CPU-FPGA is further improved in the process of scheduling the CPU-FPGA tasks, and in addition, when the tasks to be processed are scheduled, the scheduling is performed based on the size relationship between the data volumes of the tasks to be processed, so that the resources of the CPU and the FPGA can be better utilized in a coordinated manner.
Of course, not all of the advantages described above need to be achieved at the same time in the practice of any one product or method of the invention.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings in the following description are only some embodiments of the present invention; for those skilled in the art, other drawings can be obtained from these drawings without creative effort.
Fig. 1 is a schematic flowchart of a CPU-FPGA task scheduling method based on feature recognition according to an embodiment of the present invention;
fig. 2 is a schematic flow chart of a feature vector generation method according to an embodiment of the present invention;
fig. 3 is a schematic structural diagram of a CPU-FPGA task scheduling device based on feature recognition according to an embodiment of the present invention;
fig. 4 is a schematic structural diagram of a generating module according to an embodiment of the present invention;
fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In order to solve the problem of CPU-FPGA load imbalance caused in the existing CPU-FPGA task scheduling process, the embodiment of the invention provides a CPU-FPGA task scheduling method and device based on feature recognition.
First, a CPU-FPGA task scheduling method based on feature recognition provided in the embodiments of the present invention is described below.
As shown in fig. 1, fig. 1 is a schematic flowchart of a CPU-FPGA task scheduling method based on feature recognition according to an embodiment of the present invention, where the method may include:
s101, a plurality of tasks to be processed and the data volume of each task to be processed are obtained.
In the embodiment of the invention, the scheduling process of the tasks on the CPU-FPGA heterogeneous platform can be executed by the CPU. Before scheduling the task, a plurality of to-be-processed tasks and the data volume of each to-be-processed task may be acquired. The task to be processed may be a task to be processed in the CPU and the FPGA, and the data amount of the task to be processed may represent a data processing amount required for processing the task to be processed.
In practical application, the acquired task to be processed may be a task sent to the CPU-FPGA heterogeneous platform by a user, or may be a task sent to the CPU-FPGA heterogeneous platform by another device.
S102, extracting characteristic information of each task to be processed in a plurality of tasks to be processed.
After obtaining the multiple to-be-processed tasks, feature information corresponding to each to-be-processed task in the multiple to-be-processed tasks may be extracted, where the feature information may include: CPU characteristic information, FPGA characteristic information and task self characteristic information. The CPU characteristic information is used for representing characteristics of the CPU when the CPU processes the task to be processed, and the FPGA characteristic information is used for representing characteristics of the FPGA when the FPGA processes the task to be processed.
It is understood that the CPU characteristic information may be some characteristic attributes inherent to the CPU, and the FPGA characteristic information may be some characteristic attributes inherent to the FPGA. For example, the CPU feature information and the FPGA feature information may include: CPU chip frequency, FPGA chip frequency, maximum data transmission rate between CPU and host memory, maximum bandwidth between CPU and FPGA, bandwidth between local storage and global storage, etc.
The task's own feature information can comprise static feature information and dynamic feature information. Illustratively, the static feature information may include: the interval time of the OpenCL (Open Computing Language) kernel, the number of clock cycles lost before the first valid output, LUT (Look-Up Table) utilization, FF (flip-flop) utilization, the optimal clock cycle of the OpenCL kernel, the complexity of the computing task, and so on. For example, if the first valid output becomes available after the third clock cycle, the number of clock cycles lost before the first valid output is 2; the complexity of a task may be, for example, the complexity of the algorithm in the task. The static feature information represents characteristics of the task to be processed that do not change during processing, while the dynamic feature information represents characteristics of the task to be processed that change during processing.
S103, generating a feature vector of each task to be processed based on the CPU feature information, the FPGA feature information and the feature information of the task.
In the embodiment of the present invention, the feature information of each task to be processed is extracted, and a feature vector of each task is generated after removing the dimensions from that feature information. Referring to fig. 2, fig. 2 is a schematic flow diagram of a feature vector generation method provided in an embodiment of the present invention; the generation method may include:
S1031, removing the dimensions from the CPU feature information, the FPGA feature information, and the task's own feature information of each task to be processed, to obtain the CPU feature data, the FPGA feature data, and the task's own feature data of each task to be processed.
For example, the CPU feature information, the FPGA feature information, and the task's own feature information of a task to be processed may include: a CPU chip frequency of 50 Hz, a maximum bandwidth between the CPU and the FPGA of 50 bps, an optimal OpenCL kernel clock cycle of 50 s, a data volume transmitted from the FPGA to the host of 50 MB, a local workload count of the CPU and the FPGA of 50, and so on. After removing the dimensions from this feature information, the corresponding CPU feature data, FPGA feature data, and task's own feature data of the task to be processed are: a CPU chip frequency of 50, a maximum bandwidth between the CPU and the FPGA of 50, an optimal OpenCL kernel clock cycle of 50, a data volume transmitted from the FPGA to the host of 50, a local workload count of the CPU and the FPGA of 50, and so on.
S1032, combining the CPU characteristic data, the FPGA characteristic data and the task self characteristic data of each task to be processed into a characteristic vector of each task to be processed according to a preset rule.
Illustratively, given the CPU feature data, FPGA feature data, and task's own feature data above (CPU chip frequency 50, maximum CPU-FPGA bandwidth 50, optimal OpenCL kernel clock cycle 50, data volume transmitted from the FPGA to the host 50, local workload count 50, and so on), these feature data are combined into a feature vector of the task to be processed, which may be expressed as (50, 50, 50, 50, 50, 50). The preset rule may place the CPU feature data first, the FPGA feature data in the middle, and the task's own feature data last when combining the feature vector; alternatively, the preset rule may arrange the CPU feature data, the FPGA feature data, and the task's own feature data of the task to be processed in a random order. The preset rule may be set by those skilled in the art according to actual requirements, and the embodiment of the present invention is not limited herein.
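Steps S1031 and S1032 can be sketched as follows: strip the units ("remove the dimensions") from each feature value and concatenate the resulting numbers in a fixed order (CPU features first, FPGA features in the middle, task's own features last). This is an illustrative sketch; the field names are hypothetical examples, not the patent's exact feature set.

```python
# Dimension removal (S1031) and feature-vector combination (S1032).
import re

def strip_dimension(value: str) -> float:
    """Extract the numeric magnitude from a value like '50Hz' or '50MB'."""
    match = re.match(r"\s*([0-9]+(?:\.[0-9]+)?)", value)
    if match is None:
        raise ValueError(f"no numeric magnitude in {value!r}")
    return float(match.group(1))

def build_feature_vector(cpu_info: dict, fpga_info: dict, task_info: dict) -> list:
    """Concatenate dimensionless feature data by a preset rule: CPU
    features, then FPGA features, then the task's own features, each in
    sorted key order so the vector layout is deterministic."""
    vector = []
    for group in (cpu_info, fpga_info, task_info):
        for key in sorted(group):
            vector.append(strip_dimension(group[key]))
    return vector

cpu_info = {"chip_frequency": "50Hz", "max_bandwidth_to_fpga": "50bps"}
fpga_info = {"opencl_kernel_optimal_cycle": "50s"}
task_info = {"data_to_host": "50MB", "local_workload": "50"}

vec = build_feature_vector(cpu_info, fpga_info, task_info)
print(vec)  # [50.0, 50.0, 50.0, 50.0, 50.0]
```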
Referring to fig. 1, S104, inputting the generated feature vector of each to-be-processed task into a pre-trained classification model, so as to obtain a classification result of each to-be-processed task.
The generated feature vectors of the tasks to be processed are input into a pre-trained classification model to obtain the classification result of each task to be processed. The classification result may include a first classification result and a second classification result, wherein a first task to be processed corresponding to the first classification result is to be processed in the CPU, and a second task to be processed corresponding to the second classification result is to be processed in the FPGA.
The classification model is obtained by training on the feature vectors corresponding to preset tasks and the class labels corresponding to those preset tasks. Illustratively, the classification model may be an SVM classification model based on the SVM (Support Vector Machine) algorithm, a KNN classification model based on the KNN (K-Nearest Neighbors) algorithm, or a K-means classification model based on the K-means clustering algorithm. The preset tasks may be tasks in a task set collected in advance for training the classification model; the pre-collected task set may contain a plurality of tasks with data volumes ranging from 1 MB to 1 GB, and the tasks may include vision tasks, encryption tasks, highly parallel computing tasks, and the like. For example, the vision tasks may include edge-detection tasks, watermark-removal tasks, target-detection tasks, and so on.
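The patent names SVM, KNN, and K-means as candidate models without fixing one. As a purely illustrative stand-in for such a classifier, the sketch below implements a minimal 1-nearest-neighbour decision in plain Python; the toy training vectors and labels are fabricated for illustration only and are not from the patent:

```python
import math

# Toy training set: (feature vector, class label); label 1 = first
# classification result (process on CPU), label 2 = second classification
# result (process on FPGA). Values are purely illustrative.
training = [
    ((10, 10, 10), 1),
    ((90, 90, 90), 2),
]

def classify_1nn(vector):
    """Assign the label of the nearest training vector (1-NN)."""
    return min(training, key=lambda pair: math.dist(pair[0], vector))[1]

print(classify_1nn((15, 12, 8)))   # 1 -> CPU task subset
print(classify_1nn((80, 95, 70)))  # 2 -> FPGA task subset
```

A real deployment would replace this with a trained SVM or KNN model from a machine-learning library, but the interface is the same: feature vector in, classification result out.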
For example, for a pre-collected preset task, the feature information may be extracted and a feature vector generated in the same manner as described above for the tasks to be processed. The preset task is then actually measured on the CPU-FPGA platform to obtain its acceleration ratio, i.e., the degree of acceleration, which can be described precisely as the ratio of the processing time required by the preset task on the CPU to that required on the FPGA. A preset task with an acceleration ratio greater than 4 may be defined as an accelerated task, and a preset task with an acceleration ratio not greater than 4 as an ordinary task. The category corresponding to accelerated tasks is then labeled as the second classification result and the category corresponding to ordinary tasks as the first classification result, yielding the class label of each preset task. The classification model is trained from the feature vectors and class labels of the preset tasks; the specific training process can be implemented with reference to the prior art and is not repeated here.
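A minimal sketch of the labeling rule just described, taking the acceleration ratio as measured CPU processing time divided by measured FPGA processing time and using the threshold of 4 from the text; the function and variable names are assumptions:

```python
ACCELERATION_THRESHOLD = 4.0  # threshold stated in the text

def class_label(cpu_time_s, fpga_time_s):
    """Return the class label for one measured preset task:
    2 = accelerated task (second classification result, FPGA),
    1 = ordinary task (first classification result, CPU)."""
    acceleration_ratio = cpu_time_s / fpga_time_s
    return 2 if acceleration_ratio > ACCELERATION_THRESHOLD else 1

print(class_label(10.0, 2.0))  # ratio 5.0 -> 2 (accelerated task)
print(class_label(10.0, 5.0))  # ratio 2.0 -> 1 (ordinary task)
```

Note that a ratio of exactly 4 falls on the "not greater than 4" side and is therefore labeled an ordinary task, matching the definition above.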
In the embodiment of the invention, the classification model is trained with the category corresponding to accelerated tasks labeled as the second classification result and the category corresponding to ordinary tasks labeled as the first classification result. The feature vector of each task to be processed can then be input into the classification model to obtain either the first classification result, corresponding to a first task to be processed in the CPU, or the second classification result, corresponding to a second task to be processed in the FPGA. Tasks with low complexity or low data-parallelism, which do not benefit from acceleration on the FPGA chip, are thus assigned the first classification result and processed in the CPU, while tasks with high complexity or high data-parallelism, which do benefit from FPGA acceleration, are assigned the second classification result and processed in the FPGA, thereby improving the load balance of the CPU-FPGA platform.
S105, sorting the first tasks to be processed according to the magnitude relationship between their data volumes, and scheduling the sorted first tasks to be processed to the CPU (Central Processing Unit) for processing according to a preset strategy.
In the embodiment of the present invention, the first classification result may be represented as a CPU task subset, and the second classification result may be represented as an FPGA task subset. The CPU task subset corresponds to a plurality of first tasks to be processed, and the FPGA task subset corresponds to a plurality of second tasks to be processed.
The sorting of the first tasks to be processed according to the magnitude relationship between their data volumes may be implemented in either of the following ways:
In one implementation, the first tasks to be processed corresponding to the CPU task subset may be sorted in ascending order according to the magnitude relationship between their data volumes, so as to obtain a first CPU task queue. The data volume of a first task to be processed represents the data processing amount required to process it.
In another implementation, the first tasks to be processed corresponding to the CPU task subset may be sorted in descending order according to the magnitude relationship between their data volumes, so as to obtain a second CPU task queue.
Different ways of sorting the first tasks to be processed correspond to different ways of scheduling the sorted tasks into the CPU according to the preset strategy. Specifically, for the first sorting implementation above, each sorted first task to be processed may be scheduled into the CPU as follows:
For each sorted first task to be processed, the first tasks in the first CPU task queue are sequentially scheduled to the CPU for processing in the order in which they appear in that queue. In this case, the preset strategy is: schedule the first tasks to be processed one by one in the order of the first CPU task queue.
For the second sorting implementation above, each sorted first task to be processed may be scheduled into the CPU according to the preset strategy as follows:
For each sorted first task to be processed, the first tasks in the second CPU task queue are sequentially scheduled to the CPU for processing in the reverse of their order in that queue. In this case, the preset strategy is: schedule the first tasks to be processed one by one in the reverse order of the second CPU task queue.
As an optional implementation of the present invention, sorting the first tasks to be processed by data volume and then scheduling them to the CPU in sorted order (or reverse order) ensures that first tasks with smaller data volumes are processed in the CPU first. Furthermore, when the CPU's task queue becomes empty, a second task with a smaller data volume can be taken from the FPGA's task queue and scheduled preferentially, so that the resources of the CPU and the FPGA are better coordinated and utilized.
S106, sorting the second tasks to be processed according to the magnitude relationship between their data volumes, and scheduling the sorted second tasks to be processed into the FPGA for processing according to a preset strategy.
The sorting of the second tasks to be processed according to the magnitude relationship between their data volumes may be implemented in either of the following ways:
in an implementation manner of sorting the second to-be-processed tasks, the second to-be-processed tasks corresponding to the FPGA task subset may be sorted in a descending order according to a magnitude relation between data volumes of the second to-be-processed tasks, so as to obtain the first FPGA task queue.
In another implementation of sequencing the second to-be-processed tasks, the second to-be-processed tasks corresponding to the FPGA task subset may be sequenced in an ascending order according to a magnitude relationship between data volumes of the second to-be-processed tasks, so as to obtain a second FPGA task queue.
Different ways of sorting the second tasks to be processed likewise correspond to different ways of scheduling the sorted tasks into the FPGA according to the preset strategy. Specifically, for the first sorting implementation above, each sorted second task to be processed may be scheduled into the FPGA as follows:
For each sorted second task to be processed, the second tasks in the first FPGA task queue are sequentially scheduled to the FPGA for processing in the order in which they appear in that queue. In this case, the preset strategy is: schedule the second tasks to be processed one by one in the order of the first FPGA task queue.
For the second sorting implementation above, each sorted second task to be processed may be scheduled into the FPGA according to the preset strategy as follows:
For each sorted second task to be processed, the second tasks in the second FPGA task queue are sequentially scheduled to the FPGA for processing in the reverse of their order in that queue. In this case, the preset strategy is: schedule the second tasks to be processed one by one in the reverse order of the second FPGA task queue.
As an optional implementation of the present invention, sorting the second tasks to be processed by data volume and then scheduling them to the FPGA in sorted order (or reverse order) ensures that second tasks with larger data volumes are processed in the FPGA first. Furthermore, when the FPGA's task queue becomes empty, a first task with a larger data volume can be taken from the CPU's task queue and scheduled preferentially, so that the resources of the CPU and the FPGA are better coordinated and utilized.
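The patent only states that, when one processor's queue empties, a suitable task from the other queue "can be preferentially scheduled". The deque-based sketch below is one speculative way to realize that coordination, with the CPU stealing the smallest remaining FPGA task and the FPGA stealing the largest remaining CPU task; the mechanism and all names are assumptions:

```python
from collections import deque

# CPU queue: smallest data volume first; FPGA queue: largest first.
cpu_queue = deque(sorted([10, 40, 25]))                    # [10, 25, 40]
fpga_queue = deque(sorted([900, 300, 600], reverse=True))  # [900, 600, 300]

def next_for_cpu():
    """CPU takes its own smallest task; if its queue is empty, it steals
    the smallest FPGA task (tail of the descending FPGA queue)."""
    if cpu_queue:
        return cpu_queue.popleft()
    return fpga_queue.pop() if fpga_queue else None

def next_for_fpga():
    """FPGA takes its own largest task; if its queue is empty, it steals
    the largest CPU task (tail of the ascending CPU queue)."""
    if fpga_queue:
        return fpga_queue.popleft()
    return cpu_queue.pop() if cpu_queue else None

print(next_for_fpga())  # 900 (largest FPGA task)
cpu_queue.clear()       # simulate the CPU queue running empty
print(next_for_cpu())   # 300 (smallest remaining FPGA task is stolen)
```

Because both queues are kept sorted, the steal is always an O(1) pop from the appropriate end of the other processor's deque.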
The embodiment of the invention provides a CPU-FPGA task scheduling method based on feature recognition. Because the tasks to be processed are classified using feature vectors generated from their feature information, the CPU feature information, the FPGA feature information, and the tasks' own feature information are all taken into account before scheduling. After classification, the first tasks to be processed are those better suited to processing in the CPU and the second tasks to be processed are those better suited to processing in the FPGA, which improves the load balance of the CPU-FPGA platform during scheduling. In addition, because scheduling is driven by the relative data volumes of the tasks to be processed, the resources of the CPU and the FPGA can be coordinated and utilized more effectively.
Corresponding to the above method embodiment, an embodiment of the present invention provides a CPU-FPGA task scheduling device based on feature recognition, and as shown in fig. 3, the device may include:
the acquiring module 201 is configured to acquire a plurality of tasks to be processed and data volumes of the tasks to be processed, where the tasks to be processed are tasks to be processed in the CPU and the FPGA, and the data volumes are used to indicate data processing volumes required for processing the tasks to be processed.
An extracting module 202, configured to extract feature information of each to-be-processed task in multiple to-be-processed tasks, where the feature information includes: CPU characteristic information, FPGA characteristic information and task self characteristic information; the CPU characteristic information is used for representing the characteristics of the CPU when the CPU processes the tasks to be processed, and the FPGA characteristic information is used for representing the characteristics of the FPGA when the FPGA processes the tasks to be processed.
The generating module 203 is configured to generate a feature vector of each task to be processed based on the CPU feature information, the FPGA feature information, and the task's own feature information; the feature vector is generated after the dimensions (units) of the feature information of the task to be processed have been removed, i.e., after non-dimensionalization.
The classification module 204 is configured to input the generated feature vector of each to-be-processed task into a pre-trained classification model to obtain a classification result of each to-be-processed task, where the classification result includes: the system comprises a first classification result and a second classification result, wherein a first task to be processed corresponding to the first classification result is used for processing in a CPU, and a second task to be processed corresponding to the second classification result is used for processing in an FPGA; the classification model is obtained by training according to the feature vector corresponding to the preset task and the class label corresponding to the preset task.
The first scheduling module 205 is configured to sort the first tasks to be processed according to a size relationship between data volumes of the first tasks to be processed, and schedule the sorted first tasks to be processed into the CPU according to a preset policy for processing.
And the second scheduling module 206 is configured to sort the second tasks to be processed according to the size relationship between the data volumes of the second tasks to be processed, and schedule the sorted second tasks to be processed into the FPGA according to a preset policy for processing.
The embodiment of the invention provides a CPU-FPGA task scheduling device based on feature recognition. Because the tasks to be processed are classified using feature vectors generated from their feature information, the CPU feature information, the FPGA feature information, and the tasks' own feature information are all taken into account before scheduling. After classification, the first tasks to be processed are those better suited to processing in the CPU and the second tasks to be processed are those better suited to processing in the FPGA, which improves the load balance of the CPU-FPGA platform during scheduling. In addition, because scheduling is driven by the relative data volumes of the tasks to be processed, the resources of the CPU and the FPGA can be coordinated and utilized more effectively.
It should be noted that the device according to the embodiment of the present invention is a device corresponding to the CPU-FPGA task scheduling method based on feature recognition shown in fig. 1, and all embodiments of the CPU-FPGA task scheduling method based on feature recognition shown in fig. 1 are applicable to the device and can achieve the same beneficial effects.
Optionally, as shown in fig. 4, the generating module 203 includes:
the dimension removing submodule 2031 is configured to remove dimensions from the CPU feature information, the FPGA feature information, and the task feature information of each to-be-processed task, so as to obtain CPU feature data, FPGA feature data, and task feature data of each to-be-processed task.
The generating submodule 2032 is configured to combine the CPU feature data, the FPGA feature data, and the feature data of the task itself of each to-be-processed task into a feature vector of each to-be-processed task according to a preset rule.
Optionally, the first scheduling module 205 is specifically configured to:
and sorting the first tasks to be processed corresponding to the CPU task subsets in an ascending order according to the magnitude relation among the data volumes of the first tasks to be processed to obtain a first CPU task queue.
Or sorting the first tasks to be processed corresponding to the CPU task subsets in a descending order according to the magnitude relation between the data volumes of the first tasks to be processed to obtain a second CPU task queue.
The second scheduling module 206 is specifically configured to:
and sorting the second tasks to be processed corresponding to the FPGA task subset in a descending order according to the magnitude relation among the data volumes of the second tasks to be processed to obtain a first FPGA task queue.
Or sorting the second tasks to be processed corresponding to the FPGA task subset in an ascending order according to the magnitude relation among the data volumes of the second tasks to be processed to obtain a second FPGA task queue.
Optionally, the first scheduling module 205 is specifically configured to:
and aiming at each first task to be processed after sequencing, scheduling each first task to be processed in the first CPU task queue to the CPU for processing according to the sequence of each first task to be processed in the first CPU task queue.
Or, for each sorted first task to be processed, each first task to be processed in the second CPU task queue is sequentially scheduled to the CPU for processing according to the reverse order of each first task to be processed in the second CPU task queue.
Optionally, the second scheduling module 206 is specifically configured to:
and aiming at each second task to be processed after sequencing, scheduling each second task to be processed in the first FPGA task queue to the FPGA in sequence according to the sequence of each second task to be processed in the first FPGA task queue.
Or, for each second to-be-processed task after sequencing, sequentially scheduling each second to-be-processed task in the second FPGA task queue to the FPGA for processing according to the reverse order of each second to-be-processed task in the second FPGA task queue.
An embodiment of the present invention further provides an electronic device, as shown in fig. 5, including a processor 301, a communication interface 302, a memory 303, and a communication bus 304, where the processor 301, the communication interface 302, and the memory 303 complete mutual communication through the communication bus 304,
a memory 303 for storing a computer program;
the processor 301 is configured to implement the steps of any one of the above-mentioned CPU-FPGA task scheduling methods based on feature recognition when executing the program stored in the memory 303.
In the electronic device provided by the embodiment of the invention, the tasks to be processed are classified using feature vectors generated from their feature information, so that the CPU feature information, the FPGA feature information, and the tasks' own feature information are all taken into account before scheduling. After classification, the first tasks to be processed are those better suited to processing in the CPU and the second tasks to be processed are those better suited to processing in the FPGA, which improves the load balance of the CPU-FPGA platform during scheduling; and because scheduling is driven by the relative data volumes of the tasks to be processed, the resources of the CPU and the FPGA can be coordinated and utilized more effectively.
The communication bus mentioned in the electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown, but this does not mean that there is only one bus or one type of bus.
The communication interface is used for communication between the electronic equipment and other equipment.
The memory may include a Random Access Memory (RAM) or a Non-Volatile Memory (NVM), such as at least one disk memory. Optionally, the memory may also be at least one storage device located remotely from the processor.
The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components.
In another embodiment of the present invention, a computer-readable storage medium is further provided, in which a computer program is stored, and the computer program, when executed by a processor, implements the steps of any one of the above-mentioned CPU-FPGA task scheduling methods based on feature recognition.
In yet another embodiment of the present invention, a computer program product containing instructions is further provided, which when run on a computer, causes the computer to perform the steps of any one of the above-mentioned embodiments of the CPU-FPGA task scheduling method based on feature recognition.
In the above embodiments, the implementation may be realized wholly or partially by software, hardware, firmware, or any combination thereof. When implemented in software, it may be realized wholly or partially in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions according to the embodiments of the invention are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example, from one website, computer, server, or data center to another website, computer, server, or data center via a wired (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, radio, microwave) connection. The computer-readable storage medium may be any available medium accessible by a computer, or a data storage device such as a server or data center incorporating one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
All the embodiments in the present specification are described in a related manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the device/electronic apparatus embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, and for the relevant points, reference may be made to some descriptions of the method embodiments.
The above description is only for the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (9)

1. A CPU-FPGA task scheduling method based on feature recognition is characterized by comprising the following steps:
acquiring a plurality of tasks to be processed and data volume of each task to be processed, wherein the tasks to be processed are tasks to be processed in the CPU and the FPGA, and the data volume is used for representing data processing volume required by processing the tasks to be processed;
extracting feature information of each task to be processed in the plurality of tasks to be processed, wherein the feature information comprises: CPU characteristic information, FPGA characteristic information and task self characteristic information; the CPU characteristic information is used for representing characteristics of the CPU when the CPU processes the task to be processed, and the FPGA characteristic information is used for representing characteristics of the FPGA when the FPGA processes the task to be processed;
generating a feature vector of each task to be processed based on the CPU feature information, the FPGA feature information and the feature information of the task to be processed; the characteristic vector is generated after dimension of the characteristic information of the task to be processed is removed;
inputting the generated feature vector of each task to be processed into a pre-trained classification model to obtain a classification result of each task to be processed, wherein the classification result comprises: the system comprises a first classification result and a second classification result, wherein a first task to be processed corresponding to the first classification result is used for processing in a CPU, and a second task to be processed corresponding to the second classification result is used for processing in an FPGA; the classification model is obtained by training according to the feature vector corresponding to the preset task and the class label corresponding to the preset task; the first classification result is a CPU task subset, and the second classification result is an FPGA task subset;
sequencing the first tasks to be processed according to the magnitude relation among the data volumes of the first tasks to be processed, and scheduling the sequenced first tasks to be processed into a CPU (Central Processing Unit) according to a preset strategy for processing;
sequencing the second tasks to be processed according to the magnitude relation among the data volumes of the second tasks to be processed, and scheduling the sequenced second tasks to be processed into an FPGA (Field Programmable Gate Array) according to a preset strategy for processing;
the method for sequencing the second tasks to be processed according to the size relationship between the data volumes of the second tasks to be processed and scheduling the sequenced second tasks to be processed into the FPGA according to a preset strategy includes:
sorting second tasks to be processed corresponding to the FPGA task subset in a descending order according to the magnitude relation among the data volumes of the second tasks to be processed to obtain a first FPGA task queue; according to the ordered second tasks to be processed, sequentially scheduling the second tasks to be processed in the first FPGA task queue to the FPGA for processing according to the sequence of the second tasks to be processed in the first FPGA task queue;
or sorting second to-be-processed tasks corresponding to the FPGA task subset in an ascending order according to the magnitude relation among the data volumes of the second to-be-processed tasks to obtain a second FPGA task queue; and sequentially scheduling each second task to be processed in the second FPGA task queue to the FPGA for processing according to the reverse order of each second task to be processed in the second FPGA task queue aiming at each second task to be processed after sequencing.
2. The method according to claim 1, wherein the step of generating a feature vector of each to-be-processed task based on CPU feature information, FPGA feature information, and task own feature information of each to-be-processed task includes:
removing dimensions from the CPU characteristic information, the FPGA characteristic information and the task self characteristic information of each task to be processed to obtain CPU characteristic data, FPGA characteristic data and task self characteristic data of each task to be processed;
and combining the CPU characteristic data, the FPGA characteristic data and the task self characteristic data of each task to be processed into a characteristic vector of each task to be processed according to a preset rule.
3. The method according to claim 2, wherein the step of sorting the first tasks to be processed according to the size relationship between the data volumes of the first tasks to be processed comprises:
sorting first tasks to be processed corresponding to the CPU task subsets in an ascending order according to the magnitude relation among the data volumes of the first tasks to be processed to obtain a first CPU task queue;
or sorting the first tasks to be processed corresponding to the CPU task subsets in a descending order according to the magnitude relation between the data volumes of the first tasks to be processed to obtain a second CPU task queue.
4. The method according to claim 3, wherein the step of scheduling each of the ordered first tasks to be processed to a CPU according to a preset policy for processing comprises:
according to the ordered first tasks to be processed, sequentially scheduling the first tasks to be processed in a first CPU task queue to a CPU for processing according to the sequence of the first tasks to be processed in the first CPU task queue;
or, for each sorted first task to be processed, each first task to be processed in the second CPU task queue is sequentially scheduled to the CPU for processing according to the reverse order of each first task to be processed in the second CPU task queue.
5. A CPU-FPGA task scheduling device based on feature recognition is characterized by comprising:
the system comprises an acquisition module, a processing module and a processing module, wherein the acquisition module is used for acquiring a plurality of tasks to be processed and data volume of each task to be processed, the tasks to be processed are tasks to be processed in the CPU and the FPGA, and the data volume is used for representing data processing volume required by processing the tasks to be processed;
an extraction module, configured to extract feature information of each to-be-processed task in the multiple to-be-processed tasks, where the feature information includes: CPU characteristic information, FPGA characteristic information and task self characteristic information; the CPU characteristic information is used for representing characteristics of the CPU when the CPU processes the task to be processed, and the FPGA characteristic information is used for representing characteristics of the FPGA when the FPGA processes the task to be processed;
a generation module, configured to generate a feature vector for each task to be processed based on the CPU feature information, the FPGA feature information and the task's own feature information; the feature vector is generated after the feature information of the task to be processed has been made dimensionless;
a classification module, configured to input the generated feature vector of each task to be processed into a pre-trained classification model to obtain a classification result for each task, where the classification result includes a first classification result and a second classification result: a first task to be processed corresponding to the first classification result is processed on the CPU, and a second task to be processed corresponding to the second classification result is processed on the FPGA; the classification model is trained on the feature vectors corresponding to preset tasks and the class labels corresponding to those tasks; the first classification result is a CPU task subset, and the second classification result is an FPGA task subset;
a first scheduling module, configured to sort the first tasks to be processed according to the relative magnitudes of their data volumes, and to schedule the sorted first tasks to the CPU for processing according to a preset policy;
a second scheduling module, configured to sort the second tasks to be processed according to the relative magnitudes of their data volumes, and to schedule the sorted second tasks to the FPGA for processing according to a preset policy;
the second scheduling module is specifically configured to:
sorting the second tasks to be processed corresponding to the FPGA task subset in descending order of data volume to obtain a first FPGA task queue, and sequentially scheduling each second task in the first FPGA task queue to the FPGA for processing, in the order of the queue;
or, sorting the second tasks to be processed corresponding to the FPGA task subset in ascending order of data volume to obtain a second FPGA task queue, and sequentially scheduling each second task in the second FPGA task queue to the FPGA for processing, in the reverse order of the queue.
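Read together, the claimed device classifies each task's feature vector into a CPU subset or an FPGA subset, then dispatches the CPU subset smallest-first (claims 3–4) and the FPGA subset largest-first (either branch of the second scheduling module). An illustrative end-to-end sketch; the classifier is a stub standing in for the pre-trained classification model, and all field names (`parallelism`, `data_volume`) are hypothetical:

```python
def classify(task):
    # Stub for the pre-trained classification model: route tasks whose
    # (hypothetical) parallelism feature dominates to the FPGA, rest to CPU.
    return "FPGA" if task["parallelism"] > 0.5 else "CPU"

tasks = [
    {"id": 1, "data_volume": 40, "parallelism": 0.9},
    {"id": 2, "data_volume": 10, "parallelism": 0.2},
    {"id": 3, "data_volume": 25, "parallelism": 0.7},
    {"id": 4, "data_volume": 5,  "parallelism": 0.1},
]

# Classification step: split into the two subsets named in the claim.
cpu_subset  = [t for t in tasks if classify(t) == "CPU"]
fpga_subset = [t for t in tasks if classify(t) == "FPGA"]

# Scheduling step: CPU smallest-first, FPGA largest-first.
cpu_queue  = sorted(cpu_subset,  key=lambda t: t["data_volume"])
fpga_queue = sorted(fpga_subset, key=lambda t: t["data_volume"], reverse=True)

assert [t["id"] for t in cpu_queue]  == [4, 2]  # tasks 4, 5-unit then 2, 10-unit
assert [t["id"] for t in fpga_queue] == [1, 3]  # tasks 1, 40-unit then 3, 25-unit
```

The asymmetry (smallest-first on the CPU, largest-first on the FPGA) is what the two scheduling modules encode; the patent leaves the classifier itself to the pre-trained model.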
6. The apparatus of claim 5, wherein the generating module comprises:
a dimensionless submodule, configured to make the CPU feature information, the FPGA feature information and the task's own feature information of each task to be processed dimensionless, to obtain the CPU feature data, the FPGA feature data and the task's own feature data of each task;
and a generation submodule, configured to combine the CPU feature data, the FPGA feature data and the task's own feature data of each task to be processed into a feature vector for that task according to a preset rule.
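One common way to make heterogeneous feature information dimensionless is min-max normalization over the task set, after which each task's normalized values can be concatenated in a fixed order into a feature vector. A sketch under that assumption — the patent does not specify the normalization method or the preset combination rule, and the example feature values are invented:

```python
def min_max_normalize(values):
    """Scale raw feature values to the dimensionless range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical raw features for three tasks, one list per feature kind.
cpu_feat  = [2.0, 4.0, 6.0]   # e.g. estimated CPU processing cost
fpga_feat = [100, 300, 200]   # e.g. estimated FPGA resource usage
self_feat = [10, 10, 30]      # e.g. task input size

columns = [min_max_normalize(f) for f in (cpu_feat, fpga_feat, self_feat)]

# "Preset rule" assumed here: fixed concatenation order (CPU, FPGA, task's own).
feature_vectors = [tuple(col[i] for col in columns) for i in range(3)]

assert feature_vectors[0] == (0.0, 0.0, 0.0)
assert feature_vectors[2] == (1.0, 0.5, 1.0)
```

Normalizing first means features measured in incompatible units (cycles, LUTs, bytes) contribute comparably to the classifier's input, which is the point of the dimensionless submodule.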
7. The apparatus of claim 6, wherein the first scheduling module is specifically configured to:
sort the first tasks to be processed corresponding to the CPU task subset in ascending order of data volume to obtain a first CPU task queue;
or sort the first tasks to be processed corresponding to the CPU task subset in descending order of data volume to obtain a second CPU task queue.
8. An electronic device, characterized by comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with one another via the communication bus;
a memory for storing a computer program;
a processor for implementing the method steps of any of claims 1 to 4 when executing a program stored in the memory.
9. A computer-readable storage medium, characterized in that a computer program is stored in the computer-readable storage medium, which computer program, when being executed by a processor, carries out the method steps of any one of claims 1 to 4.
CN201910563352.0A 2019-06-26 2019-06-26 CPU-FPGA task scheduling method and device based on feature recognition Active CN110262887B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910563352.0A CN110262887B (en) 2019-06-26 2019-06-26 CPU-FPGA task scheduling method and device based on feature recognition

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910563352.0A CN110262887B (en) 2019-06-26 2019-06-26 CPU-FPGA task scheduling method and device based on feature recognition

Publications (2)

Publication Number Publication Date
CN110262887A CN110262887A (en) 2019-09-20
CN110262887B true CN110262887B (en) 2022-04-01

Family

ID=67921995

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910563352.0A Active CN110262887B (en) 2019-06-26 2019-06-26 CPU-FPGA task scheduling method and device based on feature recognition

Country Status (1)

Country Link
CN (1) CN110262887B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111061547B (en) * 2019-10-24 2023-04-11 中国科学院计算技术研究所 Task scheduling method and system for heterogeneous system
CN110908797B (en) * 2019-11-07 2023-09-15 浪潮电子信息产业股份有限公司 Call request data processing method, device, equipment, storage medium and system
CN111400007A (en) * 2020-03-13 2020-07-10 重庆特斯联智慧科技股份有限公司 Task scheduling method and system based on edge calculation
CN116073890B (en) * 2023-03-06 2023-06-02 成都星联芯通科技有限公司 Service data processing method, device, receiving equipment, earth station and storage medium

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101441579A (en) * 2008-12-02 2009-05-27 南京大学 Three-dimensional model library characteristic extracting method based on cluster computing system
JP2011197803A (en) * 2010-03-17 2011-10-06 Ricoh Co Ltd Program execution control method
CN103197976A (en) * 2013-04-11 2013-07-10 华为技术有限公司 Method and device for processing tasks of heterogeneous system
CN107273331A (en) * 2017-06-30 2017-10-20 山东超越数控电子有限公司 A kind of heterogeneous computing system and method based on CPU+GPU+FPGA frameworks
CN107861606A (en) * 2017-11-21 2018-03-30 北京工业大学 A kind of heterogeneous polynuclear power cap method by coordinating DVFS and duty mapping
CN108334405A (en) * 2017-01-20 2018-07-27 阿里巴巴集团控股有限公司 Frequency isomery CPU, frequency isomery implementation method, device and method for scheduling task
CN108629355A (en) * 2017-03-21 2018-10-09 北京京东尚科信息技术有限公司 Method and apparatus for generating workload information
CN108776649A (en) * 2018-06-11 2018-11-09 山东超越数控电子股份有限公司 One kind being based on CPU+FPGA heterogeneous computing systems and its accelerated method
CN109101339A (en) * 2018-08-15 2018-12-28 北京邮电大学 Video task parallel method, device and Heterogeneous Cluster Environment in isomeric group
CN109408148A (en) * 2018-10-25 2019-03-01 北京计算机技术及应用研究所 A kind of production domesticization computing platform and its apply accelerated method
CN109542596A (en) * 2018-10-22 2019-03-29 西安交通大学 A kind of Scheduling Framework based on OpenCL kernel tasks
CN109828790A (en) * 2019-01-31 2019-05-31 上海赜睿信息科技有限公司 A kind of data processing method and system based on Shen prestige isomery many-core processor

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101717817B (en) * 2009-07-17 2011-11-23 中国人民解放军国防科学技术大学 Method for accelerating RNA secondary structure prediction based on stochastic context-free grammar


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Performance-Effective and Low-Complexity Task Scheduling for Heterogeneous Computing;H. Topcuouglu 等;《IEEE Transactions on Parallel and Distributed Systems》;20020331;第13卷(第3期);260-274 *
Throughput optimization for streaming applications on CPU-FPGA heterogeneous systems;Xuechao Wei 等;《2017 22nd Asia and South Pacific Design Automation Conference (ASP-DAC)》;20170119;488-493 *
面向异构体系结构的任务流模型;张丹 等;《信息工程大学学报》;20120615;第13卷(第3期);358-364,375 *

Also Published As

Publication number Publication date
CN110262887A (en) 2019-09-20

Similar Documents

Publication Publication Date Title
CN110262887B (en) CPU-FPGA task scheduling method and device based on feature recognition
EP4198771A1 (en) Data processing method and apparatus, computer readable medium, and electronic device
CN108415845A (en) AB tests computational methods, device and the server of system index confidence interval
CN103765415A (en) Parallel generation of topics from documents
CN106815254A (en) A kind of data processing method and device
WO2024007849A1 (en) Distributed training container scheduling for intelligent computing
Zhang et al. Virtual machine placement strategy using cluster-based genetic algorithm
CN110347602A (en) Multitask script execution and device, electronic equipment and readable storage medium storing program for executing
Maroulis et al. A holistic energy-efficient real-time scheduler for mixed stream and batch processing workloads
WO2022007596A1 (en) Image retrieval system, method and apparatus
Li et al. Bigprovision: a provisioning framework for big data analytics
CN114048816A (en) Method, device and equipment for sampling graph neural network data and storage medium
Ying et al. FrauDetector+ An Incremental Graph-Mining Approach for Efficient Fraudulent Phone Call Detection
CN117093619A (en) Rule engine processing method and device, electronic equipment and storage medium
CN114968603B (en) Capacity detection method and device supporting multi-gear load balance
CN116820714A (en) Scheduling method, device, equipment and storage medium of computing equipment
CN110472246A (en) Work order classification method, device and storage medium
Khan Hadoop performance modeling and job optimization for big data analytics
Kamala et al. An optimal approach for social data analysis in Big Data
Benini et al. Multi-stage Benders decomposition for optimizing multicore architectures
CN109800775A (en) Document clustering method, apparatus, equipment and readable medium
CN113032522A (en) Text parallel data mining system and method
Ovalle et al. Distributed Cache Strategies for Machine Learning Classification Tasks over Cluster Computing Resources
Bougioukou et al. Cloud services using hardware accelerators: The case of handwritten digits recognition
CN110851249A (en) Data exporting method and equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant