CN116360941A - Multi-core DSP-oriented parallel computing resource organization scheduling method and system


Info

Publication number
CN116360941A
Authority
CN
China
Prior art keywords
task
core
parallel
scheduling
parallel computing
Prior art date
Legal status
Pending
Application number
CN202310265508.3A
Other languages
Chinese (zh)
Inventor
郭冯凤
焦淼
韩源冬
包达尔罕
高洪宇
Current Assignee
Xian Microelectronics Technology Institute
Original Assignee
Xian Microelectronics Technology Institute
Priority date
Filing date
Publication date
Application filed by Xian Microelectronics Technology Institute filed Critical Xian Microelectronics Technology Institute
Priority to CN202310265508.3A
Publication of CN116360941A
Legal status: Pending


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00: Arrangements for program control, e.g. control units
    • G06F 9/06: Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46: Multiprogramming arrangements
    • G06F 9/48: Program initiating; program switching, e.g. by interrupt
    • G06F 9/4806: Task transfer initiation or dispatching
    • G06F 9/4843: Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F 9/50: Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5005: Allocation of resources to service a request
    • G06F 9/5027: Allocation of resources to service a request, the resource being a machine, e.g. CPUs, servers, terminals
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention discloses a parallel computing resource organization and scheduling method and system for multi-core DSPs, belonging to the field of embedded distributed computing. The self-organizing scheduling method for parallel computing resources first performs system initialization to bring the system into a stable state. It then starts the user's parallel computing task and begins creating the parallel tasks and data blocks. Once creation is complete, parallel execution begins: the system switches to the parallel scheduling engine, which dispatches tasks and data blocks to the acceleration cores. On receiving the corresponding request, each acceleration core executes the corresponding modules to complete task buffer initialization, task loading and relocation, data block computation, and task context reduction. After all data blocks have been computed and reduced, the master control core collects and outputs the results, thereby achieving self-organizing management of computing resources and adaptive scheduling of parallel computing tasks, and reducing the programming difficulty of parallel computing applications.

Description

Multi-core DSP-oriented parallel computing resource organization scheduling method and system
Technical Field
The invention belongs to the field of embedded distributed computation, and relates to a multi-core DSP-oriented parallel computing resource organization scheduling method and system.
Background
With the rapid development of high-performance processors, the parallel development frameworks sitting between hardware and application development are also becoming increasingly abundant. Commonly used general-purpose parallel computing software frameworks include the Compute Unified Device Architecture (CUDA), OpenMP, the Message Passing Interface (MPI), the Open Computing Language (OpenCL), Pthreads, and the like.
Traditional distributed computing frameworks adopt a master-slave architecture to perform resource management and task scheduling and to achieve parallel execution. By design they cannot support efficient management and utilization of a virtualized computing pool, require data and tasks to be partitioned manually, and cannot achieve self-organizing management of computing resources or adaptive scheduling of parallel computing tasks.
Disclosure of Invention
The invention aims to solve the problem that the prior art cannot achieve self-organizing management of computing resources and adaptive scheduling of parallel computing tasks, and provides a multi-core DSP-oriented parallel computing resource organization and scheduling method and system.
To achieve this purpose, the invention adopts the following technical scheme:
the invention provides a parallel computing resource organization scheduling method for a multi-core DSP, which comprises the following steps:
initializing an embedded DSP operating system and hardware of the embedded DSP operating system, and then performing error detection on the embedded DSP operating system;
after initializing the memory protection in the embedded DSP operating system, creating parallel tasks and data blocks;
and scheduling the parallel tasks and data blocks through the parallel scheduling engine, transmitting the scheduling information to the acceleration cores for processing, and then clearing the parallel tasks, thereby achieving organized scheduling of the parallel computing resources.
Preferably, the log system, file system and user heap space in the embedded DSP operating system are initialized.
Preferably, the initialization operation is performed based on hardware characteristics and registers of the hardware.
Preferably, the initialization process is performed on the data structure, idle tasks, clock processing tasks, and interrupt latency processing tasks of the kernel in the embedded DSP operating system.
Preferably, the parallel scheduling engine scheduling method is as follows:
firstly, selecting initialized tasks from the task scheduling linked list and adding those without dependencies to the ready linked list;
secondly, taking a task from the ready linked list and, according to the number of requested cores, notifying the acceleration cores to complete task buffer initialization and task loading and relocation;
and finally, after initialization is complete, scheduling the data blocks to the acceleration cores for processing according to the scheduling strategy set for the task; after all data processing is complete, performing context reduction as the task requires and destroying the task.
Preferably, the method for transmitting the scheduling information to the acceleration core for processing is as follows:
the acceleration core receives the request of the main control core and performs task buffer initialization, task loading and relocation, multi-buffer pipelined processing of the data stream, and task context reduction;
wherein each acceleration core processing flow is the same.
Preferably, the method for clearing parallel tasks is as follows:
after the acceleration core finishes calculation of all data blocks, the main control core dispatching engine starts to detect;
when it is confirmed that all data blocks in the task's data block queue have been processed, the scheduling engine suspends and returns to the main thread, and the main thread destroys the task and exits the run.
The invention provides a parallel computing resource self-organizing scheduling system for a multi-core DSP, which comprises the following components:
the initialization and error detection module is used for carrying out error detection on the embedded DSP operating system after initializing the embedded DSP operating system and hardware of the embedded DSP operating system;
the parallel task and data block creation module is used for creating parallel tasks and data blocks after initializing memory protection in the embedded DSP operating system;
and the information transmission and processing module is used for transmitting the scheduling information to the acceleration core for processing according to the parallel task and the data block parallel scheduling engine, and then clearing the parallel task to realize the parallel computing resource organization scheduling.
A computer device comprising a memory storing a computer program and a processor implementing the steps of a multi-core DSP oriented parallel computing resource organization scheduling method when the computer program is executed.
A computer readable storage medium storing a computer program which when executed by a processor implements the steps of a multi-core DSP oriented parallel computing resource organization scheduling method.
Compared with the prior art, the invention has the following beneficial effects:
the invention provides a parallel computing resource self-organizing scheduling method for a multi-core DSP, which comprises the steps of firstly, executing system initialization and ensuring that a system enters a stable state; then starting a user parallel computing task and starting to create parallel tasks and data blocks; after the creation is completed, starting calculation when the parallel operation is started, switching the system to a parallel scheduling engine, starting scheduling tasks and data blocks to an acceleration core, and after the acceleration core receives a corresponding request, starting to execute corresponding modules to complete initialization of a task buffer area, loading and repositioning tasks, calculation of the data blocks and reduction of task contexts. After all data blocks are calculated and reduced, the main control core recovers and outputs the result, so that self-organizing management of computing resources and self-adaptive scheduling of parallel computing tasks are realized, and programming difficulty of parallel computing application is reduced.
Further, the log system, the multi-core file system, and the user heap are optional interfaces; they are initialized only if the user makes use of these three functions.
The parallel computing resource self-organizing scheduling system for the multi-core DSP provided by the invention realizes the parallel computing resource self-organizing scheduling by dividing the system into an initialization and error detection module, a parallel task and data block creation module and an information transmission and processing module. The modules are mutually independent by adopting a modularized idea, so that the modules are convenient to manage uniformly.
Drawings
For a clearer description of the technical solutions of the embodiments of the present invention, the drawings that are needed in the embodiments will be briefly described below, it being understood that the following drawings only illustrate some embodiments of the present invention and should not be considered as limiting the scope, and other related drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a parallel computing resource organization scheduling method for a multi-core DSP.
FIG. 2 is a diagram of a parallel computing resource self-organizing scheduling system for a multi-core DSP.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. The components of the embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the invention, as presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further definition or explanation thereof is necessary in the following figures.
In the description of the embodiments of the present invention, it should be noted that terms such as "upper," "lower," "horizontal," and "inner," if used to indicate an orientation or positional relationship, are based on the orientations or positional relationships shown in the drawings, or on the orientation in which the inventive product is conventionally placed in use. They are used merely for convenience in describing the present invention and simplifying the description, and do not indicate or imply that the apparatus or element referred to must have a specific orientation or be constructed and operated in a specific orientation; they should therefore not be construed as limiting the present invention. Furthermore, terms such as "first" and "second" are used merely to distinguish descriptions and should not be construed as indicating or implying relative importance.
Furthermore, the term "horizontal," if present, does not mean that the component must be absolutely horizontal; it may be slightly inclined. "Horizontal" merely means that the direction is closer to horizontal than to "vertical," not that the structure must be perfectly level.
In the description of the embodiments of the present invention, it should also be noted that, unless explicitly specified and limited otherwise, the terms "disposed," "mounted," "connected," and "connected" should be construed broadly, and may be, for example, fixedly connected, detachably connected, or integrally connected; can be mechanically or electrically connected; can be directly connected or indirectly connected through an intermediate medium, and can be communication between two elements. The specific meaning of the above terms in the present invention can be understood by those of ordinary skill in the art according to the specific circumstances.
The invention is described in further detail below with reference to the attached drawing figures:
the invention provides a parallel computing resource organization scheduling method for a multi-core DSP, which is shown in figure 1 and comprises the following steps:
s1, initializing an embedded DSP operating system and hardware of the embedded DSP operating system, and then performing error detection on the embedded DSP operating system;
and initializing according to the hardware characteristics and the hardware registers.
And initializing a data structure, an idle task, a clock processing task and an interrupt delay processing task of a kernel in the embedded DSP operating system.
S2, after initializing memory protection in the embedded DSP operating system, creating parallel tasks and data blocks;
and initializing a log system, a file system and a user heap space in the embedded DSP operating system.
And S3, scheduling the parallel tasks and data blocks through the parallel scheduling engine, transmitting the scheduling information to the acceleration cores for processing, and then clearing the parallel tasks, thereby achieving organized scheduling of the parallel computing resources.
The scheduling method of the parallel scheduling engine is as follows:
firstly, selecting initialized tasks from the task scheduling linked list and adding those without dependencies to the ready linked list;
secondly, taking a task from the ready linked list and, according to the number of requested cores, notifying the acceleration cores to complete task buffer initialization and task loading and relocation;
and finally, after initialization is complete, scheduling the data blocks to the acceleration cores for processing according to the scheduling strategy set for the task; after all data processing is complete, performing context reduction as the task requires and destroying the task.
The method for transmitting the scheduling information to the acceleration core for processing is as follows:
the acceleration core receives the request of the main control core and performs task buffer initialization, task loading and relocation, multi-buffer pipelined processing of the data stream, and task context reduction;
wherein each acceleration core processing flow is the same.
The method for clearing the parallel tasks is as follows:
after the acceleration core finishes calculation of all data blocks, the main control core dispatching engine starts to detect;
when it is confirmed that all data blocks in the task's data block queue have been processed, the scheduling engine suspends and returns to the main thread, and the main thread destroys the task and exits the run.
The specific logic steps are as follows:
step 1, initializing system hardware and operating system
Initializing system hardware: system hardware initialization mainly completes the initialization of the CPU registers and of the other parts of the system, including initializing the interrupt vector table and writing the interrupt entry address into the interrupt service table pointer register (ISTP); initializing all general-purpose registers A0-A31 and B0-B31 and the control registers; and initializing the timestamp counter, the PLL frequency-multiplication controller, the DDR3 SRAM, EDMA, EMIF, etc.
Operating system initialization: the work required for operating system initialization includes initializing the kernel's data structures, the idle task, the clock processing task, the interrupt-deferral processing task, and so on, providing a stable multi-tasking environment for the application program. The kernel's data structures include the priority bit table, ready linked list, semaphores, mutexes, message pools, message queues, condition variables, memory heaps, and the like. The idle task is the task that runs when the multi-tasking kernel is otherwise idle after startup; the clock processing task handles clock interrupts and wakes tasks from periodic sleep; and the interrupt-deferral task queues interrupts for later processing, reducing the time during which interrupts are disabled.
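The operating-system initialization above can be sketched as a small host-side simulation. Python is used here only for brevity (a real implementation would be C on the DSP), and every structure name, task name, and priority value is a hypothetical stand-in rather than the patent's actual code:

```python
from collections import deque

class Kernel:
    """Minimal sketch of the kernel state set up during OS initialization."""
    def __init__(self, num_priorities=32):
        # Kernel data structures: priority bit table, per-priority ready lists,
        # and dictionaries standing in for semaphores, mutexes, message queues.
        self.priority_bits = 0                    # one bit per non-empty ready list
        self.ready = [deque() for _ in range(num_priorities)]
        self.semaphores, self.mutexes = {}, {}
        self.message_queues = {}
        self.tasks = {}

    def create_task(self, name, priority):
        self.tasks[name] = {"name": name, "prio": priority, "state": "ready"}
        self.ready[priority].append(name)
        self.priority_bits |= 1 << priority       # mark this priority level occupied
        return self.tasks[name]

def os_init():
    """Create the kernel structures plus the three system tasks the text names."""
    k = Kernel()
    k.create_task("idle", priority=31)            # runs only when nothing else is ready
    k.create_task("clock", priority=0)            # services clock ticks, wakes sleepers
    k.create_task("deferred_irq", priority=1)     # queues interrupts to shorten masked time
    return k
```

The idle task is given the lowest priority so it only ever runs when every other ready list is empty, matching its role in the text.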
Step 2, error detection and memory protection initialization
Initialize the memory protection driver, interrupts, and hardware exception detection. The initialization method is to configure each hardware unit's registers according to its characteristics and the specific usage requirements. Interrupts are enabled once initialization is complete, ensuring that the system can respond to external events.
Step 3, creating parallel tasks and data blocks
After system initialization is complete, a thread can be created to execute the parallel computing task and to finish creating the parallel runtime, tasks, and data blocks. The parallel runtime environment comprises the parallel scheduling engine, the parallel task scheduling linked list, the task thread pool, the task linked list, the synchronous data block queues, the data block transmission list pool, platform-related description information, error-handling handles, and other data structures. Tasks created by the system are added to the parallel task scheduling linked list, and the data blocks belonging to a task are added to that task's data block queue. Once all data structures have been created, the parallel scheduling engine can be started to begin scheduling.
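A minimal sketch of the runtime structures this step creates (scheduling list, task objects, per-task data-block queues) may help; all class and field names are hypothetical illustrations, not the patent's API:

```python
from collections import deque

class ParallelTask:
    def __init__(self, name, deps=()):
        self.name = name
        self.deps = set(deps)          # names of tasks that must finish first
        self.state = "initialized"
        self.blocks = deque()          # data blocks awaiting computation

class ParallelRuntime:
    """Sketch of the parallel runtime environment built before scheduling starts."""
    def __init__(self):
        self.schedule_list = []        # parallel task scheduling linked list
        self.ready_list = deque()      # filled later by the scheduling engine
        self.done = []

    def create_task(self, name, blocks, deps=()):
        t = ParallelTask(name, deps)
        t.blocks.extend(blocks)        # data blocks join the task's block queue
        self.schedule_list.append(t)   # task joins the scheduling linked list
        return t
```

Once every task and block has been registered this way, starting the engine is just a matter of walking `schedule_list`, which the next step describes.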
Step 4, scheduling by parallel scheduling engine
Parallel scheduling engine scheduling mainly consists of selecting, from the task scheduling linked list, the initialized tasks with no dependencies and adding them to the ready linked list; then taking tasks from the ready linked list and, according to the number of requested cores, notifying the acceleration cores to complete task buffer initialization and task loading and relocation. After initialization is complete, the data blocks are scheduled to the acceleration cores for processing according to the scheduling strategy set for the task; once all data processing is complete, context reduction is performed as the task requires and the task is destroyed.
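The scheduling loop just described can be sketched as follows. The round-robin block dispatch is a stand-in for whatever per-task scheduling strategy is configured, the core count is illustrative, and the sketch assumes the task dependency graph is acyclic:

```python
from collections import deque

def schedule(tasks, num_cores=7):
    """Sketch of the scheduling engine. Each task is a dict with 'name',
    'deps' (names of tasks that must finish first) and 'blocks'.
    Returns the dispatch log and the set of completed tasks."""
    pending = list(tasks)
    done, ready = set(), deque()
    dispatch_log = []
    while pending or ready:
        # move dependency-free initialized tasks onto the ready list
        for t in [t for t in pending if t["deps"] <= done]:
            pending.remove(t)
            ready.append(t)
        task = ready.popleft()
        # (buffer init + task load/relocation on the acceleration cores elided)
        for i, blk in enumerate(task["blocks"]):
            dispatch_log.append((task["name"], blk, i % num_cores))
        # all blocks dispatched: context reduction, then the task is destroyed
        done.add(task["name"])
    return dispatch_log, done
```

A usage example: two tasks where the second depends on the first dispatch in dependency order, with blocks dealt across the cores.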
Step 5, accelerating the core processing
The acceleration core mainly receives requests from the main control core and executes the corresponding processing, which chiefly comprises task buffer initialization, task loading and relocation, multi-buffer pipelined processing of the data stream, and task context reduction. The processing flow is the same on every acceleration core.
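One acceleration core's request handling might be sketched as below. The request names and the reduction-by-summation are hypothetical stand-ins for the buffer initialization, load/relocation, pipelined block processing, and context reduction the text describes:

```python
def acceleration_core(requests):
    """Sketch of one acceleration core's request loop; every core runs the
    same flow. 'requests' is a sequence of (request_name, payload) pairs."""
    state = {"buffer_ready": False, "kernel": None, "results": []}
    for req, payload in requests:
        if req == "INIT_BUFFER":
            state["buffer_ready"] = True
        elif req == "LOAD_TASK":
            assert state["buffer_ready"], "buffer must be initialized first"
            state["kernel"] = payload          # stand-in for load + relocation
        elif req == "PROCESS_BLOCK":
            # real code would overlap DMA-in / compute / DMA-out across
            # multiple buffers; here we just apply the loaded kernel
            state["results"].append(state["kernel"](payload))
        elif req == "REDUCE_CONTEXT":
            return sum(state["results"])       # stand-in for context reduction
    return None
```

The assertion in `LOAD_TASK` mirrors the ordering the patent imposes: buffer initialization always precedes task loading.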
Step 6, after scheduling finishes, clear the parallel tasks and the runtime
After the acceleration cores have finished computing all data blocks, the main control core's scheduling engine begins its check; once it confirms that all data blocks in the task's data block queue have been processed, the scheduling engine suspends and returns to the main thread, and the main thread destroys the task and exits the run.
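Step 6 can be sketched as a small simulation; the event names are hypothetical, and the polling loop stands in for the engine's completion check:

```python
def finish_run(task_queues):
    """Sketch of step 6: the master core's engine polls the per-task block
    queues; once every queue is confirmed empty it suspends, and the main
    thread destroys the tasks and exits the run."""
    events = []
    while any(task_queues.values()):
        for name, q in task_queues.items():
            if q:
                q.pop()                        # a block finishing on some core
    events.append("engine_suspended")          # all queues confirmed empty
    for name in list(task_queues):
        del task_queues[name]                  # main thread destroys each task
        events.append("destroyed:" + name)
    events.append("run_exited")
    return events
```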
Optionally, the log system, file system, and user heap space are initialized. The log system mainly records system error information; the file system is mainly used to read formatted data from memory; and the user heap space is set up for subsequent heap allocations by the user. These operations belong to a user-selectable interface: if the user does not use these three functions, no initialization is needed.
The functions and capabilities of the proposed self-organizing computing method for an embedded multi-core parallel computing system were verified on a development board based on the FT-6678 processor. Three algorithms, matrix addition, first-order filtering, and finding the maximum and minimum values, were selected to test and verify the performance of the parallel computing framework; the test results are summarized in Table 1.
TABLE 1 parallel computing Performance test results summary table
[Table 1 appears as an image in the original publication and is not reproduced here.]
As the performance tests of these algorithms show, the parallel computing self-organizing method provided by the invention markedly improves the running efficiency of parallel applications and fully exploits the parallel computing capability of the multi-core processor.
The invention provides a parallel computing resource self-organizing scheduling system for a multi-core DSP, which is shown in figure 2 and comprises an initialization and error detection module, a parallel task and data block creation module and an information transmission and processing module;
the initialization and error detection module is used for carrying out error detection on the embedded DSP operating system after initializing the embedded DSP operating system and hardware of the embedded DSP operating system;
the parallel task and data block creation module is used for creating parallel tasks and data blocks after initializing memory protection in the embedded DSP operating system;
the information transmission and processing module is used for transmitting the scheduling information to the acceleration core for processing according to the parallel task and the data block parallel scheduling engine, and then clearing the parallel task to realize the parallel computing resource organization scheduling.
An embodiment of the present invention provides a terminal device, where the terminal device includes: a processor, a memory, and a computer program stored in the memory and executable on the processor. The steps of the various method embodiments described above are implemented when the processor executes the computer program. Alternatively, the processor may implement the functions of the modules/units in the above-described device embodiments when executing the computer program.
The computer program may be divided into one or more modules/units, which are stored in the memory and executed by the processor to accomplish the present invention.
The terminal equipment can be computing equipment such as a desktop computer, a notebook computer, a palm computer, a cloud server and the like. The terminal device may include, but is not limited to, a processor, a memory.
The processor may be a central processing unit (Central Processing Unit, CPU), but may also be other general purpose processors, digital signal processors (Digital Signal Processor, DSP), application specific integrated circuits (Application Specific Integrated Circuit, ASIC), off-the-shelf programmable gate arrays (Field-Programmable Gate Array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, or the like.
The memory may be used to store the computer program and/or module, and the processor may implement various functions of the terminal device by running or executing the computer program and/or module stored in the memory and invoking data stored in the memory.
If the modules/units integrated in the terminal device are implemented in the form of software functional units and sold or used as separate products, they may be stored in a computer readable storage medium. Based on this understanding, the present invention may implement all or part of the flow of the methods of the above embodiments by instructing the relevant hardware through a computer program. The computer program may be stored in a computer readable storage medium, and when executed by a processor, it may implement the steps of each of the method embodiments described above. The computer program comprises computer program code, which may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so forth. It should be noted that the content of the computer readable medium may be adjusted appropriately according to the requirements of legislation and patent practice in the relevant jurisdiction; for example, in some jurisdictions, in accordance with legislation and patent practice, computer readable media do not include electrical carrier signals and telecommunications signals.
After the self-organizing computing framework of the embedded multi-core parallel computing system starts executing, the system first performs system initialization, including processor initialization, hardware system initialization, and operating system initialization, to ensure that the system enters a stable state. It then starts the user's parallel computing task and begins creating the parallel runtime environment, parallel tasks, and data blocks. Once creation is complete, parallel execution begins: the system switches to the parallel scheduling engine, which starts scheduling tasks and data blocks to the acceleration cores; after receiving the corresponding request, each acceleration core executes the corresponding modules to complete task buffer initialization, task loading and relocation, data block computation, and task context reduction. After all data blocks have been computed and reduced, the main control core collects and outputs the results.
the invention can achieve the following effects: 1) The invention is oriented to the multi-core DSP processor, the parallel computing frame is mutually isolated from resource management, and the purposes of self-organizing the computing frame, independently running multiple computing tasks and reducing the coupling relation of different computing tasks are achieved; 2) The multi-dimensional resource self-organizing scheduling strategy is realized, the computing node resources are reasonably allocated, the load balancing is realized, and the execution effect of the computing efficiency is improved.
The above is only a preferred embodiment of the present invention, and is not intended to limit the present invention, but various modifications and variations can be made to the present invention by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. A parallel computing resource organization scheduling method for a multi-core DSP, characterized by comprising the following steps:
initializing an embedded DSP operating system and its hardware, and then performing error detection on the embedded DSP operating system;
after initializing memory protection in the embedded DSP operating system, creating parallel tasks and data blocks;
and scheduling the parallel tasks and data blocks through a parallel scheduling engine, transmitting scheduling information to an acceleration core for processing, and then clearing the parallel tasks, so as to realize parallel computing resource organization scheduling.
2. The multi-core DSP oriented parallel computing resource organization scheduling method of claim 1, wherein initialization operations are performed on a log system, a file system, and a user heap space in the embedded DSP operating system.
3. The multi-core DSP oriented parallel computing resource organization scheduling method of claim 1, wherein the hardware is initialized according to its characteristics and registers.
4. The multi-core DSP oriented parallel computing resource organization scheduling method of claim 1, wherein the data structures, idle tasks, clock processing tasks, and interrupt delay processing tasks of the cores in the embedded DSP operating system are initialized.
6. The multi-core DSP oriented parallel computing resource organization scheduling method of claim 1, wherein the scheduling method of the parallel scheduling engine is as follows:
firstly, selecting initialized tasks from a task-scheduling linked list, and adding tasks without dependencies to a ready linked list;
secondly, taking a task out of the ready linked list and, according to the number of requested cores, notifying the acceleration cores to complete task buffer initialization and task loading and relocation;
and finally, after initialization is finished, scheduling the data blocks to the acceleration cores for processing according to the scheduling strategy set for the task, and, after all data processing is finished, performing context reduction according to the task requirements and destroying the task.
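The first step of the engine above — moving dependency-free tasks from the task-scheduling linked list to the ready linked list — can be sketched as follows. This is a minimal illustration, not the claimed implementation; the structure and function names (task_t, collect_ready) are assumptions.

```c
/* Illustrative sketch: unlink every task with no unresolved
   dependencies from the scheduling list and push it onto a
   ready list (returned in LIFO order). */
#include <assert.h>
#include <stddef.h>

typedef struct task {
    int id;
    int n_deps;           /* count of unresolved dependencies */
    struct task *next;
} task_t;

static task_t *collect_ready(task_t **sched_list) {
    task_t *ready = NULL;
    task_t **p = sched_list;
    while (*p) {
        if ((*p)->n_deps == 0) {
            task_t *t = *p;
            *p = t->next;      /* remove from scheduling list */
            t->next = ready;   /* push onto ready list */
            ready = t;
        } else {
            p = &(*p)->next;   /* keep tasks that still have deps */
        }
    }
    return ready;
}
```

Tasks left on the scheduling list would be re-examined as their dependencies complete.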
6. The multi-core DSP oriented parallel computing resource organization scheduling method according to claim 1, wherein the method for transmitting the scheduling information to the acceleration core for processing is as follows:
the acceleration core receives a request from the main control core, and performs task buffer initialization, task loading and relocation, data-stream multi-buffer pipelined processing, and task context reduction;
wherein the processing flow of each acceleration core is the same.
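The data-stream multi-buffer pipelining mentioned above can be sketched as a two-slot ("ping-pong") scheme on one acceleration core: while one buffer is being computed, the other would be filled (typically by DMA). The sketch below runs the two phases sequentially for illustration; the names (fetch_block, process_buffer, pipeline_run) are assumptions, not the patent's API.

```c
/* Illustrative double-buffered stream processing on one core. */
#include <assert.h>

#define BUF_SLOTS 2
#define BUF_LEN   4

/* Stand-in for a DMA fill of one data block into a buffer. */
static void fetch_block(int *buf, int base) {
    for (int i = 0; i < BUF_LEN; ++i) buf[i] = base + i;
}

/* Stand-in compute kernel: sum the buffer contents. */
static int process_buffer(const int *buf) {
    int acc = 0;
    for (int i = 0; i < BUF_LEN; ++i) acc += buf[i];
    return acc;
}

/* Alternate between the two buffer slots across n_blocks blocks;
   on real hardware the fetch of slot B overlaps the compute of slot A. */
static int pipeline_run(int n_blocks) {
    int bufs[BUF_SLOTS][BUF_LEN];
    int total = 0, cur = 0;
    for (int b = 0; b < n_blocks; ++b) {
        fetch_block(bufs[cur], b * BUF_LEN);  /* fill current slot */
        total += process_buffer(bufs[cur]);   /* compute current slot */
        cur ^= 1;                             /* swap ping/pong slot */
    }
    return total;
}
```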
7. The multi-core DSP oriented parallel computing resource organization scheduling method of claim 1, wherein the method of clearing parallel tasks is as follows:
after the acceleration cores finish computing all data blocks, the scheduling engine on the main control core starts detection;
when it is confirmed that all data blocks in the task data block queue have been processed, the scheduling engine is suspended and control returns to the main thread, which destroys the task and exits.
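The completion-detection and teardown step above can be sketched as a check over the task's data-block queue. This is a hypothetical illustration (block_t, task_t, detect_and_destroy are invented names); a real engine would block or poll on inter-core notifications rather than inspect flags directly.

```c
/* Illustrative sketch: the master-core engine confirms that every
   data block in the task's queue is processed, then the task is
   destroyed and the main thread exits the run. */
#include <assert.h>
#include <stddef.h>

typedef struct { int processed; } block_t;
typedef struct { block_t *queue; size_t n; int destroyed; } task_t;

/* Returns 1 when all data blocks in the task's queue are processed. */
static int all_blocks_done(const task_t *t) {
    for (size_t i = 0; i < t->n; ++i)
        if (!t->queue[i].processed) return 0;
    return 1;
}

/* Engine-side detection: once everything is done, stand in for
   suspending the engine and letting the main thread destroy the task. */
static void detect_and_destroy(task_t *t) {
    if (all_blocks_done(t))
        t->destroyed = 1;
}
```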
8. A parallel computing resource self-organizing scheduling system for a multi-core DSP, characterized by comprising:
an initialization and error detection module, used for performing error detection on the embedded DSP operating system after initializing the embedded DSP operating system and its hardware;
a parallel task and data block creation module, used for creating parallel tasks and data blocks after initializing memory protection in the embedded DSP operating system;
and an information transmission and processing module, used for scheduling the parallel tasks and data blocks through a parallel scheduling engine, transmitting scheduling information to an acceleration core for processing, and then clearing the parallel tasks, so as to realize parallel computing resource organization scheduling.
9. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the multi-core DSP oriented parallel computing resource organization scheduling method according to any of claims 1 to 7.
10. A computer readable storage medium storing a computer program, wherein the computer program when executed by a processor implements the steps of the multi-core DSP oriented parallel computing resource organization scheduling method of any of claims 1 to 7.
CN202310265508.3A 2023-03-17 2023-03-17 Multi-core DSP-oriented parallel computing resource organization scheduling method and system Pending CN116360941A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310265508.3A CN116360941A (en) 2023-03-17 2023-03-17 Multi-core DSP-oriented parallel computing resource organization scheduling method and system


Publications (1)

Publication Number Publication Date
CN116360941A true CN116360941A (en) 2023-06-30

Family

ID=86913261

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310265508.3A Pending CN116360941A (en) 2023-03-17 2023-03-17 Multi-core DSP-oriented parallel computing resource organization scheduling method and system

Country Status (1)

Country Link
CN (1) CN116360941A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination