CN114610674A - Programmable real-time stream processing device for polymorphic heterogeneous computing unit - Google Patents

Programmable real-time stream processing device for polymorphic heterogeneous computing unit

Info

Publication number
CN114610674A
Authority
CN
China
Prior art keywords
component
task
user
heterogeneous
user task
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210094696.3A
Other languages
Chinese (zh)
Inventor
李志刚
李玉成
陶磊
项世珍
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CETC 52 Research Institute
Original Assignee
CETC 52 Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by CETC 52 Research Institute
Priority to CN202210094696.3A
Publication of CN114610674A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00: Digital computers in general; Data processing equipment in general
    • G06F15/76: Architectures of general purpose stored program computers
    • G06F15/78: Architectures of general purpose stored program computers comprising a single central processing unit
    • G06F15/7867: Architectures of general purpose stored program computers comprising a single central processing unit with reconfigurable architecture
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00: Arrangements for software engineering
    • G06F8/60: Software deployment
    • G06F8/61: Installation
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00: Arrangements for program control, e.g. control units
    • G06F9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44: Arrangements for executing specific programs
    • G06F9/445: Program loading or initiating
    • G06F9/44521: Dynamic linking or loading; Link editing at or after load time, e.g. Java class loading

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Stored Programmes (AREA)

Abstract

The invention discloses a programmable real-time stream processing device for polymorphic heterogeneous computing units, comprising a heterogeneous computing unit firmware component, a task management component, a secondary development interface component and a visual task orchestration component. The heterogeneous computing unit firmware component is deployed on a heterogeneous accelerator card and comprises a static part, which serves as resident firmware, and a dynamic part, which serves as the user task part. The task management component is deployed on the mainboard and provides a task component model and a user task component warehouse. The secondary development interface component is deployed on the mainboard and the heterogeneous accelerator cards and provides a secondary development interface with which users develop user task components. The visual task orchestration component is deployed on the mainboard and implements heterogeneous accelerator card selection, user task component selection, topology orchestration, user task component loading and operation monitoring. The invention thereby supports multiple kinds of heterogeneous computing units, such as CPUs, DSPs, FPGAs and GPUs.

Description

Programmable real-time stream processing device for polymorphic heterogeneous computing unit
Technical Field
The application belongs to the technical field of heterogeneous high-performance computing, and particularly relates to a programmable real-time stream processing device for a polymorphic heterogeneous computing unit.
Background
With the rapid development of high-performance computing (HPC), data processing, artificial intelligence and related fields, heterogeneous computing offers growing advantages in performance, cost-effectiveness, power consumption, area and other metrics. Heterogeneous computing frameworks such as Metal, OpenCL and CUDA are therefore increasingly used to integrate heterogeneous computing resources. A complex heterogeneous device contains many different heterogeneous computing units, such as CPUs, DSPs, FPGAs and GPUs, and different units use different programming frameworks. OpenCL is the heterogeneous parallel computing framework standard mainly promoted in the industry and is supported by most vendors, such as Nvidia, Apple, AMD, ARM, Intel and TI. Nvidia CUDA and Apple Metal are proprietary computing frameworks designed for their respective hardware; they are closed systems, but are supported by a wide range of developers.
However, each of these conventional heterogeneous computing frameworks is designed for only one type of heterogeneous hardware and cannot support multiple hardware forms. In addition, in a complex heterogeneous device, multiple heterogeneous computing units often have to be combined or topologically connected to process real-time business data streams, for example in real-time signal processing, image processing or data analysis, and conventional heterogeneous computing frameworks lack support for such real-time stream processing logic. As a result, heterogeneous high-performance computing on a complex heterogeneous device is difficult and complex to develop, which raises the barrier to development and application for engineers.
Disclosure of Invention
The present application aims to provide a programmable real-time stream processing apparatus for polymorphic heterogeneous computing units, so as to solve the problem that the conventional heterogeneous computing framework lacks support for polymorphic heterogeneous computing units and real-time stream processing logic.
To achieve this purpose, the technical solution of the application is as follows:
A programmable real-time stream processing device for polymorphic heterogeneous computing units is applied to a complex heterogeneous device comprising a mainboard and a plurality of heterogeneous accelerator cards. The device comprises a heterogeneous computing unit firmware component, a task management component, a secondary development interface component and a visual task orchestration component, wherein:
the heterogeneous computing unit firmware component is deployed on a heterogeneous accelerator card and comprises a static part and a dynamic part; the static part serves as resident firmware and implements initialization of the heterogeneous accelerator card, state monitoring, sending and receiving of messages and data between accelerator cards, and user task loading; the dynamic part serves as the user task part and is dynamically loaded;
the task management component is deployed on the mainboard and provides a task component model and a user task component warehouse;
the secondary development interface component is deployed on the mainboard and the heterogeneous accelerator cards and provides a secondary development interface with which users develop user task components;
the visual task orchestration component is deployed on the mainboard and implements heterogeneous accelerator card selection, user task component selection, topology orchestration, user task component loading and operation monitoring.
Further, the task component model comprises an object file, an MD5 file and a metadata file. The object file is object code that implements a processing function on top of the secondary development interface component; the MD5 file is a checksum file used to verify the integrity of the object file; the metadata file describes the object file.
Further, the metadata file includes: a user task component identifier, a user task component version, a user task component attribute, a user task component description, a source task component, a destination task component, input and output descriptions, and user-defined information.
Further, the user task component warehouse implements warehousing, browsing, querying and deleting of user task components.
Further, the visual task orchestration component browses all user task components in the system through the user task component warehouse and queries the specified user task components among those stored. Through visual task orchestration, the input and output ports of multiple user task components are then connected by connecting lines, and the components are dynamically deployed onto the specified heterogeneous accelerator cards, thereby orchestrating the user's business process.
The application has three main advantages. First, it decouples the static and dynamic parts: the static firmware part hides the complexity of interconnection, communication and data exchange among the underlying heterogeneous hardware, while the dynamic part lets users concentrate on designing and developing their own business logic and supports dynamic loading of user tasks. Second, user business can be orchestrated flexibly: through the task component model and the visual task orchestration component, business processes can be orchestrated on a complex heterogeneous device composed of heterogeneous computing units, and users can combine and connect tasks according to their business characteristics and task component design, achieving efficient real-time processing for business applications. Third, multiple forms of heterogeneous hardware are supported: task components are modeled uniformly across different types of heterogeneous computing units, and an easy-to-use secondary development interface component is provided for developing heterogeneous computing tasks, so that heterogeneous computing units such as CPUs, DSPs, FPGAs and GPUs are all supported.
Drawings
FIG. 1 is a schematic diagram of a complex heterogeneous device;
FIG. 2 is a schematic diagram of the components of the present application on a complex heterogeneous device;
FIG. 3 is a diagram of the task component model;
FIG. 4 is a schematic diagram of visual task orchestration;
FIG. 5 is a schematic diagram of visual task orchestration.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
As shown in FIG. 1, the complex heterogeneous device includes a mainboard and a plurality of heterogeneous accelerator cards. The mainboard carries a central processing unit (CPU), and the heterogeneous computing unit used by each heterogeneous accelerator card may be one or more of a field-programmable gate array (FPGA), a digital signal processor (DSP) or a graphics processing unit (GPU).
The application provides a programmable real-time stream processing device for polymorphic heterogeneous computing units, applied to the complex heterogeneous device and comprising a heterogeneous computing unit firmware component, a task management component, a secondary development interface component and a visual task orchestration component.
The heterogeneous computing unit firmware component is deployed on a heterogeneous accelerator card and comprises a static part and a dynamic part. The static part serves as resident firmware and implements initialization, state monitoring, sending and receiving of messages and data between accelerator cards, and user task loading; the dynamic part serves as the user task part and is dynamically loaded.
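The static/dynamic split described above can be illustrated in software terms. The following is only a minimal sketch, not the patent's firmware: the class `ResidentFirmware`, its methods and the example task `invert_bytes` are hypothetical names introduced here, and a Python callable stands in for a dynamically loaded user task.

```python
from typing import Callable, List, Optional

# Hypothetical type: a user task maps an input payload to an output payload.
UserTask = Callable[[bytes], bytes]


class ResidentFirmware:
    """Stands in for the static part: always present, owns I/O and task loading."""

    def __init__(self) -> None:
        self.initialized = True              # initialization is done by the static part
        self._task: Optional[UserTask] = None
        self.sent: List[bytes] = []          # stands in for the outbound data link
        self.processed = 0                   # simple state-monitoring counter

    def load_user_task(self, task: UserTask) -> None:
        """Swap in the dynamic part without touching the static part."""
        self._task = task

    def on_data(self, payload: bytes) -> None:
        """Receive data, hand it to the user task, and send the result onward."""
        if self._task is None:
            raise RuntimeError("no user task loaded")
        self.sent.append(self._task(payload))
        self.processed += 1

    def status(self) -> dict:
        """Report the state that a monitoring client could display."""
        return {
            "initialized": self.initialized,
            "task_loaded": self._task is not None,
            "processed": self.processed,
        }


def invert_bytes(payload: bytes) -> bytes:
    """Example user task (dynamic part): business logic only, no I/O code."""
    return bytes(255 - b for b in payload)


firmware = ResidentFirmware()
firmware.load_user_task(invert_bytes)
firmware.on_data(b"\x00\x01\xff")
firmware.load_user_task(lambda p: p[::-1])   # reload a different user task at run time
firmware.on_data(b"abc")
```

The point of the design is that the user task never touches transport or loading code; replacing it leaves the resident part, and its counters, untouched.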
In one embodiment, the heterogeneous computing unit firmware component is FPGA firmware designed with an FPGA framework technique that divides the FPGA into a solidified area (the static part) and a user area (the dynamic part). The solidified area is invisible to the user and forms a bridge between the upper-layer host software and the user's FPGA tasks in the user area. It mainly comprises a PCIe protocol controller, DMA data transfer, memory management, address management, a dynamic loading module and the like. As FPGA resident firmware, the solidified area implements FPGA initialization, state monitoring, sending and receiving of data and messages, user FPGA task loading and similar functions. The user area holds the user's FPGA task and can be dynamically loaded; it exchanges data with the solidified area, receiving data through an interface provided by the solidified area, processing it, and sending the processed data out through that interface.
In another embodiment, the heterogeneous computing unit firmware component is DSP firmware. It relies on the fact that a multi-core DSP chip can run different programs on different cores, and switches the DSP code on an individual core by means of a local reset of that core and IPC interrupts. The programs on the DSP are divided into a Core0 program and the programs on the other cores. The Core0 program serves as DSP resident firmware (the static part) and implements DSP initialization, state monitoring, sending and receiving of data and messages, user task loading and similar functions. The programs on the other cores form the user's DSP task (the dynamic part) and can be dynamically loaded; they exchange data with the Core0 program, receiving data through an interface provided by the Core0 program, processing it, and sending the processed data out through that interface.
GPUs, by contrast, typically use the CUDA (Compute Unified Device Architecture) programming model, so no additional firmware needs to be developed for them.
The task management component is deployed on the mainboard and comprises a task component model and a user task component warehouse; it is designed to be modular.
The task component model, as shown in FIG. 3, includes an object file, an MD5 file and a metadata file. The object file is object code that implements a processing function on top of the secondary development interface component, delivered as a dynamic library or as firmware. The MD5 file is a checksum file used to verify the integrity of the object file. The metadata file describes the object file; its input and output descriptions can be used to construct a task flow graph, in which the input and output ports of multiple user task components are joined by connecting lines to implement a user business process (such as radar signal processing or remote sensing image processing).
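As an illustration of the integrity check that the MD5 file enables, the sketch below compares the MD5 digest of an object file's bytes with the digest recorded in the MD5 file. The patent does not specify the MD5 file's layout; this sketch assumes the digest is the first whitespace-separated token (as in `md5sum` output), and the function names and sample file name `task.bin` are hypothetical.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the hexadecimal MD5 digest of the given bytes."""
    return hashlib.md5(data).hexdigest()


def verify_object_file(object_bytes: bytes, md5_file_text: str) -> bool:
    """Check an object file against the digest stored in its MD5 file.

    Assumption: the MD5 file holds the hex digest as its first token,
    optionally followed by a file name.
    """
    parts = md5_file_text.split()
    if not parts:
        return False
    return md5_hex(object_bytes) == parts[0].lower()


# Build a sample component in memory instead of reading real files.
object_bytes = b"example task firmware image"
md5_file_text = md5_hex(object_bytes) + "  task.bin\n"

intact = verify_object_file(object_bytes, md5_file_text)
corrupted = verify_object_file(object_bytes + b"\x00", md5_file_text)
```

A warehouse or loader could run such a check before storing or deploying a component and reject it when the digests differ.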
The metadata describes the characteristics of a task component precisely. It lets the framework perform various kinds of processing on the task component (such as dependency analysis) and also makes the task component easier to identify. A typical metadata file contains the following information:
user task component identifier: the unique identifier of the task component;
user task component version: the version information of the task component;
user task component attribute: CPU user task, GPU user task, DSP user task or FPGA user task;
user task component description: the function of the task component, such as FFT, filtering or image segmentation;
source task component: the upstream task component of this task component, used to construct the visual topological connection;
destination task component: the downstream task component of this task component, used to construct the visual topological connection;
input description: input bus type, input port number, input data format and type, etc.;
output description: output bus type, output port number, output data format and type, etc.;
other: user-defined information.
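The metadata fields listed above could be represented as a small data structure. This is a sketch only: the patent names the fields but not their encoding, so the class names, attribute names, the JSON serialization and all sample values below are assumptions made for illustration.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List

# The four attribute values named in the description.
ALLOWED_ATTRIBUTES = {"CPU", "GPU", "DSP", "FPGA"}


@dataclass
class PortDescription:
    bus_type: str       # e.g. the bus the port is attached to
    port_number: int
    data_format: str
    data_type: str


@dataclass
class TaskMetadata:
    identifier: str                     # unique identifier of the task component
    version: str
    attribute: str                      # CPU, GPU, DSP or FPGA user task
    description: str                    # function of the component
    source_components: List[str] = field(default_factory=list)
    destination_components: List[str] = field(default_factory=list)
    inputs: List[PortDescription] = field(default_factory=list)
    outputs: List[PortDescription] = field(default_factory=list)
    custom: Dict[str, str] = field(default_factory=dict)   # user-defined information

    def __post_init__(self) -> None:
        if self.attribute not in ALLOWED_ATTRIBUTES:
            raise ValueError(f"unknown attribute: {self.attribute}")

    def to_json(self) -> str:
        """Serialize the metadata, including nested port descriptions."""
        return json.dumps(asdict(self), indent=2)


# Hypothetical example: an FFT component fed by an acquisition task.
fft_meta = TaskMetadata(
    identifier="fft-1024",
    version="1.0",
    attribute="FPGA",
    description="1024-point FFT",
    source_components=["acquire"],
    destination_components=["filter"],
    inputs=[PortDescription("PCIe", 0, "interleaved I/Q", "int16")],
    outputs=[PortDescription("PCIe", 1, "complex spectrum", "float32")],
    custom={"author": "example"},
)
metadata_text = fft_meta.to_json()
```

Keeping the source and destination components and the port descriptions in one record is what would let an orchestration tool draw the component and check its connections without opening the object file.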
The user task component warehouse implements warehousing (storing), browsing, querying, deleting and similar operations on user task components, based on the task component model and database technology. For example, a user can query by task component identifier, attribute, function, version, custom information and so on to retrieve the task components that meet the requirements.
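The four warehouse operations can be sketched on top of an embedded database. The patent says only that the warehouse is based on database technology; the choice of SQLite, the table layout and the class `TaskComponentWarehouse` are assumptions for this example, and only a few metadata fields are stored.

```python
import sqlite3
from typing import List, Optional, Tuple


class TaskComponentWarehouse:
    """Minimal in-memory warehouse: store, browse, query and delete components."""

    def __init__(self) -> None:
        self._db = sqlite3.connect(":memory:")
        self._db.execute(
            "CREATE TABLE components ("
            " identifier TEXT, version TEXT, attribute TEXT, description TEXT,"
            " PRIMARY KEY (identifier, version))"
        )

    def store(self, identifier: str, version: str,
              attribute: str, description: str) -> None:
        """Warehousing: add one component version."""
        with self._db:  # commits on success, rolls back on error
            self._db.execute(
                "INSERT INTO components VALUES (?, ?, ?, ?)",
                (identifier, version, attribute, description),
            )

    def browse(self) -> List[Tuple[str, str, str, str]]:
        """Browsing: list every stored component."""
        return self._db.execute(
            "SELECT identifier, version, attribute, description"
            " FROM components ORDER BY identifier, version"
        ).fetchall()

    def query(self, attribute: Optional[str] = None,
              keyword: Optional[str] = None) -> List[str]:
        """Querying: filter by attribute and/or a keyword in the description."""
        sql = "SELECT DISTINCT identifier FROM components WHERE 1=1"
        params: List[str] = []
        if attribute is not None:
            sql += " AND attribute = ?"
            params.append(attribute)
        if keyword is not None:
            sql += " AND description LIKE ?"
            params.append(f"%{keyword}%")
        sql += " ORDER BY identifier"
        return [row[0] for row in self._db.execute(sql, params)]

    def delete(self, identifier: str) -> int:
        """Deleting: remove all versions of a component; returns rows removed."""
        with self._db:
            cursor = self._db.execute(
                "DELETE FROM components WHERE identifier = ?", (identifier,)
            )
        return cursor.rowcount


warehouse = TaskComponentWarehouse()
warehouse.store("fft-1024", "1.0", "FPGA", "1024-point FFT")
warehouse.store("fir-lowpass", "1.2", "DSP", "FIR low-pass filtering")
warehouse.store("segmenter", "0.9", "GPU", "image segmentation")
dsp_components = warehouse.query(attribute="DSP")
fft_components = warehouse.query(keyword="FFT")
removed = warehouse.delete("segmenter")
```

A real warehouse would presumably also keep the object file and MD5 file alongside each record; that part is omitted here.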
The secondary development interface component provides a secondary development interface with which users develop heterogeneous computing task components. It comprises an FPGA API interface library (in the form of IP cores), a DSP API interface library and a CPU interface library. These interface libraries mainly encapsulate functions such as data transmission and reception between devices, message communication and exception handling. DSP, FPGA, CPU and other user task components can be developed on top of them.
FPGA user task component: the user develops a task .bin firmware file based on the interface library IP and provides a metadata file describing the .bin file.
DSP user task component: the user develops a task .out file based on the DSP interface library and provides a metadata file describing the .out file.
CPU user task component: the user develops a .so file based on the CPU interface library and provides a metadata file describing the .so file.
GPU user task component: the user writes GPU code based on the OpenCL model.
The visual task orchestration component comprises a background service and a visual client. It mainly implements heterogeneous accelerator card selection, user task component selection, topology orchestration, user task component loading, operation monitoring and similar functions.
In practical application, the FPGA firmware and the DSP firmware of the heterogeneous computing unit firmware component are first programmed into the FPGA accelerator card and the DSP accelerator card with a JTAG tool. The user's application is then analyzed and decomposed into user task components according to its processing flow and workload behavior, taking into account the computation types that suit each heterogeneous accelerator card. The user task components, namely FPGA, DSP, CPU and GPU task components, are then written on top of the secondary development interface component.
Next, the user uploads the FPGA, DSP, GPU and CPU user task components through the visual task orchestration component, and they are stored through the user task component warehouse of the task management component.
During visual task orchestration, all user task components in the system are browsed through the user task component warehouse and the specified user task components are queried among those stored. The input and output ports of multiple user task components are then connected by connecting lines, and the components are dynamically deployed onto the specified heterogeneous accelerator cards, thereby orchestrating the user's business process.
As shown in FIG. 4 and FIG. 5, FIG. 4 is a task logic topology diagram of a complex heterogeneous device. A CPU device supports multiple CPU user tasks, whereas each DSP, GPU or FPGA heterogeneous device supports only a single user task; the connecting lines in the diagram indicate the communication and interconnection relationships between tasks. During visual orchestration, the user specifies the heterogeneous accelerator cards to use, the CPU user task components and the interconnection relationships (shown by the bold black lines in FIG. 5), and a real-time stream processing logic topology graph is generated from the device and task connections.
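The placement rules stated above (several tasks per CPU device, one task per DSP, GPU or FPGA device) lend themselves to a simple check before deployment. The sketch below is an assumption about how an orchestration tool might validate a topology; the names `Placement` and `validate_topology`, the attribute-matching rule and the sample devices are hypothetical and do not come from the patent.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Placement:
    task: str        # user task component identifier
    attribute: str   # CPU, GPU, DSP or FPGA, taken from the component metadata
    device: str      # device the task is deployed on


def validate_topology(devices: Dict[str, str],
                      placements: List[Placement],
                      links: List[Tuple[str, str]]) -> List[str]:
    """Return a list of problems; an empty list means the topology is accepted.

    devices maps a device name to its kind; links are (source, destination) tasks.
    """
    errors: List[str] = []
    tasks = {p.task for p in placements}

    # Each task must sit on a known device of the matching kind.
    for p in placements:
        kind = devices.get(p.device)
        if kind is None:
            errors.append(f"{p.task}: unknown device {p.device}")
        elif kind != p.attribute:
            errors.append(f"{p.task}: {p.attribute} task placed on {kind} device {p.device}")

    # Only CPU devices may host more than one user task.
    per_device = Counter(p.device for p in placements)
    for device, count in sorted(per_device.items()):
        if devices.get(device) not in (None, "CPU") and count > 1:
            errors.append(f"{device}: {count} tasks on a single-task device")

    # Every connecting line must join two placed tasks.
    for src, dst in links:
        for end in (src, dst):
            if end not in tasks:
                errors.append(f"link {src}->{dst}: unknown task {end}")
    return errors


devices = {"cpu0": "CPU", "fpga0": "FPGA", "dsp0": "DSP"}
placements = [
    Placement("ingest", "CPU", "cpu0"),
    Placement("report", "CPU", "cpu0"),
    Placement("fft", "FPGA", "fpga0"),
    Placement("filter", "DSP", "dsp0"),
]
links = [("ingest", "fft"), ("fft", "filter"), ("filter", "report")]

ok_errors = validate_topology(devices, placements, links)
bad_errors = validate_topology(
    devices, placements + [Placement("fft2", "FPGA", "fpga0")], links
)
```

In this example the first topology passes, while adding a second task to the FPGA card yields one error.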
After orchestration is complete, the heterogeneous computing framework is started through the visual task orchestration component. The framework automatically and dynamically loads the FPGA, DSP, GPU and CPU tasks onto the corresponding computing devices according to the orchestrated topology and completes initialization. Task flows are event-driven, triggered by input data, signals or control commands, so that data flows between tasks and devices and is processed in real time.
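The event-driven flow of data through an orchestrated topology can be sketched with a single event queue: the arrival of data at a task is an event, the task processes it, and the result becomes a new event for each downstream task. This is a single-process illustration under stated assumptions, not the patent's runtime; the function `run_stream` and the sample tasks are hypothetical, and the topology is assumed to be acyclic.

```python
from collections import deque
from typing import Any, Callable, Deque, Dict, Iterable, List, Tuple


def run_stream(handlers: Dict[str, Callable[[Any], Any]],
               links: List[Tuple[str, str]],
               entry: str,
               items: Iterable[Any]) -> List[Any]:
    """Drive data through the topology, one 'data arrived' event at a time.

    handlers maps a task name to its processing function; links are
    (source, destination) pairs; tasks without a successor are sinks and
    their outputs are collected as results.
    """
    downstream: Dict[str, List[str]] = {}
    for src, dst in links:
        downstream.setdefault(src, []).append(dst)

    results: List[Any] = []
    events: Deque[Tuple[str, Any]] = deque((entry, item) for item in items)
    while events:
        task, data = events.popleft()        # next event: data arrived at a task
        output = handlers[task](data)
        targets = downstream.get(task, [])
        if not targets:
            results.append(output)           # sink task: keep the final value
        for target in targets:
            events.append((target, output))  # forward along each connecting line
    return results


# Hypothetical three-stage pipeline: scale -> offset -> sink.
handlers = {
    "scale": lambda x: x * 2,
    "offset": lambda x: x + 1,
    "sink": lambda x: x,
}
links = [("scale", "offset"), ("offset", "sink")]
outputs = run_stream(handlers, links, "scale", [1, 2, 3])
```

In a real deployment each task would run on its own device and the events would travel over the resident firmware's message and data interfaces; the queue here only models the triggering order.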
Once the tasks are loaded and running, the visual task orchestration component shows in real time the running status of each task component, the data traffic at the input and output ports of each heterogeneous device, the running state of each heterogeneous device, and so on.
When a user no longer needs a task component, it can be deleted through the user task component warehouse.
In summary, the decoupled design separates static and dynamic parts: the static firmware part hides the complexity of interconnection, communication and data exchange among the underlying heterogeneous hardware, while the dynamic part lets users concentrate on their own business logic and supports dynamic loading of user tasks. User business is orchestrated flexibly: with the task component model and the visual task orchestration component, business processes can be orchestrated on a complex heterogeneous device composed of heterogeneous computing units, and users can combine and connect tasks according to their business characteristics and task component design, achieving efficient real-time processing for business applications. Multiple forms of heterogeneous hardware are supported: task components are modeled uniformly across different types of heterogeneous computing units, and an easy-to-use secondary development interface component is provided for developing heterogeneous computing tasks, so that heterogeneous computing units such as CPUs, DSPs, FPGAs and GPUs are all supported.
The above embodiments describe only several implementations of the present application. Although they are described specifically and in detail, they are not to be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, all of which fall within its scope of protection. Therefore, the scope of protection of this patent shall be subject to the appended claims.

Claims (5)

1. A programmable real-time stream processing device for polymorphic heterogeneous computing units, applied to a complex heterogeneous device, characterized in that the complex heterogeneous device comprises a mainboard and a plurality of heterogeneous accelerator cards, and the programmable real-time stream processing device comprises a heterogeneous computing unit firmware component, a task management component, a secondary development interface component and a visual task orchestration component, wherein:
the heterogeneous computing unit firmware component is deployed on a heterogeneous accelerator card and comprises a static part and a dynamic part, wherein the static part serves as resident firmware and implements initialization of the heterogeneous accelerator card, state monitoring, sending and receiving of messages and data between accelerator cards, and user task loading, and the dynamic part serves as the user task part and is dynamically loaded;
the task management component is deployed on the mainboard and provides a task component model and a user task component warehouse;
the secondary development interface component is deployed on the mainboard and the heterogeneous accelerator cards and provides a secondary development interface with which users develop user task components;
the visual task orchestration component is deployed on the mainboard and implements heterogeneous accelerator card selection, user task component selection, topology orchestration, user task component loading and operation monitoring.
2. The programmable real-time stream processing device for polymorphic heterogeneous computing units according to claim 1, wherein the task component model comprises an object file, an MD5 file and a metadata file; the object file is object code that implements a processing function on top of the secondary development interface component; the MD5 file is a checksum file used to verify the integrity of the object file; and the metadata file describes the object file.
3. The programmable real-time stream processing device for polymorphic heterogeneous computing units according to claim 2, wherein the metadata file includes: a user task component identifier, a user task component version, a user task component attribute, a user task component description, a source task component, a destination task component, input and output descriptions, and user-defined information.
4. The programmable real-time stream processing device for polymorphic heterogeneous computing units according to claim 1, wherein the user task component warehouse implements warehousing, browsing, querying and deleting of user task components.
5. The programmable real-time stream processing device for polymorphic heterogeneous computing units according to claim 1, wherein the visual task orchestration component browses all user task components in the system through the user task component warehouse, queries the specified user task components among those stored, then, through visual task orchestration, connects the input and output ports of a plurality of user task components by connecting lines and dynamically deploys the components onto the specified heterogeneous accelerator cards, thereby orchestrating the user's business process.
Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202210094696.3A (published as CN114610674A) | 2022-01-26 | 2022-01-26 | Programmable real-time stream processing device for polymorphic heterogeneous computing unit

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202210094696.3A (published as CN114610674A) | 2022-01-26 | 2022-01-26 | Programmable real-time stream processing device for polymorphic heterogeneous computing unit

Publications (1)

Publication Number | Publication Date
CN114610674A | 2022-06-10

Family

ID=81859519

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN202210094696.3A (published as CN114610674A; status: Pending) | Programmable real-time stream processing device for polymorphic heterogeneous computing unit | 2022-01-26 | 2022-01-26

Country Status (1)

Country | Link
CN (1) | CN114610674A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination