WO2023125463A1 - Heterogeneous computing framework-based processing method and apparatus, and device and medium - Google Patents

Publication number
WO2023125463A1
Authority
WO
WIPO (PCT)
Prior art keywords
heterogeneous
data
processing
processing unit
framework
Prior art date
Application number
PCT/CN2022/142134
Other languages
French (fr)
Chinese (zh)
Inventor
罗恒锋 (Luo Hengfeng)
Original Assignee
北京字跳网络技术有限公司 (Beijing Zitiao Network Technology Co., Ltd.)
Priority date
Filing date
Publication date
Application filed by 北京字跳网络技术有限公司 (Beijing Zitiao Network Technology Co., Ltd.)
Publication of WO2023125463A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources to service a request
    • G06F9/5027 Allocation of resources to service a request, the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F2209/00 Indexing scheme relating to G06F9/00
    • G06F2209/50 Indexing scheme relating to G06F9/50
    • G06F2209/5017 Task decomposition
    • G06F2209/5018 Thread allocation
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • the present disclosure relates to the field of computer technology, and in particular to a processing method, apparatus, device, and medium based on a heterogeneous computing framework.
  • the demand on the logical computing power of computing systems is continually increasing.
  • currently, the computing system is implemented directly on algorithm modules according to the business requirements of the upper layer, such as a CPU algorithm module and a GPU algorithm module. If the upper-layer business needs to expand or change the algorithm implementation, the implementation between the algorithm modules and the upper layer must also be modified synchronously.
  • the disclosure provides a processing method, apparatus, device, and medium based on a heterogeneous computing framework.
  • an embodiment of the present disclosure provides a processing method based on a heterogeneous computing framework, the method including:
  • creating a heterogeneous computing engine framework by invoking an interface of at least one heterogeneous processing unit according to a data processing task, wherein the heterogeneous computing engine framework includes at least one data input terminal, at least one heterogeneous processing unit, and at least one data output terminal;
  • creating a corresponding application program interface API for each algorithm module in at least one heterogeneous processing unit includes:
  • the algorithm module includes any one of, or a combination of more than one of: a central processing unit (CPU), a digital signal processing module (DSP), a graphics processing unit module (GPU), an application-specific integrated circuit module (ASIC), and a field-programmable gate array module (FPGA).
  • the method also includes:
  • the method also includes:
  • providing all of the multiple subtask results to the data output terminal, wherein the data output terminal is used to combine the multiple subtask results.
  • the method also includes:
  • a custom algorithm node is inserted into the heterogeneous computing engine framework, wherein the custom algorithm node is connected to the data input terminal or to the output terminal of a heterogeneous processing unit; the custom algorithm node is used to compute, according to a custom algorithm, the data provided by the data input terminal or by the output terminal of the heterogeneous processing unit, and to feed the computed result into the next node connected to the data input terminal or to the output terminal of the heterogeneous processing unit.
  • the method also includes:
  • a position termination node is inserted into the heterogeneous computing engine framework, wherein the position termination node is connected to the output terminal of a heterogeneous processing unit; the position termination node is used to stop the processing nodes after the output terminal of the heterogeneous processing unit from continuing to compute, and to output the data provided by the output terminal of the heterogeneous processing unit as the processing result.
  • an embodiment of the present disclosure further provides a processing device based on a heterogeneous computing framework, the device comprising:
  • a first creation module, configured to create a corresponding application program interface (API) for each algorithm module in at least one heterogeneous processing unit, wherein each heterogeneous processing unit is generated by encapsulating at least one algorithm module;
  • a second creation module, configured to create a heterogeneous computing engine framework by invoking the interface of at least one heterogeneous processing unit according to a data processing task, wherein the heterogeneous computing engine framework includes at least one data input terminal, at least one heterogeneous processing unit, and at least one data output terminal; and
  • an acquisition module, configured to call the application program interface (API) of the required algorithm module according to the heterogeneous computing engine framework to perform the data processing task, and to output the task processing result to the target device.
  • the present disclosure provides a computer-readable storage medium storing computer programs/instructions which, when executed by a processor, cause the processor to implement the above method.
  • the present disclosure provides an electronic device, including: a processor; and a memory for storing instructions executable by the processor; the processor is configured to read the instructions from the memory and execute the above method.
  • the present disclosure provides a computer program product, where the computer program product includes a computer program/instruction, and when the computer program/instruction is executed by a processor, the above method is implemented.
  • the present disclosure provides a computer program, where the computer program includes instructions, and when the instructions are executed by a processor, the above method is implemented.
  • FIG. 1 is a schematic flowchart of a processing method based on a heterogeneous computing framework provided by an embodiment of the present disclosure
  • FIG. 2 is a schematic diagram of a heterogeneous processing unit provided by an embodiment of the present disclosure
  • FIG. 3 is a schematic diagram of a heterogeneous computing engine framework provided by an embodiment of the present disclosure
  • FIG. 4 is a schematic flowchart of another processing method based on a heterogeneous computing framework provided by an embodiment of the present disclosure
  • FIG. 5 is a schematic diagram of another heterogeneous computing engine framework provided by an embodiment of the present disclosure.
  • FIG. 6 is a schematic flowchart of another processing method based on a heterogeneous computing framework provided by an embodiment of the present disclosure
  • FIG. 7 is a schematic diagram of another heterogeneous computing engine framework provided by an embodiment of the present disclosure.
  • FIG. 8 is a schematic diagram of inserting a custom algorithm node provided by an embodiment of the present disclosure.
  • FIG. 9 is a schematic diagram of an insertion position termination node provided by an embodiment of the present disclosure.
  • FIG. 10 is a schematic diagram of inserting a custom algorithm node and a position termination node provided by an embodiment of the present disclosure
  • FIG. 11 is a schematic structural diagram of a processing device based on a heterogeneous computing framework provided by an embodiment of the present disclosure.
  • FIG. 12 is a schematic structural diagram of an electronic device provided by an embodiment of the present disclosure.
  • the term “comprise” and its variations are open-ended, i.e., “including but not limited to”.
  • the term “based on” is “based at least in part on”.
  • the term “one embodiment” means “at least one embodiment”; the term “another embodiment” means “at least one further embodiment”; the term “some embodiments” means “at least some embodiments.” Relevant definitions of other terms will be given in the description below.
  • the current computing system has low computing efficiency, takes a long time to compute, is not suitable for expansion, is inflexible, and cannot meet the computing power requirements of the current business.
  • the processing method based on the heterogeneous computing framework of the embodiments of the present disclosure creates a corresponding application program interface (API) for each algorithm module in at least one heterogeneous processing unit, wherein each heterogeneous processing unit is generated by encapsulating at least one algorithm module; invokes the interface of at least one heterogeneous processing unit according to a data processing task to create a heterogeneous computing engine framework, wherein the heterogeneous computing engine framework includes at least one data input terminal, at least one heterogeneous processing unit, and at least one data output terminal; and calls the application program interface (API) of the required algorithm module according to the heterogeneous computing engine framework to execute the data processing task, and outputs the task processing result to the target device.
  • a heterogeneous computing engine framework is established through a heterogeneous processing unit according to a data processing task, and the data processing task is connected through the heterogeneous computing engine framework.
  • If the algorithm implementation needs to be expanded or changed, only the heterogeneous computing engine framework needs to be adjusted accordingly, so the method is suitable for expansion and highly flexible, and can improve processing efficiency for large-scale algorithm systems. Through the heterogeneous computing engine framework, the characteristics of the different algorithm modules in the heterogeneous processing units are integrated, so that the computing power advantages of the heterogeneous processing units are exploited, computing efficiency is improved, computing time is reduced, and the performance of the heterogeneous processing units is improved, thereby meeting the current business needs for computing power.
  • the hierarchical design of the architecture, the modular design, the cascading of algorithms, and the intelligent scheduling of hybrid computing are realized through the heterogeneous computing engine framework, thereby improving the efficiency of the framework design.
  • an embodiment of the present disclosure provides a processing method based on a heterogeneous computing framework, which will be introduced in conjunction with specific embodiments below.
  • FIG. 1 is a schematic flowchart of a processing method based on a heterogeneous computing framework provided by an embodiment of the present disclosure.
  • the method can be executed by a processing device based on a heterogeneous computing framework, where the device can be implemented by software and/or hardware.
  • the method includes:
  • Step 101 create a corresponding application program interface API for each algorithm module in at least one heterogeneous processing unit, wherein each heterogeneous processing unit is generated by encapsulating at least one algorithm module.
  • the data processing task is implemented based on heterogeneous processing units, and the number of heterogeneous processing units implementing the data processing task may be one or multiple.
  • the heterogeneous processing unit is an abstract execution layer for data processing tasks.
  • At least one algorithm module is encapsulated in the heterogeneous processing unit. If multiple algorithm modules are encapsulated in a heterogeneous processing unit, the types of the multiple algorithm modules may be the same or different. The differences between different types of algorithm modules include, but are not limited to, one or more of: different process architectures, different instruction sets, and different functions.
  • the algorithm module includes: a central processing unit (CPU), a digital signal processing module (DSP), a graphics processing unit module (GPU), an application-specific integrated circuit module (ASIC), and a field-programmable gate array module (FPGA).
  • a heterogeneous processing unit may be packaged with any one of the above-mentioned algorithm modules, or may be packaged with any number of the above-mentioned algorithm modules.
  • a CPU, GPU, and DSP can be packaged in a heterogeneous processing unit.
  • the application program interface includes: a call interface (Backend interface) capable of invoking the corresponding algorithm module, and a task running interface (Runtime interface) capable of controlling the running of the corresponding algorithm module.
  • FIG. 2 is a schematic diagram of a heterogeneous processing unit provided by an embodiment of the present disclosure.
  • Backend represents an invocation interface
  • Runtime represents a task execution interface
  • Process represents a heterogeneous processing unit.
  • In subsequent embodiments and in the accompanying drawings, for simplicity of expression, the heterogeneous processing units involved may also be described simply as Process.
  • the image recognition function is implemented through a Process, which can be packaged with GPU and CPU.
  • the GPU Backend interface and the CPU Backend interface are included in the Backend of FIG. 2; the GPU Runtime interface and the CPU Runtime interface are included in the Runtime of FIG. 2.
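As a non-authoritative sketch of the structure described above (all class and method names here are invented for illustration and do not come from the patent), a Process exposing Backend and Runtime interfaces over its encapsulated algorithm modules might look like this:

```python
class AlgorithmModule:
    """One algorithm module (e.g. a CPU or GPU implementation)."""
    def __init__(self, kind, fn):
        self.kind = kind  # e.g. "CPU", "GPU", "DSP"
        self.fn = fn      # the computation the module performs

class Process:
    """Heterogeneous processing unit encapsulating one or more algorithm modules."""
    def __init__(self, *modules):
        self.modules = {m.kind: m for m in modules}

    def backend(self, kind):
        """Call interface (Backend): select and start the required algorithm module."""
        return self.modules[kind]

    def runtime(self, kind, data):
        """Task running interface (Runtime): run the selected module on the data."""
        return self.backend(kind).fn(data)

# Example: an image-recognition Process packaged with a CPU and a GPU module.
proc = Process(
    AlgorithmModule("CPU", lambda xs: [x + 1 for x in xs]),
    AlgorithmModule("GPU", lambda xs: [x * 2 for x in xs]),
)
print(proc.runtime("CPU", [1, 2, 3]))  # [2, 3, 4]
print(proc.runtime("GPU", [1, 2, 3]))  # [2, 4, 6]
```

The point of the sketch is the separation of concerns: the caller names a module kind through the Backend, while the Runtime controls its execution, so the upper layer never touches the module directly.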
  • Step 102 call the interface of at least one heterogeneous processing unit to create a heterogeneous computing engine framework according to the data processing task, wherein the heterogeneous computing engine framework includes at least one data input terminal, at least one heterogeneous processing unit, and at least one data output terminal.
  • the interface of at least one heterogeneous processing unit can be called to create a heterogeneous computing engine framework.
  • the heterogeneous computing engine framework includes one or more data input terminals for input data, one or more data output terminals for output data, and at least one heterogeneous processing unit, where the at least one heterogeneous processing unit may be packaged with one or more algorithm modules. For example, a heterogeneous computing engine framework may include one heterogeneous processing unit packaged with two algorithm modules of different types; or, the heterogeneous computing engine framework may include two heterogeneous processing units, each packaged with one algorithm module, the types of the two algorithm modules being different.
  • the connection relationships among the data input terminal, the heterogeneous processing units, and the data output terminal in the above heterogeneous computing engine framework can be set to match the data processing task; therefore, when the data processing task changes and the algorithm implementation needs to be expanded or changed, the data processing task can be realized by adjusting the heterogeneous computing engine framework.
  • the cascade processing of multiple algorithm modules can be realized through the connection relationships in the heterogeneous computing engine framework, and the multiplexing of algorithm modules can also be realized through those connection relationships (for example, the reuse of relevant algorithms in image pre-processing and image post-processing).
  • the number of heterogeneous processing units in the heterogeneous computing engine framework can be determined according to the performance of a single heterogeneous processing unit, compatibility of devices, characteristics of algorithms executed by the heterogeneous processing unit, and the like. Embodiments of the present disclosure do not limit the aforementioned heterogeneous computing engine framework.
  • FIG. 3 is a schematic diagram of a heterogeneous computing engine framework provided by an embodiment of the present disclosure.
  • the heterogeneous computing engine framework includes a data input terminal, a data output terminal, and seven Processes, where some of the Processes are packaged with a CPU and the others are packaged with a GPU. If image processing is performed based on this heterogeneous computing engine framework, since the image pre-processing part (Process2 and Process3) and the image post-processing part (Process5 and Process6) can be reused, Process2 can be the same as Process5, and Process3 can be the same as Process6.
  • the data transmission direction in the heterogeneous computing engine framework is: from the data input terminal to Process1; from Process1 to Process2 and Process4; from Process2 to Process3; from Process3 and Process1 to Process4; from Process4 to Process5 and Process7; from Process5 to Process6; from Process6 and Process4 to Process7; and from Process7 to the data output terminal.
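The connection relationships above can be expressed as a directed graph and executed in topological order. The following is a hypothetical Python illustration only (node names follow FIG. 3; the per-node computation is a placeholder, not anything specified by the patent):

```python
from graphlib import TopologicalSorter

# Each node is mapped to its set of predecessors, following the FIG. 3 data flow.
graph = {
    "Process1": {"input"},
    "Process2": {"Process1"},
    "Process3": {"Process2"},
    "Process4": {"Process1", "Process3"},
    "Process5": {"Process4"},
    "Process6": {"Process5"},
    "Process7": {"Process4", "Process6"},
    "output":   {"Process7"},
}

# A valid execution order: every node appears after all of its predecessors.
order = list(TopologicalSorter(graph).static_order())

# Placeholder computation: each node records 1 + the largest value among its
# predecessors, so "output" ends up holding the length of the longest path.
results = {"input": 0}
for node in order:
    if node != "input":
        results[node] = max(results[p] for p in graph[node]) + 1
print(results["output"])  # 8
```

Representing the framework as data rather than code is what makes the later "adjust the framework instead of the algorithms" property possible: changing the task changes only the `graph` mapping.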
  • Step 103 call the application program interface API of the required algorithm module according to the framework of the heterogeneous computing engine to execute the data processing task, and output the task processing result to the target device.
  • the business processing results of data processing tasks can be obtained based on the heterogeneous computing engine framework.
  • the application program interface (API) corresponding to the required algorithm module is called according to the heterogeneous computing engine framework to perform the data processing task, thereby realizing the orderly scheduling of the algorithm modules, and the task processing result is then output to the target device.
  • the target device may be selected according to the application scenario, which is not limited in the embodiment of the present disclosure, for example, a mobile phone, a microcomputer, and the like.
  • the Process is packaged with a CPU and a GPU. If the CPU needs to be used according to the heterogeneous computing engine framework, the CPU Backend interface is called to start the CPU, and the CPU Runtime interface is called to control the running of the CPU. Similarly, if the GPU needs to be used and run according to the heterogeneous computing engine framework, the GPU Backend interface and the GPU Runtime interface are called; the calling order is determined in the same manner and will not be repeated here.
  • the processing method based on the heterogeneous computing framework of the embodiments of the present disclosure creates a corresponding application program interface (API) for each algorithm module in at least one heterogeneous processing unit, wherein each heterogeneous processing unit is generated by encapsulating at least one algorithm module; calls the interface of at least one heterogeneous processing unit to create a heterogeneous computing engine framework according to the data processing task, wherein the heterogeneous computing engine framework includes at least one data input terminal, at least one heterogeneous processing unit, and at least one data output terminal; and calls the application program interface (API) of the required algorithm module according to the heterogeneous computing engine framework to perform the data processing task, and outputs the task processing result to the target device.
  • In the related art, the corresponding algorithms are run directly on algorithm modules such as the CPU and the GPU, and the data processing tasks are coupled to the specific algorithms. If an algorithm needs to be expanded or changed, the realization of the data processing task must also be modified synchronously; the related art is therefore not suitable for expansion, is inflexible, and has very low processing efficiency for large-scale algorithm systems.
  • a heterogeneous computing engine framework is established through heterogeneous processing units according to data processing tasks, and the data processing tasks are connected through the heterogeneous computing engine framework.
  • the application program interface between the computing engine framework and the data processing task does not need to be modified synchronously, so the method is suitable for expansion and highly flexible, and can improve processing efficiency for large-scale algorithm systems. The heterogeneous computing engine framework integrates the characteristics of the different algorithm modules in the heterogeneous processing units, so that the computing power advantages of the heterogeneous processing units can be exploited, computing efficiency is improved, computing time is reduced, and the performance of the heterogeneous processing units is improved, thereby meeting the current business needs for computing power.
  • the hierarchical design of the architecture, the modular design, the cascading of algorithms, and the intelligent scheduling of hybrid computing are realized through the heterogeneous computing engine framework, thereby improving the efficiency of the framework design.
  • FIG. 4 is a schematic flowchart of another processing method based on the heterogeneous computing framework provided by an embodiment of the present disclosure. As shown in FIG. 4, the method further includes the following steps:
  • Step 401 performing split processing on the input data of the heterogeneous processing units to generate multiple data blocks.
  • the input data is segmented.
  • the input data may be divided according to the data type of the input data to obtain corresponding data blocks, and the data in the same data block obtained by the division may belong to the same data type.
  • Step 402 establishing multiple computing threads corresponding to multiple data blocks in the heterogeneous processing unit, wherein the multiple computing threads are used to execute multiple data blocks in parallel to generate multiple corresponding data processing results.
  • multiple calculation threads corresponding to the data block can be established, so as to call the application program interface (for example: Backend interface and Runtime interface) based on the thread, and process the data block based on the thread, for Each data block generates a corresponding data processing result.
  • Step 403 combining and processing multiple data processing results to generate output data corresponding to heterogeneous processing units.
  • the multiple data processing results may be merged to generate output data corresponding to the heterogeneous processing unit.
  • For example, the input data is divided into multiple data blocks: the data related to digital signal processing is divided into the third data block, and other types of data are divided into the first data block.
  • The first computing thread Thread1 is established for processing the first data block; based on Thread1, the first call interface Backend1 and the first task running interface Runtime1 are called, thereby running the CPU to process the first data block and generating the first data processing result.
  • The second computing thread Thread2 is established for processing the second data block; based on Thread2, the second call interface Backend2 and the second task running interface Runtime2 are called, thereby running the GPU to process the second data block and generating the second data processing result.
  • The third computing thread Thread3 is established for processing the third data block; based on Thread3, the third call interface Backend3 and the third task running interface Runtime3 are called, thereby running the DSP to process the third data block and generating the third data processing result.
  • Thread1, Thread2, and Thread3 may be executed in parallel in the Process, so as to improve the efficiency of input data processing.
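Steps 401 to 403 can be sketched as follows. This is a hypothetical illustration using Python threads; `split_by_type`, `ALGOS`, and `process_unit` are invented names, and the per-block lambdas are stand-ins for the real CPU/GPU/DSP algorithm modules:

```python
from concurrent.futures import ThreadPoolExecutor

def split_by_type(items):
    """Step 401: split the input into one data block per data type."""
    blocks = {}
    for item in items:
        blocks.setdefault(type(item).__name__, []).append(item)
    return blocks

# Hypothetical per-block algorithms standing in for the CPU/GPU/DSP modules.
ALGOS = {
    "int": lambda block: [x * 2 for x in block],
    "str": lambda block: [s.upper() for s in block],
}

def process_unit(items):
    blocks = split_by_type(items)
    # Step 402: one computing thread per data block, executed in parallel.
    with ThreadPoolExecutor(max_workers=len(blocks)) as pool:
        futures = {k: pool.submit(ALGOS[k], b) for k, b in blocks.items()}
        results = {k: f.result() for k, f in futures.items()}
    # Step 403: merge the per-block results into the unit's output data.
    merged = []
    for block_result in results.values():
        merged.extend(block_result)
    return merged

print(process_unit([1, "a", 2, "b"]))  # [2, 4, 'A', 'B']
```

Each data block gets its own thread, matching the Thread1/Thread2/Thread3 description above; the merge at the end corresponds to step 403.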
  • the processing method based on the heterogeneous computing framework of the embodiment of the present disclosure can divide the input data into blocks and establish computing threads corresponding to the data blocks, thereby further improving the processing efficiency of the hardware and reducing the processing time.
  • FIG. 6 is a schematic flowchart of another processing method based on the heterogeneous computing framework provided by an embodiment of the present disclosure. As shown in FIG. 6, the method further includes the following steps:
  • Step 601 determining multiple subtasks corresponding to the data processing task, and establishing a corresponding subflowchart for each subtask.
  • the data processing tasks are subdivided.
  • the data processing task can be subdivided according to the business type to obtain multiple subtasks, and a corresponding sub-flowchart can then be established for each subtask.
  • the number of heterogeneous processing units in each sub-flowchart, and the number and type of algorithm modules in each heterogeneous processing unit, are not limited in the embodiments of the present disclosure; those skilled in the art can design the sub-flowcharts according to the processing capabilities of the heterogeneous processing units and the complexity of the subtasks.
  • Step 602 simultaneously input data of at least one data input terminal to multiple sub-flowcharts corresponding to multiple sub-tasks, wherein the multiple sub-flowcharts are used to execute heterogeneous calculations in parallel to generate multiple sub-task results.
  • data from at least one data input terminal may be input into the above-mentioned multiple sub-flowcharts at the same time, and then heterogeneous calculations are performed according to each sub-flowchart, thereby generating multiple subtask results.
  • Step 603 providing multiple subtask results to the data output terminal, wherein the data output terminal is used to combine the multiple subtask results.
  • the results of multiple subtasks are merged through the data output terminal to obtain output data.
  • the target task is to collect images and perform face recognition
  • the target task is divided into three subtasks: image noise reduction, image feature extraction, and processing of the collected digital signals. Therefore, a first sub-flowchart is established for image noise reduction, in which the Process is encapsulated with a CPU; a second sub-flowchart is established for image feature extraction, in which the Process is encapsulated with a GPU; and a third sub-flowchart is established for digital signal processing, in which the Process is encapsulated with a DSP.
  • The collected data is input into the above three sub-flowcharts, and heterogeneous calculations are executed in parallel according to the corresponding sub-flowcharts to generate the first subtask result corresponding to the first subtask, the second subtask result corresponding to the second subtask, and the third subtask result corresponding to the third subtask; the data output terminal then combines the three subtask results to obtain the output data.
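Steps 601 to 603 for the face-recognition example can be sketched like this. All names are illustrative, and the subtask bodies are placeholders for the CPU (noise reduction), GPU (feature extraction), and DSP (signal processing) sub-flowcharts:

```python
from concurrent.futures import ThreadPoolExecutor

# Placeholder sub-flowcharts for the three subtasks; the real ones would run
# on Processes encapsulating a CPU, a GPU, and a DSP respectively.
SUBTASKS = {
    "denoised": lambda frame: [max(v - 1, 0) for v in frame],  # image noise reduction
    "features": lambda frame: sum(frame),                      # image feature extraction
    "signal":   lambda frame: len(frame),                      # digital signal processing
}

def run_task(frame):
    # Step 602: feed the same input simultaneously to every sub-flowchart,
    # executing the heterogeneous calculations in parallel.
    with ThreadPoolExecutor(max_workers=len(SUBTASKS)) as pool:
        futures = {name: pool.submit(fn, frame) for name, fn in SUBTASKS.items()}
        # Step 603: the data output terminal combines all subtask results.
        return {name: f.result() for name, f in futures.items()}

print(run_task([3, 0, 5]))  # {'denoised': [2, 0, 4], 'features': 8, 'signal': 3}
```

The difference from the step 401 splitting sketch is that here every sub-flowchart receives the whole input, rather than one block of it.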
  • the processing method based on the heterogeneous computing framework of the embodiments of the present disclosure can divide the target task into multiple subtasks and establish a corresponding sub-flowchart for each subtask, thereby further improving the processing efficiency of the hardware and reducing the processing duration.
  • the custom algorithm node is connected to the data input terminal or to the output terminal of a heterogeneous processing unit, and the custom algorithm node is used to process, according to the custom algorithm, the data provided by the data input terminal or by the output terminal of the heterogeneous processing unit, and to feed the computed result into the next node connected to the data input terminal or to the output terminal of the heterogeneous processing unit.
  • the custom algorithm includes an algorithm written by the user according to the requirement, an algorithm selected by the user from multiple provided algorithms according to the requirement, and the like.
  • FIG. 8 is a schematic diagram of inserting a custom algorithm node provided by an embodiment of the present disclosure. As shown in FIG. 8, a first custom algorithm node is inserted between the data input terminal and the first heterogeneous processing unit, and a second custom algorithm node is inserted between the first heterogeneous processing unit and the second heterogeneous processing unit.
  • After the data is input from the data input terminal, it is processed by the first custom algorithm node and input into the first heterogeneous processing unit; it is then processed by the first heterogeneous processing unit and input into the second custom algorithm node, processed by the second custom algorithm node, and input into the second heterogeneous processing unit, which continues the processing.
  • a position termination node can also be inserted into the heterogeneous computing engine framework, specifically including:
  • a position termination node is inserted into the heterogeneous computing engine framework, where the position termination node is connected to the output terminal of a heterogeneous processing unit; the position termination node is used to stop the processing nodes after the output terminal of the heterogeneous processing unit from continuing to compute, and the data provided by the output terminal of the heterogeneous processing unit is output as the processing result.
  • FIG. 9 is a schematic diagram of inserting a position termination node provided by an embodiment of the present disclosure. As shown in FIG. 9, when the processing result of the second heterogeneous processing unit needs to be output directly, the position termination node may be connected to the output terminal of the second heterogeneous processing unit.
  • When performing heterogeneous computing, after the data is input from the data input terminal, it is processed by the first heterogeneous processing unit and then input to the second heterogeneous processing unit; after processing by the second heterogeneous processing unit, the data output by the second heterogeneous processing unit is output as the processing result.
  • FIG. 10 is a schematic diagram of inserting a custom algorithm node and a position termination node provided by an embodiment of the present disclosure. As shown in FIG. 10, a custom algorithm node is inserted between the data input terminal and the first heterogeneous processing unit, and the processing result of the first heterogeneous processing unit is output directly after the first heterogeneous processing unit.
  • After the data is input from the data input terminal, it is processed by the custom algorithm node and then input to the first heterogeneous processing unit; after the first heterogeneous processing unit performs its processing, the data output by the first heterogeneous processing unit is output as the processing result.
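The behavior of the position termination node in FIG. 9 and FIG. 10 can be modeled as a sentinel that the framework runner checks before invoking each downstream node. The names below are hypothetical stand-ins, not the disclosed framework interface:

```python
# Sketch of a position termination node: when the runner reaches the
# terminator, downstream processing nodes are skipped and the current
# data is returned as the processing result. Illustrative only.

TERMINATE = object()  # sentinel standing in for the position termination node

def run_with_termination(nodes, data):
    for node in nodes:
        if node is TERMINATE:
            break            # stop later processing nodes from computing
        data = node(data)
    return data              # output of the last executed unit is the result

unit1 = lambda x: x * 2
unit2 = lambda x: x + 100    # never runs: the terminator precedes it

result = run_with_termination([unit1, TERMINATE, unit2], 21)
print(result)  # -> 42
```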
  • The processing method based on the heterogeneous computing framework of the embodiments of the present disclosure can insert a custom algorithm node and/or a position termination node into the heterogeneous computing engine framework. Users can thus flexibly insert functions that meet their own or scenario-specific requirements and flexibly determine the output nodes of the computing framework, which improves the scalability of the heterogeneous computing framework and enriches its applicable scenarios.
  • FIG. 11 is a schematic structural diagram of a processing device based on a heterogeneous computing framework provided by an embodiment of the present disclosure.
  • the device can be implemented by software and/or hardware, and can generally be integrated into an electronic device. As shown in FIG. 11, the device includes:
  • the first creation module 1101 is configured to create a corresponding application program interface API for each algorithm module in at least one heterogeneous processing unit, wherein each heterogeneous processing unit is generated by encapsulating at least one algorithm module;
  • the second creation module 1102 is configured to call an interface of at least one heterogeneous processing unit according to a data processing task to create a heterogeneous computing engine framework, wherein the heterogeneous computing engine framework includes at least one data input terminal, at least one heterogeneous processing unit, and at least one data output terminal;
  • the obtaining module 1103 is configured to call the application program interface API of the required algorithm module according to the framework of the heterogeneous computing engine to execute the data processing task, and output the task processing result to the target device.
  • the first creation module 1101 is configured to: create a calling interface and a task running interface corresponding to each algorithm module.
  • the algorithm module includes any one of, or a combination of: a central processing unit module (CPU), a digital signal processing module (DSP), a graphics processor module (GPU), an application-specific integrated circuit module (ASIC), and a field programmable gate array module (FPGA).
  • the device also includes:
  • a segmentation module configured to perform segmentation processing on the input data of the heterogeneous processing unit to generate multiple data blocks;
  • a first establishing module configured to establish, in the heterogeneous processing unit, multiple computing threads corresponding to the multiple data blocks, wherein the multiple computing threads are used to process the multiple data blocks in parallel to generate multiple corresponding data processing results; and
  • a generating module configured to merge the multiple data processing results to generate output data corresponding to the heterogeneous processing unit.
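The split, parallel-compute, and merge flow handled by these three modules can be sketched as follows. The use of `ThreadPoolExecutor`, the block size, and the per-block squaring operation are all illustrative assumptions, since the disclosure does not fix a particular threading mechanism:

```python
# Sketch of the split -> parallel compute -> merge flow: the unit's input is
# divided into blocks, one worker per block computes in parallel, and the
# per-block results are merged into the unit's output. Illustrative only.
from concurrent.futures import ThreadPoolExecutor

def split(data, n_blocks):
    """Divide the input data into at most n_blocks contiguous blocks."""
    size = (len(data) + n_blocks - 1) // n_blocks
    return [data[i:i + size] for i in range(0, len(data), size)]

def process_block(block):          # stand-in for one computing thread's work
    return [x * x for x in block]

def heterogeneous_unit(data, n_blocks=4):
    blocks = split(data, n_blocks)
    with ThreadPoolExecutor(max_workers=n_blocks) as pool:
        results = list(pool.map(process_block, blocks))  # parallel, order kept
    merged = [y for r in results for y in r]             # merge block results
    return merged

print(heterogeneous_unit(list(range(8))))  # -> [0, 1, 4, 9, 16, 25, 36, 49]
```

`Executor.map` returns results in submission order, so the merge step simply concatenates the block outputs to reconstruct the original ordering.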
  • the device also includes:
  • a second establishing module configured to determine multiple subtasks corresponding to the data processing task, and establish a corresponding sub-flowchart for each of the subtasks;
  • an input module configured to simultaneously input the data of the at least one data input terminal to the multiple sub-flowcharts corresponding to the multiple subtasks, wherein the multiple sub-flowcharts are used to perform heterogeneous computations in parallel to generate multiple subtask results; and
  • an output module configured to provide the multiple subtask results to the data output terminal, wherein the data output terminal is used to merge the multiple subtask results.
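The fan-out into parallel sub-flowcharts with a merge at the data output terminal can be sketched like this. The three subtask functions and the merge-into-a-dict step are hypothetical stand-ins for whatever subtasks a real data processing task decomposes into:

```python
# Sketch of the sub-flowchart fan-out: the same input feeds several
# sub-flowcharts that run in parallel, and the data output terminal
# merges the subtask results. Illustrative only.
from concurrent.futures import ThreadPoolExecutor

def subflow_sum(xs):  return ("sum", sum(xs))      # subtask 1
def subflow_max(xs):  return ("max", max(xs))      # subtask 2
def subflow_len(xs):  return ("len", len(xs))      # subtask 3

def run_task(data, subflows):
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(f, data) for f in subflows]   # same input to all
        results = dict(f.result() for f in futures)          # merge at output
    return results

print(run_task([4, 7, 1], [subflow_sum, subflow_max, subflow_len]))
# -> {'sum': 12, 'max': 7, 'len': 3}
```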
  • the device also includes:
  • a first insertion module configured to insert a custom algorithm node into the heterogeneous computing engine framework, wherein the custom algorithm node is connected to the data input terminal or to the output terminal of the heterogeneous processing unit, and is used to compute, according to a custom algorithm, the data provided by the data input terminal or by the output terminal of the heterogeneous processing unit, and to feed the computed result to the next node connected to that terminal.
  • the device also includes:
  • a second insertion module configured to insert a position termination node into the heterogeneous computing engine framework, wherein the position termination node is connected to the output terminal of the heterogeneous processing unit, and the position termination node is used to stop the processing nodes after that output terminal from continuing to compute, and to output the data provided by the output terminal of the heterogeneous processing unit as the processing result.
  • the processing device based on the heterogeneous computing framework provided by the embodiments of the present disclosure can execute the processing method based on the heterogeneous computing framework provided by any embodiment of the present disclosure, and has corresponding functional modules and beneficial effects for executing the method.
  • the present disclosure further proposes a computer program product, including computer programs/instructions, and when the computer program/instructions are executed by a processor, the processing method based on the heterogeneous computing framework in the above embodiments is implemented.
  • the present disclosure also proposes a computer program, including instructions, and when the instructions are executed by a processor, the processing method based on the heterogeneous computing framework in the above embodiments is implemented.
  • FIG. 12 is a schematic structural diagram of an electronic device 1200 suitable for implementing an embodiment of the present disclosure.
  • the electronic device 1200 in the embodiments of the present disclosure may include, but is not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), and vehicle-mounted terminals (such as car navigation terminals), as well as stationary terminals such as digital TVs and desktop computers.
  • the electronic device shown in FIG. 12 is only an example, and should not limit the functions and application scope of the embodiments of the present disclosure.
  • an electronic device 1200 may include a processing device (such as a central processing unit, a graphics processing unit, etc.) 1201, which may execute various appropriate actions and processes according to a program stored in a read-only memory (ROM) 1202 or a program loaded from a storage device 1208 into a random access memory (RAM) 1203. The RAM 1203 also stores various programs and data necessary for the operation of the electronic device 1200.
  • the processing device 1201, ROM 1202, and RAM 1203 are connected to each other through a bus 1204.
  • An input/output (I/O) interface 1205 is also connected to the bus 1204.
  • the following devices can be connected to the I/O interface 1205: an input device 1206 including, for example, a touch screen, a touchpad, a keyboard, a mouse, a camera, a microphone, an accelerometer, a gyroscope, etc.; an output device 1207 including, for example, a liquid crystal display (LCD), a speaker, a vibrator, etc.; a storage device 1208 including, for example, a magnetic tape, a hard disk, etc.; and a communication device 1209.
  • the communication means 1209 may allow the electronic device 1200 to perform wireless or wired communication with other devices to exchange data. While FIG. 12 shows electronic device 1200 having various means, it is to be understood that implementing or having all of the means shown is not a requirement. More or fewer means may alternatively be implemented or provided.
  • embodiments of the present disclosure include a computer program product, which includes a computer program carried on a non-transitory computer readable medium, where the computer program includes program code for executing the method shown in the flowchart.
  • the computer program may be downloaded and installed from a network via communication means 1209, or from storage means 1208, or from ROM 1202.
  • When the computer program is executed by the processing device 1201, the above-mentioned functions defined in the processing method based on the heterogeneous computing framework of the embodiments of the present disclosure are executed.
  • the above-mentioned computer-readable medium in the present disclosure may be a computer-readable signal medium or a computer-readable storage medium or any combination of the above two.
  • a computer readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of computer-readable storage media may include, but are not limited to, electrical connections with one or more wires, portable computer diskettes, hard disks, random access memory (RAM), read-only memory (ROM), erasable Programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), optical storage device, magnetic storage device, or any suitable combination of the above.
  • a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device.
  • a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave carrying computer-readable program code therein. Such propagated data signals may take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • a computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, which can send, propagate, or transmit a program for use by or in conjunction with an instruction execution system, apparatus, or device .
  • Program code embodied on a computer readable medium may be transmitted by any appropriate medium, including but not limited to: wires, optical cables, RF (radio frequency), etc., or any suitable combination of the above.
  • the client and the server can communicate using any currently known or future developed network protocol such as HTTP (HyperText Transfer Protocol), and can be interconnected with digital data communication in any form or medium (e.g., a communication network).
  • Examples of communication networks include local area networks ("LANs"), wide area networks ("WANs"), internetworks (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed network.
  • the above-mentioned computer-readable medium may be included in the above-mentioned electronic device, or may exist independently without being incorporated into the electronic device.
  • the above-mentioned computer-readable medium carries one or more programs, and when the one or more programs are executed by the electronic device, the electronic device is caused to: create a corresponding application program interface (API) for each algorithm module in at least one heterogeneous processing unit, wherein each heterogeneous processing unit is generated by encapsulating at least one algorithm module; call an interface of at least one heterogeneous processing unit according to a data processing task to create a heterogeneous computing engine framework, wherein the heterogeneous computing engine framework includes at least one data input terminal, at least one heterogeneous processing unit, and at least one data output terminal; and call the application program interface (API) of the required algorithm module according to the heterogeneous computing engine framework to execute the data processing task and output the task processing result to the target device.
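The overall claimed flow (wrap each algorithm module behind a uniform API, encapsulate modules into heterogeneous processing units, assemble the units into an engine framework, then execute the task) can be sketched end to end as follows. All class and method names are illustrative assumptions, not the actual framework interface:

```python
# End-to-end sketch of the claimed flow. Each algorithm module gets a
# uniform API, units encapsulate modules, and the engine framework routes
# data from the input terminal through the units to the output terminal.

class AlgorithmModule:
    """Uniform per-module API (standing in for the 'calling' and
    'task running' interfaces created for each algorithm module)."""
    def __init__(self, name, fn):
        self.name, self.fn = name, fn
    def run_task(self, data):
        return self.fn(data)

class HeterogeneousUnit:
    """Generated by encapsulating one or more algorithm modules."""
    def __init__(self, *modules):
        self.modules = modules
    def process(self, data):
        for m in self.modules:          # call each module's API in turn
            data = m.run_task(data)
        return data

class EngineFramework:
    """Data input terminal -> heterogeneous units -> data output terminal."""
    def __init__(self, units):
        self.units = units
    def execute(self, data):
        for u in self.units:
            data = u.process(data)
        return data                     # task processing result

cpu = AlgorithmModule("cpu_scale", lambda xs: [x * 10 for x in xs])
gpu = AlgorithmModule("gpu_offset", lambda xs: [x + 1 for x in xs])
framework = EngineFramework([HeterogeneousUnit(cpu), HeterogeneousUnit(gpu)])
print(framework.execute([1, 2, 3]))  # -> [11, 21, 31]
```

Because the upper layer only sees `EngineFramework.execute`, swapping or re-ordering units changes the framework, not the callers, which mirrors the extensibility argument below.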
  • In this way, a heterogeneous computing engine framework is established through heterogeneous processing units according to a data processing task, and the data processing task is completed through the heterogeneous computing engine framework.
  • When the data processing task changes, only the heterogeneous computing engine framework needs to be adjusted accordingly, so the solution is easy to extend and highly flexible, and can improve processing efficiency for large-scale algorithm systems. The heterogeneous computing engine framework integrates the characteristics of the different algorithm modules in the heterogeneous processing units, so as to exploit the computing power of the heterogeneous processing units, improve computing efficiency, reduce computing time, and improve the performance of the heterogeneous processing units, thereby meeting current business demands for computing power.
  • In addition, hierarchical architecture design, modular design, algorithm cascading, and intelligent scheduling of hybrid computing are realized through the heterogeneous computing engine framework, thereby improving the efficiency of framework design.
  • Computer program code for carrying out operations of the present disclosure may be written in one or more programming languages, or combinations thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages.
  • the program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server.
  • the remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or can be connected to an external computer (for example, through the Internet using an Internet service provider).
  • each block in a flowchart or block diagram may represent a module, program segment, or portion of code that contains one or more executable instructions for implementing the specified logical functions.
  • the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or they may sometimes be executed in the reverse order, depending upon the functionality involved.
  • each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
  • the units involved in the embodiments described in the present disclosure may be implemented by software or by hardware. In some cases, the name of a unit does not constitute a limitation on the unit itself.
  • For example, without limitation, exemplary types of hardware logic components that may be used include: Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.
  • a machine-readable medium may be a tangible medium that may contain or store a program for use by or in conjunction with an instruction execution system, apparatus, or device.
  • a machine-readable medium can be a machine-readable signal medium or a machine-readable storage medium.
  • a machine-readable medium may include, but is not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatus, or devices, or any suitable combination of the foregoing.
  • machine-readable storage media would include one or more wire-based electrical connections, portable computer discs, hard drives, random access memory (RAM), read only memory (ROM), erasable programmable read only memory (EPROM or flash memory), optical fiber, compact disk read only memory (CD-ROM), optical storage, magnetic storage, or any suitable combination of the foregoing.


Abstract

The embodiments of the present disclosure relate to a heterogeneous computing framework-based processing method and apparatus, and a device and a medium. The method comprises: creating a corresponding application program interface (API) for each algorithm module in at least one heterogeneous processing unit, wherein the heterogeneous processing unit is generated by encapsulating at least one algorithm module; according to a data processing task, calling an interface of the at least one heterogeneous processing unit to create a heterogeneous computing engine framework; and according to the heterogeneous computing engine framework, calling an API of a required algorithm module to execute the data processing task, and then outputting a task processing result to a target device.

Description

Processing Method, Apparatus, Device and Medium Based on Heterogeneous Computing Framework
Cross-Reference to Related Applications
This application is based on, and claims priority to, the Chinese application with application number 202111629485.7 filed on December 28, 2021, the disclosure of which is hereby incorporated into this application in its entirety.
Technical Field
The present disclosure relates to the field of computer technology, and in particular to a processing method, apparatus, device and medium based on a heterogeneous computing framework.
Background
With the diversification and increasing complexity of business requirements, the demand for logical computing power in computing systems keeps growing. However, in current computing systems, upper-layer business requirements are implemented directly on the algorithm modules, such as the CPU algorithm module and the GPU algorithm module. If the upper-layer business needs to extend or change an algorithm implementation, the implementation between the algorithm module and the upper layer must also be modified synchronously.
Summary
The present disclosure provides a processing method, apparatus, device and medium based on a heterogeneous computing framework.
In a first aspect, an embodiment of the present disclosure provides a processing method based on a heterogeneous computing framework, the method including:
creating a corresponding application program interface (API) for each algorithm module in at least one heterogeneous processing unit, wherein each heterogeneous processing unit is generated by encapsulating at least one algorithm module;
calling an interface of at least one heterogeneous processing unit according to a data processing task to create a heterogeneous computing engine framework, wherein the heterogeneous computing engine framework includes at least one data input terminal, at least one heterogeneous processing unit, and at least one data output terminal; and
calling the application program interface (API) of the required algorithm module according to the heterogeneous computing engine framework to execute the data processing task, and outputting the task processing result to a target device.
In some embodiments, creating a corresponding application program interface (API) for each algorithm module in at least one heterogeneous processing unit includes:
creating a calling interface and a task running interface corresponding to each algorithm module.
In some embodiments, the algorithm module includes any one of, or a combination of: a central processing unit module (CPU), a digital signal processing module (DSP), a graphics processor module (GPU), an application-specific integrated circuit module (ASIC), and a field programmable gate array module (FPGA).
In some embodiments, the method further includes:
performing segmentation processing on the input data of the heterogeneous processing unit to generate multiple data blocks;
establishing, in the heterogeneous processing unit, multiple computing threads corresponding to the multiple data blocks, wherein the multiple computing threads are used to process the multiple data blocks in parallel to generate multiple corresponding data processing results; and
merging the multiple data processing results to generate output data corresponding to the heterogeneous processing unit.
In some embodiments, the method further includes:
determining multiple subtasks corresponding to the data processing task, and establishing a corresponding sub-flowchart for each of the subtasks;
simultaneously inputting the data of the at least one data input terminal to the multiple sub-flowcharts corresponding to the multiple subtasks, wherein the multiple sub-flowcharts are used to perform heterogeneous computations in parallel to generate multiple subtask results; and
providing the multiple subtask results to the data output terminal, wherein the data output terminal is used to merge the multiple subtask results.
In some embodiments, the method further includes:
inserting a custom algorithm node into the heterogeneous computing engine framework, wherein the custom algorithm node is connected to the data input terminal or to the output terminal of the heterogeneous processing unit, and the custom algorithm node is used to compute, according to a custom algorithm, the data provided by the data input terminal or by the output terminal of the heterogeneous processing unit, and to feed the computed result to the next node connected to the data input terminal or to the output terminal of the heterogeneous processing unit.
In some embodiments, the method further includes:
inserting a position termination node into the heterogeneous computing engine framework, wherein the position termination node is connected to the output terminal of the heterogeneous processing unit, and the position termination node is used to stop the processing nodes after the output terminal of the heterogeneous processing unit from continuing to compute, and to output the data provided by the output terminal of the heterogeneous processing unit as the processing result.
In a second aspect, an embodiment of the present disclosure further provides a processing apparatus based on a heterogeneous computing framework, the apparatus including:
a first creation module, configured to create a corresponding application program interface (API) for each algorithm module in at least one heterogeneous processing unit, wherein each heterogeneous processing unit is generated by encapsulating at least one algorithm module;
a second creation module, configured to call an interface of at least one heterogeneous processing unit according to a data processing task to create a heterogeneous computing engine framework, wherein the heterogeneous computing engine framework includes at least one data input terminal, at least one heterogeneous processing unit, and at least one data output terminal; and
an acquisition module, configured to call the application program interface (API) of the required algorithm module according to the heterogeneous computing engine framework to execute the data processing task, and output the task processing result to a target device.
In a third aspect, the present disclosure provides a computer-readable storage medium storing computer programs/instructions which, when executed by a processor, cause the processor to implement the above method.
In a fourth aspect, the present disclosure provides an electronic device, including: a processor; and a memory for storing instructions executable by the processor; the processor being configured to read the instructions from the memory and execute them to implement the above method.
In a fifth aspect, the present disclosure provides a computer program product, the computer program product including computer programs/instructions which, when executed by a processor, implement the above method.
In a sixth aspect, the present disclosure provides a computer program, the computer program including instructions which, when executed by a processor, implement the above method.
Brief Description of the Drawings
The above and other features, advantages and aspects of the embodiments of the present disclosure will become more apparent with reference to the following detailed description taken in conjunction with the accompanying drawings. Throughout the drawings, the same or similar reference numerals denote the same or similar elements. It should be understood that the drawings are schematic, and components and elements are not necessarily drawn to scale.
FIG. 1 is a schematic flowchart of a processing method based on a heterogeneous computing framework provided by an embodiment of the present disclosure;
FIG. 2 is a schematic diagram of a heterogeneous processing unit provided by an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of a heterogeneous computing engine framework provided by an embodiment of the present disclosure;
FIG. 4 is a schematic flowchart of another processing method based on a heterogeneous computing framework provided by an embodiment of the present disclosure;
FIG. 5 is a schematic diagram of another heterogeneous computing engine framework provided by an embodiment of the present disclosure;
FIG. 6 is a schematic flowchart of yet another processing method based on a heterogeneous computing framework provided by an embodiment of the present disclosure;
FIG. 7 is a schematic diagram of yet another heterogeneous computing engine framework provided by an embodiment of the present disclosure;
FIG. 8 is a schematic diagram of inserting a custom algorithm node provided by an embodiment of the present disclosure;
FIG. 9 is a schematic diagram of inserting a position termination node provided by an embodiment of the present disclosure;
FIG. 10 is a schematic diagram of inserting a custom algorithm node and a position termination node provided by an embodiment of the present disclosure;
FIG. 11 is a schematic structural diagram of a processing apparatus based on a heterogeneous computing framework provided by an embodiment of the present disclosure; and
FIG. 12 is a schematic structural diagram of an electronic device provided by an embodiment of the present disclosure.
具体实施方式Detailed ways
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. Although certain embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that the present disclosure will be understood more thoroughly and completely. It should be understood that the drawings and embodiments of the present disclosure are for exemplary purposes only and are not intended to limit the protection scope of the present disclosure.
It should be understood that the steps described in the method implementations of the present disclosure may be performed in different orders and/or in parallel. In addition, the method implementations may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this respect.
As used herein, the term "comprise" and its variants are open-ended, that is, "including but not limited to". The term "based on" means "based at least in part on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; and the term "some embodiments" means "at least some embodiments". Relevant definitions of other terms will be given in the description below.
It should be noted that concepts such as "first" and "second" mentioned in the present disclosure are only used to distinguish different apparatuses, modules, or units, and are not intended to limit the order of the functions performed by these apparatuses, modules, or units, or their interdependence.
It should be noted that the modifiers "a/an" and "a plurality of" mentioned in the present disclosure are illustrative rather than restrictive; those skilled in the art should understand that, unless the context clearly indicates otherwise, they should be understood as "one or more".
The names of messages or information exchanged between multiple apparatuses in the embodiments of the present disclosure are for illustrative purposes only and are not intended to limit the scope of such messages or information.
Current computing systems have low computing efficiency, long computation times, poor scalability, and limited flexibility, and therefore cannot meet the computing power requirements of present-day services.
Compared with the related art, the technical solutions provided by the embodiments of the present disclosure have the following advantages:
To sum up, according to the processing method based on a heterogeneous computing framework of the embodiments of the present disclosure, a corresponding application programming interface (API) is created for each algorithm module in at least one heterogeneous processing unit, where each heterogeneous processing unit is generated by encapsulating at least one algorithm module; an interface of at least one heterogeneous processing unit is called according to a data processing task to create a heterogeneous computing engine framework, where the heterogeneous computing engine framework includes at least one data input terminal, the at least one heterogeneous processing unit, and at least one data output terminal; and the API of the required algorithm module is called according to the heterogeneous computing engine framework to execute the data processing task, and a task processing result is output to a target device. In the embodiments of the present disclosure, the heterogeneous computing engine framework is established through heterogeneous processing units according to the data processing task, and the data processing task is interfaced through the heterogeneous computing engine framework. If the algorithm implementation needs to be extended or changed because, for example, the data processing task changes, the corresponding changes can be accommodated by adjusting the heterogeneous computing engine framework; the framework is therefore easy to extend and highly flexible, and can improve the processing efficiency of large-scale algorithm systems. Moreover, the heterogeneous computing engine framework integrates the characteristics of the different algorithm modules in the heterogeneous processing units, so that the computing power advantages of the heterogeneous processing units can be brought into play, the computing efficiency is improved, the computation time is reduced, and the performance of the heterogeneous processing units is improved, thereby meeting the computing power requirements of present-day services. At the same time, the heterogeneous computing engine framework implements a layered architecture design, a modular design, cascading of algorithms, and intelligent scheduling of hybrid computing, thereby improving the efficiency of framework design.
To solve the above problems, an embodiment of the present disclosure provides a processing method based on a heterogeneous computing framework, which is described below with reference to specific embodiments.
FIG. 1 is a schematic flowchart of a processing method based on a heterogeneous computing framework provided by an embodiment of the present disclosure. The method may be executed by a processing apparatus based on a heterogeneous computing framework, where the apparatus may be implemented by software and/or hardware and may generally be integrated in an electronic device. As shown in FIG. 1, the method includes the following steps.
Step 101: create a corresponding application programming interface (API) for each algorithm module in at least one heterogeneous processing unit, where each heterogeneous processing unit is generated by encapsulating at least one algorithm module.
In the embodiments of the present disclosure, the data processing task is implemented based on heterogeneous processing units, and the number of heterogeneous processing units implementing the data processing task may be one or more. A heterogeneous processing unit is an abstract execution layer for the data processing task, and at least one algorithm module is encapsulated in the heterogeneous processing unit. If a plurality of algorithm modules are encapsulated in one heterogeneous processing unit, the plurality of algorithm modules may be of the same type or of different types. The differences between different types of algorithm modules include, but are not limited to, one or more of different process architectures, different instruction sets, and different functions.
For example, the algorithm module includes: a central processing unit (CPU) module, a digital signal processing (DSP) module, a graphics processing unit (GPU) module, an application-specific integrated circuit (ASIC) module, and a field-programmable gate array (FPGA) module. Any one of the above algorithm modules, or any plurality of them, may be encapsulated in one heterogeneous processing unit. For example, a CPU, a GPU, and a DSP may be encapsulated in one heterogeneous processing unit.
In the embodiments of the present disclosure, a corresponding application programming interface (API) needs to be created in advance for each algorithm module. The API includes a call interface (backend interface) capable of invoking the corresponding algorithm module and a task running interface (runtime interface) capable of controlling the running of the corresponding algorithm module. By calling the API, the algorithm module can run normally, so that the data processing task can be implemented on this basis.
FIG. 2 is a schematic diagram of a heterogeneous processing unit provided by an embodiment of the present disclosure. Referring to FIG. 2, Backend denotes the call interface, Runtime denotes the task running interface, and Process denotes the heterogeneous processing unit. It should be noted that, in the subsequent embodiments and drawings, for simplicity of expression, the heterogeneous processing units involved may also be abbreviated as Process. Assume that an image recognition function is implemented through one Process in which a GPU and a CPU are encapsulated. Before heterogeneous computing is performed, a GPU Backend interface and a GPU Runtime interface corresponding to the GPU, and a CPU Backend interface and a CPU Runtime interface corresponding to the CPU, need to be created. The GPU Backend interface and the CPU Backend interface are included in the Backend of FIG. 2; the GPU Runtime interface and the CPU Runtime interface are included in the Runtime of FIG. 2.
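For ease of understanding, the Backend/Runtime/Process structure described above can be sketched as follows. This is a minimal illustration only; the class names, method names, and stand-in computations are hypothetical and do not form part of the disclosed implementation.

```python
class Backend:
    """Call interface: invokes the corresponding algorithm module (hypothetical sketch)."""
    def __init__(self, name, compute_fn):
        self.name = name
        self._compute_fn = compute_fn  # stand-in for a CPU/GPU kernel

    def invoke(self, data):
        return self._compute_fn(data)


class Runtime:
    """Task running interface: controls whether the corresponding module is running."""
    def __init__(self):
        self.running = False

    def start(self):
        self.running = True

    def stop(self):
        self.running = False


class Process:
    """Heterogeneous processing unit: one Backend/Runtime pair per encapsulated module."""
    def __init__(self):
        self.modules = {}

    def register(self, name, compute_fn):
        # Create the API (backend + runtime) for one algorithm module.
        self.modules[name] = (Backend(name, compute_fn), Runtime())

    def run(self, name, data):
        backend, runtime = self.modules[name]
        runtime.start()
        try:
            return backend.invoke(data)
        finally:
            runtime.stop()


# Analogous to the FIG. 2 example: one Process encapsulating a CPU and a GPU module.
proc = Process()
proc.register("CPU", lambda d: [x + 1 for x in d])   # stand-in CPU computation
proc.register("GPU", lambda d: [x * 2 for x in d])   # stand-in GPU computation
result = proc.run("GPU", [1, 2, 3])
```

In this sketch, `register` plays the role of Step 101 (creating the API for each algorithm module), and `run` uses the backend and runtime interfaces together.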
Step 102: call an interface of at least one heterogeneous processing unit according to the data processing task to create a heterogeneous computing engine framework, where the heterogeneous computing engine framework includes at least one data input terminal, at least one heterogeneous processing unit, and at least one data output terminal.
Understandably, different algorithm modules have corresponding task types that they process efficiently; for example, a GPU is efficient at processing image-type tasks, and a DSP is efficient at processing digital-signal-type tasks. Therefore, in order to release the computing power of each algorithm module, the interface of at least one heterogeneous processing unit can be called according to the data processing task and the characteristics of each algorithm module to create the heterogeneous computing engine framework.
It should be noted that, in the embodiments of the present disclosure, the heterogeneous computing engine framework includes one or more data input terminals for inputting data, one or more data output terminals for outputting data, and at least one heterogeneous processing unit, in which one or more types of algorithm modules may be encapsulated. For example, the heterogeneous computing engine framework may include one heterogeneous processing unit in which two algorithm modules of the same type are encapsulated; alternatively, the heterogeneous computing engine framework may include two heterogeneous processing units, each encapsulating one algorithm module, with the two algorithm modules being of different types.
The connection relationships among the data input terminals, the heterogeneous processing units, and the data output terminals in the heterogeneous computing engine framework can be matched and set according to the data processing task and the like; therefore, when the data processing task changes and the algorithm implementation needs to be extended or changed, the data processing task can be implemented by adjusting the heterogeneous computing engine framework. For example, cascaded processing by multiple algorithm modules can be implemented through the connection relationships in the heterogeneous computing engine framework, and reuse of algorithm modules (e.g., reuse of related algorithms in image pre-processing and image post-processing) can also be implemented through these connection relationships. In addition, the number of heterogeneous processing units in the heterogeneous computing engine framework can be determined according to the performance of a single heterogeneous processing unit, device compatibility, the characteristics of the algorithms executed by the heterogeneous processing unit, and the like. The embodiments of the present disclosure do not limit the above heterogeneous computing engine framework.
FIG. 3 is a schematic diagram of a heterogeneous computing engine framework provided by an embodiment of the present disclosure. For a clearer description, referring to FIG. 3, the heterogeneous computing engine framework includes one data input terminal, one data output terminal, and seven Processes, where a CPU is encapsulated in some of the Processes and a GPU is encapsulated in the others. If image processing is performed based on this heterogeneous computing engine framework, since the image pre-processing part (Process2 and Process3) and the image post-processing part (Process5 and Process6) of the image processing procedure can be reused, Process2 may be the same as Process5, and Process3 may be the same as Process6. Specifically, the data transmission directions in the heterogeneous computing engine framework are: from the data input terminal to Process1; from Process1 to Process2 and Process4; from Process2 to Process3; from Process3 and Process1 to Process4; from Process4 to Process5 and Process7; from Process5 to Process6; from Process6 and Process4 to Process7; and from Process7 to the data output terminal.
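The connection relationships of FIG. 3 amount to a directed acyclic graph that can be executed in topological order. The following sketch (the graph representation and the stand-in per-node computation are illustrative assumptions, not the disclosed implementation) builds the FIG. 3 topology and propagates data through it:

```python
from graphlib import TopologicalSorter

# Edges of the FIG. 3 framework: predecessor -> successors.
edges = {
    "input":    ["Process1"],
    "Process1": ["Process2", "Process4"],
    "Process2": ["Process3"],
    "Process3": ["Process4"],
    "Process4": ["Process5", "Process7"],
    "Process5": ["Process6"],
    "Process6": ["Process7"],
    "Process7": ["output"],
}

# TopologicalSorter expects node -> predecessors, so invert the edge map.
preds = {}
for src, dsts in edges.items():
    for dst in dsts:
        preds.setdefault(dst, set()).add(src)

order = list(TopologicalSorter(preds).static_order())

# Execute the framework: each node sums its predecessors' outputs and adds 1
# (a stand-in for the actual Process computation).
values = {}
for node in order:
    if node == "input":
        values[node] = 1
    else:
        values[node] = sum(values[p] for p in preds[node]) + 1
```

Because the data input terminal is the only source and the data output terminal the only sink of this graph, any valid topological order begins with `input` and ends with `output`, which is how the framework guarantees that every Process receives all of its upstream results before running.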
Step 103: call the API of the required algorithm module according to the heterogeneous computing engine framework to execute the data processing task, and output the task processing result to the target device.
After the heterogeneous computing engine framework is determined through the above steps, the service processing result of the data processing task can be obtained based on the heterogeneous computing engine framework. Specifically, the API corresponding to the required algorithm module needs to be called according to the heterogeneous computing engine framework to execute the data processing task, so that the running of the algorithm modules is scheduled in an orderly manner, and the task processing result is then output to the target device. The target device can be selected according to the application scenario, which is not limited in the embodiments of the present disclosure; examples include a mobile phone and a microcomputer.
Taking a data processing task implemented through one Process as an example, a CPU and a GPU are encapsulated in the Process. If the CPU needs to be used according to the heterogeneous computing engine framework, the CPU Backend interface is called to start the CPU, and the CPU Runtime interface is called to control the running of the CPU. Similarly, if the GPU needs to be used and run according to the heterogeneous computing engine framework, the GPU Backend interface and the GPU Runtime interface are called. It should be noted that the specific call order of the APIs is determined according to the call order of the corresponding algorithm modules in the heterogeneous computing engine framework, which is not repeated here.
To sum up, according to the processing method based on a heterogeneous computing framework of the embodiments of the present disclosure, a corresponding API is created for each algorithm module in at least one heterogeneous processing unit, where each heterogeneous processing unit is generated by encapsulating at least one algorithm module; an interface of at least one heterogeneous processing unit is called according to the data processing task to create a heterogeneous computing engine framework, where the heterogeneous computing engine framework includes at least one data input terminal, at least one heterogeneous processing unit, and at least one data output terminal; and the API of the required algorithm module is called according to the heterogeneous computing engine framework to execute the data processing task, and the task processing result is output to the target device.
In the related art, corresponding algorithms are run directly on algorithm modules such as a CPU and a GPU according to the data processing task, and the data processing task is interfaced by the specific algorithm implementation. If the algorithm implementation needs to be extended or changed, the interface with the data processing task also has to be modified synchronously; such an approach is therefore not suitable for extension, is inflexible, and has very low processing efficiency for large-scale algorithm systems.
In the embodiments of the present disclosure, the heterogeneous computing engine framework is established through heterogeneous processing units according to the data processing task, and the data processing task is interfaced through the heterogeneous computing engine framework. If the algorithm implementation needs to be extended or changed because, for example, the data processing task changes, the API between the heterogeneous computing engine framework and the data processing task does not need to be modified synchronously; the framework is therefore easy to extend and highly flexible, and can improve the processing efficiency of large-scale algorithm systems. Moreover, the heterogeneous computing engine framework integrates the characteristics of the different algorithm modules in the heterogeneous processing units, so that the computing power advantages of the heterogeneous processing units can be brought into play, the computing efficiency is improved, the computation time is reduced, and the performance of the heterogeneous processing units is improved, thereby meeting the computing power requirements of present-day services.
At the same time, the heterogeneous computing engine framework implements a layered architecture design, a modular design, cascading of algorithms, and intelligent scheduling of hybrid computing, thereby improving the efficiency of framework design.
Based on the above embodiments, in order to further improve the computing capability of the heterogeneous computing framework, a method of partitioning data into blocks and executing multiple Backends in parallel can be adopted, which is described as follows. FIG. 4 is a schematic flowchart of another processing method based on a heterogeneous computing framework provided by an embodiment of the present disclosure. As shown in FIG. 4, the method further includes the following steps.
Step 401: split the input data of the heterogeneous processing unit to generate a plurality of data blocks.
In the embodiments of the present disclosure, in order to enable each algorithm module to run the service it is suited to, thereby improving the processing efficiency of the heterogeneous processing unit on the input data, the input data is split. In some embodiments, the input data may be split according to the data type of the input data to obtain corresponding data blocks, and the data in the same data block obtained by the splitting may belong to the same data type.
Step 402: establish, in the heterogeneous processing unit, a plurality of computing threads corresponding to the plurality of data blocks, where the plurality of computing threads are used to process the plurality of data blocks in parallel to generate a corresponding plurality of data processing results.
In the embodiments of the present disclosure, a plurality of computing threads corresponding to the data blocks can be established, so that the APIs (for example, the Backend interface and the Runtime interface) are called based on these threads, and the data blocks are processed based on these threads, generating a corresponding data processing result for each data block.
Step 403: merge the plurality of data processing results to generate output data corresponding to the heterogeneous processing unit.
After the plurality of data processing results are obtained, the plurality of data processing results can be merged to generate the output data corresponding to the heterogeneous processing unit.
For example, as shown in FIG. 5, the input data is split: the data related to graphics processing is split into a second data block, the data related to digital signal processing is split into a third data block, and the other types of data are split into a first data block. A first computing thread Thread1 for processing the first data block is established; based on Thread1, a first call interface Backend1 and a first task running interface Runtime1 are called, so that the CPU runs to process the first data block and generates a first data processing result. A second computing thread Thread2 for processing the second data block is established; based on Thread2, a second call interface Backend2 and a second task running interface Runtime2 are called, so that the GPU runs to process the second data block and generates a second data processing result. A third computing thread Thread3 for processing the third data block is established; based on Thread3, a third call interface Backend3 and a third task running interface Runtime3 are called, so that the DSP runs to process the third data block and generates a third data processing result. It should be noted that Thread1, Thread2, and Thread3 may be executed in parallel in the Process, thereby improving the efficiency of input data processing. Further, the first data processing result, the second data processing result, and the third data processing result are merged to obtain the output data.
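The split / parallel-execute / merge flow of Steps 401 to 403 can be sketched as follows. This is a minimal illustration; the type tags and the per-backend computations are hypothetical stand-ins for the CPU, GPU, and DSP modules of FIG. 5.

```python
from concurrent.futures import ThreadPoolExecutor

# Step 401: split the input by data type into blocks (FIG. 5 uses three types).
input_data = [("cpu", 1), ("gpu", 2), ("dsp", 3), ("cpu", 4), ("gpu", 5)]
blocks = {"cpu": [], "gpu": [], "dsp": []}
for dtype, value in input_data:
    blocks[dtype].append(value)

# Hypothetical per-module backends, standing in for Backend1/Backend2/Backend3.
backends = {
    "cpu": lambda vals: [v + 1 for v in vals],
    "gpu": lambda vals: [v * 2 for v in vals],
    "dsp": lambda vals: [v * v for v in vals],
}

# Step 402: one computing thread per data block, executed in parallel.
with ThreadPoolExecutor(max_workers=len(blocks)) as pool:
    futures = {dtype: pool.submit(backends[dtype], vals)
               for dtype, vals in blocks.items()}
    results = {dtype: f.result() for dtype, f in futures.items()}

# Step 403: merge the per-block results into the unit's output data.
output_data = results["cpu"] + results["gpu"] + results["dsp"]
```

Each `pool.submit` call corresponds to establishing one of Thread1/Thread2/Thread3, and the final concatenation corresponds to the merging of the three data processing results.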
To sum up, the processing method based on a heterogeneous computing framework of the embodiments of the present disclosure can process the input data in blocks and establish computing threads corresponding to the data blocks, thereby further improving the processing efficiency of the hardware and reducing the processing time.
In some embodiments, a processing method of higher granularity than the previous implementation can be adopted, namely a multi-task, multi-Backend parallel method, which is described as follows. FIG. 6 is a schematic flowchart of yet another processing method based on a heterogeneous computing framework provided by an embodiment of the present disclosure. As shown in FIG. 6, the method further includes the following steps.
Step 601: determine a plurality of subtasks corresponding to the data processing task, and establish a corresponding sub-flowchart for each subtask.
In the embodiments of the present disclosure, in order to improve the processing efficiency of the heterogeneous processing units on the input data, the data processing task is subdivided. In some embodiments, the data processing task can be subdivided according to the service type. After the subdivision, a plurality of subtasks can be obtained, and a corresponding sub-flowchart is then established for each subtask. The number of heterogeneous processing units in the sub-flowchart, as well as the number and types of algorithm modules in each heterogeneous processing unit, are not limited in the embodiments of the present disclosure; those skilled in the art can design the sub-flowchart according to the processing capabilities of the heterogeneous processing units and the complexity of the subtask.
Step 602: simultaneously input the data of the at least one data input terminal into a plurality of sub-flowcharts corresponding to the plurality of subtasks, where the plurality of sub-flowcharts are used to execute heterogeneous computing in parallel to generate a plurality of subtask results.
In the embodiments of the present disclosure, the data of the at least one data input terminal can be simultaneously input into the above plurality of sub-flowcharts, and heterogeneous computing is then executed according to each sub-flowchart, thereby generating a plurality of subtask results.
Step 603: provide all of the plurality of subtask results to the data output terminal, where the data output terminal is used to merge the plurality of subtask results.
In the embodiments of the present disclosure, the plurality of subtask results are merged through the data output terminal, so as to obtain the output data.
For example, referring to FIG. 7, assume that the target task is to capture an image and perform face recognition. The target task is divided into three subtasks: image noise reduction, image feature extraction, and processing of the captured digital signal. Accordingly, a first sub-flowchart is established for image noise reduction, where a CPU is encapsulated in the Process of the first sub-flowchart; a second sub-flowchart is established for image feature extraction, where a GPU is encapsulated in the Process of the second sub-flowchart; and a third sub-flowchart is established for digital signal processing, where a DSP is encapsulated in the Process of the third sub-flowchart. The captured data is input into the above three sub-flowcharts, and heterogeneous computing is executed in parallel according to the corresponding sub-flowcharts, thereby generating a first subtask result corresponding to the first subtask, a second subtask result corresponding to the second subtask, and a third subtask result corresponding to the third subtask. The data output terminal then merges the above three subtask results to obtain the output data.
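The FIG. 7 arrangement can be illustrated with the following sketch. Each sub-flowchart is stood in for by a list of stage functions (hypothetical placeholders for its Processes); the same captured data is fed to all sub-flowcharts in parallel, and the results are merged at the output terminal.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sub-flowcharts for the FIG. 7 example: each is a sequence of
# stages (stand-ins for the Processes of that sub-flowchart).
sub_flowcharts = {
    "noise_reduction":    [lambda d: [x - min(d) for x in d]],       # CPU Process
    "feature_extraction": [lambda d: [x * 2 for x in d],             # GPU Processes
                           lambda d: [x + 1 for x in d]],
    "signal_processing":  [lambda d: [abs(x) for x in d]],           # DSP Process
}

def run_sub_flowchart(stages, data):
    # Execute one sub-flowchart: feed the data through its Processes in order.
    for stage in stages:
        data = stage(data)
    return data

captured = [3, -1, 4]

# Step 602: the same input data is fed to all sub-flowcharts, run in parallel.
with ThreadPoolExecutor(max_workers=len(sub_flowcharts)) as pool:
    futures = {name: pool.submit(run_sub_flowchart, stages, captured)
               for name, stages in sub_flowcharts.items()}
    subtask_results = {name: f.result() for name, f in futures.items()}

# Step 603: the data output terminal merges the subtask results.
output = {name: result for name, result in sorted(subtask_results.items())}
```

Compared with the block-level parallelism of FIG. 5, the unit of parallel execution here is an entire sub-flowchart rather than a single data block, which is why this variant is described as having a higher granularity.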
To sum up, the processing method based on a heterogeneous computing framework of the embodiments of the present disclosure can divide the target task into a plurality of subtasks and establish a corresponding sub-flowchart for each subtask, thereby further improving the processing efficiency of the hardware and reducing the processing time.
Based on the above embodiments, a custom algorithm node can also be inserted into the heterogeneous computing engine framework, which specifically includes the following.
A custom algorithm node is inserted into the heterogeneous computing engine framework, where the custom algorithm node is connected to a data input terminal or to an output terminal of a heterogeneous processing unit. The custom algorithm node is used to perform computation, according to a custom algorithm, on the data provided by the data input terminal or by the output terminal of the heterogeneous processing unit, and to feed the computed result to the next node connected to the data input terminal or to the output terminal of the heterogeneous processing unit. The custom algorithm includes an algorithm written by the user according to requirements, an algorithm selected by the user from a plurality of provided algorithms according to requirements, and the like.
An example is as follows. FIG. 8 is a schematic diagram of inserting a custom algorithm node provided by an embodiment of the present disclosure. As shown in FIG. 8, a first custom algorithm node is inserted between the data input terminal and a first heterogeneous processing unit, and a second custom algorithm node is inserted between the first heterogeneous processing unit and a second heterogeneous processing unit. When heterogeneous computing is executed, after the data is input from the data input terminal, it is processed by the first custom algorithm node and input into the first heterogeneous processing unit; after being processed by the first heterogeneous processing unit, it is input into the second custom algorithm node; after being processed by the second custom algorithm node, it is input into the second heterogeneous processing unit, and the second heterogeneous processing unit continues the processing.
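The FIG. 8 arrangement can be illustrated with a simple pipeline sketch. All names and computations are hypothetical; each node is stood in for by a function.

```python
# A pipeline as an ordered list of named nodes (stand-ins for FIG. 8's elements).
pipeline = [
    ("unit1", lambda d: [x * 10 for x in d]),   # first heterogeneous processing unit
    ("unit2", lambda d: [x + 5 for x in d]),    # second heterogeneous processing unit
]

def insert_before(pipeline, target, name, fn):
    """Insert a custom algorithm node immediately before the target node."""
    idx = next(i for i, (n, _) in enumerate(pipeline) if n == target)
    return pipeline[:idx] + [(name, fn)] + pipeline[idx:]

# First custom node between the data input terminal and unit1,
# second custom node between unit1 and unit2.
pipeline = insert_before(pipeline, "unit1", "custom1", lambda d: [x + 1 for x in d])
pipeline = insert_before(pipeline, "unit2", "custom2", lambda d: [x - 2 for x in d])

def run(pipeline, data):
    # Execute the pipeline in order, as in the FIG. 8 data flow.
    for _, fn in pipeline:
        data = fn(data)
    return data

result = run(pipeline, [1, 2])
```

The key point illustrated here is that inserting a custom node only changes the connection relationships of the framework; the existing processing units and their APIs are untouched.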
In some embodiments, a position termination node can also be inserted into the heterogeneous computing engine framework, specifically including:

A position termination node is inserted into the heterogeneous computing engine framework. The position termination node is connected to the output terminal of a heterogeneous processing unit and is used to stop the processing nodes downstream of that output terminal from continuing to compute, and to output the data provided by the output terminal of the heterogeneous processing unit as the processing result.

For example, FIG. 9 is a schematic diagram of inserting a position termination node according to an embodiment of the present disclosure. As shown in FIG. 9, when the processing result of the second heterogeneous processing unit needs to be output right after that unit, a position termination node can be connected to the output terminal of the second heterogeneous processing unit. When heterogeneous computing is performed, data entering from the data input terminal is processed by the first heterogeneous processing unit, then input into the second heterogeneous processing unit and processed by it, and the data output by the second heterogeneous processing unit is output as the processing result.
It can be understood that a custom algorithm node and a position termination node can also be inserted into the heterogeneous computing engine framework at the same time. For example, FIG. 10 is a schematic diagram of inserting both a custom algorithm node and a position termination node according to an embodiment of the present disclosure. As shown in FIG. 10, a custom algorithm node is inserted between the data input terminal and the first heterogeneous processing unit, and the processing result of the first heterogeneous processing unit is output right after that unit. When heterogeneous computing is performed, data entering from the data input terminal is processed by the custom algorithm node and input into the first heterogeneous processing unit; the first heterogeneous processing unit performs its processing, and the data it outputs is output as the processing result.
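The FIG. 10 arrangement, combining a custom node with an early output point, can be sketched with a sentinel that marks the termination position. Again a hypothetical sketch: `TERMINATE` and the stage layout are illustrative assumptions, not the patent's API.

```python
TERMINATE = object()  # sentinel standing in for a position termination node

class Pipeline:
    def __init__(self, stages):
        self.stages = stages  # callables, or the TERMINATE sentinel

    def run(self, data):
        for stage in self.stages:
            if stage is TERMINATE:
                # Position termination node: downstream processing nodes are
                # stopped and the data produced so far becomes the result.
                return data
            data = stage(data)
        return data

custom = lambda xs: [x + 10 for x in xs]   # custom algorithm node
unit1 = lambda xs: [x * 3 for x in xs]     # first heterogeneous processing unit
unit2 = lambda xs: [x - 1 for x in xs]     # second unit, skipped at runtime

# FIG. 10 layout: input -> custom node -> unit1 -> termination node (unit2 idle)
p = Pipeline([custom, unit1, TERMINATE, unit2])
out = p.run([1, 2])  # -> [33, 36]
```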
In summary, with the processing method based on a heterogeneous computing framework according to the embodiments of the present disclosure, custom algorithm nodes and/or position termination nodes can be inserted into the heterogeneous computing engine framework. Users can flexibly insert functions that meet their own requirements or scenario requirements, and flexibly determine the output nodes of the computing framework, thereby improving the scalability of the heterogeneous computing framework and broadening the range of scenarios to which it applies.
FIG. 11 is a schematic structural diagram of a processing apparatus based on a heterogeneous computing framework according to an embodiment of the present disclosure. The apparatus may be implemented by software and/or hardware, and may generally be integrated into an electronic device. As shown in FIG. 11, the apparatus includes:
a first creation module 1101, configured to create a corresponding application programming interface (API) for each algorithm module in at least one heterogeneous processing unit, where each heterogeneous processing unit is generated by encapsulating at least one algorithm module;

a second creation module 1102, configured to call the interface of at least one heterogeneous processing unit according to a data processing task to create a heterogeneous computing engine framework, where the heterogeneous computing engine framework includes at least one data input terminal, at least one heterogeneous processing unit, and at least one data output terminal; and

an acquisition module 1103, configured to call the APIs of the required algorithm modules according to the heterogeneous computing engine framework to execute the data processing task, and to output the task processing result to a target device.
In some embodiments, the first creation module 1101 is configured to:

create a calling interface and a task running interface corresponding to each algorithm module, where the algorithm module includes any one or a combination of: a central processing unit (CPU) module, a digital signal processing (DSP) module, a graphics processing unit (GPU) module, an application-specific integrated circuit (ASIC) module, and a field-programmable gate array (FPGA) module.
In some embodiments, the apparatus further includes:

a segmentation module, configured to segment the input data of the heterogeneous processing unit to generate a plurality of data blocks;

a first establishment module, configured to establish, in the heterogeneous processing unit, a plurality of computing threads corresponding to the plurality of data blocks, where the plurality of computing threads are used to process the plurality of data blocks in parallel to generate a corresponding plurality of data processing results; and

a generation module, configured to merge the plurality of data processing results to generate the output data corresponding to the heterogeneous processing unit.
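The split / parallel-compute / merge flow performed by these three modules can be sketched as follows. This is an illustrative sketch using a thread pool; `split`, `process_block`, and `run_unit` are hypothetical names, and the square-root-free squaring kernel merely stands in for a real unit's computation.

```python
from concurrent.futures import ThreadPoolExecutor

def split(data, n):
    # Segmentation module: cut the input into n roughly equal data blocks.
    k, r = divmod(len(data), n)
    blocks, start = [], 0
    for i in range(n):
        end = start + k + (1 if i < r else 0)
        blocks.append(data[start:end])
        start = end
    return blocks

def process_block(block):
    # Stand-in for the computation one thread performs on one data block.
    return [x * x for x in block]

def run_unit(data, n_threads=4):
    blocks = split(data, n_threads)
    # First establishment module: one computing thread per data block,
    # all blocks processed in parallel.
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        results = list(pool.map(process_block, blocks))
    # Generation module: merge the per-block results into the unit's output.
    merged = []
    for r in results:
        merged.extend(r)
    return merged

out = run_unit(list(range(10)))  # -> [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```

Because `pool.map` preserves block order, the merged output matches what sequential processing would produce while the blocks themselves run concurrently.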
In some embodiments, the apparatus further includes:

a second establishment module, configured to determine a plurality of subtasks corresponding to the data processing task, and to establish a corresponding sub-flowchart for each subtask;

an input module, configured to input the data of the at least one data input terminal simultaneously into a plurality of sub-flowcharts corresponding to the plurality of subtasks, where the plurality of sub-flowcharts are used to execute heterogeneous computations in parallel to generate a plurality of subtask results; and

an output module, configured to provide all of the plurality of subtask results to the data output terminal, where the data output terminal is used to merge the plurality of subtask results.
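The fan-out of one input into parallel sub-flowcharts, with the results merged at the data output terminal, can be sketched as below. The sub-flowcharts here (a sum and a max) are hypothetical placeholders for real heterogeneous sub-pipelines.

```python
from concurrent.futures import ThreadPoolExecutor

def subflow_sum(xs):
    # Sub-flowchart for one subtask: returns a labeled subtask result.
    return ("sum", sum(xs))

def subflow_max(xs):
    return ("max", max(xs))

def run_task(data, subflows):
    # Input module: the same input data is fed to every sub-flowchart,
    # and the sub-flowcharts execute in parallel.
    with ThreadPoolExecutor(max_workers=len(subflows)) as pool:
        results = list(pool.map(lambda f: f(data), subflows))
    # Output module: the data output terminal merges the subtask results.
    return dict(results)

merged = run_task([3, 1, 4, 1, 5], [subflow_sum, subflow_max])
# -> {"sum": 14, "max": 5}
```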
In some embodiments, the apparatus further includes:

a first insertion module, configured to insert a custom algorithm node into the heterogeneous computing engine framework, where the custom algorithm node is connected to the data input terminal, or to the output terminal of the heterogeneous processing unit, and is used to compute, according to a custom algorithm, on the data provided by the data input terminal or by the output terminal of the heterogeneous processing unit, and to feed the computed result to the next node connected to that data input terminal or output terminal.
In some embodiments, the apparatus further includes:

a second insertion module, configured to insert a position termination node into the heterogeneous computing engine framework, where the position termination node is connected to the output terminal of the heterogeneous processing unit and is used to stop the processing nodes downstream of that output terminal from continuing to compute, and to output the data provided by the output terminal of the heterogeneous processing unit as the processing result.
The processing apparatus based on a heterogeneous computing framework provided by the embodiments of the present disclosure can execute the processing method based on a heterogeneous computing framework provided by any embodiment of the present disclosure, and has the corresponding functional modules and beneficial effects for executing the method.
To implement the above embodiments, the present disclosure further provides a computer program product including a computer program/instructions which, when executed by a processor, implement the processing method based on a heterogeneous computing framework in the above embodiments.

To implement the above embodiments, the present disclosure further provides a computer program including instructions which, when executed by a processor, implement the processing method based on a heterogeneous computing framework in the above embodiments.
FIG. 12 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure.
Referring now to FIG. 12, a schematic structural diagram of an electronic device 1200 suitable for implementing an embodiment of the present disclosure is shown. The electronic device 1200 in the embodiments of the present disclosure may include, but is not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), and vehicle-mounted terminals (for example, vehicle navigation terminals), as well as fixed terminals such as digital TVs and desktop computers. The electronic device shown in FIG. 12 is merely an example and should not impose any limitation on the functions and scope of use of the embodiments of the present disclosure.
As shown in FIG. 12, the electronic device 1200 may include a processing apparatus (for example, a central processing unit, a graphics processing unit, etc.) 1201, which may perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 1202 or a program loaded from a storage apparatus 1208 into a random access memory (RAM) 1203. The RAM 1203 also stores various programs and data required for the operation of the electronic device 1200. The processing apparatus 1201, the ROM 1202, and the RAM 1203 are connected to one another through a bus 1204. An input/output (I/O) interface 1205 is also connected to the bus 1204.
Generally, the following apparatuses may be connected to the I/O interface 1205: an input apparatus 1206 including, for example, a touch screen, a touch pad, a keyboard, a mouse, a camera, a microphone, an accelerometer, and a gyroscope; an output apparatus 1207 including, for example, a liquid crystal display (LCD), a speaker, and a vibrator; a storage apparatus 1208 including, for example, a magnetic tape and a hard disk; and a communication apparatus 1209. The communication apparatus 1209 may allow the electronic device 1200 to communicate wirelessly or by wire with other devices to exchange data. Although FIG. 12 shows the electronic device 1200 having various apparatuses, it should be understood that it is not required to implement or include all of the apparatuses shown; more or fewer apparatuses may alternatively be implemented or provided.
In particular, according to the embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, an embodiment of the present disclosure includes a computer program product that includes a computer program carried on a non-transitory computer-readable medium, the computer program containing program code for executing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication apparatus 1209, installed from the storage apparatus 1208, or installed from the ROM 1202. When the computer program is executed by the processing apparatus 1201, the above-described functions defined in the processing method based on a heterogeneous computing framework of the embodiments of the present disclosure are performed.
It should be noted that the above computer-readable medium in the present disclosure may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. The computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present disclosure, a computer-readable storage medium may be any tangible medium that contains or stores a program for use by or in connection with an instruction execution system, apparatus, or device. In the present disclosure, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, which carries computer-readable program code. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the foregoing. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, and can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device. Program code contained on a computer-readable medium may be transmitted by any appropriate medium, including but not limited to a wire, an optical cable, RF (radio frequency), or any suitable combination of the above.
In some implementations, the client and the server may communicate using any currently known or future-developed network protocol such as HTTP (HyperText Transfer Protocol), and may be interconnected with digital data communication in any form or medium (for example, a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (for example, the Internet), and a peer-to-peer network (for example, an ad hoc peer-to-peer network), as well as any currently known or future-developed network.
The above computer-readable medium may be included in the above electronic device, or may exist separately without being assembled into the electronic device.

The above computer-readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to:
create a corresponding application programming interface (API) for each algorithm module in at least one heterogeneous processing unit, where each heterogeneous processing unit is generated by encapsulating at least one algorithm module; call the interface of at least one heterogeneous processing unit according to a data processing task to create a heterogeneous computing engine framework, where the heterogeneous computing engine framework includes at least one data input terminal, at least one heterogeneous processing unit, and at least one data output terminal; and call the APIs of the required algorithm modules according to the heterogeneous computing engine framework to execute the data processing task and output the task processing result to a target device. In the embodiments of the present disclosure, a heterogeneous computing engine framework is built from heterogeneous processing units according to the data processing task, and the data processing task is handled through this framework. If the algorithm implementation needs to be extended or changed, for example because the data processing task changes, the heterogeneous computing engine framework can be adjusted to accommodate the change; the framework is therefore easy to extend and highly flexible, and can improve the processing efficiency of large-scale algorithm systems. By integrating, through the heterogeneous computing engine framework, the characteristics of the different algorithm modules in the heterogeneous processing units, the framework can exploit the computing power advantages of the heterogeneous processing units, improve computing efficiency, reduce computing time, and improve the performance of the heterogeneous processing units, thereby meeting current business demands for computing power. At the same time, the heterogeneous computing engine framework realizes a layered architecture design, modular design, algorithm cascading, and intelligent scheduling of hybrid computing, thereby improving the efficiency of framework design.
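The three steps above (per-module API, unit encapsulation, framework execution) can be sketched end to end. All class and method names here are illustrative assumptions, not the disclosed implementation, and the lambda kernels merely stand in for CPU/GPU algorithm modules.

```python
class AlgorithmModule:
    def __init__(self, name, fn):
        self.name, self.fn = name, fn

class HeterogeneousUnit:
    def __init__(self, modules):
        # A unit is generated by encapsulating at least one algorithm module;
        # each module is exposed through its own API entry.
        self.api = {m.name: m.fn for m in modules}

    def call(self, name, data):
        return self.api[name](data)

class EngineFramework:
    def __init__(self, units, plan):
        # `plan` names, per unit, which module API the task requires.
        self.units, self.plan = units, plan

    def execute(self, data):
        # Data input terminal -> units in order -> data output terminal.
        for unit, module_name in zip(self.units, self.plan):
            data = unit.call(module_name, data)
        return data  # task processing result, delivered to the target device

cpu = HeterogeneousUnit([AlgorithmModule("scale", lambda xs: [2 * x for x in xs])])
gpu = HeterogeneousUnit([AlgorithmModule("offset", lambda xs: [x + 1 for x in xs])])
engine = EngineFramework([cpu, gpu], ["scale", "offset"])
result = engine.execute([1, 2, 3])  # -> [3, 5, 7]
```

Swapping an algorithm then only requires registering a different module and naming it in the plan, which is the extensibility property the paragraph above claims for the framework.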
The computer program code for performing the operations of the present disclosure may be written in one or more programming languages or a combination thereof. The programming languages include, but are not limited to, object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may be executed entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In a scenario involving a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the accompanying drawings illustrate the possible architectures, functions, and operations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of code that contains one or more executable instructions for implementing the specified logical functions. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units involved in the embodiments described in the present disclosure may be implemented by software or by hardware. The name of a unit does not, in some cases, constitute a limitation on the unit itself.
The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems on a chip (SOCs), complex programmable logic devices (CPLDs), and so on.
In the context of the present disclosure, a machine-readable medium may be a tangible medium that may contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of the machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The above description is merely a description of the preferred embodiments of the present disclosure and of the technical principles applied. Those skilled in the art should understand that the scope of the disclosure involved herein is not limited to technical solutions formed by specific combinations of the above technical features, and should also cover, without departing from the above disclosed concept, other technical solutions formed by any combination of the above technical features or their equivalent features, for example, a technical solution formed by replacing the above features with (but not limited to) technical features with similar functions disclosed in the present disclosure.
In addition, although the operations are depicted in a particular order, this should not be understood as requiring that these operations be performed in the particular order shown or in sequential order. Under certain circumstances, multitasking and parallel processing may be advantageous. Likewise, although several specific implementation details are included in the above discussion, these should not be construed as limiting the scope of the present disclosure. Certain features described in the context of separate embodiments may also be implemented in combination in a single embodiment. Conversely, various features described in the context of a single embodiment may also be implemented in multiple embodiments separately or in any suitable sub-combination.
Although the subject matter has been described in language specific to structural features and/or methodological logical actions, it should be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or actions described above. Rather, the specific features and actions described above are merely example forms of implementing the claims.

Claims (12)

  1. A processing method based on a heterogeneous computing framework, comprising:
    creating a corresponding application programming interface (API) for each algorithm module in at least one heterogeneous processing unit, wherein each heterogeneous processing unit is generated by encapsulating at least one algorithm module;
    creating a heterogeneous computing engine framework by calling the interface of at least one heterogeneous processing unit according to a data processing task, wherein the heterogeneous computing engine framework comprises at least one data input terminal, at least one heterogeneous processing unit, and at least one data output terminal; and
    calling the APIs of the required algorithm modules according to the heterogeneous computing engine framework to execute the data processing task, and outputting the task processing result to a target device.
  2. The processing method according to claim 1, wherein creating a corresponding application programming interface (API) for each algorithm module in at least one heterogeneous processing unit comprises:
    creating a calling interface and a task running interface corresponding to each algorithm module.
  3. The processing method according to claim 1 or 2, wherein the algorithm module comprises any one or a combination of: a central processing unit (CPU) module, a digital signal processing (DSP) module, a graphics processing unit (GPU) module, an application-specific integrated circuit (ASIC) module, and a field-programmable gate array (FPGA) module.
  4. The processing method according to any one of claims 1-3, further comprising:
    segmenting the input data of the heterogeneous processing unit to generate a plurality of data blocks;
    establishing, in the heterogeneous processing unit, a plurality of computing threads corresponding to the plurality of data blocks, wherein the plurality of computing threads are used to process the plurality of data blocks in parallel to generate a corresponding plurality of data processing results; and
    merging the plurality of data processing results to generate the output data corresponding to the heterogeneous processing unit.
  5. The processing method according to any one of claims 1-3, further comprising:
    determining a plurality of subtasks corresponding to the data processing task, and establishing a corresponding sub-flowchart for each subtask;
    inputting the data of the at least one data input terminal simultaneously into a plurality of sub-flowcharts corresponding to the plurality of subtasks, wherein the plurality of sub-flowcharts are used to execute heterogeneous computations in parallel to generate a plurality of subtask results; and
    providing all of the plurality of subtask results to the data output terminal, wherein the data output terminal is used to merge the plurality of subtask results.
  6. The processing method according to any one of claims 1-5, further comprising:
    inserting a custom algorithm node into the heterogeneous computing engine framework, wherein the custom algorithm node is connected to the data input terminal, or to the output terminal of the heterogeneous processing unit, and is used to compute, according to a custom algorithm, on the data provided by the data input terminal or by the output terminal of the heterogeneous processing unit, and to feed the computed result to the next node connected to that data input terminal or output terminal.
  7. The processing method according to any one of claims 1-6, further comprising:
    inserting a position termination node into the heterogeneous computing engine framework, wherein the position termination node is connected to the output terminal of the heterogeneous processing unit, and the position termination node is used to stop the processing nodes after the output terminal of the heterogeneous processing unit from continuing to compute, and to output the data provided by the output terminal of the heterogeneous processing unit as the processing result.
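The position termination node of claim 7 short-circuits the pipeline: when execution reaches it, downstream nodes are skipped and the upstream output becomes the final result. A hypothetical sentinel-based sketch:

```python
class _Terminate:
    """Position termination node: when reached, downstream processing
    nodes are skipped and the upstream output is emitted as the result."""

TERMINATE = _Terminate()

def run_pipeline(nodes, data):
    for node in nodes:
        if node is TERMINATE:
            return data  # emit the upstream unit's output directly
        data = node(data)
    return data
```

This is useful for debugging a long graph: dropping `TERMINATE` after a unit exposes that unit's intermediate output without computing the rest of the graph.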
  8. A processing apparatus based on a heterogeneous computing framework, comprising:
    a first creation module, configured to create a corresponding application programming interface (API) for each algorithm module in at least one heterogeneous processing unit, wherein each heterogeneous processing unit is generated by encapsulating at least one algorithm module;
    a second creation module, configured to create a heterogeneous computing engine framework by invoking the interface of at least one heterogeneous processing unit according to a data processing task, wherein the heterogeneous computing engine framework comprises at least one data input terminal, at least one heterogeneous processing unit, and at least one data output terminal; and
    an acquisition module, configured to call the application programming interface (API) of a required algorithm module according to the heterogeneous computing engine framework to execute the data processing task, and to output the task processing result to a target device.
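The three modules of the claim 8 apparatus — API creation, engine-framework creation, and task execution — can be sketched as one class. All names and the list-shaped "engine" are hypothetical simplifications of the claimed apparatus.

```python
class HeterogeneousProcessingApparatus:
    """Illustrative sketch of the three modules in claim 8."""

    def create_apis(self, units):
        # First creation module: expose one callable API per
        # algorithm module encapsulated in each heterogeneous unit.
        return {name: fn for unit in units for name, fn in unit.items()}

    def create_engine(self, apis, task):
        # Second creation module: build a call plan (the "engine
        # framework") from the APIs required by the task.
        return [apis[name] for name in task]

    def execute(self, engine, data):
        # Acquisition module: run the plan and return the task result.
        for api in engine:
            data = api(data)
        return data
```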
  9. An electronic device, comprising:
    a processor; and
    a memory for storing instructions executable by the processor;
    wherein the processor is configured to read the instructions from the memory and execute the instructions to implement the processing method according to any one of claims 1-7.
  10. A computer-readable storage medium storing a computer program/instructions which, when executed by a processor, perform the processing method according to any one of claims 1-7.
  11. A computer program product comprising a computer program/instructions which, when executed by a processor, implement the processing method according to any one of claims 1-7.
  12. A computer program comprising instructions which, when executed by a processor, implement the processing method according to any one of claims 1-7.
PCT/CN2022/142134 2021-12-28 2022-12-27 Heterogeneous computing framework-based processing method and apparatus, and device and medium WO2023125463A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111629485.7 2021-12-28
CN202111629485.7A CN116360971A (en) 2021-12-28 2021-12-28 Processing method, device, equipment and medium based on heterogeneous computing framework

Publications (1)

Publication Number Publication Date
WO2023125463A1

Family

ID=86939323

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/142134 WO2023125463A1 (en) 2021-12-28 2022-12-27 Heterogeneous computing framework-based processing method and apparatus, and device and medium

Country Status (2)

Country Link
CN (1) CN116360971A (en)
WO (1) WO2023125463A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190122415A1 (en) * 2017-10-20 2019-04-25 Westghats Technologies Private Limited Graph based heterogeneous parallel processing system
CN109783141A (en) * 2017-11-10 2019-05-21 华为技术有限公司 Isomery dispatching method
CN111258744A (en) * 2018-11-30 2020-06-09 中兴通讯股份有限公司 Task processing method based on heterogeneous computation and software and hardware framework system
CN111399911A (en) * 2020-03-24 2020-07-10 杭州博雅鸿图视频技术有限公司 Artificial intelligence development method and device based on multi-core heterogeneous computation

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117077608A (en) * 2023-08-22 2023-11-17 北京市合芯数字科技有限公司 Connection method and device of power switch unit, electronic equipment and storage medium
CN117077608B (en) * 2023-08-22 2024-02-27 北京市合芯数字科技有限公司 Connection method and device of power switch unit, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN116360971A (en) 2023-06-30

Similar Documents

Publication Publication Date Title
WO2021103479A1 (en) Method and apparatus for training deep learning model
CN109523187B (en) Task scheduling method, device and equipment
CN110609872B (en) Method and apparatus for synchronizing node data
CN111475235B (en) Acceleration method, device, equipment and storage medium for function calculation cold start
US10423442B2 (en) Processing jobs using task dependencies
CN111581555B (en) Document loading method, device, equipment and storage medium
WO2023125463A1 (en) Heterogeneous computing framework-based processing method and apparatus, and device and medium
WO2023078072A1 (en) Byzantine fault tolerance-based asynchronous consensus method and apparatus, server and medium
CN111625422B (en) Thread monitoring method, thread monitoring device, electronic equipment and computer readable storage medium
CN111324376B (en) Function configuration method, device, electronic equipment and computer readable medium
CN115600676A (en) Deep learning model reasoning method, device, equipment and storage medium
WO2022148231A1 (en) Application launch control method, apparatus, electronic device, and storage medium
CN114330689A (en) Data processing method and device, electronic equipment and storage medium
CN110489219B (en) Method, device, medium and electronic equipment for scheduling functional objects
US20220269622A1 (en) Data processing methods, apparatuses, electronic devices and computer-readable storage media
WO2023056841A1 (en) Data service method and apparatus, and related product
CN113918298B (en) Multimedia data processing method, system and equipment
CN115378937A (en) Distributed concurrency method, device and equipment for tasks and readable storage medium
KR20210042992A (en) Method and apparatus for training a deep learning model
CN113176937A (en) Task processing method and device and electronic equipment
CN116755889B (en) Data acceleration method, device and equipment applied to server cluster data interaction
WO2023197868A1 (en) Image processing method and apparatus, system, and storage medium
WO2023093474A1 (en) Multimedia processing method and apparatus, and device and medium
CN115827415B (en) System process performance test method, device, equipment and computer medium
CN113626160B (en) Network data packet high-concurrency processing method and system based on cavium processor

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 22914717

Country of ref document: EP

Kind code of ref document: A1