CN101551761A - Method for sharing stream memory of heterogeneous multi-processor

Info

Publication number
CN101551761A
Authority
CN
China
Prior art keywords: processor, stream, unit, memory, storage
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CNA2009100149388A
Other languages
Chinese (zh)
Inventor
魏健
王守昊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inspur Electronic Information Industry Co Ltd
Original Assignee
Langchao Electronic Information Industry Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Langchao Electronic Information Industry Co Ltd
Priority to CNA2009100149388A
Publication of CN101551761A
Legal status: Pending

Landscapes

  • Multi Processors (AREA)

Abstract

The invention provides a method for sharing the stream memory of a heterogeneous multiprocessor. The method comprises the following steps: an application running on a host processor calls an API a first time, compiling one or more executable programs from source code containing local variables for a plurality of processor units equipped with stream memory; the API is then called a second time to load the one or more executable programs onto the plurality of processor units, where multiple threads are executed in parallel; during loading, local storage units are allocated from a processor's local storage, and a first stream storage unit is allocated from the stream memory; when a processing unit executes multiple threads simultaneously, the threads access the values of variables through the storage units of the stream memory. For a source program containing stream variables, the method further comprises: calling the API a third time to allocate a second stream storage unit for the stream variables in the stream memory; based on the second stream storage unit, the values of the stream variables are accessed by the plurality of processor units.

Description

Method for sharing stream memory in a heterogeneous multiprocessor
Technical field
The present invention relates to a data-parallel computing technique, and in particular to a method by which heterogeneous multiprocessors (CPUs and GPUs) share stream memory when performing data-parallel computation.
Background technology
As GPUs have gradually come to incorporate high-performance parallel computing capability, more and more applications have been developed that use the GPU as a general-purpose computing device to perform data-parallel computation. Today these applications are designed with the proprietary interfaces and GPU devices provided by each vendor; as a result, even when a CPU and a GPU are used together in a data-processing system, the CPU may still be heavily loaded, and an application written for one vendor's GPU may not run on another vendor's GPU.
However, as more and more CPUs embed multiple cores to perform data-parallel computation, more and more data-processing tasks can be completed with CPUs and GPUs (the shorthand used here for combinations of multiple CPU or GPU processors). Traditionally, GPUs and CPUs have been programmed through different programming environments, so CPU and GPU interoperability is poor. This makes it very difficult for an application to make good use of CPU and GPU processing resources at the same time; a new data-processing system is therefore needed to overcome these difficulties and allow applications to make full use of the various processing resources of CPUs and GPUs.
Summary of the invention
The object of the present invention is to provide a method for sharing stream memory in a heterogeneous multiprocessor.
The object of the invention is achieved in the following manner. The system comprises a host processor and compute processors. An application running on the host processor calls an API to load an executable program from the host processor onto a compute processor, configures the storage capability of the compute processor, and allocates memory for the variables accessed by threads on the compute processor; a compute processor is a GPU or a CPU.
The steps are as follows: the application running on the host processor calls the API a first time, compiling one or more executable programs from source code containing local variables for a plurality of processor units equipped with stream memory. The API is then called a second time to load the one or more executable programs onto the plurality of processor units; multiple threads execute in parallel, and during loading a local storage unit is allocated from a processor's local storage. Also during loading, a first stream storage unit is allocated from the stream memory; when a processing unit executes multiple threads simultaneously, these threads access the values of variables through storage units of the stream memory. For a source program containing stream variables, the method further comprises: calling the API a third time to allocate a second stream storage unit for the stream variables in the stream memory; based on the second stream storage unit, the values of the stream variables are accessed from the plurality of processor units.
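The three API calls described above can be sketched as a minimal, hypothetical model. The names (`StreamMemory`, `api_compile`, `api_load`, `api_alloc_stream_vars`) and the allocation sizes are illustrative assumptions; the patent describes the calls' responsibilities, not a concrete API.

```python
class StreamMemory:
    """Stream memory shared by the host processor and all processor units."""
    def __init__(self, size):
        self.size = size
        self.next_free = 0
        self.units = {}  # unit name -> (offset, length)

    def alloc(self, name, length):
        # Allocate a stream storage unit addressable by every processor unit.
        if self.next_free + length > self.size:
            raise MemoryError("stream memory exhausted")
        self.units[name] = (self.next_free, length)
        self.next_free += length
        return self.units[name]


def api_compile(source):
    """First API call: compile source code containing local variables
    into one or more executable programs."""
    return [{"code": source, "locals": source.get("locals", [])}]


def api_load(executables, stream_mem, local_store_size=256):
    """Second API call: load executables onto the processor units.

    During loading, a local storage unit is set aside from the processor's
    local storage (for local variables), and a first stream storage unit
    is allocated from the shared stream memory.
    """
    first_unit = stream_mem.alloc("first_stream_unit", 64)
    return {
        "executables": executables,
        "local_store": local_store_size,
        "first_stream_unit": first_unit,
    }


def api_alloc_stream_vars(stream_mem, stream_vars):
    """Third API call: allocate a second stream storage unit for stream
    variables, accessible from all processor units."""
    return stream_mem.alloc("second_stream_unit", 8 * len(stream_vars))
```

Under these assumptions, a host application would chain the three calls in order: compile once, load (which implicitly allocates local and first stream storage), then allocate stream-variable storage on demand.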
The beneficial effect of the present invention is that it enables an application to make good use of CPU and GPU processing resources simultaneously, improving the application's ability to process large volumes of data.
Description of drawings
Fig. 1 is the configuration diagram of a computing device that performs data-parallel computation;
Fig. 2 is a schematic diagram of stream memory shared by parallel multiprocessors executing multiple threads;
Fig. 3 is a schematic diagram of the process of invoking the API to complete memory allocation.
Embodiment
The method of sharing stream memory in a heterogeneous multiprocessor according to the present invention is explained below with reference to the accompanying drawings.
In the present invention, an application running on the host processor configures the storage capability of a compute processor, which may be a CPU or a GPU, and allocates memory units for the variables accessed by a group of threads executing an executable program in a computation. The values accessed by this group of threads reside either in the local memory of the compute processor or in the stream memory shared by the host processor and the compute processors. The application completes memory allocation and configuration through API calls. On the first API call, one or more executable programs are compiled from source code for the processing units equipped with stream memory. On the second API call, these executable programs are loaded onto the processing units and multiple threads are executed simultaneously. During loading, a local storage unit is allocated from a processor's local storage to hold the local variables of the source code, and a first stream storage unit is allocated from the stream memory; when a processing unit executes multiple threads simultaneously, the threads access the values of local variables through storage units of the stream memory. For a source program containing stream variables, the method further comprises a third API call that allocates a second stream storage unit for the stream variables in the stream memory; based on the second stream storage unit, the stream variables can be accessed from the plurality of processor units. In the stream cache, a buffer unit is allocated for a variable, and the buffer unit holds the value of the variable stored in the stream storage unit.
Embodiment
Fig. 1 is the configuration diagram of a computing device that performs data-parallel processing for an application. The computing device comprises central processors (CPUs) and graphics processors (GPUs). A host processor resides in the host processing system; it can upload and download data and check results over the network, and it connects to the heterogeneous CPUs and GPUs through a data bus. The CPU may be multi-core, and the GPU is hardware supporting graphics processing and double-precision floating-point computation. A function library holds source code and executable programs, and the compile layer is responsible for compiling source code. Through API calls, the application loads executable programs into the execution layer, which manages the execution of processing tasks through the allocation of computational resources; the compute platform layer is responsible for identifying the physical computing devices. Compiled executable programs are loaded into the execution layer by API calls; at run time the execution layer, according to the processors' data files, interacts with the compile layer to compile source code into new executable programs in real time. The execution layer dispatches qualified executable programs to computational resources through the compute platform layer.
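The layered structure described for Fig. 1 might be modeled as follows. All class names and the dispatch policy are assumptions for illustration; the patent describes the layers' responsibilities, not their code.

```python
class FunctionLibrary:
    """Holds source code and previously compiled executable programs."""
    def __init__(self):
        self.sources = {}
        self.executables = {}


class CompileLayer:
    """Compiles source code into executable programs (possibly at run time)."""
    def compile(self, name, source):
        return {"name": name, "compiled_from": source}


class ComputePlatformLayer:
    """Identifies the physical computing devices (CPUs and GPUs)."""
    def __init__(self, devices):
        self.devices = devices


class ExecutionLayer:
    """Allocates compute resources and dispatches executables to devices."""
    def __init__(self, platform):
        self.platform = platform

    def dispatch(self, executable):
        # Toy policy: dispatch to the first device the platform layer reports.
        device = self.platform.devices[0]
        return (executable["name"], device)
```

A dispatch then flows top-down: the compile layer produces an executable, and the execution layer routes it to a device identified by the compute platform layer.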
Fig. 2 is a schematic diagram of stream memory shared by parallel multiprocessors executing multiple threads. At this point the application has loaded an executable program from the host processor onto the compute processors through API calls. The executable program runs as multiple parallel threads within a processing unit: as shown in the figure, compute processor 1 runs threads 1 to M and compute processor L runs threads 1 to N. During a computation, each thread accesses the values of its local variables through its private memory; the threads within a processing unit access shared values through local shared memory; and threads across processing units access the values of stream variables through the storage units of the stream memory. For example, private memory 1 in compute processor 1 stores the local variables to be processed by thread 1; the local shared memory stores the variable values needed by threads 1 through M; and thread M of compute processor 1 and thread N of compute processor L access the values of stream variables through the stream cache. The local shared memory is likewise a storage unit backed by the stream memory.
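The three storage levels of Fig. 2, per-thread private memory, per-unit local shared memory, and stream memory shared across processor units, can be captured in a toy sketch. The class and field names are assumptions, not terms from the patent.

```python
class Thread:
    def __init__(self):
        self.private = {}  # local variables, visible only to this thread


class ProcessingUnit:
    def __init__(self, n_threads):
        # Shared by threads of this unit; per Fig. 2 it is itself backed
        # by a storage unit of the stream memory.
        self.local_shared = {}
        self.threads = [Thread() for _ in range(n_threads)]


class Device:
    def __init__(self, n_units, n_threads):
        self.stream = {}  # stream memory / stream cache, shared by all units
        self.units = [ProcessingUnit(n_threads) for _ in range(n_units)]
```

In this model, a value written to `thread.private` is invisible to other threads, a value in `unit.local_shared` is visible to that unit's threads only, and a value in `device.stream` (a stream variable) is visible to threads of every processing unit.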
Fig. 3 is a schematic diagram of the process of invoking the API to complete memory allocation. The application first calls the API to compile the source code, generating one or more executable programs. It then calls the API again to load the executable programs onto the processing units; during loading, memory is allocated for the local variables of the executable program based on the processor's local storage capability, and the first stream storage unit is allocated at the same time so that multiple threads of a processor can access variables simultaneously. Finally, the API is called a third time to allocate a second stream storage unit for the stream variables in the stream memory, so that the stream variables can be accessed by multiple processor units.

Claims (4)

1. A method for sharing stream memory in a heterogeneous multiprocessor comprising a host processor and compute processors, characterized in that an application running on the host processor calls an API to load an executable program from the host processor onto a compute processor, configures the storage capability of the compute processor, and allocates memory for the variables accessed by threads on the compute processor, the compute processor being a GPU or a CPU;
the steps are as follows: the application running on the host processor calls the API a first time, compiling one or more executable programs from source code containing local variables for a plurality of processor units equipped with stream memory; the API is then called a second time to load the one or more executable programs onto the plurality of processor units, multiple threads executing in parallel, with a local storage unit allocated from a processor's local storage during loading; also during loading, a first stream storage unit is allocated from the stream memory, and when a processing unit executes multiple threads simultaneously these threads access the values of variables through storage units of the stream memory; for a source program containing stream variables, the method further comprises: calling the API a third time to allocate a second stream storage unit for the stream variables in the stream memory; and, based on the second stream storage unit, accessing the values of the stream variables from the plurality of processor units.
2. The method according to claim 1, characterized in that a storage unit is either local storage provided on a processing unit or stream memory; a stream storage unit is allocated by the application running on the host processor unit; the storage capability of the stream memory does not include the support of local storage; and in the stream cache a buffer unit is allocated for a variable, the buffer unit holding the value of the variable stored in the stream storage unit.
3. The method according to claim 1, characterized in that the heterogeneous multiprocessor comprises a host processor, one or more processor units, and an API library; the host processor and the processor units are equipped with shared stream memory; the API library contains source code and executable programs; at least one of the processing units has local storage, and the allocation of memory for local variables in the executable program is based on the storage capability of this local storage.
4. The method according to claim 1, characterized in that a processor unit comprises at least one CPU or one GPU.
CNA2009100149388A 2009-04-30 2009-04-30 Method for sharing stream memory of heterogeneous multi-processor Pending CN101551761A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNA2009100149388A CN101551761A (en) 2009-04-30 2009-04-30 Method for sharing stream memory of heterogeneous multi-processor


Publications (1)

Publication Number Publication Date
CN101551761A true CN101551761A (en) 2009-10-07

Family

ID=41156010



Cited By (27)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102870096A (en) * 2010-05-20 2013-01-09 苹果公司 Subbuffer objects
US9691346B2 (en) 2010-05-20 2017-06-27 Apple Inc. Subbuffer objects
CN102870096B (en) * 2010-05-20 2016-01-13 苹果公司 Sub-impact damper object
CN102314670B (en) * 2010-06-29 2016-04-27 技嘉科技股份有限公司 There is the processing module of painting processor, operating system and disposal route
CN102314670A (en) * 2010-06-29 2012-01-11 技嘉科技股份有限公司 Processing module, operating system and processing method
CN102323917A (en) * 2011-09-06 2012-01-18 中国人民解放军国防科学技术大学 Shared memory based method for realizing multiprocess GPU (Graphics Processing Unit) sharing
CN102323917B (en) * 2011-09-06 2013-05-15 中国人民解放军国防科学技术大学 Shared memory based method for realizing multiprocess GPU (Graphics Processing Unit) sharing
CN102902654A (en) * 2012-09-03 2013-01-30 东软集团股份有限公司 Method and device for linking data among heterogeneous platforms
US9250986B2 (en) 2012-09-03 2016-02-02 Neusoft Corporation Method and apparatus for data linkage between heterogeneous platforms
CN103412823A (en) * 2013-08-07 2013-11-27 格科微电子(上海)有限公司 Chip architecture based on ultra-wide buses and data access method of chip architecture
WO2015018237A1 (en) * 2013-08-07 2015-02-12 格科微电子(上海)有限公司 Superwide bus-based chip architecture and data access method therefor
CN103412823B (en) * 2013-08-07 2017-03-01 格科微电子(上海)有限公司 Chip architecture based on ultra-wide bus and its data access method
CN103559078A (en) * 2013-11-08 2014-02-05 华为技术有限公司 GPU (Graphics Processing Unit) virtualization realization method as well as vertex data caching method and related device
CN103559078B (en) * 2013-11-08 2017-04-26 华为技术有限公司 GPU (Graphics Processing Unit) virtualization realization method as well as vertex data caching method and related device
CN105900065A (en) * 2014-01-13 2016-08-24 华为技术有限公司 Method for pattern processing
CN104836970A (en) * 2015-03-27 2015-08-12 北京联合大学 Multi-projector fusion method based on GPU real-time video processing, and multi-projector fusion system based on GPU real-time video processing
CN104836970B (en) * 2015-03-27 2018-06-15 北京联合大学 More projection fusion methods and system based on GPU real time video processings
CN105427236A (en) * 2015-12-18 2016-03-23 魅族科技(中国)有限公司 Method and device for image rendering
CN107180010A (en) * 2016-03-09 2017-09-19 联发科技股份有限公司 Heterogeneous computing system and method
CN109471673A (en) * 2017-09-07 2019-03-15 智微科技股份有限公司 For carrying out the method and electronic device of hardware resource management in electronic device
CN109471673B (en) * 2017-09-07 2022-02-01 智微科技股份有限公司 Method for hardware resource management in electronic device and electronic device
WO2020134833A1 (en) * 2018-12-29 2020-07-02 深圳云天励飞技术有限公司 Data sharing method, device, equipment and system
CN109921895A (en) * 2019-02-26 2019-06-21 成都国科微电子有限公司 A kind of calculation method and system of data hash value
CN110704362A (en) * 2019-09-12 2020-01-17 无锡江南计算技术研究所 Processor array local storage hybrid management technology
CN110704362B (en) * 2019-09-12 2021-03-12 无锡江南计算技术研究所 Processor array local storage hybrid management method
CN110990151A (en) * 2019-11-24 2020-04-10 浪潮电子信息产业股份有限公司 Service processing method based on heterogeneous computing platform
CN111625330A (en) * 2020-05-18 2020-09-04 北京达佳互联信息技术有限公司 Cross-thread task processing method and device, server and storage medium

Similar Documents

Publication Publication Date Title
CN101551761A (en) Method for sharing stream memory of heterogeneous multi-processor
US11847508B2 (en) Convergence among concurrently executing threads
US8707314B2 (en) Scheduling compute kernel workgroups to heterogeneous processors based on historical processor execution times and utilizations
TWI525540B (en) Mapping processing logic having data-parallel threads across processors
JP5859639B2 (en) Automatic load balancing for heterogeneous cores
KR102253426B1 (en) Gpu divergence barrier
US9477526B2 (en) Cache utilization and eviction based on allocated priority tokens
US9135077B2 (en) GPU compute optimization via wavefront reforming
US9354892B2 (en) Creating SIMD efficient code by transferring register state through common memory
KR101477882B1 (en) Subbuffer objects
Tang et al. Controlled kernel launch for dynamic parallelism in GPUs
US9626216B2 (en) Graphics processing unit sharing between many applications
Aoki et al. Hybrid opencl: Enhancing opencl for distributed processing
US20170053374A1 (en) REGISTER SPILL MANAGEMENT FOR GENERAL PURPOSE REGISTERS (GPRs)
CN103176848A (en) Compute work distribution reference counters
US11934867B2 (en) Techniques for divergent thread group execution scheduling
JP2021034020A (en) Methods and apparatus to enable out-of-order pipelined execution of static mapping of workload
CN101599009A (en) A kind of method of executing tasks parallelly on heterogeneous multiprocessor
Dastgeer et al. Flexible runtime support for efficient skeleton programming on heterogeneous GPU-based systems
KR20140001970A (en) Device discovery and topology reporting in a combined cpu/gpu architecture system
US10289418B2 (en) Cooperative thread array granularity context switch during trap handling
KR20140004654A (en) Methods and systems for synchronous operation of a processing device
Hoffmann et al. Dynamic task scheduling and load balancing on cell processors
KR101755154B1 (en) Method and apparatus for power load balancing for heterogeneous processors
US12020076B2 (en) Techniques for balancing workloads when parallelizing multiply-accumulate computations

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Open date: 20091007