CN102184093A - Parallel computing array framework constructed by multi CELL processors - Google Patents

Parallel computing array framework constructed by multi CELL processors

Info

Publication number
CN102184093A
Authority
CN
China
Prior art keywords
cell
array
processor
cell processor
processors
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN2011101588623A
Other languages
Chinese (zh)
Inventor
周学功
王伶俐
曹伟
叶晓敏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fudan University
Shanghai Redneurons Co Ltd
Original Assignee
Fudan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2011-06-14
Filing date
2011-06-14
Publication date
2011-09-14
Application filed by Fudan University
Priority to CN2011101588623A
Publication of CN102184093A
Legal status: Pending

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Power Sources (AREA)

Abstract

The invention belongs to the technical field of FPGAs and parallel computing, and in particular relates to a parallel computing array architecture built from multiple CELL processors. The architecture uses the existing Broadband Engine Interface in the CELL processor to connect multiple CELL processors into an array, and uses software configuration to implement memory coupling inside the array, non-coupled I/O (input/output) transfer with the outside, and load balancing. On the software-configuration side, the memory-coupled CELL Broadband Engine Interface (BIF) protocol of each CELL processor lets every CELL processor in the array connect to the remaining CELL processors through IOIF0; an optimized scheduling algorithm then schedules the computing load among the CELL processors in the array to achieve load balance. The architecture optimizes thread scheduling among the CELL processors, improves CELL processor utilization, balances the load, and reduces power consumption as far as possible.

Description

Parallel computing array architecture built from multiple CELL processors
Technical field
The invention belongs to the field of FPGAs and parallel computing, and specifically relates to a parallel computing array based on the CELL processor.
Background art
The CELL processor is a processor for high-speed computing that Sony, Sony Computer Entertainment, Toshiba, and IBM began developing jointly in March 2001, with investment from Sony. It is designed on the PowerPC architecture with a RISC instruction set, and features a high clock frequency and high execution efficiency. It is mainly used in the PlayStation 3 and in blade servers. The CELL processor has one processor core, comprising one PPE (derived from a simplified PowerPC 970) and eight coprocessors called SPEs, and its operating frequency exceeds 4 GHz. It is a 64-bit Power processor with eight built-in processing units that cooperate with one another; it can handle distributed computing, and a single processor can run multiple operating systems.
Besides high performance, CELL has two other highlights: a highly flexible design and distributed computing. CELL is suitable for nearly all computing equipment, from embedded devices to mainframe computers, and is therefore designed as a general-purpose processor platform. For workstation and server systems, IBM can integrate two CELL processors directly to obtain higher performance; for mainframe computers, CELL can be configured as an "MCM module" containing four independent processors. Building arrays from multiple CELL processors for high-performance parallel computing has therefore become an important direction of current research.
Summary of the invention
The object of the present invention is to provide a parallel computing array architecture that improves the utilization of CELL processors, balances the load, and reduces power consumption as far as possible. The architecture targets FPGA software systems: it is applicable to FPGA structures of varying complexity and supports high-performance general-purpose packing and FPGA placement and routing.
The parallel computing array architecture proposed by the invention is built from multiple CELL processors. Specifically, it redefines the connection mode between CELL processors, the communication mode between CELL processors, and the thread scheduling and load balancing mode between CELL processors. Wherein:
The connection mode connects multiple CELL processors through the Broadband Engine Interface to form an array.
The communication mode uses the Broadband Engine Interface (BIF) protocol to couple the memory of the CELL processors and to carry the communication between them.
The thread scheduling and load balancing mode takes work originally executed as multiple threads on a single CELL processor, decomposes it in software into multiple tasks and multiple programs, and distributes and schedules these onto the CELL array for execution, thereby achieving large-scale data-parallel computing.
The concrete operations are as follows:
A) Connect multiple CELL processors through the Broadband Engine Interface to form an array containing several CELL processors;
B) Configure software for communication among the CELL processors, using the Broadband Engine Interface (BIF) protocol to implement memory coupling between the CELL processors;
C) Configure software for thread scheduling and load balancing on the CELL array, to support large-scale parallel computing.
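The structure set up by these steps can be pictured with a small data model. The following Python sketch is illustrative only: the class names `CellNode` and `CellArray` and their methods are hypothetical, and it assumes that every processor can logically reach every other processor in the array, which is one reading of the IOIF0 connection described in this document. It models topology only, not the BIF protocol itself.

```python
from dataclasses import dataclass, field
from itertools import combinations
from typing import List, Tuple

SPES_PER_CELL = 8  # from the background section: one PPE and eight SPEs per CELL


@dataclass
class CellNode:
    """One CELL processor in the array (hypothetical model, not a real API)."""
    node_id: int
    spe_count: int = SPES_PER_CELL


@dataclass
class CellArray:
    """An array of CELL processors joined over a memory-coupled interconnect."""
    nodes: List[CellNode] = field(default_factory=list)

    @classmethod
    def build(cls, size: int) -> "CellArray":
        if size < 1:
            raise ValueError("an array needs at least one CELL processor")
        return cls([CellNode(i) for i in range(size)])

    def peers(self, node_id: int) -> List[int]:
        """IDs of the remaining processors that a given node can reach."""
        return [n.node_id for n in self.nodes if n.node_id != node_id]

    def links(self) -> List[Tuple[int, int]]:
        """All logical processor pairs (assumed full reachability)."""
        return list(combinations((n.node_id for n in self.nodes), 2))

    def total_spes(self) -> int:
        """Number of SPEs available across the whole array."""
        return sum(n.spe_count for n in self.nodes)
```

For example, `CellArray.build(4)` describes four processors with 32 SPEs in total, and `peers(0)` lists the three processors that node 0 can address.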
The parallel computing array architecture of the invention can carry out high-performance parallel computing under low-power conditions. It targets FPGA software systems, is applicable to FPGA structures of varying complexity, and supports high-performance general-purpose packing and FPGA placement and routing.
Description of drawings
Fig. 1 is a model diagram of the parallel computing array architecture of the invention.
Embodiment
The basic implementation of the parallel computing array architecture proposed by the invention is as follows. The existing Broadband Engine Interface in the CELL processor is used to connect multiple CELL processors into an array, and software configuration implements memory coupling inside the array, non-coupled I/O transfer with the outside, and load balancing. On the software-configuration side, first, the memory-coupled CELL Broadband Engine Interface (BIF) protocol of the CELL processor lets each CELL processor in the array connect to the remaining CELL processors through IOIF0; second, an optimized scheduling algorithm schedules the computing load among the CELL processors in the array to achieve load balance. The completed CELL array architecture takes one of the CELL processors as its core, responsible for task analysis and thread scheduling. Using the configuration software and the scheduling algorithm, large-scale parallel data operations are packaged into programs suited to computation on a single CELL processor and assigned to the individual CELL processors in the array for execution. Each CELL processor in the array retains its original architecture; because a single CELL processor already has strong parallel computing capability, the CELL array can multiply that capability.
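The packaging step described above, in which a large data-parallel operation is cut into programs sized for one CELL processor each, can be sketched as a two-level split. This is a minimal illustration under assumptions: the function names `partition` and `package_for_array` are hypothetical, the split is a plain contiguous near-equal division, and the patent does not specify how the real configuration software divides the work.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def partition(items: Sequence[T], parts: int) -> List[Sequence[T]]:
    """Split items into `parts` contiguous chunks whose sizes differ by at most one."""
    if parts < 1:
        raise ValueError("parts must be at least 1")
    base, extra = divmod(len(items), parts)
    chunks = []
    start = 0
    for i in range(parts):
        # The first `extra` chunks take one additional item each.
        size = base + (1 if i < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks


def package_for_array(data: Sequence[T], cell_count: int,
                      spes_per_cell: int = 8) -> List[List[Sequence[T]]]:
    """Two-level split: one chunk per CELL processor, then one sub-chunk per SPE."""
    return [partition(chunk, spes_per_cell)
            for chunk in partition(data, cell_count)]
```

Each outer element of the result stands for the work handed to one CELL processor, and each inner element for the share of one SPE on that processor.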
The details are as follows:
A) Multiple CELL processors are connected through the Broadband Engine Interface to form an array containing several CELL processors, and the Broadband Engine Interface (BIF) protocol implements memory coupling between the CELL processors;
B) One CELL processor in the array is the control processor. It is responsible for creating tasks and, according to the load of each processor in the array, for distributing the tasks to the other processors to run;
C) The load factor of a CELL processor is the weighted mean of three factors measured per unit time: the data traffic between the PPE and the SPEs, the data traffic between the PPE and the other processors in the array, and the SPE occupancy. The weights of the three factors can be adjusted to suit different applications. The control processor assigns each newly created task to the processor with the lowest load factor;
D) A task is a multithreaded program comprising one or more PPE threads and multiple SPE threads;
E) A background program runs on each CELL processor in the array. It monitors the processor's communication and computing load factor and reports it to the control processor;
F) The background program on each CELL processor receives tasks from the control processor and starts them running.
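The load factor and the dispatch rule in the steps above can be sketched as follows. This is a hedged illustration, not the patent's implementation: `LoadReport`, `load_factor`, and `pick_target` are hypothetical names, the three measurements are assumed to be normalized to a common scale (for example 0 to 1) before they are combined, and ties are broken by the lowest processor ID, which the patent does not specify.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

Weights = Tuple[float, float, float]


@dataclass(frozen=True)
class LoadReport:
    """What a background program might report per unit time (assumed normalized)."""
    ppe_spe_traffic: float     # data traffic between the PPE and the SPEs
    inter_cell_traffic: float  # data traffic between the PPE and other processors
    spe_occupancy: float       # fraction of SPEs in use


def load_factor(report: LoadReport, weights: Weights = (1.0, 1.0, 1.0)) -> float:
    """Weighted mean of the three factors; the weights are tunable per application."""
    w_ps, w_ic, w_occ = weights
    total = w_ps + w_ic + w_occ
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted = (w_ps * report.ppe_spe_traffic
                + w_ic * report.inter_cell_traffic
                + w_occ * report.spe_occupancy)
    return weighted / total


def pick_target(reports: Dict[int, LoadReport],
                weights: Weights = (1.0, 1.0, 1.0)) -> int:
    """Return the ID of the processor with the lowest load factor."""
    if not reports:
        raise ValueError("no load reports received")
    # Sorting key is (load factor, ID) so that ties resolve deterministically.
    return min(reports, key=lambda nid: (load_factor(reports[nid], weights), nid))
```

With equal weights, a processor reporting (0.2, 0.2, 0.5) has a load factor of 0.3 and would be chosen over one reporting (0.9, 0.1, 0.2) at 0.4. Setting the weights to (0, 0, 1) makes SPE occupancy the only criterion, which shows how the weighting can shift the choice for a different application.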

Claims (2)

1. A parallel computing array architecture built from multiple CELL processors, characterized in that it is built from multiple CELL processors and redefines the connection mode between CELL processors, the communication mode between CELL processors, and the thread scheduling and load balancing mode between CELL processors; wherein:
the connection mode connects multiple CELL processors through the Broadband Engine Interface to form an array;
the communication mode uses the Broadband Engine Interface (BIF) protocol to couple the memory of the CELL processors and to carry the communication between them;
the thread scheduling and load balancing mode takes work originally executed as multiple threads on a single CELL processor, decomposes it in software into multiple tasks and multiple programs, and distributes and schedules these onto the CELL array for execution, thereby achieving large-scale data-parallel computing.
2. The parallel computing array architecture built from multiple CELL processors according to claim 1, characterized in that:
one CELL processor in the array is the control processor, which is responsible for creating tasks and, according to the load of each processor in the array, for distributing the tasks to the other processors to run;
a background program runs on each CELL processor in the array, monitors the processor's communication and computing load factor, and reports it to the control processor; the background program also receives tasks from the control processor and starts them running.
CN2011101588623A 2011-06-14 2011-06-14 Parallel computing array framework constructed by multi CELL processors Pending CN102184093A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2011101588623A CN102184093A (en) 2011-06-14 2011-06-14 Parallel computing array framework constructed by multi CELL processors

Publications (1)

Publication Number Publication Date
CN102184093A true CN102184093A (en) 2011-09-14

Family

ID=44570274

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2011101588623A Pending CN102184093A (en) 2011-06-14 2011-06-14 Parallel computing array framework constructed by multi CELL processors

Country Status (1)

Country Link
CN (1) CN102184093A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113448516A (en) * 2021-06-04 2021-09-28 山东英信计算机技术有限公司 Data processing method, system, medium and equipment based on RAID card

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101458634A (en) * 2008-01-22 2009-06-17 中兴通讯股份有限公司 Load equilibration scheduling method and device
US20100013656A1 (en) * 2008-07-21 2010-01-21 Brown Lisa M Area monitoring using prototypical tracks
CN101657795A (en) * 2007-04-11 2010-02-24 苹果公司 Data parallel computing on multiple processors
CN101661457A (en) * 2008-08-29 2010-03-03 国际商业机器公司 Method and device for solving triangular linear equation set of multiprocessor system

Similar Documents

Publication Publication Date Title
Gong et al. MALOC: A fully pipelined FPGA accelerator for convolutional neural networks with all layers mapped on chip
CN102710477B (en) Data processing system based on VPX bus structure
CN110119311A (en) A kind of distributed stream computing system accelerated method based on FPGA
CN105190542B (en) The scalable method for calculating structure is provided, calculates equipment and printing device
CN102135949B (en) Computing network system, method and device based on graphic processing unit
CN103279445A (en) Computing method and super-computing system for computing task
CN102360313B (en) Performance acceleration method of heterogeneous multi-core computing platform on chip
CN102812439A (en) Power management in a multi-processor computer system
WO2011162628A3 (en) Apparatus and method for data stream processing using massively parallel processors
CN109992385A (en) A kind of inside GPU energy consumption optimization method of task based access control balance dispatching
CN102306139A (en) Heterogeneous multi-core digital signal processor for orthogonal frequency division multiplexing (OFDM) wireless communication system
CN109522108A (en) A kind of GPU task scheduling system and method merged based on Kernel
CN104537713B (en) A kind of novel three-dimensional reconfiguration system
Zhang et al. Comparison and analysis of GPGPU and parallel computing on multi-core CPU
CN104360962B (en) Be matched with multistage nested data transmission method and the system of high-performance computer structure
CN103049329A (en) High-efficiency system based on central processing unit (CPU)/many integrated core (MIC) heterogeneous system structure
CN102184093A (en) Parallel computing array framework constructed by multi CELL processors
CN102929714B (en) uC/OS-II-based hardware task manager
CN105957131B (en) Graphic system and its method
CN201274500Y (en) Parallel file transmission server group system based on MPI
CN103020008A (en) Reconfigurable micro server with enhanced computing power
CN100589080C (en) CMP task allocation method based on hypercube
Andrade et al. Efficient execution of microscopy image analysis on CPU, GPU, and MIC equipped cluster systems
CN104699520B (en) A kind of power-economizing method based on virtual machine (vm) migration scheduling
Raval et al. Low-power TinyOS tuned processor platform for wireless sensor network motes

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
ASS Succession or assignment of patent right

Owner name: SHANGHAI REDNEURONS INFORMATION TECHNOLOGY CO., LT

Effective date: 20111205

C41 Transfer of patent application or patent right or utility model
TA01 Transfer of patent application right

Effective date of registration: 20111205

Address after: 200433 Handan Road, Shanghai, No. 220, No.

Applicant after: Fudan University

Co-applicant after: Shanghai RedNeurons Information Technology Co., Ltd.

Address before: 200433 Handan Road, Shanghai, No. 220, No.

Applicant before: Fudan University

C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20110914