CN104899008A - Shared storage structure and method in parallel processor
- Publication number
- CN104899008A
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Data Exchanges In Wide-Area Networks (AREA)
- Multi Processors (AREA)
Abstract
The invention belongs to the field of computer architecture and particularly relates to information storage and transmission in parallel processors. In the technical solution, the structure comprises a plurality of information channels, a plurality of multicast counter groups, and an arbiter. The structure and method provide an information transmission mechanism among the threads of parallel processors, achieve multicast transmission of information, and complete the memory-access mutual-exclusion check in just one clock cycle, which greatly reduces communication overhead and power consumption. The structure and method also allow parallel computation to be performed efficiently on programmable processor arrays.
Description
Technical field
The invention belongs to the field of computer architecture, and in particular relates to data exchange and storage between threads in a parallel processor using an information transmission mechanism.
Background art
Modern computer architectures improve processing power through multiprocessor, multithreaded parallelism, which requires threads to exchange information at low overhead. At present, however, information transfer between threads is usually implemented with software mutual-exclusion mechanisms, which are inefficient and complex to implement.
Summary of the invention
The object of the invention is to provide a shared storage structure and method dedicated to data exchange between the threads of multiple processors using an information transmission mechanism, so that message exchange between the threads of the processors in a parallel machine is completed efficiently.
The technical solution of the invention is a shared storage structure in a parallel processor, characterized in that it comprises: several information channels, a dual-channel read/write memory, several multicast counter groups, and an arbiter;
Each of the information channels connects to multiple parallel processors, and each parallel processor executes 2-16 threads; each information channel includes: an address signal terminal, a memory enable signal terminal, a read/write select signal terminal, a byte select signal terminal, an input data signal terminal, an output data signal terminal, and an output valid signal terminal; the input data signal terminal comprises: an input data terminal, a multicast count terminal, and a blocking/non-blocking mode terminal;
The dual-channel read/write memory stores the data transferred between the threads; its data width is 4 bytes, it can be accessed by byte, and the data are 32-bit single-precision words or 64-bit double-precision words;
Each of the multicast counter groups contains n counters; each counter records multicast count information, is selected by the address signal, and has its read, write, and decrement operations controlled by corresponding signals;
The arbiter executes a round-robin arbitration algorithm, selecting one thread at a time to read or write the dual-channel read/write memory.
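The round-robin arbitration named above can be sketched in software as follows. This is only an illustrative model, not the patented circuit: the class name `RoundRobinArbiter`, the method `grant`, and the choice to give requester 0 priority first are assumptions, and the source does not say how many requesters the arbiter serves.

```python
from typing import List, Optional


class RoundRobinArbiter:
    """Hypothetical software model: grants one requester per call and
    rotates priority so the last winner has the lowest priority next time."""

    def __init__(self, num_requesters: int) -> None:
        self.num_requesters = num_requesters
        # Start so that requester 0 is checked first (an assumption).
        self.last_granted = num_requesters - 1

    def grant(self, requests: List[bool]) -> Optional[int]:
        """Return the index of the granted requester, or None if none asks."""
        for offset in range(1, self.num_requesters + 1):
            candidate = (self.last_granted + offset) % self.num_requesters
            if requests[candidate]:
                self.last_granted = candidate
                return candidate
        return None


arbiter = RoundRobinArbiter(4)
pending = [True, False, True, True]
order = [arbiter.grant(pending) for _ in range(4)]
print(order)
```

With requesters 0, 2, and 3 asking continuously, the grants rotate among them, so no thread is starved by a neighbour that keeps requesting.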
A shared storage method in a parallel processor, which uses the shared storage structure in a parallel processor described above and performs the following operations:
A. Write operation in blocking mode
A1. When several threads from the information channels simultaneously request to write data to the dual-channel memory, the arbiter selects one of the threads and outputs the address signal provided by that thread;
A2. The lowest 1-2 bits of the address signal select one multicast counter group, and the remaining bits of the address signal select the corresponding counter within that group; the value of this counter is checked for 0;
A3. If it is 0, the multicast count provided by the thread is written to the counter, the output valid signal terminal is set to 1, and the data are written to the dual-channel memory at the address given by the thread; otherwise,
A4. The output valid signal terminal is set to 0, and the write operation fails;
B. Read operation in blocking mode
B1. When several threads from the information channels simultaneously request to read data from the dual-channel memory, the arbiter selects one of the threads and outputs the address signal provided by that thread; this address signal is also used to select a byte of the dual-channel memory;
B2. The lowest 1-2 bits of the address signal select one multicast counter group, and the remaining bits of the address signal select the corresponding counter within that group; the value of this counter is checked for being greater than 0;
B3. If it is greater than 0, the counter value is decremented by 1, the output valid signal terminal is set to 1, and the specified byte of the dual-channel memory is read at the address given by the thread; otherwise,
B4. The output valid signal terminal is set to 0, and the read operation fails;
C. Write operation in non-blocking mode
C1. When several threads from the information channels simultaneously request to write data to the dual-channel memory, the arbiter selects one of the threads and outputs the address signal provided by that thread;
C2. The lowest 1-2 bits of the address signal select one multicast counter group, and the remaining bits of the address signal select the corresponding counter within that group;
C3. The multicast count provided by the thread is written to the counter, the output valid signal terminal is set to 1, and the data are written to the dual-channel memory at the address given by the thread;
D. Read operation in non-blocking mode
D1. When several threads from the information channels simultaneously request to read data from the dual-channel memory, the arbiter selects one of the threads and outputs the address signal provided by that thread; this address signal is also used to select a byte of the dual-channel memory;
D2. The lowest 1-2 bits of the address signal select one multicast counter group, and the remaining bits of the address signal select the corresponding counter within that group; the value of this counter is checked for being greater than 0;
D3. If it is greater than 0, the counter value is decremented by 1, the output valid signal terminal is set to 1, and the specified byte of the dual-channel memory is read at the address given by the thread; otherwise,
D4. The output valid signal terminal is set to 0, and the read operation fails.
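Operations A to D can be modelled in a few lines of software, which may help in following how the multicast counter gates each access. The sketch below is a behavioural approximation under stated assumptions: the class `SharedStore`, the constants `GROUP_BITS` and `COUNTERS_PER_GROUP`, and the use of a dictionary for the memory are all hypothetical, two low address bits are assumed to select the group, and arbitration is assumed to have already picked the request. The return value stands in for the output valid signal.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

GROUP_BITS = 2            # assumption: 2 low address bits select the group
NUM_GROUPS = 1 << GROUP_BITS
COUNTERS_PER_GROUP = 16   # assumption: n = 16 counters per group


def _empty_counters() -> List[List[int]]:
    return [[0] * COUNTERS_PER_GROUP for _ in range(NUM_GROUPS)]


@dataclass
class SharedStore:
    """Hypothetical behavioural model of the counter-gated memory."""
    memory: Dict[int, int] = field(default_factory=dict)
    counters: List[List[int]] = field(default_factory=_empty_counters)

    def _locate(self, address: int) -> Tuple[int, int]:
        group = address & (NUM_GROUPS - 1)                 # low bits: group
        index = (address >> GROUP_BITS) % COUNTERS_PER_GROUP  # other bits
        return group, index

    def write(self, address: int, data: int,
              multicast_count: int, blocking: bool) -> bool:
        group, index = self._locate(address)
        # A2/A4: a blocking write fails while readers are still pending.
        if blocking and self.counters[group][index] != 0:
            return False
        # A3/C3: record the number of expected readers and store the data.
        self.counters[group][index] = multicast_count
        self.memory[address] = data
        return True

    def read(self, address: int) -> Tuple[bool, Optional[int]]:
        group, index = self._locate(address)
        # B2-B4 and D2-D4 are identical in the source text.
        if self.counters[group][index] > 0:
            self.counters[group][index] -= 1
            return True, self.memory.get(address)
        return False, None


store = SharedStore()
store.write(0x10, 99, multicast_count=2, blocking=True)
print(store.read(0x10), store.read(0x10), store.read(0x10))
```

In this model a message written with a multicast count of 2 can be read exactly twice; the third read reports failure, and only then does a blocking write to the same address succeed again. A non-blocking write skips the check and overwrites the counter and the data.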
The invention provides an information transmission mechanism among the threads of parallel processors and realizes multicast transmission of information. A memory-access mutual-exclusion check usually takes only one clock cycle, which greatly reduces communication overhead and power consumption. The structure and method of the invention also enable efficient parallel computation on programmable processor arrays.
Brief description of the drawings
Figure 1 is a structural diagram of the invention;
Figure 2 is a structural diagram of a multicast counter group in the invention;
Figure 3 is a flowchart of the write operation in blocking mode in the invention;
Figure 4 is a flowchart of the read operation in blocking mode and non-blocking mode in the invention;
Figure 5 is a flowchart of the write operation in non-blocking mode in the invention.
Embodiments
Embodiment 1: referring to Figures 1 and 2, a shared storage structure in a parallel processor is characterized in that it comprises: several information channels, a dual-channel read/write memory, several multicast counter groups, and an arbiter;
Each of the information channels connects to multiple parallel processors, and each parallel processor executes 2-16 threads; each information channel includes: an address signal terminal, a memory enable signal terminal, a read/write select signal terminal, a byte select signal terminal, an input data signal terminal, an output data signal terminal, and an output valid signal terminal; the input data signal terminal comprises: an input data terminal, a multicast count terminal, and a blocking/non-blocking mode terminal;
The dual-channel read/write memory stores the data transferred between the threads; its data width is 4 bytes, it can be accessed by byte, and the data are 32-bit single-precision words or 64-bit double-precision words;
Each of the multicast counter groups contains n counters; each counter records multicast count information, is selected by the address signal, and has its read, write, and decrement operations controlled by corresponding signals;
The arbiter executes a round-robin arbitration algorithm, selecting one thread at a time to read or write the dual-channel read/write memory.
Embodiment 2: referring to Figures 1 and 2, the shared storage structure in a parallel processor as described in Embodiment 1 is characterized in that:
The number of information channels is 4, and each information channel connects 4 parallel processors;
The number of multicast counter groups is 16, each multicast counter group contains 16 counters, and each counter is 3-4 bits wide, supporting data multicast among 6-8 threads;
The information-transfer shared storage structure occupies part of the data storage address space of the parallel processors and is mapped to the highest-address portion of that data space.
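The figures in this embodiment can be checked with a little arithmetic. The sketch below only restates the numbers given above; the helper names `max_multicast_count` and `bits_needed` are hypothetical, and the reading that a counter must hold the number of receiving threads is an assumption drawn from the method steps.

```python
NUM_CHANNELS = 4
PROCESSORS_PER_CHANNEL = 4
NUM_GROUPS = 16
COUNTERS_PER_GROUP = 16

TOTAL_PROCESSORS = NUM_CHANNELS * PROCESSORS_PER_CHANNEL
TOTAL_COUNTERS = NUM_GROUPS * COUNTERS_PER_GROUP


def max_multicast_count(counter_bits: int) -> int:
    """Largest count an unsigned counter of the given width can hold."""
    return (1 << counter_bits) - 1


def bits_needed(receivers: int) -> int:
    """Smallest counter width that can hold the given receiver count."""
    return max(1, receivers.bit_length())


print(TOTAL_PROCESSORS, TOTAL_COUNTERS)
print(max_multicast_count(3), max_multicast_count(4))
print(bits_needed(6), bits_needed(8))
```

Under that reading, a 3-bit counter is enough for a multicast to 6 threads, while a multicast to 8 threads needs the 4-bit width, which matches the stated 3-4 bit range. With 2-16 threads per processor, the 16 processors run between 32 and 256 threads and share 256 counters.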
Embodiment 3: referring to Figures 3, 4, and 5, a shared storage method in a parallel processor uses the shared storage structure in a parallel processor described in Embodiment 1 or 2 and performs the following operations:
A. Write operation in blocking mode
A1. When several threads from the information channels simultaneously request to write data to the dual-channel memory, the arbiter selects one of the threads and outputs the address signal provided by that thread;
A2. The lowest 1-2 bits of the address signal select one multicast counter group, and the remaining bits of the address signal select the corresponding counter within that group; the value of this counter is checked for 0;
A3. If it is 0, the multicast count provided by the thread is written to the counter, the output valid signal terminal is set to 1, and the data are written to the dual-channel memory at the address given by the thread; otherwise,
A4. The output valid signal terminal is set to 0, and the write operation fails;
B. Read operation in blocking mode
B1. When several threads from the information channels simultaneously request to read data from the dual-channel memory, the arbiter selects one of the threads and outputs the address signal provided by that thread; this address signal is also used to select a byte of the dual-channel memory;
B2. The lowest 1-2 bits of the address signal select one multicast counter group, and the remaining bits of the address signal select the corresponding counter within that group; the value of this counter is checked for being greater than 0;
B3. If it is greater than 0, the counter value is decremented by 1, the output valid signal terminal is set to 1, and the specified byte of the dual-channel memory is read at the address given by the thread; otherwise,
B4. The output valid signal terminal is set to 0, and the read operation fails;
C. Write operation in non-blocking mode
C1. When several threads from the information channels simultaneously request to write data to the dual-channel memory, the arbiter selects one of the threads and outputs the address signal provided by that thread;
C2. The lowest 1-2 bits of the address signal select one multicast counter group, and the remaining bits of the address signal select the corresponding counter within that group;
C3. The multicast count provided by the thread is written to the counter, the output valid signal terminal is set to 1, and the data are written to the dual-channel memory at the address given by the thread;
D. Read operation in non-blocking mode
D1. When several threads from the information channels simultaneously request to read data from the dual-channel memory, the arbiter selects one of the threads and outputs the address signal provided by that thread; this address signal is also used to select a byte of the dual-channel memory;
D2. The lowest 1-2 bits of the address signal select one multicast counter group, and the remaining bits of the address signal select the corresponding counter within that group; the value of this counter is checked for being greater than 0;
D3. If it is greater than 0, the counter value is decremented by 1, the output valid signal terminal is set to 1, and the specified byte of the dual-channel memory is read at the address given by the thread; otherwise,
D4. The output valid signal terminal is set to 0, and the read operation fails.
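The text states that an access fails when the output valid signal is 0, but it does not say what the requesting thread does next. One plausible reading, offered here purely as an assumption, is that the thread simply reissues the request until it is accepted. The sketch below shows that retry loop from the thread's side; `retry_until_valid` and `make_fake_port` are hypothetical names, and the fake port merely imitates a channel that reports failure a fixed number of times.

```python
from typing import Callable, Optional, Tuple

Attempt = Callable[[], Tuple[bool, Optional[int]]]


def retry_until_valid(attempt: Attempt,
                      max_tries: int) -> Tuple[bool, Optional[int], int]:
    """Reissue a request until the (assumed) valid flag is 1 or tries run out.

    Returns (valid, data, number_of_tries_used)."""
    for tries in range(1, max_tries + 1):
        valid, data = attempt()
        if valid:
            return True, data, tries
    return False, None, max_tries


def make_fake_port(fail_times: int, value: int) -> Attempt:
    """Stand-in for a channel: fails fail_times times, then returns value."""
    state = {"left": fail_times}

    def attempt() -> Tuple[bool, Optional[int]]:
        if state["left"] > 0:
            state["left"] -= 1
            return False, None   # output valid = 0
        return True, value       # output valid = 1

    return attempt


print(retry_until_valid(make_fake_port(2, 42), max_tries=5))
```

Bounding the number of tries is a choice made for this sketch so that it always terminates; real hardware might instead stall the thread until the request is accepted.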
Claims (3)
1. A shared storage structure in a parallel processor, characterized in that it comprises: several information channels, a dual-channel read/write memory, several multicast counter groups, and an arbiter;
Each of the information channels connects to multiple parallel processors, and each parallel processor executes 2-16 threads; each information channel includes: an address signal terminal, a memory enable signal terminal, a read/write select signal terminal, a byte select signal terminal, an input data signal terminal, an output data signal terminal, and an output valid signal terminal; the input data signal terminal comprises: an input data terminal, a multicast count terminal, and a blocking/non-blocking mode terminal;
The dual-channel read/write memory stores the data transferred between the threads; its data width is 4 bytes, it can be accessed by byte, and the data are 32-bit single-precision words or 64-bit double-precision words;
Each of the multicast counter groups contains n counters; each counter records multicast count information, is selected by the address signal, and has its read, write, and decrement operations controlled by corresponding signals;
The arbiter executes a round-robin arbitration algorithm, selecting one thread at a time to read or write the dual-channel read/write memory.
2. The shared storage structure in a parallel processor according to claim 1, characterized in that:
The number of information channels is 4, and each information channel connects 4 parallel processors;
The number of multicast counter groups is 16, each multicast counter group contains 16 counters, and each counter is 3-4 bits wide, supporting data multicast among 6-8 threads;
The information-transfer shared storage structure occupies part of the data storage address space of the parallel processors and is mapped to the highest-address portion of that data space.
3. A shared storage method in a parallel processor, which uses the shared storage structure in a parallel processor according to claim 1 or 2 and performs the following operations:
A. Write operation in blocking mode
A1. When several threads from the information channels simultaneously request to write data to the dual-channel memory, the arbiter selects one of the threads and outputs the address signal provided by that thread;
A2. The lowest 1-2 bits of the address signal select one multicast counter group, and the remaining bits of the address signal select the corresponding counter within that group; the value of this counter is checked for 0;
A3. If it is 0, the multicast count provided by the thread is written to the counter, the output valid signal terminal is set to 1, and the data are written to the dual-channel memory at the address given by the thread; otherwise,
A4. The output valid signal terminal is set to 0, and the write operation fails;
B. Read operation in blocking mode
B1. When several threads from the information channels simultaneously request to read data from the dual-channel memory, the arbiter selects one of the threads and outputs the address signal provided by that thread; this address signal is also used to select a byte of the dual-channel memory;
B2. The lowest 1-2 bits of the address signal select one multicast counter group, and the remaining bits of the address signal select the corresponding counter within that group; the value of this counter is checked for being greater than 0;
B3. If it is greater than 0, the counter value is decremented by 1, the output valid signal terminal is set to 1, and the specified byte of the dual-channel memory is read at the address given by the thread; otherwise,
B4. The output valid signal terminal is set to 0, and the read operation fails;
C. Write operation in non-blocking mode
C1. When several threads from the information channels simultaneously request to write data to the dual-channel memory, the arbiter selects one of the threads and outputs the address signal provided by that thread;
C2. The lowest 1-2 bits of the address signal select one multicast counter group, and the remaining bits of the address signal select the corresponding counter within that group;
C3. The multicast count provided by the thread is written to the counter, the output valid signal terminal is set to 1, and the data are written to the dual-channel memory at the address given by the thread;
D. Read operation in non-blocking mode
D1. When several threads from the information channels simultaneously request to read data from the dual-channel memory, the arbiter selects one of the threads and outputs the address signal provided by that thread; this address signal is also used to select a byte of the dual-channel memory;
D2. The lowest 1-2 bits of the address signal select one multicast counter group, and the remaining bits of the address signal select the corresponding counter within that group; the value of this counter is checked for being greater than 0;
D3. If it is greater than 0, the counter value is decremented by 1, the output valid signal terminal is set to 1, and the specified byte of the dual-channel memory is read at the address given by the thread; otherwise,
D4. The output valid signal terminal is set to 0, and the read operation fails.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510346816.4A (CN104899008B) | 2015-06-23 | 2015-06-23 | Shared storage structure and method in parallel processor |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104899008A | 2015-09-09 |
CN104899008B | 2018-10-12 |
Family
Family ID: 54031687
Family Applications (1)
Application Number | Status | Priority Date | Filing Date | Title |
---|---|---|---|---|
CN201510346816.4A (CN104899008B) | Expired - Fee Related | 2015-06-23 | 2015-06-23 | Shared storage structure and method in parallel processor |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104899008B (en) |
Legal Events
Code | Title | Description |
---|---|---|
C06 | Publication | |
PB01 | Publication | |
C10 | Entry into substantive examination | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20181012 |