WO2013005343A1 - Apparatus and method for a marker guided data transfer between a single memory and an array of memories with unevenly distributed data amount in an simd processor system - Google Patents


Info

Publication number
WO2013005343A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
processing element
marker
memory
single memory
Application number
PCT/JP2011/065739
Other languages
French (fr)
Inventor
Hanno Lieske
Original Assignee
Renesas Electronics Corporation
Application filed by Renesas Electronics Corporation
Priority to PCT/JP2011/065739
Priority to TW101122857A (TWI512614B)
Publication of WO2013005343A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/38 Information transfer, e.g. on bus
    • G06F13/42 Bus transfer protocol, e.g. handshake; Synchronisation
    • G06F13/4204 Bus transfer protocol, e.g. handshake; Synchronisation on a parallel bus
    • G06F13/4234 Bus transfer protocol, e.g. handshake; Synchronisation on a parallel bus being a memory bus
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/76 Architectures of general purpose stored program computers
    • G06F15/80 Architectures of general purpose stored program computers comprising an array of processing units with common control, e.g. single instruction multiple data processors
    • G06F15/8007 Architectures of general purpose stored program computers comprising an array of processing units with common control, e.g. single instruction multiple data processors single instruction multiple data [SIMD] multiprocessors
    • G06F15/8015 One dimensional arrays, e.g. rings, linear arrays, buses

Definitions

  • the present invention relates to data transfer between a single memory and an array of memories in an SIMD processor system. More particularly, it relates to a fast data transfer, with low implementation cost and only a small increase in the amount of transferred data, for the case where the amount of data in each memory of the memory array is unevenly distributed.
  • the amount of data compressed in the memory of each PE of the PE array can be different.
  • this situation could occur in the following case.
  • Image data taken with a CCD camera or a CMOS sensor is processed in parallel by a plurality of PEs.
  • image compression is executed by the PEs. Because the compression ratio can differ between portions of the image data, the amount of compressed data in the memory of each PE of the PE array can differ.
  • the amount of data to be transferred to the single memory depends on the largest amount of data stored in any of the memories of the memory array, because that largest amount determines the number of data transfers needed to move all necessary data between the memory array and the single memory.
  • Fig. 18 shows the structure of the architecture used to explain the data transfer between an internal memory array and a single external memory in case of evenly distributed data in the memory array as presented in NPL 1.
  • the architecture consists of an array of PEs with memory 14.
  • the array is composed of PEs 11 and memory elements 12, which are grouped into groups 13 of four "PE with memory element" units.
  • Data is transferred between the internal memory array and a single external memory 18 over a bus system 15 which is a pipelined ring bus.
  • the registers 16 are arranged on the ring bus in such a way that either a group of PEs or the control unit 17 is connected to the bus 15 between two registers.
  • in NPL 1, for the write transfer from internal to external memory, the evenly distributed data in the memory elements of the internal memory array is accessed at the same time.
  • the read data from each memory element is then stored into the registers on the ring bus from where they are successively transferred to the external memory.
  • the data is successively read element wise from external memory and stored in the registers on the ring bus from where the data is finally stored at the same time into the memory elements of the internal memory array.
  • S. yo, et al., "A Low-Cost Mixed-Mode Parallel Processor Architecture for Embedded Systems"
  • PTL 1 (H06-75929) discloses a parallel processing device in which one processing element transmits its load to other processing elements, thereby distributing the load among the PEs.
  • PTL 2 (H05-94425) discloses a task management method that reduces the time required for load allocation.
  • PTL 3 (WO2009/131007) discloses an SIMD parallel computer system that equalizes the processing load among the PEs.
  • the present invention has been made in view of the above-mentioned problems, and an object of the present invention is to reduce the amount of data that has to be stored in the single memory, compared with the line-wise transfer described above, when the data stored in the memory elements of the memory array is unevenly distributed.
  • a data transfer apparatus comprising:
  • a processing element array that has multiple processing elements controlled in a Single Instruction Multiple Data style; memory elements that are provided inside each of the processing elements, data access to all the memory elements of the processing elements being done in parallel;
  • control unit controlling the processing element array in the Single Instruction Multiple Data style
  • an end marker setting unit that is responsible to set an end marker at an end of a data stream stored inside the memory elements
  • the marker evaluation unit for write direction has the task to delete the data which is transferred from that processing element in the following rows, and
  • the marker evaluation unit for read direction has the task to insert data for that processing element in the following rows.
  • unevenly distributed data inside the memory elements of the PE array can be transferred quickly and efficiently with small hardware implementation costs
  • Fig. 1 shows the architecture of the SIMD processor.
  • Fig. 2 shows an end marker setting unit.
  • Fig. 3 shows an example of state where the end markers have been added to data stored inside the each memory.
  • Fig. 4A shows a situation where the memory is divided into sections of 4 bytes and 9 bytes of data are stored inside the memory.
  • FIG. 4B shows the end marker set at an unaligned position at the end of the data stream.
  • FIG. 4C shows the end marker set at an aligned position at the end of the data stream.
  • Fig. 5 shows the marker evaluation unit for write direction of the marker evaluation apparatus.
  • Fig. 6A shows the transition of flag values inside the flag register.
  • Fig. 6B shows the transition of flag values inside the flag register.
  • Fig. 7 shows data that is stored in the PE array.
  • FIG. 8 shows the data after the end markers have been set inside each memory.
  • Fig. 9 shows a flowchart that is executed in the marker evaluation unit for write direction 420.
  • Fig. 10 shows a flowchart that is executed in the marker evaluation unit for write direction 420.
  • Fig. 11 shows the data 1001 in the external single memory after the transfer is finished.
  • Fig. 12 shows the marker evaluation unit for read direction of the marker evaluation apparatus.
  • Fig. 13 shows the operation of the selection switch.
  • Fig. 14 shows the stored data in the external memory and output data 1102 to the PE array.
  • FIG. 15 shows data that are transferred from the single external memory to the PE array.
  • Fig. 16 shows a possible system design in which the SIMD processor with the example architecture could operate.
  • Fig. 17 shows the case where an end marker setting unit 1302 is placed next to the marker evaluation apparatus 1301 into the control unit 1300.
  • Fig. 18 shows the structure of an architecture used to explain the data transfer between an internal memory array and a single external memory in case of evenly distributed data in the memory array as presented in NPL 1.
  • Fig. 1 shows the architecture of the SIMD processor 100.
  • the architecture of the SIMD processor 100 in Fig. 1 has an array 200 of PEs 220.
  • four PEs 220 compose one group 210.
  • each PE 220 has an end marker setting unit 240.
  • Fig. 2 shows the end marker setting unit 240.
  • the end marker setting unit 240 adds in each PE 220 an end marker at the end of the data stream which should be transferred from the memory 230 of the PE 220 to a single memory 500.
  • the setting of the end marker can be either at an unaligned position (Fig. 4B) or an aligned position (Fig. 4C).
  • Fig. 3 shows an example of a state where the end markers 600 have been added to data 231 stored inside each memory 230.
  • the memory is divided into sections of 4 bytes and 9 bytes of data are stored inside the memory, as shown in Fig. 4A.
  • Fig. 4B shows the end marker set at an unaligned position at the end of the data stream.
  • Fig. 4C shows the end marker set at an aligned position at the end of the data stream.
  • the data output selector 241 selects whether the end marker or the input data is passed to the data output. Data from the memory 230 is input to the end marker setting unit 240 sequentially.
  • the end marker setting unit 240 determines whether the input data from the memory 230 is the end data (the last data) or not.
  • the data output selector 241 adds the end marker 600 to the end data.
  • the data output selector 241 lets the input data pass without any change.
  • Data is transferred between the PE array 200 and the single external memory 500 over a bus system 300 which is in this embodiment a pipelined ring bus.
  • Registers (shift registers) 310 are arranged on the ring bus 300 in such a way that, between two registers 310, either a group of PEs 210 or the control unit 400 is connected to the ring bus 300.
  • the ring bus 300 has a width of 128 bits, and each line 250 that connects a PE 220 to the ring bus 300 has a width of 32 bits.
  • the control unit 400 has a marker evaluation apparatus 410, which has a marker evaluation unit for write direction 420 and a marker evaluation unit for read direction 430.
  • Inside the marker evaluation apparatus 410, transferred data passes through either the marker evaluation unit for write direction 420 or the marker evaluation unit for read direction 430.
  • Fig. 5 shows the marker evaluation unit for write direction 420 of the marker evaluation apparatus 410.
  • Data transferred from the memories 230 via the ring bus 300 is taken into the control unit 400.
  • the data taken into the control unit 400 is input to the marker evaluation unit for write direction 420.
  • a comparator 421 is provided in the marker evaluation unit for write direction 420.
  • the comparator 421 has an inverter at an output terminal.
  • an end marker code is input to the comparator 421.
  • the data input is compared with the end marker code in a comparator 421.
  • the result is stored in a flag register 422 which is provided at the latter stage of the comparator 421.
  • the output of the flag register 422 controls a switch 423, whose task is to let the input data pass to the output buffer 424 only if no end marker has previously been detected for that PE. If an end marker has already been detected for that PE, no data is allowed to pass.
  • Fig. 6A and Fig. 6B show the transition of flag values inside the flag register 422.
  • the flag register 422 stores a flag status for each PE 220, which represents whether the end marker of that PE 220 has passed or not. As shown in Fig. 6A, all flag values are initially "1".
  • the comparator 421 outputs a low-level signal.
  • the flag value for PE6 is changed to "0" as shown in Fig. 6B.
  • the switch 423 opens and does not pass data from PE6.
  • Data from the switch is stored temporarily in an output buffer 424.
  • the output buffer 424 has a capacity of 128 bytes.
  • the status of the output buffer 424 is checked in a comparator 425 whether it is full or not. In the case that the output buffer 424 is full, the data is sent to the single memory 500 by switching on a switch 426 and the buffer 424 is emptied by switching on a switch 427.
  • the end marker setting unit 240 adds the end marker to the data stored in its own memory.
  • the end marker is set either unaligned or aligned, as shown in Fig. 4B and Fig. 4C, respectively.
  • Each PE 220 outputs the data stored in its own memory to the ring bus 300 sequentially.
  • the data output from each PE 220 is transferred to the control unit 400, which takes in the data (ST100). Every time the control unit 400 takes in data, the flowcharts of Fig. 9 and Fig. 10 are executed in the marker evaluation unit for write direction 420.
  • Received data (ST100) is compared with the end marker code by the comparator 421 (ST110). The result is output to the flag register 422 and updates the flag of the PE 220 to which the data element belongs.
  • the flag value specifies whether the end marker has been transferred with this data element or not.
  • when the received data matches the end marker code, the flag value is changed to "0" (ST120).
  • otherwise, the flag value is kept at "1" and the next data is received (ST100).
  • the flag value is read out of the flag register 422 (ST200), and the data is transferred to the output buffer 424 depending on the flag value. This selection is performed with the switch 423.
  • when the flag value is "1", the data is transferred to the output buffer 424 (ST230).
  • when the flag value is "0", the data is not transferred (ST220). In other words, once an end marker has been detected for a certain PE 220, the data from this PE in the following rows is automatically skipped.
  • Fig. 11 shows the data 1001 in the external single memory 500 after the transfer is finished.
  • the first data of each PE 220 is transferred to the external single memory 500 starting from the left side, then, the following rows are transferred.
  • the end marker of PE6 is detected in the second row; therefore, data from PE6 is skipped in the third row.
  • the end marker of PE3 is detected in the third row; therefore, data from PE3 is skipped in the fourth row.
  • the output buffer 424 is checked to see whether all places are filled with elements (ST250). When the output buffer 424 is full (ST250: YES), the data stored in it is sent to the single external memory 500 and its content is cleared.
  • unevenly distributed data inside the memory elements of the PE array 200 can be transferred quickly and efficiently, with small hardware implementation costs and only a small increase in the amount of data transferred to the single memory, because the end markers are set in advance and, once the end marker is detected for a certain PE 220, the data from this PE in the following rows can be skipped automatically.
  • Fig. 12 shows the marker evaluation unit for read direction 430 of the marker evaluation apparatus 410. Data is transferred from the external memory 500 to the control unit 400.
  • the data 1001 in the external single memory 500 is already processed so that the end marker is added to appropriate positions, after data from PEs are transferred in the manner described in the first embodiment.
  • the data received into the control unit 400 is input to the marker evaluation unit for read direction 430.
  • a comparator 431, a flag register 432, an output buffer 434, a comparator 425, a switch 436, and a switch 437 are fundamentally equal to the corresponding parts of the marker evaluation unit for write direction 420 of the first embodiment.
  • a selection switch 433 is provided in the marker evaluation unit for read direction 430.
  • the output of the flag register 432 controls the selection switch 433.
  • the selection switch 433 has the task to let the input data pass to the output buffer 434 if no end marker has previously been detected for that PE. If an end marker has already been detected, zero data is passed to the output buffer 434 for that PE instead.
  • the operation of the flag register 432 is fundamentally equal to the operation of the flag register 422 of the marker evaluation unit for write direction 420. Fig. 9 and the explanation thereof can be applied to the flag register 432.
  • Fig. 13 shows the operation of the selection switch 433.
  • the information of the flag register 432 is read out of the flag register (ST300), and the data input from the external memory 500 is transferred to the output buffer depending on the flag value. This selection is performed in the selection switch 433. When the flag value is "1" (ST310: NO), the data is transferred to the output buffer 434 (ST330). When the flag value is "0" (ST310: YES), zero data is transferred to the output buffer 434 instead (ST320).
  • Fig. 14 shows the data 1101 stored in the external memory 500 and the output data 1102 to the PE array 200. Starting from the left side, data elements are transferred sequentially to each memory of the PE array 200.
  • this end marker is the last data from the external memory 500 which is transferred for this PE to the PE array 200.
  • data from the external memory 500 can be stored in an unevenly distributed form inside the memory units of the PE array 200 efficiently with small hardware implementation costs.
  • Fig. 16 shows a possible system design in which the SIMD processor 1202 with the example architecture could operate.
  • Other units inside the system could be a central processing unit 1201 and a single memory element 1203, which are all connected over connections 1205 to a bus system 1204.
  • Fig. 17 shows the case where the end marker setting units are taken out of each PE and one end marker setting unit (global marker setting unit) 1302 is placed next to the marker evaluation apparatus 1301 into the control unit 1300, responsible to set the end markers in all single memory elements of the memory array on request of the responsible processing elements.
  • the present invention can be applied to a method and an apparatus for image processing, and the image data can be acquired with a camera, a laser probe, or via the Internet.
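The read-direction bullets above (flag register 432, selection switch 433, zero data for PEs whose marker has passed) can be sketched in software. This is a minimal model built on several assumptions. END_MARKER is a hypothetical marker code, and the filler value 0 stands for the inserted zero data. Rows are rebuilt until every PE has received its marker. The input stream is laid out row by row in PE order, the way the write direction would store it. All function and variable names are hypothetical.

```python
END_MARKER = 0xFFFF_FFFF  # hypothetical end marker code (not given in the source)
ZERO_DATA = 0             # filler inserted for PEs whose end marker has passed


def read_direction(stream: list[int], n_pe: int) -> list[list[int]]:
    """Rebuild the rows sent to the PE array from a marker-delimited stream
    read from the single memory; rows[r][pe] is the word for PE `pe` in row r."""
    flags = [1] * n_pe          # flag register 432: 1 = marker not yet seen
    words = iter(stream)
    rows: list[list[int]] = []
    while any(flags):           # continue until every PE has had its marker
        row: list[int] = []
        for pe in range(n_pe):
            if flags[pe] == 0:  # selection switch 433 inserts zero data
                row.append(ZERO_DATA)
                continue
            try:
                word = next(words)  # selection switch 433 passes memory data
            except StopIteration:
                raise ValueError("stream ended before every PE received its end marker") from None
            row.append(word)
            if word == END_MARKER:  # comparator 431: this was the PE's last word
                flags[pe] = 0
        rows.append(row)
    return rows


M = END_MARKER
# A stream as the write direction would leave it in the single memory (3 PEs).
stream = [10, 20, 30, 11, M, 31, 12, M, M]
rows = read_direction(stream, n_pe=3)
# rows == [[10, 20, 30], [11, M, 31], [12, 0, M], [M, 0, 0]]
```

After such a round trip, each PE receives its words up to and including its marker. Any stale contents behind the marker are replaced by zero data.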

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • Multi Processors (AREA)
  • Image Processing (AREA)

Abstract

An end marker setting unit sets an end marker at the end of a data stream stored inside memory elements. When data is transferred from a processing element array to a single memory over a bus system and the end marker is detected for a certain processing element, a marker evaluation unit for write direction deletes the data transferred from that processing element in the following rows. When data is transferred from the single memory to the processing element array and the end marker for a certain processing element is detected, a marker evaluation unit for read direction inserts data for that processing element in the following rows.

Description

Title of Invention
APPARATUS AND METHOD FOR A MARKER GUIDED DATA TRANSFER BETWEEN A SINGLE MEMORY AND AN ARRAY OF MEMORIES WITH UNEVENLY DISTRIBUTED DATA AMOUNT IN AN SIMD PROCESSOR SYSTEM
Technical Field
[0001]
The present invention relates to data transfer between a single memory and an array of memories in an SIMD processor system. More particularly, it relates to a fast data transfer, with low implementation cost and only a small increase in the amount of transferred data, for the case where the amount of data in each memory of the memory array is unevenly distributed.
Background Art
[0002]
When, e.g., a compression algorithm is processed on the processing elements (PEs) of an SIMD processor, the amount of compressed data in the memory of each PE of the PE array can differ.
For example, this situation can occur in the following case. Image data taken with a CCD camera or a CMOS sensor is processed in parallel by a plurality of PEs. As an example of such image processing, the PEs execute image compression. Because the compression ratio can differ between portions of the image, the amount of compressed data in the memory of each PE of the PE array can differ.
[0003]
When the unevenly distributed data is transferred to a single memory over a bus system that the memories of the array can only access in parallel, the amount of data transferred to the single memory depends on the largest amount of data stored in any memory of the memory array. That largest amount determines the number of data transfers needed to move all necessary data between the memory array and the single memory.
[0004]
For unevenly distributed data in the memory array, there is a point in time at which some memories have already transferred all of their compressed data while other memories still have further data to transfer.
Because of the SIMD-style data transfer, however, all memories are accessed at the same time, e.g. to read out the same amount of data, which is then transferred over the bus system to the single memory. As a result, a large amount of overhead data is transferred, which, in the compression example, reduces the achievable compression factor.
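The overhead described in [0003] and [0004] can be made concrete with a little arithmetic. The sketch below is a minimal illustration. The per-PE data lengths are hypothetical and counted in bus words. For the marker-guided comparison, it assumes that each end marker occupies exactly one word. Neither the lengths nor the marker size come from the source, and the function names are hypothetical.

```python
def lockstep_words(lengths: list[int]) -> int:
    """Words stored when all PEs are read in lockstep until the longest stream ends."""
    return len(lengths) * max(lengths)


def useful_words(lengths: list[int]) -> int:
    """Words that actually carry compressed data."""
    return sum(lengths)


def marker_guided_words(lengths: list[int]) -> int:
    """Words stored if each PE's data is followed by one end-marker word and
    everything after the marker is skipped (assumption: 1 word per marker)."""
    return sum(lengths) + len(lengths)


# Hypothetical compressed lengths for four PEs, in bus words.
lengths = [5, 2, 7, 3]
print(lockstep_words(lengths))       # 4 PEs x 7 words = 28
print(useful_words(lengths))         # 5 + 2 + 7 + 3 = 17
print(marker_guided_words(lengths))  # 17 + 4 markers = 21
```

In this example, the lockstep scheme stores 11 words of padding. Under the one-word assumption, the marker-guided scheme stores only the 4 markers on top of the useful data.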
[0005]
For the case of evenly distributed data in the memory array, there exists a solution to transfer the data between an internal memory array and a single external memory over a ring bus as described in NPL 1.
Fig. 18 shows the structure of the architecture used to explain the data transfer between an internal memory array and a single external memory in case of evenly distributed data in the memory array as presented in NPL 1.
[0006]
The architecture consists of an array 14 of PEs with memory. The array is composed of PEs 11 and memory elements 12, which are grouped into groups 13 of four "PE with memory element" units. Data is transferred between the internal memory array and a single external memory 18 over a bus system 15, which is a pipelined ring bus.
The registers 16 are arranged on the ring bus in such a way that either a group of PEs or the control unit 17 is connected to the bus 15 between two registers.
[0007]
In NPL1, for the write transfer from internal to external memory, the evenly distributed data in the memory elements of the internal memory array is accessed at the same time.
The read data from each memory element is then stored into the registers on the ring bus from where they are successively transferred to the external memory.
For the read direction, the data is successively read element wise from external memory and stored in the registers on the ring bus from where the data is finally stored at the same time into the memory elements of the internal memory array.
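For evenly distributed data, the line-wise scheme of [0007] amounts to a transpose between the PE memories and the external memory. The following is a minimal sketch. It models the ring bus registers as a plain Python list that holds one word per memory element and ignores timing. The function names are hypothetical.

```python
def write_linewise(pe_memories: list[list[int]]) -> list[int]:
    """Internal -> external: read one line from all memory elements at once,
    then shift the words of that line out to the external memory one by one."""
    depth = len(pe_memories[0]) if pe_memories else 0
    if any(len(m) != depth for m in pe_memories):
        raise ValueError("this scheme assumes evenly distributed data")
    external: list[int] = []
    for line in range(depth):
        registers = [m[line] for m in pe_memories]  # parallel access to every element
        external.extend(registers)                  # successive, element-wise transfer
    return external


def read_linewise(external: list[int], n_pe: int) -> list[list[int]]:
    """External -> internal: fill the registers element-wise, then store a
    complete line into all memory elements at the same time."""
    if len(external) % n_pe:
        raise ValueError("external data must consist of whole lines")
    pe_memories: list[list[int]] = [[] for _ in range(n_pe)]
    for start in range(0, len(external), n_pe):
        registers = external[start:start + n_pe]   # words read one after another
        for pe, word in enumerate(registers):      # one parallel store per line
            pe_memories[pe].append(word)
    return pe_memories


memories = [[1, 2], [3, 4], [5, 6]]          # three memory elements, two lines each
stream = write_linewise(memories)
print(stream)                                 # [1, 3, 5, 2, 4, 6]
print(read_linewise(stream, 3) == memories)   # True
```

The length check in `write_linewise` marks exactly where this scheme breaks down for unevenly distributed data.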
Citation List
Patent Literature
[0008]
PTL 1: Japanese unexamined patent application publication No.H06-75929
PTL 2: Japanese unexamined patent application publication No.H05-94425
PTL 3: International Patent Publication No. WO2009/131007
Non Patent Literature
[0009] NPL 1: S. yo, et al., "A Low-Cost Mixed-Mode Parallel Processor Architecture for Embedded Systems," Proceedings of the 21st Annual International Conference on Supercomputing (ICS '07), June 2007.
Summary of Invention
Technical Problem
[0010]
The described data transfer between the internal memory array and the external single memory works for evenly distributed data without data storage overhead in the external memory. For unevenly distributed data, however, it would require such storage overhead.
This is because the internal memory array can only be accessed line-wise, not element-wise, and a line consists of as many elements as there are memory elements in the memory array. Using the line data transfer described in NPL 1 would therefore require storing line-wise data to the external memory until the required data from all internal memory elements has been transferred.
[0011]
Here, PTL 1 (H06-75929) discloses a parallel processing device in which one processing element transmits its load to other processing elements, thereby distributing the load among the PEs. PTL 2 (H05-94425) discloses a task management method that reduces the time required for load allocation. Furthermore, PTL 3 (WO2009/131007) discloses an SIMD parallel computer system that equalizes the processing load among the PEs. However, even when the techniques disclosed in these patent documents are employed, the above problem remains unsolved.
[0012]
The present invention has been made in view of the above-mentioned problems. An object of the present invention is to reduce the amount of data that has to be stored in the single memory, compared with the line-wise transfer described above, when the data stored in the memory elements of the memory array is unevenly distributed.
Solution of Problem
[0013]
According to an aspect of the present invention, there is provided a data transfer apparatus comprising:
a processing element array that has multiple processing elements controlled in a Single Instruction Multiple Data style; memory elements that are provided inside each of the processing elements, data access to all the memory elements of the processing elements being done in parallel;
a control unit controlling the processing element array in the Single Instruction Multiple Data style;
a data bus system connecting all of the processing elements with each other and with the control unit;
a single memory that exchanges data with the memory elements of the processing element array;
an end marker setting unit that is responsible to set an end marker at an end of a data stream stored inside the memory elements;
a marker evaluation unit for write direction; and
a marker evaluation unit for read direction,
wherein, when data is transferred from the processing element array to the single memory over the bus system and the end marker is detected for a certain processing element, the marker evaluation unit for write direction deletes the data transferred from that processing element in the following rows, and
when data is transferred from the single memory to the processing element array over the bus system and the end marker is detected for a certain processing element, the marker evaluation unit for read direction inserts data for that processing element in the following rows.
Advantageous Effects of Invention
[0014]
According to the present invention, unevenly distributed data inside the memory elements of the PE array can be transferred quickly and efficiently. The hardware implementation costs are small, and the amount of data transferred to the single memory increases only slightly. This is because the end markers are set in advance and, once the end marker is detected for a certain PE, the data from this PE in the following rows can be skipped automatically.
Brief Description of Drawings
[0015]
[Fig. 1] Fig. 1 shows the architecture of the SIMD processor.
[Fig. 2] Fig. 2 shows an end marker setting unit.
[Fig. 3] Fig. 3 shows an example of a state where end markers have been added to the data stored inside each memory.
[Fig. 4A] Fig. 4A shows a situation where the memory is divided into sections of 4 bytes and 9 bytes of data are stored inside the memory.
[Fig. 4B] Fig. 4B shows the end marker set at an unaligned position at the end of the data stream.
[Fig. 4C] Fig. 4C shows the end marker set at an aligned position at the end of the data stream.
[Fig. 5] Fig. 5 shows the marker evaluation unit for write direction of the marker evaluation apparatus.
[Fig. 6A] Fig. 6A shows the transition of flag values inside the flag register.
[Fig. 6B] Fig. 6B shows the transition of flag values inside the flag register.
[Fig. 7] Fig. 7 shows data that is stored in the PE array.
[Fig. 8] Fig. 8 shows the data after the end markers have been set inside each memory.
[Fig. 9] Fig. 9 shows a flowchart that is executed in the marker evaluation unit for write direction 420.
[Fig. 10] Fig. 10 shows a flowchart that is executed in the marker evaluation unit for write direction 420.
[Fig. 11] Fig. 11 shows the data 1001 in the external single memory after the transfer is finished.
[Fig. 12] Fig. 12 shows the marker evaluation unit for read direction of the marker evaluation apparatus.
[Fig. 13] Fig. 13 shows the operation of the selection switch.
[Fig. 14] Fig. 14 shows the data stored in the external memory and the output data 1102 to the PE array.
[Fig. 15] Fig. 15 shows data that is transferred from the single external memory to the PE array.
[Fig. 16] Fig. 16 shows a possible system design in which the SIMD processor with the example architecture could operate.
[Fig. 17] Fig. 17 shows the case where an end marker setting unit 1302 is placed next to the marker evaluation apparatus 1301 in the control unit 1300.
[Fig. 18] Fig. 18 shows the structure of an architecture used to explain the data transfer between an internal memory array and a single external memory in case of evenly distributed data in the memory array as presented in NPL 1.
Description of Embodiments
[0016]
With reference to the accompanying drawings, exemplary embodiments of the present invention will be described.
[First embodiment]
As a first embodiment, transfer of unevenly distributed data from the memory array to a single external memory will be described.
Fig. 1 shows the architecture of the SIMD processor 100. The architecture of the SIMD processor 100 in Fig. 1 has an array 200 of PEs 220. In the array 200, four PEs 220 compose one group 210. In addition to a memory 230, each PE 220 has an end marker setting unit 240.
[0017]
Fig. 2 shows the end marker setting unit 240. The end marker setting unit 240 adds in each PE 220 an end marker at the end of the data stream which should be transferred from the memory 230 of the PE 220 to a single memory 500. The setting of the end marker can be either at an unaligned position (Fig. 4B) or an aligned position (Fig. 4C).
[0018]
Fig. 3 shows an example of a state where the end markers 600 have been added to the data 231 stored inside each memory 230. As an example, consider the situation where the memory is divided into sections of 4 bytes and 9 bytes of data are stored inside the memory, as shown in Fig. 4A. In this case, Fig. 4B shows the end marker set at an unaligned position at the end of the data stream, and Fig. 4C shows the end marker set at an aligned position at the end of the data stream.
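One way to read Fig. 4B and Fig. 4C is in terms of the byte offset at which the end marker starts. The sketch below assumes that an unaligned marker begins immediately after the last data byte. It also assumes that an aligned marker begins at the next 4-byte section boundary. The figures are not reproduced here, so these offsets are an interpretation, and the function name is hypothetical.

```python
SECTION_BYTES = 4  # section size used in the Fig. 4A example


def marker_offset(data_bytes: int, aligned: bool, section: int = SECTION_BYTES) -> int:
    """Byte offset at which the end marker starts inside the memory."""
    if not aligned:
        return data_bytes                       # directly behind the last data byte
    return -(-data_bytes // section) * section  # round up to the next section boundary


print(marker_offset(9, aligned=False))  # 9: marker shares the third section with data
print(marker_offset(9, aligned=True))   # 12: marker starts the fourth section
```

Under this reading, aligned placement leaves bytes 9 to 11 unused in the 9-byte example.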
[0019]
In the end marker setting unit 240, the data output selector 241 selects whether the end marker or the input data is forwarded to the data output. Data from the memory 230 is input to the end marker setting unit 240 sequentially.
The end marker setting unit 240 determines whether the input data from the memory 230 is the end data (the last data) or not.
When the input data is the end data, the data output selector 241 appends the end marker 600 to the end data. When the input data is not the end data, the data output selector 241 lets the input data pass without any change.
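The following Python sketch illustrates the two end marker positions of Fig. 4B and Fig. 4C for the 9-byte example of Fig. 4A. It is not the patented circuit: the 4-byte marker value `END_MARKER` and the zero padding in front of an aligned marker are hypothetical choices made only for illustration.

```python
# Hypothetical sketch of the result produced by the end marker setting unit 240.
# Assumptions (not from the source): the end marker code is the 4-byte value
# END_MARKER, and bytes skipped before an aligned marker are filled with zeros.
END_MARKER = b"\xff\xff\xff\xff"  # hypothetical end marker code
SECTION = 4  # memory is divided into 4-byte sections, as in Fig. 4A


def set_end_marker(data: bytes, aligned: bool) -> bytes:
    """Return the data stream with an end marker appended after the last byte."""
    if aligned:
        # Fig. 4C: start the marker at the beginning of the next 4-byte section.
        pad = (-len(data)) % SECTION
        return data + bytes(pad) + END_MARKER
    # Fig. 4B: place the marker directly after the last data byte.
    return data + END_MARKER


nine_bytes = bytes(range(1, 10))  # 9 bytes of data, as in Fig. 4A
unaligned = set_end_marker(nine_bytes, aligned=False)
aligned = set_end_marker(nine_bytes, aligned=True)
print(unaligned.index(END_MARKER), len(unaligned))  # 9 13
print(aligned.index(END_MARKER), len(aligned))  # 12 16
```

The unaligned form wastes no space, while the aligned form lets a comparator match the marker against a whole 4-byte unit.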
[0020]
Data is transferred between the PE array 200 and the single external memory 500 over a bus system 300, which in this embodiment is a pipelined ring bus.
Registers (shift registers) 310 are arranged along the ring bus 300 in such a way that either a group of PEs 210 or the control unit 400 is connected to the ring bus 300 between two registers 310. In this embodiment, the ring bus 300 has a width of 128 bits and each line 250 that connects a PE 220 to the ring bus 300 has a width of 32 bits, so the four 32-bit lines of one group together match the 128-bit ring bus width.
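Because four 32-bit PE lines add up to the 128-bit ring bus width, one bus word can carry one data unit from every PE of a group. A minimal sketch of that packing, assuming (hypothetically) that the lowest-numbered PE of the group occupies the least significant bits; `pack_group` and `unpack_group` are illustrative names only.

```python
# Hypothetical sketch: four 32-bit PE lines 250 sharing one 128-bit ring bus word.
# Assumption (not from the source): the lowest-numbered PE of a group occupies
# the least significant 32 bits of the ring bus word.
PE_LINE_BITS = 32
RING_BUS_BITS = 128
PES_PER_GROUP = RING_BUS_BITS // PE_LINE_BITS  # 4, matching one group 210


def pack_group(words: list[int]) -> int:
    """Combine one 32-bit word from each PE of a group into a 128-bit bus word."""
    if len(words) != PES_PER_GROUP:
        raise ValueError(f"expected {PES_PER_GROUP} words, got {len(words)}")
    bus_word = 0
    for slot, word in enumerate(words):
        if not 0 <= word < 1 << PE_LINE_BITS:
            raise ValueError("each PE word must fit in 32 bits")
        bus_word |= word << (slot * PE_LINE_BITS)
    return bus_word


def unpack_group(bus_word: int) -> list[int]:
    """Split a 128-bit bus word back into the four 32-bit PE words."""
    mask = (1 << PE_LINE_BITS) - 1
    return [(bus_word >> (slot * PE_LINE_BITS)) & mask for slot in range(PES_PER_GROUP)]


word = pack_group([0x11111111, 0x22222222, 0x33333333, 0x44444444])
print(hex(word))  # 0x44444444333333332222222211111111
```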
[0021]
A control unit 400 is provided between the ring bus 300 and the external memory 500. The control unit 400 has a marker evaluation apparatus 410, which comprises a marker evaluation unit for write direction 420 and a marker evaluation unit for read direction 430.
Inside the marker evaluation apparatus 410, transferred data passes through either the marker evaluation unit for write direction 420 or the marker evaluation unit for read direction 430, depending on the transfer direction.
[0022]
Fig. 5 shows the marker evaluation unit for write direction 420 of the marker evaluation apparatus 410. Data transferred from the memories 230 via the ring bus 300 is taken into the control unit 400 and input to the marker evaluation unit for write direction 420, which contains a comparator 421. The comparator 421 has an inverter at its output terminal. In addition to the input data, an end marker code is input to the comparator 421.
[0023]
The comparator 421 compares the input data with the end marker code. The result is stored in a flag register 422 at the stage following the comparator 421. The output of the flag register 422 controls a switch 423, which passes the input data to the output buffer 424 only if no end marker has previously been detected for that PE. Once an end marker has been detected for that PE, no further data from it is allowed to pass.
[0024]
Fig. 6A and Fig. 6B show the transition of the flag values inside the flag register 422. The flag register 422 stores a flag for each PE 220 that indicates whether the end marker of that PE 220 has passed or not. As shown in Fig. 6A, all flag values are initially "1".
For example, when all data stored in PE6 has been sent and the end marker of PE6 reaches the comparator 421, the comparator outputs a low-level signal. As a result, the flag value for PE6 is changed to "0", as shown in Fig. 6B. While the flag for PE6 is "0", the switch 423 is open and does not pass data from PE6.
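The flag transitions of Fig. 6A and Fig. 6B can be modelled as a bit mask with one bit per PE. This sketch assumes a hypothetical array of eight PEs (PE0 to PE7); the class name `FlagRegister` is illustrative and not taken from the source.

```python
# Hypothetical sketch of the flag register 422 as a bit mask.
# Assumption (not from the source): 8 PEs, PE0..PE7, with bit i holding the
# flag of PE i (1 = end marker not yet seen, 0 = end marker seen).
NUM_PES = 8


class FlagRegister:
    def __init__(self, num_pes: int = NUM_PES) -> None:
        self.num_pes = num_pes
        self.bits = (1 << num_pes) - 1  # Fig. 6A: every flag starts at 1

    def flag(self, pe: int) -> int:
        return (self.bits >> pe) & 1

    def clear(self, pe: int) -> None:
        # The inverted comparator output drives the flag of this PE to 0.
        self.bits &= ~(1 << pe)

    def __str__(self) -> str:
        # List PE0 first, reading the figure from left to right.
        return "".join(str(self.flag(pe)) for pe in range(self.num_pes))


flags = FlagRegister()
print(flags)  # 11111111
flags.clear(6)  # the end marker of PE6 reaches the comparator 421
print(flags)  # 11111101
```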
[0025]
Data passed by the switch 423 is stored temporarily in an output buffer 424. In this example, the output buffer 424 has a capacity of 128 bytes.
[0026] Further, a comparator 425 checks whether the output buffer 424 is full. When the output buffer 424 is full, its data is sent to the single memory 500 by switching on a switch 426, and the buffer 424 is emptied by switching on a switch 427.
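A minimal model of the output buffer 424 and its flush logic, assuming 4-byte data units (so the 128-byte buffer holds 32 units) and representing the single memory 500 as a Python list. `OutputBuffer` is a hypothetical name, not part of the source.

```python
# Hypothetical sketch of the output buffer 424, its "full" check (comparator 425)
# and the send / clear switches 426 and 427.
# Assumptions (not from the source): 4-byte data units, and the single memory
# 500 is modelled as a Python list of units.
BUFFER_BYTES = 128
UNIT_BYTES = 4


class OutputBuffer:
    def __init__(self, single_memory: list[bytes]) -> None:
        self.capacity = BUFFER_BYTES // UNIT_BYTES  # 32 data units
        self.units: list[bytes] = []
        self.single_memory = single_memory

    def push(self, unit: bytes) -> None:
        self.units.append(unit)
        if len(self.units) == self.capacity:  # comparator 425: buffer full?
            self.single_memory.extend(self.units)  # switch 426: send the burst
            self.units.clear()  # switch 427: empty the buffer


memory: list[bytes] = []
buf = OutputBuffer(memory)
for i in range(40):
    buf.push(i.to_bytes(UNIT_BYTES, "little"))
print(len(memory), len(buf.units))  # 32 8
```

The text does not describe how a partially filled buffer is flushed at the end of a transfer, so the sketch simply leaves the remaining eight units in the buffer.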
[0027]
Next, the operation of this SIMD processor 100 will be described. As shown in Fig. 7, data is stored in the PE array 200 and is to be sent to the single external memory 500. As shown in Fig. 8, in each PE 220 the end marker setting unit 240 adds the end marker to the data stored in that PE's own memory. The end marker is set either unaligned (Fig. 4B) or aligned (Fig. 4C).
[0028]
Each PE 220 sequentially outputs the data stored in its own memory to the ring bus 300. The data output from each PE 220 is transferred to the control unit 400, which takes in the data (ST100). Every time the control unit 400 takes in data, the flowcharts of Fig. 9 and Fig. 10 are executed in the marker evaluation unit for write direction 420.
[0029]
The received data (ST100) is compared with the end marker code by the comparator 421 (ST110). The result is output to the flag register 422 and updates the flag of the PE 220 to which the data element belongs.
The flag value specifies whether the end marker has been transferred with this data element or not. When the input data equals the end marker code (ST110: YES), the flag value is changed to "0" (ST120). When the input data does not equal the end marker code (ST110: NO), the flag value is kept at "1" and the next data is received (ST100).
[0030]
The flag value is read out of the flag register 422 (ST200), and the data is transferred to the output buffer 424 depending on the flag value. This selection is performed by the switch 423.
If the flag value is "1" (ST210: NO), the data is transferred to the output buffer 424 (ST230).
If the flag value is "0" (ST210: YES), the data is not transferred (ST220). In other words, once an end marker has been detected for a certain PE 220, the data from this PE in the following rows is automatically skipped.
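The flowcharts of Fig. 9 and Fig. 10 can be sketched in Python as follows. Several details are assumptions rather than statements of the source: the string `"END"` stands in for the end marker code, a PE that has run out of data is assumed to drive a don't-care value onto the bus, and the flag is read before it is updated (as a register output would be), so the end marker itself still reaches the output buffer. That last choice is consistent with the second embodiment, where the stored data 1001 contains the end markers.

```python
# Hypothetical sketch of the marker evaluation unit for write direction 420
# (flowcharts of Fig. 9 and Fig. 10); not the patented circuit itself.
END = "END"  # hypothetical stand-in for the end marker code
IDLE = "--"  # hypothetical don't-care value driven by PEs with no data left


def ring_bus_rows(pe_memories: list[list[str]]) -> list[list[str]]:
    """Row r holds the r-th data unit of every PE, padded with IDLE values."""
    depth = max(len(m) for m in pe_memories)
    return [[m[r] if r < len(m) else IDLE for m in pe_memories] for r in range(depth)]


def evaluate_write(rows: list[list[str]]) -> list[str]:
    """Return the units that switch 423 forwards to the output buffer 424."""
    flags = [1] * len(rows[0])  # flag register 422: 1 = no end marker seen yet
    passed = []
    for row in rows:
        for pe, unit in enumerate(row):  # ST100: receive one data unit
            if flags[pe] == 1:  # ST200/ST210: registered flag still "1"
                passed.append(unit)  # ST230: forward to the output buffer
            # Otherwise (ST220) the unit is skipped.
            if unit == END:  # ST110: compare with the end marker code
                flags[pe] = 0  # ST120: clear the flag of this PE
    return passed


pe_memories = [
    ["a0", "a1", "a2", END],  # PE0: 3 data units + end marker
    ["b0", END],  # PE1: 1 data unit + end marker
    ["c0", "c1", END],  # PE2: 2 data units + end marker
]
stored = evaluate_write(ring_bus_rows(pe_memories))
print(stored)  # ['a0', 'b0', 'c0', 'a1', 'END', 'c1', 'a2', 'END', 'END']
```

For these three hypothetical PEs, the 12 row slots on the bus shrink to 9 stored units, following the same skipping pattern as Fig. 11.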
[0031]
As an example, Fig. 11 shows the data 1001 in the external single memory 500 after the transfer is finished. The first data element of each PE 220 is transferred to the external single memory 500 starting from the left side, and then the following rows are transferred. The end marker of PE6 is detected in the second row; therefore data from PE6 is skipped in the third row. Similarly, the end marker of PE3 is detected in the third row; therefore data from PE3 is skipped in the fourth row.
[0032]
Taking the example of Fig. 4A, in which one data unit consists of 4 bytes (32 bits), every data unit that can be skipped saves the steps needed to transfer and store it. Moreover, once end markers are detected, the data from those PEs is skipped in all following rows.
Therefore, the amount of transferred data can be reduced considerably when the data amounts differ strongly between PEs.
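To make the saving concrete, the sketch below counts the data units written to the single memory for a hypothetical set of per-PE data amounts. It compares the marker-guided transfer (data plus one end marker per PE) with a transfer in which every PE contributes a unit to every row, as in the evenly distributed case of NPL1. The per-PE counts and the assumption that each marker occupies one full 4-byte unit (aligned placement) are illustrative only.

```python
# Hypothetical count of data units written to the single memory 500.
# Assumptions (not from the source): each PE stores a whole number of 4-byte
# units, each end marker occupies one extra unit, and the counts are made up.
UNIT_BYTES = 4


def units_written(units_per_pe: list[int]) -> tuple[int, int]:
    rows = max(units_per_pe)  # the deepest PE sets the number of rows
    full_rows = rows * len(units_per_pe)  # every PE contributes to every row
    marker_guided = sum(n + 1 for n in units_per_pe)  # data + one marker per PE
    return full_rows, marker_guided


counts = [12, 3, 7, 1, 9, 2, 5, 4]  # hypothetical units stored in PE0..PE7
full, guided = units_written(counts)
print(full * UNIT_BYTES, guided * UNIT_BYTES)  # 384 204
```

With evenly distributed data, `units_written([2, 2])` returns `(4, 6)`: the markers then cost one extra unit per PE, so the saving appears only when the data amounts differ between PEs.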
[0033]
The output buffer 424 is checked to see whether all places are filled with elements (ST250). If the output buffer 424 is full (ST250: YES), the data stored in the output buffer 424 is sent to the single external memory 500 and the content of the output buffer 424 is cleared.
[0034]
In this embodiment, unevenly distributed data inside the memory elements of the PE array 200 can be transferred to the single memory quickly and efficiently, with small hardware implementation costs and only a small increase in the amount of transferred data, because the end markers are set in advance and, once the end marker is detected for a certain PE 220, the data from this PE in the following rows is automatically skipped.
[0035]
[Second embodiment]
As a second embodiment, transfer of unevenly distributed data from the single external memory to the PE array will be described.
Fig. 12 shows the marker evaluation unit for read direction 430 of the marker evaluation apparatus 410. Data is transferred from the external memory 500 to the control unit 400.
Here, the data 1001 in the external single memory 500 already contains end markers at the appropriate positions, because the data from the PEs was transferred in the manner described in the first embodiment.
[0036]
The data received by the control unit 400 is input to the marker evaluation unit for read direction 430. A comparator 431, a flag register 432, an output buffer 434, a comparator 435, a switch 436, and a switch 437 are essentially the same as the corresponding parts of the marker evaluation unit for write direction 420 of the first embodiment.
In addition, a selection switch 433 is provided in the marker evaluation unit for read direction 430 and is controlled by the output of the flag register 432. The selection switch 433 passes the input data to the output buffer 434 if no end marker has previously been detected for that PE. If an end marker has been detected, zero data is passed to the output buffer 434 for that PE instead.
[0037]
Next, described is the operation of this SIMD processor.
The operation of the flag register 432 is essentially the same as that of the flag register 422 of the marker evaluation unit for write direction 420, so Fig. 9 and its explanation also apply to the flag register 432.
[0038]
Fig. 13 shows the operation of the selection switch 433.
First, the flag value is read out of the flag register 432 (ST300), and the data input from the external memory 500 is transferred to the output buffer depending on the flag value. This selection is performed by the selection switch 433. If the flag value is "1" (ST310: NO), the data is transferred to the output buffer 434 (ST330). If the flag value is "0" (ST310: YES), zero data is transferred to the output buffer 434 instead (ST320).
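The read direction can be sketched as the inverse of the write-direction filter. The sketch assumes that the next element of the compacted stream from the single memory is consumed only for PEs whose flag is still "1"; the source does not spell this out. The strings `"END"` and `"0"` are hypothetical stand-ins for the end marker code and the zero data.

```python
# Hypothetical sketch of the marker evaluation unit for read direction 430
# (Fig. 12, Fig. 13). Assumptions (not from the source): "END" is the end
# marker code, "0" is the zero data inserted by the selection switch 433, and
# a stream element is consumed only for PEs whose flag is still 1.
END = "END"
ZERO = "0"


def evaluate_read(stream: list[str], num_pes: int) -> list[list[str]]:
    flags = [1] * num_pes  # flag register 432
    source = iter(stream)  # raises StopIteration if the stream lacks markers
    rows = []
    while any(flags):
        row = []
        for pe in range(num_pes):
            if flags[pe] == 1:  # ST310: NO -> pass the input data (ST330)
                unit = next(source)
                row.append(unit)
                if unit == END:  # the marker is the last data for this PE
                    flags[pe] = 0
            else:  # ST310: YES -> insert zero data (ST320)
                row.append(ZERO)
        rows.append(row)
    return rows


stored = ["a0", "b0", "c0", "a1", "END", "c1", "a2", "END", "END"]
for row in evaluate_read(stored, num_pes=3):
    print(row)
# ['a0', 'b0', 'c0']
# ['a1', 'END', 'c1']
# ['a2', '0', 'END']
# ['END', '0', '0']
```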
[0039]
Fig. 14 shows the stored data 1101 in the external memory 500 and the output data 1102 to the PE array 200. Starting from the left side, the data elements are transferred sequentially to the memories of the PE array 200.
Then, the following rows are transferred. When an end marker is detected, this end marker is the last data from the external memory 500 that is transferred to the PE array 200 for this PE.
[0040]
Afterwards, only filling zeros are transferred for this PE.
As shown in Fig. 14, the end marker for PE6 is detected in the second row; therefore zero data is selected for PE6 in the third row. The data line 1102 is output to the ring bus 300 and each PE takes in its own data. As a result, the PE array receives the data shown in Fig. 15, where the zero-filled positions are explicitly labeled "Zero" to help the reader understand the operation of this invention.
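Each PE then takes in its own position of every output row. A minimal sketch, assuming (hypothetically) that position i of row r is written to address r of the memory inside PE i, reproduces a zero-filled layout like that of Fig. 15 for a three-PE example; `distribute_to_pes` is an illustrative name.

```python
# Hypothetical sketch: each PE takes in its own column of the output rows 1102.
# Assumption (not from the source): position i of row r goes to address r of
# the memory inside PE i.
def distribute_to_pes(rows: list[list[str]]) -> list[list[str]]:
    """Transpose the row stream into one memory image per PE."""
    return [list(column) for column in zip(*rows)]


rows = [
    ["a0", "b0", "c0"],
    ["a1", "END", "c1"],
    ["a2", "0", "END"],
    ["END", "0", "0"],
]
for pe, memory in enumerate(distribute_to_pes(rows)):
    print(f"PE{pe}:", memory)
# PE0: ['a0', 'a1', 'a2', 'END']
# PE1: ['b0', 'END', '0', '0']
# PE2: ['c0', 'c1', 'END', '0']
```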
[0041]
In this embodiment, data from the external memory 500 can be stored efficiently in an unevenly distributed form inside the memory units of the PE array 200, with small hardware implementation costs.
[0042]
(Modified embodiment)
This invention is not limited to the embodiment described above.
Fig. 16 shows a possible system design in which the SIMD processor 1202 with the example architecture could operate. Other units inside the system could be a central processing unit 1201 and a single memory element 1203, which are all connected over connections 1205 to a bus system 1204.
[0043]
Moreover, as an alternative to the implementation shown in Fig. 1, Fig. 17 shows the case where the end marker setting units are removed from the PEs and a single end marker setting unit (global marker setting unit) 1302 is placed next to the marker evaluation apparatus 1301 in the control unit 1300. This unit is responsible for setting the end markers in all memory elements of the memory array at the request of the respective processing elements.
[0044]
It will be understood by those of ordinary skill in the art that various changes in form and details may be made therein without departing from the spirit and scope of the present invention as defined by the claims.
Industrial applicability
[0045]
The present invention can be applied to image processing methods and apparatuses, where the image data can be acquired with a camera, a laser probe, or over the Internet.
Reference Signs List
[0046]
100 SIMD processor
200 PE array
210 one group of PEs
220 PE (processing element)
230 memory inside a PE
240 end marker setting unit
300 ring bus
310 shift register
400 control unit
410 marker evaluation apparatus
420 marker evaluation unit for write direction
430 marker evaluation unit for read direction

Claims

[Claim 1]
A data transfer apparatus comprising:
a processing element array that comprises multiple processing elements controlled in a Single Instruction Multiple Data style;
memory elements that are provided inside each of the processing elements, data access to all the memory elements of the processing elements being done in parallel;
a control unit controlling the processing element array in the Single Instruction Multiple Data style;
a bus system connecting all of the processing elements with each other and with the control unit;
a single memory that exchanges data with the memory elements of the processing element array;
an end marker setting unit that is responsible for setting an end marker at an end of a data stream stored inside the memory elements;
a marker evaluation unit for write direction; and
a marker evaluation unit for read direction,
wherein, when data is transferred from the processing element array to the single memory over the bus system and the end marker is detected for a certain processing element, the marker evaluation unit for write direction is configured to delete the data that is transferred from that processing element in the following rows, and
when data is transferred from the single memory to the processing element array over the bus system and the end marker is detected for a certain processing element, the marker evaluation unit for read direction is configured to insert data for that processing element in the following rows.
[Claim 2]
The data transfer apparatus according to claim 1, wherein the end marker setting unit is provided inside each of the processing elements.
[Claim 3]
The data transfer apparatus according to claim 1, wherein the end marker setting unit is provided inside the control unit.
[Claim 4]
The data transfer apparatus according to any one of claims 1, 2 and 3, wherein the end marker setting unit adds the end marker at an unaligned position or an aligned position.
[Claim 5]
The data transfer apparatus according to any one of claims 1 to 4, wherein the bus system is a ring bus.
[Claim 6]
The data transfer apparatus according to any one of claims 1 to 5, wherein the single memory is an external memory.
[Claim 7]
A data transfer method for transferring data, in parallel processing, between a processing element array that comprises multiple processing elements, each with its own memory element, and a single memory, the data transfer method comprising:
transferring data from the processing element array to the single memory over a bus system, and
transferring data from the single memory to the processing element array over the bus system,
wherein, in a case of transferring data from the processing element array to the single memory over the bus system, the method comprises:
setting an end marker at an end of a data stream stored inside the memory elements;
transferring data from the processing element array to the single memory over the bus system;
detecting the end marker for a certain processing element; and deleting the data that is transferred from that processing element in the following rows when the end marker is detected for the certain processing element, and,
in a case of transferring data from the single memory to the processing element array over the bus system, the method comprises:
transferring data from the single memory to the processing element array over the bus system;
detecting the end marker for a certain processing element; and inserting data for that processing element in the following rows when the end marker is detected for the certain processing element.

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/JP2011/065739 WO2013005343A1 (en) 2011-07-01 2011-07-01 Apparatus and method for a marker guided data transfer between a single memory and an array of memories with unevenly distributed data amount in an simd processor system
TW101122857A TWI512614B (en) 2011-07-01 2012-06-26 Apparatus and method for a marker guided data transfer between a single memory and an array of memories with unevenly distributed data amount in an simd processor system

Publications (1)

Publication Number Publication Date
WO2013005343A1 (en) 2013-01-10

Family

ID=44628820

Country Status (2)

Country Link
TW (1) TWI512614B (en)
WO (1) WO2013005343A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0594425A (en) 1991-10-01 1993-04-16 Nippon Telegr & Teleph Corp <Ntt> Task managing method using multiple list
JPH0675929B2 (en) 1990-02-22 1994-09-28 オーツタイヤ株式会社 Raw tire handling equipment
US6415366B1 (en) * 1999-06-02 2002-07-02 Alcatel Canada Inc. Method and apparatus for load distribution across memory banks with constrained access
WO2009013100A2 (en) 2007-07-20 2009-01-29 Basf Se Method of combating pollen beetles
WO2009131007A1 (en) 2008-04-22 2009-10-29 日本電気株式会社 Simd parallel computer system, simd parallel computing method, and control program

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
US20100123717A1 (en) * 2008-11-20 2010-05-20 Via Technologies, Inc. Dynamic Scheduling in a Graphics Processor
JP5221332B2 (en) * 2008-12-27 2013-06-26 株式会社東芝 Memory system

Non-Patent Citations (2)

Title
KYO S ET AL: "A low-cost mixed-mode parallel processor architecture for embedded systems", PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON SUPERCOMPUTING - PROCEEDINGS OF ICS07: 21ST ACM INTERNATIONAL CONFERENCE ON SUPERCOMPUTING 2007 ASSOCIATION FOR COMPUTING MACHINERY US, 18 June 2007 (2007-06-18), pages 253 - 262, XP002502618, ISBN: 978-1-59593-768-1 *
S. KYO: "A Low-Cost Mixed-Mode Parallel Processor Architecture for Embedded Systems", PROCEEDINGS OF THE 21ST ANNUAL INTERNATIONAL CONFERENCE ON SUPERCOMPUTING, ICS'07, June 2007 (2007-06-01)

Also Published As

Publication number Publication date
TW201319932A (en) 2013-05-16
TWI512614B (en) 2015-12-11

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 11735565

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 11735565

Country of ref document: EP

Kind code of ref document: A1