GB2356718A - Data processing - Google Patents

Data processing

Info

Publication number
GB2356718A
GB2356718A
Authority
GB
United Kingdom
Prior art keywords
data
queue
processing
task
queues
Prior art date
Legal status
Granted
Application number
GB0015766A
Other versions
GB0015766D0 (en)
GB2356718B (en)
Inventor
Ken Cameron
Eamon O'dea
Current Assignee
Pixelfusion Ltd
Original Assignee
Pixelfusion Ltd
Priority date
Filing date
Publication date
Priority claimed from GB9915060A external-priority patent/GB2355633A/en
Application filed by Pixelfusion Ltd filed Critical Pixelfusion Ltd
Priority to AU55545/00A priority Critical patent/AU5554500A/en
Priority to US10/019,188 priority patent/US6898692B1/en
Priority to PCT/GB2000/002474 priority patent/WO2001001352A1/en
Publication of GB0015766D0 publication Critical patent/GB0015766D0/en
Publication of GB2356718A publication Critical patent/GB2356718A/en
Application granted granted Critical
Publication of GB2356718B publication Critical patent/GB2356718B/en
Anticipated expiration legal-status Critical
Expired - Fee Related legal-status Critical Current

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/003D [Three Dimensional] image rendering
    • G06T15/10Geometric effects
    • G06T15/40Hidden part removal
    • G06T15/405Hidden part removal using Z-buffer

Landscapes

  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Geometry (AREA)
  • Computer Graphics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Processing (AREA)
  • Image Generation (AREA)
  • Executing Machine-Instructions (AREA)

Description

DATA PROCESSING

The present invention relates to data processing, and in particular to processing data items using a single instruction multiple data (SIMD) architecture.
Background of the Invention
Conventional data processing techniques process data serially through different tasks. For example, see Figure 1 of the accompanying drawings, which illustrates a conventional process in which data items (Data #1) are generated, for example as a result of a calculation or a memory fetch operation, and are then processed by a first task (Task A). Task A results in new data (Data #2) for processing by a second task (Task B) to produce result data (output data).
Conventionally these tasks need to be repeated for each new data item for processing.
In a single instruction multiple data (SIMD) architecture a number of processing elements act to process respective data items according to a single instruction at any one time. Such processing is illustrated in Figure 2 of the accompanying drawings, which shows processing by n elements.
With a single instruction stream it is necessary for all n processing elements to perform the same tasks, although each processing element has its own data: this is SIMD. Every processing element generates a new item of data (Data#1 0 to Data#1 n). Each processing element then performs a respective Task A on its respective Data#1.
On completion of Task A by each of the processing elements, some percentage (between 0% and 100%) of the processing elements will have a respective valid data item on which to perform a respective Task B. Since all the processing elements must perform the same task at the same time, those without valid data perform no useful work, and the set of processing elements as a whole is not working at full utilisation, i.e. maximum efficiency.
As the fraction of processing elements producing valid data, as a result of Task A, as input data (Data#2) to Task B decreases, the efficiency of the whole array of processing elements also decreases.
Furthermore, as the "cost" of Task B increases, i.e. the number of cycles required to perform the task, the utilisation of the whole processing flow decreases. (By way of example, fixed point processing requires approximately 10 cycles for a typical 4-byte integer, and floating point processing requires approximately 100 cycles for a 4-byte floating point number.) Clearly the flow through Tasks A and B can be extended with further tasks, i.e. Task C, Task D, etc.
The output data from Task B feeds into Task C, and clearly if Task B eliminates data, Task C will suffer under-utilisation, and so on. Further tasks can be cascaded in this fashion, with utilisation rapidly decreasing at each step as data items are eliminated.
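The compounding loss of utilisation through cascaded tasks can be illustrated with a small model (a sketch, not from the patent: it assumes each task passes valid data on for a fixed fraction of the elements):

```python
# Hypothetical model: in a naive SIMD cascade, only a fraction of the
# processing elements hold valid data after each task, yet every element
# still executes every instruction of every subsequent task.
def cascade_utilisation(valid_fraction_per_task, num_tasks):
    """Fraction of elements doing useful work at each task in the cascade."""
    utilisations = []
    valid = 1.0
    for _ in range(num_tasks):
        utilisations.append(valid)        # elements with valid input this task
        valid *= valid_fraction_per_task  # survivors feeding the next task
    return utilisations

# If each task produces valid output for only half the elements:
print(cascade_utilisation(0.5, 4))  # [1.0, 0.5, 0.25, 0.125]
```

By the fourth task only one element in eight is doing useful work, which is the under-utilisation the queues described below are intended to remove.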
Summary of the present Invention
In order to overcome the drawbacks of conventional SIMD processing, according to the present invention there is provided a method of processing data using a SIMD computer architecture having a plurality of processing elements for processing data, the method comprising: for each processing element, defining at least one processing task operable to process input data to form task output data, defining a data queue for receiving data input to the task, and processing the data stored in the queue in a first in first out manner when a predetermined condition is met.
Preferably, the predetermined condition is that either no further data items are available or a predetermined queue status is met.
Preferably, the predetermined queue status is that at least one of the queues is full.
Alternatively, the predetermined queue status is that all of the data queues have at least one data item.
Alternatively, the predetermined queue status is that a proportion of the queues have at least one data item.
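The preferred and alternative queue statuses above can be expressed as simple predicates over the per-element queues (an illustrative sketch; the function names are not from the patent):

```python
def any_queue_full(queues, capacity):
    # Preferred status: at least one queue has reached capacity.
    return any(len(q) >= capacity for q in queues)

def all_queues_nonempty(queues):
    # Alternative status: every queue holds at least one data item.
    return all(len(q) >= 1 for q in queues)

def fraction_nonempty(queues, threshold):
    # Alternative status: a given proportion of queues hold at least one item.
    nonempty = sum(1 for q in queues if q)
    return nonempty / len(queues) >= threshold

queues = [[1, 2], [], [3]]
print(any_queue_full(queues, 2))       # True: first queue is full
print(all_queues_nonempty(queues))     # False: second queue is empty
print(fraction_nonempty(queues, 0.5))  # True: 2 of 3 queues are non-empty
```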
Brief description of the Drawings
Figures 1 and 2 illustrate conventional data processing techniques;
Figure 3 illustrates a data processing technique embodying one aspect of the present invention; and
Figures 4 to 7 illustrate data queues in accordance with one aspect of the present invention.
Description of the preferred embodiment
Figure 3 illustrates a method embodying the present invention, which will be explained with reference to that Figure and to Figures 4 to 7. In Figures 4 to 7, one set of tasks and related queues for a single processor are shown for the sake of clarity.
It will be readily appreciated, however, that the definition of queues extends to many processors in a SIMD architecture.
Also, although the preferred embodiment is described in relation to at least one of the queues becoming full, or no further data items being available, before a successive task is processed, it will be readily appreciated by a person skilled in the art that the successive task can be started upon other conditions being satisfied: for example, in response to all of the queues having at least one data item, in response to a proportion of the queues having at least one data item, by delaying the successive processing for a predetermined period of time, or after at least one of the queues has been filled to a predetermined level.
In step A of Figure 3 a data queue is defined for each SIMD processing element. In step B data is received for processing by the processing element in accordance with Task A. Not all of the processing elements will receive data items at the same time, since the source of the data items depends on the task to be performed and on the previous processing stage.
However, it could be expected that over a reasonable period of time, all of the elements would receive at least one data item. At step C, the new data item is examined to determine whether it can replace the data items currently stored in the queue for that element.
If this is the case, then at step D the queue is cleared. The new data item is stored in the next available queue position (step E), which will be the first position if the queue has been cleared, or the next available position if data is already stored in the queue. It is to be noted that data is stored in the queue in a first in first out manner. Storage of the first new data item is shown in Figure 5. Assuming that the queue is not full (step F) and that there is more data available (step H), the process continues to receive new data items (steps B to E) until the queue is full or until no more data is available. A full queue is illustrated in Figure 6.
When data items are no longer received, the data stored in the queue is processed in a first in first out manner, i.e. the first data item to be stored in a queue is processed first by Task A (step G). The result of the processing of the first data item by Task A is supplied to the queue of Task B, as illustrated in Figure 7.
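The fill-then-drain behaviour of steps B to H might be sketched per processing element as follows (illustrative only; `can_replace` stands for the task-specific replacement test of step C):

```python
from collections import deque

def fill_queue(queue, capacity, data_source, can_replace):
    """Steps B to F/H: accept items until the queue fills or data runs out."""
    for item in data_source:
        if can_replace(item, queue):  # step C: new item supersedes the queue
            queue.clear()             # step D: clear the queue
        queue.append(item)            # step E: store in next available position
        if len(queue) >= capacity:    # step F: queue full, stop receiving
            break

def drain_queue(queue, task):
    """Step G: process the stored items first in, first out."""
    results = deque()
    while queue:
        results.append(task(queue.popleft()))
    return results

q = deque()
fill_queue(q, 4, iter([3, 1, 4, 1, 5, 9]), lambda item, queue: False)
print(list(q))                                 # [3, 1, 4, 1]: queue filled
print(list(drain_queue(q, lambda x: x * 2)))   # [6, 2, 8, 2]: FIFO order kept
```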
It will be appreciated that, with a multiple processor design using a SIMD architecture, the processing elements in the architecture will probably all have data to be processed by Task A by the time one of the data queues is full. This results in greater utilisation of the processors in the architecture.
Preferably, each processing element has a queue defined for each of a number of expected tasks. For example, if three tasks A, B and C are expected to be processed sequentially, three queues will be defined.
It will therefore be appreciated that, with a queue present between sequential tasks, it is not necessary to run Task B immediately after each run of Task A.
Instead, Task A can be run multiple times, until one or more of the Task B queues is filled. When one or more of the queues situated between Tasks A and B is filled, Task B is then allowed to run.
If the distribution of the expected data is approximately random then, for a sufficiently deep queue, it would be expected that most, if not all, queues would contain at least one data entry by the time Task B is run. Every processing element would then have data on which it can perform Task B. Introducing a queue therefore results in a much higher utilisation of the available processing power, and therefore higher overall processing efficiency. Such efficiency would tend toward 100%.
The principle of introducing a queue between successive Tasks can be extended to any number of cascaded tasks. When a queue becomes full and can no longer accept input data, the preceding Task ceases processing and the next successive Task is run.
This means that a method of identifying when at least one queue has been filled is provided in order to change the instructions being issued from the first Task (A) to instructions for running the second Task (B).
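This instruction switch can be sketched at the array level (a hypothetical controller model, not from the patent: each per-element data source feeds Task A, whose output fills that element's Task B queue):

```python
def run_cascade(sources, capacity, task_a, task_b):
    """SIMD-style controller: keep issuing Task A instructions until one of
    the Task B input queues fills (or data runs out), then issue Task B."""
    queues = [[] for _ in sources]            # one Task B queue per element
    while True:
        items = [next(s, None) for s in sources]
        if all(i is None for i in items):     # no further data items available
            break
        for q, item in zip(queues, items):
            if item is not None:
                q.append(task_a(item))        # Task A output feeds the queue
        if any(len(q) >= capacity for q in queues):
            break                             # a queue filled: switch tasks
    # Task B now drains each queue first in, first out
    return [[task_b(x) for x in q] for q in queues]

srcs = [iter([1, 2, 3]), iter([4, 5])]
print(run_cascade(srcs, 2, lambda x: x + 1, lambda x: x * 10))
# [[20, 30], [50, 60]]: both elements had work for Task B
```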
A further refinement of this process is to add some rules to each task that is placing data into a queue, so as to allow it to replace the current contents of the queue with a single new item. This effectively allows items which would otherwise have been processed by Task B to be eliminated after further items have been processed by Task A, but before the processing by Task B is performed.
By way of a practical example, the following now describes the computer graphics method of "deferred blending" in terms of the above principle.
Rasterising a primitive, i.e. turning it from a geometric shape into a set of fragments, one per processor, is Task A.
In an array of processing elements, some processing elements will have a fragment of the triangle and some will not. Those processing elements that do have fragment data can place it in the queue.
Shading and blending a fragment into the frame buffer is Task B. This is an expensive task, and one which should not be performed when there would otherwise be low utilisation, i.e. low efficiency.
A fragment only ends up in the queue if it is in front of preceding fragments. A simple rule can be added indicating when to discard the contents of a queue: if a fragment is opaque, all previous entries in the queue can be discarded; a blended fragment does not trigger this rule.
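A minimal sketch of this discard rule for one processing element's queue (the fragment fields, depth convention, and function name are hypothetical, not taken from the patent):

```python
from collections import deque

def enqueue_fragment(queue, depth, opacity, current_depth):
    """Queue a fragment only if it lies in front of preceding fragments;
    an opaque fragment hides everything queued behind it, so the queue
    is cleared. A blended (partially transparent) fragment is simply kept."""
    if depth >= current_depth:   # hypothetical depth test: behind, so reject
        return
    if opacity == 1.0:           # opaque: earlier queued fragments are hidden
        queue.clear()
    queue.append((depth, opacity))

q = deque()
enqueue_fragment(q, depth=0.5, opacity=0.3, current_depth=1.0)  # blended, kept
enqueue_fragment(q, depth=0.4, opacity=1.0, current_depth=1.0)  # opaque, clears
print(list(q))  # [(0.4, 1.0)]: only the opaque fragment remains for Task B
```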
As mentioned above, although the preferred embodiment refers to Task B being run when either one or more of the queues between Tasks A and B is filled or no further data items are available, other alternative embodiments also fall within the scope of the invention as defined in the appended claims. For example, Task B could be run in response to all of the queues having at least one data item, in response to a proportion of the queues having at least one data item, by delaying Task B for a predetermined period of time after Task A, or after at least one of the queues has been filled to a predetermined level.

Claims (9)

CLAIMS:
1. A method of processing data items in a single instruction multiple data (SIMD) processing architecture having a plurality of processing elements for processing data, the method comprising:
for each processing element defining a data queue having a plurality of queue positions; receiving a new data item for at least one processing element in the architecture; storing the data item in the next available queue position in the queue defined for the processing element concerned; receiving and storing further data items until a predetermined condition is met; and processing the first data item in each queue using the associated processing element, all of the processing elements operating according to the same single instruction, thereby producing respective result data items.
2. A method as claimed in claim 1, wherein the predetermined condition comprises either no further data items being available or a predetermined queue status being met.
3. A method as claimed in claim 2, wherein the predetermined queue status relates to at least one of the queues becoming full.
4. A method as claimed in claim 2, wherein the predetermined queue status relates to all of the data queues having at least one data item.
5. A method as claimed in claim 2, wherein the predetermined queue status relates to a proportion of the queues having at least one data item.
6. A method as claimed in any preceding claim, wherein the received data item is examined to determine whether it replaces data items already stored in the queue concerned, and if so clearing the queue before storing that new data item.
7. A method as claimed in any preceding claim, wherein respective queues are defined for a plurality of processing tasks for each processing element.
8. A method as claimed in claim 7, wherein result data items produced by one task are supplied to a queue defined for a further task.
9. A method as claimed in claim 8, wherein the further task is processed by the processing elements in the array when the queue for that task is full.
GB0015766A 1999-06-28 2000-06-27 Data processing Expired - Fee Related GB2356718B (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
AU55545/00A AU5554500A (en) 1999-06-28 2000-06-28 Method and apparatus for rendering in parallel a z-buffer with transparency
US10/019,188 US6898692B1 (en) 1999-06-28 2000-06-28 Method and apparatus for SIMD processing using multiple queues
PCT/GB2000/002474 WO2001001352A1 (en) 1999-06-28 2000-06-28 Method and apparatus for rendering in parallel a z-buffer with transparency

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GB9915060A GB2355633A (en) 1999-06-28 1999-06-28 Processing graphical data
GB0006986A GB2352381B (en) 1999-06-28 2000-03-22 Processing graphical data

Publications (3)

Publication Number Publication Date
GB0015766D0 GB0015766D0 (en) 2000-08-16
GB2356718A true GB2356718A (en) 2001-05-30
GB2356718B GB2356718B (en) 2001-11-21

Family

ID=26243941

Family Applications (3)

Application Number Title Priority Date Filing Date
GB0120840A Expired - Fee Related GB2362552B (en) 1999-06-28 2000-03-22 Processing graphical data
GB0015766A Expired - Fee Related GB2356718B (en) 1999-06-28 2000-06-27 Data processing
GB0015678A Expired - Fee Related GB2356717B (en) 1999-06-28 2000-06-27 Data processing

Family Applications Before (1)

Application Number Title Priority Date Filing Date
GB0120840A Expired - Fee Related GB2362552B (en) 1999-06-28 2000-03-22 Processing graphical data

Family Applications After (1)

Application Number Title Priority Date Filing Date
GB0015678A Expired - Fee Related GB2356717B (en) 1999-06-28 2000-06-27 Data processing

Country Status (1)

Country Link
GB (3) GB2362552B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11776672B1 (en) 2020-12-16 2023-10-03 Express Scripts Strategic Development, Inc. System and method for dynamically scoring data objects
US11862315B2 (en) 2020-12-16 2024-01-02 Express Scripts Strategic Development, Inc. System and method for natural language processing
US11423067B1 (en) 2020-12-16 2022-08-23 Express Scripts Strategic Development, Inc. System and method for identifying data object combinations

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0424618A2 (en) * 1989-10-24 1991-05-02 International Business Machines Corporation Input/output system
US5790879A (en) * 1994-06-15 1998-08-04 Wu; Chen-Mie Pipelined-systolic single-instruction stream multiple-data stream (SIMD) array processing with broadcasting control, and method of operating same

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5923333A (en) * 1997-01-06 1999-07-13 Hewlett Packard Company Fast alpha transparency rendering method
JPH10320573A (en) * 1997-05-22 1998-12-04 Sega Enterp Ltd Picture processor, and method for processing picture
JP4399910B2 (en) * 1998-09-10 2010-01-20 株式会社セガ Image processing apparatus and method including blending processing

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0424618A2 (en) * 1989-10-24 1991-05-02 International Business Machines Corporation Input/output system
US5790879A (en) * 1994-06-15 1998-08-04 Wu; Chen-Mie Pipelined-systolic single-instruction stream multiple-data stream (SIMD) array processing with broadcasting control, and method of operating same

Also Published As

Publication number Publication date
GB0015766D0 (en) 2000-08-16
GB0015678D0 (en) 2000-08-16
GB2362552B (en) 2003-12-10
GB0120840D0 (en) 2001-10-17
GB2356718B (en) 2001-11-21
GB2356717A (en) 2001-05-30
GB2362552A (en) 2001-11-21
GB2356717B (en) 2001-12-12

Similar Documents

Publication Publication Date Title
US8335812B2 (en) Methods and apparatus for efficient complex long multiplication and covariance matrix implementation
US6230180B1 (en) Digital signal processor configuration including multiplying units coupled to plural accumlators for enhanced parallel mac processing
US20160342418A1 (en) Functional unit having tree structure to support vector sorting algorithm and other algorithms
US20090077154A1 (en) Microprocessor
US6898692B1 (en) Method and apparatus for SIMD processing using multiple queues
US7769982B2 (en) Data processing apparatus and method for accelerating execution of subgraphs
US5704052A (en) Bit processing unit for performing complex logical operations within a single clock cycle
Ronquist Fast Fitch-parsimony algorithms for large data sets
US20030097391A1 (en) Methods and apparatus for performing parallel integer multiply accumulate operations
KR100812555B1 (en) Arrangement, system and method for vector permutation in single-instruction multiple-data microprocessors
US6715065B1 (en) Micro program control method and apparatus thereof having branch instructions
JP3955741B2 (en) SIMD type microprocessor having sort function
US5778208A (en) Flexible pipeline for interlock removal
GB2356718A (en) Data processing
EP1634163B1 (en) Result partitioning within simd data processing systems
US5974531A (en) Methods and systems of stack renaming for superscalar stack-based data processors
JP2007183712A (en) Data driven information processor
EP0992917B1 (en) Linear vector computation
EP0775970B1 (en) Graphical image convolution
US7107478B2 (en) Data processing system having a Cartesian Controller
EP1132813A2 (en) Computer with high-speed context switching
US20030126178A1 (en) Fast forwarding ALU
US6757813B1 (en) Processor
JP3088956B2 (en) Arithmetic unit
JP3264114B2 (en) Sorting device

Legal Events

Date Code Title Description
732E Amendments to the register in respect of changes of name or changes affecting rights (sect. 32/1977)
732E Amendments to the register in respect of changes of name or changes affecting rights (sect. 32/1977)

Free format text: REGISTERED BETWEEN 20101111 AND 20101117

PCNP Patent ceased through non-payment of renewal fee

Effective date: 20180627