GB2356718A - Data processing - Google Patents
Data processing
- Publication number
- GB2356718A (application GB0015766A)
- Authority
- GB
- United Kingdom
- Prior art keywords
- data
- queue
- processing
- task
- queues
- Prior art date
- Legal status
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T15/00—3D [Three Dimensional] image rendering
- G06T15/10—Geometric effects
- G06T15/40—Hidden part removal
- G06T15/405—Hidden part removal using Z-buffer
Landscapes
- Physics & Mathematics (AREA)
- Engineering & Computer Science (AREA)
- Geometry (AREA)
- Computer Graphics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Image Processing (AREA)
- Image Generation (AREA)
- Executing Machine-Instructions (AREA)
Description
2356718 DATA PROCESSING

The present invention relates to data processing,
and in particular to processing data items using a single instruction multiple data (SIMD) architecture.
Background of the Invention
Conventional data processing techniques process data serially through different tasks. For example, see Figure 1 of the accompanying drawings, which illustrates a conventional process in which data items (Data #1) are generated, for example by a result from a calculation or from a memory fetch operation, and are then processed by a first task (Task A). Task A results in new data (Data #2) for processing by a second task (Task B) to produce result data (Result data).
Conventionally these tasks need to be repeated for each new data item for processing.
In a single instruction multiple data (SIMD) architecture a number of processing elements act to process respective data items according to a single instruction at any one time. Such processing is illustrated in Figure 2 of the accompanying drawings, which shows processing by n elements.
With a single instruction stream it is necessary for all the n processing elements to perform the same tasks, although each processing element has its own data: this is SIMD. Every processing element generates a new item of data (Data#1 for elements 0 to n). Each respective processing element then performs a respective Task A on its respective Data#1.
On completion of Task A by each of the processing elements, some percentage (between 0% and 100%) of the processing elements will have a respective valid data item on which to perform a respective Task B. Since all the processing elements must perform the same task at the same time, those without valid data are performing no useful work, and the set of processing elements, as a whole, is not working at full utilisation, i.e. maximum efficiency.
As the fraction of processing elements producing valid data, as a result of Task A, as input data (Data#2) to Task B decreases, the efficiency of the whole array of processing elements also decreases.
Furthermore, as the "cost" of Task B increases, i.e. the number of cycles required to perform the task, the utilisation of the whole of the processing flow decreases. (By way of an example, fixed point processing requires approximately 10 cycles for a typical 4-byte integer, and floating point processing requires approximately 100 cycles for a 4-byte floating point number.) Clearly the flow through Tasks A and B can be extended with further tasks, i.e. Task C, Task D, etc.
The output data from Task B feeds into Task C, and clearly if Task B eliminates the data, Task C will suffer under-utilisation, and so on. Further tasks can be cascaded in this fashion, with utilisation rapidly decreasing through each step as data items are eliminated.
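The utilisation decay described above can be made concrete with a small sketch. The function below is not from the patent; it simply models an array of processing elements in which each cascaded task invalidates some fraction of the data produced by the previous task, with illustrative survival fractions.

```python
def utilisation_through_tasks(survival_fractions):
    """Return the fraction of processing elements doing useful work at each
    task stage, assuming each stage keeps only the given fraction of the
    previous stage's output data valid (hypothetical model, not the patent's)."""
    utilisation = 1.0
    stages = []
    for fraction in survival_fractions:
        stages.append(utilisation)   # elements busy with valid data at this stage
        utilisation *= fraction      # data eliminated before the next stage
    return stages

# e.g. Task A keeps 50% of items valid, Task B keeps 40% of those, Task C all:
print(utilisation_through_tasks([0.5, 0.4, 1.0]))  # [1.0, 0.5, 0.2]
```

With these example fractions, Task C runs with only 20% of the elements doing useful work, illustrating why utilisation "rapidly decreases through each step".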
Summary of the present Invention
In order to overcome the drawbacks of conventional SIMD processing, according to the present invention there is provided a method of processing data using a SIMD computer architecture having a plurality of processing elements for processing data, the method comprising: for each processing element, defining at least one processing task operable to process input data to form task output data, defining a data queue for receiving data input to the task, and processing the data stored in the queue in a first in first out manner when a predetermined condition is met.
Preferably, the predetermined condition is that either no further data items are available or a predetermined queue status is met.
Preferably, the predetermined queue status is that at least one of the queues is full.
Alternatively, the predetermined queue status is that all of the data queues have at least one data item.
Alternatively, the predetermined queue status is that a proportion of the queues have at least one data item.
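The alternative "predetermined queue status" conditions above can be expressed as simple predicates over the set of per-element queues. The following is a minimal sketch; the function names and queue representation (a list of lists) are our own illustration, not taken from the patent.

```python
def any_queue_full(queues, capacity):
    """True when at least one queue has reached capacity."""
    return any(len(q) >= capacity for q in queues)

def all_queues_nonempty(queues):
    """True when every queue holds at least one data item."""
    return all(len(q) >= 1 for q in queues)

def proportion_nonempty(queues, threshold):
    """True when at least `threshold` (0..1) of the queues hold data."""
    nonempty = sum(1 for q in queues if q)
    return nonempty / len(queues) >= threshold

queues = [[1, 2], [], [3]]
print(any_queue_full(queues, 2))         # True: the first queue holds 2 items
print(all_queues_nonempty(queues))       # False: the second queue is empty
print(proportion_nonempty(queues, 0.5))  # True: 2 of 3 queues hold data
```

Any of these predicates could serve as the trigger for moving the SIMD array from one task to the next.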
Brief description of the Drawings
Figures 1 and 2 illustrate conventional data processing techniques; Figure 3 illustrates a data processing technique embodying one aspect of the present invention; and Figures 4 to 7 illustrate data queues in accordance with one aspect of the present invention.
Description of the preferred embodiment
Figure 3 illustrates a method embodying the present invention, which will be explained with reference to that Figure and to Figures 4 to 7. In Figures 4 to 7, one set of tasks and related queues for a single processor are shown for the sake of clarity.
It will be readily appreciated, however, that the definition of queues extends to many processors in a SIMD architecture.
Also, although the preferred embodiment is described in relation to at least one of the queues becoming full or no further data items being available before processing a successive task, it will be readily appreciated by a person skilled in the art that the successive task can be started upon other conditions being satisfied. For example, in response to all of the queues having at least one data item, in response to a proportion of the queues having at least one data item, by delaying the successive processing for a predetermined period of time, or after at least one of the queues has been filled to a predetermined level.
In step A of Figure 3 a data queue is defined for each SIMD processing element. In step B data is received for processing by the processing element in accordance with Task A. Not all of the processing elements will receive data items at the same time, since the source of the data items depends on the task to be performed and on the previous processing stage.
However, it could be expected that over a reasonable period of time, all of the elements would receive at least one data item. At step C, the new data item is examined to determine whether it can replace the data items currently stored in the queue for that element.
If this is the case then, at step D, the queue is cleared. The new data item is stored in the next available queue position (step E), which will be the first position if the queue has been cleared, or the next available position if data is already stored in the queue. It is to be noted that data is stored in the queue in a first in first out manner. Storage of the first new data item is shown in Figure 5. Assuming that the queue is not full (step F) and that there is more data available (step H), the process continues to receive new data items (steps B to E) until the queue is full or until no more data is available. A full queue is illustrated in Figure 6.
When data items are no longer received, the data stored in the queue is processed in a first in first out manner, i.e. the first data item to be stored in a queue is processed by Task A (step G). The result of the processing of the first data item by Task A is supplied to the queue of Task B, as illustrated in Figure 7.
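The fill-then-drain flow of Figure 3 for a single task stage can be sketched as follows. This is an illustrative model only: the names (`fill_then_process`, `can_replace`, `CAPACITY`) and the use of a `deque` are our assumptions, not details from the patent.

```python
from collections import deque

CAPACITY = 4  # illustrative queue depth

def fill_then_process(items, queue, task, can_replace=lambda new, q: False):
    """Enqueue items FIFO until the queue is full or input is exhausted,
    then drain the queue through `task`, returning the results."""
    for item in items:
        if can_replace(item, queue):   # steps C/D: new item supersedes the queue
            queue.clear()
        queue.append(item)             # step E: next available position
        if len(queue) >= CAPACITY:     # step F: queue full, stop filling
            break
    # step G: process queued items first in first out
    return [task(queue.popleft()) for _ in range(len(queue))]

results = fill_then_process(range(10), deque(), task=lambda x: x * 2)
print(results)  # [0, 2, 4, 6]  (only the first four items fit this pass)
```

In a real SIMD array the drain step would issue the same task instruction to every element at once, each popping from its own queue; the sketch shows only the single-queue control flow.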
It will be appreciated that, with a multiple processor design using a SIMD architecture, the processing elements in the architecture will probably all have data to be processed by Task A by the time one of the data queues is full. This results in greater utilisation of the processors in the architecture.
Preferably, each processing element has a queue defined for each of a number of expected tasks. For example, if three tasks A, B and C are expected to be processed sequentially, three queues will be defined.
It will therefore be appreciated that, with a queue present between sequential Tasks, it is not necessary to run Task B immediately after running each Task A.
Instead, Task A can be run multiple times, until one or more of the Task B queues is filled. When one or more of the queues situated between Tasks A and B is filled, Task B is then allowed to run.
If the distribution of the expected data is approximately random then, for a sufficiently deep queue, it would be expected that most, if not all, queues would contain at least one data entry by the time Task B is run. Every processing element would then have data on which it can perform Task B. Introducing a queue therefore results in much higher utilisation of the available processing power, and therefore overall processing efficiency. Such efficiency would tend toward 100%.
The principle of introducing a queue between successive Tasks can be extended to any number of cascaded tasks. When a queue becomes full and can no longer accept input data, the preceding Task ceases processing and the next successive Task is run.
This means that a method of identifying when at least one queue has been filled is provided in order to change the instructions being issued from the first Task (A) to instructions for running the second Task (B).
A further refinement of this process is to add some rules to each task that is placing data into a queue, so as to allow it to replace the current contents of the queue with a single new item. This effectively allows items which would otherwise have been processed by Task B to be eliminated after further items have been processed by Task A, but before the processing by Task B is performed.
By way of a practical example, the following now describes the computer graphics method of "deferred blending" in terms of the above principle.
Rasterising a primitive, i.e. turning it from a geometric shape into a set of fragments, one per processor, is Task A.
In an array of processing elements, some processing elements will have a fragment of the triangle and some will not. Those processing elements that do have fragment data can place it in the queue.
Shading and blending a fragment into the frame buffer is Task B. This is an expensive task, and it should not be performed when there would otherwise be low utilisation, i.e. low efficiency.
A fragment only ends up in the queue if it is in front of preceding fragments. A simple rule could be added indicating when to discard, and when not to discard, the contents of a queue: if a fragment is opaque, all previous entries in the queue can be discarded; a blended fragment does not trigger this rule.
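The deferred-blending queue rule just described can be sketched as a small enqueue function. The fragment representation (a `(depth, colour, opaque)` tuple) and the function name are our own assumptions for illustration; the rule itself, where an opaque front fragment discards everything queued behind it while a blended fragment is merely appended, follows the text above.

```python
from collections import deque

def enqueue_fragment(queue, depth, colour, opaque):
    """Queue a fragment only if it is in front of what is already queued."""
    if queue and depth >= queue[-1][0]:   # behind the latest queued fragment
        return                            # hidden: never enters the queue
    if opaque:
        queue.clear()                     # opaque front fragment hides the rest
    queue.append((depth, colour, opaque))

q = deque()
enqueue_fragment(q, 5.0, "red", opaque=True)
enqueue_fragment(q, 3.0, "glass", opaque=False)  # in front, translucent: kept
enqueue_fragment(q, 1.0, "wall", opaque=True)    # in front, opaque: clears queue
print(list(q))  # [(1.0, 'wall', True)]
```

Deferring the expensive shade-and-blend (Task B) until the queue fills, while this rule prunes hidden fragments, is what lets the SIMD array avoid shading work that an opaque fragment would overwrite anyway.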
As mentioned above, although the preferred embodiment refers to Task B being run when either one or more of the queues between Tasks A and B is filled or no other data items are available, other alternative embodiments also fall within the scope of the invention as defined in the appended claims. For example, Task B could be run in response to all of the queues having at least one data item, in response to a proportion of the queues having at least one data item, by delaying Task B for a predetermined period of time after Task A, or after at least one of the queues has been filled to a predetermined level.
Claims (9)
1. A method of processing data items in a single instruction multiple data (SIMD) processing architecture having a plurality of processing elements for processing data, the method comprising:
for each processing element defining a data queue having a plurality of queue positions; receiving a new data item for at least one processing element in the architecture; storing the data item in the next available queue position in the queue defined for the processing element concerned; receiving and storing further data items until a predetermined condition is met; and processing the first data item in each queue using the associated processing element, all of the processing elements operating according to the same single instruction, thereby producing respective result data items.
2. A method as claimed in claim 1, wherein the predetermined condition comprises either no further data items being available or a predetermined queue status being met.
3. A method as claimed in claim 2, wherein the predetermined queue status relates to at least one of the queues becoming full.
4. A method as claimed in claim 2, wherein the predetermined queue status relates to all of the data queues having at least one data item.
5. A method as claimed in claim 2, wherein the predetermined queue status relates to a proportion of the queues having at least one data item.
6. A method as claimed in any preceding claim, wherein the received data item is examined to determine whether it replaces data items already stored in the queue concerned, and if so clearing the queue before storing that new data item.
7. A method as claimed in any preceding claim, wherein respective queues are defined for a plurality of processing tasks for each processing element.
8. A method as claimed in claim 7, wherein result data items produced by one task are supplied to a queue defined for a further task.
9. A method as claimed in claim 8, wherein the further task is processed by the processing elements in the array when the queue for that task is full.
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
AU55545/00A AU5554500A (en) | 1999-06-28 | 2000-06-28 | Method and apparatus for rendering in parallel a z-buffer with transparency |
US10/019,188 US6898692B1 (en) | 1999-06-28 | 2000-06-28 | Method and apparatus for SIMD processing using multiple queues |
PCT/GB2000/002474 WO2001001352A1 (en) | 1999-06-28 | 2000-06-28 | Method and apparatus for rendering in parallel a z-buffer with transparency |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
GB9915060A GB2355633A (en) | 1999-06-28 | 1999-06-28 | Processing graphical data |
GB0006986A GB2352381B (en) | 1999-06-28 | 2000-03-22 | Processing graphical data |
Publications (3)
Publication Number | Publication Date |
---|---|
GB0015766D0 GB0015766D0 (en) | 2000-08-16 |
GB2356718A true GB2356718A (en) | 2001-05-30 |
GB2356718B GB2356718B (en) | 2001-11-21 |
Family
ID=26243941
Family Applications (3)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
GB0120840A Expired - Fee Related GB2362552B (en) | 1999-06-28 | 2000-03-22 | Processing graphical data |
GB0015766A Expired - Fee Related GB2356718B (en) | 1999-06-28 | 2000-06-27 | Data processing |
GB0015678A Expired - Fee Related GB2356717B (en) | 1999-06-28 | 2000-06-27 | Data processing |
Family Applications Before (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
GB0120840A Expired - Fee Related GB2362552B (en) | 1999-06-28 | 2000-03-22 | Processing graphical data |
Family Applications After (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
GB0015678A Expired - Fee Related GB2356717B (en) | 1999-06-28 | 2000-06-27 | Data processing |
Country Status (1)
Country | Link |
---|---|
GB (3) | GB2362552B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11776672B1 (en) | 2020-12-16 | 2023-10-03 | Express Scripts Strategic Development, Inc. | System and method for dynamically scoring data objects |
US11862315B2 (en) | 2020-12-16 | 2024-01-02 | Express Scripts Strategic Development, Inc. | System and method for natural language processing |
US11423067B1 (en) | 2020-12-16 | 2022-08-23 | Express Scripts Strategic Development, Inc. | System and method for identifying data object combinations |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP0424618A2 (en) * | 1989-10-24 | 1991-05-02 | International Business Machines Corporation | Input/output system |
US5790879A (en) * | 1994-06-15 | 1998-08-04 | Wu; Chen-Mie | Pipelined-systolic single-instruction stream multiple-data stream (SIMD) array processing with broadcasting control, and method of operating same |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5923333A (en) * | 1997-01-06 | 1999-07-13 | Hewlett Packard Company | Fast alpha transparency rendering method |
JPH10320573A (en) * | 1997-05-22 | 1998-12-04 | Sega Enterp Ltd | Picture processor, and method for processing picture |
JP4399910B2 (en) * | 1998-09-10 | 2010-01-20 | 株式会社セガ | Image processing apparatus and method including blending processing |
-
2000
- 2000-03-22 GB GB0120840A patent/GB2362552B/en not_active Expired - Fee Related
- 2000-06-27 GB GB0015766A patent/GB2356718B/en not_active Expired - Fee Related
- 2000-06-27 GB GB0015678A patent/GB2356717B/en not_active Expired - Fee Related
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP0424618A2 (en) * | 1989-10-24 | 1991-05-02 | International Business Machines Corporation | Input/output system |
US5790879A (en) * | 1994-06-15 | 1998-08-04 | Wu; Chen-Mie | Pipelined-systolic single-instruction stream multiple-data stream (SIMD) array processing with broadcasting control, and method of operating same |
Also Published As
Publication number | Publication date |
---|---|
GB0015766D0 (en) | 2000-08-16 |
GB0015678D0 (en) | 2000-08-16 |
GB2362552B (en) | 2003-12-10 |
GB0120840D0 (en) | 2001-10-17 |
GB2356718B (en) | 2001-11-21 |
GB2356717A (en) | 2001-05-30 |
GB2362552A (en) | 2001-11-21 |
GB2356717B (en) | 2001-12-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US8335812B2 (en) | Methods and apparatus for efficient complex long multiplication and covariance matrix implementation | |
US6230180B1 (en) | Digital signal processor configuration including multiplying units coupled to plural accumlators for enhanced parallel mac processing | |
US20160342418A1 (en) | Functional unit having tree structure to support vector sorting algorithm and other algorithms | |
US20090077154A1 (en) | Microprocessor | |
US6898692B1 (en) | Method and apparatus for SIMD processing using multiple queues | |
US7769982B2 (en) | Data processing apparatus and method for accelerating execution of subgraphs | |
US5704052A (en) | Bit processing unit for performing complex logical operations within a single clock cycle | |
Ronquist | Fast Fitch-parsimony algorithms for large data sets | |
US20030097391A1 (en) | Methods and apparatus for performing parallel integer multiply accumulate operations | |
KR100812555B1 (en) | Arrangement, system and method for vector permutation in single-instruction multiple-data microprocessors | |
US6715065B1 (en) | Micro program control method and apparatus thereof having branch instructions | |
JP3955741B2 (en) | SIMD type microprocessor having sort function | |
US5778208A (en) | Flexible pipeline for interlock removal | |
GB2356718A (en) | Data processing | |
EP1634163B1 (en) | Result partitioning within simd data processing systems | |
US5974531A (en) | Methods and systems of stack renaming for superscalar stack-based data processors | |
JP2007183712A (en) | Data driven information processor | |
EP0992917B1 (en) | Linear vector computation | |
EP0775970B1 (en) | Graphical image convolution | |
US7107478B2 (en) | Data processing system having a Cartesian Controller | |
EP1132813A2 (en) | Computer with high-speed context switching | |
US20030126178A1 (en) | Fast forwarding ALU | |
US6757813B1 (en) | Processor | |
JP3088956B2 (en) | Arithmetic unit | |
JP3264114B2 (en) | Sorting device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
732E | Amendments to the register in respect of changes of name or changes affecting rights (sect. 32/1977) | ||
732E | Amendments to the register in respect of changes of name or changes affecting rights (sect. 32/1977) |
Free format text: REGISTERED BETWEEN 20101111 AND 20101117 |
|
PCNP | Patent ceased through non-payment of renewal fee |
Effective date: 20180627 |