GB2356718A - Data processing - Google Patents
Data processing
- Publication number
- GB2356718A (application GB0015766A)
- Authority
- GB
- United Kingdom
- Prior art keywords
- data
- queue
- processing
- task
- queues
- Prior art date
- Legal status
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T15/00—3D [Three Dimensional] image rendering
- G06T15/10—Geometric effects
- G06T15/40—Hidden part removal
- G06T15/405—Hidden part removal using Z-buffer
Landscapes
- Physics & Mathematics (AREA)
- Engineering & Computer Science (AREA)
- Geometry (AREA)
- Computer Graphics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Image Processing (AREA)
- Image Generation (AREA)
- Executing Machine-Instructions (AREA)
Description
2356718 DATA PROCESSING

The present invention relates to data processing,
and in particular to processing data items using a single instruction multiple data (SIMD) architecture.
Background of the Invention
Conventional data processing techniques process data serially through different tasks. For example, see Figure 1 of the accompanying drawings, which illustrates a conventional process in which data items (Data #1) are generated, for example by a result from a calculation or from a memory fetch operation, and are then processed by a first task (Task A). Task A results in new data (Data #2) for processing by a second task (Task B) to produce result data (Result data).
Conventionally these tasks need to be repeated for each new data item for processing.
In a single instruction multiple data (SIMD) architecture a number of processing elements act to process respective data items according to a single instruction at any one time. Such processing is illustrated in Figure 2 of the accompanying drawings, which shows processing by n elements.
With a single instruction stream it is necessary for all the n processing elements to perform the same tasks, although each processing element has its own data: this is SIMD. Every processing element generates a new item of data (Data#1 for elements 0 to n). Each respective processing element then performs a respective Task A on its respective Data#1.
On completion of Task A by each of the processing elements, some percentage (between 0% and 100%) of the processing elements will have a respective valid data item on which to perform a respective Task B. Since all the processing elements must perform the same task at the same time, those without valid data are performing no useful work, and the set of processing elements, as a whole, is not working at full utilisation, i.e. maximum efficiency.
As the fraction of processing elements producing valid data, as a result of Task A, as input data (Data#2) to Task B decreases, the efficiency of the whole array of processing elements also decreases.
Furthermore, as the "cost" of Task B increases, i.e. the number of cycles required to perform the task, the utilisation of the whole of the processing flow decreases. (By way of an example, fixed point processing requires approximately 10 cycles for a typical 4-byte integer, and floating point processing requires approximately 100 cycles for a 4-byte floating point number.) Clearly the flow through Tasks A and B can be extended with further tasks, i.e. Task C, Task D, etc.
The output data from Task B feeds into Task C, and clearly if Task B eliminates the data, Task C will suffer under-utilisation, and so on. Further tasks can be cascaded in this fashion, with utilisation rapidly decreasing through each step as data items are eliminated.
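The utilisation decay described above can be made concrete with a small sketch. The function below is not from the patent; it simply models an array of processing elements in which each cascaded task invalidates some fraction of the data produced by the previous task, with illustrative survival fractions.

```python
def utilisation_through_tasks(survival_fractions):
    """Return the fraction of processing elements doing useful work at each
    task stage, assuming each stage keeps only the given fraction of the
    previous stage's output data valid (hypothetical model, not the patent's)."""
    utilisation = 1.0
    stages = []
    for fraction in survival_fractions:
        stages.append(utilisation)   # elements busy with valid data at this stage
        utilisation *= fraction      # data eliminated before the next stage
    return stages

# e.g. Task A keeps 50% of items valid, Task B keeps 40% of those, Task C all:
print(utilisation_through_tasks([0.5, 0.4, 1.0]))  # [1.0, 0.5, 0.2]
```

With these example fractions, Task C runs with only 20% of the elements doing useful work, illustrating why utilisation "rapidly decreases through each step".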
Summary of the present Invention
In order to overcome the drawbacks of conventional SIMD processing, according to the present invention there is provided a method of processing data using a SIMD computer architecture having a plurality of processing elements for processing data, the method comprising: for each processing element, defining at least one processing task operable to process input data to form task output data, defining a data queue for receiving data input to the task, and processing the data stored in the queue in a first in first out manner when a predetermined condition is met.
Preferably, the predetermined condition is that either no further data items are available or a predetermined queue status is met.
Preferably, the predetermined queue status is that at least one of the queues is full.
Alternatively, the predetermined queue status is that all of the data queues have at least one data item.
Alternatively, the predetermined queue status is that a proportion of the queues have at least one data item.
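The alternative "predetermined queue status" conditions above can be expressed as simple predicates over the set of per-element queues. The following is a minimal sketch; the function names and queue representation (a list of lists) are our own illustration, not taken from the patent.

```python
def any_queue_full(queues, capacity):
    """True when at least one queue has reached capacity."""
    return any(len(q) >= capacity for q in queues)

def all_queues_nonempty(queues):
    """True when every queue holds at least one data item."""
    return all(len(q) >= 1 for q in queues)

def proportion_nonempty(queues, threshold):
    """True when at least `threshold` (0..1) of the queues hold data."""
    nonempty = sum(1 for q in queues if q)
    return nonempty / len(queues) >= threshold

queues = [[1, 2], [], [3]]
print(any_queue_full(queues, 2))         # True: the first queue holds 2 items
print(all_queues_nonempty(queues))       # False: the second queue is empty
print(proportion_nonempty(queues, 0.5))  # True: 2 of 3 queues hold data
```

Any of these predicates could serve as the trigger for moving the SIMD array from one task to the next.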
Brief description of the Drawings
Figures 1 and 2 illustrate conventional data processing techniques; Figure 3 illustrates a data processing technique embodying one aspect of the present invention; and Figures 4 to 7 illustrate data queues in accordance with one aspect of the present invention.
Description of the preferred embodiment
Figure 3 illustrates a method embodying the present invention, which will be explained with reference to that Figure and to Figures 4 to 7. In Figures 4 to 7, one set of tasks and related queues for a single processor are shown for the sake of clarity.
It will be readily appreciated, however, that the definition of queues extends to many processors in a SIMD architecture.
Also, although the preferred embodiment is described in relation to at least one of the queues becoming full or no further data items being available before processing a successive task, it will be readily appreciated by a person skilled in the art that the successive task can be started upon other conditions being satisfied. For example, in response to all of the queues having at least one data item, in response to a proportion of the queues having at least one data item, by delaying the successive processing for a predetermined period of time, or after at least one of the queues has been filled to a predetermined level.
In step A of Figure 3 a data queue is defined for each SIMD processing element. In step B data is received for processing by the processing element in accordance with Task A. Not all of the processing elements will receive data items at the same time, since the source of the data items depends on the task to be performed and on the previous processing stage.
However, it could be expected that over a reasonable period of time, all of the elements would receive at least one data item. At step C, the new data item is examined to determine whether it can replace the data items currently stored in the queue for that element.
If this is the case then, at step D, the queue is cleared. The new data item is stored in the next available queue position (step E), which will be the first position if the queue has been cleared, or the next available position if data is already stored in the queue. It is to be noted that data is stored in the queue in a first in first out manner. Storage of the first new data item is shown in Figure 5. Assuming that the queue is not full (step F) and that there is more data available (step H), the process continues to receive new data items (steps B to E) until the queue is full or until no more data is available. A full queue is illustrated in Figure 6.
When data items are no longer received, the data stored in the queue is processed in a first in first out manner, i.e. the first data item to be stored in a queue is processed by Task A (step G). The result of the processing of the first data item by Task A is supplied to the queue of Task B, as illustrated in Figure 7.
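The fill-then-drain flow of Figure 3 for a single task stage can be sketched as follows. This is an illustrative model only: the names (`fill_then_process`, `can_replace`, `CAPACITY`) and the use of a `deque` are our assumptions, not details from the patent.

```python
from collections import deque

CAPACITY = 4  # illustrative queue depth

def fill_then_process(items, queue, task, can_replace=lambda new, q: False):
    """Enqueue items FIFO until the queue is full or input is exhausted,
    then drain the queue through `task`, returning the results."""
    for item in items:
        if can_replace(item, queue):   # steps C/D: new item supersedes the queue
            queue.clear()
        queue.append(item)             # step E: next available position
        if len(queue) >= CAPACITY:     # step F: queue full, stop filling
            break
    # step G: process queued items first in first out
    return [task(queue.popleft()) for _ in range(len(queue))]

results = fill_then_process(range(10), deque(), task=lambda x: x * 2)
print(results)  # [0, 2, 4, 6]  (only the first four items fit this pass)
```

In a real SIMD array the drain step would issue the same task instruction to every element at once, each popping from its own queue; the sketch shows only the single-queue control flow.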
It will be appreciated that, with a multiple processor design using a SIMD architecture, the processing elements in the architecture will probably all have data to be processed by Task A by the time one of the data queues is full. This results in greater utilisation of the processors in the architecture.
Preferably, each processing element has a queue defined for each of a number of expected tasks. For example, if three tasks A, B and C are expected to be processed sequentially, three queues will be defined.
It will therefore be appreciated that, with a queue present between sequential Tasks, it is not necessary to run Task B immediately after running each Task A.
Instead, Task A can be run multiple times, until one or more of the Task B queues is filled. When one or more of the queues situated between Tasks A and B is filled, Task B is then allowed to run.
If the distribution of the expected data is approximately random then, for a sufficiently deep queue, it would be expected that most, if not all, queues would contain at least one data entry by the time Task B is run. Every processing element would then have data on which it can perform Task B. Introducing a queue therefore results in much higher utilisation of the available processing power, and therefore overall processing efficiency. Such efficiency would tend toward 100%.
The principle of introducing a queue between successive Tasks can be extended to any number of cascaded tasks. When a queue becomes full and can no longer accept input data, the preceding Task ceases processing and the next successive Task is run.
This means that a method of identifying when at least one queue has been filled is provided in order to change the instructions being issued from the first Task (A) to instructions for running the second Task (B).
A further refinement of this process is to add some rules to each task that is placing data into a queue, so as to allow it to replace the current contents of the queue with a single new item. This effectively allows items which would otherwise have been processed by Task B to be eliminated after further items have been processed by Task A, but before the processing by Task B is performed.
By way of a practical example, the following now describes the computer graphics method of "deferred blending" in terms of the above principle.
Rasterising a primitive, i.e. turning it from a geometric shape into a set of fragments, one per processor, is Task A.
In an array of processing elements, some processing elements will have a fragment of the triangle and some will not. Those processing elements that do have fragment data can place it in the queue.
Shading and blending a fragment into the frame buffer is Task B. This is an expensive task, and it should not be performed when there would otherwise be low utilisation, i.e. low efficiency.
A fragment only ends up in the queue if it is in front of preceding fragments. A simple rule could be added indicating when to discard, and when not to discard, the contents of a queue: if a fragment is opaque, all previous entries in the queue can be discarded; a blended fragment does not trigger this rule.
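The deferred-blending queue rule just described can be sketched as a small enqueue function. The fragment representation (a `(depth, colour, opaque)` tuple) and the function name are our own assumptions for illustration; the rule itself, where an opaque front fragment discards everything queued behind it while a blended fragment is merely appended, follows the text above.

```python
from collections import deque

def enqueue_fragment(queue, depth, colour, opaque):
    """Queue a fragment only if it is in front of what is already queued."""
    if queue and depth >= queue[-1][0]:   # behind the latest queued fragment
        return                            # hidden: never enters the queue
    if opaque:
        queue.clear()                     # opaque front fragment hides the rest
    queue.append((depth, colour, opaque))

q = deque()
enqueue_fragment(q, 5.0, "red", opaque=True)
enqueue_fragment(q, 3.0, "glass", opaque=False)  # in front, translucent: kept
enqueue_fragment(q, 1.0, "wall", opaque=True)    # in front, opaque: clears queue
print(list(q))  # [(1.0, 'wall', True)]
```

Deferring the expensive shade-and-blend (Task B) until the queue fills, while this rule prunes hidden fragments, is what lets the SIMD array avoid shading work that an opaque fragment would overwrite anyway.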
As mentioned above, although the preferred embodiment refers to Task B being run when either one or more of the queues between Tasks A and B is filled or no other data items are available, other alternative embodiments also fall within the scope of the invention as defined in the appended claims. For example, Task B could be run in response to all of the queues having at least one data item, in response to a proportion of the queues having at least one data item, by delaying Task B for a predetermined period of time after Task A, or after at least one of the queues has been filled to a predetermined level.
Claims (9)
1. A method of processing data items in a single instruction multiple data (SIMD) processing architecture having a plurality of processing elements for processing data, the method comprising:
for each processing element defining a data queue having a plurality of queue positions; receiving a new data item for at least one processing element in the architecture; storing the data item in the next available queue position in the queue defined for the processing element concerned; receiving and storing further data items until a predetermined condition is met; and processing the first data item in each queue using the associated processing element, all of the processing elements operating according to the same single instruction, thereby producing respective result data items.
2. A method as claimed in claim 1, wherein the predetermined condition comprises either no further data items being available or a predetermined queue status being met.
3. A method as claimed in claim 2, wherein the predetermined queue status relates to at least one of the queues becoming full.
4. A method as claimed in claim 2, wherein the predetermined queue status relates to all of the data queues having at least one data item.
5. A method as claimed in claim 2, wherein the predetermined queue status relates to a proportion of the queues having at least one data item.
6. A method as claimed in any preceding claim, wherein the received data item is examined to determine whether it replaces data items already stored in the queue concerned, and if so clearing the queue before storing that new data item.
7. A method as claimed in any preceding claim, wherein respective queues are defined for a plurality of processing tasks for each processing element.
8. A method as claimed in claim 7, wherein result data items produced by one task are supplied to a queue defined for a further task.
9. A method as claimed in claim 8, wherein the further task is processed by the processing elements in the array when the queue for that task is full.
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
AU55545/00A AU5554500A (en) | 1999-06-28 | 2000-06-28 | Method and apparatus for rendering in parallel a z-buffer with transparency |
US10/019,188 US6898692B1 (en) | 1999-06-28 | 2000-06-28 | Method and apparatus for SIMD processing using multiple queues |
PCT/GB2000/002474 WO2001001352A1 (en) | 1999-06-28 | 2000-06-28 | Method and apparatus for rendering in parallel a z-buffer with transparency |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
GB9915060A GB2355633A (en) | 1999-06-28 | 1999-06-28 | Processing graphical data |
GB0006986A GB2352381B (en) | 1999-06-28 | 2000-03-22 | Processing graphical data |
Publications (3)
Publication Number | Publication Date |
---|---|
GB0015766D0 GB0015766D0 (en) | 2000-08-16 |
GB2356718A true GB2356718A (en) | 2001-05-30 |
GB2356718B GB2356718B (en) | 2001-11-21 |
Family
ID=26243941
Family Applications (3)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
GB0120840A Expired - Fee Related GB2362552B (en) | 1999-06-28 | 2000-03-22 | Processing graphical data |
GB0015766A Expired - Fee Related GB2356718B (en) | 1999-06-28 | 2000-06-27 | Data processing |
GB0015678A Expired - Fee Related GB2356717B (en) | 1999-06-28 | 2000-06-27 | Data processing |
Family Applications Before (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
GB0120840A Expired - Fee Related GB2362552B (en) | 1999-06-28 | 2000-03-22 | Processing graphical data |
Family Applications After (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
GB0015678A Expired - Fee Related GB2356717B (en) | 1999-06-28 | 2000-06-27 | Data processing |
Country Status (1)
Country | Link |
---|---|
GB (3) | GB2362552B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11776672B1 (en) | 2020-12-16 | 2023-10-03 | Express Scripts Strategic Development, Inc. | System and method for dynamically scoring data objects |
US11862315B2 (en) | 2020-12-16 | 2024-01-02 | Express Scripts Strategic Development, Inc. | System and method for natural language processing |
US11423067B1 (en) | 2020-12-16 | 2022-08-23 | Express Scripts Strategic Development, Inc. | System and method for identifying data object combinations |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP0424618A2 (en) * | 1989-10-24 | 1991-05-02 | International Business Machines Corporation | Input/output system |
US5790879A (en) * | 1994-06-15 | 1998-08-04 | Wu; Chen-Mie | Pipelined-systolic single-instruction stream multiple-data stream (SIMD) array processing with broadcasting control, and method of operating same |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5923333A (en) * | 1997-01-06 | 1999-07-13 | Hewlett Packard Company | Fast alpha transparency rendering method |
JPH10320573A (en) * | 1997-05-22 | 1998-12-04 | Sega Enterp Ltd | Picture processor, and method for processing picture |
JP4399910B2 (en) * | 1998-09-10 | 2010-01-20 | 株式会社セガ | Image processing apparatus and method including blending processing |
-
2000
- 2000-03-22 GB GB0120840A patent/GB2362552B/en not_active Expired - Fee Related
- 2000-06-27 GB GB0015766A patent/GB2356718B/en not_active Expired - Fee Related
- 2000-06-27 GB GB0015678A patent/GB2356717B/en not_active Expired - Fee Related
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP0424618A2 (en) * | 1989-10-24 | 1991-05-02 | International Business Machines Corporation | Input/output system |
US5790879A (en) * | 1994-06-15 | 1998-08-04 | Wu; Chen-Mie | Pipelined-systolic single-instruction stream multiple-data stream (SIMD) array processing with broadcasting control, and method of operating same |
Also Published As
Publication number | Publication date |
---|---|
GB0015766D0 (en) | 2000-08-16 |
GB0015678D0 (en) | 2000-08-16 |
GB2362552B (en) | 2003-12-10 |
GB0120840D0 (en) | 2001-10-17 |
GB2356718B (en) | 2001-11-21 |
GB2356717A (en) | 2001-05-30 |
GB2362552A (en) | 2001-11-21 |
GB2356717B (en) | 2001-12-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US8335812B2 (en) | Methods and apparatus for efficient complex long multiplication and covariance matrix implementation | |
US6230180B1 (en) | Digital signal processor configuration including multiplying units coupled to plural accumlators for enhanced parallel mac processing | |
US20160342418A1 (en) | Functional unit having tree structure to support vector sorting algorithm and other algorithms | |
US20090077154A1 (en) | Microprocessor | |
US6898692B1 (en) | Method and apparatus for SIMD processing using multiple queues | |
US7769982B2 (en) | Data processing apparatus and method for accelerating execution of subgraphs | |
US5704052A (en) | Bit processing unit for performing complex logical operations within a single clock cycle | |
Ronquist | Fast Fitch-parsimony algorithms for large data sets | |
US20030097391A1 (en) | Methods and apparatus for performing parallel integer multiply accumulate operations | |
KR100812555B1 (en) | Arrangement, system and method for vector permutation in single-instruction multiple-data microprocessors | |
US6715065B1 (en) | Micro program control method and apparatus thereof having branch instructions | |
JP3955741B2 (en) | SIMD type microprocessor having sort function | |
US5778208A (en) | Flexible pipeline for interlock removal | |
GB2356718A (en) | Data processing | |
EP1634163B1 (en) | Result partitioning within simd data processing systems | |
US5974531A (en) | Methods and systems of stack renaming for superscalar stack-based data processors | |
JP2007183712A (en) | Data driven information processor | |
EP0992917B1 (en) | Linear vector computation | |
EP0775970B1 (en) | Graphical image convolution | |
US7107478B2 (en) | Data processing system having a Cartesian Controller | |
EP1132813A2 (en) | Computer with high-speed context switching | |
US20030126178A1 (en) | Fast forwarding ALU | |
US6757813B1 (en) | Processor | |
JP3088956B2 (en) | Arithmetic unit | |
JP3264114B2 (en) | Sorting device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
732E | Amendments to the register in respect of changes of name or changes affecting rights (sect. 32/1977) | ||
732E | Amendments to the register in respect of changes of name or changes affecting rights (sect. 32/1977) |
Free format text: REGISTERED BETWEEN 20101111 AND 20101117 |
|
PCNP | Patent ceased through non-payment of renewal fee |
Effective date: 20180627 |